KBS: Fast Forward

Code generation on cruise control.

On today's episode: a lot of code. The previous work prepared the project codebase to guide agents in generating good-quality code, and this session put that preparation to work.

Update, Resolve, Archive Commands

After the Kb_service module was broken apart, the coding agent easily generated full verticals for update, resolve, and archive commands. The process I'm following:

Related commit: 0eb5584 — feat: add update, resolve, and archive commands

Design Document

In anticipation of more complicated functionality, I wanted a step before the implementation plan, one less focused on implementation details and more on weighing different approaches. Hence prompts/activities/design-document.md. I used the design document for several of the next features, but I am not happy with the results and plan to work out a different process.

The issue with the design-document step is that it focuses the agent on producing a single punchline. Despite the instructions to consider alternative solutions and discuss design decisions, the agent doesn't research or explore nearly enough; it just drills down on producing the overall document. In the future I think I will break the process into stages: distill the requirements and detail the background/status quo, then list research paths and open questions, then produce documents for each of those, and finally synthesize all of it into a single design document.

Related commit: 15f876f — feat: add design-document activity prompt

Relations, Deferred Tests, Parsing Refactor

Relations between notes and todos are an even more complicated vertical, requiring domain models, repository logic, a service, and a command; the coding agent handled it with ease. We also implemented the integration tests that were imagined during the add subcommand work but deferred at the time, and then refactored the codebase to pull the domain-model parsing logic shared by several services into one place.
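The shape of that parsing refactor can be sketched in Python (the project itself is OCaml, and the model fields here are hypothetical): the common-field parsing that several services had each duplicated moves into one shared helper.

```python
from dataclasses import dataclass

# Hypothetical domain models; the real ones live in the OCaml codebase.
@dataclass
class Note:
    id: str
    title: str

@dataclass
class Todo:
    id: str
    title: str
    resolved: bool

def parse_common(row: dict) -> dict:
    # Fields shared by every domain model, formerly copy-pasted per service.
    return {"id": row["id"], "title": row["title"]}

def parse_note(row: dict) -> Note:
    return Note(**parse_common(row))

def parse_todo(row: dict) -> Todo:
    return Todo(**parse_common(row), resolved=bool(row.get("resolved", False)))
```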

Additionally, I made a documentation-upkeep prompt to be run periodically, sweeping the docs for broken references, missing concepts, and the like.

Related commits:

SQLite ↔ JSONL Synchronization

This was probably the most complex change we've tackled so far: a full vertical down to a new serialization format, an intermediate snapshot representation of the database, code to rebuild the database, and two subcommands (flush and rebuild). The design document was handy here, helping to strip out some complexity I had assumed we needed, but, as discussed above, it was not fully satisfactory, and I had to push the agent to do more research and investigation.
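A minimal sketch of the two directions, assuming a single hypothetical `items` table (the real snapshot covers the full schema): flush serializes the database to JSONL, rebuild reconstructs a fresh database from that snapshot.

```python
import json
import sqlite3

def flush(conn: sqlite3.Connection) -> str:
    """Serialize the database to JSONL: one JSON object per row."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute("SELECT * FROM items ORDER BY id").fetchall()
    return "\n".join(json.dumps(dict(r)) for r in rows)

def rebuild(jsonl: str) -> sqlite3.Connection:
    """Rebuild a fresh database from a JSONL snapshot."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, title TEXT)")
    for line in jsonl.splitlines():
        if line:
            conn.execute("INSERT INTO items VALUES (:id, :title)", json.loads(line))
    conn.commit()
    return conn
```

The useful property is that the pair round-trips: flushing a rebuilt database reproduces the snapshot byte for byte, which makes the synchronization easy to test.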

Related commit: 81498af — feat: add SQLite-JSONL two-way synchronization

Unused Symbol Checking

Something I've observed a few times is that coding agents consider functions in terms of the agent's current semantic purpose. If the purpose of a change is to move logic around, the agent will delete the "old" function and create the "new" one. But the agent is not always "thinking" this way; sometimes, due to the structure of the conversation, it treats the change as a simple addition, adding the "new" function and leaving the "old" one behind. To deal with this, and to start building more automated guidance for agents to follow, we built an unused-symbol checker, a dead-code analyzer.

We explored a couple of other options and settled on a Python script that starts the OCaml language server, queries it for symbols, queries for references to those symbols, and shuts the server back down. With this in hand we deleted ~5 functions that agents had forgotten.

Related commit: f3cfc04 — feat: add unused-export checker and remove dead code

Fixes and Testing

Something I've noticed with the product requirements document, and with having the agent drive which functionality to implement, is that the agent keys off the structure already present in the document: the numbered use cases. It does a poor job of surveying what already exists in the system and adjusting its choices to cover the gaps. For example, when we implemented the show command, relations had not been implemented yet, so they were skipped; when relations were later picked up, the agent ignored show entirely, requiring a later fix to display relations when show-ing. Similarly, the --json flag is mentioned a couple of times in the document, but, I suspect because of its horizontal nature, it never got picked up, resulting in the commit below adding it to all commands. I've attempted to address this particular gap with guidance; we'll see if it's effective.

The TypeId bug is another interesting agent blind spot. Because TypeId values are random, both the unit tests and the integration tests failed to notice that the suffixes of all the TypeIds were identical. The ids have a prefix derived from the current time, so they never collided during testing, and the assertions all had to be generic enough to pass regardless. I think some guidance work is needed here in the future. The bug was found because I started bootstrapping bs development with itself 🎉, tracking the --json feature as notes and todos; the resulting .kbases.jsonl showed suspiciously similar ids.
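The failure mode is easy to reconstruct in a few lines of Python (a hypothetical reconstruction, not the actual TypeId code): the random suffix gets evaluated once instead of per id, while the time-derived prefix keeps every id unique, so a generic uniqueness assertion still passes.

```python
import secrets
import time

# Hypothetical reconstruction of the bug, not the real TypeId code:
# the random suffix is drawn once at module load and then reused.
_SUFFIX = secrets.token_hex(4)  # bug: one draw, shared by every id

def buggy_type_id() -> str:
    # Monotonic, time-derived prefix (a stand-in for the real scheme);
    # it advances on every call, so the ids never collide in tests.
    return f"{time.perf_counter_ns():x}-{_SUFFIX}"

def fixed_type_id() -> str:
    return f"{time.perf_counter_ns():x}-{secrets.token_hex(4)}"  # fresh draw per id
```

A check like `assert len(set(ids)) == len(ids)` passes for both versions; only an assertion on the suffixes themselves would have caught the bug, which is exactly the kind of specificity the generated tests lacked.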

I wanted more integration tests focused on the whole lifecycle, to help catch cross-command defects. We implemented them, along with an adjustment to the architecture guidance.

Related commits:

Minor New Features

Last in this long coding session, I added some coding-agent-oriented functionality and fixed an ergonomics issue I watched the coding agent run into. First, during bs init, the application now installs a short helper in the project's AGENTS.md; this will probably be tweaked over time, and I'm not sure how well the limited example performs. Second, I made bs close an alias for bs resolve after watching the coding agent "guess" the close command, fail, look up the --help, and then succeed with resolve.
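The alias itself is trivial, but the pattern generalizes: accept the verbs agents tend to guess and normalize them before dispatch. A sketch in Python (bs itself is OCaml; the argparse CLI and command names here are stand-ins):

```python
import argparse

ALIASES = {"close": "resolve"}  # the verb agents guess -> the real command

def dispatch(argv: list) -> str:
    """Parse argv and return the canonical command name to dispatch on."""
    parser = argparse.ArgumentParser(prog="bs")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("add", "resolve", "archive"):
        # Register each alias on its canonical subcommand.
        sub.add_parser(name, aliases=[a for a, c in ALIASES.items() if c == name])
    args = parser.parse_args(argv)
    return ALIASES.get(args.command, args.command)  # normalize before dispatch
```

With this, `bs close` parses successfully and lands on the same code path as `bs resolve`, instead of forcing the agent through a fail-then-read---help detour.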

Related commits:

Collected Future Work

Aside from these ideas for future work, I think we're nearing the end of this experiment. All of the basic functionality is implemented and the program is usable by coding agents. I will have to pause at some point and do an in-depth, by-hand code review before I can draw any conclusions about automated software development that maintains quality.