Re: 90 Percent
Armin Ronacher has written a new post on coding agents: 90 percent. It largely speaks to experiences I've had at work.
There are no weird files that shouldn’t belong there, no duplicate implementations, and no emojis all over the place. The comments still follow the style I want and, crucially, often aren’t there. I pay close attention to the fundamentals of system architecture, code layout, and database interaction. I’m incredibly opinionated. As a result, there are certain things I don’t let the AI do. I know it won’t reach the point where I could sign off on a commit. That’s why it’s not 100%.
Same. I like separation between API / business logic / DB objects, and the agents largely will not implement this on their own. I have preferences around id generation, unit test structuring, input validation, and the list goes on. I have to feed examples of all of these to the agent or adjust them manually, so for me it's not clear that 100% will be reached any time soon.
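To make that concrete, here's a minimal Kotlin sketch of the kind of layering I mean: separate API, domain, and persistence types, with the service owning id generation and the mapping between them. The names and shapes are hypothetical, not our actual code.

```kotlin
import java.util.UUID

// API layer: request/response shapes only, no business rules.
data class CreateUserRequest(val email: String)
data class UserResponse(val id: String, val email: String)

// Domain layer: business rules and validation live here.
data class User(val id: UUID, val email: String) {
    init {
        require(email.contains("@")) { "invalid email: $email" }
    }
}

// Persistence layer: the DB representation, free to diverge from the API shape.
data class UserRecord(val id: UUID, val email: String, val createdAtEpochMs: Long)

// The service owns id generation and the mapping between layers, so API and DB
// types never leak across the boundary.
class UserService(private val save: (UserRecord) -> Unit) {
    fun create(req: CreateUserRequest): UserResponse {
        val user = User(id = UUID.randomUUID(), email = req.email)
        save(UserRecord(user.id, user.email, System.currentTimeMillis()))
        return UserResponse(user.id.toString(), user.email)
    }
}
```

Left to its own devices, the agent will happily route a request DTO straight into the database; a pattern like this has to already exist for it to copy.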
I began in the traditional way: system design, schema, architecture. At this stage I don’t let the AI write, but I loop it in as a kind of rubber duck. The back-and-forth helps me see mistakes, even if I don’t need or trust the answers.
Any change large enough to hand to a coding agent basically requires an implementation spec, an alignment document, to make re-prompting easy. I wrote the last one with the help of an agent, and had different agents review it for completeness and correctness.
For AI-generated or AI-supported code, I now end up with a stack that looks something like what I often wanted, but was too hard to do by hand:
- Raw SQL: ...
- OpenAPI first: ...
I haven't come around to raw SQL yet; we use JPA with Kotlin, and its query-to-entity mappings have been decent enough. OpenAPI, though, I find very easy and convenient: I drive the service's REST interface from an OpenAPI spec using the agent, and get both client and server implementations from the code generator.
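As a rough illustration of that workflow, the Gradle wiring can look something like the sketch below using the OpenAPI Generator plugin. The versions, spec path, packages, and generator choice are placeholders rather than our actual setup.

```kotlin
// build.gradle.kts (sketch): server stubs driven from an OpenAPI spec.
// Plugin versions, paths, and package names are placeholders.
plugins {
    kotlin("jvm") version "2.0.21"
    id("org.openapi.generator") version "7.8.0"
}

openApiGenerate {
    generatorName.set("kotlin-spring")          // server interfaces + models
    inputSpec.set("$rootDir/api/service.yaml")  // the spec the agent edits
    outputDir.set(layout.buildDirectory.dir("generated/openapi").get().asFile.absolutePath)
    apiPackage.set("com.example.api")
    modelPackage.set("com.example.api.model")
    configOptions.set(
        mapOf(
            "interfaceOnly" to "true",   // we supply the implementations ourselves
            "useSpringBoot3" to "true"
        )
    )
}
```

Pointing the agent at the spec file and letting the generator produce the interfaces keeps the hand-written part down to the implementations.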
Today I use Claude Code and Codex. Each has strengths, but the constant is Codex for code review after PRs. It’s very good at that. Claude is still indispensable when debugging and when a lot of tool access is needed (e.g., why do I have a deadlock, why is there corrupted data in the database, etc.). The two working together is where it’s most magical. Claude might find the data, Codex might understand it better.
This is my experience as well. Claude Sonnet is the workhorse that will churn through tasks (mostly) as described, and run tools to check its work or debug. OpenAI o3 or GPT-5 are usually more clever, able to fix more complex issues and better at reviewing other work, but slower, and seemingly stubborn about choosing not to use tools to check their (usually correct) work.
It is easy to create systems that appear to behave correctly but have unclear runtime behavior when relying on agents. For instance, the AI doesn’t fully comprehend threading or goroutines. If you don’t keep the bad decisions at bay early, you won’t be able to operate it in a stable manner later.
Agents also operate on conventional wisdom from the internet and in turn do things I would never do myself. They love to use dependencies (particularly outdated ones). They love to swallow errors and take away all tracebacks. I’d rather uphold strong invariants and let code crash loudly when they fail than hide problems. If you don’t fight this, you end up with opaque, unobservable systems.
Agreed. Unfortunately for the folks who dream of swarms of agents building out a full feature set in parallel, at current performance levels agent work often needs to be linear. They're good at following pre-existing examples, which means anything you have to manually add into the codebase becomes a dependency for future work. The agents are simply not good enough at making architectural decisions or library dependency decisions, or at resisting the urge to suppress exceptions; they love to just follow conventional wisdom (the average) from the internet. All of which requires some pre-work: ensuring patterns are set up properly before expecting the agent to deliver similar code for different features.
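On the exception-suppression point specifically, here is a contrived Kotlin sketch of the pattern I keep having to push back on, next to the one I want the agent to copy:

```kotlin
// Agent default: swallow the failure, return a default, lose the stack trace.
fun parseQuantityQuietly(raw: String): Int =
    try {
        raw.toInt()
    } catch (e: NumberFormatException) {
        0 // the error silently disappears; downstream data is now wrong
    }

// Preferred: state the invariants and let a violation crash loudly.
fun parseQuantity(raw: String): Int {
    val value = requireNotNull(raw.toIntOrNull()) { "quantity must be an integer, got '$raw'" }
    require(value >= 0) { "quantity must be non-negative, got $value" }
    return value
}
```

Which loops back to the earlier point: once a pattern like the second one exists in the codebase, the agents are reasonably good at copying it.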
At the same time, for me, AI doesn’t own the code. I still review every line, shape the architecture, and carry the responsibility for how it runs in production. But the sheer volume of what I now let an agent generate would have been unthinkable even six months ago.
We emphasize this to junior engineers at work — you still carry the responsibility for the code you produce and how it runs in production.
That said, none of this removes the need to actually be a good engineer. If you let the AI take over without judgment, you’ll end up with brittle systems and painful surprises (data loss, security holes, unscalable software). The tools are powerful, but they don’t absolve you of responsibility.
This is where I'm at also. The tools are powerful, but you are still responsible; you must still be a good engineer who can make good decisions. At current performance levels, the coding agents simply cannot make consistently good decisions at every level they are being asked to consider. Writing code involves making a ton of small, medium, and large decisions, depending on the scope of the abstraction being worked on, and the coding agents do fine when they only need to make a few of them and have been given guidance. Somebody (the software engineer) still needs to be considering all of these decisions in order to provide the right guidance and keep the coding agents on a productive track.