The Confidence Trap

The content management system worked. The essay I wrote about building it resonated with people. For a few weeks I carried around a particular kind of satisfaction, the kind that comes from having shown, not argued, that a senior technologist could still make something real with his own hands.

The narrative was clean. A CTO who had not written production code in years sat down with an AI coding assistant and, in a handful of focused sessions, shipped a working publishing platform to a small Linux server in Denmark. It runs today. Readers read things on it. The argument almost made itself. Experience matters more than ever. AI makes it usable again. The builder can come back.

All of that was true. It was also incomplete.

The content management system was a forgiving one. The domain is well understood. Content, templates, a database, a rendering pipeline, a deployment. The requirements lived in my head, had lived there for years, and the feedback loop between what I wanted and what the machine produced was short enough that we could converge by conversation. Each decision was small enough to hold. When the shape of something felt wrong, I could see the wrongness immediately, and Claude Code and I could try something else before the next coffee got cold.

What I took from that project was a lesson about tools and experience. What I also took from it, and did not examine carefully enough, was a feeling. The feeling was that this now scaled. That the same dynamic would hold at the next level of complexity, and the level after that.

So I picked the next level of complexity.

An applicant tracking system. Not a demo, not a side project: a real platform, for real users, with real stakes. Multi-tenancy, role-based access, an event-sourced pipeline engine, a skills taxonomy, an AI scoring module under EU AI Act scrutiny, a React frontend talking to a .NET 9 modular monolith on Azure. A system with regulatory constraints, interdependent subsystems, and no forgiveness for architectural sloppiness. The tools were the same. The builder was the same. The confidence had compounded.

The confidence was the problem. I did not yet know that a confidence you have earned in one kind of system can betray you in another.

When Speed Becomes the Problem

The applicant tracking system had a specification. This matters, because the most obvious explanation for what went wrong, that I had skipped the preparation, would be wrong. The work was done. There was a substantial document: architectural principles, a high-level component diagram, five integration flows to the ERP, a multi-entity model, a multi-tenancy model, non-functional requirements, a sketched data model. Fourteen functional modules, each described as a list of user stories with acceptance criteria. By every conventional measure, this project was well specified.

What it did not have was architecture. Not really.

It had the outside of architecture: the principles, the diagram, the cross-cutting concerns. What it did not have was the inside. The bounded contexts, the composition model, the event contracts between modules, the runtime assumptions one module was allowed to make about another. The diagram had eight boxes, and the specification said almost nothing about what was meant to live inside any of them, or about how they were meant to compose when the system was actually running.

And it did not have sequencing. Not in the sense that matters. Fourteen modules, fourteen parallel streams of user stories, all equally weighted, all apparently ready to be built at once.

I had the what. I did not really have the how. I did not have the when at all.

The approach felt modern. I set up a swarm of agents and gave each of them a module to own. One on the pipeline engine. One on the skills taxonomy. One on the candidate service. One on the AI scoring layer. They ran in parallel. They were productive. I felt like I was conducting something, a quiet orchestra of specialists turning my specs into running code while I moved from module to module, reviewing, nudging, integrating.

For a while this worked. Each module, considered alone, was fine. The pipeline engine passed its tests. The skills taxonomy did what a skills taxonomy should do. The AI service returned scored candidates with the right explanatory metadata. Green checkmarks accumulated, and the green checkmarks felt like progress.

Then the modules met each other.

The pipeline engine needed the skills module, which needed the candidate service, which needed the tenant context, which had been wired subtly differently in each of the three agents that had touched it. The recruitment workflow needed to observe events from four different subsystems, each with its own tacit assumptions about when events fired, in what order, and with what payload shape. None of this was visible in the specification, because the specification described features, and the problem was not in any feature. The problem was between them. The space between them was exactly what the parallel-build approach had refused to model.
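To make the divergence concrete, here is the kind of mismatch I kept finding. The names are hypothetical and the sketch is TypeScript rather than the actual .NET code, but the pattern is faithful: two agents, two locally reasonable readings of the same logical event.

```typescript
// Hypothetical reconstruction of the mismatch, not the actual code.

// Module A's agent assumed tenant context travels inside every event payload,
// and that events fire after the stage transition has committed:
interface CandidateStageChangedA {
  tenantId: string;
  candidateId: string;
  stage: string;
}

// Module B's agent assumed tenant context is ambient (resolved per request),
// so its events omit it, use a different field name for the same concept,
// and fire before the transition commits:
interface CandidateStageChangedB {
  candidateId: string;
  toStage: string;
}
```

Neither shape is wrong on its own. Put them in the same running system and you need an adapter, and the adapter is where the coherence starts to leak.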

So I did what a builder with AI at hand will do. I wrote adapters. Wrappers. Translation layers. Small patches where the assumptions did not line up. Each patch was a reasonable piece of code. Each patch technically worked. Each patch also added coupling, and coupling compounds. The next patch was harder to write because the system it sat inside had more rules. The one after that, harder still. I was generating code at a rate I would have found impossible two years ago, and the system was getting less coherent with every session.

There is a moment, in a project going wrong this way, when you realize you are no longer building. You are patching. The distinction is not visible in the output; the editor still shows code appearing, tests still pass, features still demo. The distinction is visible in the direction of travel. A system being built is getting more coherent over time. A system being patched is getting less coherent, and the patches exist to hold off the incoherence for another week.

I stopped.

Not abandoned. Stopped. There is a difference. Abandonment is giving up on the outcome. Stopping is refusing to keep paying interest on a debt you have no plan to repay. Every new line of code in that codebase was making the eventual correction more expensive, not less, and the correction was coming either way. The only question was whether I would choose its timing or have it chosen for me.

The speed that had been a superpower in the content management system had become, in this system, the mechanism by which the architecture was quietly destroyed. The agents were building faster than the shape of the thing could absorb. And because the specification did not tell them what order to build in, or where the boundaries were, or what not to build yet, they optimized locally and diverged globally.

This failure mode scales. Any organization that treats feature decomposition as system decomposition will manufacture incoherence at machine pace.

They were doing exactly what I had asked them to do. What I had asked them to do was wrong.

The Shape Before the Code

The third project was chosen for its unforgiveness.

After the applicant tracking system, I did not want an easier problem. I wanted the opposite. A domain where every transaction has a legal counterpart, where numbers must reconcile, where "close enough" is a synonym for "wrong." If the content management system had been forgiving and the applicant tracking system had been complex in ways I had not modeled, this one would be complex and unforgiving in exactly the places the previous project had failed.

This time I started with a thesis. Speed would be subordinate to the shape of the thing.

An accounting engine for organizations that had outgrown consumer bookkeeping. General ledger, accounts receivable, accounts payable as the core. Inventory, manufacturing, project accounting, payroll explicitly out of scope for a first version. A target market described in enough detail that every later question of the form "how complex should this be?" had a document to answer it. Before anything else, I knew what the thing was and what it was not.

Then I specified the architecture. Before any code.

I decided what the engine was. A state machine interpreter with a type-safe DSL. Guards on transitions. Actions on transitions. Event emission on transitions. Composition across machines, so that a transition in one could cause a transition in another. Formally, a 7-tuple extension of the classical finite state machine. Multi-tenancy by schema isolation. A system-wide audit log with total ordering. These were not features to be discovered during implementation. They were the shape of the thing, and the shape came first.
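To make that shape concrete, here is a minimal sketch of such a DSL, in TypeScript for brevity and with every name invented; the real engine looked different, but the constraint the sketch encodes is the point. A transition carries its guard, its action, and the events it emits, and a small interpreter walks the definition.

```typescript
// A minimal sketch of a transition-centric state machine DSL (names invented).
// Classical FSM: (S, Σ, δ, s0, F). Adding a guard set and an action set gives
// one plausible reading of the "7-tuple extension" mentioned above.
type Guard<C> = (ctx: C) => boolean;
type Action<C> = (ctx: C) => void;

interface Transition<S extends string, E extends string, C> {
  from: S;
  on: E;                 // the triggering event
  to: S;
  guard?: Guard<C>;      // the transition fires only if the guard passes
  action?: Action<C>;    // side effect executed on the transition
  emits?: string[];      // events published for other machines to observe
}

interface MachineDef<S extends string, E extends string, C> {
  id: string;
  initial: S;
  transitions: Transition<S, E, C>[];
}

// The interpreter walks the definition; a machine can do nothing the DSL cannot say.
function step<S extends string, E extends string, C>(
  def: MachineDef<S, E, C>, state: S, event: E, ctx: C
): { state: S; emitted: string[] } {
  const match = def.transitions.find(
    (t) => t.from === state && t.on === event && (t.guard?.(ctx) ?? true)
  );
  if (!match) return { state, emitted: [] };   // no legal transition: stay put
  match.action?.(ctx);
  return { state: match.to, emitted: match.emits ?? [] };
}
```

The value of a definition like this is not the code. It is that every later domain question, invoices, payments, write-offs, becomes a question about states and transitions, answered inside a vocabulary that already exists.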

Then I decomposed the build into milestones.

The first milestone was the engine core, alone. Not integrated with anything. Not persisting anything. Just the interpreter, the DSL, the guards, the actions, the emit clauses, and the tests that proved each of these worked. The agents' job for that milestone was the engine, and only the engine. Everything else was explicitly out of scope, and the specification said so.

The second milestone added persistence. The engine now had memory. A relational schema, migrations, an event table with total ordering. The agents did not have to invent where state should live, because the first milestone had already established that state was a thing the engine had. They were adding durability to an existing shape.
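What "an event table with total ordering" meant, in sketch form, with the row shape invented: one global sequence number, assigned at insert time, so that every event in the system is comparable to every other. That single column is the audit log's ordering guarantee.

```typescript
// Sketch of a persisted event row (hypothetical shape).
// "Total ordering" lives in seq: a single, strictly increasing number across
// all machines and all tenants, e.g. an identity or sequence column in SQL.
interface EventRow {
  seq: bigint;           // the global sequence: the total order
  machineId: string;     // which state machine instance produced the event
  tenantSchema: string;  // schema-isolated multi-tenancy
  type: string;          // event name, e.g. "InvoicePosted"
  payload: unknown;
  occurredAt: Date;
}
```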

The third milestone introduced composition. Cross-machine events, so that posting an invoice could cause a general ledger machine to transition, so that receiving a payment could settle an invoice machine. Coupling between machines was now event-based and explicit, rather than implicit and everywhere at once.
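In sketch form, again with invented names: machines never call each other. They emit, and a routing table, visible in one place, decides what an emission means to every other machine.

```typescript
// Sketch of explicit, event-based coupling between machines (names invented).
type Subscription = { event: string; targetMachine: string; as: string };

const subscriptions: Subscription[] = [
  // Posting an invoice makes the general ledger machine transition:
  { event: "InvoicePosted", targetMachine: "general-ledger", as: "PostJournal" },
  // Receiving a payment settles the invoice machine:
  { event: "PaymentReceived", targetMachine: "invoice", as: "ApplyPayment" },
];

// The router is the only place where one machine's output becomes another's input.
function route(
  emitted: string[],
  deliver: (machineId: string, event: string) => void
): void {
  for (const e of emitted)
    for (const s of subscriptions)
      if (s.event === e) deliver(s.targetMachine, s.as);
}
```

The coupling still exists, it has to, but it is enumerable. You can read the subscription table and know every way the machines touch.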

The fourth milestone was the first real domain. Accounts receivable: invoices, payments, credit notes, write-offs, each modeled as a state machine running on the engine built in the first three. Because the engine already existed, the domain modeling could be pure domain modeling. No one had to invent infrastructure.
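Reusing the MachineDef shape from the engine sketch above, an invoice lifecycle might look roughly like this. The states, events, and guards are invented for illustration, not taken from the real system:

```typescript
// A sketch of an invoice modeled as a machine on the engine (names invented).
type InvoiceState = "Draft" | "Posted" | "PartiallyPaid" | "Paid" | "WrittenOff";
type InvoiceEvent = "Post" | "ApplyPayment" | "WriteOff";
interface InvoiceCtx { total: number; paid: number }  // assume paid is updated before guards run

const invoice: MachineDef<InvoiceState, InvoiceEvent, InvoiceCtx> = {
  id: "invoice",
  initial: "Draft",
  transitions: [
    { from: "Draft", on: "Post", to: "Posted", emits: ["InvoicePosted"] },
    { from: "Posted", on: "ApplyPayment", to: "Paid",
      guard: (c) => c.paid >= c.total, emits: ["InvoiceSettled"] },
    { from: "Posted", on: "ApplyPayment", to: "PartiallyPaid",
      guard: (c) => c.paid < c.total },
    { from: "PartiallyPaid", on: "ApplyPayment", to: "Paid",
      guard: (c) => c.paid >= c.total, emits: ["InvoiceSettled"] },
    { from: "Posted", on: "WriteOff", to: "WrittenOff", emits: ["InvoiceWrittenOff"] },
  ],
};
```

Notice what is absent: no persistence code, no tenant code, no event plumbing. Those were milestones one through three. The domain model gets to be only a domain model.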

Two more milestones followed: introspection, so that any machine could be queried for its state and history, and a second domain, accounts payable, chosen deliberately to stress the architecture. Both shipped. The architecture held.
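The introspection surface, in sketch form, reusing the EventRow shape from the persistence milestone (the signatures are invented): current state is just the event history replayed in sequence order, which means state and history can never disagree.

```typescript
// Sketch of the introspection API (hypothetical signatures).
interface MachineSnapshot {
  machineId: string;
  state: string;
  version: bigint;  // seq of the last event applied
}

interface Introspection {
  getState(machineId: string): Promise<MachineSnapshot>;
  getHistory(machineId: string): Promise<EventRow[]>;  // event table rows, in seq order
}
```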

Each milestone had its own specification document, its own plan, its own data model, its own API contract, its own requirements checklist, its own quickstart. I wrote them before I wrote any code for the milestone. The agents worked on one milestone at a time. No forward references. No speculative architecture. No features that belonged to milestone four appearing, half-finished, in milestone two.

What changed was not the tools and not the builder. What changed was that the agents were never confused about scope. They never had to guess whether something belonged in the current milestone or a later one, because the specification told them. They never had to invent architectural decisions on the fly, because the architectural decisions had already been made. The specification was not describing a destination. It was drawing a road, with markers, and the agents were staying on it.

The feeling was different too. In the applicant tracking system I had spent my time arbitrating between modules and writing patches. In the accounting engine I spent my time checking that the agents had executed against a target I had set. I had stopped catching up to what the tools were doing. I was in front of the work, not behind it.

All six milestones shipped. The system works. More importantly, it works in a way I trust.

And here is the insight I did not have before, the one I did not get from reading about AI-assisted development but from doing one version badly and the next version well. The specification in an AI-assisted project is not documentation for humans. It is the control surface for machines. It tells the agents what to build, which is the obvious part. It also tells them, and this is the part that matters, what not to build yet. That negative space, the explicit absence of forward references, is the thing that keeps the architecture intact while the code gets written faster than any human could review it.

Without it, the agents will build you what you asked for, very quickly, in a way that cannot be lived with.

The Craft Was Always the Point

Three projects, the same person, the same tools, three different outcomes. A content site that wrote itself. An applicant tracking system that buried itself in its own output. An accounting engine that held together because the architecture was specified before the implementation was.

The variable was not the technology.

The prevailing story about AI-assisted development is a story about speed. How many lines per hour. How many features per sprint. How quickly the prototype becomes production. That framing is not just incomplete. It is, in a way I did not see clearly before the applicant tracking system, actively dangerous. Speed without structure does not produce software. It produces volume. And volume of code, in a system that does not know its own shape, is just another word for debt being accumulated at machine pace. Even the agents cannot keep up with it.

What these three projects taught me is that the craft of software has not changed. It has been compressed.

Architecture still matters. Sequencing still matters. Holding the whole system in your head, sensing that a design will not hold up two milestones from now, knowing which piece must exist before the next piece has anywhere to stand, these have always been the hard parts of building software. They still are.

Writing good code was never easy either. Holding the right abstractions in your head while turning intent into working syntax is serious craft, and a competent developer in flow is doing something impressive. It turns out machines happen to be very good at that particular kind of difficulty, better than humans in throughput if not in judgment. What AI has done is take over the parts of the work where it has a structural advantage, and by doing so it has changed the proportions. The thinking work was always there, always necessary, always the hardest part to get right. It just used to be one layer of the craft among several. Now it is the layer that the rest of the craft waits on.

For senior technical leaders reading this, that is good news, even if it does not sound like it at first. The skills you accumulated over decades, the ability to decompose a problem into the right pieces in the right order, the pattern recognition that tells you a design will fail under load before you have the load to prove it, the taste that tells you a boundary is in the wrong place, those skills are now the bottleneck. And the bottleneck, in any system, is the thing worth the most. AI has not made experience obsolete. It has made experience the scarce resource.

Three projects since February. The first one worked because the shape of it was small enough to live in my head. The second one broke because I did not know that the shape was the thing I most needed to externalize. The third one held because I finally wrote the shape down before I let the tools build against it.

Sometimes you have to do the thing the wrong way before you can see why the right way was right all along.