The Method That Replaces Spec-Driven Development

I have been saying Spec-Driven Development is a problem, and why. This is the first time I show the fix, and it ends somewhere the saying-so never did, on the client's invoice.

SDD took shape because vibe coding wasn't working out. You described a goal, you got a block of code back, it looked right, but it didn't quite work. GitHub said this themselves when they shipped spec-kit in September 2025: models are "exceptional at pattern completion, but not at mind reading." So the industry reached for the spec. Write down what you want, in detail, up front, and the agent stops guessing. Someone wrote a few commands and, GitHub being GitHub, spec-kit took over the conversation. A dozen others arrived behind it: Kiro from AWS, Tessl, BMAD, Agent OS. The category went mainstream in under a year.

One correction, because careful readers will check. Spec-driven development is not new, and the people who looked closely said so a long time ago. Larman and Basili documented in IEEE Computer in 2003 that iterative development goes back to the 1950s, and the single-pass, document-driven ideal was doubted from the start, even by Royce, the man usually blamed for it. A year later, a paper at XP 2004, Ostroff, Makalsky, and Paige, "Agile Specification-Driven Development", argued not for big upfront specs but the opposite: that a "complete" specification is a flawed ideal and the spec should emerge as tests and contracts. They could not write the word complete without putting it in quotes. The modern industrial use predates the AI wave by years. Vibe coding's failure did not invent SDD. It made SDD go mainstream. And the man who lit the fuse, Andrej Karpathy, coined vibe coding, not SDD. He is the bridge between the two eras, the one who later said the vibe era was ending. The easy version of this story credits him with both, and it is wrong.

Here is the version that is right, and that nobody is telling. Almost no one is even aware of it, because they are still struggling with SDD.

SDD leaves holes, and the agent fills them

The whole problem with SDD is that people write the spec however they want. It holds the what. It holds the how. It holds whatever an engineer decides to put in it, written however they decide to write it. Every engineer fills it with whatever was in their head that morning. Then the agents, Claude Code, Codex, and Droids, read the spec and make decisions, because there are gaping holes in it. The holes are not a defect in the spec.

These are two human gaps, not a tooling problem. The first: engineers do not take the time to learn what a spec actually is. We assume we know because we have written specs for 20 years. We have not written this kind. The second: the methods going viral were not built for the place most of us work. A method that scales beautifully in a forty-minute YouTube demo on a greenfield app is not a method that scales across a real enterprise codebase with twelve teams, a decade of decisions baked in, and a compliance reviewer who needs to sleep at night. The marketing layer of this field is selling demos as methods. They are not the same thing, and the gap between them is paid for downstream.

OpenAI's Symphony spec proves my point

Look at what OpenAI published with Symphony in April 2026. Symphony is an open spec for orchestrating agents, and the core of it is a single spec file, 2,169 lines, eighteen sections, written in formal must-and-should language. The level of depth shows exactly what we are asking people to do, and exactly what they cannot do.

If anyone could write all of that into one SPEC.md up front, then yes, it works. That is not sarcasm; it is the literal truth. A complete, unambiguous, two-thousand-line spec produces good software. OpenAI spent about six months building an internal tool under one hard rule: no human-written code, every line generated by Codex. They got it working. Only after it worked did they distill the spec out of the running system. Then they had Codex build the reference implementation in Elixir, in one shot, and separately implement the same spec in five other languages (TypeScript, Go, Rust, Java, and Python) for the express purpose of shaking the ambiguities loose. The deep, RFC-grade spec is retrospective documentation of software that already ran, and that is not my spin on it; it is OpenAI's own account of the order in which things happened.

So the spec works if you can write it up front. And that is the trap nobody tells you about, because you cannot write it up front, and the one organization that produced a spec that good produced it last, not first, by reverse-engineering software that was already alive. The industry is selling the output of that process as if it were the method.

I did this to myself

I ran into this on my own work. The spec was good, but the agent still drifted because I had stepped out of the loop and let it fill in the parts I had not thought through. I called it drift for a while. It was not drift. Drift is the word you reach for when you do not want to say you left. It cost me three days of rework to undo code that should never have been written. I run 150 to 200 million tokens a day working with my agents, and at current Opus rates, three days of that come to about $985, real money I spent making the problem and then paying a second time to unmake it. That was the cheapest version of this failure in the whole piece, my own money on my own machine. The spec being good is exactly why this matters: a good spec does not hold on its own; it holds because someone stays in the loop while it runs, and that time, I didn't.

ICE: the three crafts and the loop they run in

The frame is ICE: Intent, Context, Expectations. This piece defines the first one because Intent is the primitive everything else hangs off of. Context and Expectations are real crafts too, but they are for a later piece; trying to define all three at once is how you end up back at the single bloated document this is meant to replace. So Intent now, Context and Expectations later.

Take the simplest example. A business scenario: "A user wants to buy a red shoe for under $90." The intent is to buy a red shoe for under $90, the outcome the human wants, with nothing in it about which stack or which service builds it. Intent is the first-class primitive of the whole method, and it is not one sentence. Five things together are what make something an intent: a description of what is wanted, the constraints around it, the scenarios under which it fails, the scenarios under which it succeeds, and the connections, the links to the other intents it touches, so a change here is traceable to everything it affects. Miss any of the five, and you are back in a hole the agent fills for you.

If you do one thing Monday, do this. Take one real outcome you are about to ship this week, not the system, one outcome: the red shoe under $90. Write the five parts for just that. What is wanted: a red shoe that the buyer can actually buy for under $90. The constraints: in their size, in stock, and deliverable to them. The failure scenarios: returning a $140 shoe, an out-of-stock shoe, or a non-red shoe. The success scenario: the buyer adds an affordable red shoe to the cart and checks out. The connections: anything that touches price, inventory, or checkout, because a change there changes this. Then the test: hand it to someone who was not in your head and ask them where the agent would still have to guess. Every place they point is a hole you were about to let the agent fill. Close those, not the whole system. That is the first move, and it costs an hour, not a methodology rollout.
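The five parts can be written down as a small data structure. This is a minimal sketch, not a format the method prescribes: the `Intent` class, its field names, and the `missing_parts` helper are my own illustrative assumptions, filled in with the red shoe example from the paragraph above.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Hypothetical container for the five parts of an intent."""
    description: str
    constraints: list[str] = field(default_factory=list)
    failure_scenarios: list[str] = field(default_factory=list)
    success_scenarios: list[str] = field(default_factory=list)
    connections: list[str] = field(default_factory=list)

    def missing_parts(self) -> list[str]:
        """Return the names of any of the five parts left empty."""
        parts = {
            "description": self.description.strip(),
            "constraints": self.constraints,
            "failure_scenarios": self.failure_scenarios,
            "success_scenarios": self.success_scenarios,
            "connections": self.connections,
        }
        return [name for name, value in parts.items() if not value]


red_shoe = Intent(
    description="A red shoe the buyer can actually buy for under $90",
    constraints=["in the buyer's size", "in stock", "deliverable to the buyer"],
    failure_scenarios=[
        "a $140 shoe is returned",
        "an out-of-stock shoe is returned",
        "a non-red shoe is returned",
    ],
    success_scenarios=["the buyer adds an affordable red shoe to the cart and checks out"],
    connections=["price", "inventory", "checkout"],
)

print(red_shoe.missing_parts())  # an empty list means all five parts are present
```

A check like this only catches a part that is missing outright. It cannot find the quieter holes inside a part that is present, which is why the hand-it-to-someone-else test still has to be done by a person.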

Expectations

This is the part people would call the spec, and I deliberately do not call it that. It is the boundary: the scenarios under which the result counts as done, the scenarios under which it has failed, and the limits it must stay inside, written in terms the user would recognize rather than in implementation language. Keeping it as its own craft, owned by the same human who owns the intent, is the whole fix; the moment the definition of done drifts away from the person who wanted the outcome, the agent starts deciding 'done' for them.
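One way to picture that boundary is as a set of named checks phrased the way the buyer would phrase them. The sketch below is an assumption on my part, not a prescribed format: `Offer`, `Expectations`, and every field name are hypothetical, and real expectations would be checked against running software rather than a three-field record.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Offer:
    """Hypothetical result the buyer is shown."""
    color: str
    price_usd: float
    in_stock: bool


Check = Callable[[Offer], bool]


@dataclass
class Expectations:
    owner: str                     # the human who wanted the outcome
    done_when: dict[str, Check]    # every check must hold for "done"
    failed_when: dict[str, Check]  # any check holding means failure
    limits: dict[str, Check]       # every check must hold at all times

    def unmet(self, offer: Offer) -> list[str]:
        """List, in the user's own words, what the offer gets wrong."""
        problems = [f"not done: {name}" for name, check in self.done_when.items() if not check(offer)]
        problems += [f"failed: {name}" for name, check in self.failed_when.items() if check(offer)]
        problems += [f"limit broken: {name}" for name, check in self.limits.items() if not check(offer)]
        return problems


red_shoe_expectations = Expectations(
    owner="the person who wants the shoe sold",
    done_when={"the shoe is red": lambda o: o.color == "red"},
    failed_when={"the shoe is out of stock": lambda o: not o.in_stock},
    limits={"the price is under $90": lambda o: o.price_usd < 90},
)

print(red_shoe_expectations.unmet(Offer("red", 140.0, True)))
print(red_shoe_expectations.unmet(Offer("red", 79.0, True)))
```

The names of the checks are the point. Each one is a sentence the owner could read and agree or disagree with, without opening the code.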

Context

This is the how: the tech, the existing system, and the constraints of the codebase the shoe gets built in. It should come from your harness and be fed progressively as needed, rather than dumped in as a single wall at the start. Everything else, the model, the prompting, the orchestration, the harness used to actually run the loop, is mechanics.

The crafts are only pieces until you see them move, so this is the loop we have to build.

The human gives two things: the intent, and the expectations that say what 'done' means and what the result must not be. Those feed the agentic coding loop, where the harness does the work: it pulls the context, codes it, and validates against the expectations; if they are not met, it goes again, and it keeps going until they are. Once they are met, the code merges. The human owns the intent and the expectations, and never leaves them; the harness owns the loop and is never asked to invent what the human wanted.
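The shape of that loop fits in a few lines. This is a sketch under stated assumptions, not any harness's real API: `run_loop`, `pull_context`, `write_code`, and `unmet` are hypothetical names, the three stubs stand in for real harness behavior, and the `max_rounds` cap is my addition so the sketch cannot spin forever.

```python
from typing import Callable


def run_loop(
    intent: str,
    unmet: Callable[[str], list[str]],                 # owned by the human: the expectations
    pull_context: Callable[[str, list[str]], str],     # owned by the harness
    write_code: Callable[[str, str, list[str]], str],  # owned by the harness
    max_rounds: int = 5,                               # assumption: a safety cap
) -> str:
    """Repeat pull context, code, validate until no expectation is unmet."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        context = pull_context(intent, feedback)  # fed progressively, not all at once
        candidate = write_code(intent, context, feedback)
        feedback = unmet(candidate)               # validate against the expectations
        if not feedback:
            return candidate                      # ready to merge
    raise RuntimeError(f"expectations still unmet after {max_rounds} rounds: {feedback}")


# Stubs standing in for a real harness, for illustration only.
def pull_context(intent: str, feedback: list[str]) -> str:
    return "catalog service + pricing rules" if feedback else "catalog service"


def write_code(intent: str, context: str, feedback: list[str]) -> str:
    return "filter: red, price < 90" if feedback else "filter: red"


def unmet(candidate: str) -> list[str]:
    return [] if "price < 90" in candidate else ["the price is under $90"]


result = run_loop("buy a red shoe for under $90", unmet, pull_context, write_code)
print(result)
```

Note which arguments come from where: the intent and the `unmet` check arrive from outside the loop and are never rewritten inside it. The loop may try again as often as it likes, but it does not get to edit what counts as done.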

That is the whole point of ICE. SDD breaks because it asks one document, written one way by whoever is holding the keyboard, to serve as the intent, the definition of done, the workflow, and the context all at once, with the gaps between them left to the agent's discretion. It is also worth being clear that the harness is not the method: spec-kit, BMAD, and the prompt-and-workflow crowd are harnesses, useful harnesses, but still only harnesses, while ICE is the method that decides what the work is before the harness ever handles it. Adopt the harness without the method, which is the default today because the harness is the part with a download button, and you get the same failure I paid for in three days of rework, only at a scale with a client's name on it.

I call it IDSD, intent-driven software development, where declaring outcomes and letting the machine determine implementation is the normal way of working rather than the aspiration.

Some will say this is still spec by another name, and they are right about the files and wrong about what changed. These are all still markdown: intent, expectations, and context. The format was never the problem. What changed is what each file is and who owns it. There is no spec craft: the thing that used to be a sprawling spec is now expectations, a short boundary written by the person who wanted the outcome, not guessed at by whoever held the keyboard. Context is how the agentic tools do their job; it is owned by the harness, not improvised in the same paragraph as the user's wish. One file, written however the keyboard-holder felt that morning, was the thing that broke. Intent, plus an explicit set of expectations, handed to a harness that owns the build, is a different thing entirely.

IDSD is at least 2 levels above where the industry is sitting today

I showed the implementation to my team. Not the theory, the real thing. I put up how I write an intent for a consumer use case, then how I write one that an agent is meant to consume, along with the constraints, failure scenarios, and links back to every other intent it touches. I watched it land. You can read a room by the second slide, and this room had gone quiet in a way that is not agreement.

Ira said it out loud. She is my lead consultant, eighteen years in. "The teams haven't adopted SDD yet," she said. "It's barely a year old, and most of them are still pretending to do it. Now you want them to write intent and expectations as separate crafts, with traceability between them, and do it for agents too. This is two rungs above where the teams actually are."

Then Nyra. She had let the room move on, the way she does. "One minute." Soft, like it was a small thing. "The teams not being ready is not the risk. The risk is the day an agent writes ten thousand lines that look right, and nobody owns the part that explains what 'right' means. We are not asking them to learn a harder spec. We are asking them to keep doing the one judgment this method makes very easy to skip." She had just said the whole argument back to me in four sentences, and turned Ira's objection into the reason to do it rather than the reason not to.

That is the synthesis I walked out with. Ira is right that it is too advanced for where the teams are. Nyra is right that the answer is not to wait until they are ready; it is to make the one thing that must not be skipped impossible to skip. The rest of this is me trying to build that.

What keeps me up

Here is the worry, said plainly. It is a real fear, and I do not have it fully solved.

When my teams work on projects, they will miss critical parts of the system. This is Nyra's ten thousand lines that look right, the thing she put her finger on quietly before I had words for it. Not because they are not good. Because it is now trivially easy to generate an enormous amount of code, and the easier generation gets, the more confidently you can be wrong at volume. When METR ran a controlled trial on this in 2025, experienced developers were measurably slower with AI and walked away certain that it had made them faster. Being wrong while feeling fast is the whole failure in one sentence. We will spend time and money producing the wrong thing well. That scares the shit out of me, and I have the receipts to prove it's earned. That is my own three days of rework again, much bigger, and on a client's bill.

The genuine answer, the one I believe and do not fully live, is that you have to be involved at every step. Part of the team, while the work happens, not the reviewer who arrives at the end to bless a diff already too large to truly read. I have called this out as a core metric, presence in the loop, not approval at the gate. I do it fairly well. I cannot do it all the time. The part of me that wants to trust the agent and step away is the same part that bought the three days of rework. I know the failure intimately because I am a repeat customer.

Who actually pays

This is not, in the end, a software engineering story. It is an economics story, and the people at the bottom of it never read a spec in their lives.

There are two ways to build, and the industry knows it even when it does not say it. You write code, slow to author and cheap to run, because computing has been commoditized for years. Or you lean on agents, fast to author and expensive to run. Not because tokens got expensive; the per-token price keeps falling. Expensive because an agent left to fill the gaps burns so many more tokens per finished outcome that the cost of the result climbs even while the sticker price drops, and a startling number of those tokens go into being confidently wrong before anyone notices.
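The arithmetic behind that claim is a single multiplication: cost per finished outcome is the per-token price times the tokens burned to reach the outcome. The numbers below are hypothetical, chosen only to show the direction of the effect; they are not measurements from my work or anyone's price list.

```python
def cost_per_outcome(price_per_million_usd: float, tokens_per_outcome: int) -> float:
    """Dollars spent to reach one finished outcome."""
    return price_per_million_usd * tokens_per_outcome / 1_000_000


# Hypothetical figures, for illustration only.
before = cost_per_outcome(price_per_million_usd=10.0, tokens_per_outcome=2_000_000)
after = cost_per_outcome(price_per_million_usd=5.0, tokens_per_outcome=10_000_000)

print(f"before: ${before:.2f} per outcome")  # price is higher, tokens are fewer
print(f"after:  ${after:.2f} per outcome")   # price halved, tokens up fivefold
```

In this made-up case the sticker price falls by half while the cost of the result more than doubles, because the token count grew faster than the price fell.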

Broken spec-driven development sits on the expensive side of that line, and it does not stay there quietly. It inflates the cost of building software, and that inflation is not absorbed by the vendor selling the method or the influencer demoing it.

SDD breaks because we asked humans to do the one thing humans cannot do: specify everything before it exists. It breaks because the methods sold to do it were built for demos. None of that cost goes away. It moves down the line, and this is where it lands. The client who never read a spec, and the people the client serves, who never chose this and were not in the room when it broke.

Here's what I am doing about it on my own team. We are moving past SDD. We are going to run IDSD: the intent and the expectations written with real discipline, the context held to exactly what we want, and then the agents do the work. I am a bit scared of it. I am doing it anyway because it has to be done and because the principle behind it is dogfooding. You run your own method on yourself before it ever reaches a client's bill. The flywheel only compounds if it spins, and it only spins if someone is willing to take the first turn. This is how it starts.

So the question is not whether your agents can do the work. They can. The question is whether you have the discipline to own the intent and the expectations yourself, and the nerve to stay in the loop while the machine runs, on the day it would be so much easier to step out and bless the diff. That day is the whole game. What do you do on it?

The IDD vs SDD series (all free, no paywall)

This is one connected argument. Wherever you started, here is the whole arc:

The Trap SDD Is Setting, why the discipline collapses once building gets cheap
Spec-Driven Development Isn't Broken. It Collapsed., the deeper failure under the method
The Method That Replaces Spec-Driven Development: IDSD, intent, context, expectations
The Anatomy of Intent from ICE, what intent actually is
Spec-Driven Development Is Breaking the Fifty-Year-Old Iron Triangle, the new triangle that replaces it

One note. "Ira" is not her real name. The conversation happened the way I wrote it. I changed what could identify her, not what she said.
