Spec-Driven Development: When the Spec Becomes the Source Code
Posted on: 5/25/2026 2:18:07 PM
Table of contents
- 1. Why "vibe coding" collapses at scale
- 2. What is Spec-Driven Development?
- 3. The four-phase workflow: Specify → Plan → Tasks → Implement
- 4. The project "constitution": invariant constraints
- 5. What goes into a good spec?
- 6. The 2026 SDD tooling ecosystem
- 7. SDD meets Multi-Agent: Coordinator → Implementor → Verifier
- 8. When NOT to use a spec
- 9. The Project Management view: the spec is a contract, not a procedure
- 10. Conclusion
You type "build me a login feature," the agent emits 600 working lines in five minutes — and then you spend three days figuring out what it actually does, patching holes you never asked for, and discovering it misread half your intent. That is the real cost of vibe coding at scale. In 2026, with agents fluent enough to write thousands of lines in one shot, the bottleneck is no longer code-generation speed but intent-communication speed. Spec-Driven Development (SDD) is the answer: invert the flow so that the specification — not the code — becomes the source of truth.
1. Why "vibe coding" collapses at scale
Vibe coding — typing a vague prompt and letting the agent infer the rest — is wonderful for a weekend prototype. But in a real codebase it exposes a structural flaw: the intent lives in your head, while the agent only sees the few loose sentences you just typed. The gap between "what you meant" and "what you said" is exactly where bugs, scope creep, and technical debt breed.
The problem worsens as agents get stronger. An agent will happily generate 600 lines on top of a wrong assumption from the very first sentence — and because the code runs, you do not notice until it is too late. This is the productivity paradox: the faster the agent, the further an error in intent propagates before anyone can hit the brakes.
⚠️ "House of cards code"
When prompts are throwaway and never recorded, every bug fix is a new prompt stacked on the old pile. The codebase becomes a house of cards: touch one part and another collapses, and nobody — not even the agent — still knows why the code looks the way it does. The specification evaporates the moment the prompt scrolls out of the chat window.
2. What is Spec-Driven Development?
SDD inverts the traditional relationship between documentation and code. In the old way, code is the truth and docs are the perpetually stale afterthought. In SDD, a structured specification (usually Markdown checked into the repo) is the source of truth, and code is the output generated from it. The spec is not a draft thrown away once the code ships; it is a living artifact that is reviewed and versioned, and that serves as the contract between humans and agents.
The key difference from "writing thorough docs" is that the spec is written with enough precision for an agent to execute, and the process forces you to separate three questions that vibe coding blends together:
- WHAT & WHY — desired outcomes, scope boundaries, success criteria (the Specify phase).
- HOW — architectural decisions, tech stack, constraints (the Plan phase).
- IN WHAT ORDER — decomposition into testable units an agent can own (the Tasks phase).
💡 A simple test
When should you write a spec? If you'd be annoyed to have the agent interpret the requirements differently than you meant — write the spec. If you could fix a wrong output with a quick follow-up prompt — just prompt directly, skip the spec. SDD is not meant to wrap every line of code, only the parts where ambiguity is expensive.
3. The four-phase workflow: Specify → Plan → Tasks → Implement
This is the backbone of SDD, most clearly standardized by GitHub Spec Kit through four commands. Each phase produces an artifact that humans review before moving to the next — those are your oversight checkpoints.
flowchart LR
A["/specify<br/>WHAT & WHY"] --> B["/plan<br/>HOW"]
B --> C["/tasks<br/>IN WHAT ORDER"]
C --> D["/implement<br/>Agent executes"]
A -. review .-> A1{Right intent?}
B -. review .-> B1{Fits constraints?}
C -. review .-> C1{Small enough?}
D --> E[Code + Tests]
E -. verify against .-> A
style A fill:#e94560,stroke:#fff,color:#fff
style D fill:#16213e,stroke:#fff,color:#fff
style E fill:#4CAF50,stroke:#fff,color:#fff
- Specify — Describe the problem in business language: outcomes, who uses it, success criteria, edge cases. No technology mentioned yet.
- Plan — Translate the spec into architectural decisions: stack, data model, integration points, non-functional constraints (security, performance).
- Tasks — Decompose the plan into small, independent, testable units. Not "build authentication" but "create a user-registration endpoint, validate email format, return 409 on duplicate."
- Implement — The agent codes each task within the agreed constraints. Because each task is small with clear criteria, the output is easy to verify and to fix locally.
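The gating idea behind the four phases can be sketched in a few lines of Python. This is only an illustration of "review each artifact before moving on", not how Spec Kit is implemented; `Artifact`, `run_workflow`, `produce`, and `review` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable

PHASES = ["specify", "plan", "tasks", "implement"]


@dataclass
class Artifact:
    phase: str
    content: str
    approved: bool = False


def run_workflow(
    produce: Callable[[str, list], str],
    review: Callable[[Artifact], bool],
) -> list:
    """Run the phases in order and stop at the first rejected artifact."""
    artifacts = []
    for phase in PHASES:
        # Each phase builds on the artifacts produced (and approved) so far.
        artifact = Artifact(phase, produce(phase, artifacts))
        artifact.approved = review(artifact)
        artifacts.append(artifact)
        if not artifact.approved:
            break  # later phases never run on an unapproved artifact
    return artifacts


def draft(phase, earlier):
    # Stand-in for an agent call; a real one would generate Markdown.
    return f"{phase} artifact built on {len(earlier)} earlier artifact(s)"


# A reviewer who rejects the plan: tasks and implement are never reached.
result = run_workflow(draft, review=lambda a: a.phase != "plan")
print([(a.phase, a.approved) for a in result])
```

The point of the sketch is the early `break`: a wrong plan costs one rejected document rather than hundreds of generated lines.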
4. The project "constitution": invariant constraints
Spec Kit introduces a valuable concept: the constitution — a set of unchanging project principles loaded into every phase. This is where you record the things the agent must never violate: "always write tests first," "no network calls in the domain layer," "every API must paginate," "GDPR compliance for personal data."
Unlike loose prompts that get forgotten between turns, the constitution is a standing constraint. It turns the implicit conventions in a senior engineer's head into written law the agent must follow on every task.
# Constitution — Payments Project
## Invariant Principles
1. Every DB schema change ships with a rollback-able migration.
2. The domain layer MUST NOT depend on frameworks or I/O.
3. All monetary amounts use decimal, never float.
4. Every write endpoint must be idempotent via Idempotency-Key.
5. Tests first, code second. No merge that lowers coverage.
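To make two of these principles concrete, here is a minimal Python sketch of rules 3 and 4. `PaymentService` and its in-memory dictionary are hypothetical stand-ins; a real service would persist idempotency keys in durable storage.

```python
from decimal import Decimal

# Rule 3: binary floats cannot represent most decimal fractions exactly.
float_total = 0.1 + 0.2                              # 0.30000000000000004
decimal_total = Decimal("0.10") + Decimal("0.20")    # Decimal('0.30')


class PaymentService:
    """Hypothetical in-memory service illustrating rules 3 and 4."""

    def __init__(self):
        self._results = {}      # idempotency key -> stored response
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount: Decimal) -> dict:
        if not isinstance(amount, Decimal):
            raise TypeError("amount must be Decimal (constitution rule 3)")
        # Rule 4: a repeated key returns the stored result, no second charge.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"charge_id": self.charges_made, "amount": amount}
        self._results[idempotency_key] = result
        return result


service = PaymentService()
first = service.charge("key-123", Decimal("19.99"))
retry = service.charge("key-123", Decimal("19.99"))  # e.g. a client retry
print(first == retry, service.charges_made)
```

Checks like the `isinstance` guard are one way to turn a written principle into something a test suite can enforce on every task.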
5. What goes into a good spec?
Per 2026 guidance (Addy Osmani, Thoughtworks, Spec Kit), a spec usable by an agent needs to define six elements. Leave any out, and the agent will "fill in the blank" itself — that is where it drifts from your intent.
| Element | Answers | Example |
|---|---|---|
| Outcomes | What does success look like? | User resets password via email in < 2 minutes |
| Scope boundaries | What is OUT of scope? | No social login in this version |
| Constraints | Technical/business limits? | Token expires in 15 min, send mail via SES |
| Prior decisions | What is already settled? | PostgreSQL is in use, no new database |
| Task breakdown | How is it split? | 5 independent tasks, each testable alone |
| Verification | How do we know it's right? | Acceptance criteria + test case per task |
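One way to keep the six elements from being skipped is to treat the spec as structured data and check it before any agent sees it. The sketch below is an assumption about how such a check could look; the `Spec` class and `missing_elements` function are hypothetical, and the sample values come from the table above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Spec:
    outcomes: list = field(default_factory=list)
    scope_boundaries: list = field(default_factory=list)
    constraints: list = field(default_factory=list)
    prior_decisions: list = field(default_factory=list)
    task_breakdown: list = field(default_factory=list)
    verification: list = field(default_factory=list)


def missing_elements(spec: Spec) -> list:
    """Return the names of elements left empty, in declaration order."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name)]


draft = Spec(
    outcomes=["User resets password via email in < 2 minutes"],
    scope_boundaries=["No social login in this version"],
    constraints=["Token expires in 15 min", "Send mail via SES"],
    prior_decisions=["PostgreSQL is in use, no new database"],
)
print(missing_elements(draft))
```

Here the draft still lacks a task breakdown and verification criteria, which are exactly the blanks an agent would otherwise fill in on its own.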
Decompose tasks TDD-style for the agent
Each task should be something you can implement and test in isolation, almost like test-driven development. Instead of "build authentication," write: "create a POST /register endpoint taking email + password, validate email format, hash the password with Argon2, return 201 with user_id, return 409 if the email already exists." The more concrete the task and the clearer the acceptance criteria, the fewer chances the agent has to "get creative" in the wrong place.
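As an illustration of how directly such a task maps to code and tests, here is a framework-free Python sketch of the registration logic. Several details are assumptions made for the example: `UserStore` is a hypothetical in-memory store, the email regex is deliberately loose, the 400 status for a malformed email is not stated in the task, and `hashlib.scrypt` stands in for Argon2 because Argon2 is not in the Python standard library.

```python
import hashlib
import os
import re

# Loose format check for illustration only, not full RFC 5322 validation.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


class UserStore:
    """Hypothetical in-memory store backing a POST /register handler."""

    def __init__(self):
        self._users = {}  # email -> (user_id, salt, password_hash)

    def register(self, email: str, password: str):
        """Return (status_code, body) as the endpoint would."""
        if not EMAIL_RE.match(email):
            return 400, {"error": "invalid email"}  # status is an assumption
        if email in self._users:
            return 409, {"error": "email already exists"}
        salt = os.urandom(16)
        # scrypt is a stand-in; the task text asks for Argon2.
        digest = hashlib.scrypt(
            password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1
        )
        user_id = len(self._users) + 1
        self._users[email] = (user_id, salt, digest)
        return 201, {"user_id": user_id}


store = UserStore()
print(store.register("ada@example.com", "correct horse"))
print(store.register("ada@example.com", "another password"))
print(store.register("not-an-email", "pw"))
```

Each acceptance criterion in the task becomes one branch and one assertion, which is what makes the output easy to verify in isolation.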
6. The 2026 SDD tooling ecosystem
SDD now has a mature toolset split into two groups: static-spec (lock requirements upfront, then generate code) and living-spec (keep docs synchronized with code as agents work). Here are the standouts.
| Tool | Type | Strengths |
|---|---|---|
| GitHub Spec Kit | Open-source CLI | Standard 4 commands specify/plan/tasks/implement; supports 29 agents (Claude Code, Copilot, Gemini CLI, Cursor...); tool-neutral |
| Amazon Kiro | Agentic IDE | Built-in Spec Mode; Auto-router picks the optimal model per task; puts the spec at the center of the lifecycle |
| BMAD-METHOD | Agent framework | Specialized agent roles (analyst, PM, architect, dev) simulating a software team |
| OpenSpec | Living-spec | Keeps a living spec in sync with the code; good when the spec evolves continuously |
💡 Choose the tool by team profile
Teams just starting out who want to stay tool-neutral → Spec Kit (just Markdown scaffolding + commands, plugged into your existing agent). Teams wanting a seamless IDE experience → Kiro. Teams that want to simulate a multi-role process → BMAD. The key point: the tool is only a shell; the real value lies in the discipline of separating Specify/Plan/Tasks.
7. SDD meets Multi-Agent: Coordinator → Implementor → Verifier
The most powerful yet underused pattern in SDD is to never let an agent grade its own work. Once the spec has cleanly split tasks, you can assign roles to multiple agents: a Coordinator reads the spec and delegates, Implementors build individual tasks, and an independent Verifier checks the result against the acceptance criteria before marking it done.
sequenceDiagram
participant S as Spec + Tasks
participant C as Coordinator
participant I as Implementor Agents
participant V as Verifier Agent
participant H as Human
S->>C: Task list + acceptance criteria
C->>I: Delegate independent tasks
I->>V: Code + tests per task
V->>V: Check against the spec
alt Meets criteria
V->>C: PASS, mark complete
else Fails
V->>I: FAIL + specific reason
I->>V: Fix and resubmit
end
C->>H: Report progress against the spec
Because every decision refers back to the spec, disagreements between agents are resolved by re-reading the specification rather than guessing. The spec becomes the shared "referee" — impossible in vibe coding, where every prompt is its own private truth.
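The loop in the diagram can be sketched in plain Python. Everything here is hypothetical scaffolding: `Task`, `verify`, `coordinate`, and the toy implementor are illustrative names, real agents would be model calls, and the `max_attempts` bound is an assumption the diagram does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Task:
    name: str
    accept: Callable[[str], bool]  # acceptance criterion taken from the spec


def verify(task: Task, output: str):
    """Verifier: judge the output against the spec, never self-reported status."""
    if task.accept(output):
        return True, ""
    return False, f"output {output!r} fails the criteria for {task.name}"


def coordinate(tasks, implement, max_attempts: int = 3) -> dict:
    """Coordinator: delegate each task and record the verifier's verdict."""
    report = {}
    for task in tasks:
        feedback: Optional[str] = None
        report[task.name] = "FAIL"
        for attempt in range(1, max_attempts + 1):
            output = implement(task, feedback)
            passed, feedback = verify(task, output)
            if passed:
                report[task.name] = f"PASS after {attempt} attempt(s)"
                break
    return report


def toy_implementor(task, feedback):
    # Gets it wrong first, then corrects itself once it receives feedback.
    return "return 409 on duplicate" if feedback else "return 200 on duplicate"


tasks = [Task("duplicate-email", lambda out: "409" in out)]
print(coordinate(tasks, toy_implementor))
```

The implementor never decides whether it passed; only `verify`, which reads the acceptance criterion from the task, can mark work complete.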
8. When NOT to use a spec
SDD is not free: writing and reviewing a spec is an upfront time cost. Applying it to everything turns into bureaucracy. Use the following heuristic to decide:
flowchart TD
A[New piece of work] --> B{Would ambiguity be costly?}
B -- Yes --> C{"Do many people/systems<br/>depend on the result?"}
B -- No --> P[Prompt directly, skip the spec]
C -- Yes --> S[Write a full spec]
C -- No --> D{Will it be reworked often?}
D -- Yes --> S
D -- No --> M[Light spec: a few bullet points]
style S fill:#e94560,stroke:#fff,color:#fff
style P fill:#4CAF50,stroke:#fff,color:#fff
style M fill:#2c3e50,stroke:#fff,color:#fff
A small CSS patch, a one-off script, a throwaway experiment — do not spec. A payments module, a shared schema change, a feature many teams depend on — definitely spec. SDD gives you the choice of how much formality to apply, instead of forever swinging between the "vibe" extreme and the "40-page document" extreme.
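The decision tree above is small enough to write down as a function. The name `spec_level` and the three return labels are hypothetical; the branching mirrors the flowchart.

```python
def spec_level(ambiguity_costly: bool, many_dependents: bool,
               reworked_often: bool) -> str:
    """Map the three questions from the heuristic to a level of formality."""
    if not ambiguity_costly:
        return "prompt directly"
    if many_dependents or reworked_often:
        return "full spec"
    return "light spec"


# A one-off script: ambiguity is cheap, so no spec.
print(spec_level(False, False, False))
# A payments module many teams depend on.
print(spec_level(True, True, False))
# Costly to get wrong, but isolated and unlikely to change.
print(spec_level(True, False, False))
```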
9. The Project Management view: the spec is a contract, not a procedure
SDD is not only an engineering technique — it reshapes how a team collaborates with agents. For Tech Leads and PMs, the spec delivers three things vibe coding destroys:
- Traceability of intent: every line of code traces back to a task, every task to an outcome in the spec. When someone asks "why does the code do this?", the answer is in the repo, not in a closed chat window.
- Review at the right layer: instead of reviewing 600 generated lines, you review the spec, where correcting one wrong sentence is far cheaper than correcting the thousand lines of code generated from it. This shifts quality control upstream.
- A unit of work for agents: a task in SDD is the agent's "ticket" — estimable, testable, parallelizable. A PM plans a sprint for a hybrid human + agent team just as they would for a human one.
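Traceability of this kind can be checked mechanically once tasks record which outcome they serve. The sketch below assumes a simple mapping from task IDs to outcome IDs; the IDs and the `trace_gaps` function are hypothetical.

```python
def trace_gaps(outcomes: set, task_to_outcome: dict):
    """Return (tasks pointing at no known outcome, outcomes with no task)."""
    orphan_tasks = sorted(
        task for task, outcome in task_to_outcome.items()
        if outcome not in outcomes
    )
    uncovered = sorted(outcomes - set(task_to_outcome.values()))
    return orphan_tasks, uncovered


outcomes = {"OUT-1 reset via email", "OUT-2 token expiry"}
task_to_outcome = {
    "T1 send reset mail": "OUT-1 reset via email",
    "T2 add dark mode": "OUT-9 not in spec",  # scope creep: no such outcome
}
print(trace_gaps(outcomes, task_to_outcome))
```

An orphan task is a sign of scope creep, and an uncovered outcome is a promise in the spec that nobody has been assigned to build.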
💡 A mental model for PMs / Tech Leads
Treat the agent as an extremely fast engineering team that takes instructions literally. You would not hand a new outsourcing team a one-liner "go build login" and expect the right result. You write requirements, boundaries, acceptance criteria. SDD is the codified version of that practice — except the recipient reads and executes in minutes, so the quality of the input (the spec) determines almost all of the quality of the output.
10. Conclusion
In 2026, the edge does not go to the team that "types prompts fastest," but to the one that communicates intent most clearly to ever-stronger agents. Mature Spec-Driven Development means:
- Treating the spec as the source of truth, with code as the generated output — not the other way around.
- Separating Specify → Plan → Tasks → Implement, and reviewing the artifact of each phase rather than only the final code.
- Pinning principles with a constitution, decomposing work into testable tasks TDD-style.
- Using a Verifier separate from the Implementor so no agent grades its own work.
- Knowing when NOT to spec — a spec is an investment, not a ritual.
Vibe coding asks "can the agent write code?" Spec-driven asks a harder, more important question: "have I expressed what I want clearly enough that anyone — human or machine — would build the right thing?" That question has always been the heart of software engineering; agents simply make it more urgent than ever.
References
- GitHub — Spec Kit: Toolkit for Spec-Driven Development
- Kiro — Bring engineering rigor to agentic development
- Thoughtworks — Spec-driven development: unpacking a key new AI-assisted engineering practice
- Addy Osmani — How to write a good spec for AI agents
- MarkTechPost — Meet GitHub Spec Kit
- Towards Data Science — From Vibe Coding to Spec-Driven Development