Spec-Driven Development: When the Spec Becomes the Source Code

Posted on: 5/25/2026 2:18:07 PM

You type "build me a login feature," the agent emits 600 working lines in five minutes — and then you spend three days figuring out what it actually does, patching holes you never asked for, and discovering it misread half your intent. That is the real cost of vibe coding at scale. In 2026, with agents fluent enough to write thousands of lines in one shot, the bottleneck is no longer code-generation speed but intent-communication speed. Spec-Driven Development (SDD) is the answer: invert the flow so that the specification — not the code — becomes the source of truth.

  • 4 core phases: Specify → Plan → Tasks → Implement
  • 93k+ GitHub stars on Spec Kit as of May 2026
  • 29 coding agents Spec Kit integrates with
  • 2 days: AWS cut a feature from 2 weeks to 2 days using Kiro Spec Mode

1. Why "vibe coding" collapses at scale

Vibe coding — typing a vague prompt and letting the agent infer the rest — is wonderful for a weekend prototype. But in a real codebase it exposes a structural flaw: the intent lives in your head, while the agent only sees the few loose sentences you just typed. The gap between "what you meant" and "what you said" is exactly where bugs, scope creep, and technical debt breed.

The problem worsens as agents get stronger. An agent will happily generate 600 lines on top of a wrong assumption from the very first sentence — and because the code runs, you do not notice until it is too late. This is the productivity paradox: the faster the agent, the further an error in intent propagates before anyone can hit the brakes.

⚠️ "House of cards code"

When prompts are throwaway and never recorded, every bug fix is a new prompt stacked on the old pile. The codebase becomes a house of cards: touch one part and another collapses, and nobody — not even the agent — still knows why the code looks the way it does. The specification evaporates the moment the prompt scrolls out of the chat window.

2. What is Spec-Driven Development?

SDD inverts the traditional relationship between documentation and code. In the traditional flow, code is the truth and docs are a perpetually stale afterthought. In SDD, a structured specification (usually Markdown checked into the repo) is the source of truth, and code is the output generated from it. The spec is not a draft thrown away once the code ships; it is a living artifact that is reviewed, versioned, and used as the contract between humans and agents.

The key difference from "writing thorough docs" is that the spec is written with enough precision for an agent to execute, and the process forces you to separate three questions that vibe coding blends together:

  • WHAT & WHY — desired outcomes, scope boundaries, success criteria (the Specify phase).
  • HOW — architectural decisions, tech stack, constraints (the Plan phase).
  • IN WHAT ORDER — decomposition into testable units an agent can own (the Tasks phase).

💡 A simple test

When should you write a spec? If you'd be annoyed to have the agent interpret the requirements differently than you meant — write the spec. If you could fix a wrong output with a quick follow-up prompt — just prompt directly, skip the spec. SDD is not meant to wrap every line of code, only the parts where ambiguity is expensive.

3. The four-phase workflow: Specify → Plan → Tasks → Implement

This is the backbone of SDD, most clearly standardized by GitHub Spec Kit through four commands. Each phase produces an artifact that humans review before moving to the next — those are your oversight checkpoints.

flowchart LR
    A["/specify/<br/>WHAT & WHY"] --> B["/plan/<br/>HOW"]
    B --> C["/tasks/<br/>IN WHAT ORDER"]
    C --> D["/implement/<br/>Agent executes"]
    A -. review .-> A1{Right intent?}
    B -. review .-> B1{Fits constraints?}
    C -. review .-> C1{Small enough?}
    D --> E[Code + Tests]
    E -. verify against .-> A
    style A fill:#e94560,stroke:#fff,color:#fff
    style D fill:#16213e,stroke:#fff,color:#fff
    style E fill:#4CAF50,stroke:#fff,color:#fff
The four phases of SDD: every dashed arrow is a point where a human approves before proceeding.

  1. Specify — Describe the problem in business language: outcomes, who uses it, success criteria, edge cases. No technology mentioned yet.
  2. Plan — Translate the spec into architectural decisions: stack, data model, integration points, non-functional constraints (security, performance).
  3. Tasks — Decompose the plan into small, independent, testable units. Not "build authentication" but "create a user-registration endpoint, validate email format, return 409 on duplicate."
  4. Implement — The agent codes each task within the agreed constraints. Because each task is small with clear criteria, the output is easy to verify and to fix locally.
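
The review gates between phases can be sketched as a small pipeline. This is a minimal Python illustration under stated assumptions, not how Spec Kit is implemented: `Phase`, `run_pipeline`, and the `produce` and `approve` callbacks are hypothetical names introduced here. `produce` stands in for the agent and `approve` for the human reviewer.

```python
from enum import Enum
from typing import Callable


class Phase(Enum):
    SPECIFY = "specify"
    PLAN = "plan"
    TASKS = "tasks"
    IMPLEMENT = "implement"


# Phases whose artifact a human approves before the next phase starts.
REVIEWED = (Phase.SPECIFY, Phase.PLAN, Phase.TASKS)


def run_pipeline(
    produce: Callable[[Phase, dict], str],
    approve: Callable[[Phase, str], bool],
) -> dict:
    """Run the phases in order; stop at the first artifact the reviewer rejects."""
    artifacts: dict = {}
    for phase in Phase:
        artifact = produce(phase, artifacts)
        if phase in REVIEWED and not approve(phase, artifact):
            # Rejected: later phases never build on an unapproved artifact.
            return artifacts
        artifacts[phase] = artifact
    return artifacts


def fake_agent(phase: Phase, earlier: dict) -> str:
    # Hypothetical stand-in for a coding agent.
    return f"{phase.value} artifact built on {len(earlier)} earlier artifact(s)"


# The reviewer rejects the plan, so tasks and implementation never run.
result = run_pipeline(fake_agent, approve=lambda phase, _: phase is not Phase.PLAN)
print([phase.value for phase in result])  # ['specify']
```

The point of the sketch is the early return: a rejected plan costs one review, not a batch of generated code built on it.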

4. The project "constitution": invariant constraints

Spec Kit introduces a valuable concept: the constitution — a set of unchanging project principles loaded into every phase. This is where you record the things the agent must never violate: "always write tests first," "no network calls in the domain layer," "every API must paginate," "GDPR compliance for personal data."

Unlike loose prompts that get forgotten between turns, the constitution is a standing constraint. It turns the implicit conventions in a senior engineer's head into written law the agent must follow on every task.

# Constitution — Payments Project

## Invariant Principles
1. Every DB schema change ships with a rollback-able migration.
2. The domain layer MUST NOT depend on frameworks or I/O.
3. All monetary amounts use decimal, never float.
4. Every write endpoint must be idempotent via Idempotency-Key.
5. Tests first, code second. No merge that lowers coverage.
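
To make principles 3 and 4 concrete, here is a minimal Python sketch of what code that respects them might look like. `PaymentService`, `charge`, and the response fields are hypothetical names for illustration, and the in-memory dictionary stands in for whatever storage a real service would use. A production implementation would typically also compare the request payload when a key is reused; that is omitted here.

```python
from decimal import Decimal


class PaymentService:
    """Toy in-memory service; all names are hypothetical, not a real API."""

    def __init__(self) -> None:
        self._responses: dict[str, dict] = {}  # Idempotency-Key -> first response
        self.charges: list[Decimal] = []

    def charge(self, idempotency_key: str, amount: Decimal) -> dict:
        # Principle 3: reject floats outright instead of silently converting them.
        if not isinstance(amount, Decimal):
            raise TypeError("amount must be decimal.Decimal, never float")
        # Principle 4: a repeated key returns the stored response, no second charge.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.charges.append(amount)
        response = {"charge_number": len(self.charges), "amount": str(amount)}
        self._responses[idempotency_key] = response
        return response


service = PaymentService()
first = service.charge("key-123", Decimal("19.99"))
retry = service.charge("key-123", Decimal("19.99"))  # e.g. client retried after a timeout
print(first == retry, len(service.charges))  # True 1

# Why principle 3 exists: binary floats cannot represent 0.1 exactly.
print(0.1 + 0.2 == 0.3, Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # False True
```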

5. What goes into a good spec?

Per 2026 guidance (Addy Osmani, Thoughtworks, Spec Kit), a spec usable by an agent needs to define six elements. Leave any out, and the agent will "fill in the blank" itself — that is where it drifts from your intent.

| Element | Answers | Example |
| --- | --- | --- |
| Outcomes | What does success look like? | User resets password via email in < 2 minutes |
| Scope boundaries | What is OUT of scope? | No social login in this version |
| Constraints | Technical/business limits? | Token expires in 15 min, send mail via SES |
| Prior decisions | What is already settled? | PostgreSQL is in use, no new database |
| Task breakdown | How is it split? | 5 independent tasks, each testable alone |
| Verification | How do we know it's right? | Acceptance criteria + test case per task |
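
One way to keep a blank from slipping through is to treat the six elements as required fields and check a draft before handing it to an agent. The sketch below is an assumption-laden illustration: the `Spec` dataclass and `missing_elements` are hypothetical names, not part of Spec Kit or any other tool mentioned here.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Spec:
    outcomes: list[str] = field(default_factory=list)
    scope_boundaries: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    prior_decisions: list[str] = field(default_factory=list)
    task_breakdown: list[str] = field(default_factory=list)
    verification: list[str] = field(default_factory=list)

    def missing_elements(self) -> list[str]:
        """Names of elements left empty, i.e. blanks the agent would fill in itself."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


draft = Spec(
    outcomes=["User resets password via email in < 2 minutes"],
    scope_boundaries=["No social login in this version"],
    constraints=["Token expires in 15 min", "Send mail via SES"],
    prior_decisions=["PostgreSQL is in use, no new database"],
)
print(draft.missing_elements())  # ['task_breakdown', 'verification']
```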

Decompose tasks TDD-style for the agent

Each task should be something you can implement and test in isolation, almost like test-driven development. Instead of "build authentication," write: "create a POST /register endpoint taking email + password, validate email format, hash the password with Argon2, return 201 with user_id, return 409 if the email already exists." The more concrete the task and the clearer the acceptance criteria, the fewer chances the agent has to "get creative" in the wrong place.
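
As a sketch of what that registration task looks like once its acceptance criteria are written as checks, consider the Python below. Several things are assumptions made for illustration: the `UserStore` class is hypothetical and in-memory, the email regex is deliberately simplistic, the 400 status for an invalid email is a guess because the task text does not name one, and the standard library's PBKDF2 stands in for Argon2 so the example runs without third-party packages.

```python
import hashlib
import os
import re
import uuid

# Deliberately simple format check; a real spec should say which rules apply.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


class UserStore:
    """Hypothetical in-memory stand-in for the POST /register handler."""

    def __init__(self) -> None:
        self._users: dict[str, dict] = {}

    def register(self, email: str, password: str) -> tuple[int, dict]:
        if not EMAIL_RE.match(email):
            return 400, {"error": "invalid email"}  # status code is an assumption
        if email in self._users:
            return 409, {"error": "email already exists"}
        salt = os.urandom(16)
        # Stand-in for Argon2 (the task's requirement) to stay stdlib-only.
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        user_id = str(uuid.uuid4())
        self._users[email] = {"user_id": user_id, "salt": salt, "hash": digest}
        return 201, {"user_id": user_id}


# Acceptance criteria from the task, written as checks before trusting the code.
store = UserStore()
status, body = store.register("ana@example.com", "s3cret-pass")
assert status == 201 and "user_id" in body
assert store.register("ana@example.com", "other-pass")[0] == 409
assert store.register("not-an-email", "s3cret-pass")[0] == 400
```

Each assertion maps to one clause of the task description, which is what makes the task verifiable in isolation.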

6. The 2026 SDD tooling ecosystem

SDD now has a mature toolset split into two groups: static-spec (lock requirements upfront, then generate code) and living-spec (keep docs synchronized with code as agents work). Here are the standouts.

| Tool | Type | Strengths |
| --- | --- | --- |
| GitHub Spec Kit | Open-source CLI | Standard 4 commands specify/plan/tasks/implement; supports 29 agents (Claude Code, Copilot, Gemini CLI, Cursor...); tool-neutral |
| Amazon Kiro | Agentic IDE | Built-in Spec Mode; Auto-router picks the optimal model per task; puts the spec at the center of the lifecycle |
| BMAD-METHOD | Agent framework | Specialized agent roles (analyst, PM, architect, dev) simulating a software team |
| OpenSpec | Living-spec | Keeps a living spec in sync with the code; good when the spec evolves continuously |

💡 Choose the tool by team profile

Teams just starting out who want to stay tool-neutral → Spec Kit (just Markdown scaffolding + commands, plugged into your existing agent). Teams wanting a seamless IDE experience → Kiro. Teams that want to simulate a multi-role process → BMAD. The key point: the tool is only a shell; the real value lies in the discipline of separating Specify/Plan/Tasks.

7. SDD meets Multi-Agent: Coordinator → Implementor → Verifier

The most powerful yet underused pattern in SDD is to never let an agent grade its own work. Once the spec has cleanly split tasks, you can assign roles to multiple agents: a Coordinator reads the spec and delegates, Implementors build individual tasks, and an independent Verifier checks the result against the acceptance criteria before marking it done.

sequenceDiagram
    participant S as Spec + Tasks
    participant C as Coordinator
    participant I as Implementor Agents
    participant V as Verifier Agent
    participant H as Human
    S->>C: Task list + acceptance criteria
    C->>I: Delegate independent tasks
    I->>V: Code + tests per task
    V->>V: Check against the spec
    alt Meets criteria
        V->>C: PASS, mark complete
    else Fails
        V->>I: FAIL + specific reason
        I->>V: Fix and resubmit
    end
    C->>H: Report progress against the spec
Separating Implementor and Verifier: the checking agent must differ from the building one — only a spec match yields a PASS.

Because every decision refers back to the spec, disagreements between agents are resolved by re-reading the specification rather than guessing. The spec becomes the shared "referee" — impossible in vibe coding, where every prompt is its own private truth.
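
The Coordinator, Implementor, and Verifier loop can be sketched in a few lines of Python. Everything named here (`Task`, `verify`, `coordinate`, `toy_implementor`, the `max_attempts` limit) is hypothetical and chosen for illustration; real agents would be model calls rather than plain functions. The property the sketch tries to preserve is that the verifier only consults the task's acceptance criteria and shares no state with the implementor.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Task:
    name: str
    # Acceptance check from the spec: returns a failure reason, or None on success.
    acceptance: Callable[[str], Optional[str]]


def verify(task: Task, output: str) -> Optional[str]:
    """Verifier: judges output against the spec only, never against the implementor's claims."""
    return task.acceptance(output)


def coordinate(
    tasks: list[Task],
    implement: Callable[[Task, Optional[str]], str],
    max_attempts: int = 3,
) -> dict[str, str]:
    """Coordinator: delegate each task, loop on FAIL with the reason, report status."""
    report: dict[str, str] = {}
    for task in tasks:
        feedback: Optional[str] = None
        report[task.name] = "FAIL: not attempted"
        for attempt in range(1, max_attempts + 1):
            output = implement(task, feedback)
            feedback = verify(task, output)
            if feedback is None:
                report[task.name] = f"PASS after {attempt} attempt(s)"
                break
            report[task.name] = f"FAIL: {feedback}"
    return report


def toy_implementor(task: Task, feedback: Optional[str]) -> str:
    # Forgets the duplicate case until the verifier points it out.
    if feedback is None:
        return "returns 201"
    return "returns 201; returns 409 on duplicate"


register = Task(
    name="register endpoint",
    acceptance=lambda out: None if "409" in out else "missing 409 on duplicate email",
)
print(coordinate([register], toy_implementor))
# {'register endpoint': 'PASS after 2 attempt(s)'}
```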

8. When NOT to use a spec

SDD is not free: writing and reviewing a spec is an upfront time cost. Applying it to everything turns into bureaucracy. Use the following heuristic to decide:

flowchart TD
    A[New piece of work] --> B{Would ambiguity be costly?}
    B -- Yes --> C{"Do many people/systems<br/>depend on the result?"}
    B -- No --> P[Prompt directly, skip the spec]
    C -- Yes --> S[Write a full spec]
    C -- No --> D{Will it be reworked often?}
    D -- Yes --> S
    D -- No --> M[Light spec: a few bullet points]
    style S fill:#e94560,stroke:#fff,color:#fff
    style P fill:#4CAF50,stroke:#fff,color:#fff
    style M fill:#2c3e50,stroke:#fff,color:#fff
A spec is an investment: only spend when the cost of ambiguity exceeds the cost of writing the spec.

A small CSS patch, a one-off script, a throwaway experiment — do not spec. A payments module, a shared schema change, a feature many teams depend on — definitely spec. SDD gives you the choice of how much formality to apply, instead of forever swinging between the "vibe" extreme and the "40-page document" extreme.
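
The decision heuristic is small enough to write down as a function. This is only a sketch of the three questions in this section; the function name, parameter names, and return strings are made up for illustration, and answering each question is still a human judgment.

```python
def spec_level(ambiguity_costly: bool, many_dependents: bool, reworked_often: bool) -> str:
    """Map the three questions of the heuristic to a level of formality."""
    if not ambiguity_costly:
        return "prompt directly"
    if many_dependents or reworked_often:
        return "full spec"
    return "light spec"


# A small CSS patch: a wrong result is cheap to fix with a follow-up prompt.
print(spec_level(ambiguity_costly=False, many_dependents=False, reworked_often=False))
# prompt directly

# A payments module: ambiguity is costly and many systems depend on it.
print(spec_level(ambiguity_costly=True, many_dependents=True, reworked_often=False))
# full spec
```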

9. The Project Management view: the spec is a contract, not a procedure

SDD is not only an engineering technique — it reshapes how a team collaborates with agents. For Tech Leads and PMs, the spec delivers three things vibe coding destroys:

  • Traceability of intent: every line of code traces back to a task, every task to an outcome in the spec. When someone asks "why does the code do this?", the answer is in the repo, not in a closed chat window.
  • Review at the right layer: instead of reviewing 600 generated lines, you review the spec, where fixing one wrong sentence is far cheaper than fixing a thousand wrong lines. This shifts quality control upstream.
  • A unit of work for agents: a task in SDD is the agent's "ticket" — estimable, testable, parallelizable. A PM plans a sprint for a hybrid human + agent team just as they would for a human one.
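
Traceability of intent can be checked mechanically once tasks record which outcome they serve. The following Python sketch assumes a hypothetical mapping from task names to outcome names; `trace_gaps` and the example identifiers are invented here and do not come from any of the tools discussed.

```python
def trace_gaps(outcomes: set[str], task_to_outcome: dict[str, str]) -> dict[str, list[str]]:
    """Find tasks that point at no known outcome, and outcomes no task covers."""
    orphan_tasks = sorted(
        task for task, outcome in task_to_outcome.items() if outcome not in outcomes
    )
    uncovered = sorted(outcomes - set(task_to_outcome.values()))
    return {"orphan_tasks": orphan_tasks, "uncovered_outcomes": uncovered}


outcomes = {"password-reset-by-email", "reset-token-expiry"}
tasks = {
    "send-reset-mail": "password-reset-by-email",
    "add-social-login": "social-login",  # not an outcome in the spec: scope creep
}
print(trace_gaps(outcomes, tasks))
# {'orphan_tasks': ['add-social-login'], 'uncovered_outcomes': ['reset-token-expiry']}
```

An orphan task is work nobody asked for; an uncovered outcome is a promise with no ticket behind it.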

💡 A mental model for PMs / Tech Leads

Treat the agent as an extremely fast engineering team that takes instructions literally. You would not hand a new outsourcing team a one-liner "go build login" and expect the right result. You write requirements, boundaries, acceptance criteria. SDD is the codified version of that practice — except the recipient reads and executes in minutes, so the quality of the input (the spec) determines almost all of the quality of the output.

10. Conclusion

In 2026, the edge does not go to the team that "types prompts fastest," but to the one that communicates intent most clearly to ever-stronger agents. Mature Spec-Driven Development means:

  • Treating the spec as the source of truth, with code as the generated output — not the other way around.
  • Separating Specify → Plan → Tasks → Implement, and reviewing the artifact of each phase rather than only the final code.
  • Pinning principles with a constitution, decomposing work into testable tasks TDD-style.
  • Using a Verifier separate from the Implementor so no agent grades its own work.
  • Knowing when NOT to spec — a spec is an investment, not a ritual.

Vibe coding asks "can the agent write code?" Spec-driven asks a harder, more important question: "have I expressed what I want clearly enough that anyone — human or machine — would build the right thing?" That question has always been the heart of software engineering; agents simply make it more urgent than ever.

References