Spec-Driven Development

Recommended effort: xhigh for design and audit phases; medium for implement, quick mode, and status checks.

Structured development workflow with adaptive depth. Right ceremony for the right scope.

Workflow

specify --> design* --> tasks* --> implement --> verify --> audit --> done
  ^______________________________________|  (verify after each task)

Adaptive pipeline: Specify and Implement always run; Design and Tasks auto-skip when scope is small enough. Verify runs after every task/range and marks AC checkboxes. Implement finishes at to-review; Audit validates Goals and Success Criteria, then transitions to done. Validate (UAT) is on-demand and can revert any [x] that fails manual testing.

Context Loading Strategy

Base load:

.agents/project.md (context, if exists)
Current feature spec.md

On-demand:

.agents/codebase/*.md (brownfield)
.agents/knowledge.md (cross-feature decisions and gotchas)
decisions.md (designing or implementing from user decisions)
design.md (implementing)
tasks.md (implementing)
research/*.md (new technologies)

Never simultaneous:

Multiple feature specs
Multiple codebase docs
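
The loading rules above can be read as a small planner. The sketch below is one possible reading, not part of the skill: the function name plan_context, its parameters, and the example codebase doc name are hypothetical, and it only returns paths rather than reading files. It accepts exactly one feature directory and at most one codebase doc, which mirrors the "never simultaneous" rule.

```python
from pathlib import Path

BASE_PROJECT_DOC = Path(".agents/project.md")


def plan_context(feature_dir, phase, root=Path("."), codebase_doc=None):
    """Return the files to load for one feature and one phase.

    Hypothetical helper: paths are relative to `root`, and existence
    checks are done against `root` as well.
    """
    feature_dir = Path(feature_dir)
    files = []

    # Base load: project.md only if it exists, plus the current spec.
    if (root / BASE_PROJECT_DOC).exists():
        files.append(BASE_PROJECT_DOC)
    files.append(feature_dir / "spec.md")

    # On-demand: design.md and tasks.md are loaded when implementing.
    if phase == "implement":
        for name in ("design.md", "tasks.md"):
            if (root / feature_dir / name).exists():
                files.append(feature_dir / name)

    # On-demand: a single codebase doc for brownfield work.
    if codebase_doc is not None:
        files.append(Path(".agents/codebase") / codebase_doc)

    return files
```

Because the signature takes a single feature_dir and a single codebase_doc, loading two specs or two codebase docs at once is not expressible, which is the point of the constraint.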

Artifact Structure Authority

Templates in templates/ are the canonical source of truth for every artifact's structure. Existing artifacts in .artifacts/ may be stale, predate skill updates, or have been authored before current conventions -- they are context, NEVER structural reference.

Load order when creating any artifact:

Load the relevant template from templates/ first
Only then read existing artifacts (spec.md, design.md, tasks.md, etc.) for domain context, prior decisions, or cross-feature continuity

If an existing artifact's structure diverges from the template, follow the template. Do not propagate legacy structure.

Triggers

Feature-Level (auto-sized)

Trigger Pattern	Reference
Create new feature, specify feature	specify.md
From PRD, extract from document, use this PRD	specify.md (via @file.md)
Modify feature, improve feature	specify.md (brownfield)
Discuss feature, capture context, how should this work	discuss.md
Create technical design, plan feature	design.md
Research technology, cache research	research.md
Create tasks	tasks.md
Implement task, execute task	implement.md
Verify implementation, check adherence, verify code	verify.md
Audit feature, validate goals, audit goals and success criteria	audit.md
Validate, UAT, manual testing, test manually	validate.md
Quick fix, quick task, quick mode, start quick mode, small change, bug fix	quick-mode.md
List features, show status	status-specs.md

Guidelines

Trigger Pattern	Reference
How to write specs	spec-writing.md
How to decompose tasks	tasks.md
Codebase exploration	codebase-exploration.md
Research patterns	research.md
Baseline discovery	baseline-discovery.md
Extract from PRD/docs	doc-extraction.md
Coding principles	coding-principles.md
Status workflow, when to update status	status-workflow.md
Knowledge format, Codebase Feedback format	knowledge.md

Notes:

deep-verify.md is not a direct trigger. It is loaded by verify.md during Step 5 (Code Correctness).
baseline-discovery.md is not a direct trigger. It is loaded by specify.md Step 8 for brownfield features.

Cross-References

specify.md --------> discuss.md (when gray areas detected)
specify.md --------> quick-mode.md (when Small scope)
specify.md --------> design.md (when Large/Complex, spec complete)
specify.md --------> implement.md (when Medium, skip design/tasks)
design.md ---------> tasks.md (when Large/Complex)
design.md ---------> research.md (if new tech)
tasks.md ----------> implement.md
implement.md ------> coding-principles.md (loaded before coding)
implement.md ------> verify.md (after every task/range)
verify.md --------> deep-verify.md (code correctness analysis)
verify.md --------> spec.md (marks AC [x] on pass, reverts on regression)
implement.md ------> audit.md (after to-review, validates Goals/Success)
audit.md ---------> spec.md (marks Goals/Success [x], transitions done)
implement.md ------> validate.md (on-demand UAT, any scope)
validate.md ------> audit.md (re-run required after UAT reverts any [x])
implement.md ------> tasks.md (safety valve: >5 inline steps)
specify.md --------> baseline-discovery.md (brownfield features)
design.md ---------> project-index (prompts integrate feedback after Step 8)
implement.md ------> project-index (prompts integrate feedback after Step 10)

Auto-Sizing

Complexity determines depth, not a fixed pipeline. Before starting any feature, assess its scope and apply only what's needed:

Scope	What	Specify	Design	Tasks	Implement
Small	≤3 files, one sentence, no user-facing feature	Quick mode -- skip pipeline entirely	-	-	-
Medium	Canonical pattern, ≤10 tasks, no novel architectural decisions	Spec (brief)	Skip -- explore inline	Skip -- steps implicit	Implement + verify per step
Large	Novel architectural decisions, or >10 tasks, or pattern new to this codebase	Full spec + requirement IDs	Full design	Full breakdown + dependencies	Implement + verify per task
Complex	Ambiguity in problem itself, or new domain to the user	Full spec + discuss gray areas	Research + full design	Breakdown + parallel design	Implement + verify per task

Medium vs Large, resolving the gray zone:

Multi-file is not Large. Touching 4-6 files does not upgrade a canonical pattern to Large -- the count of files is incidental. The question is whether the feature requires an architectural decision the reader of the spec could not have predicted from the feature description alone.

Dark-mode toggle (localStorage + system preference + CSS vars) -- Medium. Canonical pattern, no novel decision. The number of files touched is incidental.
Add "remember me" checkbox to existing login -- Medium. Pattern known, scope bounded.
Add role-based access control to an app without any prior auth model -- Large. Novel decisions: where the role lives (JWT vs DB lookup) and how the enforcement layer works.
Build offline-first sync with conflict resolution, no prior CRDT experience -- Complex. Ambiguity in the problem itself (LWW vs CRDT vs event sourcing), new domain.

If you find yourself reaching for design.md because the feature is "multi-component," pause: if every file you will touch is an obvious consequence of the feature description, you are in Medium territory. Design.md exists to capture decisions a peer reviewer could not reconstruct from the spec -- if there are no such decisions, design.md is ceremony.
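
The sizing table and the Medium vs Large guidance can be sketched as a classifier. This is an illustration under assumptions, not the skill's actual logic: the ScopeSignals fields are hypothetical names for judgments the agent makes, and the order of the checks (Complex first, then Large, then Small, else Medium) is one reasonable reading of the table. Note that files_touched is only consulted for Small and never upgrades a feature to Large.

```python
from dataclasses import dataclass


@dataclass
class ScopeSignals:
    """Hypothetical inputs an agent might estimate before starting."""
    files_touched: int
    estimated_tasks: int
    one_sentence: bool = False              # change can be described in one sentence
    user_facing: bool = True
    novel_architecture: bool = False        # decision a reviewer could not predict
    new_pattern_for_codebase: bool = False
    problem_ambiguous: bool = False
    new_domain_for_user: bool = False


def classify_scope(s: ScopeSignals) -> str:
    # Complex: ambiguity in the problem itself, or a domain new to the user.
    if s.problem_ambiguous or s.new_domain_for_user:
        return "Complex"
    # Large: novel decisions, >10 tasks, or a pattern new to this codebase.
    if s.novel_architecture or s.estimated_tasks > 10 or s.new_pattern_for_codebase:
        return "Large"
    # Small: <=3 files, one sentence, no user-facing feature.
    if s.files_touched <= 3 and s.one_sentence and not s.user_facing:
        return "Small"
    # Everything else: canonical pattern with <=10 tasks.
    return "Medium"


# File and task counts below are made-up values for illustration.
dark_mode = ScopeSignals(files_touched=5, estimated_tasks=4)
rbac = ScopeSignals(files_touched=8, estimated_tasks=7, novel_architecture=True)
print(classify_scope(dark_mode))  # Medium
print(classify_scope(rbac))       # Large
```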

Rules:

Specify and Implement are always required -- you always need to know WHAT to build, and then build it
Design is skipped when the change is straightforward (no architectural decisions, no new patterns)
Tasks is skipped when there are ≤3 obvious steps (they become implicit in Implement)
Discuss is triggered within Specify only when the agent detects ambiguous gray areas that need user input
Verify runs after every task/range -- checks design adherence, pattern adherence, code correctness (tooling-aware deep analysis), and visual adherence (optional); also marks AC [x] in spec.md on pass
Audit runs before done -- validates Goals and Success Criteria against evidence, marks their [x], and transitions to-review -> done; mandatory for every .artifacts/features/ feature (Medium/Large/Complex)
Validate (UAT) is on-demand -- the user requests it when they want to test manually, at any scope; it may revert any [x] that the user rejects
Quick mode is the express lane -- for bug fixes, config changes, and small tweaks (no audit needed)
Verification is continuous -- quality gates and acceptance criteria run after each task or range, never deferred to the end

Safety valve: Even when Tasks is skipped, Implement ALWAYS starts by listing atomic steps inline (see implement.md). If that listing reveals >5 steps or complex dependencies, STOP and create a formal tasks.md -- the Tasks phase was wrongly skipped.
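
The safety valve reduces to a single check on the inline step listing. A minimal sketch, assuming the steps are held as a list and that "complex dependencies" is a yes/no judgment by the agent; the function and parameter names are hypothetical.

```python
def needs_formal_tasks(inline_steps, has_complex_dependencies=False, max_inline_steps=5):
    """True when the inline listing should be promoted to a formal tasks.md."""
    # The threshold is strict: exactly 5 steps still fits inline.
    return len(inline_steps) > max_inline_steps or has_complex_dependencies
```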

Project Structure

.artifacts/
├── features/
│   └── {ID}-{name}/
│       ├── spec.md       # WHAT: Requirements (always created)
│       ├── decisions.md  # WHY: Decisions on gray areas (only when discuss triggered)
│       ├── design.md     # HOW: Architecture (only for Large/Complex)
│       ├── tasks.md      # WHEN: Tasks (only for Large/Complex)
│       └── designs/      # Visual references (optional)
├── quick/                # Quick mode tasks
│   └── NNN-{slug}/
│       └── task.md       # Includes completion fields (patterns_discovered, follow_up)
└── research/             # Research cache (reusable across features)
    └── {topic}.md
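
The {ID}-{name} directory convention can be sketched as a helper that picks the next feature ID. This is an assumption-laden illustration: the three-digit zero padding follows the 001, 002 examples in this document, and the function name and regex are hypothetical. It takes the highest existing ID plus one, so an ID is only at risk of reuse if the highest-numbered feature directory has been deleted.

```python
import re
from pathlib import Path

# Matches directory names such as "001-login".
FEATURE_DIR = re.compile(r"^(\d{3})-(.+)$")


def next_feature_dir(name, root=Path(".artifacts/features")):
    """Return the path for a new feature, using the next unused 3-digit ID."""
    used = []
    if root.exists():
        for child in root.iterdir():
            match = FEATURE_DIR.match(child.name)
            if child.is_dir() and match:
                used.append(int(match.group(1)))
    next_id = max(used, default=0) + 1
    return root / f"{next_id:03d}-{name}"
```

The helper only computes the path; creating the directory and spec.md is left to the Specify phase.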

Project context:

.agents/
├── project.md            # Project context (project-index)
├── codebase/             # Codebase analysis (project-index)
└── knowledge.md          # Cross-feature decisions and gotchas (spec-driven)

Note: .agents/codebase/ is generated by the project-index skill. .agents/knowledge.md is owned by spec-driven -- it accumulates cross-feature decisions, gotchas, and queues codebase discoveries in a ## Codebase Feedback section for project-index to integrate. project-index reads knowledge.md for context and consumes the Codebase Feedback queue on demand (/project-index integrate feedback), but never rewrites Decisions or Gotchas. spec-driven is the sole writer to knowledge.md; project-index is the sole writer to .agents/codebase/*.md and .agents/project.md. If .agents/ doesn't exist, Specify suggests running project-index for better context (especially for brownfield projects). All feature artifacts stay within .artifacts/.

Templates

Context	Template
Feature spec	spec.md
Discuss context	decisions.md
Technical design	design.md
Task breakdown	tasks.md
Quick task	quick-task.md
Codebase exploration	exploration.md
Research cache	research.md
Session dump	session-dump.md

Knowledge Verification Chain

When researching, designing, or making any technical decision, follow this chain in strict order. Never skip steps.

Step 1: Codebase      -> check existing code, conventions, and patterns already in use
Step 2: Project docs  -> README, docs/, inline comments, .agents/codebase/
Step 3: Context7 MCP  -> resolve library ID, then query for current API/patterns
Step 4: Web search    -> official docs, reputable sources, community patterns
Step 5: Flag or ask   -> state partial reasoning tagged "verify", or ask user for direction

Rules:

Never skip to Step 5 if Steps 1-4 are available
Step 5 output is never presented as fact -- either flagged as uncertain or framed as a direction question to the user
NEVER assume or fabricate. If the chain does not resolve an answer, say "I don't know" and ask the user for direction. Inventing APIs, patterns, or behaviors causes cascading failures across design -> tasks -> implementation. Uncertainty is always preferable to fabrication.
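
The chain behaves like an ordered fallback: try each source in turn, stop at the first answer, and ask the user when nothing resolves. The sketch below assumes each step can be modeled as a lookup function returning an answer or None; the resolve function, the result keys, and the stub lookups are all hypothetical, and the real Steps 1-4 involve tools rather than plain functions.

```python
from typing import Callable, Optional, Sequence, Tuple

# (step name, lookup function) pairs, listed in chain order.
Source = Tuple[str, Callable[[str], Optional[str]]]


def resolve(question: str, sources: Sequence[Source]) -> dict:
    """Try each source in order; stop at the first one that answers."""
    for name, lookup in sources:
        answer = lookup(question)
        if answer is not None:
            return {"status": "resolved", "source": name, "answer": answer}
    # Step 5: nothing resolved, so ask instead of inventing an answer.
    return {"status": "ask-user", "source": None, "answer": None}


# Stub lookups standing in for Steps 1-2.
chain = [
    ("codebase", lambda q: None),
    ("project-docs", lambda q: "documented in README"),
]
print(resolve("which package manager?", chain)["source"])  # project-docs
```

Returning an explicit "ask-user" status keeps the unresolved case visible to the caller instead of letting a guess pass as fact.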

Guidelines

DO:

Separate content by purpose: spec=WHAT (goals, stories, ACs), design=HOW, tasks=WHEN
Follow status flow: draft -> ready -> in-progress -> to-review -> done
Use sequential Feature IDs (001, 002)
Reuse research cache across features (.artifacts/research/)
Consume .agents/ for project context and codebase info (optional -- use if exists)
Queue codebase discoveries to .agents/knowledge.md ## Codebase Feedback section -- project-index integrates them into codebase/*.md on demand
Record cross-feature decisions and gotchas in .agents/knowledge.md during design and implement
Auto-size depth based on complexity -- skip phases that add no value
Run verify after each task or range -- design adherence, pattern adherence, visual (if references exist)

DON'T:

Reuse Feature IDs from previous features
Mix spec, design, and task content in a single file
Skip status transitions (e.g., jumping from draft to done)
Create feature-specific research files outside .artifacts/research/
Generate .agents/ content from scratch (that's project-index's responsibility)
Force full pipeline on small/medium changes -- respect auto-sizing
Assume or fabricate when information is unavailable -- follow Knowledge Verification Chain
Defer verification to the end -- verify runs per task/range, not as a final batch
Loop indefinitely on verify findings -- escape after 3 failed fix attempts
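
The status flow and the no-skipping rule can be sketched as a tiny state check. This models forward moves only; whether a status may ever move backward is not stated here, so treating backward moves as invalid is an assumption of the sketch. The names STATUS_FLOW and can_transition are hypothetical.

```python
STATUS_FLOW = ["draft", "ready", "in-progress", "to-review", "done"]


def can_transition(current: str, target: str) -> bool:
    """Allow only the immediate next status in the flow (no skipping)."""
    # list.index raises ValueError for a status outside the flow.
    i = STATUS_FLOW.index(current)
    return i + 1 < len(STATUS_FLOW) and STATUS_FLOW[i + 1] == target
```

For example, can_transition("draft", "ready") is True, while can_transition("draft", "done") is False.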

Phase Transitions

Each phase (specify, design, tasks, implement) should run in a clean context window. A window reused from early phases through implementation accumulates stale context, which grows its size and increases hallucination risk. Within design, research (Step 5) and codebase exploration (Step 6) may dispatch to sub-agents -- their disk artifacts are the handoff back to design.

Between phases:

finish phase -> append to session dump -> clean window -> start next phase
                                                           ...
                                                         (more phases)
                                                           ...
                                         end of session -> wrap-up (reads dump)

Complete the current phase and write its artifacts to disk
Append session context to .artifacts/.session-dump.md -- a near-complete dump of what happened (decisions, discoveries, blockers, open items, phase completed, next phase). Each phase appends, building a cumulative record
Clear the context window
Start the next phase in a clean window, loading only the artifacts it needs

The session dump is ephemeral -- wrap-up reads it at end of session to compose notes, then the file is disposable. It is not a project artifact.
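
The append step can be sketched as follows. The entry layout here (a heading per phase followed by bullet lines) is invented for illustration; the actual structure comes from the session-dump.md template, and the function name and parameters are hypothetical. The key property shown is that each phase appends rather than overwrites, so the file builds a cumulative record.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_session_dump(phase, next_phase, notes,
                        dump_path=Path(".artifacts/.session-dump.md")):
    """Append one phase entry to the session dump and return its path."""
    dump_path.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")

    entry = [f"## {phase} ({stamp})", ""]
    entry += [f"- {line}" for line in notes]
    entry += [f"- Next phase: {next_phase}", ""]

    # Mode "a" appends, so earlier phases stay in the file.
    with dump_path.open("a", encoding="utf-8") as f:
        f.write("\n".join(entry) + "\n")
    return dump_path
```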

Sub-agent dispatch:

When activities run in full form (Auto-Sizing decides), they can dispatch to sub-agents for context isolation. Disk artifacts are the handoff -- sub-agents don't return findings through the context. Inline forms (quick mode, Medium scope) run without dispatch.

Research sub-agents -- one per unknown topic, write to .artifacts/research/{topic}.md (design.md Step 5; multi-topic in a single dispatch turn).
Codebase exploration sub-agent -- one per design phase, runs the multi-phase exploration end to end, writes per templates/exploration.md (design.md Step 6).
Implement sub-agent -- one per user invocation (T001 / range / S001 / --all), owns Steps 7-8 (per-task implement + verify + mark [x]). Main agent dispatches once and resumes at Step 9 (implement.md Step 5).

Research and exploration sub-agents in design.md run in the same dispatch turn (independent). The implement sub-agent runs after design/tasks artifacts exist.

Compact Instructions

Heavy phases write a mid-phase checkpoint to disk before autocompact can fire (design.md Step 9a). If autocompact fires before that checkpoint runs, preserve:

Current phase and step number
Feature ID and path (.artifacts/features/{ID}-{name}/)
Open decisions not yet captured in any artifact
Acceptance criteria check status (which [x] are marked and which are not)
Path to session dump if already written (.artifacts/.session-dump.md)

Drop:

Raw file contents read during exploration (already on disk)
Intermediate research output already written to .artifacts/research/
Verification output already committed as [x] in tasks.md

Error Handling

No .artifacts/: Create it (features/ and research/ are created on demand)
Spec not found: List available features
Open questions blocking architecture: Resolve before planning (trigger discuss)
Design not found: Suggest design before tasks (or skip if Medium scope)
Tasks not found: Suggest tasks before implement (or skip if Medium scope)
Scope misjudged: Safety valve catches it -- redirect to appropriate phase
