tdd
TDD Enforcement Skill
Priorities
Correctness > Test Coverage > Implementation Speed
Goal
Enforce Test-Driven Development as a process, not just a presence check. Agents left unconstrained write all tests first, then all code (horizontal slicing). This produces tests coupled to implementation that break on refactor. The 4-phase workflow below enforces vertical TDD cycles where each test drives one slice of implementation.
Configuration
Read the tdd: key from CLAUDE.md:
tdd: strict # strict | soft | off
Check both CLAUDE.md and .claude/CLAUDE.md. Default: off.
Modes
Strict (tdd: strict)
Full 4-phase workflow with blocking gates. Planning phase requires user confirmation via AskUserQuestion before coding. Each phase has explicit entry/exit criteria.
Escape when the change is Markdown-only, a config change, or when the mocking required would exceed the change's complexity. Present AskUserQuestion with two options: (1) write the test first, (2) prototype escape with justification. Log every escape.
Soft (tdd: soft)
Full 4-phase workflow guidance but no blocking gates. Planning phase generates and presents the plan but proceeds immediately. Warns on deviations (no test, horizontal slicing) but does not stop. Summarize untested items after completion.
Off (tdd: off)
No TDD checks. Standard implementation flow.
Anti-Pattern: Horizontal Slicing
WRONG (horizontal):          RIGHT (vertical):
┌──────────────────────┐     ┌──────────────────────┐
│ Write ALL tests      │     │ Test 1 → Impl 1      │
│ test1, test2, test3  │     │ ✓ GREEN              │
├──────────────────────┤     ├──────────────────────┤
│ Write ALL code       │     │ Test 2 → Impl 2      │
│ impl1, impl2, impl3  │     │ ✓ GREEN              │
├──────────────────────┤     ├──────────────────────┤
│ Hope they match      │     │ Test 3 → Impl 3      │
│ ✗ Tests are brittle  │     │ ✓ GREEN              │
└──────────────────────┘     └──────────────────────┘
Horizontal slicing fails because tests written without implementation are based on imagined APIs. They couple to guessed method signatures, mock internal modules, and break on any structural change.
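A TypeScript sketch of the difference, using a hypothetical slugify module (none of these identifiers come from this skill; Vitest assumed):

```ts
import { expect, it } from "vitest";
import { normalizeTitle } from "./slugify/internal"; // guessed internal module
import { slugify } from "./slugify";                 // the public API

// BRITTLE: couples to an imagined internal helper and asserts HOW the
// slug is built. Renaming or inlining normalizeTitle breaks this test
// even though slugify's observable behavior is unchanged.
it("normalizes before joining", () => {
  expect(normalizeTitle("Hello World")).toEqual(["hello", "world"]);
});

// ROBUST: exercises only the public interface production callers use,
// so it survives any refactor that preserves behavior.
it("strips punctuation from the slug", () => {
  expect(slugify("Hello, World!")).toBe("hello-world");
});
```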
4-Phase Workflow
Phase 1: Planning
Entry: Task received with TDD mode strict or soft.
- Identify the public interface changes needed
- List behaviors to test, ordered by priority (core path first, edge cases later)
- Strict mode: Present interface and behavior list via AskUserQuestion for user confirmation before proceeding
- Soft mode: Present the plan, proceed immediately
Load reference: Glob("**/tdd/references/interface-design.md", path: "~/.claude/plugins")
Exit: Confirmed behavior list. Proceed to Tracer Bullet.
Non-interactive: If no user response within the current turn, proceed with best judgment and log skipped confirmation.
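A sketch of what a confirmed plan can look like, using a hypothetical slugify utility carried through the examples in the later phases:

```
Interface:  slugify(title: string): string
Behaviors, priority order:
  1. Lowercases and replaces spaces with hyphens   (core path)
  2. Strips characters outside [a-z0-9-]
  3. Collapses consecutive hyphens                 (edge case)
```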
Phase 2: Tracer Bullet
Prove one end-to-end path using Red-Green-Refactor:
- Write ONE test for the highest-priority behavior
- Verify it FAILS (RED) — a passing test before implementation is a false positive
- Write minimal implementation to make it pass (GREEN)
- Run per-cycle checklist (below)
- Strict mode: Pause after tracer bullet passes. Present results via AskUserQuestion to confirm before incremental loop.
Load references: Glob("**/tdd/references/mocking.md", path: "~/.claude/plugins"), Glob("**/tdd/references/test-quality.md", path: "~/.claude/plugins")
Exit: One test passing, end-to-end path proven.
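A minimal sketch of one tracer-bullet cycle, assuming Vitest and the hypothetical slugify plan above (all identifiers illustrative):

```ts
// RED: ONE test for the highest-priority behavior. Run it and watch it
// fail; slugify does not exist yet, so a pass here is a false positive.
import { expect, it } from "vitest";
import { slugify } from "./slugify";

it("lowercases and replaces spaces with hyphens", () => {
  expect(slugify("Hello World")).toBe("hello-world");
});
```

```ts
// GREEN (slugify.ts): the simplest implementation that passes. No
// handling of punctuation or hyphen runs yet; those behaviors get
// their own cycles in Phase 3.
export function slugify(title: string): string {
  return title.toLowerCase().replace(/ /g, "-");
}
```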
Phase 3: Incremental Loop
For each remaining behavior from the plan:
- Write ONE test (RED)
- Write minimal code to pass (GREEN)
- Run per-cycle checklist
- Repeat
Do NOT write the next test until the current cycle is GREEN and the checklist passes.
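Continuing the sketch, one incremental cycle for the next planned behavior (Vitest assumed, names illustrative):

```ts
// Cycle 2, RED: exactly one new test; the tracer-bullet test stays in place.
it("strips characters outside [a-z0-9-]", () => {
  expect(slugify("Hello, World!")).toBe("hello-world");
});
```

```ts
// Cycle 2, GREEN: extend slugify just enough to pass both tests.
export function slugify(title: string): string {
  return title.toLowerCase().replace(/ /g, "-").replace(/[^a-z0-9-]/g, "");
}
```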
Phase 4: Refactor
Only enter when ALL tests are GREEN.
- Look for refactoring candidates (duplication, long methods, shallow modules)
- Make ONE structural change at a time
- Run tests after each change — must stay GREEN
- If tests break, revert the refactor
Load reference: Glob("**/tdd/references/refactoring.md", path: "~/.claude/plugins")
Exit: Clean code, all tests GREEN. Commit test and implementation together.
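One structural change from the running sketch, extracting a named constant, with behavior (and therefore both tests) unchanged:

```ts
// Refactor: one change at a time; rerun the suite after this edit.
// If either test goes RED, revert.
const NON_SLUG_CHARS = /[^a-z0-9-]/g;

export function slugify(title: string): string {
  return title.toLowerCase().replace(/ /g, "-").replace(NON_SLUG_CHARS, "");
}
```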
Per-Cycle Checklist
After every RED-GREEN pair, verify:
- Behavior, not implementation: Test describes WHAT the system does, not HOW
- Public interface only: Test uses the same API as production callers
- Survives refactor: Would this test break if internals changed but behavior stayed the same?
- Minimal code: Implementation is the simplest thing that passes — no speculative features
- No horizontal drift: Did you write only ONE test before implementing? If you wrote multiple, STOP and revert to one.
- No type theater: Does this test verify something the type system doesn't already guarantee? If the only way it could fail is a type error, delete it.
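A short illustration of the last two checks, reusing the hypothetical slugify example (Vitest assumed):

```ts
// TYPE THEATER (delete): TypeScript already guarantees slugify returns
// a string, so the only way this test fails is a compile error.
it("returns a string", () => {
  expect(typeof slugify("Hello")).toBe("string");
});

// WORTH KEEPING: asserts behavior (plan item 3, not yet implemented
// in the sketch) that the type system cannot express.
it("collapses consecutive hyphens", () => {
  expect(slugify("Hello - World")).toBe("hello-world");
});
```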
Pre-Existing RED State
If existing tests are already failing when you start: note them and proceed with new behavior only. Do not attempt to fix pre-existing failures unless that is the task.
Test Discovery
Search patterns: __tests__/[filename].test.ts, [filename].test.ts, [filename].spec.ts, test/[filename].test.ts, tests/[filename].test.ts.
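For example, to locate existing tests for src/slugify.ts before starting a cycle (paths illustrative): Glob("**/__tests__/slugify.test.ts"), then Glob("**/slugify.test.ts") and Glob("**/slugify.spec.ts") if the first returns nothing.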
Arguments
$ARGUMENTS