Code Forge — TDD

Test-Driven Development enforcement for any code change, with built-in code analysis.

When to Use

  • Writing code outside the code-forge:impl workflow (ad-hoc changes, quick fixes)
  • Adding tests to existing code that lacks coverage
  • Implementing test cases from a spec-forge:test-cases document
  • Any new feature, bug fix, or behavior change that needs test discipline

Note: code-forge:impl already enforces TDD internally. This skill is for work outside that workflow.

Iron Law

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST.

No exceptions. Not for "simple" changes. Not for "obvious" fixes. Not when under time pressure.

Step 0: Determine Mode

Examine the arguments to determine the operating mode:

| Argument | Mode | Behavior |
| --- | --- | --- |
| @docs/.../test-cases.md | Driven Mode | Read test cases document, implement each case via TDD |
| @src/services/payment.ts or specific code path | Auto-Analysis Mode | Analyze specified code, design cases, implement via TDD |
| Feature name or description (e.g., "add validation to user signup") | Standalone Mode | Classic TDD — write tests for the described change |
| Empty (no arguments) | Auto-Analysis Mode | Scan project for coverage gaps, design cases, implement |
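
The routing above can be sketched in a few lines. A minimal sketch, assuming the argument arrives as a single string; the function name and conventions are illustrative, not part of the skill's contract:

    // Illustrative sketch of Step 0 routing; names and conventions are assumptions.
    type Mode = "Driven" | "Auto-Analysis" | "Standalone";

    function detectMode(args: string): Mode {
      const input = args.trim();
      if (input === "") return "Auto-Analysis";              // empty → scan for gaps
      if (/test-cases\.md$/.test(input)) return "Driven";    // test-cases document
      if (input.startsWith("@")) return "Auto-Analysis";     // specific code path
      return "Standalone";                                   // free-form description
    }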

Driven Mode — Implementing from Test Cases Document

When a test-cases.md file is provided (generated by spec-forge:test-cases):

D.1 Read and Parse

  1. Read the test-cases document
  2. Extract all test cases (TC-MODULE-NNN entries)
  3. Identify which are already implemented (check existing test files for matching test names/IDs)
  4. Filter to unimplemented cases
  5. Sort by priority: P0 first, then P1, then P2 (steps 2-5 sketched below)
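
A minimal sketch of steps 2-5, assuming case entries follow a "TC-MODULE-NNN (P0): title" line format; the format, names, and priority encoding are assumptions, not the actual document schema:

    // Illustrative parser for D.1 steps 2-5; the entry format is an assumption.
    import { readFileSync } from "node:fs";

    interface TestCase { id: string; priority: "P0" | "P1" | "P2"; title: string; }

    function unimplementedCases(docPath: string, existingTestSource: string): TestCase[] {
      const text = readFileSync(docPath, "utf8");
      // Step 2: extract entries like "TC-AUTH-001 (P0): Create user with valid email"
      const entries = [...text.matchAll(/(TC-[A-Z]+-\d{3})\s*\((P[0-2])\):\s*(.+)/g)]
        .map(([, id, priority, title]) =>
          ({ id, priority: priority as TestCase["priority"], title: title.trim() }));
      const order = { P0: 0, P1: 1, P2: 2 };
      return entries
        .filter((c) => !existingTestSource.includes(c.id))      // steps 3-4: skip implemented
        .sort((a, b) => order[a.priority] - order[b.priority]); // step 5: P0 → P1 → P2
    }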

D.2 Confirm Scope

Present to user:

  • "{N} test cases found, {X} already implemented, {Y} remaining"
  • "Implement: (A) all remaining, (B) P0 only, (C) P0 + P1, (D) specific modules?"

D.3 Implement Loop

For each test case in scope:

  1. Read the case — extract preconditions, steps, expected result, not-expected, test infra
  2. Set up test infrastructure — per the case's Test Infra field:
    • "Real DB" → configure TestContainers or a test database
    • "Mock external" → set up a mock for the specified third-party service
    • "Temp dir" → create a temp directory
    • "N/A" → no special setup needed
  3. RED — Write a failing test that matches the case specification (sketched after this list)
    • Test name should include TC ID: test("TC-AUTH-001: create user with valid email returns 201", ...)
    • Preconditions become test setup (seed data, auth context, config)
    • Steps become test actions
    • Expected result becomes assertions
    • "Not Expected" becomes negative assertions where applicable
  4. VERIFY RED — Run the test, confirm it fails correctly
  5. GREEN — Write minimal production code to make it pass (if the code already exists and passes, the case was already covered — note and move on)
  6. VERIFY GREEN — Run all tests, confirm clean pass
  7. REFACTOR — Clean up if needed
  8. Report — "TC-AUTH-001: DONE (test passes, implementation complete)"

D.4 Progress Tracking

After each case, display progress:

TDD Progress: {completed}/{total} ({percentage}%)
  [x] TC-AUTH-001: Create user with valid email (P0) — DONE
  [x] TC-AUTH-010: Create user with duplicate email rejected (P0) — DONE
  [ ] TC-AUTH-011: Create user with invalid email format (P1) — next
  [ ] TC-AUTH-030: Create user should NOT bypass email validation (P1)

Ask: "Continue with next case, skip, or pause?"

D.5 Completion

After all cases are implemented:

  • Run full test suite
  • Report: total cases implemented, all tests passing, coverage change
  • Suggest: "Run /code-forge:verify to confirm completion"

Auto-Analysis Mode — Scan and Test

Use this mode when the user points at code or asks "help me write tests" without providing a test-cases document.

Iron Rule: Auto-Analysis uses the SAME full analysis as spec-forge:test-cases. The only difference is the output — auto-analysis produces code directly instead of a document. The analysis quality must be identical.

A.0 Full Test Case Analysis (same as spec-forge:test-cases Steps 1-5)

Execute the complete spec-forge:test-cases analysis pipeline. The full workflow is defined in the spec-forge test-cases-generation skill (spec-forge/skills/test-cases-generation/SKILL.md). The essential steps are inlined below — follow them exactly:

Step 1 — Determine Input Mode and Project Profile

  • Determine input mode: Scan / Code / Spec (from user arguments)
  • Detect project profile: Web API / CLI Tool / Frontend App / AI Agent / Data Pipeline / Function Library / SDK
  • Detect: has database? has auth? has external APIs?
  • Output explicit profile with rationale

Step 2 — Deep Scan and Extract (Four Layers)

  • Use the language-specific deep extraction strategy (Python / TypeScript / Go / Rust / Java)
  • Extract ALL testable units across four layers:
    • Interface: public API surface, type contracts, trait/interface boundaries
    • Logic: branch paths, error chains, state transitions, validation rules
    • Architecture: module structure, layer boundaries, dependency direction
    • Relationships: call graphs, data flow, event propagation, trait implementations
  • Scan existing tests to determine current coverage
  • Run scan verification (file coverage ≥ 90%, module tree completeness, re-export tracking)
  • Produce structured Functional Inventory with all four layers per unit

Step 3 — Detect Dimensions

  • Apply built-in dimensions: Coverage Depth (L1/L2/L3)
  • Auto-detect project-specific dimensions (Auth Context, Trigger Mode, Input Source, etc.)

Step 4 — Confirm Scope with User

  • Present Profile confirmation: "I detected this as {profile} ({rationale}). Correct?"
  • Present scope: "{N} testable units, {X} have tests, {Y} don't. Cover: all / uncovered / specific modules?"
  • Present detected dimensions for confirmation
  • Ask for business rules the code can't reveal

Step 5 — Design Test Cases

  • Per testable unit, generate at minimum (example set after this list):
    • 1 × L1 (Happy Path)
    • 2 × L2 (Boundary + Error)
    • 1 × L3 (Negative — what should NOT happen)
  • For interacting units: pairwise combination cases (L1 both succeed + L2 one fails + L3 should not combine)
  • For auto-detected dimensions: cross with coverage depth using risk-based prioritization
  • Apply conditional sections:
    • Data Integrity cases (only if project has database)
    • Security cases (only if project has auth or handles user input)
    • Performance cases (only if project has latency/throughput requirements)
  • Assign priorities: P0 (critical path) / P1 (important) / P2 (nice-to-have)
  • Build coverage matrix internally: unit × depth, dimension coverage, combination coverage, gap analysis
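
As a worked example of the per-unit minimum, a hypothetical createUser unit might yield the following set (IDs, priorities, and titles are all illustrative):

    // Illustrative minimum case set for one unit; all values are hypothetical.
    const minimumSet = [
      { id: "TC-AUTH-001", depth: "L1", priority: "P0", title: "valid email creates user (201)" },
      { id: "TC-AUTH-010", depth: "L2", priority: "P0", title: "duplicate email rejected" },
      { id: "TC-AUTH-011", depth: "L2", priority: "P1", title: "malformed email fails validation" },
      { id: "TC-AUTH-030", depth: "L3", priority: "P1", title: "user NOT created when validation fails" },
    ];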

Result: A complete set of structured test cases in memory — identical quality to what spec-forge:test-cases would produce as a document.

A.1 Optional: Save Test Cases Document

Ask the user: "Save the test cases as docs/{feature}/test-cases.md for future reference? (Y/n)"

  • If yes → write the document following the spec-forge:test-cases template, then continue to A.2
  • If no → keep in memory, continue to A.2

A.2 Implement via TDD

For each test case (sorted by priority: P0 → P1 → P2), follow the same TDD cycle as Driven Mode:

  1. Read the case — extract preconditions, steps, expected result, not-expected, test infra
  2. Set up test infrastructure — if Test Infra is "Real DB", configure TestContainers; if "Mock external", set up mock; if "Temp dir", create temp directory; if "N/A", no setup
  3. RED — Write a failing test matching the case specification
    • Test name should include TC ID: test("TC-AUTH-001: create user with valid email returns 201", ...)
    • Preconditions → test setup; Steps → test actions; Expected result → assertions; Not Expected → negative assertions
  4. VERIFY RED — Run the test, confirm it fails correctly
  5. GREEN — Write minimal production code to make it pass
  6. VERIFY GREEN — Run all tests, confirm clean pass
  7. REFACTOR — Clean up if needed
  8. Report — "TC-AUTH-001: DONE"

A.3 Progress Tracking

After each case, display progress (same format as Driven Mode D.4):

TDD Progress: {completed}/{total} ({percentage}%)
  [x] TC-AUTH-001: Create user with valid email (P0) — DONE
  [x] TC-AUTH-010: Duplicate email rejected (P0) — DONE
  [ ] TC-AUTH-011: Invalid email format (P1) — next

Ask: "Continue with next case, skip, or pause?"

A.4 Completion

After all cases are implemented:

  • Run full test suite
  • Report: total cases implemented, all tests passing, coverage statistics
  • If test cases were saved to file (A.1), report the file path
  • Suggest: "Run /code-forge:verify to confirm completion"

Standalone Mode — Classic TDD

For ad-hoc changes where the user describes what to build or fix:

Workflow

RED (write failing test) → VERIFY RED → GREEN (minimal code) → VERIFY GREEN → REFACTOR → REPEAT

The Cycle

Complete each phase fully before moving to the next.

1. RED — Write a Failing Test

  • One minimal test showing the desired behavior
  • Clear, descriptive test name
  • Use real code, not mocks (unless unavoidable: external APIs, time-dependent behavior)
  • One behavior per test

2. VERIFY RED — Watch It Fail (MANDATORY)

Run the test. Confirm:

  • It fails (not errors)
  • The failure message describes the missing behavior
  • It fails because the feature is missing, not because of typos or setup issues

If the test passes: you're testing existing behavior. Rewrite the test. If the test errors: fix the error, re-run until it fails correctly.

3. GREEN — Write Minimal Code

  • Simplest code that makes the test pass
  • No extra features, no "while I'm here" improvements
  • No premature abstractions — three similar lines beat a premature helper

4. VERIFY GREEN — Watch It Pass (MANDATORY)

Run the test. Confirm:

  • The new test passes
  • All other tests still pass
  • Output is clean (no warnings, no errors)

If the new test fails: fix the code, not the test. If other tests fail: fix them now, before proceeding.

5. REFACTOR — Clean Up (After Green Only)

  • Remove duplication, improve names, extract helpers
  • Keep all tests green throughout
  • Do NOT add new behavior during refactor

6. REPEAT

Go back to Step 1 for the next behavior.

Decision Rules

| If you're about to... | Instead... | Why |
| --- | --- | --- |
| Write production code without a test | STOP — write the failing test first | Tests written after implementation pass immediately and prove nothing |
| Skip testing because the change is "simple" | Write the test — it will be quick if it's truly simple | Simple code has the sneakiest bugs (off-by-one, null edge cases) |
| Apply a quick fix without a regression test | Write the test, then fix | Untested fixes become permanent regressions |
| Continue with code that wasn't test-driven | Consider rewriting test-first | Sunk cost — untested code is a liability regardless of time spent |

External Dependency Rules

Principle: test your own dependencies for real; only mock what you don't control.

| Your Dependency | Approach |
| --- | --- |
| Own database | Real DB (TestContainers, test instance, SQLite in-memory) |
| Own file system | Real temp directory |
| Own cache / message queue | Real (TestContainers, embedded) |
| External third-party API | Mock / stub acceptable |
| Non-deterministic input (time, random) | Inject controlled values |
  • For projects without a database or external I/O: most tests are pure unit tests — no special infra needed
  • For write operations: verify state after the operation (DB query / file check / store assertion)
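
The last row, injecting controlled values, deserves a concrete sketch. A minimal example, assuming a hand-rolled Clock interface; the names are illustrative, not a prescribed API:

    // Illustrative dependency injection for time; Clock and isExpired are hypothetical.
    interface Clock { now(): Date; }

    const systemClock: Clock = { now: () => new Date() };

    function isExpired(expiresAt: Date, clock: Clock = systemClock): boolean {
      return clock.now().getTime() >= expiresAt.getTime();
    }

    // Tests pass a fixed clock instead of mocking globals:
    const fixedClock: Clock = { now: () => new Date("2026-01-01T00:00:00Z") };
    // isExpired(new Date("2025-12-31T23:59:59Z"), fixedClock) → true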

Example

Task: Add isPalindrome(str) function

1. RED — Write test:
   test("isPalindrome returns true for 'racecar'", () => {
     expect(isPalindrome("racecar")).toBe(true);
   });

2. VERIFY RED — Run: npm test
   ✗ ReferenceError: isPalindrome is not defined
   A ReferenceError is an error, not a failure (see VERIFY RED above).
   Add an empty stub (function isPalindrome(str) {}) and re-run:
   ✗ expected true, received undefined              ← fails correctly

3. GREEN — Minimal code:
   function isPalindrome(str) {
     return str === str.split("").reverse().join("");
   }

4. VERIFY GREEN — Run: npm test
   ✓ isPalindrome returns true for 'racecar'        ← passes
   42 passed, 0 failed

5. REFACTOR — (no changes needed)

6. REPEAT — next test: edge case with empty string

Test runner detection: Check package.json scripts, pytest.ini, Cargo.toml, go.mod, or Makefile for the project's test command before starting the cycle. Use the same runner consistently.
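
A minimal sketch of that detection for Node projects; it covers only the package.json branch, and the function name is illustrative:

    // Illustrative runner detection; only the package.json branch is shown.
    import { existsSync, readFileSync } from "node:fs";
    import { join } from "node:path";

    function detectTestCommand(root = "."): string | null {
      const pkgPath = join(root, "package.json");
      if (!existsSync(pkgPath)) return null;  // fall through to pytest.ini, Cargo.toml, go.mod, Makefile
      const pkg = JSON.parse(readFileSync(pkgPath, "utf8"));
      return pkg.scripts?.test ? "npm test" : null;
    }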

Verification Checklist

Before claiming work is complete:

  • Every new function/method has at least one test
  • Watched each test fail before implementing
  • Each test failed for the expected reason (not errors)
  • Wrote minimal code per test (no gold-plating)
  • All tests pass with clean output
  • Edge cases and error paths covered
  • Mocks used only when unavoidable
  • Database-touching tests use real database

When Stuck

  • Test too complicated to write → design is too complicated, simplify first
  • Must mock everything → code is too coupled, extract interfaces
  • Test setup is huge → extract test helpers or fixtures
  • No test-cases document and unsure what to test → run /spec-forge:test-cases first to generate a structured case set