test-driven-development
Test-Driven Development
Overview
TDD enforces the RED-GREEN-REFACTOR cycle as an unbreakable discipline: write a failing test, make it pass with minimal code, then clean up. This skill prevents untested production code from ever existing and ensures every line of implementation is driven by a verified requirement.
Announce at start: "I'm using the test-driven-development skill with the RED-GREEN-REFACTOR cycle."
Iron Law
┌─────────────────────────────────────────────────────────────────┐
│ HARD-GATE: NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST │
│ │
│ This is non-negotiable. There are no exceptions. If you are │
│ writing production code and there is no failing test demanding │
│ that code, you are violating this skill. STOP immediately │
│ and write the test first. │
└─────────────────────────────────────────────────────────────────┘
Phase 1: RED (Write a Failing Test)
Goal: Write exactly ONE test that fails for the right reason.
Actions
- Identify the smallest unit of behavior to implement next
- Write a test that asserts that behavior exists
- Run the test suite — confirm the new test FAILS
- Read the failure message — confirm it fails for the RIGHT reason (missing functionality, not syntax error or import error)
- If it fails for the wrong reason, fix the test until it fails correctly
STOP — HARD-GATE: Do NOT proceed to GREEN until:
- Test is written and saved
- Test suite has been run
- New test fails
- Failure reason is correct (tests the intended behavior)
Phase 2: GREEN (Make It Pass)
Goal: Write the MINIMUM production code to make the failing test pass.
Actions
- Write only enough code to make the failing test pass
- Do NOT refactor. Do NOT clean up. Do NOT optimize
- Hardcode values if that makes the test pass — that is fine
- Run the full test suite
- ALL tests must pass (not just the new one)
STOP — HARD-GATE: Do NOT proceed to REFACTOR until:
- Production code is written
- Full test suite has been run
- ALL tests pass (new and existing)
- No more code was written than necessary
Phase 3: REFACTOR (Clean Up)
Goal: Improve code quality without changing behavior.
Actions
- Look for duplication, poor naming, long methods, code smells
- Make ONE refactoring change at a time
- Run the full test suite after EACH change
- If any test fails, undo the refactoring immediately
- Continue until the code is clean
STOP — HARD-GATE: Do NOT proceed to next RED until:
- Code is clean and readable
- All tests still pass after refactoring
- No behavior was changed during refactoring
HARD-GATE Enforcement
┌─────────────────────────────────────────────────────────────┐
│ HARD-GATE: PHASE COMPLETION CHECK │
│ │
│ Before moving to next phase, ALL items in the │
│ STOP MARKER checklist must be satisfied. │
│ │
│ If ANY item is not satisfied: │
│ → STOP │
│ → Complete the missing item │
│ → Re-verify ALL items │
│ → ONLY THEN proceed │
└─────────────────────────────────────────────────────────────┘
Watch Mode Discipline
After every change to any file (test or production), run the relevant test suite. No exceptions.
| Action | Run Tests? | Expected Result |
|---|---|---|
| Write a test | Yes | Failure (RED) |
| Write production code | Yes | Pass (GREEN) |
| Refactor code | Yes | Pass (still GREEN) |
| Any other edit | Yes | No regressions |
If your test runner supports watch mode, use it. If not, run tests manually after every save.
Decision Table: Test Type Selection
| Behavior Being Tested | Test Type | Framework Example |
|---|---|---|
| Pure function logic | Unit test | Vitest, pytest, cargo test |
| API endpoint request/response | Integration test | Supertest, httpx |
| Database query correctness | Integration test | Testcontainers |
| UI component rendering | Unit test | React Testing Library |
| Full user workflow | E2E test | Playwright |
| Error handling path | Unit test | Vitest, pytest |
Example Cycle
Requirement: "Users can register with email and password"
Behavior List:
1. Registration with valid email and password succeeds
2. Registration fails if email is empty
3. Registration fails if password is too short
4. Registration fails if email is already taken
Cycle 1 - Behavior 1:
RED: test_registration_with_valid_email_and_password_succeeds → FAIL (no register function)
GREEN: def register(email, password): return User(email=email) → PASS
REFACTOR: rename variable for clarity → PASS
Cycle 2 - Behavior 2:
RED: test_registration_fails_if_email_is_empty → FAIL (no validation)
GREEN: add if not email: raise ValueError → PASS
REFACTOR: extract validation to separate method → PASS
...continue for each behavior...
Checklist: Starting a New Feature with TDD
- Understand the requirement fully before writing any code
- Break the requirement into a list of specific behaviors
- Order behaviors from simplest to most complex
- Create a task for the first behavior
- Enter RED phase: write failing test for first behavior
- Enter GREEN phase: write minimal code to pass
- Enter REFACTOR phase: clean up
- Create task for next behavior, repeat from step 5
- After all behaviors are implemented, run full test suite
- Invoke
verification-before-completionbefore claiming done
Test Quality Standards
Each test must be:
| Standard | Definition |
|---|---|
| Fast | Milliseconds, not seconds |
| Isolated | No shared state between tests, no test ordering dependencies |
| Repeatable | Same result every time, no flakiness |
| Self-validating | Pass or fail, no manual interpretation needed |
| Timely | Written before the production code (that is the whole point) |
Each test should:
- Test ONE behavior or scenario
- Have a descriptive name that explains the scenario and expected outcome
- Follow Arrange-Act-Assert (or Given-When-Then) structure
- Use the minimum setup necessary
- Assert outcomes, not implementation details
Anti-Patterns / Common Mistakes
| Anti-Pattern | Why It Is Wrong | Correct Approach |
|---|---|---|
| Writing production code first | Defeats the purpose of TDD; tests shaped to pass | Write the test first, always |
| Writing multiple tests before any code | Batch testing defeats incremental design | One test, one cycle |
| Test passes on first run | Either test is wrong or behavior already exists | Investigate before proceeding |
| Spending >5 minutes in GREEN | Writing too much code at once | Simplify; make test more specific |
| Modifying tests to match code | Tests specify behavior; code must match tests | Fix the code, not the test |
| Skipping REFACTOR phase | Technical debt accumulates rapidly | Refactor every cycle |
| Not running tests after every change | Regressions go unnoticed | Run tests after every save |
Rationalization Prevention
| Excuse | Reality |
|---|---|
| "It's just a small change" | Small changes cause production outages. Test it. |
| "I'll write the tests after" | You will not. And if you do, they will be weaker because they were shaped to pass, not to specify. |
| "This is just a refactor" | Refactors change behavior more often than you think. The test suite proves they do not. |
| "I know this works" | You do not. You think you do. The test proves it. |
| "Tests would slow me down" | Debugging without tests slows you down 10x more. |
| "This code is too simple to test" | If it is too simple to test, it is too simple to get wrong — so the test will be trivial to write. Write it. |
| "I can't test this because of dependencies" | Then your design has a coupling problem. Fix the design. |
| "The test would be harder to write than the code" | That means you do not understand the requirements well enough. The test forces you to clarify. |
| "I'll just manually verify it" | Manual verification is not repeatable, not documented, and not trustworthy. |
| "This is throwaway/prototype code" | Prototype code has a habit of becoming production code. Test it now or regret it later. |
| "The framework makes it hard to test" | Use the framework's testing utilities, or isolate your logic from the framework. |
| "I'm under time pressure" | TDD is faster over any timeline longer than 20 minutes. The pressure is exactly why you need it. |
Red Flags
If you observe any of these, STOP and reassess:
| Red Flag | What It Means | Action |
|---|---|---|
| Writing production code with no failing test | Immediate violation | Stop. Write the test. |
| Test passes immediately on first run | Test is wrong or behavior exists | Investigate before proceeding |
| More than 5 minutes in GREEN phase | Writing too much code | Simplify. Make test more specific. |
| Refactoring changes behavior | Test coverage has a gap | Add missing tests |
| Tests modified to pass | Requirements inverted | Fix code to match tests |
| Multiple tests before any production code | Batch testing defeats purpose | One test at a time |
| Test suite not run after a change | Regressions invisible | Run tests. Always. Every time. |
Integration Points
| Skill | Relationship |
|---|---|
verification-before-completion |
MUST be invoked before claiming any TDD work is complete |
systematic-debugging |
When a test fails unexpectedly during REFACTOR, switch to debugging |
code-review |
After completing a feature via TDD, review the test suite for completeness |
acceptance-testing |
Acceptance criteria drive the behavior list for TDD cycles |
planning |
Plan breaks features into behaviors suitable for TDD cycles |
testing-strategy |
Strategy defines frameworks; TDD defines the cycle |
Test Types in TDD
| Type | Scope | Speed | When to Write |
|---|---|---|---|
| Unit (Primary) | Individual functions, methods, classes | Milliseconds | RED phase for every behavior |
| Integration (Secondary) | Component interactions | Seconds | After unit tests cover individual behaviors |
| E2E (Tertiary) | Complete user workflows | Seconds-minutes | Critical paths after unit and integration are solid |
Skill Type
RIGID — The RED-GREEN-REFACTOR cycle is mandatory and cannot be reordered, skipped, or combined. Every phase has a HARD-GATE that must be satisfied before proceeding. No production code without a failing test first.