tdd
Pragmatic Test-Driven Development
Implement features using vertical tracer bullets: one test → one implementation → repeat. Tests describe WHAT the system does, not HOW it does it.
Philosophy
Good tests exercise real code paths through public interfaces. They read like specifications: "visitor is bucketed consistently across sessions" tells you exactly what capability exists. These tests survive refactors because they don't care about internals.
Bad tests mock internal collaborators, test private methods, or verify implementation details. Warning sign: test breaks when you refactor, but behavior hasn't changed.
Pragmatic mode (default): test-first for core logic and boundary contracts. Implementation-first for UI and glue code, but still write tests for complex state management. Always write tests — the question is only whether they come before or after the code.
Process
1. Discover project context
Read domain documentation relevant to the work:
docs/contexts/<name>/CONTEXT.md— invariants and contracts to verifydocs/UBIQUITOUS_LANGUAGE.md— use domain terms in test names
Also discover the project's test infrastructure:
- Find existing test files to understand patterns and conventions
- Identify the test runner and configuration (package.json scripts, vitest/jest config)
- Find test utilities, factories, or helpers already in use
2. Plan the tests
Before writing any code:
- Identify behaviors to test — not implementation steps. Frame as: "A [actor] can [action] resulting in [observable outcome]"
- Classify each behavior:
- Core logic / domain rules → test-first (RED → GREEN)
- Context boundary contracts → test-first (RED → GREEN)
- API endpoints with non-trivial logic → test-first
- UI components / glue code → implement first, test after if complex
- Trivial wiring → skip tests
- Identify what to mock — ONLY at system boundaries:
- External APIs (Stripe, OpenAI, etc.) → mock
- Databases → prefer test database or in-memory substitute; mock only if no alternative
- Time, randomness → mock
- Your own modules → NEVER mock. Test through the real code.
- List behaviors in priority order
Present the plan to the user. Confirm which behaviors to test and the testing approach.
3. Tracer bullet
Write ONE test that confirms ONE thing about the system — the simplest possible end-to-end path:
RED: Write test → test fails (confirms test infrastructure works)
GREEN: Write minimal code to pass → test passes
This proves the path works. Don't optimize, don't handle edge cases yet.
4. Incremental loop
For each remaining behavior:
RED: Write next test → fails
GREEN: Minimal code to pass → passes
Rules:
- One test at a time — don't write the next test until the current one passes
- Only enough code to pass the current test — don't anticipate future tests
- Keep tests focused on observable behavior through public interfaces
- Run all tests after each GREEN to catch regressions
5. Refactor
After a group of related tests pass, look for refactoring opportunities:
- Extract duplication (in code, not in tests — test duplication is OK)
- Deepen modules (move complexity behind simpler interfaces)
- Apply SOLID principles where they emerge naturally
- Consider what the new code reveals about existing code
Never refactor while RED. Get to GREEN first, then refactor.
Run all tests after each refactor step.
6. Update domain documentation
If the implementation revealed new invariants, clarified contracts, or surfaced terminology changes:
- Update
docs/contexts/<name>/CONTEXT.md - Update
docs/UBIQUITOUS_LANGUAGE.md - Consider an ADR if a hard-to-reverse, surprising trade-off was made
Checklist per cycle
[ ] Test name describes behavior in domain language (not implementation)
[ ] Test uses public interface only
[ ] Test would survive internal refactor
[ ] Test mocks only at system boundaries (external APIs, DB, time)
[ ] Code is minimal for this test — no speculative features
[ ] All existing tests still pass
When to mock
| Dependency | Approach |
|---|---|
| External API (Stripe, OpenAI) | Mock — you don't control it |
| Database | Prefer test DB or in-memory substitute. Mock as last resort. |
| Time / randomness | Mock — tests must be deterministic |
| File system | Usually mock. Use temp dirs if testing file I/O specifically. |
| Your own modules | NEVER mock. Test through the real code path. |
| Internal collaborators | NEVER mock. If you need to mock your own code, the interface needs redesigning. |
Anti-patterns
- Horizontal slicing: writing all tests first, then all implementation. Produces tests that test imagined behavior, not actual behavior.
- Testing implementation: verifying internal state, mocking internal modules, testing private methods.
- Over-testing: testing every getter, every trivial function. Focus on behaviors that matter.
- Test-after-forget: writing code without tests and "planning to add them later." In pragmatic mode, UI can be implementation-first, but tests should follow immediately.
Related skills
/slice— break work into slices before implementing with TDD/spec— create the spec that defines what to build/domain— stress-test the design against the domain model
More from agentivestack/skills
architect
Surface architectural friction in the codebase and design better module interfaces. Uses parallel sub-agents to explore radically different interface designs. Based on "deep modules" philosophy — small interfaces hiding significant complexity. Use when code feels tangled, hard to test, or hard to navigate.
12slice
Break a spec, plan, or feature into independently-implementable vertical slices (tracer bullets). Each slice cuts through all layers end-to-end. Outputs a task list or GitHub issues. Use after /spec to plan implementation order.
11qa
Interactive QA session where you describe bugs conversationally and the agent files durable GitHub issues using the project's domain language. Explores the codebase in the background for context. Use when reporting bugs, doing QA testing, or filing issues from observations.
11spec
Interview the user to produce a feature spec (PRD), grounded in the project's domain model, context map, and ubiquitous language. Outputs a structured spec document. Use when starting a new feature, enhancement, or significant change.
11