# TDD
## Purpose
Use strict red-green-refactor in small, behavior-focused increments. Write a failing test before writing the code that makes it pass.
## When To Use
- Implementing new behavior.
- Modifying existing behavior.
- Fixing a bug (reproduce the bug with a test before fixing).
- Anything that could potentially alter publicly observable behavior.

Do not use for purely declarative changes (configuration, styling, documentation, or static content).
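For the bug-fixing case, "reproduce the bug with a test" might look like this sketch. The `page_count` function and its off-by-one bug are hypothetical illustrations, not from any real codebase:

```python
# Hypothetical bug report: page_count(0, 10) returned 0, but callers
# expect every collection to have at least one page. First write a test
# that reproduces the bug and watch it fail against the buggy version:
#
#     return item_count // page_size   # 0 items -> 0 pages (bug)
#
# Only then fix the implementation:
def page_count(item_count: int, page_size: int) -> int:
    # Ceiling division, with a minimum of one page.
    return max(1, -(-item_count // page_size))

def test_empty_collection_still_has_one_page():
    assert page_count(0, 10) == 1

def test_partial_page_rounds_up():
    assert page_count(11, 10) == 2
```

The reproducing test then stays in the suite as a regression guard.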
## Interaction Contract
- Align with the user before the first test in a sequence.
- Check in at decision boundaries: interface changes, new behavior branch, test strategy shifts, or non-trivial refactors.
- If the next micro-step is obvious and low risk, continue without pausing, then report progress after up to 3 red-green-refactor loops.
## Non-Negotiable Rules
- Write one failing test for one observable effect.
- Write only the implementation needed to pass that test.
- Refactor only when all tests are green.
- Repeat.
## Workflow

### Before Each Test
- Confirm the external behavior and interface change.
- Confirm the single effect this test will verify.
- Identify opportunities for deep modules with a small surface area:
  - Reduce public methods.
  - Simplify parameters.
  - Hide complexity internally.
- Confirm with the user when crossing a decision boundary; otherwise proceed and report.
### Red-Green-Refactor Loop
- Red: Add one test for one effect; verify it fails for the right reason.
- Green: Implement only what is needed to pass that test.
- Verify green: Run relevant tests and confirm they pass.
- Refactor (green only):
  - Remove duplication.
  - Inline unnecessary indirection.
  - Keep interfaces small and implementations deep.
  - Improve naming and structure without changing behavior.
- Run tests after every refactor change.
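One pass through the loop might look like the following sketch. The `slugify` function and its test are hypothetical illustrations of "one test for one observable effect":

```python
# Red: add one test for one effect. Run it first with slugify
# unimplemented and confirm it fails for the right reason (NameError
# or a wrong result, not a typo in the test itself).
def test_slugify_replaces_spaces_with_hyphens():
    assert slugify("hello world") == "hello-world"

# Green: write only the implementation needed to pass that test.
# No lowercasing, no punctuation stripping - no test demands it yet.
def slugify(text: str) -> str:
    return text.replace(" ", "-")
```

Each additional requirement (lowercasing, trimming) gets its own failing test before any new implementation code.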
## Good vs Bad Tests

Guiding principle:

> The more your tests resemble the way your software is used, the more confidence they can give you.
### What to Do
- Test behavior users and callers care about.
- Use public interfaces.
- Write tests that survive internal refactors.
- Describe the outcome, not how the outcome is achieved.
- Separate test code into Arrange-Act-Assert blocks.
- Write one test per behavior.
- Use real internal collaborators whenever possible.
- Write tests that are descriptive and self-contained — duplication is tolerable if it makes the test easier to understand.
- See the new test fail — if you never see it fail, you cannot know it's testing the right thing.
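A test following these points might look like this sketch. `ShoppingCart` is a hypothetical class used only to illustrate the Arrange-Act-Assert shape and testing through the public interface:

```python
class ShoppingCart:
    def __init__(self):
        self._items = []  # internal state stays hidden

    def add(self, name: str, price: float) -> None:
        self._items.append(price)

    def total(self) -> float:
        return sum(self._items)

def test_total_reflects_added_items():
    # Arrange
    cart = ShoppingCart()
    # Act
    cart.add("apple", 1.50)
    cart.add("bread", 2.25)
    # Assert: describe the outcome, not how it is achieved.
    # The test survives any refactor of how items are stored.
    assert cart.total() == 3.75
```

Note the test never touches `_items`: it asserts only what a caller can observe.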
### What to Avoid
- Writing a large batch of tests up front instead of iterating one failing test at a time.
- Writing a test that you never see fail: it's probably worthless.
- Writing tests after the implementation.
- Packing multiple unrelated assertions under the same test.
- Implementing behavior not required by the current failing test.
- Testing implementation details instead of publicly observable behavior (e.g. private methods or internal state).
- Mocking internal collaborators.
- Asserting how many times a function is called or the exact call order.
- Writing tests that break after internal refactors that do not change behavior.
- Naming tests after how the code works instead of what behavior it guarantees.
- Asserting through side channels instead of the interface under test.
- Relying on snapshot tests: snapshots lack specificity and are easy to update by mistake.
- Skipping or commenting tests to make them pass.
- Changing a failing test expectation to match (incorrect) production output.
- Saying "All tests pass" without having actually run the tests.
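The call-count and call-order anti-patterns can be contrasted in one sketch. `send_welcome` and the mailer classes are hypothetical illustrations:

```python
from unittest.mock import MagicMock

def send_welcome(mailer, user_email: str) -> None:
    mailer.send(user_email, "Welcome!")

# Brittle: asserts HOW the outcome is achieved (exact call and its
# arguments on a mock). Renaming or batching sends breaks this test
# even when the observable behavior is unchanged.
def test_brittle_asserts_calls():
    mailer = MagicMock()
    send_welcome(mailer, "a@example.com")
    mailer.send.assert_called_once_with("a@example.com", "Welcome!")

# More robust: a simple fake records observable output, and the test
# asserts the outcome a user would care about - the message arrived.
class FakeMailer:
    def __init__(self):
        self.outbox = []

    def send(self, to: str, body: str) -> None:
        self.outbox.append((to, body))

def test_robust_asserts_outcome():
    mailer = FakeMailer()
    send_welcome(mailer, "a@example.com")
    assert ("a@example.com", "Welcome!") in mailer.outbox
```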
## Mocking Guidelines
- Mock only at system boundaries. For external HTTP APIs, mock at the outermost layer using boundary tools such as `msw` (JavaScript/TypeScript) or `responses` (Python).
- Prefer real databases when tests can run against them.
- Inject dependencies for non-deterministic behavior (time, randomness); prefer simple dependency injection.
- Mock the file system only when necessary.
- NEVER mock internal collaborators.
- NEVER mock the function under test.
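Injecting a dependency for non-deterministic behavior can be as simple as a default parameter. The `greeting` function and its `clock` parameter are hypothetical illustrations of injecting time:

```python
import time

def greeting(clock=time.time) -> str:
    # The clock is injected: production uses time.time, tests pass a
    # deterministic stand-in. No mocking framework required.
    hour = int(clock() // 3600) % 24  # hour of day from epoch seconds
    return "Good night" if hour < 6 else "Hello"

def test_greeting_at_midnight():
    assert greeting(clock=lambda: 0.0) == "Good night"

def test_greeting_at_noon():
    assert greeting(clock=lambda: 12 * 3600) == "Hello"
```

The same pattern covers randomness: inject the generator (or its seed) rather than patching module globals.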
## Quick Checklist
- Confirm external behavior and interface change.
- Choose one next effect to test.
- Add one failing test and verify the failure reason.
- Implement the minimum code to pass.
- Run relevant tests and confirm green.
- Refactor on green only.
- Re-run tests after each refactor step.
- Repeat for the next behavior.