# TDD
## Purpose
Use strict red-green-refactor in small, behavior-focused increments. Write a failing test before writing the code that makes it pass.
## When To Use
- Implementing new behavior.
- Modifying existing behavior.
- Fixing a bug (reproduce the bug with a test before fixing).
- Anything that could potentially alter publicly observable behavior.

Do not use for purely declarative changes (configuration, styling, documentation, or static content).
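For the bug-fixing case, "reproduce the bug with a test" might look like this sketch. The `page_count` function and its off-by-one bug are hypothetical illustrations, not from any real codebase:

```python
# Hypothetical bug report: page_count(0, 10) returned 0, but callers
# expect every collection to have at least one page. First write a test
# that reproduces the bug and watch it fail against the buggy version:
#
#     return item_count // page_size   # 0 items -> 0 pages (bug)
#
# Only then fix the implementation:
def page_count(item_count: int, page_size: int) -> int:
    # Ceiling division, with a minimum of one page.
    return max(1, -(-item_count // page_size))

def test_empty_collection_still_has_one_page():
    assert page_count(0, 10) == 1

def test_partial_page_rounds_up():
    assert page_count(11, 10) == 2
```

The reproducing test then stays in the suite as a regression guard.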
## Interaction Contract
- Align with the user before the first test in a sequence.
- Check in at decision boundaries: interface changes, new behavior branch, test strategy shifts, or non-trivial refactors.
- If the next micro-step is obvious and low risk, continue without pausing, then report progress after up to 3 red-green-refactor loops.
## Non-Negotiable Rules
- Write one failing test for one observable effect.
- Write only the implementation needed to pass that test.
- Refactor only when all tests are green.
- Repeat.
## Workflow

### Before Each Test
- Confirm the external behavior and interface change.
- Confirm the single effect this test will verify.
- Identify opportunities for deep modules with a small surface area:
  - Reduce public methods.
  - Simplify parameters.
  - Hide complexity internally.
- Confirm with the user when crossing a decision boundary; otherwise proceed and report.
### Red-Green-Refactor Loop
- Red: Add one test for one effect; verify it fails for the right reason.
- Green: Implement only what is needed to pass that test.
- Verify green: Run relevant tests and confirm they pass.
- Refactor (green only):
  - Remove duplication.
  - Inline unnecessary indirection.
  - Keep interfaces small and implementations deep.
  - Improve naming and structure without changing behavior.
- Run tests after every refactor change.
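One pass through the loop might look like the following sketch. The `slugify` function and its test are hypothetical illustrations of "one test for one observable effect":

```python
# Red: add one test for one effect. Run it first with slugify
# unimplemented and confirm it fails for the right reason (NameError
# or a wrong result, not a typo in the test itself).
def test_slugify_replaces_spaces_with_hyphens():
    assert slugify("hello world") == "hello-world"

# Green: write only the implementation needed to pass that test.
# No lowercasing, no punctuation stripping - no test demands it yet.
def slugify(text: str) -> str:
    return text.replace(" ", "-")
```

Each additional requirement (lowercasing, trimming) gets its own failing test before any new implementation code.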
## Good vs Bad Tests

Guiding principle:

> The more your tests resemble the way your software is used, the more confidence they can give you.
### What to Do
- Test behavior users and callers care about.
- Use public interfaces.
- Write tests that survive internal refactors.
- Describe the outcome, not how the outcome is achieved.
- Separate test code into Arrange-Act-Assert blocks.
- Write one test per behavior.
- Use real internal collaborators whenever possible.
- Write tests that are descriptive and self-contained — duplication is tolerable if it makes the test easier to understand.
- See the new test fail — if you never see it fail, you cannot know it's testing the right thing.
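A test following these points might look like this sketch. `ShoppingCart` is a hypothetical class used only to illustrate the Arrange-Act-Assert shape and testing through the public interface:

```python
class ShoppingCart:
    def __init__(self):
        self._items = []  # internal state stays hidden

    def add(self, name: str, price: float) -> None:
        self._items.append(price)

    def total(self) -> float:
        return sum(self._items)

def test_total_reflects_added_items():
    # Arrange
    cart = ShoppingCart()
    # Act
    cart.add("apple", 1.50)
    cart.add("bread", 2.25)
    # Assert: describe the outcome, not how it is achieved.
    # The test survives any refactor of how items are stored.
    assert cart.total() == 3.75
```

Note the test never touches `_items`: it asserts only what a caller can observe.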
### What to Avoid
- Writing a large batch of tests up front instead of iterating one failing test at a time.
- Writing a test that you never see fail: it's probably worthless.
- Writing tests after the implementation.
- Packing multiple unrelated assertions under the same test.
- Implementing behavior not required by the current failing test.
- Testing implementation details instead of publicly observable behavior (e.g. private methods or internal state).
- Mocking internal collaborators.
- Asserting how many times a function is called or the exact call order.
- Writing tests that break after internal refactors that do not change behavior.
- Naming tests after how the code works instead of what behavior it guarantees.
- Asserting through side channels instead of the interface under test.
- Relying on snapshot tests: snapshots lack specificity and are easy to update by mistake.
- Skipping or commenting tests to make them pass.
- Changing a failing test expectation to match (incorrect) production output.
- Saying "All tests pass" without having actually run the tests.
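The call-count and call-order anti-patterns can be contrasted in one sketch. `send_welcome` and the mailer classes are hypothetical illustrations:

```python
from unittest.mock import MagicMock

def send_welcome(mailer, user_email: str) -> None:
    mailer.send(user_email, "Welcome!")

# Brittle: asserts HOW the outcome is achieved (exact call and its
# arguments on a mock). Renaming or batching sends breaks this test
# even when the observable behavior is unchanged.
def test_brittle_asserts_calls():
    mailer = MagicMock()
    send_welcome(mailer, "a@example.com")
    mailer.send.assert_called_once_with("a@example.com", "Welcome!")

# More robust: a simple fake records observable output, and the test
# asserts the outcome a user would care about - the message arrived.
class FakeMailer:
    def __init__(self):
        self.outbox = []

    def send(self, to: str, body: str) -> None:
        self.outbox.append((to, body))

def test_robust_asserts_outcome():
    mailer = FakeMailer()
    send_welcome(mailer, "a@example.com")
    assert ("a@example.com", "Welcome!") in mailer.outbox
```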
## Mocking Guidelines
- Mock only at system boundaries. For external HTTP APIs, mock at the outermost layer using boundary tools such as `msw` (JavaScript/TypeScript) or `responses` (Python).
- Prefer real databases when tests can run against them.
- Inject dependencies for non-deterministic behavior (time, randomness); prefer simple dependency injection.
- Mock the file system only when necessary.
- NEVER mock internal collaborators.
- NEVER mock the function under test.
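Injecting a dependency for non-deterministic behavior can be as simple as a default parameter. The `greeting` function and its `clock` parameter are hypothetical illustrations of injecting time:

```python
import time

def greeting(clock=time.time) -> str:
    # The clock is injected: production uses time.time, tests pass a
    # deterministic stand-in. No mocking framework required.
    hour = int(clock() // 3600) % 24  # hour of day from epoch seconds
    return "Good night" if hour < 6 else "Hello"

def test_greeting_at_midnight():
    assert greeting(clock=lambda: 0.0) == "Good night"

def test_greeting_at_noon():
    assert greeting(clock=lambda: 12 * 3600) == "Hello"
```

The same pattern covers randomness: inject the generator (or its seed) rather than patching module globals.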
## Quick Checklist
- Confirm external behavior and interface change.
- Choose one next effect to test.
- Add one failing test and verify the failure reason.
- Implement the minimum code to pass.
- Run relevant tests and confirm green.
- Refactor on green only.
- Re-run tests after each refactor step.
- Repeat for the next behavior.