Test Doubles — Classify, Design & Create

Based on Martin Fowler's "Mocks Aren't Stubs" and Gerard Meszaros's xUnit Patterns taxonomy. Language-agnostic — works with any programming language and testing framework.

Test doubles replace real collaborators in tests. There are five distinct types (Dummy, Fake, Stub, Spy, Mock), and choosing the wrong one is a common cause of brittle, hard-to-maintain test suites. This guide helps you pick the right type, build it idiomatically, and avoid common traps.

Step 1: Understand Context

Before recommending or writing anything, establish these facts:

What are you testing? — the system under test (SUT)
What dependency needs replacing? — the collaborator
What language and test framework? — for idiomatic code
Why does this need a double? — slow, nondeterministic, not yet built, hard to trigger edge cases
What matters to verify? — resulting state, or that specific interactions occurred

Routing:

User has existing test double code to classify → skip to Step 3
User wants to refactor existing tests → skip to Step 5
User needs a new test double → continue to Step 2

Step 2: Select the Right Test Double

Walk through this decision tree to pick the right type:

Does the test need the dependency to DO anything?
│
├─ NO → DUMMY
│       The dependency fills a constructor/method signature
│       but the test never actually calls it.
│
└─ YES
   │
   Does the test need a WORKING implementation (not just canned answers)?
   │
   ├─ YES → FAKE
   │        e.g., in-memory database, local filesystem, fake HTTP server.
   │        Has working logic, but takes shortcuts for speed/simplicity that make it unfit for production.
   │
   └─ NO
      │
      What does the test need to VERIFY?
      │
      ├─ RESULTING STATE (what happened) → STUB
      │   Returns canned responses so the SUT can run.
      │   The test asserts on the SUT's output/state — not on the stub.
      │
      └─ INTERACTIONS (what was called)
         │
         Do you need verification to happen AUTOMATICALLY
         (fail if unexpected call / missing expected call)?
         │
         ├─ YES → MOCK
         │        Pre-programmed with expectations.
         │        Verification happens on the mock object itself.
         │
         └─ NO → SPY
                  Records calls for later inspection.
                  You assert on the spy's recorded data in the test.

Quick Reference

Type	Has logic?	Returns data?	Records calls?	Auto-verifies?	Use when...
Dummy	No	No	No	No	Filling signatures; the dependency is never called
Fake	Yes	Yes (real)	No	No	Need working behavior, real thing too slow/complex
Stub	No	Yes (canned)	No	No	SUT needs specific input, you verify SUT state
Spy	No	Yes (canned)	Yes	No	Need to check what was sent to a collaborator
Mock	No	Yes (canned)	Yes	Yes	Need strict interaction verification up front
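
The five rows above can be sketched as hand-written Python classes. This is a minimal illustration, not a canonical implementation: the `send(to, message)` interface and every class name here are hypothetical, chosen only to show how the columns of the table translate into code.

```python
class DummyNotifier:
    """Dummy: fills a parameter slot; any call means the test is wrong."""

    def send(self, to, message):
        raise AssertionError("dummy should never be called")


class StubNotifier:
    """Stub: canned answer, no recording."""

    def __init__(self, result=True):
        self._result = result

    def send(self, to, message):
        return self._result


class SpyNotifier:
    """Spy: canned answer plus a record of calls for later assertions."""

    def __init__(self, result=True):
        self._result = result
        self.calls = []

    def send(self, to, message):
        self.calls.append((to, message))
        return self._result


class MockNotifier:
    """Mock: expectations are set up front and checked by the mock itself."""

    def __init__(self):
        self._expected = []
        self._actual = []

    def expect_send(self, to, message):
        self._expected.append((to, message))

    def send(self, to, message):
        call = (to, message)
        if call not in self._expected:
            raise AssertionError(f"unexpected call: send{call!r}")
        self._actual.append(call)
        return True

    def verify(self):
        if self._actual != self._expected:
            raise AssertionError(
                f"expected {self._expected}, got {self._actual}")


class FakeNotifier:
    """Fake: working in-memory implementation with a simplified rule."""

    def __init__(self):
        self.inbox = {}

    def send(self, to, message):
        if "@" not in to:  # simplified address validation (assumption)
            return False
        self.inbox.setdefault(to, []).append(message)
        return True
```

Note where the assertion lives in each case: with the stub and fake the test asserts on the SUT, with the spy the test asserts on `calls`, and with the mock the test only calls `verify()`.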

State vs Behavior Verification

This is the fundamental fork:

State verification — run the SUT, then check its output or final state. Stubs and Fakes support this. The test doesn't care HOW the SUT achieved the result, only WHAT the result is. This produces tests that survive refactoring because they're decoupled from internal implementation.
Behavior verification — check that the SUT called specific methods with specific arguments. Mocks and Spies support this. The test is coupled to how the SUT works internally, which means changes to implementation can break tests even when behavior is unchanged.

Default to state verification. It produces more resilient tests. Use behavior verification only when the interaction IS the important thing — sending an email, publishing a message, writing an audit log, calling an external API. In those cases, the side effect is the behavior you care about, and there's no resulting state to check on the SUT.
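
To make the fork concrete, here is a small Python sketch with one test of each kind. The functions `total_with_tax` and `send_welcome`, the `StubRates` class, and the `rate_percent` method are all hypothetical names invented for this illustration; only `unittest.mock.Mock` and `assert_called_once_with` come from the standard library.

```python
from unittest.mock import Mock


class StubRates:
    """Stub: returns a canned tax rate (in percent) for any region."""

    def rate_percent(self, region):
        return 20


def total_with_tax(amount_cents, rates, region):
    """Hypothetical SUT: adds tax looked up from a rates collaborator."""
    return amount_cents + amount_cents * rates.rate_percent(region) // 100


def send_welcome(email, mailer):
    """Hypothetical SUT whose only observable effect is an outgoing email."""
    mailer.send(email, "Welcome!")


def test_state_verification():
    # Assert on what the SUT returned; how it used the stub is irrelevant.
    assert total_with_tax(10_000, StubRates(), "UK") == 12_000


def test_behavior_verification():
    # The email is the behavior, so assert on the interaction itself.
    mailer = Mock()
    send_welcome("ann@example.com", mailer)
    mailer.send.assert_called_once_with("ann@example.com", "Welcome!")
```

The first test keeps passing if `total_with_tax` is rewritten to cache or batch rate lookups. The second is deliberately tied to the call, because `send_welcome` leaves no state behind to inspect.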

Step 3: Classify Existing Test Doubles

When a user shows existing code and asks what kind of test double it is, classify by usage pattern, not by what the framework calls it or what the class is named:

Has real business logic (even simplified)? → Fake
Passed around but never called in the test? → Dummy
Returns canned data AND the test asserts on recorded calls? → Spy
Returns canned data AND has pre-programmed expectations that auto-verify? → Mock
Returns canned data AND the test only asserts on SUT state? → Stub

Most testing frameworks blur these lines. A Jest jest.fn() can act as a stub, spy, or mock depending on how you use it. A Mockito mock() behaves as a stub until you call verify() on it; because that verification happens after the fact, you are then using it as a spy. Classify by usage, not by the framework's terminology.
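
The same effect is easy to see in Python, where a single `unittest.mock.Mock` object plays different roles depending on what the test asserts. The sketch below assumes a hypothetical `greeting` function and `find_name` method; only the `Mock` API is real.

```python
from unittest.mock import Mock


def greeting(user_id, repo):
    """Hypothetical SUT: builds a greeting from a looked-up name."""
    return f"Hello, {repo.find_name(user_id)}!"


def use_as_stub():
    # Usage classifies this as a STUB: canned data in, assert on SUT output.
    repo = Mock()
    repo.find_name.return_value = "Ann"
    assert greeting(7, repo) == "Hello, Ann!"


def use_as_spy():
    # Same class, but now a SPY: the test inspects the recorded call.
    repo = Mock()
    repo.find_name.return_value = "Ann"
    greeting(7, repo)
    repo.find_name.assert_called_once_with(7)
```

Both functions build the double identically; only the final assertion differs, which is why the classification has to follow usage.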

After classifying, explain:

What the double actually is vs what it's named/called
Whether that type is appropriate for what the test is trying to verify
If misclassified or misused, suggest the correct approach

Step 4: Generate the Test Double

When writing test double code:

Use idiomatic patterns for the user's language and framework
Prefer hand-written doubles for Dummies, Fakes, and simple Stubs — they're easier to understand and maintain than framework-generated ones
Use framework doubles (Mockito, unittest.mock, etc.) for Spies and Mocks where the framework makes recording/verifying cleaner
Show the complete test that uses the double, not just the double in isolation — the test is where the verification strategy becomes clear
Add a brief comment explaining why this type was chosen

After generating, explain:

Why this type of test double is the right choice here
What verification strategy the test uses (state vs behavior)
What would change if a different type were used (and why that would be worse)
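
As one possible shape for such output, here is a sketch of a complete Python test built around a hand-written stub. The `Session` class, its 30-minute expiry rule, and `StubClock` are hypothetical, made up to show the double together with the test that uses it.

```python
import unittest
from datetime import datetime, timedelta


class Session:
    """Hypothetical SUT: a session that expires 30 minutes after creation."""

    def __init__(self, clock):
        self._clock = clock
        self._created = clock.now()

    def is_expired(self):
        return self._clock.now() - self._created > timedelta(minutes=30)


class StubClock:
    # Why a stub: the real clock is nondeterministic, the SUT only needs
    # canned times, and the test asserts on the SUT's state, not on calls.
    def __init__(self, current):
        self.current = current

    def now(self):
        return self.current


class SessionTest(unittest.TestCase):
    def test_fresh_session_is_not_expired(self):
        clock = StubClock(datetime(2024, 1, 1, 12, 0))
        self.assertFalse(Session(clock).is_expired())

    def test_session_expires_after_thirty_minutes(self):
        clock = StubClock(datetime(2024, 1, 1, 12, 0))
        session = Session(clock)
        clock.current = datetime(2024, 1, 1, 12, 31)  # advance canned time
        self.assertTrue(session.is_expired())
```

A mock that verified `now()` was called exactly twice would pass today but fail as soon as `Session` cached the creation time differently, with no change in observable behavior.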

For detailed multi-language examples and framework-specific patterns, read references/taxonomy-guide.md.

Step 5: Refactor Existing Tests

When improving existing test code, first identify whether the codebase follows the classical (Detroit) or mockist (London) TDD style — this determines what "correct" looks like. If tests mock every collaborator and verify all interactions, that's the mockist style. If they use real objects and only double awkward dependencies, that's classical. Name the style explicitly in your diagnosis so the user understands the tradeoff their codebase is making. When the user's problem is brittleness from over-mocking, explain that shifting toward classical style (state verification, real objects where practical) is the remedy — and frame it as a spectrum, not an all-or-nothing switch.

Diagnose these common anti-patterns:

Anti-Pattern 1: Mock Overuse (the most common problem)

Symptom: Tests break when you refactor internals, even though external behavior is unchanged.

Why it happens: Using mocks (behavior verification) where stubs (state verification) would suffice. Every expect(mock).toHaveBeenCalledWith(...) is a coupling point to implementation.

Fix: Replace mocks with stubs. Assert on the SUT's outputs rather than on which methods it called internally. Keep behavior verification only for genuine side effects (notifications, events, external API calls).
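
A before/after sketch of this fix in Python. The `cart_total` function, the `price_of` method, and `StubCatalog` are hypothetical names used only for illustration.

```python
from unittest.mock import Mock


def cart_total(skus, catalog):
    """Hypothetical SUT: sums catalog prices (in cents) for the given SKUs."""
    return sum(catalog.price_of(sku) for sku in skus)


def brittle_test():
    catalog = Mock()
    catalog.price_of.return_value = 500
    cart_total(["a", "b"], catalog)
    # Coupled to internals: fails if cart_total starts caching or
    # batching lookups, even though the total would be unchanged.
    assert catalog.price_of.call_count == 2


class StubCatalog:
    """Hand-written stub: canned prices, no call recording."""

    def __init__(self, prices):
        self._prices = prices

    def price_of(self, sku):
        return self._prices[sku]


def resilient_test():
    catalog = StubCatalog({"a": 500, "b": 250})
    # Asserts only on the SUT's output.
    assert cart_total(["a", "b"], catalog) == 750
```

Both tests pass against the current `cart_total`; the difference shows up only when the implementation changes.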

Anti-Pattern 2: Testing the Mock

Symptom: Test passes but doesn't actually verify anything useful about the SUT.

Why it happens: The mock's canned response is what gets asserted — you're verifying your own setup, not the SUT's logic.

Fix: Assertions should target the SUT's behavior. If the test would pass with any canned response, it's not testing anything real.
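
The difference is easiest to see side by side. In this Python sketch, `display_name` and the `get` method are hypothetical; the first test never runs the SUT at all.

```python
from unittest.mock import Mock


def display_name(user_id, repo):
    """Hypothetical SUT: formats a user's name as 'LAST, First'."""
    user = repo.get(user_id)
    return f"{user['last'].upper()}, {user['first']}"


def test_that_only_tests_the_mock():
    repo = Mock()
    repo.get.return_value = {"first": "Ada", "last": "Lovelace"}
    # Anti-pattern: asserts that the mock returns what it was told to return.
    assert repo.get(1) == {"first": "Ada", "last": "Lovelace"}


def test_that_exercises_the_sut():
    repo = Mock()
    repo.get.return_value = {"first": "Ada", "last": "Lovelace"}
    # The expected value depends on the SUT's formatting logic.
    assert display_name(1, repo) == "LOVELACE, Ada"
```

Deleting `display_name` entirely would leave the first test green, which is the quickest way to recognize this anti-pattern.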

Anti-Pattern 3: Missing Fakes for Complex Collaborators

Symptom: Dozens of stubs with increasingly elaborate canned responses that are hard to maintain.

Why it happens: The collaborator has complex stateful behavior that can't be realistically represented by a few canned return values.

Fix: Write a Fake with simplified but real logic (e.g., InMemoryUserRepository backed by a dictionary/map). Fakes need their own tests, but they pay for themselves when many tests share them.
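
A minimal sketch of what such a Fake might look like in Python. The method names (`add`, `get`, `find_by_email`), the unique-email rule, and the `register` function are assumptions made for this example, not a prescribed repository interface.

```python
class InMemoryUserRepository:
    """Fake: a working repository backed by a dict instead of a database."""

    def __init__(self):
        self._users = {}
        self._next_id = 1

    def add(self, email):
        # Simplified but real rule: emails are unique.
        if self.find_by_email(email) is not None:
            raise ValueError(f"duplicate email: {email}")
        user = {"id": self._next_id, "email": email}
        self._users[user["id"]] = user
        self._next_id += 1
        return user["id"]

    def get(self, user_id):
        return self._users.get(user_id)

    def find_by_email(self, email):
        for user in self._users.values():
            if user["email"] == email:
                return user
        return None


def register(email, repo):
    """Hypothetical SUT: registers a user unless the email is taken."""
    if repo.find_by_email(email) is not None:
        return None
    return repo.add(email)


def test_register_rejects_duplicate_email():
    repo = InMemoryUserRepository()
    first = register("ann@example.com", repo)
    second = register("ann@example.com", repo)
    assert repo.get(first)["email"] == "ann@example.com"
    assert second is None
```

Reproducing this scenario with stubs would need a different canned `find_by_email` response before and after the first registration; the fake gets that statefulness for free.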

Anti-Pattern 4: Dummy Disguised as Stub

Symptom: A stub with canned responses, but the test never exercises the code path that uses them.

Why it happens: Someone set up a stub "just in case" when a null/empty dummy would suffice.

Fix: Replace with a Dummy. Simpler setup, clearer test intent.
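
One way to write such a Dummy in Python is to make it fail loudly if anything touches it, so the test documents that the dependency is irrelevant. The `Calculator` class and `DummyLogger` below are hypothetical.

```python
class DummyLogger:
    """Dummy: satisfies the constructor; fails loudly if it is ever used."""

    def __getattr__(self, name):
        raise AssertionError(f"dummy logger was used: .{name}")


class Calculator:
    """Hypothetical SUT that logs only on the division-by-zero path."""

    def __init__(self, logger):
        self._logger = logger

    def add(self, a, b):
        return a + b

    def divide(self, a, b):
        if b == 0:
            self._logger.warning("division by zero")
            return None
        return a / b


def test_add_does_not_need_a_logger():
    # No canned responses to set up: the logger is never reached here.
    assert Calculator(DummyLogger()).add(2, 3) == 5
```

If a later change makes `add` log something, the dummy raises and tells you the test now needs a real stub or spy.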

Anti-Pattern 5: Spy on Everything

Symptom: Every test records and asserts on every single interaction.

Why it happens: Defaulting to spies/mocks out of habit rather than asking "what does this test actually need to verify?"

Fix: For each assertion, ask: "does this test care about the interaction, or the result?" If the result, use a stub and assert on state.

Classical vs Mockist TDD

These are two philosophies about when and how to use test doubles. Understanding which style a codebase follows helps you make consistent recommendations.

Factor	Classical (Detroit school)	Mockist (London school)
Default double	Real objects	Mocks for all collaborators
When to use doubles	Only for awkward collaborators	For any collaborator with interesting behavior
Verification	State	Behavior
Test isolation	SUT + real collaborators	SUT only
Fixture setup	Can be complex (Object Mother pattern)	Usually simple (SUT plus mocks of its immediate collaborators)
Refactoring resilience	High — tests survive internal changes	Lower — coupled to interactions
Design feedback	Less direct	Guides toward small, focused interfaces
TDD direction	Middle-out (domain model first)	Outside-in (UI layer inward)

Neither is universally better. Classical is the safer default for teams without a strong established preference — it produces tests that are more resilient to refactoring. Mockist shines during outside-in TDD where you're designing collaborator interfaces top-down and want the tests to drive interface discovery.

When refactoring an existing test suite, match the team's existing style unless they're explicitly asking to switch approaches.
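
The same behavior tested in both styles, as a Python sketch loosely modeled on the order/warehouse example in Fowler's article. The class and method names here are this sketch's own assumptions, not code from the article.

```python
from unittest.mock import Mock


class Warehouse:
    """Real collaborator (hypothetical): tracks stock per item."""

    def __init__(self, stock):
        self._stock = dict(stock)

    def has(self, item, qty):
        return self._stock.get(item, 0) >= qty

    def remove(self, item, qty):
        self._stock[item] -= qty

    def count(self, item):
        return self._stock.get(item, 0)


class Order:
    """Hypothetical SUT: fills itself from a warehouse if stock allows."""

    def __init__(self, item, qty):
        self.item = item
        self.qty = qty
        self.filled = False

    def fill(self, warehouse):
        if warehouse.has(self.item, self.qty):
            warehouse.remove(self.item, self.qty)
            self.filled = True


def classical_test():
    # Real collaborator, state verification on both objects.
    warehouse = Warehouse({"whisky": 50})
    order = Order("whisky", 20)
    order.fill(warehouse)
    assert order.filled
    assert warehouse.count("whisky") == 30


def mockist_test():
    # Mocked collaborator, behavior verification on the interaction.
    warehouse = Mock()
    warehouse.has.return_value = True
    order = Order("whisky", 20)
    order.fill(warehouse)
    warehouse.remove.assert_called_once_with("whisky", 20)
    assert order.filled
```

The classical test also exercises `Warehouse`, so a bug there can fail it; the mockist test isolates `Order` but breaks if `fill` switches to a different warehouse method.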

Reference

For detailed per-type definitions with multi-language code examples (Python, TypeScript, Java, Go, Ruby, Swift, C#), framework-specific mapping tables, and extended anti-pattern analysis, read references/taxonomy-guide.md.
