Test-First Development

Overview

Write the test first. Observe it fail. Write the minimum code to pass.

Core principle: If you did not watch the test fail, you have no evidence it validates the correct behavior.

No exceptions. No workarounds. No shortcuts.

When to Use

Required:

  • New features
  • Bug fixes
  • Refactoring
  • Behavior modifications

Exceptions (confirm with your human partner):

  • Disposable prototypes
  • Generated code
  • Configuration files

Tempted to think "skip test-first just this once"? Stop. That is rationalization.

The Prime Directive

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST

Wrote code before the test? Delete it. Start over.

Zero tolerance:

  • Do not keep it as "reference material"
  • Do not "adjust" it while writing tests
  • Do not glance at it
  • Delete means delete

Build fresh from tests. Full stop.

Red-Green-Refactor

```dot
digraph tdd_loop {
    rankdir=LR;
    red [label="RED\nWrite failing test", shape=box, style=filled, fillcolor="#ffcccc"];
    check_red [label="Fails\ncorrectly?", shape=diamond];
    green [label="GREEN\nMinimal code", shape=box, style=filled, fillcolor="#ccffcc"];
    check_green [label="All tests\npass?", shape=diamond];
    refactor [label="REFACTOR\nClean up", shape=box, style=filled, fillcolor="#ccccff"];
    next [label="Next", shape=ellipse];

    red -> check_red;
    check_red -> green [label="yes"];
    check_red -> red [label="wrong\nfailure"];
    green -> check_green;
    check_green -> refactor [label="yes"];
    check_green -> green [label="no"];
    refactor -> check_green [label="stay\ngreen"];
    check_green -> next;
    next -> red;
}
```

RED - Write the Failing Test

Write one minimal test that describes the expected behavior.

<Good>
```typescript
test('retries a failing operation until it succeeds', async () => {
  let callCount = 0;
  const unstableOp = async () => {
    callCount++;
    if (callCount < 3) throw new Error('transient failure');
    return 'done';
  };

  const outcome = await withRetry(unstableOp);

  expect(outcome).toBe('done');
  expect(callCount).toBe(3);
});
```

Descriptive name, tests actual behavior, single concern
</Good>

<Bad>
```typescript
test('retry works', async () => {
  const spy = jest.fn()
    .mockRejectedValueOnce(new Error())
    .mockRejectedValueOnce(new Error())
    .mockResolvedValueOnce('done');
  await withRetry(spy);
  expect(spy).toHaveBeenCalledTimes(3);
});
```

Vague name, validates mock interactions, not real behavior
</Bad>

Criteria:

  • One behavior per test
  • Descriptive name
  • Uses real code (mocks only when unavoidable)

Verify RED - Watch It Fail

MANDATORY. Never skip.

```sh
npm test path/to/test.test.ts
```

Confirm:

  • Test fails (not errors out)
  • Failure message matches expectations
  • Fails because the feature is missing (not due to typos)

Test passes? You are testing existing behavior. Rewrite the test.

Test errors? Fix the error, re-run until it fails correctly.

GREEN - Minimal Code

Write the simplest possible code that makes the test pass.

Do not add features, refactor unrelated code, or "improve" beyond what the test requires.
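
For the retry test above, GREEN might look like the sketch below. The signature is inferred from the test; nothing else (no backoff, no attempt limit, no options object) appears, because no test demands it yet:

```typescript
// Just enough to pass the RED test: retry until the operation succeeds.
// No attempt limit yet -- no test has demanded one (a later test will).
async function withRetry<T>(op: () => Promise<T>): Promise<T> {
  while (true) {
    try {
      return await op();
    } catch {
      // Swallow the error and try again.
    }
  }
}
```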

Verify GREEN - Watch It Pass

MANDATORY.

```sh
npm test path/to/test.test.ts
```

Confirm:

  • Test passes
  • Other tests still pass
  • Output is clean (no errors, no warnings)

Test fails? Fix the code, not the test.

Other tests fail? Fix them now.

REFACTOR - Clean Up

Only after green:

  • Eliminate duplication
  • Improve naming
  • Extract helpers

Keep tests green. Do not add new behavior.
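
For example, once several retry tests repeat the same call-counting setup, that duplication can move into a helper while everything stays green. `makeFlakyOp` is an illustrative name, not part of any library:

```typescript
// Illustrative refactor: shared test setup extracted into a helper.
// No behavior changes; tests stay green before and after.
function makeFlakyOp<T>(failuresBeforeSuccess: number, result: T) {
  let callCount = 0;
  const op = async () => {
    callCount++;
    if (callCount <= failuresBeforeSuccess) throw new Error('transient failure');
    return result;
  };
  return { op, calls: () => callCount };
}

test('retries a failing operation until it succeeds', async () => {
  const flaky = makeFlakyOp(2, 'done');
  expect(await withRetry(flaky.op)).toBe('done');
  expect(flaky.calls()).toBe(3);
});
```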

Repeat

Next failing test for the next behavior.
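
Continuing the retry example: the next behavior is that retries are bounded. Against the unbounded sketch above, this test fails (it times out), which is what licenses adding an attempt limit. The budget of three is an assumption for illustration:

```typescript
// Next behavior in the series. This fails against the unbounded sketch,
// driving the next piece of production code: an attempt limit.
test('rethrows once retries are exhausted', async () => {
  let callCount = 0;
  const alwaysFails = async () => {
    callCount++;
    throw new Error('permanent failure');
  };

  await expect(withRetry(alwaysFails)).rejects.toThrow('permanent failure');
  expect(callCount).toBe(3); // assumed attempt budget
});
```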

Quality Standards

| Quality | Good | Bad |
| --- | --- | --- |
| Focused | One thing. "and" in the name? Split it (example below). | `test('validates email and domain and whitespace')` |
| Descriptive | Name explains the behavior | `test('test1')` |
| Intentional | Demonstrates the desired API | Obscures what the code should do |
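
For instance, the "and" test from the table splits into one test per concern. `validateEmail` is hypothetical here, used only to show the shape:

```typescript
// One behavior per test, each name stating exactly what it checks.
test('rejects an email without a domain', () => {
  expect(validateEmail('user@')).toBe(false);
});

test('accepts an email padded with whitespace', () => {
  expect(validateEmail('  user@example.com  ')).toBe(true);
});
```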

Why Sequence Matters

"I'll write tests afterward to confirm it works"

Tests written after code pass immediately. Passing immediately proves nothing:

  • May validate the wrong thing
  • May verify implementation details instead of behavior
  • May miss edge cases you overlooked
  • You never watched it catch the failure

Test-first forces you to observe the failure, proving the test actually verifies something.

"I already manually tested all the edge cases"

Manual testing is ad hoc. You believe you covered everything, but:

  • No record of what you tested
  • Cannot rerun when code changes
  • Easy to skip cases under pressure
  • "It worked when I tried it" is not comprehensive

Automated tests are systematic. They execute identically every time.

"Deleting X hours of work is wasteful"

Sunk cost fallacy. The time is already spent. Your choice now:

  • Delete and rebuild with test-first (X more hours, high confidence)
  • Keep it and bolt on tests (30 min, low confidence, probable defects)

The real waste is keeping code you cannot trust. Working code without legitimate tests is technical debt.

"Test-first is dogmatic, being pragmatic means adapting"

Test-first IS pragmatic:

  • Catches defects before commit (faster than debugging later)
  • Prevents regressions (tests catch breakage immediately)
  • Documents behavior (tests demonstrate how code works)
  • Enables refactoring (change freely, tests catch problems)

"Pragmatic" shortcuts = debugging in production = slower.

"Tests after achieve the same goals - it's about the spirit not the ritual"

No. Tests-after answer "What does this do?" Tests-first answer "What should this do?"

Tests-after are biased by your implementation. You verify what you built, not what is required. You check the edge cases you remember, not ones you discover.

Tests-first force edge case discovery before implementing. Tests-after verify you remembered everything (you did not).

Thirty minutes of tests after the fact is not test-first. You get coverage but lose the proof that the tests work.

Cognitive Traps

| Rationalization | What Is Actually True |
| --- | --- |
| "Too simple to test" | Simple code breaks. A test takes 30 seconds. |
| "I'll test afterward" | Tests passing immediately prove nothing. |
| "Tests after achieve the same thing" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "I already tested manually" | Ad hoc is not systematic. No record, cannot rerun. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
| "Keep as reference, write tests first" | You will adapt it. That is testing after. Delete means delete. |
| "Need to explore first" | Fine. Discard the exploration, then start with test-first. |
| "Hard to test = I should skip it" | Listen to the test. Hard to test = hard to use. Simplify the design. |
| "Test-first will slow me down" | Test-first is faster than debugging. Pragmatic = test-first. |
| "Manual testing is faster" | Manual testing does not prove edge cases. You retest every change. |
| "Existing code has no tests" | You are improving it. Add tests for what exists. |

Guardrails - STOP and Start Over

  • Code before test
  • Test after implementation
  • Test passes immediately
  • Cannot explain why the test failed
  • Tests added "later"
  • Rationalizing "just this once"
  • "I already tested it manually"
  • "Tests after serve the same purpose"
  • "It's about the spirit not the ritual"
  • "Keep as reference" or "adapt existing code"
  • "Already invested X hours, deleting is wasteful"
  • "Test-first is dogmatic, I'm being pragmatic"
  • "This is different because..."

All of these mean: Delete the code. Start over with test-first.

Example: Bug Fix

Bug: Empty email accepted by form

RED

```typescript
test('rejects empty email', async () => {
  const result = await processForm({ email: '' });
  expect(result.error).toBe('Email required');
});
```

Verify RED

```
$ npm test
FAIL: expected 'Email required', got undefined
```

GREEN

```typescript
function processForm(data: FormData) {
  if (!data.email?.trim()) {
    return { error: 'Email required' };
  }
  // ...
}
```

Verify GREEN

```
$ npm test
PASS
```

REFACTOR

Extract a validation pipeline for multiple fields if needed.
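
One possible shape for that pipeline, as a sketch; the `FormData` fields and messages beyond the email check are assumptions:

```typescript
// Sketch: each validator returns an error message or null.
interface FormData {
  email?: string;
  name?: string;
}

type Validator = (data: FormData) => string | null;

const validators: Validator[] = [
  (d) => (d.email?.trim() ? null : 'Email required'),
  (d) => (d.name?.trim() ? null : 'Name required'), // assumed second field
];

function processForm(data: FormData) {
  for (const validate of validators) {
    const error = validate(data);
    if (error) return { error };
  }
  // ... proceed with valid data
}
```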

Completion Checklist

Before declaring work finished:

  • Every new function/method has a test
  • Watched each test fail before implementing
  • Each test failed for the expected reason (feature missing, not typo)
  • Wrote minimal code to pass each test
  • All tests pass
  • Output is clean (no errors, no warnings)
  • Tests use real code (mocks only when unavoidable)
  • Edge cases and error paths are covered

Cannot check every box? You skipped test-first. Start over.

When Stuck

| Problem | Solution |
| --- | --- |
| Do not know how to write the test | Write the API you wish existed. Write the assertion first. Ask your human partner. |
| Test too complicated | Design too complicated. Simplify the interface. |
| Must mock everything | Code too coupled. Use dependency injection (sketch below). |
| Test setup is enormous | Extract helpers. Still heavy? Simplify the design. |
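
As one example of that dependency-injection fix: a rate limiter that takes its clock as a parameter lets tests inject a fake instead of mocking globals. All names here are illustrative:

```typescript
// Inject the clock instead of mocking Date.now in every test.
function createRateLimiter(
  limit: number,
  windowMs: number,
  now: () => number = Date.now,
) {
  const timestamps: number[] = [];
  return {
    allow(): boolean {
      const cutoff = now() - windowMs;
      while (timestamps.length > 0 && timestamps[0] <= cutoff) timestamps.shift();
      if (timestamps.length >= limit) return false;
      timestamps.push(now());
      return true;
    },
  };
}

// In a test, time is just a variable -- no mocks required:
let fakeTime = 0;
const limiter = createRateLimiter(2, 1000, () => fakeTime);
// limiter.allow() -> true, true, then false until fakeTime advances past 1000.
```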

Debugging Integration

Bug found? Write a failing test that reproduces it. Follow the test-first cycle. The test both proves the fix and prevents regression.

Never fix bugs without a test.

Testing Anti-Patterns

When adding mocks or test utilities, read @testing-anti-patterns.md to avoid common pitfalls:

  • Validating mock behavior instead of real behavior
  • Adding test-only methods to production classes
  • Mocking without understanding dependency chains

Final Rule

Production code -> test exists and failed first
Otherwise -> not test-first

No exceptions without your human partner's permission.
