Test-Driven Development

Write tests first, implement minimal code to pass, refactor systematically.

<when_to_use>

New features with TDD methodology
Complex business logic requiring coverage
Critical paths: auth, payments, data integrity
Bug fixes: reproduce with test, fix, verify
Refactoring: ensure behavior preservation
API design: tests define the interface

NOT for: exploratory coding, UI prototypes, static config, trivial glue code

</when_to_use>

Load the maintain-tasks skill for stage tracking. Advance through RED-GREEN-REFACTOR cycle.

Stage	Trigger	activeForm
Red	Session start / cycle restart	"Writing failing test"
Green	Test written and failing	"Implementing code"
Refactor	Tests passing	"Refactoring code"
Verify	Refactor complete	"Verifying implementation"

Task format:

- Write failing test for { feature }
- Implement { feature } to pass tests
- Refactor { aspect }
- Verify { what's being checked }

Workflow:

Start: Create "Red" stage in_progress
Transition: Mark current completed, add next in_progress
After each stage: Run tests before advancing
Multiple cycles: Return to "Red" for next feature

Edge cases:

Good existing tests: Start at "Refactor" after confirming pass
Bug fix: Start at "Red" with failing test reproducing bug
No regression: Tests must continue passing through all stages

RED --> GREEN --> REFACTOR --> RED --> ...
 |       |          |
Test   Impl      Improve
Fails  Passes   Quality

Each cycle: 5-15 min. Longer = step too large, decompose.

Philosophy:

Red-Green-Refactor as primary workflow
Test quality over quantity - behavior, not implementation
Incremental progress - small focused cycles
Type safety throughout - tests as type-safe as production

<red_phase>

Write tests defining desired behavior before implementation exists.

Guidelines:

3-5 related tests fully specifying one feature
Type system makes invalid states unrepresentable
Each test = one specific behavior
Run tests, verify fail for right reason
Descriptive names forming sentences

TypeScript:

import { describe, test, expect } from 'bun:test'

describe('UserAuthentication', () => {
  test('authenticates with valid credentials', async () => {
    const result = await authenticate({ email: 'user@example.com', password: 'SecurePass123!' })
    expect(result).toMatchObject({ type: 'success', user: expect.objectContaining({ email: 'user@example.com' }) })
  })

  test('rejects invalid credentials', async () => {
    const result = await authenticate({ email: 'wrong@example.com', password: 'wrong' })
    expect(result).toMatchObject({ type: 'error', code: 'INVALID_CREDENTIALS' })
  })

  test.todo('implements rate limiting after failed attempts')
})

Rust:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn authenticates_with_valid_credentials() {
        let creds = Credentials { email: "user@example.com".into(), password: "SecurePass123!".into() };
        assert!(matches!(authenticate(&creds), Ok(AuthResult::Success { .. })));
    }

    #[test]
    fn rejects_invalid_credentials() {
        let creds = Credentials { email: "wrong@example.com".into(), password: "wrong".into() };
        assert!(matches!(authenticate(&creds), Err(AuthError::InvalidCredentials)));
    }
}

Commit: test: add failing tests for [feature]

Transition: Mark "Red" completed, create "Green" in_progress

</red_phase>

<green_phase>

Implement minimum code to make tests pass.

Guidelines:

Focus on passing tests, not perfect code
Explicit types where aids clarity
Straightforward solutions first
Hardcode if passes test - refactor generalizes
Run tests frequently

TypeScript:

type AuthResult = { type: 'success'; user: User } | { type: 'error'; code: string }

async function authenticate(creds: { email: string; password: string }): Promise<AuthResult> {
  if (!creds.password) return { type: 'error', code: 'MISSING_PASSWORD' }
  const user = await findUserByEmail(creds.email)
  if (!user) return { type: 'error', code: 'INVALID_CREDENTIALS' }
  const match = await comparePassword(creds.password, user.passwordHash)
  if (!match) return { type: 'error', code: 'INVALID_CREDENTIALS' }
  return { type: 'success', user }
}

Rust:

pub fn authenticate(creds: &Credentials) -> Result<AuthResult, AuthError> {
    if creds.password.is_empty() { return Err(AuthError::MissingPassword); }
    let user = find_user_by_email(&creds.email).ok_or(AuthError::InvalidCredentials)?;
    if !compare_password(&creds.password, &user.password_hash) {
        return Err(AuthError::InvalidCredentials);
    }
    Ok(AuthResult::Success { user })
}

Verify: bun test / cargo test

Commit: feat: implement [feature] to pass tests

Transition: Mark "Green" completed, create "Refactor" in_progress

</green_phase>

<refactor_phase>

Enhance code quality without changing behavior. Tests must continue passing.

Guidelines:

Extract common patterns into well-named functions
Apply SOLID principles where appropriate
Improve types: discriminated unions, branded types
No test behavior changes
Run tests after each step

TypeScript:

// Extract validation
function validateCredentials(creds: { email: string; password: string }): AuthResult | null {
  if (!creds.password) return { type: 'error', code: 'MISSING_PASSWORD' }
  if (!isValidEmail(creds.email)) return { type: 'error', code: 'INVALID_EMAIL' }
  return null
}

// Branded types for safety
type Email = string & { readonly __brand: 'Email' }

Rust:

// Extract validation
fn validate_credentials(creds: &Credentials) -> Result<(), AuthError> {
    if creds.password.is_empty() { return Err(AuthError::MissingPassword); }
    if !is_valid_email(&creds.email) { return Err(AuthError::InvalidEmail); }
    Ok(())
}

// Newtype for safety
pub struct Email(String);

Verify: bun test / cargo test

Commit: refactor: [improvement description]

Transition: Mark "Refactor" completed, create "Verify" in_progress

Final: Run full suite. Mark "Verify" completed when all checks pass.

</refactor_phase>

Follow project conventions, defaulting to:

TypeScript/Bun:

src/{module}/{name}.ts          # Implementation
src/{module}/{name}.test.ts     # Unit tests colocated
src/{module}/__fixtures__/      # Test data
tests/integration/              # Integration tests
tests/e2e/                      # End-to-end tests

Rust:

src/{module}/mod.rs             # #[cfg(test)] mod tests { ... }
tests/integration/              # Integration tests
tests/fixtures/                 # Test data

Metric	Target
Line coverage	>=80% (90% critical paths)
Mutation score	>=75%
Unit test time	<5s

Test characteristics:

Single clear assertion per test
No execution order dependencies
Descriptive names forming sentences
Behavior focus, not implementation

Smells to avoid:

Setup longer than test
Multiple unrelated assertions
Coupling to implementation details
Flaky tests

See quality-metrics.md for coverage and mutation testing details.

<bug_fixes>

TDD workflow for bugs:

Write failing test reproducing bug (Start "Red" in_progress)
Verify fails for right reason
Fix with minimal code (Transition to "Green")
Verify passes, all others still pass
Refactor if needed (Transition to "Refactor" or skip to "Verify")
Commit: fix: [bug description] with test coverage

Example:

// 1. Failing test
test('handles division by zero gracefully', () => {
  expect(divide(10, 0)).toMatchObject({ type: 'error', code: 'DIVISION_BY_ZERO' })
})

// 3. Fix
function divide(a: number, b: number): Result {
  if (b === 0) return { type: 'error', code: 'DIVISION_BY_ZERO' }
  return { type: 'success', value: a / b }
}

</bug_fixes>

ALWAYS:

Track progress with Tasks (load maintain-tasks skill)
Write tests before implementation (RED first)
Run tests after each stage
Verify tests fail for right reason in RED
Keep cycles 5-15 min max
Descriptive test names forming sentences
Test behavior, not implementation
Each test = one reason to fail

NEVER:

Skip to implementation without tests
Change test behavior during refactoring
Test implementation details or private methods
Allow tests to depend on execution order
Write flaky tests
Mark stage complete without running tests
Multiple unrelated assertions per test

<quick_reference>

# TypeScript/Bun
bun test                    # Run all tests
bun test --watch            # Watch mode
bun test --coverage         # Coverage report
bun test --only             # Run only .only tests
bun x stryker run           # Mutation testing

# Rust
cargo test                  # Run all tests
cargo test --test NAME      # Specific integration test
cargo tarpaulin             # Coverage report
cargo mutants               # Mutation testing
cargo test -- --nocapture   # Show println! output

</quick_reference>

test-patterns.md - Discriminated unions, builders, mocking, parameterized tests, async patterns for TypeScript and Rust
quality-metrics.md - Coverage analysis, mutation testing setup, CI integration, thresholds
feature-implementation.md - Full TDD session walkthrough
bug-fix.md - TDD workflow for bug fixes

tdd

Test-Driven Development

More from outfitter-dev/agents

codebase-recon

graphite-stacks

code-review

hono-dev

software-craft

subagents