# Test-Driven Development (TDD) Skill
Enforce the RED-GREEN-REFACTOR cycle for all code changes. Tests are written before implementation code, verified to fail for the right reasons, and maintained through disciplined development cycles.
## Instructions
Before starting any TDD cycle, read and follow repository CLAUDE.md files. Project instructions override default TDD behaviors because local conventions (test frameworks, directory layout, naming) vary across codebases.
### Phase 1: Write a Failing Test (RED)
The test MUST exist and fail before any implementation code is written, because seeing the test fail first proves it can actually detect the bug or missing feature. A test that has never been seen failing provides no evidence that it tests anything meaningful.
Steps:
- Understand the requirement -- clarify what behavior needs to be implemented
- Write the test first -- create a test that describes the desired behavior
- Use descriptive test names -- the test name should read as a specification of behavior (e.g., `TestCalculateTotal_WithEmptyCart_ReturnsZero`), because vague names like `TestCalc` make failures impossible to diagnose without reading the test body
- Write minimal test setup -- only create fixtures/mocks needed for THIS test
- Assert expected behavior -- use specific assertions (not just "no error"), because weak assertions like `assert result != nil` pass for the wrong reasons and provide false confidence

Use specific assertions:

- `assert result == 42` (specific value)
- `assert error.message.contains("invalid")` (specific content)
- NOT `assert result != nil` (too weak -- passes even when result is garbage)
- NOT `assert len(result) > 0` (not specific enough -- passes with wrong data)
Test one concept per test. If the test name needs "and", split into multiple tests, because multi-assertion tests produce ambiguous failures.
Follow the Arrange-Act-Assert pattern:
```python
def test_feature():
    # Arrange: Set up test data
    input_data = create_test_data()

    # Act: Execute the code under test
    result = function_under_test(input_data)

    # Assert: Verify expected behavior
    assert result.status == "success"
```
Optional techniques (use when explicitly requested):
- Property-based testing: Generate tests with random/fuzzed inputs (Go: `testing/quick`; Python: `hypothesis`)
- Table-driven tests: Convert multiple similar tests to a data-driven approach when 3+ tests share the same structure (see the sketch below)
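A minimal sketch of the table-driven style in Go, assuming a hypothetical `Add(a, b int) int` function under test:

```go
func TestAdd_TableDriven(t *testing.T) {
	// Each case is one row; adding coverage means adding a row, not a new test function.
	cases := []struct {
		name string
		a, b int
		want int
	}{
		{"positives", 2, 3, 5},
		{"zeros", 0, 0, 0},
		{"mixed signs", -1, 1, 0},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := Add(tc.a, tc.b); got != tc.want {
				t.Errorf("Add(%d, %d) = %d, want %d", tc.a, tc.b, got, tc.want)
			}
		})
	}
}
```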
Run the test:
```bash
go test ./... -v -run TestNewFeature          # Go
pytest tests/test_feature.py::test_name -v    # Python
npm test -- --testNamePattern="new feature"   # JavaScript
```
Show the full test runner output -- never summarize test results, because summarization hides warnings, partial failures, and unexpected output that reveal problems early.
### RED Phase Gate
Do NOT proceed to the GREEN phase until all of these are true:
- Test file is created and saved
- Test has been executed
- Test output shows FAILURE (not syntax/import error)
- Failure message indicates missing implementation
### Phase 2: Verify Failure Reason (RED Verification)
The test must fail because the feature is not implemented, NOT because of syntax errors, import errors, wrong test setup, or unrelated failures. A test that fails for the wrong reason proves nothing about the missing feature, and a later pass proves nothing about the implementation.
- Execute the test command and show the complete output
- Verify the failure reason -- confirm the error matches expected missing-implementation patterns:
  - Go: `--- FAIL: TestFeatureName` with an expected-vs-actual mismatch
  - Python: `AssertionError` or `AttributeError: module has no attribute`
  - JavaScript: `Expected X but received undefined`
If the test fails for the WRONG reason:
- Fix the test setup/syntax
- Re-run until it fails for the RIGHT reason (missing implementation)
- Do NOT proceed until the failure clearly indicates "this feature does not exist yet"
### Phase 3: Implement Minimum Code (GREEN)
Write ONLY enough code to make the failing test pass. Implement nothing beyond what the test demands, because untested code paths are invisible liabilities -- they cannot be verified, they rot silently, and they complicate future refactoring.
- Minimal implementation -- the simplest code that satisfies the test
- No extra features -- do not implement behavior not covered by tests. First make it work, then make it right
- Hardcoded values are OK initially -- a hardcoded return that passes the test is better than a generic algorithm that also handles untested cases
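For example, the first GREEN pass for the `Calculator` shown below could legitimately be a hardcoded return -- a sketch of the classic "fake it" step:

```go
// First GREEN pass: return exactly what the test expects.
// Generalize only when a second test (e.g., Add(10, 20)) forces it.
func (c *Calculator) Add(a, b int) int {
	return 5
}
```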
Wrong (over-engineering in GREEN phase):
```go
// Test only requires simple addition
func TestCalculator_AddTwoNumbers(t *testing.T) {
	calc := NewCalculator()
	result := calc.Add(2, 3)
	assert.Equal(t, 5, result)
}

// But the implementation adds unnecessary complexity
type Calculator struct {
	operations map[string]func(float64, float64) float64
	precision  int
	history    []Operation
}
```
Correct (implement only what is tested):
```go
type Calculator struct{}

func (c *Calculator) Add(a, b int) int {
	return a + b
}

// Add complexity ONLY when a test requires it
```
### Phase 4: Verify Test Passes (GREEN Verification)
Run the test and show the complete output. Never summarize -- the full output reveals warnings, deprecation notices, and timing issues that summaries hide.
- Execute test command and display all output
- Verify PASS status
- Run the full test suite -- not just the new test, because a change that makes one test pass while breaking another is not progress. Run tests after every code modification to catch regressions immediately
```bash
go test ./... -v   # Go - all tests
pytest -v          # Python - all tests
npm test           # JavaScript - all tests
```
If the test still fails:
- Review implementation logic
- Check test assertions are correct
- Debug until the test passes
### GREEN Phase Gate
Do NOT proceed to the REFACTOR phase until all of these are true:
- Implementation code is written
- New test has been executed and shows PASS
- Full test suite has been executed
- No other tests have been broken
### Phase 5: Refactor (REFACTOR)
Improve code quality without changing behavior. Run the full test suite before refactoring to establish a green baseline, because you need proof that any future failure was caused by your refactoring, not by a pre-existing issue.
Refactoring decision criteria (evaluate each):
| Criterion | Check | Action if YES |
|---|---|---|
| Duplication | Same logic in 2+ places? | Extract to shared function |
| Naming | Names unclear or misleading? | Rename for clarity |
| Length | Function >20 lines? | Extract sub-functions |
| Complexity | Nested conditionals >2 deep? | Simplify or extract |
| Reusability | Could other code use this? | Extract to module |
- Run full test suite BEFORE refactoring -- establish green baseline
- Refactor incrementally -- extract functions, rename for clarity, remove duplication
- Run tests after EACH refactoring step -- ensure tests stay green after every individual change, because large refactoring batches make it impossible to identify which change broke the test
- Refactor tests too -- improve test readability and maintainability. Suggest better assertions, edge cases, and test organization where they would strengthen coverage
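As an illustration of the Duplication criterion above, a hypothetical Go extraction (all names are placeholders):

```go
import "strings"

// digitsOnly holds the "strip formatting" logic that previously
// appeared inline in two handlers; existing tests keep it covered.
func digitsOnly(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= '0' && r <= '9' {
			b.WriteRune(r)
		}
	}
	return b.String()
}
```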
Test behavior, not implementation details. Tests coupled to internals break on refactoring and defeat its purpose:
Wrong (testing internals):
```go
func TestParser_UsesCorrectRegex(t *testing.T) {
	parser := NewParser()
	// Testing internal regex pattern -- breaks on refactor
	assert.Equal(t, `\d{3}-\d{3}-\d{4}`, parser.phoneRegex)
}
```
Correct (testing behavior):
```go
func TestParser_ValidPhoneNumber_ParsesCorrectly(t *testing.T) {
	parser := NewParser()
	result, err := parser.ParsePhone("123-456-7890")
	assert.NoError(t, err)
	assert.Equal(t, "1234567890", result.Digits())
}

func TestParser_InvalidPhoneNumber_ReturnsError(t *testing.T) {
	parser := NewParser()
	_, err := parser.ParsePhone("invalid")
	assert.Error(t, err)
	assert.Contains(t, err.Error(), "invalid phone format")
}
```
Track which code paths are tested and suggest missing coverage, because untested paths are invisible to the refactoring safety net.
Optional techniques (use when explicitly requested):
- Mutation testing: Verify test quality by introducing bugs -- if mutating code does not break a test, that test is too weak
- Benchmark tests: Performance regression testing to ensure refactoring does not degrade speed (see the sketch after this list)
- Test parallelization: Run independent tests concurrently for speed
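A sketch of a benchmark for the `ParsePhone` example above, using Go's built-in benchmark support (run with `go test -bench=ParsePhone`):

```go
func BenchmarkParsePhone(b *testing.B) {
	parser := NewParser()
	for i := 0; i < b.N; i++ {
		// Benchmark the happy path; b.Fatal stops on unexpected errors.
		if _, err := parser.ParsePhone("123-456-7890"); err != nil {
			b.Fatal(err)
		}
	}
}
```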
### REFACTOR Phase Gate
Do NOT mark the task complete until all of these are true:
- All refactoring changes are saved
- Full test suite has been executed
- ALL tests pass (not just the new one)
- Code quality has been evaluated against the criteria table above
### Phase 6: Commit
Commit the test and implementation together as an atomic unit, because separating them creates a window where the repository is in an inconsistent state -- either tests exist for unimplemented code, or code exists without its test coverage.
- Review changes -- verify test + implementation are complete
- Run full test suite -- ensure nothing broke
- Commit with descriptive message
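A hypothetical commit sequence (file names are placeholders; adapt to your stack):

```bash
go test ./...                      # final green check
git add parser.go parser_test.go   # test + implementation together
git commit -m "Add phone number parsing with format validation"
```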
After committing, clean up any temporary test files, coverage reports, or debug outputs created during the TDD cycle. Keep only files explicitly needed for the project.
Report facts without self-congratulation. Show command output rather than describing it.
## Cycle Discipline
Each feature gets its own RED-GREEN-REFACTOR cycle. Do not batch multiple features into one cycle:
Wrong (implementing everything at once):
```javascript
// Implementing many features at once without tests
class UserManager {
  createUser(data) { /* complex logic */ }
  updateUser(id, data) { /* complex logic */ }
  deleteUser(id) { /* complex logic */ }
  validateUser(user) { /* complex logic */ }
}
// Then one giant test for everything
```
Correct (one cycle per feature):
```javascript
// Cycle 1: Create user (RED -> GREEN -> REFACTOR)
it('should create user with valid data', () => {
  const manager = new UserManager()
  const user = manager.createUser({ name: 'Alice', email: 'alice@example.com' })
  expect(user.id).toBeDefined()
  expect(user.name).toBe('Alice')
})
// Implement createUser() to pass, then move to the next cycle

// Cycle 2: Validate user (RED -> GREEN -> REFACTOR)
it('should reject user with invalid email', () => {
  const manager = new UserManager()
  expect(() => manager.createUser({ name: 'Bob', email: 'invalid' }))
    .toThrow('Invalid email format')
})
// Add validation to make the test pass
```
## Reference Material
### Language-Specific Testing Commands
| Language | Run One Test | Run All | With Coverage |
|---|---|---|---|
| Go | `go test -v -run TestName ./pkg` | `go test ./...` | `go test -cover ./...` |
| Python | `pytest tests/test_file.py::test_fn -v` | `pytest` | `pytest --cov=src` |
| JavaScript | `npm test -- --testNamePattern="name"` | `npm test` | `npm test -- --coverage` |
### Reference Files
- `${CLAUDE_SKILL_DIR}/references/examples.md`: Language-specific TDD examples (Go, Python, JavaScript)
## Error Handling
### Test passes before implementation (RED phase)
Symptom: Test shows PASS in RED phase
Causes:
- Test is testing the wrong thing
- Implementation already exists elsewhere
- Test assertions are too weak (always true)
Solution:
- Review test assertions -- are they specific enough?
- Verify test is actually calling the code under test
- Check for existing implementation of the feature
- Strengthen assertions to actually verify behavior
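For example, a Go sketch of strengthening a weak assertion, assuming the `ParsePhone` example from the REFACTOR phase:

```go
result, err := parser.ParsePhone("123-456-7890")

// Weak: both pass even if ParsePhone returns the wrong digits.
assert.NoError(t, err)
assert.NotNil(t, result)

// Strong: fails unless the parsed behavior is actually correct.
assert.Equal(t, "1234567890", result.Digits())
```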
### Test fails for wrong reason (RED phase)
Symptom: Syntax errors, import errors, setup failures in RED phase
Causes:
- Test setup incomplete
- Missing dependencies
- Incorrect import paths
Solution:
- Fix syntax/import errors first
- Set up necessary fixtures/mocks
- Verify test file structure matches project conventions
- Re-run until test fails for RIGHT reason (missing feature)
### Tests pass but feature does not work
Symptom: Tests green but manual testing shows bugs
Causes:
- Tests do not cover actual usage
- Test mocks do not match real behavior
- Edge cases not tested
Solution:
- Review test coverage -- what is missing?
- Add integration tests alongside unit tests
- Test with real data, not just mocks
- Add edge case tests (empty input, null, extremes)
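A sketch of edge-case tests for the `ParsePhone` example, assuming the parser rejects empty and malformed input:

```go
func TestParsePhone_EmptyInput_ReturnsError(t *testing.T) {
	parser := NewParser()
	_, err := parser.ParsePhone("") // empty-input edge case
	assert.Error(t, err)
}

func TestParsePhone_ExcessDigits_ReturnsError(t *testing.T) {
	parser := NewParser()
	_, err := parser.ParsePhone("123-456-7890-9999") // extreme-length edge case
	assert.Error(t, err)
}
```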
### Refactoring breaks tests
Symptom: Tests fail after refactoring
Causes:
- Tests coupled to implementation details
- Brittle assertions (checking internals not behavior)
- Large refactoring without incremental steps
Solution:
- Test behavior, not implementation details
- Refactor in smaller steps
- Run tests after each micro-refactoring
- Update tests if API contract legitimately changed