# Test-Driven Refactoring

Safely refactor code by establishing test coverage first, making incremental changes, and continuously verifying that behavior is preserved.
## When to Use This Skill
- Refactoring legacy code without adequate tests
- Improving code structure while preserving behavior
- Adding tests to existing code before modification
- Migrating or rewriting systems incrementally
- Reducing risk when changing critical code paths
## Core Principles

### The Refactoring Safety Net
```
┌────────────────────────────────────────────────────────────┐
│                     REFACTORING SAFETY                     │
├────────────────────────────────────────────────────────────┤
│  1. Tests MUST pass before starting                        │
│  2. Tests MUST pass after every change                     │
│  3. If tests fail → REVERT immediately                     │
│  4. Never change tests and code in the same commit         │
└────────────────────────────────────────────────────────────┘
```
### Two Hats Rule

> When refactoring, wear only one hat at a time.
| Hat | Activity | Tests |
|---|---|---|
| 🎩 Refactoring | Change structure, preserve behavior | Tests unchanged, must pass |
| 🧢 Adding Features | Add new behavior | Add new tests |
Never mix these activities in the same commit.
## Test-Driven Refactoring Workflow

### Phase 1: Characterization Testing

Before refactoring, capture current behavior with characterization tests.
```python
# A characterization test captures ACTUAL behavior (even if wrong)
def test_calculate_discount_characterization():
    """Characterization test - documents current behavior."""
    # ARRANGE: set up the scenario
    order = Order(total=150, customer=Customer(is_premium=True))

    # ACT: call the code under test
    result = calculate_discount(order)

    # ASSERT: document what it actually does.
    # We are not asserting what it SHOULD do, but what it CURRENTLY does.
    assert result == 22.5  # 15% discount observed
```
#### Writing Characterization Tests

1. Identify the code to refactor.
2. Write a test that calls it.
3. Use a placeholder assertion (`assert result == "PLACEHOLDER"`).
4. Run the test and observe the actual output.
5. Update the assertion with the actual value.
6. Repeat for different inputs and edge cases.
```python
def test_discover_actual_behavior():
    """Step-by-step characterization test creation."""
    # Step 1: call the code
    result = mystery_function(42, "test")

    # Step 2: first run with a placeholder to see the output
    # assert result == "PLACEHOLDER"  # Will fail, showing the actual value

    # Step 3: update with the observed value
    assert result == {"value": 42, "name": "test", "processed": True}
```
### Phase 2: Coverage Analysis

Ensure adequate coverage before refactoring.

```bash
# Check current coverage
uv run pytest --cov=src/module_to_refactor --cov-report=term-missing

# Generate an HTML report for a detailed view
uv run pytest --cov=src/module_to_refactor --cov-report=html

# Target: 80%+ coverage on code being refactored
```
#### Coverage Targets
| Coverage Level | Risk Assessment |
|---|---|
| 90%+ | ✅ Low risk - safe to refactor |
| 80-89% | ⚠️ Moderate risk - add more tests for uncovered paths |
| 70-79% | 🟠 Higher risk - significant gaps to fill |
| <70% | 🔴 High risk - write characterization tests first |
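The coverage target can also be enforced automatically, so a run fails whenever the module being refactored drops below the threshold. A minimal sketch, assuming pytest with the pytest-cov plugin and the `src/module_to_refactor` path from the commands above:

```ini
# pytest.ini - fail the run when coverage drops below the target
[pytest]
addopts = --cov=src/module_to_refactor --cov-fail-under=80
```

With this in place, a plain `uv run pytest` acts as a coverage gate during the refactoring session.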
### Phase 3: Incremental Refactoring

Make small, safe changes with continuous verification.
```
                    REFACTORING LOOP

 ┌─────────┐     ┌─────────┐     ┌─────────┐
 │  Make   │────▶│   Run   │────▶│  Tests  │
 │  Small  │     │  Tests  │     │  Pass?  │
 │ Change  │     │         │     │         │
 └─────────┘     └─────────┘     └────┬────┘
      ▲                               │
      │                    ┌──────────┴──────────┐
      │                   YES                    NO
      │                    │                     │
      │                    ▼                     ▼
 ┌─────────┐          ┌─────────┐           ┌─────────┐
 │  Next   │◀─────────│ Commit  │           │ REVERT  │
 │ Change  │          │         │           │         │
 └─────────┘          └─────────┘           └─────────┘
```
#### Refactoring Commands

```bash
# Before each change
uv run pytest -v  # Verify green

# After each change
uv run pytest -v  # Must stay green

# If tests pass
git add -A && git commit -m "refactor: extract validation logic"

# If tests fail
git checkout -- .  # Revert all changes
```
### Phase 4: Verification

Confirm the refactoring succeeded without behavior changes.

```bash
# Full test suite
uv run pytest -v

# Type checking
uv run mypy src/

# Linting
uv run ruff check src/

# Coverage comparison (should not decrease)
uv run pytest --cov=src/ --cov-report=term-missing
```
## Characterization Test Patterns

### Pattern 1: Golden Master Testing

Capture complex output as a "golden master" to compare against.
```python
import json
from pathlib import Path

import pytest

def test_report_generation_golden_master():
    """Compare output against a saved golden master."""
    # Generate current output (deterministic serialization)
    report = generate_report(sample_data)
    current_output = json.dumps(report, indent=2, sort_keys=True)

    golden_path = Path("tests/golden/report_output.json")
    if not golden_path.exists():
        # First run: create the golden master
        golden_path.parent.mkdir(parents=True, exist_ok=True)
        golden_path.write_text(current_output)
        pytest.skip("Golden master created - run again to verify")

    # Compare against the golden master
    expected = golden_path.read_text()
    assert current_output == expected, "Output differs from golden master"
```
### Pattern 2: Approval Testing

Use an approval-testing library for complex outputs.
```python
from approvaltests import verify_as_json

def test_complex_transformation():
    """Use approval testing for complex outputs."""
    result = complex_transformation(input_data)

    # First run fails and writes a .received file;
    # review it and rename to .approved to accept
    verify_as_json(result)
```
### Pattern 3: Parameterized Characterization

Capture multiple scenarios efficiently.
```python
import pytest

# Behaviors discovered from production/logs
OBSERVED_BEHAVIORS = [
    # (input, expected_output)
    ({"amount": 100, "type": "A"}, 95.0),
    ({"amount": 100, "type": "B"}, 90.0),
    ({"amount": 0, "type": "A"}, 0.0),
    ({"amount": -50, "type": "A"}, -47.5),  # Note: negatives allowed!
]

@pytest.mark.parametrize("input_data,expected", OBSERVED_BEHAVIORS)
def test_calculate_characterization(input_data, expected):
    """Characterization tests from observed behavior."""
    result = calculate(input_data["amount"], input_data["type"])
    assert result == expected
```
### Pattern 4: Seam Testing

Test at "seams" - points where you can alter behavior for testing.

```python
# Original tightly coupled code
class OrderProcessor:
    def process(self, order):
        # Hard to test - creates its own dependencies
        db = Database()
        email = EmailService()
        ...
```

```python
from unittest.mock import Mock

# Refactored with seams
class OrderProcessor:
    def __init__(self, db=None, email=None):
        # Seam: test doubles can be injected
        self.db = db or Database()
        self.email = email or EmailService()

    def process(self, order):
        ...

# Test using the seam
def test_order_processing():
    mock_db = Mock()
    mock_email = Mock()
    processor = OrderProcessor(db=mock_db, email=mock_email)

    processor.process(sample_order)

    mock_db.save.assert_called_once()
```
## Safe Refactoring Techniques

### Technique 1: Parallel Change (Expand-Contract)

Add the new implementation alongside the old, then switch over gradually.
```python
import logging

logger = logging.getLogger(__name__)

# Step 1: add the new implementation alongside the old
class Calculator:
    def calculate_legacy(self, x, y):
        """Original implementation."""
        return x + y + 1  # Bug: adds 1

    def calculate_new(self, x, y):
        """New, correct implementation."""
        return x + y

    def calculate(self, x, y):
        """Router - can switch implementations."""
        # Step 2: run both, compare results (shadow mode)
        old_result = self.calculate_legacy(x, y)
        new_result = self.calculate_new(x, y)
        if old_result != new_result:
            logger.warning(f"Results differ: {old_result} vs {new_result}")

        # Step 3: return the old result while validating
        return old_result
        # Step 4: after validation, switch to the new one
        # return new_result
```
### Technique 2: Branch by Abstraction

Introduce an abstraction layer to enable safe switching.
```python
from abc import ABC, abstractmethod

# Step 1: create an abstraction
class PaymentProcessor(ABC):
    @abstractmethod
    def process(self, amount: float) -> bool:
        ...

# Step 2: wrap the legacy code
class LegacyPaymentProcessor(PaymentProcessor):
    def __init__(self):
        self._legacy = OldPaymentSystem()

    def process(self, amount: float) -> bool:
        return self._legacy.do_payment(amount)

# Step 3: create the new implementation
class ModernPaymentProcessor(PaymentProcessor):
    def process(self, amount: float) -> bool:
        # New implementation
        ...

# Step 4: switch via configuration
def get_payment_processor() -> PaymentProcessor:
    if config.use_new_payment:
        return ModernPaymentProcessor()
    return LegacyPaymentProcessor()
```
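Because the pattern code relies on external pieces (`OldPaymentSystem`, `config`), here is a self-contained sketch with stub implementations (the stub bodies and the `use_new_payment` flag are hypothetical) showing the switch mechanics end to end:

```python
from abc import ABC, abstractmethod

class PaymentProcessor(ABC):
    @abstractmethod
    def process(self, amount: float) -> bool: ...

class LegacyPaymentProcessor(PaymentProcessor):
    def process(self, amount: float) -> bool:
        return amount > 0  # stands in for the wrapped legacy call

class ModernPaymentProcessor(PaymentProcessor):
    def process(self, amount: float) -> bool:
        return amount > 0

def get_payment_processor(use_new_payment: bool) -> PaymentProcessor:
    # The feature flag is the single switch point
    return ModernPaymentProcessor() if use_new_payment else LegacyPaymentProcessor()

# Both sides satisfy the same contract, so callers never change
assert isinstance(get_payment_processor(True), ModernPaymentProcessor)
assert isinstance(get_payment_processor(False), LegacyPaymentProcessor)
assert get_payment_processor(True).process(25.0) is True
```

Flipping the flag back is the rollback strategy: no caller ever imports a concrete processor directly.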
### Technique 3: Strangler Fig Pattern

Gradually replace the legacy system by routing requests to new code.
```python
class LegacyRouter:
    """Routes requests to the legacy or new implementation."""

    def __init__(self):
        self.legacy = LegacySystem()
        self.new = NewSystem()
        # Migration configuration
        self.migrated_routes = {
            "get_user": True,       # Migrated
            "update_user": True,    # Migrated
            "delete_user": False,   # Still legacy
            "create_order": False,  # Still legacy
        }

    def handle(self, route: str, **kwargs):
        if self.migrated_routes.get(route, False):
            return self.new.handle(route, **kwargs)
        return self.legacy.handle(route, **kwargs)
```
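The routing behavior can be exercised directly with stand-in systems. In this self-contained sketch, `LegacySystem` and `NewSystem` are hypothetical stubs that merely tag their answers:

```python
# Hypothetical stand-ins for the real systems
class LegacySystem:
    def handle(self, route, **kwargs):
        return ("legacy", route)

class NewSystem:
    def handle(self, route, **kwargs):
        return ("new", route)

class LegacyRouter:
    def __init__(self):
        self.legacy = LegacySystem()
        self.new = NewSystem()
        # Routes flip to True one at a time as they are migrated
        self.migrated_routes = {"get_user": True, "create_order": False}

    def handle(self, route, **kwargs):
        if self.migrated_routes.get(route, False):
            return self.new.handle(route, **kwargs)
        return self.legacy.handle(route, **kwargs)

router = LegacyRouter()
assert router.handle("get_user", user_id=1) == ("new", "get_user")
assert router.handle("create_order") == ("legacy", "create_order")
assert router.handle("unknown_route") == ("legacy", "unknown_route")  # unlisted routes default to legacy
```

Defaulting unknown routes to the legacy side keeps the migration safe: a route only moves after it is explicitly flipped.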
## Test Fixtures for Refactoring

### Fixture: Snapshot Current State
```python
import pytest
from dataclasses import asdict, is_dataclass

@pytest.fixture
def snapshot_state():
    """Capture state before and after operations."""
    snapshots = {"before": None, "after": None}

    def capture(label, obj):
        if is_dataclass(obj):
            snapshots[label] = asdict(obj)
        elif hasattr(obj, "__dict__"):
            snapshots[label] = obj.__dict__.copy()
        else:
            snapshots[label] = obj

    # Expose the snapshots so tests can assert state is unchanged
    capture.snapshots = snapshots
    yield capture
```
### Fixture: Behavior Recorder
```python
import functools

import pytest

@pytest.fixture
def behavior_recorder():
    """Record function calls and returns for characterization."""
    recordings = []

    def record(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            recordings.append({
                "function": func.__name__,
                "args": args,
                "kwargs": kwargs,
                "result": result,
            })
            return result
        return wrapper

    record.get_recordings = lambda: recordings
    return record
```
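Outside of pytest, the same recording logic can be exercised directly. In this self-contained sketch, `legacy_price` and its discount rates are hypothetical stand-ins for code being characterized:

```python
recordings = []

def record(func):
    # Same wrapper logic as the fixture, without pytest
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        recordings.append({
            "function": func.__name__,
            "args": args,
            "kwargs": kwargs,
            "result": result,
        })
        return result
    return wrapper

@record
def legacy_price(amount, kind):
    # Hypothetical legacy logic being characterized
    return amount * (0.95 if kind == "A" else 0.90)

legacy_price(100, "A")
legacy_price(100, "B")

# Each recording can be copied into a parameterized characterization test
assert recordings[0] == {"function": "legacy_price", "args": (100, "A"),
                         "kwargs": {}, "result": 95.0}
assert recordings[1]["result"] == 90.0
```

The recorded dicts map directly onto the `OBSERVED_BEHAVIORS` table used in Pattern 3.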
## Checklist: Before Refactoring

```markdown
## Pre-Refactoring Checklist
### Test Coverage
- [ ] Coverage report generated for target code
- [ ] Coverage >= 80% on code being refactored
- [ ] All critical paths have tests
- [ ] Edge cases identified and tested
### Characterization Tests
- [ ] Current behavior captured in tests
- [ ] Tests document actual behavior (not intended)
- [ ] Golden masters created for complex outputs
- [ ] Parameterized tests for multiple scenarios
### Environment
- [ ] All tests passing (green baseline)
- [ ] Clean git working directory
- [ ] Feature branch created
- [ ] CI/CD pipeline configured
### Plan
- [ ] Refactoring steps identified
- [ ] Each step is small and reversible
- [ ] Commit points planned
- [ ] Rollback strategy defined
```

## Checklist: During Refactoring

```markdown
## Refactoring Session Checklist
### Each Change
- [ ] Tests still passing before change
- [ ] Made ONE small change
- [ ] Ran tests immediately after
- [ ] Tests still passing
- [ ] Committed with descriptive message
### If Tests Fail
- [ ] STOPPED immediately
- [ ] Reverted changes (git checkout -- .)
- [ ] Analyzed what went wrong
- [ ] Planned smaller step
### Periodically
- [ ] Type checking passing (mypy)
- [ ] Linting passing (ruff)
- [ ] Coverage not decreased
```

## Resources
- *Working Effectively with Legacy Code* - Michael Feathers
- *Refactoring* - Martin Fowler
- ApprovalTests library
## Guidelines
- Always establish test coverage before refactoring
- Make the smallest possible change that improves the code
- Run tests after every change, no exceptions
- If tests fail, revert immediately; don't debug while refactoring
- Commit after each successful refactoring step
- Never change behavior and structure in the same commit
- Use characterization tests to document, not validate, behavior
- Preserve behavior first, then fix bugs in separate commits