# Test-Driven Refactoring

Safely refactor code by establishing test coverage first, making incremental changes, and continuously verifying that behavior is preserved.
## When to Use This Skill
- Refactoring legacy code without adequate tests
- Improving code structure while preserving behavior
- Adding tests to existing code before modification
- Migrating or rewriting systems incrementally
- Reducing risk when changing critical code paths
## Core Principles

### The Refactoring Safety Net
```
┌────────────────────────────────────────────────────────────┐
│                     REFACTORING SAFETY                     │
├────────────────────────────────────────────────────────────┤
│  1. Tests MUST pass before starting                        │
│  2. Tests MUST pass after every change                     │
│  3. If tests fail → REVERT immediately                     │
│  4. Never change tests and code in the same commit         │
└────────────────────────────────────────────────────────────┘
```
### Two Hats Rule

> When refactoring, wear only one hat at a time.
| Hat | Activity | Tests |
|---|---|---|
| 🎩 Refactoring | Change structure, preserve behavior | Tests unchanged, must pass |
| 🧢 Adding Features | Add new behavior | Add new tests |
Never mix these activities in the same commit.
## Test-Driven Refactoring Workflow

### Phase 1: Characterization Testing

Before refactoring, capture current behavior with characterization tests.
```python
# A characterization test captures ACTUAL behavior (even if wrong)
def test_calculate_discount_characterization():
    """Characterization test - documents current behavior."""
    # ARRANGE: set up the scenario
    order = Order(total=150, customer=Customer(is_premium=True))

    # ACT: call the code under test
    result = calculate_discount(order)

    # ASSERT: document what it actually does.
    # We are not asserting what it SHOULD do, but what it CURRENTLY does.
    assert result == 22.5  # 15% discount observed
```
#### Writing Characterization Tests

1. Identify the code to refactor.
2. Write a test that calls it.
3. Use a placeholder assertion (`assert result == "PLACEHOLDER"`).
4. Run the test and observe the actual output.
5. Update the assertion with the actual value.
6. Repeat for different inputs and edge cases.
```python
def test_discover_actual_behavior():
    """Step-by-step characterization test creation."""
    # Step 1: call the code
    result = mystery_function(42, "test")

    # Step 2: first run with a placeholder to see the output
    # assert result == "PLACEHOLDER"  # Will fail, showing the actual value

    # Step 3: update with the observed value
    assert result == {"value": 42, "name": "test", "processed": True}
```
### Phase 2: Coverage Analysis

Ensure adequate coverage before refactoring.

```bash
# Check current coverage
uv run pytest --cov=src/module_to_refactor --cov-report=term-missing

# Generate an HTML report for a detailed view
uv run pytest --cov=src/module_to_refactor --cov-report=html

# Target: 80%+ coverage on code being refactored
```
#### Coverage Targets
| Coverage Level | Risk Assessment |
|---|---|
| 90%+ | ✅ Low risk - safe to refactor |
| 80-89% | ⚠️ Moderate risk - add more tests for uncovered paths |
| 70-79% | 🟠 Higher risk - significant gaps to fill |
| <70% | 🔴 High risk - write characterization tests first |
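The coverage target can also be enforced automatically, so a run fails whenever the module being refactored drops below the threshold. A minimal sketch, assuming pytest with the pytest-cov plugin and the `src/module_to_refactor` path from the commands above:

```ini
# pytest.ini - fail the run when coverage drops below the target
[pytest]
addopts = --cov=src/module_to_refactor --cov-fail-under=80
```

With this in place, a plain `uv run pytest` acts as a coverage gate during the refactoring session.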
### Phase 3: Incremental Refactoring

Make small, safe changes with continuous verification.
```
                    REFACTORING LOOP

 ┌─────────┐     ┌─────────┐     ┌─────────┐
 │  Make   │────▶│   Run   │────▶│  Tests  │
 │  Small  │     │  Tests  │     │  Pass?  │
 │ Change  │     │         │     │         │
 └─────────┘     └─────────┘     └────┬────┘
      ▲                               │
      │                    ┌──────────┴──────────┐
      │                   YES                    NO
      │                    │                     │
      │                    ▼                     ▼
 ┌─────────┐          ┌─────────┐           ┌─────────┐
 │  Next   │◀─────────│ Commit  │           │ REVERT  │
 │ Change  │          │         │           │         │
 └─────────┘          └─────────┘           └─────────┘
```
#### Refactoring Commands

```bash
# Before each change
uv run pytest -v  # Verify green

# After each change
uv run pytest -v  # Must stay green

# If tests pass
git add -A && git commit -m "refactor: extract validation logic"

# If tests fail
git checkout -- .  # Revert all changes
```
### Phase 4: Verification

Confirm the refactoring succeeded without behavior changes.

```bash
# Full test suite
uv run pytest -v

# Type checking
uv run mypy src/

# Linting
uv run ruff check src/

# Coverage comparison (should not decrease)
uv run pytest --cov=src/ --cov-report=term-missing
```
## Characterization Test Patterns

### Pattern 1: Golden Master Testing

Capture complex output as a "golden master" to compare against.
```python
import json
from pathlib import Path

import pytest

def test_report_generation_golden_master():
    """Compare output against a saved golden master."""
    # Generate current output (deterministic serialization)
    report = generate_report(sample_data)
    current_output = json.dumps(report, indent=2, sort_keys=True)

    golden_path = Path("tests/golden/report_output.json")
    if not golden_path.exists():
        # First run: create the golden master
        golden_path.parent.mkdir(parents=True, exist_ok=True)
        golden_path.write_text(current_output)
        pytest.skip("Golden master created - run again to verify")

    # Compare against the golden master
    expected = golden_path.read_text()
    assert current_output == expected, "Output differs from golden master"
```
### Pattern 2: Approval Testing

Use an approval-testing library for complex outputs.
```python
from approvaltests import verify_as_json

def test_complex_transformation():
    """Use approval testing for complex outputs."""
    result = complex_transformation(input_data)

    # First run fails and writes a .received file;
    # review it and rename to .approved to accept
    verify_as_json(result)
```
### Pattern 3: Parameterized Characterization

Capture multiple scenarios efficiently.
```python
import pytest

# Behaviors discovered from production/logs
OBSERVED_BEHAVIORS = [
    # (input, expected_output)
    ({"amount": 100, "type": "A"}, 95.0),
    ({"amount": 100, "type": "B"}, 90.0),
    ({"amount": 0, "type": "A"}, 0.0),
    ({"amount": -50, "type": "A"}, -47.5),  # Note: negatives allowed!
]

@pytest.mark.parametrize("input_data,expected", OBSERVED_BEHAVIORS)
def test_calculate_characterization(input_data, expected):
    """Characterization tests from observed behavior."""
    result = calculate(input_data["amount"], input_data["type"])
    assert result == expected
```
### Pattern 4: Seam Testing

Test at "seams" - points where you can alter behavior for testing.

```python
# Original tightly coupled code
class OrderProcessor:
    def process(self, order):
        # Hard to test - creates its own dependencies
        db = Database()
        email = EmailService()
        ...
```

```python
from unittest.mock import Mock

# Refactored with seams
class OrderProcessor:
    def __init__(self, db=None, email=None):
        # Seam: test doubles can be injected
        self.db = db or Database()
        self.email = email or EmailService()

    def process(self, order):
        ...

# Test using the seam
def test_order_processing():
    mock_db = Mock()
    mock_email = Mock()
    processor = OrderProcessor(db=mock_db, email=mock_email)

    processor.process(sample_order)

    mock_db.save.assert_called_once()
```
## Safe Refactoring Techniques

### Technique 1: Parallel Change (Expand-Contract)

Add the new implementation alongside the old, then switch over gradually.
```python
import logging

logger = logging.getLogger(__name__)

# Step 1: add the new implementation alongside the old
class Calculator:
    def calculate_legacy(self, x, y):
        """Original implementation."""
        return x + y + 1  # Bug: adds 1

    def calculate_new(self, x, y):
        """New, correct implementation."""
        return x + y

    def calculate(self, x, y):
        """Router - can switch implementations."""
        # Step 2: run both, compare results (shadow mode)
        old_result = self.calculate_legacy(x, y)
        new_result = self.calculate_new(x, y)
        if old_result != new_result:
            logger.warning(f"Results differ: {old_result} vs {new_result}")

        # Step 3: return the old result while validating
        return old_result
        # Step 4: after validation, switch to the new one
        # return new_result
```
### Technique 2: Branch by Abstraction

Introduce an abstraction layer to enable safe switching.
```python
from abc import ABC, abstractmethod

# Step 1: create an abstraction
class PaymentProcessor(ABC):
    @abstractmethod
    def process(self, amount: float) -> bool:
        ...

# Step 2: wrap the legacy code
class LegacyPaymentProcessor(PaymentProcessor):
    def __init__(self):
        self._legacy = OldPaymentSystem()

    def process(self, amount: float) -> bool:
        return self._legacy.do_payment(amount)

# Step 3: create the new implementation
class ModernPaymentProcessor(PaymentProcessor):
    def process(self, amount: float) -> bool:
        # New implementation
        ...

# Step 4: switch via configuration
def get_payment_processor() -> PaymentProcessor:
    if config.use_new_payment:
        return ModernPaymentProcessor()
    return LegacyPaymentProcessor()
```
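Because the pattern code relies on external pieces (`OldPaymentSystem`, `config`), here is a self-contained sketch with stub implementations (the stub bodies and the `use_new_payment` flag are hypothetical) showing the switch mechanics end to end:

```python
from abc import ABC, abstractmethod

class PaymentProcessor(ABC):
    @abstractmethod
    def process(self, amount: float) -> bool: ...

class LegacyPaymentProcessor(PaymentProcessor):
    def process(self, amount: float) -> bool:
        return amount > 0  # stands in for the wrapped legacy call

class ModernPaymentProcessor(PaymentProcessor):
    def process(self, amount: float) -> bool:
        return amount > 0

def get_payment_processor(use_new_payment: bool) -> PaymentProcessor:
    # The feature flag is the single switch point
    return ModernPaymentProcessor() if use_new_payment else LegacyPaymentProcessor()

# Both sides satisfy the same contract, so callers never change
assert isinstance(get_payment_processor(True), ModernPaymentProcessor)
assert isinstance(get_payment_processor(False), LegacyPaymentProcessor)
assert get_payment_processor(True).process(25.0) is True
```

Flipping the flag back is the rollback strategy: no caller ever imports a concrete processor directly.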
### Technique 3: Strangler Fig Pattern

Gradually replace the legacy system by routing requests to new code.
```python
class LegacyRouter:
    """Routes requests to the legacy or new implementation."""

    def __init__(self):
        self.legacy = LegacySystem()
        self.new = NewSystem()
        # Migration configuration
        self.migrated_routes = {
            "get_user": True,       # Migrated
            "update_user": True,    # Migrated
            "delete_user": False,   # Still legacy
            "create_order": False,  # Still legacy
        }

    def handle(self, route: str, **kwargs):
        if self.migrated_routes.get(route, False):
            return self.new.handle(route, **kwargs)
        return self.legacy.handle(route, **kwargs)
```
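The routing behavior can be exercised directly with stand-in systems. In this self-contained sketch, `LegacySystem` and `NewSystem` are hypothetical stubs that merely tag their answers:

```python
# Hypothetical stand-ins for the real systems
class LegacySystem:
    def handle(self, route, **kwargs):
        return ("legacy", route)

class NewSystem:
    def handle(self, route, **kwargs):
        return ("new", route)

class LegacyRouter:
    def __init__(self):
        self.legacy = LegacySystem()
        self.new = NewSystem()
        # Routes flip to True one at a time as they are migrated
        self.migrated_routes = {"get_user": True, "create_order": False}

    def handle(self, route, **kwargs):
        if self.migrated_routes.get(route, False):
            return self.new.handle(route, **kwargs)
        return self.legacy.handle(route, **kwargs)

router = LegacyRouter()
assert router.handle("get_user", user_id=1) == ("new", "get_user")
assert router.handle("create_order") == ("legacy", "create_order")
assert router.handle("unknown_route") == ("legacy", "unknown_route")  # unlisted routes default to legacy
```

Defaulting unknown routes to the legacy side keeps the migration safe: a route only moves after it is explicitly flipped.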
## Test Fixtures for Refactoring

### Fixture: Snapshot Current State
```python
import pytest
from dataclasses import asdict, is_dataclass

@pytest.fixture
def snapshot_state():
    """Capture state before and after operations."""
    snapshots = {"before": None, "after": None}

    def capture(label, obj):
        if is_dataclass(obj):
            snapshots[label] = asdict(obj)
        elif hasattr(obj, "__dict__"):
            snapshots[label] = obj.__dict__.copy()
        else:
            snapshots[label] = obj

    # Expose the snapshots so tests can assert state is unchanged
    capture.snapshots = snapshots
    yield capture
```
### Fixture: Behavior Recorder
```python
import functools

import pytest

@pytest.fixture
def behavior_recorder():
    """Record function calls and returns for characterization."""
    recordings = []

    def record(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            recordings.append({
                "function": func.__name__,
                "args": args,
                "kwargs": kwargs,
                "result": result,
            })
            return result
        return wrapper

    record.get_recordings = lambda: recordings
    return record
```
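Outside of pytest, the same recording logic can be exercised directly. In this self-contained sketch, `legacy_price` and its discount rates are hypothetical stand-ins for code being characterized:

```python
recordings = []

def record(func):
    # Same wrapper logic as the fixture, without pytest
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        recordings.append({
            "function": func.__name__,
            "args": args,
            "kwargs": kwargs,
            "result": result,
        })
        return result
    return wrapper

@record
def legacy_price(amount, kind):
    # Hypothetical legacy logic being characterized
    return amount * (0.95 if kind == "A" else 0.90)

legacy_price(100, "A")
legacy_price(100, "B")

# Each recording can be copied into a parameterized characterization test
assert recordings[0] == {"function": "legacy_price", "args": (100, "A"),
                         "kwargs": {}, "result": 95.0}
assert recordings[1]["result"] == 90.0
```

The recorded dicts map directly onto the `OBSERVED_BEHAVIORS` table used in Pattern 3.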
## Checklist: Before Refactoring

```markdown
## Pre-Refactoring Checklist
### Test Coverage
- [ ] Coverage report generated for target code
- [ ] Coverage >= 80% on code being refactored
- [ ] All critical paths have tests
- [ ] Edge cases identified and tested
### Characterization Tests
- [ ] Current behavior captured in tests
- [ ] Tests document actual behavior (not intended)
- [ ] Golden masters created for complex outputs
- [ ] Parameterized tests for multiple scenarios
### Environment
- [ ] All tests passing (green baseline)
- [ ] Clean git working directory
- [ ] Feature branch created
- [ ] CI/CD pipeline configured
### Plan
- [ ] Refactoring steps identified
- [ ] Each step is small and reversible
- [ ] Commit points planned
- [ ] Rollback strategy defined
```

## Checklist: During Refactoring

```markdown
## Refactoring Session Checklist
### Each Change
- [ ] Tests still passing before change
- [ ] Made ONE small change
- [ ] Ran tests immediately after
- [ ] Tests still passing
- [ ] Committed with descriptive message
### If Tests Fail
- [ ] STOPPED immediately
- [ ] Reverted changes (git checkout -- .)
- [ ] Analyzed what went wrong
- [ ] Planned smaller step
### Periodically
- [ ] Type checking passing (mypy)
- [ ] Linting passing (ruff)
- [ ] Coverage not decreased
```

## Resources
- *Working Effectively with Legacy Code* - Michael Feathers
- *Refactoring* - Martin Fowler
- ApprovalTests library
## Guidelines
- Always establish test coverage before refactoring
- Make the smallest possible change that improves the code
- Run tests after every change, no exceptions
- If tests fail, revert immediately; don't debug while refactoring
- Commit after each successful refactoring step
- Never change behavior and structure in the same commit
- Use characterization tests to document, not validate, behavior
- Preserve behavior first, then fix bugs in separate commits