Systematic Debugging Skill

Overview

This skill provides a structured four-phase debugging framework emphasizing root cause discovery before attempting fixes. Core principle: "Random fixes waste time and create new bugs. Quick patches mask underlying issues."

Quick Start

1. Investigate: gather evidence and reproduce the issue consistently
2. Analyze: compare against working patterns
3. Hypothesize: form and test specific theories
4. Implement: fix the root cause with test coverage

When to Use

Bug reports requiring investigation
Test failures with unclear causes
Production incidents
Performance regressions
Integration failures
Any debugging that requires more than 5 minutes

The Four Phases

Phase 1: Root Cause Investigation

Objective: Understand the problem completely before attempting any fix.

Steps:

Examine error messages thoroughly
Reproduce the issue consistently
Review recent changes (commits, configs, dependencies)
Gather diagnostic evidence (logs, traces, metrics)
For multi-component systems, add instrumentation at each boundary

Questions to answer:

What exactly is failing?
When did it start failing?
What changed recently?
Can I reproduce it reliably?
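
The question "Can I reproduce it reliably?" can be answered with a number rather than a feeling. The sketch below is a minimal illustration, not part of the skill itself: `reproduction_rate` and `flaky_example` are hypothetical names, and the simulated failure stands in for whatever repro steps apply to the real bug.

```python
from collections import Counter


def reproduction_rate(repro, attempts=20):
    """Run a repro callable repeatedly and tally passes and exception types."""
    outcomes = Counter()
    for _ in range(attempts):
        try:
            repro()
            outcomes["pass"] += 1
        except Exception as exc:  # record the failure type, keep going
            outcomes[type(exc).__name__] += 1
    return outcomes


def flaky_example():
    """Hypothetical stand-in for a real repro: fails on every fourth call."""
    flaky_example.calls += 1
    if flaky_example.calls % 4 == 0:
        raise ValueError("simulated failure")


flaky_example.calls = 0

print(reproduction_rate(flaky_example, attempts=20))
# Counter({'pass': 15, 'ValueError': 5})
```

A failure rate well below 100% suggests the repro steps are still missing a condition, which is worth resolving before moving to Phase 2.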

Phase 2: Pattern Analysis

Objective: Find working examples and understand differences.

Steps:

Locate working examples in the codebase
Compare the broken code against reference implementations in full, not only the parts that look relevant
Identify differences systematically
Understand all dependencies

Key comparisons:

Working vs. broken code paths
Expected vs. actual behavior
Known good state vs. current state
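
The "known good state vs. current state" comparison can be made mechanical when both states are available as key/value data such as configuration or environment settings. This is a sketch under that assumption; `diff_state` and the sample settings are hypothetical.

```python
def diff_state(known_good, current):
    """Return {key: (good_value, current_value)} for every key that differs."""
    missing = object()  # sentinel so that a stored None is not treated as absent
    differences = {}
    for key in sorted(known_good.keys() | current.keys()):
        good = known_good.get(key, missing)
        now = current.get(key, missing)
        if good != now:
            differences[key] = (
                "<absent>" if good is missing else good,
                "<absent>" if now is missing else now,
            )
    return differences


# Hypothetical settings, for illustration only.
known_good = {"timeout": 30, "retries": 3, "tls": True}
current = {"timeout": 5, "retries": 3, "pool": 10}

for key, (good, now) in diff_state(known_good, current).items():
    print(f"{key}: known good={good!r} current={now!r}")
```

Each reported difference is a candidate for a Phase 3 hypothesis; keys that match can be ruled out.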

Phase 3: Hypothesis and Testing

Objective: Form and validate theories before changing code.

Steps:

Formulate a specific hypothesis
Design a test for the hypothesis
Test with minimal changes (one variable at a time)
Verify results before proceeding

Hypothesis format: "The bug occurs because [condition] when [trigger], which causes [symptom]."
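
One way to keep hypotheses in this format is to record them as structured data, so each one has all three parts and an explicit test outcome. The `Hypothesis` class and the example values below are hypothetical, shown only as a sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    condition: str
    trigger: str
    symptom: str
    confirmed: Optional[bool] = None  # None until the hypothesis has been tested

    def statement(self) -> str:
        """Render the hypothesis in the format given above."""
        return (
            f"The bug occurs because {self.condition} when {self.trigger}, "
            f"which causes {self.symptom}."
        )


# Illustrative values only; not taken from a real incident.
h = Hypothesis(
    condition="the cache key omits the tenant ID",
    trigger="two tenants request the same report",
    symptom="one tenant seeing the other's data",
)
print(h.statement())
```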

Phase 4: Implementation

Objective: Fix the root cause with proper verification.

Steps:

Create a failing test case reproducing the bug
Implement a single fix addressing the root cause
Verify the test passes
Verify no other tests broke
Document the fix
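
The first four steps can be sketched with Python's `unittest`. The function `mean_latency`, its bug (division by zero on an empty list), and the choice to return 0.0 for empty input are all assumptions made for illustration.

```python
import unittest


def mean_latency(samples):
    """Hypothetical function under repair: average of latency samples."""
    if not samples:  # the single fix: guard the empty-input root cause
        return 0.0
    return sum(samples) / len(samples)


class MeanLatencyRegressionTest(unittest.TestCase):
    def test_empty_samples_do_not_crash(self):
        # Written first; before the fix this raised ZeroDivisionError.
        self.assertEqual(mean_latency([]), 0.0)

    def test_existing_behaviour_unchanged(self):
        # Guards against the fix breaking the normal path.
        self.assertEqual(mean_latency([10, 20, 30]), 20.0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(MeanLatencyRegressionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Running the first test before applying the fix, and seeing it fail, confirms that the test actually reproduces the bug.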

Critical Safeguards

Hard Stop Rule

If three or more fixes fail: STOP and question the architecture.

Repeated failed fixes usually point to a deeper structural problem that needs discussion rather than continued symptom-patching.
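
The rule is easy to lose track of mid-session, so it can help to count failed fixes explicitly. The sketch below assumes a hypothetical `FixTracker` helper; the limit of three comes from the rule above.

```python
class ArchitectureReviewNeeded(RuntimeError):
    """Raised when the hard stop rule is triggered."""


class FixTracker:
    def __init__(self, limit=3):
        self.limit = limit
        self.failed = []

    def record_failure(self, description):
        """Log a failed fix; stop the session once the limit is reached."""
        self.failed.append(description)
        if len(self.failed) >= self.limit:
            raise ArchitectureReviewNeeded(
                f"{len(self.failed)} fixes failed: {', '.join(self.failed)}. "
                "Stop and question the architecture."
            )


# Illustrative fix descriptions only.
tracker = FixTracker()
tracker.record_failure("increase timeout")
tracker.record_failure("add retry")
try:
    tracker.record_failure("clear cache on startup")
except ArchitectureReviewNeeded as stop:
    print(stop)
```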

Red Flags (Restart Process)

Proposing solutions before investigation
Attempting multiple simultaneous fixes
Assuming without verification
Skipping the reproduction step
"It should work" without evidence

Debugging Anti-Patterns

Anti-Pattern	Problem	Correct Approach
Shotgun debugging	Random changes hoping something works	Systematic investigation
Printf debugging only	Incomplete picture	Structured instrumentation
Blame the framework	Avoids understanding	Verify framework behavior
"Works on my machine"	Environment assumptions	Document exact repro steps
Quick patch	Hides root cause	Find and fix actual cause

Instrumentation Strategies

Logging Strategy

1. Entry/exit of suspected functions
2. Input/output values at boundaries
3. State changes at key points
4. Timing information for performance issues
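
Items 1, 2, and 4 can be covered by a single decorator built on the standard `logging` module. This is a minimal sketch: the `traced` decorator, the logger name, and the `scale` function are hypothetical, and state changes inside a function (item 3) would still need explicit log calls.

```python
import functools
import logging
import time

logger = logging.getLogger("debug.trace")  # logger name is an arbitrary choice


def traced(func):
    """Log entry, exit, arguments, result, and elapsed time of a function."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.exception("error in %s after %.3f ms", func.__name__, elapsed_ms)
            raise
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.debug("exit %s result=%r elapsed=%.3f ms", func.__name__, result, elapsed_ms)
        return result

    return wrapper


@traced
def scale(value, factor):
    """Hypothetical suspected function."""
    return value * factor


logging.basicConfig(level=logging.DEBUG)
scale(3, 4)
```

Because the decorator re-raises exceptions, adding it does not change the behavior being investigated.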

Boundary Tracing

For multi-component systems:

[Input] -> [Component A] -> [Component B] -> [Output]
   ^            ^               ^              ^
   |            |               |              |
 Check 1     Check 2         Check 3       Check 4

Add verification at each boundary to isolate the failure point.
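
The diagram can be sketched as a loop that runs each component and checks its output before passing it on. Everything here is hypothetical: `trace_boundaries`, the two stages, and their checks are placeholders for real components and real invariants.

```python
def trace_boundaries(value, input_check, stages):
    """Run stages in order; return (first failing boundary or None, last value).

    stages is a list of (name, component, check) tuples, where check
    validates the output of that component.
    """
    if not input_check(value):
        return "input", value
    for name, component, check in stages:
        value = component(value)
        if not check(value):
            return name, value
    return None, value


# Hypothetical two-component pipeline: split text, then convert to integers.
stages = [
    ("parse", lambda text: text.split(","), lambda parts: all(parts)),
    ("to_int", lambda parts: [int(p) for p in parts],
     lambda nums: all(n >= 0 for n in nums)),
]

failed_at, value = trace_boundaries("1,2,-3", lambda text: bool(text), stages)
print(failed_at, value)  # to_int [1, 2, -3]
```

The first boundary whose check fails narrows the search to the component just before it.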

Best Practices

Do

Reproduce before investigating
Document investigation steps
Test one hypothesis at a time
Write a regression test for every bug fix
Share findings with the team
Update documentation when the cause is environment-related

Don't

Jump to conclusions
Make multiple changes at once
Fix symptoms instead of causes
Skip the hypothesis step
Merge fixes without tests
Ignore intermittent failures

Error Handling

Situation	Action
Cannot reproduce	Gather more context, check environment differences
Multiple potential causes	Isolate and test each separately
Fix breaks other things	Revert, investigate dependencies
Root cause unclear after investigation	Escalate, add more instrumentation

Metrics

Metric	Target	Description
First-fix success rate	>80%	Share of bugs resolved by the first fix attempted
Regression rate	<5%	Share of bug fixes that introduce new bugs
Investigation time ratio	>60%	Share of debugging time spent investigating rather than coding
Documentation rate	100%	Share of bugs documented with their root cause
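
These metrics can be computed from per-bug records. The sketch below assumes a hypothetical `BugRecord` structure and reads "investigation time ratio" as investigation time divided by total investigation plus coding time; the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class BugRecord:
    fix_attempts: int
    caused_regression: bool
    investigation_minutes: float
    coding_minutes: float
    root_cause_documented: bool


def debugging_metrics(records):
    """Return the four metrics as fractions between 0 and 1."""
    if not records:
        raise ValueError("at least one bug record is required")
    total_time = sum(r.investigation_minutes + r.coding_minutes for r in records)
    if total_time <= 0:
        raise ValueError("records contain no recorded time")
    n = len(records)
    return {
        "first_fix_success_rate": sum(r.fix_attempts == 1 for r in records) / n,
        "regression_rate": sum(r.caused_regression for r in records) / n,
        "investigation_time_ratio":
            sum(r.investigation_minutes for r in records) / total_time,
        "documentation_rate": sum(r.root_cause_documented for r in records) / n,
    }


# Made-up sample data for illustration.
sample = [
    BugRecord(1, False, 30, 10, True),
    BugRecord(1, False, 45, 15, True),
    BugRecord(1, False, 60, 20, True),
    BugRecord(2, True, 45, 15, True),
]
print(debugging_metrics(sample))
```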
