systematic-debugging
Systematic Debugging Skill
Overview
A rigorous debugging framework that enforces root cause analysis before any fix attempts. This skill exists because symptom-focused repairs perpetuate problems rather than solving them.
Core Mandate: NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
Type
technique
When to Use
Trigger this skill when:
- Bug appears, test fails, or error occurs
- Code "should work" but doesn't
- Same bug appears twice (sign of symptom fix)
- Fix attempt failed
- Flaky tests or race conditions
- Multi-layer system failures
- Unclear where invalid data originates
Keywords: root cause, symptom, workaround, debugging, investigation, flaky, race condition, timeout, trace
```dot
digraph when_to_use {
  "Bug or error appears" [shape=ellipse];
  "Quick fix obvious?" [shape=diamond];
  "STOP - investigate anyway" [shape=box, style=filled, fillcolor=yellow];
  "Start Phase 1" [shape=box];
  "Bug or error appears" -> "Quick fix obvious?";
  "Quick fix obvious?" -> "STOP - investigate anyway" [label="yes"];
  "Quick fix obvious?" -> "Start Phase 1" [label="no"];
  "STOP - investigate anyway" -> "Start Phase 1";
}
```
The Four Phases
Phase 1: Root Cause Investigation (NEVER SKIP)
Purpose: Understand the bug before touching code
Checklist:
- Read complete error message (all of it, not just first line)
- Reproduce issue consistently
- Identify: When did this start? What changed?
- Examine recent commits, deployments, config changes
- Gather diagnostic evidence at component boundaries
- Trace data flow through all layers
Instrumentation questions:
- What inputs trigger the failure?
- At which layer does valid data become invalid?
- What's in logs/console at failure time?
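The boundary-evidence questions above can be sketched as a small logging helper. Everything here is a hypothetical stand-in: the layer names and the `parse_user` flow are illustration only, not an API from any real system.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("trace")

def checkpoint(layer, data):
    """Log the exact value crossing a component boundary, so the first
    layer where valid data becomes invalid shows up in the logs."""
    log.debug("boundary[%s]: %r", layer, data)
    return data

# Hypothetical three-layer flow: transport -> parser -> validation.
# Each boundary is instrumented, so a bad value is pinned to a layer.
def parse_user(raw):
    fields = checkpoint("transport", raw).split(",")
    record = checkpoint("parser", {"name": fields[0], "age": fields[1]})
    record["age"] = int(record["age"])
    return checkpoint("validated", record)

user = parse_user("ada,36")
```

Reading the emitted boundary log top to bottom answers "at which layer does valid data become invalid?" directly, instead of guessing.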
Phase 2: Pattern Analysis
Purpose: Learn from existing working code
Checklist:
- Find comparable working implementation
- Study reference example thoroughly (ACTUALLY READ IT)
- Identify specific differences from failing code
- Map all dependencies involved
- Check documentation for expected behavior
Don't claim to understand a pattern without reading it completely.
Phase 3: Hypothesis Testing
Purpose: Apply scientific method, not guessing
Rules:
- Formulate ONE explicit hypothesis
- State what evidence would prove/disprove it
- Test with MINIMAL change (single variable)
- Record results before forming new hypothesis
- If hypothesis wrong, return to Phase 1
Example:
```
Hypothesis: "Token invalidation occurs because middleware
checks timestamp before refresh completes"
Test: Add logging before/after middleware check
Expected: Logs show race condition timing
Actual: [record what actually happens]
```
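A minimal sketch of that hypothesis test, assuming a toy token store that stands in for the real middleware state (the names `token`, `refresh`, and `middleware_check` are invented for illustration). The only change made is adding timing evidence, a single variable at a time:

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("hypothesis")

# Hypothetical token store; "expires_at" one second in the past = stale.
token = {"expires_at": time.time() - 1}

def refresh(tok):
    tok["expires_at"] = time.time() + 3600

def middleware_check(tok):
    # Minimal change: log the timing evidence, touch nothing else.
    log.info("check at %.3f, token expires_at %.3f",
             time.time(), tok["expires_at"])
    return tok["expires_at"] > time.time()

valid_before = middleware_check(token)  # check runs before refresh
refresh(token)
valid_after = middleware_check(token)   # check runs after refresh
```

If the logs show the check consistently firing before the refresh completes, the hypothesis is supported; if not, record that and return to Phase 1.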
Phase 4: Implementation
Purpose: Fix at the source, verify completely
Rules:
- Write failing test first (proves bug exists)
- Apply single fix (not multiple changes)
- Verify test passes
- Check for regressions
- Add defense-in-depth validation
- Document in BUG_REFERENCE.md
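The write-failing-test-first rule can be sketched as follows. The scenario is invented: suppose the reported bug was "a 100% discount yields a negative price" because the buggy version computed `price - percent` instead of applying the percentage. The test reproducing the bug comes first, then one fix at the source, then a defense-in-depth guard:

```python
def apply_discount(price, percent):
    """Fixed at the source: the percentage is applied correctly, and
    out-of-range input is rejected rather than patched at call sites."""
    if not 0 <= percent <= 100:  # defense-in-depth validation
        raise ValueError(f"percent out of range: {percent}")
    return price * (100 - percent) / 100

def test_full_discount_is_free():
    # Written first; failed against the buggy version, which returned
    # 80.0 - 100 = -20.0 for this input.
    assert apply_discount(80.0, 100) == 0.0

def test_invalid_percent_rejected():
    try:
        apply_discount(80.0, 150)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")

test_full_discount_is_free()
test_invalid_percent_rejected()
```

The failing test proves the bug exists before the fix and guards against regression after it; the range check makes the invalid state structurally unreachable.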
Critical Safeguards
The Three-Strike Rule
```dot
digraph three_strikes {
  "Fix attempt" [shape=ellipse];
  "Did it work?" [shape=diamond];
  "Done" [shape=doublecircle];
  "Strike count < 3?" [shape=diamond];
  "Try new fix" [shape=box];
  "STOP" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
  "Return to Phase 1" [shape=box];
  "Fix attempt" -> "Did it work?";
  "Did it work?" -> "Done" [label="yes"];
  "Did it work?" -> "Strike count < 3?" [label="no"];
  "Strike count < 3?" -> "Try new fix" [label="yes"];
  "Strike count < 3?" -> "STOP" [label="no - 3 strikes"];
  "Try new fix" -> "Fix attempt";
  "STOP" -> "Return to Phase 1";
}
```
After 3 failed fixes: STOP. You don't understand the problem. Return to Phase 1 with fresh eyes.
Time Pressure Resistance
This skill is designed to resist rationalization under pressure:
| Pressure | Wrong Response | Right Response |
|---|---|---|
| "It's urgent!" | Skip investigation | 15-min investigation saves 3 hours |
| "Customer waiting" | Quick patch | Root cause prevents repeat tickets |
| "I'm exhausted" | Just ship something | Rest + fresh investigation |
| "Senior says just fix it" | Blind compliance | "10 min to understand first?" |
Reality check: Systematic debugging is FASTER than guess-and-check cycles.
- Systematic: 15-30 minutes to root cause
- Guessing: 2-3 hours of whack-a-mole
Anti-Patterns (What NOT to Do)
NEVER do these:
- Fix where error appears - Trace backward to source instead
- Multiple changes at once - Single variable only
- "It's probably X" - Without evidence = guessing
- Skip Phase 1 - "I know what's wrong" is usually wrong
- Add timeout/retry as fix - Masks bug, doesn't fix it
- Claim pattern understood - Without reading entire implementation
- Fourth fix attempt - Three strikes means return to Phase 1
Red flags you're about to violate this skill:
- "Quick fix, won't take long"
- "Let me just try..."
- "Should work now" (untested)
- "I'll add some error handling"
- "Just needs a longer timeout"
Iterative Debug Loop (Ralph Technique)
Complex bugs require multiple investigation cycles. The Ralph technique formalizes this:
Define Completion Criteria FIRST
Before debugging, explicitly state what "fixed" means:
COMPLETION CRITERIA:
- [ ] All 47 tests pass
- [ ] No error in logs during 5-minute run
- [ ] Memory usage stays under 500MB
- [ ] Response time < 200ms
NOT ACCEPTABLE:
- "Looks fixed"
- "Error went away"
- "Works on my machine"
If you can't define completion criteria, you don't understand the problem yet.
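Completion criteria become most useful when they are executable rather than impressionistic. A sketch, with hypothetical metric names standing in for whatever your test runner and monitoring actually report:

```python
def check_completion(results):
    """Return the list of unmet criteria; an empty list means done."""
    criteria = {
        "all tests pass":       results["tests_failed"] == 0,
        "no errors in logs":    results["log_errors"] == 0,
        "memory under 500MB":   results["peak_mb"] < 500,
        "response under 200ms": results["p95_ms"] < 200,
    }
    return [name for name, met in criteria.items() if not met]

# Hypothetical run: tests and logs are clean, latency is still over budget.
run = {"tests_failed": 0, "log_errors": 0, "peak_mb": 412, "p95_ms": 260}
unmet = check_completion(run)
```

"Fixed" then has an unambiguous definition: `check_completion` returns an empty list, not a feeling.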
Iteration Tracking
Track each debug cycle explicitly:
| Iteration | Hypothesis | Action | Result | Learning |
|---|---|---|---|---|
| 1 | Race condition in auth | Add mutex | Still fails | Not timing-related |
| 2 | Null reference in parser | Add null check | Different error | Closer to source |
| 3 | Config not loaded | Fix load order | PASSES | Root cause found |
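The tracking table above can also be kept as data in a scratch script, so failed cycles survive between sessions instead of living in your head. A minimal sketch:

```python
from dataclasses import dataclass

@dataclass
class Iteration:
    hypothesis: str
    action: str
    result: str
    learning: str

# The same debug log as the table above, one record per cycle.
debug_log = [
    Iteration("Race condition in auth", "Add mutex",
              "Still fails", "Not timing-related"),
    Iteration("Null reference in parser", "Add null check",
              "Different error", "Closer to source"),
    Iteration("Config not loaded", "Fix load order",
              "PASSES", "Root cause found"),
]

solved = any(it.result == "PASSES" for it in debug_log)
```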
Key insight: Each "failure" narrows the search space. Failed iterations are data, not wasted effort.
Self-Correction Checkpoints
After each iteration, ask:
- What did I learn that I didn't know before?
- Does this change my hypothesis?
- Am I getting closer or circling?
- Should I step back and re-investigate?
Iteration Limits
| Bug Complexity | Max Iterations Before Stepping Back |
|---|---|
| Simple | 3 |
| Medium | 5 |
| Complex/Multi-layer | 7 |
At limit: STOP. Get fresh perspective. Rubber duck. Walk away. Return tomorrow.
Persistence Between Cycles
Your previous work is visible:
- Git history shows what you tried
- Logs show what you observed
- Notes capture hypotheses
- Tests document expected behavior
Don't repeat failed approaches. Check git log before trying "new" ideas.
Related Techniques
This skill works with:
- root-cause-tracing.md - How to trace bugs to source
- defense-in-depth.md - Multiple validation layers
- condition-based-waiting.md - Fix flaky timing issues
- iterative-debugging-loop.md - Extended Ralph technique guide
- find-polluter.sh - Identify test pollution source
Integration with Actually Works Protocol
This skill reinforces the mandatory verification:
Before saying "Fixed":
- Ran Phase 4 verification?
- Root cause actually addressed (not symptom)?
- Added defense layers?
- Would bet $100 on this?
Real-World Impact
From debugging sessions:
- Root cause tracing: 1847 tests passed, zero pollution
- Condition-based waiting: 60% pass rate -> 100%
- Defense-in-depth: Bugs made structurally impossible
The math:
- Skip investigation: save 10 min, waste 2 hours
- Follow process: invest 20 min, solve permanently
Adapted from the obra/superpowers systematic-debugging skill.
Ralph technique concepts from anthropics/claude-plugins-official ralph-wiggum.
Enhanced for integration with the User's "Actually Works" Protocol.