Systematic Debugging
Overview
Random fixes waste time and create new bugs. Quick patches mask underlying issues.
Core principle: ALWAYS find root cause before attempting fixes.
When to Use
Use for ANY technical issue: test failures, unexpected behavior, pipeline errors, build failures, Galaxy workflow errors, notebook exceptions, environment issues.
Use ESPECIALLY when:
- "Just one quick fix" seems obvious
- You've already tried multiple fixes
- Previous fix didn't work
- You don't fully understand the issue
Supporting Files
- root-cause-tracing.md - Trace bugs backward through call chain to find the original trigger. Instrumentation techniques, stack trace analysis.
- defense-in-depth.md - Add validation at multiple layers after finding root cause. Entry point, business logic, environment guards, debug logging.
The Four Phases
Complete each phase before proceeding to the next.
Phase 1: Root Cause Investigation
BEFORE attempting ANY fix:
- Read error messages carefully
  - Don't skip past errors or warnings; they often contain the solution
  - Read stack traces completely
  - Note line numbers, file paths, error codes
- Reproduce consistently
  - Can you trigger it reliably? What are the exact steps?
  - If not reproducible, gather more data; don't guess
- Check recent changes
  - Git diff, recent commits, new dependencies
  - Config changes, environmental differences
- Gather evidence in multi-component systems
  - For pipelines (Galaxy workflow → tool → data), log what enters and exits each component
  - Run once with diagnostics to see WHERE it breaks
  - Then investigate that specific component
- Trace data flow
  - Where does the bad value originate? (See root-cause-tracing.md)
  - Keep tracing up the call chain until you find the source
  - Fix at source, not at symptom
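
The evidence-gathering step for multi-component systems can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: the helper `run_with_diagnostics` and the stages `parse`, `normalize`, and `summarize` are hypothetical stand-ins for real pipeline components. The helper records what enters and exits each stage, so a single diagnostic run shows where the data first goes wrong.

```python
from typing import Any, Callable

Stage = tuple[str, Callable[[Any], Any]]


def run_with_diagnostics(stages: list[Stage], data: Any) -> tuple[Any, list[dict]]:
    """Run stages in order, recording the input and output of each one."""
    trace: list[dict] = []
    for name, func in stages:
        record = {"stage": name, "input": data, "output": None, "error": None}
        try:
            data = func(data)
            record["output"] = data
        except Exception as exc:
            # Keep the failure as evidence and stop at the stage that raised.
            record["error"] = repr(exc)
            trace.append(record)
            break
        trace.append(record)
    return data, trace


# Hypothetical three-stage pipeline with a bug in the middle stage.
def parse(text: str) -> list[str]:
    return text.split(",")


def normalize(items: list[str]) -> list[str]:
    # Bug for illustration: isalpha() silently drops IDs that contain digits.
    return [item.strip().lower() for item in items if item.strip().isalpha()]


def summarize(items: list[str]) -> int:
    return len(items)


stages = [("parse", parse), ("normalize", normalize), ("summarize", summarize)]
result, trace = run_with_diagnostics(stages, "A, b2, C")
for record in trace:
    print(record)
```

In this example the trace shows three items entering `normalize` and two leaving, which points the investigation at that one stage rather than at the final count.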
Phase 2: Pattern Analysis
- Find working examples — similar working code in same codebase
- Compare against references — read reference implementation completely, don't skim
- Identify differences — list every difference, however small
- Understand dependencies — settings, config, environment, assumptions
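
One way to make "list every difference" mechanical is to diff the settings of a working case against the broken one. The sketch below assumes both configurations are available as flat dictionaries; the function `diff_settings` and the example keys are hypothetical.

```python
def diff_settings(working: dict, broken: dict) -> dict:
    """Return {key: (working_value, broken_value)} for every key that differs."""
    missing = object()  # sentinel, so a key set to None is not confused with an absent key
    differences = {}
    for key in sorted(working.keys() | broken.keys()):
        left = working.get(key, missing)
        right = broken.get(key, missing)
        if left != right:
            differences[key] = (
                "<absent>" if left is missing else left,
                "<absent>" if right is missing else right,
            )
    return differences


# Hypothetical settings for a run that works and a run that fails.
working = {"python": "3.11", "threads": 4, "tmp_dir": "/tmp", "strict": True}
broken = {"python": "3.11", "threads": 1, "tmp_dir": "/scratch", "locale": "C"}

for key, (good, bad) in diff_settings(working, broken).items():
    print(f"{key}: working={good!r} broken={bad!r}")
```

Keys present on only one side are reported too, since a missing setting is as much a difference as a changed one.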
Phase 3: Hypothesis and Testing
- Form single hypothesis — "I think X is the root cause because Y"
- Test minimally — smallest possible change, one variable at a time
- Verify — did it work? If not, form NEW hypothesis. Don't pile fixes on top.
- When you don't know — say so. Don't pretend. Research more.
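
Testing one variable at a time can be sketched as follows: start from a known-good configuration, apply each suspected change in isolation, and record which single change reproduces the failure. This is an illustrative sketch only; `isolate_breaking_changes` is a hypothetical helper, and `check` stands in for whatever actually runs the reproduction.

```python
from typing import Callable


def isolate_breaking_changes(
    baseline: dict,
    candidate_changes: dict,
    check: Callable[[dict], bool],
) -> list[str]:
    """Apply each change alone to the baseline; return keys whose change makes check() fail."""
    if not check(baseline):
        raise ValueError("baseline must pass before changes can be isolated")
    breaking = []
    for key, value in candidate_changes.items():
        trial = {**baseline, key: value}  # exactly one variable differs from the baseline
        if not check(trial):
            breaking.append(key)
    return breaking


# Hypothetical reproduction: the job fails only when tmp_dir is not /tmp.
def check(config: dict) -> bool:
    return config["tmp_dir"] == "/tmp"


baseline = {"threads": 4, "tmp_dir": "/tmp"}
changes = {"threads": 1, "tmp_dir": "/scratch"}
print(isolate_breaking_changes(baseline, changes, check))
```

Because every trial differs from the baseline in exactly one key, a failing trial identifies its change unambiguously.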
Phase 4: Implementation
- Create failing test/reproduction — simplest possible, automated if possible
- Implement single fix — address root cause, ONE change, no "while I'm here" improvements
- Verify fix — test passes? No other tests broken? Issue resolved?
- If fix doesn't work:
  - Count fixes attempted
  - If < 3: return to Phase 1, re-analyze with new information
  - If >= 3: STOP and question the architecture (see below)
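
A failing reproduction can be as small as one test that pins the buggy input. The sketch below uses plain `assert` statements in `test_*` functions, a style that pytest collects and that also runs when the functions are called directly. The function `normalize_sample_id` and its bug are hypothetical; it is shown after the fix, so the tests pass.

```python
def normalize_sample_id(raw: str) -> str:
    """Hypothetical function under repair: IDs must be trimmed and upper-cased."""
    # Root-cause fix: strip whitespace where the ID enters the system.
    # The assumed earlier version returned raw.upper() and passed trailing newlines on.
    return raw.strip().upper()


def test_trailing_newline_is_removed():
    # Written first, while the bug was present, and confirmed to fail.
    assert normalize_sample_id("s001\n") == "S001"


def test_clean_id_is_unchanged():
    # Guards against the fix breaking the case that already worked.
    assert normalize_sample_id("S002") == "S002"


# Run directly when pytest is not available.
test_trailing_newline_is_removed()
test_clean_id_is_unchanged()
print("all reproduction tests pass")
```

The second test covers the "no other tests broken" check: a single change is verified against both the failing case and a case that worked before.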
When 3+ Fixes Fail
Pattern indicating architectural problem:
- Each fix reveals new issues in different places
- Fixes require "massive refactoring"
- Each fix creates new symptoms elsewhere
STOP and discuss with the user before attempting more fixes. This is not a failed hypothesis — it's a wrong approach.
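
The escalation rule can be written down as a small decision function. This is only a sketch of the rule as stated in this document; the name `next_step` and the returned strings are hypothetical.

```python
def next_step(failed_fix_attempts: int) -> str:
    """Map the number of failed fixes to the next action."""
    if failed_fix_attempts < 1:
        raise ValueError("call this only after at least one fix has failed")
    if failed_fix_attempts < 3:
        # Fewer than three failures: re-analyze with the new information.
        return "return to Phase 1"
    # Three or more failures: treat it as a wrong approach, not a failed hypothesis.
    return "stop and discuss the architecture with the user"


for attempts in (1, 2, 3, 4):
    print(attempts, "->", next_step(attempts))
```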
Red Flags — STOP and Return to Phase 1
If you catch yourself thinking or doing any of the following:
- "Quick fix for now, investigate later"
- "Just try changing X and see if it works"
- "It's probably X, let me fix that"
- "I don't fully understand but this might work"
- Proposing solutions before tracing data flow
- "One more fix attempt" when already tried 2+
Common Rationalizations
| Excuse | Reality |
|---|---|
| "Issue is simple, don't need process" | Simple issues have root causes too |
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check |
| "Just try this first, then investigate" | First fix sets the pattern. Do it right. |
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
| "I see the problem, let me fix it" | Seeing symptoms != understanding root cause |
Quick Reference
| Phase | Key Activities | Done when |
|---|---|---|
| 1. Root Cause | Read errors, reproduce, check changes, trace data | Understand WHAT and WHY |
| 2. Pattern | Find working examples, compare | Differences identified |
| 3. Hypothesis | Form theory, test minimally | Confirmed or new hypothesis |
| 4. Implementation | Create test, fix, verify | Bug resolved, tests pass |
Attribution
Adapted from obra/superpowers systematic-debugging skill.