debug-investigator
# Debug Investigator
Structured debugging methodology that replaces ad-hoc exploration with hypothesis-driven investigation. Captures symptoms, analyzes evidence (stacktraces, logs, state), generates ranked hypotheses, designs bisection strategies, identifies instrumentation points, and produces minimal reproductions — documenting every step so dead ends are never revisited.
When to use this skill vs native debugging: The base model handles straightforward debugging (clear stacktraces, obvious errors) natively. Use this skill for non-obvious bugs requiring systematic investigation: intermittent failures, bugs with no clear stacktrace, performance regressions, or issues requiring git bisection and hypothesis ranking.
## Reference Files
| File | Contents | Load When |
|---|---|---|
| `references/stacktrace-patterns.md` | Exception taxonomy, traceback reading, common Python/JS error signatures | Stacktrace or exception present |
| `references/hypothesis-templates.md` | Bug category catalog, probability ranking, confirmation/refutation tests | Always |
| `references/bisection-guide.md` | git bisect workflow, binary search debugging, narrowing techniques | Bug appeared after a change |
| `references/log-analysis.md` | Log pattern extraction, anomaly detection, timeline correlation | Log output available |
| `references/instrumentation-points.md` | Strategic logging placement, breakpoint strategy, state inspection techniques | Investigation plan needed |
## Prerequisites
- git — for bisection and history analysis
- Access to source code — cannot debug opaque binaries
- Reproducible environment — or at minimum, error output (stacktrace, logs)
## Workflow
### Phase 1: Symptom Capture
Before touching code, document the observable problem:
- **What is happening?** Describe the observed behavior precisely. "It crashes" is insufficient. "Raises `KeyError('user_id')` on line 42 of `auth.py` when calling `get_current_user()` with a valid session token" is actionable.
- **What should happen?** Define the expected behavior. If unknown, state that.
- **Reproducibility:** Always, intermittent (with frequency), or one-time? Intermittent bugs require different strategies than deterministic ones.
- **Recency:** When did this start? Correlate with recent changes: `git log --oneline -20`. If the bug appeared after a specific commit, bisection is the fastest path.
- **Environment:** Python version, OS, dependency versions, configuration differences between working and broken environments.
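The Environment item above can be captured mechanically so that the same fields are compared between a working and a broken machine. The following is a minimal sketch using only the standard library; the function name `capture_environment` and the package names in the usage line are placeholders, not part of this skill.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment(packages):
    """Collect interpreter, OS, and package versions for a bug report."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly; a missing dependency is evidence too.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


if __name__ == "__main__":
    # "requests" and "sqlalchemy" are placeholder package names.
    print(json.dumps(capture_environment(["requests", "sqlalchemy"]), indent=2))
```

Running this in both environments and diffing the two JSON outputs is one way to surface version or configuration differences.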
### Phase 2: Evidence Analysis
Examine all available evidence before forming hypotheses:
- **Stacktrace interpretation:** If a traceback exists, read it bottom-up. The last frame is where the error manifested, but the cause is often several frames up. Identify:
  - Exception type and message
  - The frame where the bad value or state originated vs. the frame where the exception was raised
  - Any familiar patterns (see `references/stacktrace-patterns.md`)
- **Log pattern extraction:** Search logs for:
  - Temporal anomalies (timestamps out of sequence, gaps)
  - Repeated errors (same error appearing in bursts)
  - State transitions that didn't complete
  - Correlation with external events (deploys, config changes)
- **State inspection:** If the system is running, inspect:
  - Variable values at the failure point
  - Database state (missing rows, unexpected values)
  - Configuration values (environment variables, config files)
  - External dependency status (API availability, DB connectivity)
- **Code diff analysis:** If the bug is recent:
  - `git diff HEAD~5`: what changed?
  - Focus on files touched by the error's call chain
  - Look for typos, wrong variable names, missing null checks
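The temporal anomalies listed under log pattern extraction (gaps and out-of-sequence timestamps) can be found with a short scan. This is a sketch under stated assumptions: every log line starts with an ISO 8601 timestamp followed by a space, the 30-second gap threshold is arbitrary, and the sample lines are invented for illustration.

```python
from datetime import datetime, timedelta


def find_timestamp_anomalies(lines, max_gap=timedelta(seconds=30)):
    """Return (line_number, kind, delta) tuples for gaps and out-of-order timestamps.

    Assumes each log line starts with an ISO 8601 timestamp followed by a space.
    """
    anomalies = []
    previous = None
    for number, line in enumerate(lines, start=1):
        stamp, _, _ = line.partition(" ")
        try:
            current = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # Skip lines without a timestamp, such as traceback bodies.
        if previous is not None:
            delta = current - previous
            if delta < timedelta(0):
                anomalies.append((number, "out_of_order", delta))
            elif delta > max_gap:
                anomalies.append((number, "gap", delta))
        previous = current
    return anomalies


# Invented sample lines, not real log output.
sample = [
    "2024-05-01T12:00:00 INFO request started",
    "2024-05-01T12:00:02 INFO cache lookup",
    "2024-05-01T12:01:30 ERROR upstream timeout",
    "2024-05-01T12:01:29 INFO retry scheduled",
]
for number, kind, delta in find_timestamp_anomalies(sample):
    print(f"line {number}: {kind} ({delta.total_seconds():+.0f}s)")
```

Real logs usually need a format-specific parser in place of `datetime.fromisoformat`; the comparison logic stays the same.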
### Phase 3: Hypothesis Generation
Generate ranked hypotheses — never start fixing without a hypothesis:
- **List 3-5 hypotheses ranked by likelihood.** Each hypothesis must include:
  - A concrete claim about what is wrong
  - What evidence supports it
  - What evidence would confirm it (a test you can run)
  - What evidence would refute it
- **Rank by likelihood** using:
  - Proximity to recent changes (most bugs are in new code)
  - Simplicity (typos before race conditions)
  - Evidence fit (does the hypothesis explain ALL symptoms?)
- **Common bug categories** (see `references/hypothesis-templates.md`):
  - State bugs: wrong value, missing initialization, stale cache
  - Logic bugs: off-by-one, wrong operator, inverted condition
  - Integration bugs: API contract mismatch, serialization error
  - Concurrency bugs: race condition, deadlock, resource starvation
  - Environment bugs: missing dependency, wrong config, version mismatch
### Phase 4: Investigation Plan
Design specific steps to test each hypothesis:
- **Test H1 first:** Always test the most likely hypothesis first. Design a single action that will confirm or refute it.
- **Bisection:** If the bug appeared after a change and H1 fails:
  - Identify the known-good and known-bad commits
  - Run `git bisect start <bad> <good>`
  - Define the test command for each commit
  - See `references/bisection-guide.md` for workflow
- **Isolation:** Remove variables one at a time:
  - Simplify input data
  - Disable features/plugins
  - Replace external calls with hardcoded values
  - Run in a clean environment
- **Instrumentation:** Add targeted logging/breakpoints:
  - At function entry/exit points in the call chain
  - Before and after state mutations
  - At decision points (if/else branches)
  - See `references/instrumentation-points.md`
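The "test command for each commit" in the bisection step can be automated with `git bisect run`, which reads the command's exit status: 0 marks the commit good, 125 skips it, and any other code from 1 to 127 marks it bad. The wrapper below is a sketch that normalizes arbitrary test commands to those codes; the file name `bisect_check.py` and the pytest path in the usage note are hypothetical.

```python
import subprocess
import sys

GOOD, BAD, SKIP = 0, 1, 125  # Exit codes understood by `git bisect run`.


def classify_commit(test_cmd, build_cmd=None):
    """Return the exit code `git bisect run` expects for the checked-out commit."""
    if build_cmd is not None and subprocess.run(build_cmd).returncode != 0:
        # The commit cannot be built or set up, so it cannot be judged.
        return SKIP
    result = subprocess.run(test_cmd)
    # Collapse every failure (including crashes) to a single "bad" code,
    # because exit codes outside 0-127 would abort the bisection.
    return GOOD if result.returncode == 0 else BAD


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the test command.
    sys.exit(classify_commit(sys.argv[1:]))
```

Saved as `bisect_check.py` outside the tracked tree, it could be used as `git bisect run python /path/to/bisect_check.py python -m pytest -x tests/test_auth.py` after `git bisect start <bad> <good>`.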
### Phase 5: Execution
Execute the investigation plan, updating hypotheses as evidence arrives:
- Test one variable at a time — Changing multiple things simultaneously makes results uninterpretable.
- Record results — Document what each test revealed, even negative results. Dead-end documentation prevents revisiting failed paths.
- Update probabilities — After each test, re-rank hypotheses. If H1 is refuted, H2 becomes the new priority.
- Know when to escalate — If all hypotheses are exhausted, the bug is in a category you haven't considered. Step back and re-examine assumptions.
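Recording results and re-ranking can be as simple as an append-only list of test outcomes. The sketch below assumes three outcome labels (`confirmed`, `refuted`, `inconclusive`) and hypothesis names such as `H1`; both are conventions chosen for the example.

```python
from datetime import datetime, timezone

OUTCOMES = {"confirmed", "refuted", "inconclusive"}


def record_result(log, hypothesis, action, outcome, note=""):
    """Append one test result to the investigation log, dead ends included."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome}")
    log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "hypothesis": hypothesis,
        "action": action,
        "outcome": outcome,
        "note": note,
    })


def open_hypotheses(ranked, log):
    """Drop refuted hypotheses while keeping the remaining order intact."""
    refuted = {entry["hypothesis"] for entry in log if entry["outcome"] == "refuted"}
    return [name for name in ranked if name not in refuted]


log = []
record_result(log, "H1", "printed the session payload", "refuted", "user_id was present")
remaining = open_hypotheses(["H1", "H2", "H3"], log)
print(remaining)
```

The refuted entry stays in the log, so the dead end is documented even though H1 no longer appears in the working list.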
### Phase 6: Resolution Documentation
After finding the root cause:
- Root cause — What was actually wrong, precisely.
- Fix — What was changed and why.
- Prevention — How to prevent recurrence (test, lint rule, type check, etc.).
- Lessons — What was learned that applies beyond this specific bug.
## Output Format
## Debug Investigation: {Brief Description}
### Symptom
**Observed:** {What is happening — precise description}
**Expected:** {What should happen}
**Reproducibility:** {Always | Intermittent (~N% of attempts) | Once}
**First noticed:** {Date/time or triggering event}
**Environment:** {Relevant versions and configuration}
### Evidence Analysis
#### Stacktrace
- **Exception:** {type}: {message}
- **Origin:** {file}:{line} in {function}
- **Call chain:** {caller} → {caller} → {failure point}
- **Key insight:** {What the traceback reveals about the cause}
#### Logs
- **Anomaly:** {What is unusual}
- **Timeline:** {When the anomaly started}
- **Correlation:** {Related events}
#### Code Changes
- **Recent commits:** {relevant commits since last known-good state}
- **Files in error path:** {which changed files appear in the traceback}
### Hypotheses
| # | Hypothesis | Likelihood | Confirming Test | Refuting Test |
|---|------------|------------|-----------------|---------------|
| H1 | {Specific claim} | High | {What to check} | {What would disprove} |
| H2 | {Specific claim} | Medium | {What to check} | {What would disprove} |
| H3 | {Specific claim} | Low | {What to check} | {What would disprove} |
### Investigation Plan
#### Step 1: Test H1 — {action}
- **Command/action:** {specific step}
- **If confirmed:** {next action — fix}
- **If refuted:** proceed to Step 2
#### Step 2: Bisection
- **Good commit:** {hash}
- **Bad commit:** {hash}
- **Test:** {command to verify each commit}
- **Command:** `git bisect start {bad} {good}`
#### Step 3: Isolation
- **Remove:** {variable to eliminate}
- **Expected change:** {what should happen}
### Instrumentation Points
1. {file}:{line} — log {variable/state} to observe {what}
2. {file}:{line} — breakpoint to inspect {what}
### Minimal Reproduction
```{language}
# Minimal code that triggers the bug
{code}
```
### Resolution
**Root cause:** {What was wrong}
**Fix:** {What was changed, with file:line and diff summary}
**Prevention:** {Test added, lint rule, type annotation, etc.}
**Lessons:** {What generalizes beyond this bug}
## Configuring Scope
| Mode | Scope | Depth | When to Use |
|------|-------|-------|-------------|
| `quick` | Single error | H1 test + fix | Clear stacktrace, obvious cause |
| `standard` | Full investigation | 3 hypotheses + bisection plan | Default for non-obvious bugs |
| `deep` | Systemic analysis | 5+ hypotheses + instrumentation + reproduction | Intermittent bugs, no stacktrace, production issues |
## Calibration Rules
1. **Hypotheses before code changes.** Never start modifying code without at least one
explicit hypothesis. "Let me try this" is not debugging — it's guessing.
2. **One variable at a time.** Each investigation step should change exactly one thing.
If you change two things and the bug disappears, you don't know which fixed it.
3. **Document dead ends.** Failed hypotheses are valuable — they narrow the search space.
Record what was tested and what was learned.
4. **Simplest explanation first.** Test typos, wrong variable names, and missing imports
before considering race conditions, compiler bugs, or cosmic rays.
5. **Reproduce before fixing.** If you cannot reproduce the bug in a controlled environment,
any fix is speculative. Invest in reproduction first.
6. **Root cause, not symptoms.** A fix that addresses the symptom (adding a null check)
without understanding the root cause (why was it null?) leaves the real bug alive.
## Error Handling
| Problem | Resolution |
|---------|------------|
| No stacktrace available | Focus on log analysis and state inspection. Use instrumentation to generate diagnostic output. |
| Bug is intermittent | Add persistent logging at key decision points. Run under stress (high load, concurrent requests) to increase reproduction rate. |
| Cannot reproduce locally | Compare environments systematically: versions, config, data, timing. Use `docker` or VM to mirror production. |
| Multiple hypotheses equally likely | Design a single test that distinguishes between them. Binary decision: "If X, then H1; if Y, then H2." |
| Fix attempted but bug persists | The hypothesis was wrong. Revert the fix, update hypothesis rankings, and proceed to the next hypothesis. Do not stack fixes. |
| Bug is in a dependency | Confirm with a minimal reproduction that uses only the dependency. Check issue trackers. Pin to last known-good version while awaiting upstream fix. |
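For the intermittent case in the table above, a repeat-runner gives the "~N% of attempts" figure used in the Symptom section and shows whether a change actually moved the failure rate. This is a minimal sketch: `flaky_lookup` is a deterministic stand-in that fails on every fourth call, not real application code.

```python
import itertools


def reproduction_rate(attempt, runs=200):
    """Call `attempt` repeatedly and return (failures, runs, first_error)."""
    failures = 0
    first_error = None
    for _ in range(runs):
        try:
            attempt()
        except Exception as exc:
            failures += 1
            if first_error is None:
                first_error = exc
    return failures, runs, first_error


_calls = itertools.count(1)


def flaky_lookup():
    # Stand-in for the real operation: fails on every fourth call.
    if next(_calls) % 4 == 0:
        raise KeyError("user_id")


failures, runs, first_error = reproduction_rate(flaky_lookup, runs=200)
print(f"Intermittent (~{100 * failures / runs:.0f}% of attempts), first error: {first_error!r}")
```

With a real intermittent bug the rate varies between runs, so a before/after comparison needs enough attempts to be meaningful.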
## When NOT to Investigate
Push back if:
- The error message already contains the fix ("missing module X" → install X)
- The issue is a known environment setup problem (wrong Python version, missing env var)
- The "bug" is actually a feature request or design disagreement — redirect to ADR or discussion
- The code is not under the user's control (third-party SaaS, managed service) — file a support ticket instead
- The user wants to debug generated/minified code — debug the source, not the output