Hypothesis-Driven Debugging

A four-phase loop that turns debugging from "try random fixes and hope" into a disciplined investigation. Each phase has a goal, hard rules, and a rationalization table for the excuses an agent will invent to skip it.

The core principle: you may not write a fix until you have evidence that your hypothesis is correct. Guessing is not debugging.

When to Use

A test fails and the cause is not immediately obvious
The same bug has been "fixed" twice and came back
The agent tried a fix that didn't work — stop it from trying another
A crash or error message you haven't seen before
Performance regression with no obvious culprit
Behavior differs between environments (local vs CI, dev vs prod)
The agent is stuck in a loop, applying the same wrong fix

When NOT to Use

Typos, missing imports, or syntax errors — just fix them
Build failures with an obvious single-line cause
Compiler/linter messages that tell you exactly what and where
You already know the root cause and just need to write the fix

Exception: if the bug survived one fix attempt, it was not as simple as it looked. Switch to this skill immediately.

The Debug Loop

  OBSERVE  ──▶  HYPOTHESIZE  ──▶  EXPERIMENT  ──▶  CONCLUDE
     │              │                  │               │
     ▼              ▼                  ▼               ▼
  Gather          List 3-5         One minimal       Root cause
  symptoms,       possible         test per          confirmed
  reproduce       causes +         hypothesis,       or loop
  reliably        evidence         max 5 lines       back
     │              │                  │               │
     └──────────────┴──────────────────┴───────────────┘
              write everything to DEBUG.md

Hard rules:

Everything gets written to DEBUG.md. Context compaction will eat your reasoning if it only lives in the conversation.
You may not write fix code during Observe, Hypothesize, or Experiment.
You may not skip Hypothesize. "I think I know what it is" is a hypothesis — write it down and test it like the others.
Each experiment changes at most 5 lines. If your experiment needs more, your hypothesis is too vague — split it.
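
The 5-line limit can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the experiment touches a single file whose before and after text are both available. The names `changed_line_count` and `within_experiment_budget` are hypothetical helpers, not part of this skill. A modified line is counted once, which is one possible reading of "changes at most 5 lines".

```python
import difflib

MAX_EXPERIMENT_LINES = 5  # the limit stated in the hard rules


def changed_line_count(before: str, after: str) -> int:
    """Count lines that were added, removed, or modified between two versions."""
    old_lines = before.splitlines()
    new_lines = after.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    count = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced block counts as the larger of its two sides, so a
            # single edited line counts once rather than as remove plus add.
            count += max(i2 - i1, j2 - j1)
    return count


def within_experiment_budget(before: str, after: str) -> bool:
    """Return True when the change fits the 5-line experiment limit."""
    return changed_line_count(before, after) <= MAX_EXPERIMENT_LINES
```

If the check fails, the rule above applies: the hypothesis is probably too vague and should be split.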

Phase 1: OBSERVE

Goal. Collect raw facts. Reproduce the bug. Separate what you know from what you assume.

Steps.

Reproduce the bug. Get the exact error message, stack trace, or wrong output. If you cannot reproduce it, that is your first finding.
Find the minimal reproduction. Strip away unrelated code until removing anything more makes the bug disappear.
Record the environment: OS, runtime version, dependencies, config.
Note what does work. The boundary between working and broken is where the bug lives.
Write all observations to DEBUG.md under ## Observations.
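
The environment and error recording steps can be scripted so that nothing is retyped from memory. This is a rough sketch, assuming a Python project. The function names and the `### Error` and `### Environment` subheadings are illustrative choices, not a format this skill prescribes.

```python
import platform
import sys
from importlib import metadata
from pathlib import Path


def environment_facts(packages: list[str]) -> dict[str, str]:
    """Collect OS, runtime version, and the installed versions of given packages."""
    facts = {
        "OS": platform.platform(),
        "Python": sys.version.split()[0],
    }
    for name in packages:
        try:
            facts[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            facts[name] = "not installed"
    return facts


def append_observations(path: Path, error_text: str, facts: dict[str, str]) -> None:
    """Append an Observations section with the exact error and the environment."""
    lines = ["## Observations", "", "### Error", "", error_text.strip(), ""]
    lines += ["### Environment", ""]
    lines += [f"- {key}: {value}" for key, value in facts.items()]
    # Append rather than overwrite, so earlier notes in the file survive.
    with path.open("a", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n\n")
```

A call such as `append_observations(Path("DEBUG.md"), captured_traceback, environment_facts(["requests"]))` would record the facts. Here `captured_traceback` and the package name are placeholders for whatever the failing project actually uses.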

Exit criteria.

Bug reproduced (or documented as non-reproducible with conditions)
Exact error message or wrong behavior recorded
Minimal reproduction identified
Observations written to DEBUG.md

Common Rationalizations

Excuse	Reality
"I already know what's wrong"	Then write it as a hypothesis and prove it. If you're right, it takes 2 minutes.
"Let me just try this quick fix first"	That's how you end up 45 minutes deep with 6 failed attempts.
"The error message is clear enough"	Error messages describe symptoms, not causes. `NullPointerException` tells you what died, not why.
"I don't need to reproduce it, I can see the bug in the code"	Can you? Then why hasn't it been fixed yet?

Phase 2: HYPOTHESIZE

Goal. Generate 3-5 possible root causes. For each, list supporting and conflicting evidence from Phase 1. Rank by likelihood.

Steps.

List 3-5 hypotheses. Not 1. Not "I think it's X." Three minimum. Think across categories:
- Data: wrong input, missing field, type mismatch, encoding
- Logic: wrong condition, off-by-one, race condition, wrong order
- Environment: config, version, dependency, permissions
- State: stale cache, leaked state, initialization order
For each hypothesis, write:
- Supports: evidence from observations that backs this theory
- Conflicts: evidence that argues against it
- Test: the minimal experiment that would prove or disprove it
Rank the hypotheses by likelihood, then mark the ROOT HYPOTHESIS: the one with supporting evidence and no conflicting evidence. If multiple qualify, pick the easiest to test.
Write everything to DEBUG.md under ## Hypotheses.

Example format in DEBUG.md:

## Hypotheses

### H1: Race condition in session middleware (ROOT HYPOTHESIS)
- Supports: only happens under concurrent requests, timing-dependent
- Conflicts: none yet
- Test: add mutex lock around session read, check if bug disappears

### H2: Stale cache returning expired token
- Supports: works after restart (cache cleared)
- Conflicts: cache TTL is 5min, bug appears within 30s
- Test: disable cache, reproduce

### H3: Wrong env variable in CI
- Supports: works locally, fails in CI
- Conflicts: env diff shows identical values
- Test: print actual runtime value in CI logs
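
The selection rule for the ROOT HYPOTHESIS (supporting evidence, no conflicting evidence, easiest to test on a tie) can be written as a small function. This is a sketch under assumptions: the `test_effort` field is a hypothetical 1-to-5 estimate added here to make "easiest to test" comparable. The text does not say what to do when no hypothesis qualifies, so the sketch returns `None` in that case.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Hypothesis:
    title: str
    supports: list[str] = field(default_factory=list)
    conflicts: list[str] = field(default_factory=list)
    test: str = ""
    test_effort: int = 1  # hypothetical estimate: 1 is trivial, 5 is costly


def pick_root(hypotheses: list[Hypothesis]) -> Optional[Hypothesis]:
    """Pick a hypothesis with supporting and no conflicting evidence."""
    candidates = [h for h in hypotheses if h.supports and not h.conflicts]
    if not candidates:
        return None  # assumption: nothing qualifies, so no root is chosen
    # Among qualifying hypotheses, prefer the one that is easiest to test.
    return min(candidates, key=lambda h: h.test_effort)


# The three hypotheses from the example format above.
h1 = Hypothesis(
    "Race condition in session middleware",
    supports=["only happens under concurrent requests", "timing-dependent"],
    test="add mutex lock around session read, check if bug disappears",
)
h2 = Hypothesis(
    "Stale cache returning expired token",
    supports=["works after restart (cache cleared)"],
    conflicts=["cache TTL is 5min, bug appears within 30s"],
    test="disable cache, reproduce",
)
h3 = Hypothesis(
    "Wrong env variable in CI",
    supports=["works locally, fails in CI"],
    conflicts=["env diff shows identical values"],
    test="print actual runtime value in CI logs",
)

root = pick_root([h1, h2, h3])
```

With the example data, `root` is H1, matching the ROOT HYPOTHESIS marked in the example, because H2 and H3 each carry conflicting evidence.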

Exit criteria.

At least 3 hypotheses written
Each has supporting/conflicting evidence
Each has a specific, minimal test
ROOT HYPOTHESIS identified
All written to DEBUG.md

Common Rationalizations

Excuse	Reality
"I only have one theory"	You have one favorite theory. Think harder. What if it's not that?
"Writing this down is slow"	Debugging without writing is slower. You'll forget hypothesis 2 after compaction eats it.
"The first hypothesis is obviously right"	Then proving it takes 2 minutes. If you skip proof, you'll spend 30 minutes when it turns out wrong.
"I don't have conflicting evidence"	That means you haven't looked hard enough, or it really is the root cause. Either way, test it.

Phase 3: EXPERIMENT

Goal. Test the ROOT HYPOTHESIS with the smallest possible change. You are a scientist — you are trying to falsify, not confirm.

Steps.

Write the experiment before running it. What will you change? What result confirms the hypothesis? What result rejects it?
Make the change. Maximum 5 lines. If you need more, your hypothesis is too vague.
Run the reproduction from Phase 1.
Record the result in DEBUG.md under ## Experiments.

Experiment rules.

One variable at a time. Do not combine two fixes "to save time."
Do not write production fix code. Write diagnostic code: log statements, assertions, simplified logic, hardcoded values.
Revert the experiment after recording results. Keep the tree clean.
If the experiment is inconclusive, that's a result — record it and test the next hypothesis.
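
Writing the experiment down before running it can be enforced with a small record type: the confirming and rejecting outcomes are required up front, and the result is filled in afterwards. This is a minimal sketch. The `Experiment` class and its markdown layout are assumptions, not a format defined by this skill.

```python
from dataclasses import dataclass

ALLOWED_RESULTS = {"confirmed", "rejected", "inconclusive"}


@dataclass
class Experiment:
    hypothesis: str
    change: str        # the single variable being changed
    confirms_if: str   # outcome that would confirm the hypothesis
    rejects_if: str    # outcome that would reject the hypothesis
    result: str = "pending"
    observed: str = ""

    def record(self, result: str, observed: str) -> None:
        """Store the outcome; only the three documented results are accepted."""
        if result not in ALLOWED_RESULTS:
            raise ValueError(f"result must be one of {sorted(ALLOWED_RESULTS)}")
        self.result = result
        self.observed = observed

    def to_markdown(self) -> str:
        """Render an entry suitable for the Experiments section of DEBUG.md."""
        return "\n".join([
            f"### Experiment: {self.hypothesis}",
            f"- Change: {self.change}",
            f"- Confirms if: {self.confirms_if}",
            f"- Rejects if: {self.rejects_if}",
            f"- Result: {self.result}",
            f"- Observed: {self.observed}",
        ])
```

For example, an experiment for H2 could be created with `change="disable cache"`, `confirms_if="bug no longer reproduces"`, and `rejects_if="bug still reproduces"`, then completed with `record(...)` once the reproduction has been run. The outcome strings here are illustrative only.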

Exit criteria.

Experiment executed (one change, one variable)
Result recorded: confirmed, rejected, or inconclusive
Experimental code reverted
Results written to DEBUG.md

Common Rationalizations

Excuse	Reality
"Let me just fix it instead of testing"	Fixing without confirming the cause is how you ship a wrong fix that breaks something else.
"I'll test two things at once to save time"	When both change and the bug disappears, which one fixed it? Now you have to test again.
"5 lines isn't enough"	5 lines is enough to add a log, an assertion, a hardcoded value, or a short-circuit. If it isn't, your hypothesis is "something is wrong somewhere" — not a hypothesis.
"I don't need to revert, the fix is basically the experiment"	The experiment is diagnostic. The fix is production code. They have different quality bars.

Phase 4: CONCLUDE

Goal. Confirm root cause, write the real fix, and add a regression test.

Steps.

If ROOT HYPOTHESIS confirmed:
- Write the root cause in one sentence in DEBUG.md.
- Now — and only now — write production fix code.
- Add a regression test that fails without the fix and passes with it.
- Commit fix and test together.
If ROOT HYPOTHESIS rejected:
- Record the rejection and evidence in DEBUG.md.
- Promote the next hypothesis to ROOT. Return to Phase 3.
- If all hypotheses rejected, return to Phase 1 with new observations.
Update DEBUG.md with the final ## Root Cause and ## Fix sections.
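
A regression test that "fails without the fix and passes with it" might look like the sketch below. The bug is invented for illustration: `is_expired_buggy` and `is_expired` are hypothetical functions standing in for the code before and after a fix, and the boundary condition is an assumed root cause.

```python
import unittest


def is_expired_buggy(now: float, expires_at: float) -> bool:
    # Hypothetical pre-fix code: the token is still accepted at the exact
    # expiry instant because the comparison is strict.
    return now > expires_at


def is_expired(now: float, expires_at: float) -> bool:
    # Hypothetical fix: the token counts as expired from the expiry instant on.
    return now >= expires_at


class ExpiryRegressionTest(unittest.TestCase):
    def test_token_is_expired_at_exact_expiry_time(self):
        # This input is the minimal reproduction: it returns False with the
        # pre-fix function, so the test would fail if the fix were reverted.
        self.assertTrue(is_expired(now=100.0, expires_at=100.0))
```

The test can be run with `python -m unittest`. Checking it once against the pre-fix code, and seeing it fail, is what shows the test actually covers the bug.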

Exit criteria.

Root cause identified and written in one sentence
Fix committed
Regression test committed
DEBUG.md complete with full investigation trail
Original reproduction case now passes

Common Rationalizations

Excuse	Reality
"I don't need a regression test, it's a simple fix"	Simple fixes for simple bugs don't need this skill. You're here because it wasn't simple. Add the test.
"The DEBUG.md is just for debugging, I'll delete it"	Keep it. Future-you debugging the same area will thank present-you.
"All hypotheses failed, I'm stuck"	Go back to Observe. You missed something. The bug exists, therefore a cause exists.

The Anti-Bulldozer Rule

The #1 failure mode of AI debugging: the agent forms a theory and writes 150 lines of "fix" code. When that doesn't work, it writes another 150 lines, going deeper into the same wrong theory.

This skill exists to prevent that. If you catch yourself or the agent:

Writing more than 5 lines before confirming a hypothesis → STOP. Back to Phase 2.
Trying the same approach a second time → STOP. The hypothesis is rejected. Next one.
Ignoring conflicting evidence → STOP. Write it down. Re-rank hypotheses.
Feeling "almost there" after 3 failed attempts → STOP. You are bulldozing.

Write it down. Test it. Prove it. Then fix it.
