investigate
Investigate — Root Cause Analysis Engine
Systematic deep investigation protocol. Finds the REAL cause, not the surface symptom.
Core principle: Never fix what you don't understand. Every fix must trace to a proven root cause with evidence.
Protocol
Process every /investigate invocation through these 8 phases in strict order. Never skip a phase. Never jump to Phase 7 (FIX) without completing Phases 1-6.
Phase 1: OBSERVE — Gather All Symptoms
Collect every observable fact before forming any theory.
- Parse `$ARGUMENTS` as the symptom description
- Ask the user for additional context if the description is vague, using AskUserQuestion:
- What's the expected behavior vs actual behavior?
- When did it start? What changed recently?
- Is it consistent or intermittent?
- Any error messages, logs, or stack traces?
- Check memory files for known pitfalls related to this area:
- Read MEMORY.md and any topic-specific memory files
- Check CLAUDE.md for documented patterns
- Gather environmental context:
- Run `git log --oneline -20` to see recent changes
- Run `git diff --stat HEAD~5` to see which files changed recently
- Check for any failing tests with the project's test runner
Output: A symptom report listing every observable fact, recent changes, and any relevant memory entries.
Gate: Do NOT theorize yet. Only facts.
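The environmental checks in Phase 1 could be collected into a single facts-only report. The following is a minimal sketch, assuming Python is available in the project environment. `SymptomReport`, `gather_context`, `run_command`, and the injectable `run` parameter are hypothetical names, not part of this skill; only the two git commands come from the protocol above.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# The two git commands named in Phase 1.
COMMANDS: Dict[str, List[str]] = {
    "recent_commits": ["git", "log", "--oneline", "-20"],
    "recent_file_changes": ["git", "diff", "--stat", "HEAD~5"],
}


def run_command(argv: List[str]) -> str:
    """Run a command and return its stdout, or a short note if it failed."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
    except OSError as exc:
        return f"<could not run: {exc}>"
    if result.returncode != 0:
        return f"<exit {result.returncode}: {result.stderr.strip()}>"
    return result.stdout.strip()


@dataclass
class SymptomReport:
    """Hypothetical container for Phase 1 output: facts only, no theories."""
    symptom: str
    facts: Dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        lines = [f"Symptom: {self.symptom}"]
        for name, output in self.facts.items():
            lines.append(f"[{name}]")
            lines.append(output or "<no output>")
        return "\n".join(lines)


def gather_context(
    symptom: str, run: Callable[[List[str]], str] = run_command
) -> SymptomReport:
    """Collect raw command output; interpretation is deferred to later phases."""
    report = SymptomReport(symptom=symptom)
    for name, argv in COMMANDS.items():
        report.facts[name] = run(argv)
    return report
```

Passing a custom `run` function makes the collector usable outside a git checkout, for example in a dry run.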
Phase 2: REPRODUCE — Confirm the Issue
An issue you cannot reproduce is an issue you cannot prove you fixed.
- Identify the shortest path to trigger the symptom:
- Run existing tests that cover the affected area
- If no test exists, attempt manual reproduction via Bash
- For UI issues, check if a Playwright MCP sequence can reproduce it
- Document the reproduction steps precisely
- If the issue is intermittent:
- Flag it as potentially timing-dependent (race condition, async, state)
- Look for concurrent access, shared mutable state, missing locks/guards
- Check for dependency on external state (network, filesystem, database)
- If the issue cannot be reproduced:
- Shift to forensic investigation (logs, git history, code review)
- Do NOT skip remaining phases — proceed with available evidence
Output: Reproduction steps, or explicit documentation of why reproduction failed.
Gate: Issue confirmed (or forensic mode declared). Proceed.
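For the consistent-versus-intermittent distinction in Phase 2, one option is to run the reproduction repeatedly and measure how often it fails. This is a minimal sketch under that assumption. `failure_rate` and `classify` are hypothetical helpers, and the default of 50 runs is an arbitrary choice, not a recommendation from this skill.

```python
from typing import Callable


def failure_rate(attempt: Callable[[], None], runs: int = 50) -> float:
    """Run `attempt` repeatedly and return the fraction of runs that raised."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = 0
    for _ in range(runs):
        try:
            attempt()
        except Exception:
            failures += 1
    return failures / runs


def classify(rate: float) -> str:
    """Map a failure rate onto the reproduction outcomes described in Phase 2."""
    if rate == 0.0:
        return "not reproduced: switch to forensic mode"
    if rate == 1.0:
        return "consistent"
    return "intermittent: suspect timing, shared state, or external dependencies"
```

A rate of zero over a finite number of runs does not prove the issue is absent; it only means this path did not trigger it.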
Phase 3: TRACE — Follow the Execution Path
Start from the symptom and trace backward to the origin.
- Locate the symptom — find the exact file and line where the error occurs:
- Use Grep for error messages, exception types, log strings
- Use Explore agent for broad searches if the location is unclear
- Trace the call chain — read every file in the execution path:
- From error site → caller → caller's caller → entry point
- Read each file fully with Read tool — do NOT skim
- Document the complete flow: input → transform → output
- Trace the data flow — follow the data that caused the error:
- What value caused the crash? Where did it come from?
- Trace the value backward: variable → assignment → source → input
- Map dependencies — what else touches this code path:
- Use Grep to find all callers of the failing function
- Check for shared state, singletons, global variables
- Look for recent changes in dependencies with `git log --oneline -- <file>`
- Check git forensics to find when the problem was introduced:
- `git log --oneline -- <affected-files>`: who changed these files and when?
- `git blame <file>` on the suspicious lines: what commit introduced them?
- If a clear suspect commit is found, read its full diff
Output: Complete execution trace with file paths and line numbers. Data flow map. Git timeline.
Gate: The full code path from entry point to symptom is mapped and understood.
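The "error site → caller → caller's caller → entry point" walk in Phase 3 can be illustrated with a stack trace. The sketch below assumes a Python codebase and uses the standard `traceback` module. `call_chain` and the three demo functions are hypothetical and exist only to show the ordering.

```python
import traceback
from typing import List


def call_chain(exc: BaseException) -> List[str]:
    """Return frames ordered from the error site back toward the entry point."""
    frames = traceback.extract_tb(exc.__traceback__)
    # extract_tb lists the outermost call first, so reverse it to start
    # at the error site and walk outward.
    return [
        f"{frame.name} ({frame.filename}:{frame.lineno})"
        for frame in reversed(frames)
    ]


# Hypothetical three-level call path used only to demonstrate the trace.
def load_document(doc):
    return doc["title"]  # error site: raises TypeError when doc is None


def render_detail(doc):
    return load_document(doc)


def handle_tap(doc):
    return render_detail(doc)


if __name__ == "__main__":
    try:
        handle_tap(None)
    except TypeError as error:
        for step in call_chain(error):
            print(step)
```

The printed chain gives the files to read in order. It does not replace reading them, since a stack trace shows control flow but not where the bad value originated.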
Phase 4: HYPOTHESIZE — Deep Reasoning with 5 Whys
This phase MUST use the sequential-thinking MCP server for structured multi-step reasoning.
- Start the sequential-thinking chain with the symptom and all evidence from Phases 1-3
- Apply the 5 Whys method — for each answer, ask "but why does THAT happen?":
    Symptom: App crashes when tapping a document
    Why 1: DocumentDetailView accesses a deleted NSManagedObject
    Why 2: The object was deleted from Core Data while the view held a reference
    Why 3: context.delete() was called from a background operation
    Why 4: The background sync didn't check if the view was still displaying the object
    Why 5: There's no soft-delete pattern, so objects are hard-deleted immediately
    ROOT CAUSE: Missing soft-delete guard in the sync pipeline
- Generate at least 2 competing hypotheses; don't lock onto the first theory:
- Categorize each by type: Code Logic | Data State | Timing/Race | Environment | Dependency | Configuration
- For each hypothesis, define what evidence would prove or disprove it
- Use branching in sequential-thinking to explore alternative explanations:
    branchFromThought: 3, branchId: "alternative-cause"
- Rank hypotheses by likelihood based on available evidence
Output: Ranked list of hypotheses with evidence requirements for each.
Gate: At least 2 hypotheses generated. Each has defined proof criteria.
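The Phase 4 output (categorized, ranked hypotheses with proof criteria) could be represented as a small data structure. This is a sketch, not a format the skill prescribes. `Hypothesis`, `rank`, and `passes_gate` are hypothetical names, and the numeric `likelihood` field is an assumption standing in for a subjective ranking. The category values are the six listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Category(Enum):
    CODE_LOGIC = "Code Logic"
    DATA_STATE = "Data State"
    TIMING_RACE = "Timing/Race"
    ENVIRONMENT = "Environment"
    DEPENDENCY = "Dependency"
    CONFIGURATION = "Configuration"


@dataclass
class Hypothesis:
    statement: str
    category: Category
    proves: str        # evidence that would confirm the hypothesis
    disproves: str     # evidence that would eliminate the hypothesis
    likelihood: float  # assumed subjective estimate in [0, 1]


def rank(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Order hypotheses from most to least likely."""
    return sorted(hypotheses, key=lambda h: h.likelihood, reverse=True)


def passes_gate(hypotheses: List[Hypothesis]) -> bool:
    """Phase 4 gate: at least two hypotheses, each with defined proof criteria."""
    return len(hypotheses) >= 2 and all(
        h.proves.strip() and h.disproves.strip() for h in hypotheses
    )
```

Requiring both `proves` and `disproves` up front keeps Phase 5 from drifting into confirmation-only testing.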
Phase 5: PROVE — Test Each Hypothesis with Evidence
Systematically confirm or eliminate each hypothesis. No guessing.
For each hypothesis (highest-ranked first):
- Gather confirming evidence:
- Read the specific code paths predicted by the hypothesis
- Check logs/output for patterns the hypothesis predicts
- Run targeted tests that would pass if the hypothesis is correct
- Use `git blame` / `git log` to check whether the commit timing matches the hypothesis
- Gather disconfirming evidence:
- Look for code paths that should also fail if the hypothesis is correct but don't
- Check edge cases that contradict the hypothesis
- Check external sources:
- Use WebSearch for known issues in the library/framework version
- Use library-docs skill (context7 MCP) to verify correct API usage
- Search GitHub issues for the library with mcp__github__search_issues
- Verdict per hypothesis:
- CONFIRMED — evidence supports it, no contradictions
- ELIMINATED — evidence contradicts it
- INCONCLUSIVE — need more evidence (define what)
If all hypotheses are eliminated: Return to Phase 4 with new evidence. Generate new hypotheses.
Output: Evidence log per hypothesis. One confirmed root cause (or request for more data).
Gate: Exactly one root cause confirmed with evidence. Or an explicit statement that the cause requires additional data from the user (with specific questions).
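The verdict rules and the Phase 5 gate can be written as a small decision function. The sketch below is a deliberate simplification: it reduces evidence to counts of supporting and contradicting items, which is an assumption made for illustration. `Verdict`, `verdict`, and `next_step` are hypothetical names.

```python
from enum import Enum
from typing import Dict


class Verdict(Enum):
    CONFIRMED = "CONFIRMED"
    ELIMINATED = "ELIMINATED"
    INCONCLUSIVE = "INCONCLUSIVE"


def verdict(supporting: int, contradicting: int) -> Verdict:
    """Apply the per-hypothesis rules; evidence is reduced to counts here."""
    if contradicting > 0:
        return Verdict.ELIMINATED      # any contradiction rules it out
    if supporting > 0:
        return Verdict.CONFIRMED       # support with no contradictions
    return Verdict.INCONCLUSIVE        # nothing either way yet


def next_step(verdicts: Dict[str, Verdict]) -> str:
    """Decide what the Phase 5 gate allows, given a verdict per hypothesis."""
    confirmed = [name for name, v in verdicts.items() if v is Verdict.CONFIRMED]
    if len(confirmed) == 1:
        return f"proceed to Phase 6 with: {confirmed[0]}"
    if verdicts and all(v is Verdict.ELIMINATED for v in verdicts.values()):
        return "return to Phase 4 and generate new hypotheses"
    return "gather more evidence or ask the user specific questions"
```

Two confirmed hypotheses do not pass the gate in this sketch, because the gate asks for exactly one root cause.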
Phase 6: ROOT CAUSE — Document the Causal Chain
Write the definitive explanation before touching any code.
- Document the complete causal chain:
    ROOT CAUSE: <the deepest systemic issue>
      → causes: <intermediate effect>
      → causes: <intermediate effect>
      → manifests as: <the symptom the user reported>
- Explain why this is the root cause (not just a proximate cause):
- If fixed, would it prevent recurrence? (yes = root cause)
- Is there a deeper cause? (if yes, keep digging)
- Identify the blast radius — what else is affected:
- Are there similar patterns elsewhere in the codebase?
- Use Grep to find analogous code that may have the same bug
- Present the root cause analysis to the user before proceeding to fix
Output: Root cause statement, causal chain, blast radius assessment.
Gate: User understands and agrees with the diagnosis before any fix is attempted.
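The blast-radius search in Phase 6 amounts to a pattern scan across the codebase. Within this skill that job belongs to the Grep tool. The sketch below is only a stand-in showing the idea in plain Python: `find_analogous` is a hypothetical helper, and the default `.py` suffix is an arbitrary assumption.

```python
import re
from pathlib import Path
from typing import List, Tuple


def find_analogous(
    root: Path, pattern: str, suffix: str = ".py"
) -> List[Tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for each matching line."""
    regex = re.compile(pattern)
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(root.rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        # Undecodable bytes are replaced so one bad file cannot abort the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path.relative_to(root)), number, line.strip()))
    return hits
```

Each hit is a candidate, not a confirmed bug. Every match still needs to be read in context before it is counted as part of the blast radius.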
Phase 7: FIX — Address the Root Cause
Fix the root cause, not the symptom. Minimal, targeted change.
- Design the fix:
- What is the minimum change that eliminates the root cause?
- Does the fix handle all cases in the blast radius (Phase 6)?
- Does the fix introduce any new risks?
- Implement the fix:
- Read every file before modifying it
- Make the smallest change possible
- Add inline comments only where the fix is non-obvious
- Verify the fix:
- Run the reproduction steps from Phase 2 — symptom should be gone
- Run existing tests — no regressions
- Run code-quality agent on modified files if the change is substantial
- Check for similar patterns:
- If the bug was a pattern (e.g., missing null check), search for the same pattern elsewhere
- Fix all instances, not just the reported one
Output: Code changes with explanation of what was changed and why.
Phase 8: PREVENT — Ensure It Never Recurs
The investigation isn't complete until recurrence is prevented.
- Add a regression test that would have caught this bug:
- The test must fail without the fix and pass with it
- Use test-automation agent for comprehensive test generation
- Update project memory if a new pitfall was discovered:
- Add to MEMORY.md under Common Pitfalls
- Include the pattern, why it's dangerous, and the safe alternative
- Suggest structural improvements (optional, only if the bug reveals a design flaw):
- Propose architectural changes that make this class of bug impossible
- Present as a suggestion, not an immediate action
- Write the investigation summary:
## Investigation Report
**Symptom:** <what was reported>
**Root Cause:** <the deepest systemic issue>
**Causal Chain:** root cause → ... → symptom
**Fix:** <what was changed, which files>
**Blast Radius:** <other areas checked/fixed>
**Regression Test:** <test added>
**Prevention:** <memory updated, guard added, pattern documented>
**Time:** <phases completed, hypotheses tested>
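The Phase 8 requirement that a regression test "must fail without the fix and pass with it" can itself be checked. Below is a minimal sketch with made-up code: `title_before_fix` and `title_after_fix` are hypothetical stand-ins for the buggy and fixed implementations, loosely modeled on the deleted-document example from Phase 4.

```python
from typing import Callable, Optional


def title_before_fix(doc: Optional[dict]) -> str:
    # Hypothetical buggy version: raises TypeError when the document is None.
    return doc["title"]


def title_after_fix(doc: Optional[dict]) -> str:
    # Hypothetical fixed version: guards against a deleted (None) document.
    if doc is None:
        return "<deleted>"
    return doc["title"]


def regression_test(get_title: Callable[[Optional[dict]], str]) -> bool:
    """Return True when the implementation survives the deleted-document case."""
    try:
        return get_title(None) == "<deleted>"
    except TypeError:
        return False


def is_valid_regression_test() -> bool:
    """A useful regression test fails on the old code and passes on the new."""
    return (
        not regression_test(title_before_fix)
        and regression_test(title_after_fix)
    )
```

In practice the "before" check is usually done by running the new test against the commit prior to the fix, rather than by keeping both implementations side by side.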
Tool Usage by Phase
| Phase | Primary Tools | When to Use Agents |
|---|---|---|
| 1. OBSERVE | Read, Grep, Bash (git log) | — |
| 2. REPRODUCE | Bash (test runner), Playwright MCP | — |
| 3. TRACE | Read, Grep, Glob, Bash (git blame) | Explore agent for broad searches |
| 4. HYPOTHESIZE | sequential-thinking MCP | deep-analysis skill |
| 5. PROVE | Read, Grep, Bash, WebSearch, context7 MCP | library-docs skill, GitHub MCP |
| 6. ROOT CAUSE | Read, Grep | Explore agent for blast radius |
| 7. FIX | Read, Edit, Write, Bash | code-quality agent for review |
| 8. PREVENT | Write, Edit, Bash | test-automation agent for tests |
Anti-Patterns — What This Skill Prevents
| Bad Habit | What /investigate Does Instead |
|---|---|
| Jump straight to fixing | Forces Phases 1-6 before any code change |
| Fix the symptom | 5 Whys drills to root cause |
| Single theory tunnel vision | Requires 2+ competing hypotheses |
| "It works now" without understanding | Demands evidence-based proof |
| Fix one instance, miss others | Blast radius analysis in Phase 6 |
| No regression test | Phase 8 mandates a test |
| Knowledge lost | Memory update in Phase 8 |
When to Use /investigate vs Other Tools
| Situation | Use |
|---|---|
| Bug, crash, error, unexpected behavior | /investigate |
| Build a new feature | /execute |
| Quick "what does this code do?" | Explore agent directly |
| Performance is slow and the cause is unclear | /investigate (treat slowness as the symptom) |
| Known fix, just need to apply it | Direct Edit — no investigation needed |
| Security vulnerability found | /investigate + security-scan |
References
See references/investigation-frameworks.md for detailed methodology guides.