Investigate
Systematic methodology for finding the root cause of bugs, failures, and unexpected behavior. Cycle through characterize-isolate-hypothesize-test steps, with oracle escalation for hard problems. Diagnose the root cause — do not apply fixes.
Optional: $ARGUMENTS contains the problem description or error message.
Step 1: Characterize
Gather the symptom and establish what is actually happening:
- Collect evidence — error message, stack trace, test output, log entries, or user description of unexpected behavior
- Classify the problem type:
| Signal | Type |
|---|---|
| Stack trace / exception | Runtime error |
| Test assertion failure | Test failure |
| Compilation / bundler / build error | Build failure |
| Type checker error (tsc, mypy, pyright) | Type error |
| Slow response / high CPU / memory growth | Performance |
| "It does X instead of Y" / no error | Unexpected behavior |
- Establish reproduction — run the failing command, test, or operation. If the problem cannot be reproduced (intermittent, environment-specific), document the constraints and proceed with historical evidence.
Record the exact reproduction command and its output for verification. For intermittent or long-running reproductions, use the Monitor tool to tail logs filtered for relevant signals (errors, stack traces, specific identifiers) so failures surface live while you work.
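A plain-shell equivalent of this capture-and-tail pattern, as a minimal sketch (the test path, log path, and filter terms are hypothetical):

```bash
# Capture the exact reproduction command and its full output for later verification.
npx jest src/checkout.test.ts 2>&1 | tee /tmp/repro-output.txt

# For intermittent failures, tail the log filtered for relevant signals while working.
tail -f logs/app.log | grep --line-buffered -E "ERROR|Traceback|checkout"
```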
Step 2: Isolate
Narrow from "something is wrong" to "the problem is in this area." Read references/problem-type-playbooks.md for type-specific first moves and tool sequences.
Git Archeology
For all problem types, check what changed recently near the failure point:
git log --oneline -20 -- <file>
git blame -L <start>,<end> <file>
If a known-good state exists (e.g., "this worked yesterday"), consider git bisect to pinpoint the breaking commit.
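A minimal git bisect sketch, assuming the failure is reproducible by a single command and a known-good commit exists (the commit hash and test command are hypothetical):

```bash
# Binary-search the history between a known-good commit and the current broken one.
git bisect start
git bisect bad HEAD
git bisect good abc1234                 # hypothetical last-known-good commit

# Let git run the reproduction at each step; a non-zero exit code marks the commit bad.
git bisect run npx jest src/checkout.test.ts

# Review the bisect result, then restore the original checkout.
git bisect log
git bisect reset
```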
Scope Narrowing
- Stack traces: Read the throwing function and its callers — full functions, not just the flagged line
- Test failures: Read both the test and the system under test
- Build errors: Read the config file and the referenced source
- Unexpected behavior: Trace the data flow from input to the unexpected output
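For example, a plain-shell version of the stack-trace first move — locating the throwing function's callers — might look like this (the function name and directory are hypothetical):

```bash
# Find every call site of the throwing function, excluding its own definition.
grep -rn "applyDiscount(" src/ --include="*.ts" | grep -v "function applyDiscount"
```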
Step 3: Hypothesize
Generate 2-4 hypotheses ranked by likelihood. Each hypothesis must be falsifiable — specify what evidence would confirm or refute it.
Format:
H1 (most likely): [description] — confirmed if [X], refuted if [Y]
H2: [description] — confirmed if [X], refuted if [Y]
H3: [description] — confirmed if [X], refuted if [Y]
Parallel Investigation
For complex problems with 3+ hypotheses and a non-obvious root cause, spawn parallel investigators.
Spawn condition: 3+ hypotheses AND the problem is not a simple typo, missing import, or syntax error.
Skip when 1-2 hypotheses are obvious (e.g., stack trace points directly to the bug).
Use the Agent tool to launch all agents below in a single assistant message so they run concurrently. Each Agent call uses model: "opus" and does not set run_in_background. The total is one Agent per hypothesis plus one Codex Agent; state the count explicitly when emitting the calls.
- Hypothesis Agent (one per hypothesis): Each receives the hypothesis, relevant file paths, what evidence to look for, and instructions to report confirmed / refuted / inconclusive with evidence. Budget: max 5 tool calls per subagent.
- Codex Agent: Launch one Agent whose prompt instructs the subagent to invoke /consult-codex via the Skill tool with a focused prompt describing the problem, reproduction, and files examined. The multi-turn conversation allows it to dig deeper into patterns the hypothesis-driven subagents miss. Run the /evaluate-findings skill on its output after the Agent returns.
After all investigators complete, merge the results. Codex findings that overlap with a subagent's confirmed hypothesis reinforce confidence; novel Codex findings become additional hypotheses to test in Step 4.
Step 4: Test
Verify each hypothesis with minimal, targeted actions:
| Action Type | Tool |
|---|---|
| Find usage or pattern | Grep |
| Read surrounding code | Read |
| Check recent changes | Bash (git log, git blame, git diff) |
| Run isolated test | Bash (specific test command) |
| Check dependency version | Bash (npm ls, pip3 show, etc.) |
| Inspect runtime state | Bash (add temporary logging, run, check output) |
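A few plain-shell sketches of these verification actions (package names, file paths, and test names are hypothetical):

```bash
# Check which version of a dependency is actually resolved.
npm ls lodash                    # Node: installed version in the dependency tree
pip3 show requests               # Python: version and install location

# Run only the failing test in isolation.
npx jest src/checkout.test.ts -t "applies discount once"

# Check recent changes around the suspect code.
git log --oneline -5 -- src/checkout.ts
git diff HEAD~5 -- src/checkout.ts
```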
Record each result:
| Hypothesis | Verdict | Evidence |
|---|---|---|
| H1 | confirmed / refuted / inconclusive | [what was found] |
| H2 | confirmed / refuted / inconclusive | [what was found] |
Iteration
If all hypotheses are refuted or inconclusive:
- Document what was learned — each refuted hypothesis eliminates a possibility and narrows the search
- Return to Step 2 with the new information to re-isolate
- Generate new hypotheses in Step 3 based on updated understanding
Cycle budget: maximum 2 full cycles (hypothesize → test → learn → repeat) before escalating.
Escalation
After 2 failed hypothesis cycles, offer escalation to /consult-oracle via AskUserQuestion:
Investigation stalled after [N] hypothesis cycles.
Tested: [summary of hypotheses and evidence]
Remaining unknowns: [what is still unclear]
Escalate to Oracle? (consults external model with full context)
Proceed only if the user approves.
Investigation Report
Output results as text:
Investigation Report:
Problem: [one-line description]
Type: [runtime error | test failure | build failure | type error | performance | unexpected behavior]
Root cause: [confirmed cause, or "unresolved" with best hypothesis]
Evidence:
- [what confirmed the root cause]
Suggested fix: [description of what to change, or "needs further investigation"]
Reproduction command: [command to verify the fix once applied]
Hypotheses tested:
1. [hypothesis] — [confirmed/refuted/inconclusive] — [evidence]
2. [hypothesis] — [confirmed/refuted/inconclusive] — [evidence]
Escalation: [none | oracle]
Then use the TaskList tool and proceed to any remaining task.
Rules
- If the problem turns out to be environmental (wrong Node version, missing dependency, OS-specific), report that clearly — it may not require a code fix.
- If the problem is in a dependency (not the project's code), document the dependency issue and suggest workaround options rather than patching the dependency.