debug
<live_context>
Recent git changes (regression candidates):
!git log --oneline -10 2>/dev/null || echo "(not a git repo)"
Working tree status:
!git status --short 2>/dev/null || echo "(not a git repo)"
</live_context>
<trigger_examples>
- "Debug why the form submission fails with a 500 error"
- "The dashboard page shows a blank screen after login"
- "API returns 403 but the user should be authorized"
- "Investigate this Sentry issue: PROJECT-123"
- "The form submits but nothing appears in the list"
- "Debug the notification queue — jobs are stuck" </trigger_examples>
<phase_1_triage> Classify the bug before investigating. This determines which agents to dispatch.
- Read the error description, stack trace, or reproduction steps provided by the user.
- Determine the affected layers:
| Layer | Signals |
|---|---|
| Frontend | UI doesn't render, hydration errors, blank pages, console errors, wrong data displayed |
| API | HTTP error codes (4xx/5xx), validation failures, timeout, wrong response body |
| Database | Missing data, wrong query results, migration issues, connection errors |
| Queue/Worker | Jobs stuck, not processing, duplicate execution, Redis connectivity |
| Infrastructure | Docker containers down, ports in use, env vars missing |
| Cross-cutting | Data flows correctly in one layer but breaks in another |
- Check for quick context — run these in parallel:

```shell
# Recent changes that might have introduced the bug
git log --oneline -10
git diff HEAD~3 --stat

# Infrastructure health
docker compose ps
```
- If the user provided a Sentry issue URL or error ID, query Sentry for the full stack trace and event details.
- If the user mentions production errors, check PostHog error tracking for frequency and user impact.
State the triage result to the user:
TRIAGE: [layer(s)] — Brief description of what appears to be happening. </phase_1_triage>
<phase_2_execution_mode> Based on the triage, decide the execution mode:
SINGLE-LAYER — The bug is isolated to one layer. Debug directly without dispatching agents.
MULTI-LAYER — The bug spans 2+ layers. Act as orchestrator and dispatch specialized debugging agents.
State the decision:
EXECUTION MODE: SINGLE-LAYER — I will investigate and fix this directly.
OR
EXECUTION MODE: MULTI-LAYER — I will dispatch specialized agents to investigate each layer in parallel. </phase_2_execution_mode>
<phase_3_reproduce> Reproduce the bug before investigating. Never fix what you can't reproduce.
<reproduction_strategies>
API bugs — Use curl or Bash to hit the endpoint directly:

```shell
# Discover the API base URL and endpoints from CLAUDE.md, package.json, or route files
curl -v http://localhost:<port>/<endpoint> | jq .
```
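For endpoints that need a request body or auth, a fuller reproduction might look like the sketch below. The port, path, and `$API_TOKEN` variable are placeholders — read the real values from the project's route files and .env:

```shell
# Hypothetical endpoint and env var — substitute the project's real values
curl -sS -o /tmp/resp.json -w 'HTTP status: %{http_code}\n' \
  -X POST "http://localhost:3000/api/items" \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $API_TOKEN" \
  -d '{"name":"repro-test"}'
jq . /tmp/resp.json  # pretty-print the captured response body
```

Capturing the body to a file keeps the status code and the response separate, which makes 4xx/5xx triage unambiguous.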
Frontend bugs — Use browser automation MCPs:
- Call `mcp__chrome-devtools__list_pages` or `mcp__plugin_playwright_playwright__browser_tabs` to check current state.
- Navigate to the affected page and take a screenshot.
- Check the browser console for errors.
- Inspect network requests for failed API calls.
Queue/Worker bugs — Check Redis and BullMQ state:
```shell
# Check Redis connectivity
redis-cli ping

# Check queue state — discover the dev command from package.json scripts
# (e.g., yarn dev, npm run dev, or the relevant workspace command)
```
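If the project uses BullMQ, queue state lives in Redis under a `bull:` key prefix and can be inspected directly. The queue name `emails` below is a placeholder — substitute the queue names defined in the codebase:

```shell
# Placeholder queue name — BullMQ keeps waiting jobs in a list and failed jobs in a sorted set
redis-cli --scan --pattern 'bull:emails:*'   # all keys for the queue
redis-cli llen  bull:emails:wait             # jobs waiting to be picked up
redis-cli zcard bull:emails:failed           # failed jobs
```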
Database bugs — Query directly:
```shell
# Discover database credentials and container names from docker-compose.yml or .env
docker compose exec <db-container> psql -U <user> -c "SELECT * FROM ... LIMIT 5"
```
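As a sketch, assuming a Postgres container named `db` with the default `postgres` user (verify both against docker-compose.yml) and a hypothetical `orders` table:

```shell
# Assumed container, user, and table names — adjust to the project
docker compose exec db psql -U postgres -c '\dt'
docker compose exec db psql -U postgres \
  -c "SELECT id, status, created_at FROM orders ORDER BY created_at DESC LIMIT 5;"
```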
If reproduction fails, add strategic logging and retry. See references/debug-dispatch.md for logging patterns. </reproduction_strategies> </phase_3_reproduce>
<phase_4_investigate> With the bug reproduced, investigate the root cause.
<single_layer_investigation> Follow the data flow from the error point backward:
- Read the source code at the error location.
- Trace inputs — where does the data come from? What transformations happen?
- Form a hypothesis about the root cause.
- Test the hypothesis — add logging, inspect state, check the database.
- If the hypothesis is wrong, form a new one based on what you learned.
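One cheap way to test a hypothesis is to tail the suspect service's logs while re-running the reproduction (the service name `api` is a placeholder — use the name from docker-compose.yml):

```shell
# Placeholder service name — surface errors while the reproduction runs
docker compose logs -f --tail=100 api 2>&1 | grep -iE 'error|warn|exception'
```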
Load relevant skills before diving in:
- Load skills matching the technologies used in the project (discover from package.json).
- Common examples: backend framework skills, ORM/database skills, frontend framework skills, queue/worker skills.
- Read CLAUDE.md and package.json to identify the project's tech stack before selecting skills.
When the bug involves library API misuse, version-specific behavior, or unclear method signatures, query Context7 for up-to-date documentation:
- `mcp__context7__resolve-library-id` — resolve the library name (e.g., "fastify", "drizzle-orm", "bullmq") to its Context7 ID.
- `mcp__context7__query-docs` — query with the specific API, method, or error message you're investigating.
Use Context7 when:
- The error message references a library API you're unsure about.
- You suspect a breaking change between versions (check the project's package.json for installed versions first).
- The stack trace points to library internals and you need to understand expected behavior.
- You need to verify correct usage patterns for any project dependency. </single_layer_investigation>
<multi_layer_investigation> Dispatch specialized agents in parallel. Each agent investigates its layer independently and reports findings.
See references/debug-dispatch.md for the agent prompt templates and dispatch patterns.
<debugging_roles>
| Role | Agent type | Domain | MCPs |
|---|---|---|---|
| Backend debugger | debugger | API routes, use cases, DB queries, server hooks | Sentry, PostHog, Context7 |
| Frontend debugger | debugger | Pages, components, client state, network | Chrome DevTools, Playwright, Context7 |
| Queue debugger | debugger | Job queues, workers, Redis state | Context7 |
| E2E verifier | playwright-expert | Reproduce and verify via browser automation | Playwright |
Each debugging agent should use Context7 (mcp__context7__resolve-library-id + mcp__context7__query-docs) to verify library API usage when the bug involves unclear method signatures, version-specific behavior, or suspected API misuse. Include this instruction in agent prompts — see references/debug-dispatch.md.
</debugging_roles>
<dispatch_rules>
- Launch layer-specific debuggers in parallel — they investigate independently.
- Each agent receives: the error description, reproduction steps, relevant file paths, and which skills to load.
- Each agent reports: root cause hypothesis, evidence (logs, screenshots, stack traces), and suggested fix location.
- After all agents report, correlate findings — the root cause often lives at the boundary between layers.
- If agents disagree on the root cause, investigate the boundary between their domains. </dispatch_rules>
<model_selection>
- Simple, well-scoped investigation (single file, clear error) → `haiku`
- Moderate investigation (multiple files, data flow tracing) → `sonnet`
- Complex cross-cutting investigation (race conditions, auth flows, distributed state) → `opus`
</model_selection> </multi_layer_investigation> </phase_4_investigate>
<phase_5_fix> Implement the minimal fix that addresses the root cause.
<fix_principles>
- Fix the root cause, not the symptom. A 500 error caused by a missing null check needs the null check AND proper error handling upstream.
- Follow existing patterns. Read similar code in the codebase before writing the fix.
- Load the relevant skills before writing the fix (same skills as investigation phase).
- If the fix spans multiple layers, dispatch agents for each layer. See references/debug-dispatch.md for the fix dispatch template. </fix_principles>
<fix_dispatch> In MULTI-LAYER mode, the orchestrator decides who fixes what:
- Identify which files need changes based on investigation findings.
- Group changes by package/domain.
- Determine dependency order (shared → api → web).
- Dispatch fix agents sequentially if there are dependencies, or in parallel if independent. </fix_dispatch> </phase_5_fix>
<phase_6_verify> A fix is not done until it's verified end-to-end.
- Re-run the reproduction — the original error should no longer occur.
- Quality gates — run the project's type checking, linting, and testing commands for every affected package. Discover available commands from package.json scripts (e.g., `tsc`, `lint`, `test`).
- Browser verification (for user-facing bugs) — use Playwright or Chrome DevTools to navigate to the affected page and confirm the fix works visually.
- Regression check — verify the fix doesn't break related functionality. Run the full test suite for affected packages.
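A minimal sketch of the gate-discovery step, assuming jq is available (the yarn commands in the trailing comment are illustrative, not the project's actual scripts):

```shell
# List the project's available scripts before deciding which gates to run
jq -r '.scripts | to_entries[] | "\(.key): \(.value)"' package.json 2>/dev/null \
  || echo "(no package.json in this directory)"
# Then run the matching gates per affected package, e.g.:
#   yarn tsc --noEmit && yarn lint && yarn test
```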
If any step fails, return to phase 5 and fix the issue. Re-run all verification steps from the beginning. </phase_6_verify>
<common_patterns> Quick reference for the most frequent bugs in this stack. Use these to accelerate triage.
Frontend → API boundary
- CORS errors → check server CORS config
- 401/403 → check auth middleware, token expiry, cookie settings
- Wrong data shape → compare shared schema/type definitions with API response
API → Database boundary
- "relation does not exist" → migration not applied
- Empty results → wrong query conditions, check WHERE clause
- Connection errors → Docker not running, pool exhausted
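To check whether migrations actually ran, look for the ORM's bookkeeping table. The container and user names below are assumptions, and the table name varies — drizzle, prisma, and knex each use a different one:

```shell
# Assumed container/user names — look for a migrations bookkeeping table across schemas
docker compose exec db psql -U postgres -c '\dt *.*' | grep -i migrat
```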
API → Redis/Queue boundary
- Jobs not processing → worker not started, Redis down, wrong queue name
- Duplicate jobs → missing deduplication, idempotency key
- Stalled jobs → worker crashed without ack, check stall interval
Frontend rendering
- Hydration mismatch → server/client content differs, date formatting, conditional rendering
- Blank page → unhandled error in RSC, check error.tsx boundaries
- Infinite loading → API call hanging, missing Suspense boundary </common_patterns>
Do not add explanations, caveats, or follow-up suggestions unless the user asks.