Debug
The golden rule
NO GUESSING. GATHER INFO FIRST.
Bad: Something broke → try random fix → doesn't work → try another → still broken after 5 attempts.
Good: Something broke → reproduce it → gather diagnostic info → diagnose root cause → fix it (usually first try).
Diagnosis before fixes.
Debugging by tool
How you debug depends on which tool you're using.
Claude Code (you have direct access)
Claude Code can gather its own diagnostics. Before asking the founder for screenshots or logs, do this automatically:
Auto-debug steps (do these yourself):
1. Check git history: git log --oneline -10 and git diff HEAD~3
2. Search for the error: Grep for error text across the codebase
3. Read the failing file: Read the file + surrounding context
4. Run the app/tests: Bash to run dev server, test suite, or reproduce
5. Check logs: Read server logs, build output, or error logs
6. Check environment: Compare .env.example against the actual config for missing or misnamed variables
Only ask the founder for information you can't get yourself: what they saw in the browser, what they clicked, screenshots of visual bugs.
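The environment check in step 6 can be partly automated. The sketch below is one possible approach, not a prescribed tool: it assumes both files use simple KEY=VALUE lines, and the helper names `env_keys` and `missing_env_keys` are made up for illustration. It compares variable names only and never prints values, so secrets stay out of the chat.

```python
from pathlib import Path


def env_keys(path: Path) -> set[str]:
    """Return the variable names defined in a dotenv-style file."""
    keys = set()
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not KEY=VALUE pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        # Tolerate the optional shell-style "export " prefix.
        if name.startswith("export "):
            name = name[len("export "):].strip()
        keys.add(name)
    return keys


def missing_env_keys(example: Path, actual: Path) -> set[str]:
    """Names listed in the example file but absent from the actual file."""
    return env_keys(example) - env_keys(actual)
```

Calling `missing_env_keys(Path(".env.example"), Path(".env"))` returns the set of names that still need to be configured. An empty set means every documented variable is at least present; it does not prove the values are correct.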
Lovable / Replit (founder pastes into chat)
The founder needs to gather info manually and paste it. Use the "Tell AI:" prompts in DEBUG-PROMPTS.md — they're structured templates that ensure complete context.
Production bugs (check monitoring first)
Before debugging production issues, check monitoring and error tracking:
1. Error tracker (Sentry, LogRocket): exact error + stack trace + user context
2. Server logs: filter by timestamp of report
3. Hosting dashboard: any deployment or outage at that time?
4. Database: any failed migrations or connection issues?
See the /monitor skill for setting up monitoring and the /deploy skill for rollback procedures.
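Filtering server logs by the time of the report (step 2) might look like the sketch below. It assumes each log line starts with a bracketed timestamp in the `[2025-01-13 10:30:45]` shape used in the logging section of this document; real hosting logs often use a different layout, so the pattern would need adjusting. The function name `lines_near` and the 5-minute default window are illustrative choices.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape: "[2025-01-13 10:30:45] [ERROR] [UserAuth] message"
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")


def lines_near(lines, reported_at: datetime, window_minutes: int = 5):
    """Return the log lines stamped within +/- window_minutes of the report."""
    window = timedelta(minutes=window_minutes)
    selected = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # Lines without a leading timestamp are skipped.
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if abs(stamp - reported_at) <= window:
            selected.append(line)
    return selected
```

One limitation of this sketch: multi-line stack traces lose their continuation lines because those lines carry no timestamp. Treat the output as a pointer to where to read in the full log, not as the complete picture.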
Workflow
Debug process:
- [ ] Reproduce bug consistently
- [ ] Gather diagnostic info (auto in Claude Code, manual elsewhere)
- [ ] Check what changed recently
- [ ] Diagnose root cause before proposing fixes
- [ ] Fix the root cause
- [ ] Test that the fix works
- [ ] Verify the fix didn't break anything else
- [ ] Ask: how do we prevent this?
Reproducing bugs
Before fixing, reproduce it:
Can you reproduce it?
- [ ] Exact steps to trigger bug
- [ ] Happens every time or intermittently?
- [ ] Specific browser/device?
- [ ] Specific data or user?
If you can't reproduce it:
- Ask user for exact steps or screen recording
- Try different browser/device/account
- Try with different data
- Clear cache and retry
- Check if timing-dependent
Tell AI:
Bug: [description]
Steps to reproduce:
1. [Step]
2. [Step]
3. [Bug happens]
Happens: [Always / Sometimes / Once]
Browser: [Chrome 120 on Mac]
Screenshot: [attach]
Capturing error info
Browser console
- Right-click page → Inspect → Console tab
- Look for red errors
- Copy the full error text including the stack trace (screenshot it if you can't copy)
Tell AI:
Console error: [paste full error message]
When it happens: [what you were doing]
Network tab
- DevTools → Network tab → reproduce bug
- Look for failed requests (red, 4xx, 5xx)
- Click failed request → check Response tab
Tell AI:
API call failing:
URL: /api/endpoint
Method: [GET/POST]
Status: [status code]
Response: [paste error response]
This happens when: [action]
Visual bugs
Screenshot what actually shows and describe (or attach a reference image of) what you expected. Include device and browser.
Common bug types
"Nothing happens when I click"
Check: console errors? Network request failing? Element actually clickable (not covered by another element)?
"Page won't load"
Check: network errors? JavaScript errors? Infinite redirect? Missing environment variable?
"Wrong data showing"
Check: API returning wrong data (network tab)? Caching issue? State not updating? Wrong user context?
"Form doesn't submit"
Check: validation errors visible? Console errors? Network request firing at all?
"Works in dev, broken in production"
Check: environment variables set? Different database? Build step stripping something? CORS configured for production domain?
"Works in Chrome, broken in Safari"
Check: CSS/JS compatibility? Safari-specific defaults? Date parsing differences?
Escalation discipline
After 1 failed fix
Reassess. Did we misdiagnose? Is there more info we should gather?
Tell AI:
Fix didn't work. Here's what happened after applying it: [new info].
Are we fixing the right thing?
After 2 failed fixes
Stop trying fixes. The diagnosis is probably wrong.
Tell AI:
2 fixes failed.
Fix 1: [tried] → [result]
Fix 2: [tried] → [result]
Are we fixing the wrong thing? Should we rethink the approach entirely?
After 3 failed fixes
Don't try a 4th. Change strategy:
- Rebuild the feature with a simpler approach
- Get a human developer to look at it (see /hiring)
- Ship a workaround and fix properly later
Digging deeper: find the real root cause
Most debugging failures happen because you stop at the first plausible cause instead of the actual root cause. Use the "keep asking why" technique:
Problem: Server crashed
Why? → Out of memory
Why? → Memory leak in the auth service ← Most people stop here and "add more RAM"
Why? → Database connections not being released
Why? → Error handler doesn't close connections
Why? → No cleanup in the finally block ← THIS is the fix
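To make the last link of that chain concrete, here is a minimal sketch of the bug and the fix. `FakeConnection` is a hypothetical stand-in for a real database connection, and both function names are invented for this example; the point is only the difference a finally block makes when the query raises.

```python
class FakeConnection:
    """Stand-in for a pooled database connection (hypothetical)."""

    def __init__(self):
        self.closed = False

    def query(self, sql):
        if "boom" in sql:
            raise RuntimeError("query failed")
        return ["row"]

    def close(self):
        self.closed = True


def run_query_leaky(conn, sql):
    """Buggy: close() is skipped whenever query() raises."""
    rows = conn.query(sql)
    conn.close()
    return rows


def run_query_safe(conn, sql):
    """Fixed: the finally block releases the connection on every path."""
    try:
        return conn.query(sql)
    finally:
        conn.close()
```

In the leaky version every failed query strands one connection, which is how an error path turns into a slow memory leak. In Python a context manager (`with`) would usually be the more idiomatic form of the same guarantee.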
How to tell you've found the real root cause:
- It's something you can actually change (code, config, process)
- Fixing it would prevent the problem from recurring
- Asking "why?" again doesn't lead anywhere actionable
Common mistake: stopping at "the AI broke it." That's blame, not a cause. Ask instead: what process would have caught this? Missing test? Missing validation? No code review?
When a bug has multiple causes
Sometimes a bug needs two things to go wrong at the same time. When the obvious cause doesn't fully explain the problem, look for a second branch:
Problem: Deployment failed
Why? → Database migration timed out
Branch A: Why was the migration slow?
→ Table lock from a long-running query → Missing index
Branch B: Why is the timeout so short?
→ Using default timeout → No deployment-specific config
Both branches need fixing, or the bug will come back under slightly different conditions.
Validate your diagnosis
Before implementing a fix, walk the causal chain from the fix to the symptom: "If I fix X, does that prevent Y, which prevents Z, which prevents the original problem?" If any link in the chain breaks, you have the wrong root cause.
Intermittent bugs
"Works sometimes, breaks sometimes" — likely a race condition, caching issue, or external API flakiness.
Tell AI:
Bug is intermittent.
Works: [X] out of 10 times
Fails: [Y] out of 10 times
Pattern: Fails more when [condition]. Never fails when [condition].
Add logging to capture state when it fails.
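Filling in the "[X] out of 10 times" numbers is easier with a small harness. The sketch below assumes the failing behavior can be wrapped in a Python callable that raises on failure; `measure_flakiness` is a made-up name, and for UI-only bugs the counting would have to be done by hand instead.

```python
from collections import Counter


def measure_flakiness(action, attempts=10):
    """Call action() repeatedly and tally passes and failures by exception type."""
    tally = Counter()
    for _ in range(attempts):
        try:
            action()
            tally["ok"] += 1
        except Exception as exc:  # Broad on purpose: we are only counting outcomes.
            tally[type(exc).__name__] += 1
    return tally
```

Grouping failures by exception type helps separate one flaky cause from several: a tally with both timeouts and key errors suggests more than one bug is hiding behind "works sometimes".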
Edge case testing
When a fix works for the main case, also test:
- Empty states: no data, empty lists, missing fields
- Volume: 1 item, 100 items, 10,000 items
- Timing: slow connection (3G throttle in DevTools), rapid double-clicks, expired sessions, multiple tabs
- Boundaries: very long text, special characters, zero values, negative numbers
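The empty-state and boundary cases above lend themselves to a table of inputs and expected outputs. The sketch below uses a hypothetical `display_name` helper as the code under test; the case table is the part worth copying, and it would be swapped for the function that was actually fixed.

```python
def display_name(raw, max_len=20):
    """Hypothetical helper under test: a trimmed, length-capped label."""
    text = (raw or "").strip()
    if not text:
        return "(untitled)"
    if len(text) > max_len:
        return text[: max_len - 3] + "..."
    return text


EDGE_CASES = [
    (None, "(untitled)"),                # missing field
    ("", "(untitled)"),                  # empty state
    ("   ", "(untitled)"),               # whitespace only
    ("Ada", "Ada"),                      # ordinary case
    ("x" * 500, "x" * 17 + "..."),       # very long text is capped at max_len
    ("Zoë <b>&</b>", "Zoë <b>&</b>"),    # special characters pass through unchanged
]


def run_edge_cases():
    """Return the cases whose actual output differs from the expected one."""
    failures = []
    for raw, expected in EDGE_CASES:
        actual = display_name(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures
```

An empty list from `run_edge_cases()` means every listed case passed. Volume and timing cases (10,000 items, double-clicks, expired sessions) generally need the running app and are not covered by a table like this.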
Bugs in production
First question: can users work around it?
- Yes → fix in next deployment
- No → emergency fix needed
Emergency fix:
Production bug blocking users.
Bug: [description]
Impact: [how many users affected]
Need the simplest fix that unblocks users. Can improve later.
Multiple bugs at once
Symptoms that look like one bug might be several, or several symptoms might share one root cause.
List all symptoms:
1. [Symptom]
2. [Symptom]
3. [Symptom]
Are these separate bugs or one root cause?
Fix in priority order: blocking (can't use app) → critical (main features broken) → major → minor. Don't fix minor bugs while critical ones are unfixed.
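If the symptom list grows long, the priority order can be applied mechanically. This is a small sketch under the assumption that each bug has already been labeled with one of the four severities above; `SEVERITY_ORDER` and `fix_order` are illustrative names.

```python
# Lower number means fix sooner, following blocking → critical → major → minor.
SEVERITY_ORDER = {"blocking": 0, "critical": 1, "major": 2, "minor": 3}


def fix_order(bugs):
    """Sort (title, severity) pairs so the most severe bugs come first."""
    return sorted(bugs, key=lambda bug: SEVERITY_ORDER[bug[1]])
```

Because Python's sort is stable, bugs with the same severity keep the order in which they were reported.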
Adding debug logging
When a bug is hard to diagnose, add strategic logging:
Add logging at:
- Function entry with input values
- Before/after API calls with request/response
- State changes with before/after values
- Error handlers with full context
- Decision points (which branch was taken)
Format: [TIMESTAMP] [LEVEL] [CONTEXT] Message
Example: [2025-01-13 10:30:45] [ERROR] [UserAuth] Login failed for user@example.com - Reason: Invalid password - Attempts: 3
Remove or reduce logging after the bug is fixed.
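For a Python backend, the format above maps onto the standard `logging` module roughly as follows. This is a sketch, not the project's actual setup: it assumes the logger name is used as the [CONTEXT] field, and `make_debug_logger` is a hypothetical helper name.

```python
import logging

# Matches "[TIMESTAMP] [LEVEL] [CONTEXT] Message"; the logger name is the context.
LOG_FORMAT = "[%(asctime)s] [%(levelname)s] [%(name)s] %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def make_debug_logger(context, stream=None):
    """Build a logger whose name fills the [CONTEXT] slot."""
    logger = logging.getLogger(context)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # Avoid duplicate lines via the root logger.
    if not logger.handlers:  # Guard against adding a second handler on reuse.
        handler = logging.StreamHandler(stream)  # stream=None means stderr.
        handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))
        logger.addHandler(handler)
    return logger
```

With `log = make_debug_logger("UserAuth")`, a call such as `log.error("Login failed for %s - Attempts: %d", email, attempts)` produces a line in the same shape as the example above. When the bug is fixed, raising the level with `logger.setLevel(logging.WARNING)` quiets the debug output without deleting the log calls.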
Prevention
After every fix, think at three levels:
- Immediate fix — you already did this (the bug is gone)
- Preventive measure — what stops this from ever happening again? (validation, test, type check)
- Detection mechanism — if prevention fails, how do you catch it early? (monitoring alert, error tracking)
Bug is fixed. Now:
- What validation or test would prevent this from recurring?
- What monitoring or alert would catch it early if it does recur? (see /monitor)
- Is this a pattern? Could the same type of bug exist elsewhere in the codebase?
When to get help
Consider hiring a developer when:
- Stuck after following this entire process
- Critical production bug you can't figure out
- Same bug keeps coming back after fixing
- Bug in a complex third-party integration
- Security issue or data corruption risk
For most bugs, this process with AI tools is sufficient.
Common mistakes
| Mistake | Fix |
|---|---|
| Trying fixes without info | Gather diagnostic info first |
| "It doesn't work" (vague) | Be specific: what exactly doesn't work? |
| Not reproducing first | Find consistent steps to trigger the bug |
| Asking AI for random fixes | Diagnose root cause first |
| Ignoring console/network errors | Always check both tabs |
| Not testing after fix | Verify fix works AND didn't break other things |
| Fixing minor bugs while critical ones exist | Prioritize by user impact |
| Accepting first plausible cause | Keep asking "why?" until you reach something you can actually fix |
| "The AI broke it" (blame, not diagnosis) | Ask: what process would have caught this? |