# Debugging Skill
This skill defines how to systematically debug an issue reported by a developer. It follows a two-phase approach: first Intake (gathering all necessary context), then Diagnosis (structured analysis and actionable output).
## Phase 1 — Intake
Before performing any diagnosis, collect the following inputs from the developer. Ask for all of them in a single, structured prompt. Mark optional fields clearly.
### Required Inputs
| # | Field | Required | Notes |
|---|---|---|---|
| 1 | Problem Description | ✅ Yes | What went wrong? What was expected vs. actual behaviour? |
| 2 | Error Message / Stack Trace | ⚡ Highly Recommended | Paste the full error. If none, describe what is observed. |
| 3 | Steps to Reproduce | ⚡ Highly Recommended | Exact steps to trigger the issue. Note if it is consistent or intermittent. |
| 4 | Environment Context | ⚡ Highly Recommended | Environment (dev/staging/prod), OS, runtime version, recent deployments or changes |
| 5 | Attachments | 🔵 Optional | Screenshots, logs, network traces, or config files |
| 6 | Recent Changes | 🔵 Optional | Any recent code, config, infra, or data changes that preceded the issue |
| 7 | Affected Users / Scope | 🔵 Optional | All users? A specific role, region, or data set? |
### Intake Prompt Template
Use this template when asking the developer for inputs:
I'm ready to help debug. Please provide the following:
1. 📝 Problem Description (required)
What went wrong? What did you expect vs. what actually happened?
2. 🔴 Error Message / Stack Trace (highly recommended)
Paste the full error output. If no error is shown, describe what you observe instead.
3. 🔁 Steps to Reproduce (highly recommended)
Walk me through how to trigger this issue. Is it consistent or intermittent?
4. 🌐 Environment Context (highly recommended)
- Environment: dev / staging / prod
- OS / Runtime version (e.g., Node 20, Python 3.11, Java 17)
- Any recent deployments, config changes, or dependency updates?
5. 📎 Attachments (optional)
Screenshots, log files, network traces, or relevant configs.
6. 🔧 Recent Changes (optional)
Any recent code, database, infra, or data changes before this issue appeared?
7. 👥 Affected Scope (optional)
Who is affected? All users, specific roles, regions, or certain data?
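As a sketch only, the intake form above can be modelled as a plain record. This is illustrative Python, not part of the skill contract — the class and method names (`DebugIntake`, `missing_recommended`) are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DebugIntake:
    """Intake form for a debug session. Only problem_description is strictly required."""
    problem_description: str                        # 1. required
    error_message: Optional[str] = None             # 2. highly recommended
    steps_to_reproduce: Optional[str] = None        # 3. highly recommended
    environment: Optional[str] = None               # 4. highly recommended (dev/staging/prod)
    attachments: list = field(default_factory=list) # 5. optional
    recent_changes: Optional[str] = None            # 6. optional
    affected_scope: Optional[str] = None            # 7. optional

    def missing_recommended(self) -> list:
        """List the highly-recommended fields that were not provided."""
        names = ["error_message", "steps_to_reproduce", "environment"]
        return [n for n in names if getattr(self, n) is None]
```

A record like this makes it easy to echo back, in one message, which recommended fields the developer skipped.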
## Problem Description Validation Gate
This gate MUST be evaluated before Phase 2 begins. Do not proceed to Diagnosis if the Problem Description fails validation.
### What Makes a Valid Problem Description
A valid Problem Description must satisfy all three of the following criteria:
| # | Criterion | Why It Matters |
|---|---|---|
| 1 | Describes a specific observable symptom or failure | Vague inputs like "it's broken" or "it doesn't work" provide zero signal for diagnosis |
| 2 | Contains at least one concrete detail — a feature name, an action taken, a screen/page, an API endpoint, a value, or a behaviour | Without a concrete anchor, there is nothing in the codebase to investigate |
| 3 | Is coherent and in good faith — not gibberish, placeholder text, or clearly random input | Random text wastes investigation effort and produces meaningless results |
### Examples
| Input | Valid? | Reason |
|---|---|---|
| "The login button does nothing when I click it on the /auth/login page" | ✅ Yes | Specific symptom + concrete detail (button, page) |
| "Order creation fails with a 500 error after I added the discount field" | ✅ Yes | Symptom + context (500, feature name, recent change) |
| "Something is wrong with the app" | ❌ No | No specific symptom, no concrete detail |
| "It's broken" | ❌ No | Completely vague |
| "asdfghjkl" | ❌ No | Gibberish / not in good faith |
| "help" | ❌ No | Not a problem description |
| "I don't know, just debug it" | ❌ No | No actionable information |
| "The payment module has an issue" | ❌ No | No symptom described; too vague to investigate |
⚠️ A problem description that identifies a module or file alone (e.g., "the payment module") is not sufficient. It must describe what the module is doing wrong.
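The three criteria can be approximated mechanically as a first-pass filter. The sketch below is a heuristic in Python — the vague-phrase list, word-count thresholds, and anchor checks are illustrative assumptions, and the real gate remains the agent's judgment:

```python
VAGUE_PHRASES = {"it's broken", "it doesn't work", "something is wrong", "help"}

def passes_validation_gate(description: str) -> bool:
    """First-pass heuristic for the three validity criteria.

    1. Specific symptom: more than a handful of words, not a known vague phrase.
    2. Concrete anchor: a path, a numeric value (e.g. a status code), or a file name.
    3. Good faith: contains real words, not keyboard mashing.
    """
    text = description.strip().lower()
    if not text or text in VAGUE_PHRASES:
        return False
    words = text.split()
    if len(words) < 4:                       # too short to describe a symptom
        return False
    if not any(c in "aeiou" for c in text):  # crude gibberish check
        return False
    # Concrete anchor: a URL path, a numeric value, or a named source file
    has_anchor = ("/" in text
                  or any(w.isdigit() for w in words)
                  or any(w.endswith((".ts", ".py", ".js")) for w in words))
    return has_anchor or len(words) >= 8     # longer prose usually carries detail
```

Against the examples table above, this filter accepts both ✅ rows and rejects all ❌ rows — but borderline inputs should still be judged case by case.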
### Validation Flow
1. Developer submits the intake response.
2. Is a Problem Description present?
   - **No** → Re-prompt (Attempt 1).
   - **Yes** → Check it against all 3 validity criteria.
3. Does it pass all 3 criteria?
   - **Yes** → Proceed to Phase 2.
   - **No** → Re-prompt with specific feedback (Attempt 1, maximum of 2).
4. Still invalid after 2 failed attempts → Halt and send the Escalation Message.
### Re-Prompt Rules
- Maximum re-prompts: 2. If the Problem Description is still invalid after 2 correction attempts, halt the session and send the Escalation Message.
- Each re-prompt must be specific — tell the developer exactly which criterion their input failed and give a concrete example of what a better input looks like.
- Do not soften or skip the gate — even if the developer insists, a valid Problem Description is a prerequisite for meaningful diagnosis.
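The re-prompt budget can be sketched as a small loop. This is an illustration only — `ask_again` and `validate` are hypothetical stand-ins for the Re-Prompt Template and the validity criteria:

```python
MAX_REPROMPTS = 2

def run_validation_gate(first_input, ask_again, validate):
    """Enforce the gate: up to MAX_REPROMPTS corrections, then halt.

    `validate(description)` returns True/False; `ask_again(attempt)` returns
    the developer's corrected description. Returns a valid description, or
    None to signal escalation after 2 failed correction attempts.
    """
    description = first_input
    for attempt in range(1, MAX_REPROMPTS + 1):
        if validate(description):
            return description
        description = ask_again(attempt)  # re-prompt with specific feedback
    return description if validate(description) else None
```

Note the shape: the initial input plus at most two corrections means three validations in total before escalation.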
### Re-Prompt Template
Use this when the Problem Description is invalid:
⚠️ I wasn't able to start the debug session yet because the Problem Description needs a bit more detail.
**What I received:** "<their input>"
**What's missing:** <state which criterion failed — e.g., "no specific symptom described" / "no concrete detail" / "input is not a problem description">
**What a good Problem Description looks like:**
> "The checkout button on /cart throws a 500 error after I enter a promo code."
> "User profile pictures are not loading for accounts created before 2025-01-01."
Please re-describe the problem with at least one specific symptom and one concrete detail (feature, page, endpoint, value, or behaviour). The other fields (error message, steps to reproduce, etc.) remain unchanged — just update the Problem Description.
Attempts remaining: [1 / 2].
### Escalation Message (after 2 failed attempts)
🚫 Debug session could not be started.
After 2 attempts, the Problem Description provided does not contain enough information to proceed with a meaningful diagnosis. Starting an investigation without a clear problem statement risks producing inaccurate or misleading results.
**Next steps:**
1. Gather more context about what is failing — check logs, error messages, or ask a colleague who observed the issue.
2. Restart the debug session once you have a clearer description of the symptom.
If you believe this is a false rejection, please provide the full error message or a screenshot — that alone may be sufficient to establish a valid problem context.
## Phase 2 — Diagnosis
Once intake is complete, perform a structured investigation and produce the following outputs.
### Step 1 — Understand the Bug Behaviour
Classify the bug before diving in:
| Dimension | Options |
|---|---|
| Reproducibility | Consistent / Intermittent / Unknown |
| Severity | Critical (data loss, outage) / High (feature broken) / Medium (degraded) / Low (cosmetic) |
| Blast Radius | All users / Subset of users / Single user / No user impact (internal/silent) |
| Bug Type | Logic error / Runtime exception / Configuration / Race condition / Data / Integration / Environment |
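For reference, the classification dimensions above can be captured as a lookup table that rejects values outside the allowed options. A minimal sketch; the `classify` helper is hypothetical:

```python
CLASSIFICATION = {
    "reproducibility": ["Consistent", "Intermittent", "Unknown"],
    "severity": ["Critical", "High", "Medium", "Low"],
    "blast_radius": ["All users", "Subset of users", "Single user", "No user impact"],
    "bug_type": ["Logic error", "Runtime exception", "Configuration",
                 "Race condition", "Data", "Integration", "Environment"],
}

def classify(**choices):
    """Validate a bug classification against the allowed options per dimension."""
    for dim, value in choices.items():
        allowed = CLASSIFICATION.get(dim)
        if allowed is None or value not in allowed:
            raise ValueError(f"invalid {dim}: {value!r}")
    return choices
```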
### Step 2 — Investigate
Perform the following steps using available tools (code search, file reading, log analysis):
- Read the error message and stack trace — identify the exact file, line, and function where the error originates
- Trace the call chain leading to the failure point (use the code-exploration skill's L4 Function level if needed)
- Search for recent changes in the impacted files (`git log`, diff review)
- Check for environmental factors: config values, missing env vars, dependency version mismatches
- Check data assumptions: is there any data that the code assumes exists, is non-null, or is in a specific format?
- Identify all other files or services that call into or are called by the failing code — assess the blast radius
### Step 3 — Root Cause Analysis
List all plausible root causes with a likelihood score and supporting evidence.
#### Format
## 🔎 Root Cause Analysis
| # | Suspected Root Cause | Likelihood | Evidence / Reasoning |
|---|---|---|---|
| 1 | <description of cause> | 🔴 High (75%) | <what points to this> |
| 2 | <description of cause> | 🟡 Medium (20%) | <what points to this> |
| 3 | <description of cause> | 🟢 Low (5%) | <what points to this> |
**Most Likely Root Cause:** #1 — <brief summary>
#### Likelihood Legend
| Indicator | Score Range | Meaning |
|---|---|---|
| 🔴 High | 60–100% | Strong evidence; the stack trace or code directly supports this |
| 🟡 Medium | 30–59% | Plausible; partial evidence or requires validation |
| 🟢 Low | 1–29% | Possible edge case; cannot be ruled out without more data |
⚠️ If total likelihood does not add up to 100%, normalise across all candidates. Always rank by descending likelihood. If only one cause is identified, state "single root cause identified" and assign 100%.
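The normalisation rule can be sketched as a small helper — a hypothetical `normalise_likelihoods`, assuming raw scores are non-negative and at least one is positive:

```python
def normalise_likelihoods(causes):
    """Scale raw (cause, score) pairs so scores sum to 100%, ranked descending.

    A single cause always comes out at 100%.
    """
    total = sum(score for _, score in causes)
    if total <= 0:
        raise ValueError("at least one cause must have a positive score")
    scaled = [(name, round(100 * score / total, 1)) for name, score in causes]
    return sorted(scaled, key=lambda c: c[1], reverse=True)
```

For example, raw scores of 60 and 20 normalise to 75% and 25%, preserving the ranking.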
### Step 4 — Impacted File Map
List every file that is directly or indirectly affected by the bug or the fix.
#### Format
## 📁 Impacted Files
| File | Role | Impact Type |
|---|---|---|
| `src/orders/orders.service.ts` | Origin of error | 🔴 Direct — fix required here |
| `src/users/users.service.ts` | Called by failing code | 🟡 Indirect — may need defensive update |
| `src/common/filters/http-exception.filter.ts` | Error handler | 🔵 Review — ensure error surfaces correctly |
| `tests/orders/orders.service.spec.ts` | Unit tests | ✅ Update — add test case for this scenario |
#### Impact Type Legend
| Icon | Type | Meaning |
|---|---|---|
| 🔴 Direct | Fix required | This file contains the defective code |
| 🟡 Indirect | May need update | Depends on or is called by the defective code |
| 🔵 Review | Check only | Peripheral; review to confirm it handles this case |
| ✅ Update | Test/doc update | No logic change, but tests or docs need updating |
### Step 5 — Proposed Solutions
Provide at least 2 distinct solutions. Solutions must differ meaningfully (not just syntax variants).
#### Format for Each Solution
### Solution [N] — <Short Title>
**Approach:**
<Describe the approach in 2–4 sentences. What changes, and why does this fix the issue?>
**Trade-offs:**
| Pro | Con |
|---|---|
| <benefit> | <drawback> |
**Key Changes:**
- `<file>`: <what changes>
- `<file>`: <what changes>
**How to Validate:**
<Step-by-step instructions to confirm the fix works — unit test to write, curl command to run, UI action to perform, etc.>
**Estimated Effort:** <Low / Medium / High>
### Step 6 — Recommendation
State the recommended solution clearly and explain why it is preferred.
#### Format
## ✅ Recommended Solution: Solution [N] — <Title>
**Why this is recommended:**
<2–4 sentences justifying the choice. Reference trade-offs, risk, effort, and alignment with existing patterns.>
**Implementation Order:**
1. <First action to take>
2. <Second action to take>
3. <Verification step>
### Step 7 — Preventive Recommendation
After diagnosing and recommending a fix, always suggest how to prevent this class of bug from recurring.
#### Format
## 🛡️ Prevention
| Recommendation | Type | Priority |
|---|---|---|
| Add null check for `user.id` before DB query | Code Guard | 🔴 High |
| Write unit test for empty payload edge case | Test Coverage | 🟡 Medium |
| Add schema validation at API boundary | Input Validation | 🟡 Medium |
| Set up alerting for 5xx error rate spike | Observability | 🔵 Low |
### Step 8 — Investigation Trail
Maintain a log of what was checked and what was ruled out. This is critical if the debugging session spans multiple iterations or is handed off to another developer.
#### Format
## 🧭 Investigation Trail
| Step | What Was Checked | Finding | Status |
|---|---|---|---|
| 1 | Stack trace origin | Error thrown in `orders.service.ts` line 47 | ✅ Confirmed |
| 2 | Recent git changes to `orders.service.ts` | No changes in last 30 days | 🚫 Ruled out |
| 3 | ENV variable `DATABASE_URL` | Present and correct in all environments | 🚫 Ruled out |
| 4 | Null check on `user` object before `.id` access | Missing null guard — matches error pattern | ✅ Confirmed |
| 5 | Upstream caller `orders.controller.ts` | Passes unvalidated user object | 🟡 Contributing factor |
## Full Output Template
Use this skeleton as the final output structure for every debugging session:
# 🐛 Debug Report — <Short Problem Title>
> **Reported:** YYYY-MM-DD
> **Environment:** <dev/staging/prod>
> **Severity:** <Critical / High / Medium / Low>
> **Status:** <Investigating / Root Cause Identified / Solution Proposed / Resolved>
---
## 📋 Problem Summary
<1–3 sentence summary of the issue, expected vs. actual behaviour, and reproducibility>
## 🔎 Bug Behaviour
- **Reproducibility:** <Consistent / Intermittent>
- **Blast Radius:** <Who/what is affected>
- **Bug Type:** <Logic / Runtime / Config / Race Condition / Data / Integration / Environment>
---
## 🔎 Root Cause Analysis
<Section 3 output>
---
## 📁 Impacted Files
<Section 4 output>
---
## 💡 Proposed Solutions
### Solution 1 — <Title>
<Section 5 format>
### Solution 2 — <Title>
<Section 5 format>
---
## ✅ Recommended Solution
<Section 6 output>
---
## 🛡️ Prevention
<Section 7 output>
---
## 🧭 Investigation Trail
<Section 8 output>
## File Output (MANDATORY)
Every debug report must be saved as a markdown file before presenting it to the user.
### Save Location
<repo-root>/.agent/debugs/
### Naming Convention
debug_<YYYYMMDD>_<HHMMSS>_<short-slug>.md
Examples:
.agent/debugs/debug_20260311_143022_null-user-id-orders.md
.agent/debugs/debug_20260311_160045_auth-token-expiry.md
### File Header
Every debug file must start with a metadata header:
# Debug Report — <Short Problem Title>
> **Level:** Debug Session
> **Reported:** YYYY-MM-DD HH:MM:SS
> **Environment:** <environment>
> **Severity:** <severity>
> **Status:** <status>
---
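The save location, naming convention, and metadata header can be sketched together. `debug_report_path` and `report_header` are hypothetical helpers, assuming POSIX-style paths:

```python
from datetime import datetime
from pathlib import Path
import re

def debug_report_path(repo_root, slug, now):
    """Build <repo-root>/.agent/debugs/debug_<YYYYMMDD>_<HHMMSS>_<short-slug>.md."""
    slug = re.sub(r"[^a-z0-9]+", "-", slug.lower()).strip("-")  # kebab-case the slug
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(repo_root) / ".agent" / "debugs" / f"debug_{stamp}_{slug}.md"

def report_header(title, environment, severity, status, now):
    """Render the mandatory metadata header for a debug report file."""
    return (
        f"# Debug Report — {title}\n"
        f"> **Level:** Debug Session\n"
        f"> **Reported:** {now.strftime('%Y-%m-%d %H:%M:%S')}\n"
        f"> **Environment:** {environment}\n"
        f"> **Severity:** {severity}\n"
        f"> **Status:** {status}\n"
        "---\n"
    )
```

Slugifying the title before building the path keeps file names shell-safe and consistent with the examples above.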
## Quick Decision Guide
Developer reports a bug?
→ Run Phase 1 Intake (collect all inputs in one prompt)
Intake complete?
→ Run Phase 2 Diagnosis (Steps 1–8 in order)
Only one root cause found?
→ Still provide ≥2 solutions (different approaches to fix the same cause)
Bug is intermittent / no stack trace?
→ Increase weight on environment, race conditions, and data assumptions
Bug appeared "out of nowhere" with no code changes?
→ Prioritise: infra/config changes, data anomalies, dependency updates
Fix applied but issue persists?
→ Revisit investigation trail, promote lower-likelihood root causes, add new findings
Session handed off to another developer?
→ Ensure Investigation Trail (Step 8) is fully up to date before handoff