# Debugging Skill
This skill defines how to systematically debug an issue reported by a developer. It follows a two-phase approach: first Intake (gathering all necessary context), then Diagnosis (structured analysis and actionable output).
## Phase 1 — Intake
Before performing any diagnosis, collect the following inputs from the developer. Ask for all of them in a single, structured prompt. Mark optional fields clearly.
### Required Inputs
| # | Field | Required | Notes |
|---|---|---|---|
| 1 | Problem Description | ✅ Yes | What went wrong? What was expected vs. actual behaviour? |
| 2 | Error Message / Stack Trace | ⚡ Highly Recommended | Paste the full error. If none, describe what is observed. |
| 3 | Steps to Reproduce | ⚡ Highly Recommended | Exact steps to trigger the issue. Note if it is consistent or intermittent. |
| 4 | Environment Context | ⚡ Highly Recommended | Environment (dev/staging/prod), OS, runtime version, recent deployments or changes |
| 5 | Attachments | 🔵 Optional | Screenshots, logs, network traces, or config files |
| 6 | Recent Changes | 🔵 Optional | Any recent code, config, infra, or data changes that preceded the issue |
| 7 | Affected Users / Scope | 🔵 Optional | All users? A specific role, region, or data set? |
### Intake Prompt Template
Use this template when asking the developer for inputs:
I'm ready to help debug. Please provide the following:
1. 📝 Problem Description (required)
What went wrong? What did you expect vs. what actually happened?
2. 🔴 Error Message / Stack Trace (highly recommended)
Paste the full error output. If no error is shown, describe what you observe instead.
3. 🔁 Steps to Reproduce (highly recommended)
Walk me through how to trigger this issue. Is it consistent or intermittent?
4. 🌐 Environment Context (highly recommended)
- Environment: dev / staging / prod
- OS / Runtime version (e.g., Node 20, Python 3.11, Java 17)
- Any recent deployments, config changes, or dependency updates?
5. 📎 Attachments (optional)
Screenshots, log files, network traces, or relevant configs.
6. 🔧 Recent Changes (optional)
Any recent code, database, infra, or data changes before this issue appeared?
7. 👥 Affected Scope (optional)
Who is affected? All users, specific roles, regions, or certain data?
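As a sketch only, the intake form above can be modelled as a plain record. This is illustrative Python, not part of the skill contract — the class and method names (`DebugIntake`, `missing_recommended`) are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DebugIntake:
    """Intake form for a debug session. Only problem_description is strictly required."""
    problem_description: str                        # 1. required
    error_message: Optional[str] = None             # 2. highly recommended
    steps_to_reproduce: Optional[str] = None        # 3. highly recommended
    environment: Optional[str] = None               # 4. highly recommended (dev/staging/prod)
    attachments: list = field(default_factory=list) # 5. optional
    recent_changes: Optional[str] = None            # 6. optional
    affected_scope: Optional[str] = None            # 7. optional

    def missing_recommended(self) -> list:
        """List the highly-recommended fields that were not provided."""
        names = ["error_message", "steps_to_reproduce", "environment"]
        return [n for n in names if getattr(self, n) is None]
```

A record like this makes it easy to echo back, in one message, which recommended fields the developer skipped.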
## Problem Description Validation Gate
This gate MUST be evaluated before Phase 2 begins. Do not proceed to Diagnosis if the Problem Description fails validation.
### What Makes a Valid Problem Description
A valid Problem Description must satisfy all three of the following criteria:
| # | Criterion | Why It Matters |
|---|---|---|
| 1 | Describes a specific observable symptom or failure | Vague inputs like "it's broken" or "it doesn't work" provide zero signal for diagnosis |
| 2 | Contains at least one concrete detail — a feature name, an action taken, a screen/page, an API endpoint, a value, or a behaviour | Without a concrete anchor, there is nothing in the codebase to investigate |
| 3 | Is coherent and in good faith — not gibberish, placeholder text, or clearly random input | Random text wastes investigation effort and produces meaningless results |
### Examples
| Input | Valid? | Reason |
|---|---|---|
| "The login button does nothing when I click it on the /auth/login page" | ✅ Yes | Specific symptom + concrete detail (button, page) |
| "Order creation fails with a 500 error after I added the discount field" | ✅ Yes | Symptom + context (500, feature name, recent change) |
| "Something is wrong with the app" | ❌ No | No specific symptom, no concrete detail |
| "It's broken" | ❌ No | Completely vague |
| "asdfghjkl" | ❌ No | Gibberish / not in good faith |
| "help" | ❌ No | Not a problem description |
| "I don't know, just debug it" | ❌ No | No actionable information |
| "The payment module has an issue" | ❌ No | No symptom described; too vague to investigate |
⚠️ A problem description that identifies a module or file alone (e.g., "the payment module") is not sufficient. It must describe what the module is doing wrong.
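The three criteria can be approximated mechanically as a first-pass filter. The sketch below is a heuristic in Python — the vague-phrase list, word-count thresholds, and anchor checks are illustrative assumptions, and the real gate remains the agent's judgment:

```python
VAGUE_PHRASES = {"it's broken", "it doesn't work", "something is wrong", "help"}

def passes_validation_gate(description: str) -> bool:
    """First-pass heuristic for the three validity criteria.

    1. Specific symptom: more than a handful of words, not a known vague phrase.
    2. Concrete anchor: a path, a numeric value (e.g. a status code), or a file name.
    3. Good faith: contains real words, not keyboard mashing.
    """
    text = description.strip().lower()
    if not text or text in VAGUE_PHRASES:
        return False
    words = text.split()
    if len(words) < 4:                       # too short to describe a symptom
        return False
    if not any(c in "aeiou" for c in text):  # crude gibberish check
        return False
    # Concrete anchor: a URL path, a numeric value, or a named source file
    has_anchor = ("/" in text
                  or any(w.isdigit() for w in words)
                  or any(w.endswith((".ts", ".py", ".js")) for w in words))
    return has_anchor or len(words) >= 8     # longer prose usually carries detail
```

Against the examples table above, this filter accepts both ✅ rows and rejects all ❌ rows — but borderline inputs should still be judged case by case.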
### Validation Flow
1. Developer submits the intake response.
2. Is a Problem Description present?
   - **No** → Re-prompt (Attempt 1).
   - **Yes** → Check it against all 3 validity criteria.
3. Does it pass all 3 criteria?
   - **Yes** → Proceed to Phase 2.
   - **No** → Re-prompt with specific feedback (Attempt 1, maximum of 2).
4. Still invalid after 2 failed attempts → Halt and send the Escalation Message.
### Re-Prompt Rules
- Maximum re-prompts: 2. If the Problem Description is still invalid after 2 correction attempts, halt the session and send the Escalation Message.
- Each re-prompt must be specific — tell the developer exactly which criterion their input failed and give a concrete example of what a better input looks like.
- Do not soften or skip the gate — even if the developer insists, a valid Problem Description is a prerequisite for meaningful diagnosis.
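The re-prompt budget can be sketched as a small loop. This is an illustration only — `ask_again` and `validate` are hypothetical stand-ins for the Re-Prompt Template and the validity criteria:

```python
MAX_REPROMPTS = 2

def run_validation_gate(first_input, ask_again, validate):
    """Enforce the gate: up to MAX_REPROMPTS corrections, then halt.

    `validate(description)` returns True/False; `ask_again(attempt)` returns
    the developer's corrected description. Returns a valid description, or
    None to signal escalation after 2 failed correction attempts.
    """
    description = first_input
    for attempt in range(1, MAX_REPROMPTS + 1):
        if validate(description):
            return description
        description = ask_again(attempt)  # re-prompt with specific feedback
    return description if validate(description) else None
```

Note the shape: the initial input plus at most two corrections means three validations in total before escalation.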
### Re-Prompt Template
Use this when the Problem Description is invalid:
⚠️ I wasn't able to start the debug session yet because the Problem Description needs a bit more detail.
**What I received:** "<their input>"
**What's missing:** <state which criterion failed — e.g., "no specific symptom described" / "no concrete detail" / "input is not a problem description">
**What a good Problem Description looks like:**
> "The checkout button on /cart throws a 500 error after I enter a promo code."
> "User profile pictures are not loading for accounts created before 2025-01-01."
Please re-describe the problem with at least one specific symptom and one concrete detail (feature, page, endpoint, value, or behaviour). The other fields (error message, steps to reproduce, etc.) remain unchanged — just update the Problem Description.
Attempts remaining: [1 / 2].
### Escalation Message (after 2 failed attempts)
🚫 Debug session could not be started.
After 2 attempts, the Problem Description provided does not contain enough information to proceed with a meaningful diagnosis. Starting an investigation without a clear problem statement risks producing inaccurate or misleading results.
**Next steps:**
1. Gather more context about what is failing — check logs, error messages, or ask a colleague who observed the issue.
2. Restart the debug session once you have a clearer description of the symptom.
If you believe this is a false rejection, please provide the full error message or a screenshot — that alone may be sufficient to establish a valid problem context.
## Phase 2 — Diagnosis
Once intake is complete, perform a structured investigation and produce the following outputs.
### Step 1 — Understand the Bug Behaviour
Classify the bug before diving in:
| Dimension | Options |
|---|---|
| Reproducibility | Consistent / Intermittent / Unknown |
| Severity | Critical (data loss, outage) / High (feature broken) / Medium (degraded) / Low (cosmetic) |
| Blast Radius | All users / Subset of users / Single user / No user impact (internal/silent) |
| Bug Type | Logic error / Runtime exception / Configuration / Race condition / Data / Integration / Environment |
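For reference, the classification dimensions above can be captured as a lookup table that rejects values outside the allowed options. A minimal sketch; the `classify` helper is hypothetical:

```python
CLASSIFICATION = {
    "reproducibility": ["Consistent", "Intermittent", "Unknown"],
    "severity": ["Critical", "High", "Medium", "Low"],
    "blast_radius": ["All users", "Subset of users", "Single user", "No user impact"],
    "bug_type": ["Logic error", "Runtime exception", "Configuration",
                 "Race condition", "Data", "Integration", "Environment"],
}

def classify(**choices):
    """Validate a bug classification against the allowed options per dimension."""
    for dim, value in choices.items():
        allowed = CLASSIFICATION.get(dim)
        if allowed is None or value not in allowed:
            raise ValueError(f"invalid {dim}: {value!r}")
    return choices
```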
### Step 2 — Investigate
Perform the following steps using available tools (code search, file reading, log analysis):
- Read the error message and stack trace — identify the exact file, line, and function where the error originates
- Trace the call chain leading to the failure point (use the code-exploration skill's L4 Function level if needed)
- Search for recent changes in the impacted files (`git log`, diff review)
- Check for environmental factors: config values, missing env vars, dependency version mismatches
- Check data assumptions: is there any data that the code assumes exists, is non-null, or is in a specific format?
- Identify all other files or services that call into or are called by the failing code — assess the blast radius
### Step 3 — Root Cause Analysis
List all plausible root causes with a likelihood score and supporting evidence.
#### Format
## 🔎 Root Cause Analysis
| # | Suspected Root Cause | Likelihood | Evidence / Reasoning |
|---|---|---|---|
| 1 | <description of cause> | 🔴 High (75%) | <what points to this> |
| 2 | <description of cause> | 🟡 Medium (20%) | <what points to this> |
| 3 | <description of cause> | 🟢 Low (5%) | <what points to this> |
**Most Likely Root Cause:** #1 — <brief summary>
#### Likelihood Legend
| Indicator | Score Range | Meaning |
|---|---|---|
| 🔴 High | 60–100% | Strong evidence; the stack trace or code directly supports this |
| 🟡 Medium | 30–59% | Plausible; partial evidence or requires validation |
| 🟢 Low | 1–29% | Possible edge case; cannot be ruled out without more data |
⚠️ If total likelihood does not add up to 100%, normalise across all candidates. Always rank by descending likelihood. If only one cause is identified, state "single root cause identified" and assign 100%.
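The normalisation rule can be sketched as a small helper — a hypothetical `normalise_likelihoods`, assuming raw scores are non-negative and at least one is positive:

```python
def normalise_likelihoods(causes):
    """Scale raw (cause, score) pairs so scores sum to 100%, ranked descending.

    A single cause always comes out at 100%.
    """
    total = sum(score for _, score in causes)
    if total <= 0:
        raise ValueError("at least one cause must have a positive score")
    scaled = [(name, round(100 * score / total, 1)) for name, score in causes]
    return sorted(scaled, key=lambda c: c[1], reverse=True)
```

For example, raw scores of 60 and 20 normalise to 75% and 25%, preserving the ranking.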
### Step 4 — Impacted File Map
List every file that is directly or indirectly affected by the bug or the fix.
#### Format
## 📁 Impacted Files
| File | Role | Impact Type |
|---|---|---|
| `src/orders/orders.service.ts` | Origin of error | 🔴 Direct — fix required here |
| `src/users/users.service.ts` | Called by failing code | 🟡 Indirect — may need defensive update |
| `src/common/filters/http-exception.filter.ts` | Error handler | 🔵 Review — ensure error surfaces correctly |
| `tests/orders/orders.service.spec.ts` | Unit tests | ✅ Update — add test case for this scenario |
#### Impact Type Legend
| Icon | Type | Meaning |
|---|---|---|
| 🔴 Direct | Fix required | This file contains the defective code |
| 🟡 Indirect | May need update | Depends on or is called by the defective code |
| 🔵 Review | Check only | Peripheral; review to confirm it handles this case |
| ✅ Update | Test/doc update | No logic change, but tests or docs need updating |
### Step 5 — Proposed Solutions
Provide at least 2 distinct solutions. Solutions must differ meaningfully (not just syntax variants).
#### Format for Each Solution
### Solution [N] — <Short Title>
**Approach:**
<Describe the approach in 2–4 sentences. What changes, and why does this fix the issue?>
**Trade-offs:**
| Pro | Con |
|---|---|
| <benefit> | <drawback> |
**Key Changes:**
- `<file>`: <what changes>
- `<file>`: <what changes>
**How to Validate:**
<Step-by-step instructions to confirm the fix works — unit test to write, curl command to run, UI action to perform, etc.>
**Estimated Effort:** <Low / Medium / High>
### Step 6 — Recommendation
State the recommended solution clearly and explain why it is preferred.
#### Format
## ✅ Recommended Solution: Solution [N] — <Title>
**Why this is recommended:**
<2–4 sentences justifying the choice. Reference trade-offs, risk, effort, and alignment with existing patterns.>
**Implementation Order:**
1. <First action to take>
2. <Second action to take>
3. <Verification step>
### Step 7 — Preventive Recommendation
After diagnosing and recommending a fix, always suggest how to prevent this class of bug from recurring.
#### Format
## 🛡️ Prevention
| Recommendation | Type | Priority |
|---|---|---|
| Add null check for `user.id` before DB query | Code Guard | 🔴 High |
| Write unit test for empty payload edge case | Test Coverage | 🟡 Medium |
| Add schema validation at API boundary | Input Validation | 🟡 Medium |
| Set up alerting for 5xx error rate spike | Observability | 🔵 Low |
### Step 8 — Investigation Trail
Maintain a log of what was checked and what was ruled out. This is critical if the debugging session spans multiple iterations or is handed off to another developer.
#### Format
## 🧭 Investigation Trail
| Step | What Was Checked | Finding | Status |
|---|---|---|---|
| 1 | Stack trace origin | Error thrown in `orders.service.ts` line 47 | ✅ Confirmed |
| 2 | Recent git changes to `orders.service.ts` | No changes in last 30 days | 🚫 Ruled out |
| 3 | ENV variable `DATABASE_URL` | Present and correct in all environments | 🚫 Ruled out |
| 4 | Null check on `user` object before `.id` access | Missing null guard — matches error pattern | ✅ Confirmed |
| 5 | Upstream caller `orders.controller.ts` | Passes unvalidated user object | 🟡 Contributing factor |
## Full Output Template
Use this skeleton as the final output structure for every debugging session:
# 🐛 Debug Report — <Short Problem Title>
> **Reported:** YYYY-MM-DD
> **Environment:** <dev/staging/prod>
> **Severity:** <Critical / High / Medium / Low>
> **Status:** <Investigating / Root Cause Identified / Solution Proposed / Resolved>
---
## 📋 Problem Summary
<1–3 sentence summary of the issue, expected vs. actual behaviour, and reproducibility>
## 🔎 Bug Behaviour
- **Reproducibility:** <Consistent / Intermittent>
- **Blast Radius:** <Who/what is affected>
- **Bug Type:** <Logic / Runtime / Config / Race Condition / Data / Integration / Environment>
---
## 🔎 Root Cause Analysis
<Section 3 output>
---
## 📁 Impacted Files
<Section 4 output>
---
## 💡 Proposed Solutions
### Solution 1 — <Title>
<Section 5 format>
### Solution 2 — <Title>
<Section 5 format>
---
## ✅ Recommended Solution
<Section 6 output>
---
## 🛡️ Prevention
<Section 7 output>
---
## 🧭 Investigation Trail
<Section 8 output>
## File Output (MANDATORY)
Every debug report must be saved as a markdown file before presenting it to the user.
### Save Location
<repo-root>/.agent/debugs/
### Naming Convention
debug_<YYYYMMDD>_<HHMMSS>_<short-slug>.md
Examples:
.agent/debugs/debug_20260311_143022_null-user-id-orders.md
.agent/debugs/debug_20260311_160045_auth-token-expiry.md
### File Header
Every debug file must start with a metadata header:
# Debug Report — <Short Problem Title>
> **Level:** Debug Session
> **Reported:** YYYY-MM-DD HH:MM:SS
> **Environment:** <environment>
> **Severity:** <severity>
> **Status:** <status>
---
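The save location, naming convention, and metadata header can be sketched together. `debug_report_path` and `report_header` are hypothetical helpers, assuming POSIX-style paths:

```python
from datetime import datetime
from pathlib import Path
import re

def debug_report_path(repo_root, slug, now):
    """Build <repo-root>/.agent/debugs/debug_<YYYYMMDD>_<HHMMSS>_<short-slug>.md."""
    slug = re.sub(r"[^a-z0-9]+", "-", slug.lower()).strip("-")  # kebab-case the slug
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(repo_root) / ".agent" / "debugs" / f"debug_{stamp}_{slug}.md"

def report_header(title, environment, severity, status, now):
    """Render the mandatory metadata header for a debug report file."""
    return (
        f"# Debug Report — {title}\n"
        f"> **Level:** Debug Session\n"
        f"> **Reported:** {now.strftime('%Y-%m-%d %H:%M:%S')}\n"
        f"> **Environment:** {environment}\n"
        f"> **Severity:** {severity}\n"
        f"> **Status:** {status}\n"
        "---\n"
    )
```

Slugifying the title before building the path keeps file names shell-safe and consistent with the examples above.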
## Quick Decision Guide
Developer reports a bug?
→ Run Phase 1 Intake (collect all inputs in one prompt)
Intake complete?
→ Run Phase 2 Diagnosis (Steps 1–8 in order)
Only one root cause found?
→ Still provide ≥2 solutions (different approaches to fix the same cause)
Bug is intermittent / no stack trace?
→ Increase weight on environment, race conditions, and data assumptions
Bug appeared "out of nowhere" with no code changes?
→ Prioritise: infra/config changes, data anomalies, dependency updates
Fix applied but issue persists?
→ Revisit investigation trail, promote lower-likelihood root causes, add new findings
Session handed off to another developer?
→ Ensure Investigation Trail (Step 8) is fully up to date before handoff