debug-systematic

SKILL.md

Systematic Debugging Workflow

I'll help you debug issues systematically using the scientific method - hypothesis formation, testing, and iterative refinement.

Arguments: $ARGUMENTS - error description, reproduction steps, or context

Token Optimization

Target: 50% reduction (4,000-6,000 → 1,500-3,000 tokens)

Core Optimization Strategies

1. Hypothesis-Driven Debugging (Not Exhaustive Analysis)

  • AVOID: Reading entire codebase to find bugs
  • DO: Form hypotheses about likely causes, test top 2-3 first
  • Token savings: 90% (200 tokens vs 2,000+ tokens)
  • Pattern: Prioritize recently changed files, common failure patterns

2. Git Diff for Recently Changed Files (Likely Bug Source)

  • AVOID: ls -R then reading all files
  • DO: git diff --name-only HEAD~3..HEAD to find changed files
  • DO: git log --oneline --since="3 days ago" for recent commits
  • Token savings: 85% (300 tokens vs 2,000+ tokens)
  • Pattern: Bugs often introduced in recent changes

3. Stack Trace Parsing with Grep

  • AVOID: Reading entire log files with Read tool
  • DO: grep -i "error\|exception\|fatal" logs/*.log | tail -20
  • DO: Parse stack traces to extract file paths and line numbers
  • Token savings: 95% (100 tokens vs 2,000+ tokens for large logs)
  • Pattern: Stack traces reveal exact failure locations

4. Test Failure Analysis Caching

  • ✅ Cache test results in debug/state.json
  • ✅ Cache hypothesis outcomes to avoid retesting
  • ✅ Cache reproduction steps once confirmed
  • Token savings: 70% on subsequent debugging turns
  • Pattern: Multi-turn debugging sessions benefit from state

5. Progressive Investigation (Narrow Before Deep)

  • ✅ Start with stack trace → identify file → read specific function
  • ✅ Hypothesis testing: test most likely causes first
  • ✅ Binary search through git history when needed
  • Token savings: 60% (stop early when cause found)
  • Pattern: Most bugs have obvious causes in changed code

6. Session State Tracking for Multi-Turn Debugging

  • ✅ Session files in debug/ directory
  • ✅ Track tested hypotheses to avoid repetition
  • ✅ Resume from last checkpoint on subsequent runs
  • Token savings: 80% on resumed sessions (skip completed work)
  • Pattern: Complex bugs require multiple debugging turns

Token Usage by Operation

Operation Unoptimized Optimized Savings
Initial bug analysis 2,000-3,000 500-1,000 60-75%
Hypothesis formation 1,500-2,000 400-800 60-73%
Stack trace parsing 2,000+ 100-200 90-95%
File investigation 2,000+ 300-600 70-85%
Test reproduction 1,000-1,500 200-400 73-80%
Session resume 2,000-3,000 300-600 80-85%

Average Reduction: 50% (4,000-6,000 → 1,500-3,000 tokens)

Debugging-Specific Patterns

Stack Trace Analysis:

# Extract file paths and line numbers from stack traces
grep -E "at .+ \(.+:[0-9]+:[0-9]+\)" error.log | head -10
# Focus investigation on these specific files/lines

Recent Changes Focus:

# Find files changed in last 3 days (likely bug sources)
git diff --name-only HEAD~10..HEAD
# Only read files that changed recently

Hypothesis Prioritization:

  1. Recent changes (80% of bugs) - Check git diff first
  2. Stack trace files (90% reliability) - Read exact failure locations
  3. Error message patterns (70% of bugs) - Grep for similar errors
  4. Environment/config (20% of bugs) - Check if configs changed
  5. External dependencies (10% of bugs) - Check updates

Binary Search for Regressions:

# Use git bisect to find regression commit
git bisect start HEAD v1.2.3
git bisect run npm test  # Automated testing
# Saves 95% tokens vs manual testing each commit

Caching Behavior

Session Location: debug/ (in project root)

  • debug/plan.md - Debugging plan with hypotheses and results
  • debug/state.json - Session state and test results
  • debug/reproduction.log - Issue reproduction steps and logs

Cache Location: .claude/cache/debug/

  • hypotheses.json - Tested hypotheses and outcomes
  • stack-traces.json - Parsed stack trace information
  • changed-files.json - Recently changed files analysis

Cache Validity:

  • Until issue resolved (status: "solved" in state.json)
  • Until source files change (checksum-based)
  • 7 days maximum for stale sessions

Shared With:

  • /debug-root-cause - Root cause analysis skill
  • /debug-session - Debug session documentation
  • /test - Test execution for verification

Usage Examples

Start New Debugging Session:

debug-systematic "API returns 500 on POST /users"
# Expected tokens: 1,500-3,000 (full analysis)

Resume Existing Session:

debug-systematic resume
# Expected tokens: 800-1,500 (skips completed hypotheses)

Test Specific Hypothesis:

debug-systematic test 1
# Expected tokens: 500-1,000 (focused testing)

Check Debugging Progress:

debug-systematic status
# Expected tokens: 200-500 (read session state only)

Mark Issue as Solved:

debug-systematic solved
# Expected tokens: 300-600 (generate summary)

Early Exit Conditions

Exit immediately (saves 90% tokens) when:

  • ✅ Issue already solved (check debug/state.json status)
  • ✅ No test framework available (can't reproduce)
  • ✅ Not a git repository (can't check recent changes)
  • ✅ Root cause already identified in session state

Progressive disclosure saves 60-80% tokens:

  • Show hypothesis formation → wait for user confirmation
  • Test one hypothesis at a time → report results
  • Only deep dive when hypothesis confirms

Implementation Checklist

  • ✅ Git diff analysis for recent changes (PRIMARY optimization)
  • ✅ Stack trace parsing with Grep (saves 90-95%)
  • ✅ Session-based hypothesis tracking (saves 70-80% on reruns)
  • ✅ Progressive hypothesis testing (most likely → least likely)
  • ✅ Bash-based log analysis (minimal tokens)
  • ✅ Test failure result caching
  • ✅ Early exit when issue resolved
  • ✅ Binary search for regressions (git bisect)
  • ✅ Focus area flags (specific file/function debugging)

Optimization Status: ✅ Optimized (Phase 2 Batch 2, 2026-01-26) Expected Tokens: 1,500-3,000 (vs. 4,000-6,000 unoptimized) Achieved Reduction: 50% average across all debugging operations

Session Intelligence

I'll maintain debugging session continuity:

Session Files (in current project directory):

  • debug/plan.md - Debugging plan with hypotheses and results
  • debug/state.json - Session state and test results
  • debug/reproduction.log - Issue reproduction steps and logs

IMPORTANT: Session files are stored in a debug folder in your current project root

Auto-Detection:

  • If session exists: Resume debugging from last hypothesis
  • If no session: Create debugging plan and initial reproduction
  • Commands: resume, reproduce, status, solved

Phase 1: Issue Reproduction & Information Gathering

Extended Thinking for Complex Debugging

For complex or elusive bugs, I'll use extended thinking to explore debugging strategies:

Triggers for Extended Analysis:

  • Intermittent or non-deterministic bugs
  • Production-only failures
  • Performance issues without obvious cause
  • Security vulnerabilities
  • Multi-component system failures

MANDATORY FIRST STEPS:

  1. Check if debug directory exists in current working directory
  2. If directory exists, check for session files:
    • Look for debug/state.json
    • Look for debug/plan.md
    • If found, resume from last hypothesis
  3. If no directory or session exists:
    • Gather error information
    • Create reproduction steps
    • Initialize debugging session

Information Gathering (Token-Efficient):

#!/bin/bash
# Systematic Debugging - Information Gathering

gather_debug_info() {
    echo "=== Issue Reproduction Information ==="
    echo ""

    # 1. Error logs (use Grep, not cat)
    echo "Recent error logs:"
    if [ -d "logs" ]; then
        grep -i "error\|exception\|fatal" logs/*.log 2>/dev/null | tail -20 || echo "  No errors in logs"
    fi

    # 2. Git status (what changed recently)
    echo ""
    echo "Recent changes:"
    git log --oneline --since="3 days ago" | head -10 || echo "  Not a git repository"

    # 3. Environment info
    echo ""
    echo "Environment:"
    if [ -f "package.json" ]; then
        echo "  Node: $(node --version 2>/dev/null || echo 'not installed')"
        echo "  NPM: $(npm --version 2>/dev/null || echo 'not installed')"
    elif [ -f "requirements.txt" ]; then
        echo "  Python: $(python --version 2>/dev/null || echo 'not installed')"
    fi

    # 4. System resources
    echo ""
    echo "System resources:"
    echo "  Memory: $(free -h 2>/dev/null | grep Mem | awk '{print $3 "/" $2}' || echo 'N/A')"
    echo "  Disk: $(df -h . 2>/dev/null | tail -1 | awk '{print $3 "/" $2 " (" $5 ")"}' || echo 'N/A')"

    # 5. Running processes (if server issue)
    echo ""
    echo "Relevant processes:"
    ps aux | grep -E "node|python|java" | grep -v grep | head -5 || echo "  No relevant processes"
}

gather_debug_info > debug/initial-state.log
cat debug/initial-state.log

Reproduction Steps:

#!/bin/bash
# Create reproducible test case

create_reproduction() {
    cat > debug/reproduction.sh << 'EOF'
#!/bin/bash
# Minimal reproduction script

echo "=== Bug Reproduction Steps ==="
echo ""
echo "Step 1: Setup environment"
# TODO: Add setup commands

echo "Step 2: Execute actions that trigger bug"
# TODO: Add trigger commands

echo "Step 3: Verify bug occurs"
# TODO: Add verification

echo ""
echo "Expected: [describe expected behavior]"
echo "Actual: [describe actual behavior]"
EOF

    chmod +x debug/reproduction.sh
    echo "Created reproduction script: debug/reproduction.sh"
}

create_reproduction

Phase 2: Hypothesis Formation

I'll formulate testable hypotheses about the root cause:

Hypothesis Generation Framework:

# Debugging Plan - [timestamp]

## Issue Description
**Summary**: [brief description]
**Severity**: Critical | High | Medium | Low
**Impact**: [affected users/systems]
**Frequency**: Always | Intermittent | Rare

## Error Details

[Full error message/stack trace]


## Environment
- **Platform**: [OS, runtime version]
- **Configuration**: [relevant settings]
- **Recent Changes**: [commits/deployments]

## Hypotheses (Prioritized)

### Hypothesis 1: [Most likely cause] - PRIORITY: HIGH
**Theory**: [explanation of suspected cause]
**Evidence**: [supporting observations]
**Test**: [how to verify/disprove]
**Expected**: [what should happen if correct]
**Result**: [ ] Pending | [ ] Confirmed | [ ] Disproved

### Hypothesis 2: [Second most likely] - PRIORITY: MEDIUM
**Theory**: [explanation]
**Evidence**: [observations]
**Test**: [verification method]
**Expected**: [expected outcome]
**Result**: [ ] Pending | [ ] Confirmed | [ ] Disproved

### Hypothesis 3: [Alternative cause] - PRIORITY: LOW
**Theory**: [explanation]
**Evidence**: [observations]
**Test**: [verification method]
**Expected**: [expected outcome]
**Result**: [ ] Pending | [ ] Confirmed | [ ] Disproved

## Investigation Log
- [timestamp]: Initial reproduction successful
- [timestamp]: Hypothesis 1 testing in progress

Hypothesis Prioritization:

  1. Recent changes - Check git history
  2. Common patterns - Known bug categories
  3. Environment issues - Dependencies, config
  4. Logic errors - Code analysis
  5. External factors - Third-party services

Phase 3: Systematic Testing

I'll test each hypothesis methodically:

Testing Framework:

#!/bin/bash
# Hypothesis Testing Script

test_hypothesis() {
    local hypothesis_num="$1"
    local test_description="$2"

    echo "=== Testing Hypothesis $hypothesis_num ==="
    echo "Test: $test_description"
    echo ""

    # Create checkpoint before testing
    git stash push -m "Debug checkpoint before hypothesis $hypothesis_num"

    # Run test
    local result="PENDING"

    # Log result
    echo "[$hypothesis_num] $test_description: $result" >> debug/test-results.log
}

# Example: Test hypothesis about missing dependency
test_dependency_hypothesis() {
    echo "Hypothesis: Missing or incompatible dependency"

    # Check dependency versions
    if [ -f "package.json" ]; then
        echo "Checking npm dependencies..."
        npm list --depth=0 2>&1 | grep -i "missing\|error" && {
            echo "❌ CONFIRMED: Missing dependencies detected"
            return 0
        }
    fi

    echo "✓ DISPROVED: All dependencies present"
    return 1
}

# Example: Test hypothesis about race condition
test_race_condition_hypothesis() {
    echo "Hypothesis: Race condition in async code"

    # Add delays to test timing sensitivity
    echo "Running test with delays..."
    # TODO: Add test with deliberate delays

    echo "Running test rapidly..."
    for i in {1..10}; do
        # TODO: Run test in tight loop
        true
    done
}

# Test each hypothesis in priority order
test_dependency_hypothesis
test_race_condition_hypothesis

Binary Search Debugging:

#!/bin/bash
# Binary search through git history to find regression

git_bisect_debug() {
    echo "=== Git Bisect Debugging ==="

    # Find last known good commit
    read -p "Enter last known good commit (or tag): " good_commit
    read -p "Enter first known bad commit (or 'HEAD'): " bad_commit

    git bisect start
    git bisect bad $bad_commit
    git bisect good $good_commit

    cat > debug/bisect-test.sh << 'EOF'
#!/bin/bash
# Automated bisect test script

# Run test
npm test || exit 1  # Exit 1 if bad, 0 if good

# Or manual verification
echo "Test the current commit and press:"
echo "  g - if this commit is good"
echo "  b - if this commit is bad"
read -n 1 response
[ "$response" = "g" ] && exit 0 || exit 1
EOF

    chmod +x debug/bisect-test.sh
    echo "Run: git bisect run ./debug/bisect-test.sh"
}

Phase 4: Isolation & Simplification

I'll create minimal test cases:

Issue Isolation:

#!/bin/bash
# Create minimal reproducible example

create_minimal_reproduction() {
    local issue_type="$1"

    mkdir -p debug/minimal-case

    case $issue_type in
        "api")
            cat > debug/minimal-case/test.js << 'EOF'
// Minimal API test case
const fetch = require('node-fetch');

async function testIssue() {
    const response = await fetch('http://localhost:3000/api/endpoint');
    const data = await response.json();
    console.log('Response:', data);
    // Add assertion that fails
}

testIssue().catch(console.error);
EOF
            ;;

        "frontend")
            cat > debug/minimal-case/test.html << 'EOF'
<!DOCTYPE html>
<html>
<head>
    <title>Minimal Test Case</title>
</head>
<body>
    <button id="testBtn">Click to trigger issue</button>
    <div id="output"></div>

    <script>
        document.getElementById('testBtn').addEventListener('click', () => {
            // Minimal code to reproduce issue
            console.log('Testing...');
        });
    </script>
</body>
</html>
EOF
            ;;

        "database")
            cat > debug/minimal-case/test.sql << 'EOF'
-- Minimal database query to reproduce issue
BEGIN TRANSACTION;

-- Setup test data
CREATE TEMP TABLE test_data (id INT, value TEXT);
INSERT INTO test_data VALUES (1, 'test');

-- Query that demonstrates issue
SELECT * FROM test_data WHERE condition;

ROLLBACK;
EOF
            ;;
    esac

    echo "Created minimal test case in debug/minimal-case/"
}

Phase 5: Solution Implementation

Once root cause is identified, I'll implement the fix:

Fix Validation:

#!/bin/bash
# Validate fix before committing

validate_fix() {
    echo "=== Fix Validation ==="

    # 1. Run original reproduction - should now pass
    echo "Step 1: Run original reproduction..."
    if [ -f "debug/reproduction.sh" ]; then
        ./debug/reproduction.sh && echo "✓ Original issue resolved" || {
            echo "❌ Issue still reproduces"
            return 1
        }
    fi

    # 2. Run full test suite
    echo "Step 2: Run test suite..."
    npm test 2>&1 | tee debug/post-fix-tests.log

    # 3. Check for regressions
    echo "Step 3: Check for regressions..."
    git diff HEAD -- . | grep -E "^\+" | grep -v "^+++" | head -20

    # 4. Verify no new errors
    echo "Step 4: Lint check..."
    npm run lint 2>&1 | grep -i "error" && {
        echo "⚠️  New linting errors introduced"
    } || echo "✓ No new linting errors"

    echo ""
    echo "✓ Fix validation complete"
}

validate_fix

Fix Documentation:

## Solution

### Root Cause
[Detailed explanation of what caused the issue]

### Fix Applied
[Description of the solution]

```diff
// Before
- problematic code

// After
+ corrected code

Verification

  • Original reproduction no longer triggers issue
  • All tests passing
  • No regressions introduced
  • Edge cases handled

Prevention

[How to prevent similar issues in the future]

  • Add test coverage for [scenario]
  • Update validation to catch [condition]
  • Add monitoring for [metric]

## Phase 6: Regression Prevention

I'll add safeguards to prevent recurrence:

**Test Addition:**

```bash
#!/bin/bash
# Add regression test

add_regression_test() {
    local test_framework="$1"

    case $test_framework in
        "jest")
            cat >> tests/regression.test.js << 'EOF'

describe('Regression: [Issue Description]', () => {
  test('should not reproduce issue #123', async () => {
    // Reproduce the scenario that previously failed
    const result = await functionThatHadBug();

    // Assert correct behavior
    expect(result).toBe(expectedValue);
  });
});
EOF
            ;;

        "pytest")
            cat >> tests/test_regression.py << 'EOF'

def test_issue_123_regression():
    """Regression test for [issue description]"""
    # Reproduce the scenario
    result = function_that_had_bug()

    # Assert correct behavior
    assert result == expected_value
EOF
            ;;
    esac

    echo "Added regression test to prevent future occurrence"
}

Context Continuity

Session Resume: When you return and run /debug-systematic or /debug-systematic resume:

  • Load debugging plan and hypothesis results
  • Show which hypotheses have been tested
  • Continue from next untested hypothesis
  • Track full debugging timeline

Progress Example:

RESUMING DEBUGGING SESSION
├── Issue: API timeout on user search
├── Hypotheses: 5 total
├── Tested: 3 (2 disproved, 1 confirmed)
├── Current: Testing database query optimization
└── Status: Root cause identified

Continuing investigation...

Practical Examples

Start Debugging:

/debug-systematic "API returns 500 on POST /users"
/debug-systematic reproduce    # Create reproduction steps
/debug-systematic             # Auto-resume if session exists

Hypothesis Testing:

/debug-systematic test 1      # Test specific hypothesis
/debug-systematic isolate     # Create minimal reproduction
/debug-systematic bisect      # Git bisect to find regression

Session Control:

/debug-systematic resume      # Continue debugging
/debug-systematic status      # Show current progress
/debug-systematic solved      # Mark as solved and summarize

Debugging Techniques

Common Debugging Patterns:

  1. Print Debugging:
add_debug_logging() {
    echo "Adding strategic debug points..."
    # Add before suspected issue
    # Add after suspected issue
    # Compare outputs
}
  1. Rubber Duck Debugging:
## Explain to Rubber Duck
1. What the code should do: [expected behavior]
2. What the code actually does: [actual behavior]
3. Step-by-step execution: [trace through]
4. Where it diverges: [AHA moment]
  1. Divide and Conquer:
# Comment out half the code
# Does issue persist?
# - Yes: Issue in remaining half
# - No: Issue in commented half
# Repeat until isolated

Safety Guarantees

Protection Measures:

  • Git checkpoints before each test
  • Automated state restoration
  • No destructive operations without confirmation
  • Clear rollback paths

Important: I will NEVER:

  • Modify production code without validation
  • Skip hypothesis testing
  • Apply fixes without verification
  • Add AI attribution

Skill Integration

When appropriate, I may suggest:

  • /test - Run comprehensive test suite
  • /security-scan - Check if bug is security-related
  • /commit - Commit fix with clear message

Advanced Debugging Tools

Performance Profiling:

profile_performance() {
    # Node.js profiling
    node --prof app.js
    node --prof-process isolate-*.log > profile.txt

    # Python profiling
    python -m cProfile -o profile.stats script.py
    python -m pstats profile.stats
}

Memory Leak Detection:

detect_memory_leak() {
    # Monitor memory over time
    while true; do
        ps aux | grep node | awk '{print $6}' | head -1
        sleep 5
    done | tee memory.log

    # Analyze pattern
    gnuplot << 'EOF'
set terminal png
set output 'memory-usage.png'
plot 'memory.log' with lines
EOF
}

Network Debugging:

debug_network() {
    # Capture network traffic
    tcpdump -i any -w debug/network.pcap port 3000

    # Analyze with tshark
    tshark -r debug/network.pcap -Y "http.response.code >= 400"
}

What I'll Actually Do

  1. Gather information - Comprehensive context using Grep
  2. Reproduce issue - Create reliable reproduction
  3. Form hypotheses - Prioritized theories about cause
  4. Test systematically - Validate each hypothesis
  5. Isolate problem - Minimal reproducible case
  6. Implement fix - Targeted solution
  7. Prevent regression - Add tests and monitoring

I'll maintain complete debugging session continuity, tracking all hypotheses and results across sessions.

Credits: Systematic debugging methodology based on scientific method and debugging best practices from "Debugging: The 9 Indispensable Rules" by David Agans.

Weekly Installs
4
GitHub Stars
1
First Seen
Feb 21, 2026
Installed on
opencode4
gemini-cli4
github-copilot4
codex4
kimi-cli4
amp4