Systematic Debugging Workflow

I'll help you debug issues systematically using the scientific method - hypothesis formation, testing, and iterative refinement.

Arguments: $ARGUMENTS - error description, reproduction steps, or context

Token Optimization

Target: 50% reduction (4,000-6,000 → 1,500-3,000 tokens)

Core Optimization Strategies

1. Hypothesis-Driven Debugging (Not Exhaustive Analysis)

❌ AVOID: Reading entire codebase to find bugs
✅ DO: Form hypotheses about likely causes, test top 2-3 first
Token savings: 90% (200 tokens vs 2,000+ tokens)
Pattern: Prioritize recently changed files, common failure patterns

2. Git Diff for Recently Changed Files (Likely Bug Source)

❌ AVOID: ls -R then reading all files
✅ DO: git diff --name-only HEAD~3..HEAD to find changed files
✅ DO: git log --oneline --since="3 days ago" for recent commits
Token savings: 85% (300 tokens vs 2,000+ tokens)
Pattern: Bugs often introduced in recent changes

3. Stack Trace Parsing with Grep

❌ AVOID: Reading entire log files with Read tool
✅ DO: grep -i "error\|exception\|fatal" logs/*.log | tail -20
✅ DO: Parse stack traces to extract file paths and line numbers
Token savings: 95% (100 tokens vs 2,000+ tokens for large logs)
Pattern: Stack traces reveal exact failure locations

4. Test Failure Analysis Caching

✅ Cache test results in debug/state.json
✅ Cache hypothesis outcomes to avoid retesting
✅ Cache reproduction steps once confirmed
Token savings: 70% on subsequent debugging turns
Pattern: Multi-turn debugging sessions benefit from state

5. Progressive Investigation (Narrow Before Deep)

✅ Start with stack trace → identify file → read specific function
✅ Hypothesis testing: test most likely causes first
✅ Binary search through git history when needed
Token savings: 60% (stop early when cause found)
Pattern: Most bugs have obvious causes in changed code

6. Session State Tracking for Multi-Turn Debugging

✅ Session files in debug/ directory
✅ Track tested hypotheses to avoid repetition
✅ Resume from last checkpoint on subsequent runs
Token savings: 80% on resumed sessions (skip completed work)
Pattern: Complex bugs require multiple debugging turns

Token Usage by Operation

Operation	Unoptimized	Optimized	Savings
Initial bug analysis	2,000-3,000	500-1,000	60-75%
Hypothesis formation	1,500-2,000	400-800	60-73%
Stack trace parsing	2,000+	100-200	90-95%
File investigation	2,000+	300-600	70-85%
Test reproduction	1,000-1,500	200-400	73-80%
Session resume	2,000-3,000	300-600	80-85%

Average Reduction: 50% (4,000-6,000 → 1,500-3,000 tokens)

Debugging-Specific Patterns

Stack Trace Analysis:

# Extract file paths and line numbers from stack traces
grep -E "at .+ \(.+:[0-9]+:[0-9]+\)" error.log | head -10
# Focus investigation on these specific files/lines

Recent Changes Focus:

# Find files changed in last 3 days (likely bug sources)
git diff --name-only HEAD~10..HEAD
# Only read files that changed recently

Hypothesis Prioritization:

Recent changes (80% of bugs) - Check git diff first
Stack trace files (90% reliability) - Read exact failure locations
Error message patterns (70% of bugs) - Grep for similar errors
Environment/config (20% of bugs) - Check if configs changed
External dependencies (10% of bugs) - Check updates

Binary Search for Regressions:

# Use git bisect to find regression commit
git bisect start HEAD v1.2.3
git bisect run npm test  # Automated testing
# Saves 95% tokens vs manual testing each commit

Caching Behavior

Session Location: debug/ (in project root)

debug/plan.md - Debugging plan with hypotheses and results
debug/state.json - Session state and test results
debug/reproduction.log - Issue reproduction steps and logs

Cache Location: .claude/cache/debug/

hypotheses.json - Tested hypotheses and outcomes
stack-traces.json - Parsed stack trace information
changed-files.json - Recently changed files analysis

Cache Validity:

Until issue resolved (status: "solved" in state.json)
Until source files change (checksum-based)
7 days maximum for stale sessions

Shared With:

/debug-root-cause - Root cause analysis skill
/debug-session - Debug session documentation
/test - Test execution for verification

Usage Examples

Start New Debugging Session:

debug-systematic "API returns 500 on POST /users"
# Expected tokens: 1,500-3,000 (full analysis)

Resume Existing Session:

debug-systematic resume
# Expected tokens: 800-1,500 (skips completed hypotheses)

Test Specific Hypothesis:

debug-systematic test 1
# Expected tokens: 500-1,000 (focused testing)

Check Debugging Progress:

debug-systematic status
# Expected tokens: 200-500 (read session state only)

Mark Issue as Solved:

debug-systematic solved
# Expected tokens: 300-600 (generate summary)

Early Exit Conditions

Exit immediately (saves 90% tokens) when:

✅ Issue already solved (check debug/state.json status)
✅ No test framework available (can't reproduce)
✅ Not a git repository (can't check recent changes)
✅ Root cause already identified in session state

Progressive disclosure saves 60-80% tokens:

Show hypothesis formation → wait for user confirmation
Test one hypothesis at a time → report results
Only deep dive when hypothesis confirms

Implementation Checklist

✅ Git diff analysis for recent changes (PRIMARY optimization)
✅ Stack trace parsing with Grep (saves 90-95%)
✅ Session-based hypothesis tracking (saves 70-80% on reruns)
✅ Progressive hypothesis testing (most likely → least likely)
✅ Bash-based log analysis (minimal tokens)
✅ Test failure result caching
✅ Early exit when issue resolved
✅ Binary search for regressions (git bisect)
✅ Focus area flags (specific file/function debugging)

Optimization Status: ✅ Optimized (Phase 2 Batch 2, 2026-01-26) Expected Tokens: 1,500-3,000 (vs. 4,000-6,000 unoptimized) Achieved Reduction: 50% average across all debugging operations

Session Intelligence

I'll maintain debugging session continuity:

Session Files (in current project directory):

debug/plan.md - Debugging plan with hypotheses and results
debug/state.json - Session state and test results
debug/reproduction.log - Issue reproduction steps and logs

IMPORTANT: Session files are stored in a debug folder in your current project root

Auto-Detection:

If session exists: Resume debugging from last hypothesis
If no session: Create debugging plan and initial reproduction
Commands: resume, reproduce, status, solved

Phase 1: Issue Reproduction & Information Gathering

Extended Thinking for Complex Debugging

For complex or elusive bugs, I'll use extended thinking to explore debugging strategies:

Triggers for Extended Analysis:

Intermittent or non-deterministic bugs
Production-only failures
Performance issues without obvious cause
Security vulnerabilities
Multi-component system failures

MANDATORY FIRST STEPS:

Check if debug directory exists in current working directory
If directory exists, check for session files:
- Look for debug/state.json
- Look for debug/plan.md
- If found, resume from last hypothesis
If no directory or session exists:
- Gather error information
- Create reproduction steps
- Initialize debugging session

Information Gathering (Token-Efficient):

#!/bin/bash
# Systematic Debugging - Information Gathering

gather_debug_info() {
    echo "=== Issue Reproduction Information ==="
    echo ""

    # 1. Error logs (use Grep, not cat)
    echo "Recent error logs:"
    if [ -d "logs" ]; then
        grep -i "error\|exception\|fatal" logs/*.log 2>/dev/null | tail -20 || echo "  No errors in logs"
    fi

    # 2. Git status (what changed recently)
    echo ""
    echo "Recent changes:"
    git log --oneline --since="3 days ago" | head -10 || echo "  Not a git repository"

    # 3. Environment info
    echo ""
    echo "Environment:"
    if [ -f "package.json" ]; then
        echo "  Node: $(node --version 2>/dev/null || echo 'not installed')"
        echo "  NPM: $(npm --version 2>/dev/null || echo 'not installed')"
    elif [ -f "requirements.txt" ]; then
        echo "  Python: $(python --version 2>/dev/null || echo 'not installed')"
    fi

    # 4. System resources
    echo ""
    echo "System resources:"
    echo "  Memory: $(free -h 2>/dev/null | grep Mem | awk '{print $3 "/" $2}' || echo 'N/A')"
    echo "  Disk: $(df -h . 2>/dev/null | tail -1 | awk '{print $3 "/" $2 " (" $5 ")"}' || echo 'N/A')"

    # 5. Running processes (if server issue)
    echo ""
    echo "Relevant processes:"
    ps aux | grep -E "node|python|java" | grep -v grep | head -5 || echo "  No relevant processes"
}

gather_debug_info > debug/initial-state.log
cat debug/initial-state.log

Reproduction Steps:

#!/bin/bash
# Create reproducible test case

create_reproduction() {
    cat > debug/reproduction.sh << 'EOF'
#!/bin/bash
# Minimal reproduction script

echo "=== Bug Reproduction Steps ==="
echo ""
echo "Step 1: Setup environment"
# TODO: Add setup commands

echo "Step 2: Execute actions that trigger bug"
# TODO: Add trigger commands

echo "Step 3: Verify bug occurs"
# TODO: Add verification

echo ""
echo "Expected: [describe expected behavior]"
echo "Actual: [describe actual behavior]"
EOF

    chmod +x debug/reproduction.sh
    echo "Created reproduction script: debug/reproduction.sh"
}

create_reproduction

Phase 2: Hypothesis Formation

I'll formulate testable hypotheses about the root cause:

Hypothesis Generation Framework:

# Debugging Plan - [timestamp]

## Issue Description
**Summary**: [brief description]
**Severity**: Critical | High | Medium | Low
**Impact**: [affected users/systems]
**Frequency**: Always | Intermittent | Rare

## Error Details

[Full error message/stack trace]


## Environment
- **Platform**: [OS, runtime version]
- **Configuration**: [relevant settings]
- **Recent Changes**: [commits/deployments]

## Hypotheses (Prioritized)

### Hypothesis 1: [Most likely cause] - PRIORITY: HIGH
**Theory**: [explanation of suspected cause]
**Evidence**: [supporting observations]
**Test**: [how to verify/disprove]
**Expected**: [what should happen if correct]
**Result**: [ ] Pending | [ ] Confirmed | [ ] Disproved

### Hypothesis 2: [Second most likely] - PRIORITY: MEDIUM
**Theory**: [explanation]
**Evidence**: [observations]
**Test**: [verification method]
**Expected**: [expected outcome]
**Result**: [ ] Pending | [ ] Confirmed | [ ] Disproved

### Hypothesis 3: [Alternative cause] - PRIORITY: LOW
**Theory**: [explanation]
**Evidence**: [observations]
**Test**: [verification method]
**Expected**: [expected outcome]
**Result**: [ ] Pending | [ ] Confirmed | [ ] Disproved

## Investigation Log
- [timestamp]: Initial reproduction successful
- [timestamp]: Hypothesis 1 testing in progress

Hypothesis Prioritization:

Recent changes - Check git history
Common patterns - Known bug categories
Environment issues - Dependencies, config
Logic errors - Code analysis
External factors - Third-party services

Phase 3: Systematic Testing

I'll test each hypothesis methodically:

Testing Framework:

#!/bin/bash
# Hypothesis Testing Script

test_hypothesis() {
    local hypothesis_num="$1"
    local test_description="$2"

    echo "=== Testing Hypothesis $hypothesis_num ==="
    echo "Test: $test_description"
    echo ""

    # Create checkpoint before testing
    git stash push -m "Debug checkpoint before hypothesis $hypothesis_num"

    # Run test
    local result="PENDING"

    # Log result
    echo "[$hypothesis_num] $test_description: $result" >> debug/test-results.log
}

# Example: Test hypothesis about missing dependency
test_dependency_hypothesis() {
    echo "Hypothesis: Missing or incompatible dependency"

    # Check dependency versions
    if [ -f "package.json" ]; then
        echo "Checking npm dependencies..."
        npm list --depth=0 2>&1 | grep -i "missing\|error" && {
            echo "❌ CONFIRMED: Missing dependencies detected"
            return 0
        }
    fi

    echo "✓ DISPROVED: All dependencies present"
    return 1
}

# Example: Test hypothesis about race condition
test_race_condition_hypothesis() {
    echo "Hypothesis: Race condition in async code"

    # Add delays to test timing sensitivity
    echo "Running test with delays..."
    # TODO: Add test with deliberate delays

    echo "Running test rapidly..."
    for i in {1..10}; do
        # TODO: Run test in tight loop
        true
    done
}

# Test each hypothesis in priority order
test_dependency_hypothesis
test_race_condition_hypothesis

Binary Search Debugging:

#!/bin/bash
# Binary search through git history to find regression

git_bisect_debug() {
    echo "=== Git Bisect Debugging ==="

    # Find last known good commit
    read -p "Enter last known good commit (or tag): " good_commit
    read -p "Enter first known bad commit (or 'HEAD'): " bad_commit

    git bisect start
    git bisect bad $bad_commit
    git bisect good $good_commit

    cat > debug/bisect-test.sh << 'EOF'
#!/bin/bash
# Automated bisect test script

# Run test
npm test || exit 1  # Exit 1 if bad, 0 if good

# Or manual verification
echo "Test the current commit and press:"
echo "  g - if this commit is good"
echo "  b - if this commit is bad"
read -n 1 response
[ "$response" = "g" ] && exit 0 || exit 1
EOF

    chmod +x debug/bisect-test.sh
    echo "Run: git bisect run ./debug/bisect-test.sh"
}

Phase 4: Isolation & Simplification

I'll create minimal test cases:

Issue Isolation:

#!/bin/bash
# Create minimal reproducible example

create_minimal_reproduction() {
    local issue_type="$1"

    mkdir -p debug/minimal-case

    case $issue_type in
        "api")
            cat > debug/minimal-case/test.js << 'EOF'
// Minimal API test case
const fetch = require('node-fetch');

async function testIssue() {
    const response = await fetch('http://localhost:3000/api/endpoint');
    const data = await response.json();
    console.log('Response:', data);
    // Add assertion that fails
}

testIssue().catch(console.error);
EOF
            ;;

        "frontend")
            cat > debug/minimal-case/test.html << 'EOF'
<!DOCTYPE html>
<html>
<head>
    <title>Minimal Test Case</title>
</head>
<body>
    <button id="testBtn">Click to trigger issue</button>
    <div id="output"></div>

    <script>
        document.getElementById('testBtn').addEventListener('click', () => {
            // Minimal code to reproduce issue
            console.log('Testing...');
        });
    </script>
</body>
</html>
EOF
            ;;

        "database")
            cat > debug/minimal-case/test.sql << 'EOF'
-- Minimal database query to reproduce issue
BEGIN TRANSACTION;

-- Setup test data
CREATE TEMP TABLE test_data (id INT, value TEXT);
INSERT INTO test_data VALUES (1, 'test');

-- Query that demonstrates issue
SELECT * FROM test_data WHERE condition;

ROLLBACK;
EOF
            ;;
    esac

    echo "Created minimal test case in debug/minimal-case/"
}

Phase 5: Solution Implementation

Once root cause is identified, I'll implement the fix:

Fix Validation:

#!/bin/bash
# Validate fix before committing

validate_fix() {
    echo "=== Fix Validation ==="

    # 1. Run original reproduction - should now pass
    echo "Step 1: Run original reproduction..."
    if [ -f "debug/reproduction.sh" ]; then
        ./debug/reproduction.sh && echo "✓ Original issue resolved" || {
            echo "❌ Issue still reproduces"
            return 1
        }
    fi

    # 2. Run full test suite
    echo "Step 2: Run test suite..."
    npm test 2>&1 | tee debug/post-fix-tests.log

    # 3. Check for regressions
    echo "Step 3: Check for regressions..."
    git diff HEAD -- . | grep -E "^\+" | grep -v "^+++" | head -20

    # 4. Verify no new errors
    echo "Step 4: Lint check..."
    npm run lint 2>&1 | grep -i "error" && {
        echo "⚠️  New linting errors introduced"
    } || echo "✓ No new linting errors"

    echo ""
    echo "✓ Fix validation complete"
}

validate_fix

Fix Documentation:

## Solution

### Root Cause
[Detailed explanation of what caused the issue]

### Fix Applied
[Description of the solution]

```diff
// Before
- problematic code

// After
+ corrected code

Verification

Original reproduction no longer triggers issue
All tests passing
No regressions introduced
Edge cases handled

Prevention

[How to prevent similar issues in the future]

Add test coverage for [scenario]
Update validation to catch [condition]
Add monitoring for [metric]


## Phase 6: Regression Prevention

I'll add safeguards to prevent recurrence:

**Test Addition:**

```bash
#!/bin/bash
# Add regression test

add_regression_test() {
    local test_framework="$1"

    case $test_framework in
        "jest")
            cat >> tests/regression.test.js << 'EOF'

describe('Regression: [Issue Description]', () => {
  test('should not reproduce issue #123', async () => {
    // Reproduce the scenario that previously failed
    const result = await functionThatHadBug();

    // Assert correct behavior
    expect(result).toBe(expectedValue);
  });
});
EOF
            ;;

        "pytest")
            cat >> tests/test_regression.py << 'EOF'

def test_issue_123_regression():
    """Regression test for [issue description]"""
    # Reproduce the scenario
    result = function_that_had_bug()

    # Assert correct behavior
    assert result == expected_value
EOF
            ;;
    esac

    echo "Added regression test to prevent future occurrence"
}

Context Continuity

Session Resume: When you return and run /debug-systematic or /debug-systematic resume:

Load debugging plan and hypothesis results
Show which hypotheses have been tested
Continue from next untested hypothesis
Track full debugging timeline

Progress Example:

RESUMING DEBUGGING SESSION
├── Issue: API timeout on user search
├── Hypotheses: 5 total
├── Tested: 3 (2 disproved, 1 confirmed)
├── Current: Testing database query optimization
└── Status: Root cause identified

Continuing investigation...

Practical Examples

Start Debugging:

/debug-systematic "API returns 500 on POST /users"
/debug-systematic reproduce    # Create reproduction steps
/debug-systematic             # Auto-resume if session exists

Hypothesis Testing:

/debug-systematic test 1      # Test specific hypothesis
/debug-systematic isolate     # Create minimal reproduction
/debug-systematic bisect      # Git bisect to find regression

Session Control:

/debug-systematic resume      # Continue debugging
/debug-systematic status      # Show current progress
/debug-systematic solved      # Mark as solved and summarize

Debugging Techniques

Common Debugging Patterns:

Print Debugging:

add_debug_logging() {
    echo "Adding strategic debug points..."
    # Add before suspected issue
    # Add after suspected issue
    # Compare outputs
}

Rubber Duck Debugging:

## Explain to Rubber Duck
1. What the code should do: [expected behavior]
2. What the code actually does: [actual behavior]
3. Step-by-step execution: [trace through]
4. Where it diverges: [AHA moment]

Divide and Conquer:

# Comment out half the code
# Does issue persist?
# - Yes: Issue in remaining half
# - No: Issue in commented half
# Repeat until isolated

Safety Guarantees

Protection Measures:

Git checkpoints before each test
Automated state restoration
No destructive operations without confirmation
Clear rollback paths

Important: I will NEVER:

Modify production code without validation
Skip hypothesis testing
Apply fixes without verification
Add AI attribution

Skill Integration

When appropriate, I may suggest:

/test - Run comprehensive test suite
/security-scan - Check if bug is security-related
/commit - Commit fix with clear message

Advanced Debugging Tools

Performance Profiling:

profile_performance() {
    # Node.js profiling
    node --prof app.js
    node --prof-process isolate-*.log > profile.txt

    # Python profiling
    python -m cProfile -o profile.stats script.py
    python -m pstats profile.stats
}

Memory Leak Detection:

detect_memory_leak() {
    # Monitor memory over time
    while true; do
        ps aux | grep node | awk '{print $6}' | head -1
        sleep 5
    done | tee memory.log

    # Analyze pattern
    gnuplot << 'EOF'
set terminal png
set output 'memory-usage.png'
plot 'memory.log' with lines
EOF
}

Network Debugging:

debug_network() {
    # Capture network traffic
    tcpdump -i any -w debug/network.pcap port 3000

    # Analyze with tshark
    tshark -r debug/network.pcap -Y "http.response.code >= 400"
}

What I'll Actually Do

Gather information - Comprehensive context using Grep
Reproduce issue - Create reliable reproduction
Form hypotheses - Prioritized theories about cause
Test systematically - Validate each hypothesis
Isolate problem - Minimal reproducible case
Implement fix - Targeted solution
Prevent regression - Add tests and monitoring

I'll maintain complete debugging session continuity, tracking all hypotheses and results across sessions.

Credits: Systematic debugging methodology based on scientific method and debugging best practices from "Debugging: The 9 Indispensable Rules" by David Agans.

debug-systematic