Agent Debugger

Overview

Debug AI agent issues systematically using analysis scripts and proven debugging patterns. This skill helps identify root causes of common agent failures: incorrect responses, tool calling errors, conversation loops, performance problems, context window exhaustion, and state management issues.

When to Use This Skill

Trigger this skill when:

  • Agent gives wrong or irrelevant responses
  • Tools are not being called or are called incorrectly
  • Conversation gets stuck in loops or repeated patterns
  • Agent performance is slow or inconsistent
  • Tool executions are failing or returning errors
  • Conversation logs or API traces need to be analyzed

Debugging Workflow

Step 1: Gather Diagnostic Data

Collect these artifacts from the user:

  • Conversation logs - Full transcript or chat history
  • API request/response logs - Raw LLM API calls if available
  • Tool execution logs - Records of tool calls and outputs
  • Agent configuration - System prompts, tool schemas, settings
  • Description of the issue - What's wrong and when it occurs

Step 2: Run Automated Analysis

Use the appropriate analysis scripts based on symptoms:

For general conversation issues:

python scripts/analyze_conversation.py <log_file>

Analyzes role distribution and message patterns, flags potential issues, and reports summary metrics.
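
To give a feel for the kind of metrics involved, here is a minimal sketch that computes a role distribution and a repeated-content ratio for a log in the minimal JSON format. It is not the implementation of analyze_conversation.py; the function name summarize_conversation and the returned keys are assumptions made for this example.

```python
from collections import Counter


def summarize_conversation(messages):
    """Return basic metrics for a list of {"role", "content"} messages.

    Hypothetical helper for illustration; not part of the skill's scripts.
    """
    roles = Counter(m.get("role", "unknown") for m in messages)
    # Normalize case and whitespace so trivial variations count as repeats.
    texts = [
        " ".join(m["content"].lower().split())
        for m in messages
        if isinstance(m.get("content"), str) and m["content"].strip()
    ]
    repeated = len(texts) - len(set(texts))
    return {
        "total_messages": len(messages),
        "role_distribution": dict(roles),
        "repeated_ratio": repeated / len(texts) if texts else 0.0,
    }
```

A high repeated_ratio is only a hint: it suggests running the loop detector next rather than proving a loop exists.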

For suspected loops or stuck states:

python scripts/detect_loops.py <log_file> [--threshold 2] [--window 5]

Detects exact loops, fuzzy patterns, stuck states, and ping-pong exchanges.
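
The exact meaning of --threshold and --window is not spelled out here. As a sketch of one plausible reading, the example below treats the threshold as the minimum number of identical assistant messages and the window as the number of consecutive assistant messages inspected at a time. Both interpretations, and the function name find_exact_repeats, are assumptions; detect_loops.py may behave differently.

```python
from collections import Counter


def find_exact_repeats(messages, threshold=2, window=5):
    """Flag assistant messages that occur at least `threshold` times
    within any run of `window` consecutive assistant messages.

    Assumed semantics for illustration only.
    """
    assistant = [
        " ".join(m["content"].lower().split())
        for m in messages
        if m.get("role") == "assistant" and isinstance(m.get("content"), str)
    ]
    flagged = set()
    # Slide a fixed-size window over the assistant messages.
    for start in range(max(1, len(assistant) - window + 1)):
        counts = Counter(assistant[start:start + window])
        flagged.update(text for text, n in counts.items() if n >= threshold)
    return sorted(flagged)
```

This covers only exact repeats after whitespace and case normalization. Fuzzy patterns and ping-pong exchanges would need a similarity measure and a look at alternating speakers.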

For tool/function calling problems:

python scripts/analyze_tool_calls.py <log_file> [--schema tool_schema.json]

Analyzes tool usage patterns, validates calls against the schema when one is supplied, and detects errors and retry loops.
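
The schema check can be pictured roughly as follows. This sketch assumes a schema file that maps each tool name to an object with "required" and "properties" keys; that layout, and the function name validate_tool_calls, are illustrative assumptions rather than the documented format of tool_schema.json.

```python
import json


def validate_tool_calls(messages, schema):
    """Return a list of human-readable problems found in tool calls.

    `schema` maps a tool name to {"required": [...], "properties": {...}}.
    This layout is an assumption; the real tool_schema.json may differ.
    """
    problems = []
    for msg in messages:
        for call in msg.get("tool_calls") or []:
            call_id = call.get("id")
            fn = call.get("function", {})
            name = fn.get("name")
            if name not in schema:
                problems.append(f"{call_id}: unknown tool {name!r}")
                continue
            # Arguments arrive as a JSON-encoded string in this log format.
            try:
                args = json.loads(fn.get("arguments") or "{}")
            except json.JSONDecodeError:
                problems.append(f"{call_id}: arguments are not valid JSON")
                continue
            if not isinstance(args, dict):
                problems.append(f"{call_id}: arguments are not a JSON object")
                continue
            spec = schema[name]
            for key in spec.get("required", []):
                if key not in args:
                    problems.append(f"{call_id}: missing required argument {key!r}")
            for key in args:
                if key not in spec.get("properties", {}):
                    problems.append(f"{call_id}: unexpected argument {key!r}")
    return problems
```

Malformed JSON and missing required arguments usually point at unclear tool descriptions, while unknown tool names tend to indicate a mismatch between the prompt and the registered tools.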

For performance/latency issues:

python scripts/analyze_performance.py <log_file>

Calculates latency statistics, identifies slow responses, and breaks down performance by role.
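
When a log carries ISO 8601 timestamps, response latency can be estimated as the gap between an assistant message and the message before it. The sketch below shows that calculation; the helper names and the 5-second slow_after cutoff are assumptions for illustration, not values taken from analyze_performance.py.

```python
from datetime import datetime
from statistics import mean


def parse_ts(value):
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def response_latencies(messages):
    """Seconds between each assistant message and the message before it."""
    latencies = []
    for prev, cur in zip(messages, messages[1:]):
        if cur.get("role") != "assistant":
            continue
        if "timestamp" not in prev or "timestamp" not in cur:
            continue
        delta = parse_ts(cur["timestamp"]) - parse_ts(prev["timestamp"])
        latencies.append(delta.total_seconds())
    return latencies


def latency_summary(messages, slow_after=5.0):
    """Summarize latencies; `slow_after` is an arbitrary example cutoff."""
    values = response_latencies(messages)
    if not values:
        return None
    return {
        "count": len(values),
        "mean": mean(values),
        "max": max(values),
        "slow": [v for v in values if v > slow_after],
    }
```

Timestamps taken from message logs include tool execution and network time, so a slow entry here does not by itself identify the model as the bottleneck.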

Note: Scripts accept JSON-formatted logs. For text logs, analyze_conversation.py can auto-detect and parse common formats.

Step 3: Interpret Results

Review script outputs and identify patterns:

  • Check for warnings and issues flagged by scripts
  • Look at metrics (latency, token usage, tool call counts)
  • Examine repeated patterns or anomalies
  • Cross-reference with common failure modes

Step 4: Match to Known Patterns

Consult the debugging patterns reference:

Read references/debugging-patterns.md

This comprehensive guide covers:

  1. Conversation Loops - Symptoms, causes, solutions
  2. Tool Calling Failures - Detection and fixes
  3. Context Window Exhaustion - Management strategies
  4. Incorrect Responses - Prompt engineering fixes
  5. Performance Issues - Optimization techniques
  6. Tool Execution Errors - Error handling approaches
  7. State Management Issues - Tracking strategies

Each pattern includes:

  • Observable symptoms
  • Root causes
  • Concrete solutions
  • Detection methods

Step 5: Recommend Solutions

Based on analysis and pattern matching:

  1. Identify root cause - What's actually broken?
  2. Propose specific fixes - Concrete changes to prompts, tools, or config
  3. Explain reasoning - Why this will solve the problem
  4. Suggest testing - How to verify the fix works
  5. Recommend preventive measures - How to avoid similar issues

Step 6: Provide Best Practices

For broader improvements, reference:

Read references/agent-best-practices.md

Covers:

  • System prompt design principles
  • Tool design and implementation
  • Conversation management strategies
  • Error handling approaches
  • Quality assurance and monitoring
  • Optimization techniques

Log Format Requirements

Scripts work best with structured JSON logs:

Minimal format:

[
  {"role": "user", "content": "Hello"},
  {"role": "assistant", "content": "Hi there!"}
]

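With tool calls (OpenAI-style tool_calls format; Anthropic's Messages API records tool calls as tool_use and tool_result content blocks instead):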

[
  {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "call_123",
        "type": "function",
        "function": {
          "name": "search_kb",
          "arguments": "{\"query\": \"password reset\"}"
        }
      }
    ]
  },
  {
    "role": "tool",
    "tool_call_id": "call_123",
    "content": "Article: How to reset your password..."
  }
]
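
In this format each tool result refers back to its call through tool_call_id, which makes it easy to spot calls that never received a result. A minimal sketch of that pairing follows; the function name unanswered_tool_calls is hypothetical and not part of the skill's scripts.

```python
def unanswered_tool_calls(messages):
    """Return ids of tool calls that have no matching role="tool" message."""
    requested = [
        call["id"]
        for msg in messages
        for call in msg.get("tool_calls") or []
        if "id" in call
    ]
    answered = {
        msg["tool_call_id"]
        for msg in messages
        if msg.get("role") == "tool" and "tool_call_id" in msg
    }
    # Preserve the order in which the calls were issued.
    return [call_id for call_id in requested if call_id not in answered]
```

An unanswered call often means the tool raised an exception or timed out before the result was logged.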

With timestamps and metadata:

[
  {
    "role": "user",
    "content": "Hello",
    "timestamp": "2024-01-15T10:30:00Z",
    "message_id": "msg_1"
  },
  {
    "role": "assistant",
    "content": "Hi there!",
    "timestamp": "2024-01-15T10:30:02Z",
    "usage": {
      "prompt_tokens": 50,
      "completion_tokens": 10,
      "total_tokens": 60
    }
  }
]

Scripts auto-detect format and extract available information.
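
Before running the scripts, it can help to check which optional fields a log actually contains, since analyses such as latency or token usage depend on them. The helper below is a hypothetical pre-flight check written for this example, not a description of how the scripts detect formats.

```python
import json


def describe_log(raw_text):
    """Report which optional fields a JSON log carries.

    Hypothetical helper; field names follow the examples in this section.
    """
    data = json.loads(raw_text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of message objects")
    messages = [m for m in data if isinstance(m, dict)]
    return {
        "messages": len(messages),
        "has_tool_calls": any(m.get("tool_calls") for m in messages),
        "has_timestamps": any("timestamp" in m for m in messages),
        "has_usage": any("usage" in m for m in messages),
        "total_tokens": sum(
            (m.get("usage") or {}).get("total_tokens", 0) for m in messages
        ),
    }
```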

Quick Diagnostic Checklist

Agent not responding:

  • Check API connectivity and auth
  • Review error logs
  • Verify configuration is valid
  • Check rate limits

Wrong/irrelevant responses:

  • Review system prompt clarity
  • Check if appropriate tools are called
  • Verify necessary context is present
  • Test with clearer user input

Conversation stuck/looping:

  • Run detect_loops.py
  • Check for repeated tool errors
  • Review last few agent responses
  • Add explicit loop break conditions
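
One way to add an explicit loop break condition is a small guard in the agent's control loop that stops after too many turns or too many identical consecutive actions. The class below is a sketch under those assumptions; the name LoopGuard and the default limits are example choices, not recommendations from the reference material.

```python
class LoopGuard:
    """Stops an agent loop after too many turns or repeated identical actions.

    Hypothetical helper; the limits are example values.
    """

    def __init__(self, max_turns=20, max_repeats=3):
        self.max_turns = max_turns
        self.max_repeats = max_repeats
        self.turns = 0
        self.last_action = None
        self.repeats = 0

    def should_stop(self, action):
        """Record `action` and return a stop reason, or None to continue."""
        self.turns += 1
        if action == self.last_action:
            self.repeats += 1
        else:
            self.last_action = action
            self.repeats = 1
        if self.repeats >= self.max_repeats:
            return "repeated_action"
        if self.turns >= self.max_turns:
            return "max_turns"
        return None
```

The action passed in can be any comparable value, for example a string combining the tool name and its arguments. When the guard fires, the agent can fall back to asking the user for help or escalating instead of retrying.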

Tool calling issues:

  • Run analyze_tool_calls.py with schema
  • Validate tool descriptions are clear
  • Check tool implementation for bugs
  • Test tools independently

Performance problems:

  • Run analyze_performance.py
  • Check token usage and context length
  • Review tool execution times
  • Consider whether the model choice or the infrastructure is the bottleneck

Example Debugging Session

User reports: "Agent keeps asking for the same information repeatedly"

Analysis approach:

  1. Collect conversation log
  2. Run detect_loops.py → Confirms a ping-pong pattern
  3. Run analyze_conversation.py → Shows a high proportion of repeated content
  4. Review conversation → Agent not retaining context from earlier messages
  5. Consult debugging-patterns.md → Matches "State Management Issues"
  6. Solution: Add explicit state tracking to system prompt, include conversation summary
  7. Test: Verify agent now references earlier information
  8. Document: Record fix and add to monitoring
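
The state-tracking fix in step 6 could take the form of a system prompt that restates what the user has already provided. The sketch below shows one way to build such a prompt; the function name build_system_prompt, the wording of the added instruction, and the known_facts dictionary are all assumptions for illustration.

```python
def build_system_prompt(base_prompt, known_facts):
    """Append already-collected facts so the agent does not ask for them again.

    Hypothetical helper; `known_facts` maps a label to the value the user gave.
    """
    if not known_facts:
        return base_prompt
    lines = [f"- {key}: {value}" for key, value in known_facts.items()]
    return (
        f"{base_prompt}\n\n"
        "Information already provided by the user (do not ask for it again):\n"
        + "\n".join(lines)
    )
```

The facts themselves have to be extracted and stored by the surrounding application after each turn; this helper only formats them.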

Resources

scripts/

Analysis utilities that can be run directly on log files:

  • analyze_conversation.py - General conversation analysis
  • detect_loops.py - Loop and pattern detection
  • analyze_tool_calls.py - Tool usage analysis and validation
  • analyze_performance.py - Performance and latency analysis

references/

In-depth debugging knowledge:

  • debugging-patterns.md - Common failure modes and solutions (read when interpreting analysis results)
  • agent-best-practices.md - Design and implementation best practices (read when providing recommendations)