agent-debugger
Agent Debugger
Overview
Debug AI agent issues systematically using analysis scripts and proven debugging patterns. This skill helps identify root causes of common agent failures: incorrect responses, tool calling errors, conversation loops, performance problems, and more.
When to Use This Skill
Trigger this skill when:
- Agent gives wrong or irrelevant responses
- Tools are not being called or are called incorrectly
- Conversation gets stuck in loops or repeated patterns
- Agent performance is slow or inconsistent
- Tool executions are failing or returning errors
- Conversation logs or API traces need to be analyzed
Debugging Workflow
Step 1: Gather Diagnostic Data
Collect these artifacts from the user:
- Conversation logs - Full transcript or chat history
- API request/response logs - Raw LLM API calls if available
- Tool execution logs - Records of tool calls and outputs
- Agent configuration - System prompts, tool schemas, settings
- Description of the issue - What's wrong and when it occurs
Step 2: Run Automated Analysis
Use the appropriate analysis scripts based on symptoms:
For general conversation issues:
python scripts/analyze_conversation.py <log_file>
Reports role distribution and message patterns, flags potential issues, and provides summary metrics.
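As a rough illustration of the kind of summary this step produces, the sketch below computes role counts and a few simple metrics over a list of message dicts. It is not the implementation of analyze_conversation.py; the function name `summarize_conversation` and the returned keys are assumptions made for this example.

```python
from collections import Counter


def summarize_conversation(messages):
    """Return role counts and simple size metrics for a list of message dicts.

    Hypothetical helper for illustration; not taken from the skill's scripts.
    """
    roles = Counter(m.get("role", "unknown") for m in messages)
    # "content" can be None on tool-call messages, so coerce it to "".
    lengths = [len(m.get("content") or "") for m in messages]
    return {
        "total_messages": len(messages),
        "role_counts": dict(roles),
        "avg_content_chars": sum(lengths) / len(lengths) if lengths else 0.0,
        "empty_messages": sum(1 for n in lengths if n == 0),
    }
```

A skewed role distribution (for example, many more assistant turns than user turns) or a high count of empty messages is usually worth a closer look before moving on to the more specific scripts.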
For suspected loops or stuck states:
python scripts/detect_loops.py <log_file> [--threshold 2] [--window 5]
Detects exact loops, fuzzy patterns, stuck states, and ping-pong exchanges.
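To make the idea of an exact loop concrete, here is a minimal sketch of one way to detect repeated messages. It assumes that `threshold` means "how many identical messages count as a loop" and `window` means "how many recent messages to inspect"; the actual semantics of the flags in detect_loops.py may differ, and `find_exact_repeats` is a hypothetical name.

```python
def _normalize(msg):
    """Key a message by role plus lowercased, whitespace-collapsed content."""
    text = " ".join((msg.get("content") or "").lower().split())
    return (msg.get("role"), text)


def find_exact_repeats(messages, threshold=2, window=5):
    """Return indices of messages that repeat within a sliding window.

    A message is flagged when its normalized (role, content) pair occurs at
    least `threshold` times among the last `window` messages, itself included.
    Hypothetical sketch; the flag meanings are assumed, not documented.
    """
    flagged = []
    for i, msg in enumerate(messages):
        key = _normalize(msg)
        if not key[1]:
            continue  # skip empty content, e.g. bare tool-call messages
        start = max(0, i - window + 1)
        count = sum(1 for m in messages[start:i + 1] if _normalize(m) == key)
        if count >= threshold:
            flagged.append(i)
    return flagged
```

This only catches verbatim repeats after light normalization. Fuzzy patterns and ping-pong exchanges need a similarity measure or sequence matching on top of this.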
For tool/function calling problems:
python scripts/analyze_tool_calls.py <log_file> [--schema tool_schema.json]
Analyzes tool usage patterns, validates calls against the schema when one is supplied, and detects errors and retry loops.
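The sketch below shows what a basic validation pass over tool calls could look like. The format of tool_schema.json is not specified here, so the example assumes a deliberately simple, hypothetical schema: a mapping from tool name to a list of required argument names. The function name `check_tool_calls` is likewise an assumption.

```python
import json


def check_tool_calls(messages, required_args):
    """Return (message_index, tool_name, problem) tuples for suspect tool calls.

    `required_args` is a hypothetical schema: {tool_name: [required arg names]}.
    Expects the OpenAI-style "tool_calls" layout shown under Log Format
    Requirements.
    """
    problems = []
    for i, msg in enumerate(messages):
        for call in msg.get("tool_calls") or []:
            fn = call.get("function") or {}
            name = fn.get("name")
            if name not in required_args:
                problems.append((i, name, "unknown tool"))
                continue
            try:
                # Arguments arrive as a JSON-encoded string, not a dict.
                args = json.loads(fn.get("arguments") or "{}")
            except json.JSONDecodeError:
                problems.append((i, name, "arguments are not valid JSON"))
                continue
            if not isinstance(args, dict):
                problems.append((i, name, "arguments are not a JSON object"))
                continue
            missing = [a for a in required_args[name] if a not in args]
            if missing:
                problems.append((i, name, "missing arguments: " + ", ".join(missing)))
    return problems
```

An empty result means every call named a known tool and supplied the required arguments. It says nothing about whether the argument values were sensible.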
For performance/latency issues:
python scripts/analyze_performance.py <log_file>
Calculates latency statistics, identifies slow responses, and analyzes performance by role.
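When logs carry ISO 8601 timestamps, response latency can be estimated as the gap between an assistant message and the message before it. The following is a minimal sketch under that assumption; `response_latencies` is a hypothetical helper, not the logic of analyze_performance.py.

```python
from datetime import datetime


def _parse_ts(value):
    """Parse an ISO 8601 timestamp, accepting a trailing "Z" for UTC."""
    # datetime.fromisoformat() rejects "Z" before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def response_latencies(messages):
    """Return seconds elapsed before each assistant message.

    Latency is measured from the immediately preceding message. A message
    without a timestamp breaks the chain, so the next gap is skipped rather
    than guessed.
    """
    latencies = []
    prev = None
    for msg in messages:
        ts = msg.get("timestamp")
        if ts is None:
            prev = None
            continue
        current = _parse_ts(ts)
        if msg.get("role") == "assistant" and prev is not None:
            latencies.append((current - prev).total_seconds())
        prev = current
    return latencies
```

From the resulting list, the mean, maximum, and a high percentile are usually enough to tell whether slowness is consistent or confined to a few outlier turns.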
Note: Scripts accept JSON-formatted logs. For text logs, analyze_conversation.py can auto-detect and parse common formats.
Step 3: Interpret Results
Review script outputs and identify patterns:
- Check for warnings and issues flagged by scripts
- Look at metrics (latency, token usage, tool call counts)
- Examine repeated patterns or anomalies
- Cross-reference with common failure modes
Step 4: Match to Known Patterns
Consult the debugging patterns reference:
Read references/debugging-patterns.md
This comprehensive guide covers:
- Conversation Loops - Symptoms, causes, solutions
- Tool Calling Failures - Detection and fixes
- Context Window Exhaustion - Management strategies
- Incorrect Responses - Prompt engineering fixes
- Performance Issues - Optimization techniques
- Tool Execution Errors - Error handling approaches
- State Management Issues - Tracking strategies
Each pattern includes:
- Observable symptoms
- Root causes
- Concrete solutions
- Detection methods
Step 5: Recommend Solutions
Based on analysis and pattern matching:
- Identify root cause - What's actually broken?
- Propose specific fixes - Concrete changes to prompts, tools, or config
- Explain reasoning - Why this will solve the problem
- Suggest testing - How to verify the fix works
- Preventive measures - How to avoid similar issues
Step 6: Provide Best Practices
For broader improvements, reference:
Read references/agent-best-practices.md
Covers:
- System prompt design principles
- Tool design and implementation
- Conversation management strategies
- Error handling approaches
- Quality assurance and monitoring
- Optimization techniques
Log Format Requirements
Scripts work best with structured JSON logs:
Minimal format:
[
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there!"}
]
With tool calls (OpenAI Chat Completions style; Anthropic logs represent tool calls as tool_use content blocks instead):
[
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call_123",
"type": "function",
"function": {
"name": "search_kb",
"arguments": "{\"query\": \"password reset\"}"
}
}
]
},
{
"role": "tool",
"tool_call_id": "call_123",
"content": "Article: How to reset your password..."
}
]
With timestamps and metadata:
[
{
"role": "user",
"content": "Hello",
"timestamp": "2024-01-15T10:30:00Z",
"message_id": "msg_1"
},
{
"role": "assistant",
"content": "Hi there!",
"timestamp": "2024-01-15T10:30:02Z",
"usage": {
"prompt_tokens": 50,
"completion_tokens": 10,
"total_tokens": 60
}
}
]
Scripts auto-detect format and extract available information.
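One simple way to think about this detection is as a check for which optional fields are present in the log. The sketch below illustrates that idea; it is an assumption about the general approach, not the scripts' actual logic, and `detect_log_features` is a hypothetical name.

```python
import json


def detect_log_features(raw_text):
    """Parse a JSON log and report which optional fields it contains.

    Hypothetical helper; expects a JSON array of message objects as shown in
    the formats above.
    """
    messages = json.loads(raw_text)
    if not isinstance(messages, list) or not all(isinstance(m, dict) for m in messages):
        raise ValueError("expected a JSON array of message objects")
    return {
        "message_count": len(messages),
        "has_tool_calls": any(m.get("tool_calls") for m in messages),
        "has_timestamps": any("timestamp" in m for m in messages),
        "has_usage": any("usage" in m for m in messages),
    }
```

Knowing up front that a log lacks timestamps or usage data explains why latency or token metrics may be missing from later analysis.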
Quick Diagnostic Checklist
Agent not responding:
- Check API connectivity and auth
- Review error logs
- Verify configuration is valid
- Check rate limits
Wrong/irrelevant responses:
- Review system prompt clarity
- Check if appropriate tools are called
- Verify necessary context is present
- Test with clearer user input
Conversation stuck/looping:
- Run detect_loops.py
- Check for repeated tool errors
- Review last few agent responses
- Add explicit loop break conditions
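An explicit loop break condition can be as simple as a guard in the agent's control loop. The sketch below is one possible shape for such a guard, assuming the agent loop can report each tool call before executing it; the class name `LoopGuard` and its parameters are hypothetical.

```python
import json


class LoopGuard:
    """Stop an agent loop on a step limit or on repeated identical tool calls.

    Hypothetical sketch; limits and names are illustrative defaults.
    """

    def __init__(self, max_steps=20, max_identical_calls=3):
        self.max_steps = max_steps
        self.max_identical_calls = max_identical_calls
        self.steps = 0
        self.last_call = None
        self.identical_run = 0

    def should_stop(self, tool_name, arguments):
        """Record one tool call; return a reason string to stop, else None."""
        self.steps += 1
        # Serialize with sorted keys so argument order does not hide a repeat.
        call = (tool_name, json.dumps(arguments, sort_keys=True))
        if call == self.last_call:
            self.identical_run += 1
        else:
            self.last_call = call
            self.identical_run = 1
        if self.identical_run >= self.max_identical_calls:
            return (f"{tool_name} called {self.identical_run} times in a row "
                    "with identical arguments")
        if self.steps >= self.max_steps:
            return f"step limit of {self.max_steps} reached"
        return None
```

When the guard returns a reason, the agent loop can break and surface that reason to the user or to logs instead of retrying silently.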
Tool calling issues:
- Run analyze_tool_calls.py with schema
- Validate tool descriptions are clear
- Check tool implementation for bugs
- Test tools independently
Performance problems:
- Run analyze_performance.py
- Check token usage and context length
- Review tool execution times
- Consider whether the model or infrastructure is the bottleneck
Example Debugging Session
User reports: "Agent keeps asking for the same information repeatedly"
Analysis approach:
- Collect conversation log
- Run detect_loops.py → Confirms ping-pong pattern detected
- Run analyze_conversation.py → Shows high repeated content
- Review conversation → Agent not retaining context from earlier messages
- Consult debugging-patterns.md → Matches "State Management Issues"
- Solution: Add explicit state tracking to system prompt, include conversation summary
- Test: Verify agent now references earlier information
- Document: Record fix and add to monitoring
Resources
scripts/
Analysis utilities that can be run directly on log files:
- analyze_conversation.py - General conversation analysis
- detect_loops.py - Loop and pattern detection
- analyze_tool_calls.py - Tool usage analysis and validation
- analyze_performance.py - Performance and latency analysis
references/
In-depth debugging knowledge:
- debugging-patterns.md - Common failure modes and solutions (read when interpreting analysis results)
- agent-best-practices.md - Design and implementation best practices (read when providing recommendations)