Agent Debugger
Overview
Debug AI agent issues systematically using analysis scripts and proven debugging patterns. This skill helps identify root causes of common agent failures: incorrect responses, tool calling errors, conversation loops, performance problems, and more.
When to Use This Skill
Trigger this skill when:
- Agent gives wrong or irrelevant responses
- Tools are not being called or are called incorrectly
- Conversation gets stuck in loops or repeated patterns
- Agent performance is slow or inconsistent
- Tool executions are failing or returning errors
- You need to analyze conversation logs or API traces
Debugging Workflow
Step 1: Gather Diagnostic Data
Collect these artifacts from the user:
- Conversation logs - Full transcript or chat history
- API request/response logs - Raw LLM API calls if available
- Tool execution logs - Records of tool calls and outputs
- Agent configuration - System prompts, tool schemas, settings
- Description of the issue - What's wrong and when it occurs
Step 2: Run Automated Analysis
Use the appropriate analysis scripts based on symptoms:
For general conversation issues:
python scripts/analyze_conversation.py <log_file>
Analyzes role distribution and message patterns, flags potential issues, and provides summary metrics.
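The script's internals are not reproduced here. As a rough illustration of the kind of metrics such an analysis reports, here is a minimal sketch that counts messages per role and flags assistant replies repeated verbatim. `summarize_conversation` is a hypothetical helper, not code from analyze_conversation.py.

```python
from collections import Counter


def summarize_conversation(messages):
    """Count messages per role and flag assistant replies that repeat verbatim."""
    roles = Counter(m.get("role", "unknown") for m in messages)
    assistant_texts = [
        (m.get("content") or "").strip()
        for m in messages
        if m.get("role") == "assistant"
    ]
    # Skip empty replies: tool-call turns often carry null content.
    counts = Counter(t for t in assistant_texts if t)
    repeated = {text: n for text, n in counts.items() if n > 1}
    return {
        "total_messages": len(messages),
        "role_distribution": dict(roles),
        "repeated_assistant_replies": repeated,
    }
```

A high count in `repeated_assistant_replies` is the same signal the example session below describes as "high repeated content".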
For suspected loops or stuck states:
python scripts/detect_loops.py <log_file> [--threshold 2] [--window 5]
Detects exact loops, fuzzy patterns, stuck states, and ping-pong exchanges.
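The exact meaning of `--threshold` and `--window` is not spelled out here. The sketch below assumes `threshold` is the minimum number of repeats and `window` is the number of consecutive messages inspected at a time, and it covers only exact repeats and ping-pong exchanges (not the fuzzy or stuck-state detection the script also performs). Both functions are hypothetical illustrations.

```python
from collections import Counter


def detect_exact_repeats(messages, threshold=2, window=5):
    """Flag (role, content) pairs seen at least `threshold` times
    inside any sliding window of `window` consecutive messages."""
    keys = [(m.get("role"), (m.get("content") or "").strip()) for m in messages]
    flagged = set()
    # When the log is shorter than the window, inspect it once as a whole.
    for start in range(max(1, len(keys) - window + 1)):
        counts = Counter(k for k in keys[start:start + window] if k[1])
        flagged.update(key for key, n in counts.items() if n >= threshold)
    return flagged


def is_ping_pong(messages, min_cycles=2):
    """True if the last 2 * min_cycles assistant replies alternate A, B, A, B, ..."""
    replies = [
        (m.get("content") or "").strip()
        for m in messages
        if m.get("role") == "assistant"
    ]
    tail = replies[-2 * min_cycles:]
    if len(tail) < 2 * min_cycles or tail[0] == tail[1]:
        return False
    return all(tail[i] == tail[i % 2] for i in range(len(tail)))
```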
For tool/function calling problems:
python scripts/analyze_tool_calls.py <log_file> [--schema tool_schema.json]
Analyzes tool usage patterns, validates against schema, detects errors and retry loops.
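A minimal sketch of the two core checks, assuming the OpenAI-style `tool_calls` layout shown under Log Format Requirements. For brevity it replaces a full JSON Schema with a hypothetical map of tool name to required argument names; the real `--schema` file format may differ.

```python
import json
from collections import Counter


def check_tool_calls(messages, schemas):
    """Validate tool calls and count identical repeats.

    `schemas` maps tool name -> list of required argument names
    (a simplified, hypothetical stand-in for JSON Schema)."""
    problems = []
    signatures = Counter()
    for msg in messages:
        for call in msg.get("tool_calls") or []:
            fn = call.get("function", {})
            name = fn.get("name")
            try:
                args = json.loads(fn.get("arguments") or "{}")
            except json.JSONDecodeError:
                problems.append(f"{call.get('id')}: arguments are not valid JSON")
                continue
            if name not in schemas:
                problems.append(f"{call.get('id')}: unknown tool '{name}'")
                continue
            missing = [p for p in schemas[name] if p not in args]
            if missing:
                problems.append(f"{call.get('id')}: missing {missing}")
            # The same tool with the same arguments more than once hints at a retry loop.
            signatures[(name, json.dumps(args, sort_keys=True))] += 1
    retries = {sig: n for sig, n in signatures.items() if n > 1}
    return problems, retries
```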
For performance/latency issues:
python scripts/analyze_performance.py <log_file>
Calculates latency statistics, identifies slow responses, analyzes performance by role.
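One plausible way to derive latency from the timestamped log format below is to measure the gap between each user message and the next assistant message. This sketch assumes that definition and an arbitrary 10-second "slow" threshold; neither is documented behavior of analyze_performance.py.

```python
from datetime import datetime
from statistics import mean, median


def parse_ts(value):
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def response_latencies(messages):
    """Seconds between each user message and the next assistant message."""
    latencies, pending = [], None
    for msg in messages:
        if "timestamp" not in msg:
            continue
        ts = parse_ts(msg["timestamp"])
        if msg.get("role") == "user":
            pending = ts
        elif msg.get("role") == "assistant" and pending is not None:
            latencies.append((ts - pending).total_seconds())
            pending = None
    return latencies


def latency_report(latencies, slow_threshold=10.0):
    """Summary statistics; slow_threshold (seconds) is an illustrative default."""
    if not latencies:
        return {}
    return {
        "mean_s": mean(latencies),
        "median_s": median(latencies),
        "max_s": max(latencies),
        "slow_responses": [x for x in latencies if x > slow_threshold],
    }
```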
Note: Scripts accept JSON-formatted logs. For text logs, analyze_conversation.py can auto-detect and parse common formats.
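The "common formats" the script auto-detects are not listed here. As an illustration only, this sketch loads either a JSON array or a hypothetical plain-text convention of one `role: content` line per message, with unprefixed lines treated as continuations of the previous message.

```python
import json
import re

# Hypothetical text-log convention: "User: ...", "Assistant: ...", etc.
LINE_RE = re.compile(r"^(user|assistant|system|tool)\s*:\s*(.*)$", re.IGNORECASE)


def load_log(text):
    """Return a list of message dicts from a JSON array or 'role: content' text."""
    stripped = text.strip()
    if stripped.startswith("["):
        return json.loads(stripped)
    messages = []
    for line in stripped.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            messages.append({"role": match.group(1).lower(), "content": match.group(2)})
        elif messages:
            # Continuation line: append it to the previous message's content.
            messages[-1]["content"] += "\n" + line
    return messages
```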
Step 3: Interpret Results
Review script outputs and identify patterns:
- Check for warnings and issues flagged by scripts
- Look at metrics (latency, token usage, tool call counts)
- Examine repeated patterns or anomalies
- Cross-reference with common failure modes
Step 4: Match to Known Patterns
Consult the debugging patterns reference:
Read references/debugging-patterns.md
This comprehensive guide covers:
- Conversation Loops - Symptoms, causes, solutions
- Tool Calling Failures - Detection and fixes
- Context Window Exhaustion - Management strategies
- Incorrect Responses - Prompt engineering fixes
- Performance Issues - Optimization techniques
- Tool Execution Errors - Error handling approaches
- State Management Issues - Tracking strategies
Each pattern includes:
- Observable symptoms
- Root causes
- Concrete solutions
- Detection methods
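To make the cross-referencing step repeatable, the flags produced by analysis can be mapped to the pattern sections above. The flag names in this sketch are hypothetical (the scripts' actual output labels are not documented here); the pattern names come from the section list of debugging-patterns.md.

```python
# Hypothetical flag names -> pattern sections listed in debugging-patterns.md.
PATTERN_FOR_FLAG = {
    "exact_loop": "Conversation Loops",
    "ping_pong": "Conversation Loops",
    "schema_violation": "Tool Calling Failures",
    "context_near_limit": "Context Window Exhaustion",
    "slow_response": "Performance Issues",
    "tool_error": "Tool Execution Errors",
    "repeated_question": "State Management Issues",
}


def patterns_to_read(flags):
    """Return the pattern sections worth reading, in first-seen order, without duplicates."""
    sections = []
    for flag in flags:
        pattern = PATTERN_FOR_FLAG.get(flag)
        if pattern and pattern not in sections:
            sections.append(pattern)
    return sections
```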
Step 5: Recommend Solutions
Based on analysis and pattern matching:
- Identify root cause - What's actually broken?
- Propose specific fixes - Concrete changes to prompts, tools, or config
- Explain reasoning - Why this will solve the problem
- Suggest testing - How to verify the fix works
- Preventive measures - How to avoid similar issues
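One way to "suggest testing" concretely is to rerun the same checks on a log captured before the fix and one captured after it. This is a minimal sketch under that assumption; `compare_runs` and `count_repeated_replies` are hypothetical helpers, and any function that returns a problem count can be plugged in as a check.

```python
from collections import Counter


def count_repeated_replies(messages):
    """Number of assistant replies that duplicate an earlier assistant reply."""
    texts = [(m.get("content") or "").strip() for m in messages if m.get("role") == "assistant"]
    counts = Counter(t for t in texts if t)
    return sum(n - 1 for n in counts.values() if n > 1)


def compare_runs(checks, before, after):
    """Run each check on the pre-fix and post-fix logs.

    `checks` maps a name to a function returning a problem count. The fix
    counts as verified when no check gets worse and at least one improves."""
    results = {name: (fn(before), fn(after)) for name, fn in checks.items()}
    no_regressions = all(post <= pre for pre, post in results.values())
    some_improvement = any(post < pre for pre, post in results.values())
    return results, no_regressions and some_improvement
```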
Step 6: Provide Best Practices
For broader improvements, reference:
Read references/agent-best-practices.md
Covers:
- System prompt design principles
- Tool design and implementation
- Conversation management strategies
- Error handling approaches
- Quality assurance and monitoring
- Optimization techniques
Log Format Requirements
Scripts work best with structured JSON logs:
Minimal format:
[
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there!"}
]
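Before running the scripts, a quick structural check can catch malformed logs. This sketch assumes the four roles used in common chat formats (`system`, `user`, `assistant`, `tool`); that role set is an assumption, not a documented requirement of the scripts.

```python
# Assumed role set; adjust if your logs use other roles.
VALID_ROLES = {"system", "user", "assistant", "tool"}


def validate_minimal(messages):
    """Return a list of problems with a log in the minimal role/content format."""
    if not isinstance(messages, list):
        return ["top level must be a JSON array"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"entry {i}: not an object")
            continue
        if msg.get("role") not in VALID_ROLES:
            problems.append(f"entry {i}: unexpected role {msg.get('role')!r}")
        # content may be null (assistant turns carrying only tool_calls), but the key should exist
        if "content" not in msg:
            problems.append(f"entry {i}: missing 'content'")
    return problems
```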
With tool calls (OpenAI/Anthropic format):
[
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call_123",
"type": "function",
"function": {
"name": "search_kb",
"arguments": "{\"query\": \"password reset\"}"
}
}
]
},
{
"role": "tool",
"tool_call_id": "call_123",
"content": "Article: How to reset your password..."
}
]
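In this format each `tool_calls[].id` should be answered by a `role: "tool"` message with the same `tool_call_id`. A minimal sketch (hypothetical helper) that finds calls with no result and results with no originating call:

```python
def unmatched_tool_calls(messages):
    """Return (calls without a tool result, tool results without a matching call)."""
    call_ids, result_ids = [], []
    for msg in messages:
        for call in msg.get("tool_calls") or []:
            call_ids.append(call.get("id"))
        if msg.get("role") == "tool":
            result_ids.append(msg.get("tool_call_id"))
    missing_results = [cid for cid in call_ids if cid not in result_ids]
    orphan_results = [rid for rid in result_ids if rid not in call_ids]
    return missing_results, orphan_results
```

Missing results often mean a tool crashed or timed out silently; orphan results usually point to a logging bug.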
With timestamps and metadata:
[
{
"role": "user",
"content": "Hello",
"timestamp": "2024-01-15T10:30:00Z",
"message_id": "msg_1"
},
{
"role": "assistant",
"content": "Hi there!",
"timestamp": "2024-01-15T10:30:02Z",
"usage": {
"prompt_tokens": 50,
"completion_tokens": 10,
"total_tokens": 60
}
}
]
Scripts auto-detect format and extract available information.
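When `usage` blocks are present, summing them and tracking the largest prompt gives a quick read on cost and context growth. In this sketch the `context_limit` and the 80% warning ratio are illustrative assumptions; use your model's actual context window.

```python
def token_usage(messages, context_limit=None, warn_ratio=0.8):
    """Sum usage blocks and flag prompts approaching an (assumed) context limit."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    largest_prompt = 0
    for msg in messages:
        usage = msg.get("usage") or {}
        for key in totals:
            totals[key] += usage.get(key, 0)
        largest_prompt = max(largest_prompt, usage.get("prompt_tokens", 0))
    near_limit = bool(context_limit) and largest_prompt >= warn_ratio * context_limit
    return {"totals": totals, "largest_prompt": largest_prompt, "near_context_limit": near_limit}
```

A steadily rising `prompt_tokens` across turns is the usual early sign of context window exhaustion.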
Quick Diagnostic Checklist
Agent not responding:
- Check API connectivity and auth
- Review error logs
- Verify configuration is valid
- Check rate limits
Wrong/irrelevant responses:
- Review system prompt clarity
- Check if appropriate tools are called
- Verify necessary context is present
- Test with clearer user input
Conversation stuck/looping:
- Run detect_loops.py
- Check for repeated tool errors
- Review last few agent responses
- Add explicit loop break conditions
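An explicit loop break condition can live in the agent's own control loop rather than only in the prompt. This is a minimal sketch of one such guard; the class name and the default limits are illustrative assumptions, not values from this skill.

```python
import json


class LoopGuard:
    """Stop an agent loop on repeated identical tool calls or too many turns."""

    def __init__(self, max_repeats=2, max_turns=20):
        self.max_repeats = max_repeats
        self.max_turns = max_turns
        self.turns = 0
        self.seen = {}

    def should_stop(self, tool_name, arguments):
        """Record one tool call; return a stop reason string, or None to continue."""
        self.turns += 1
        key = (tool_name, json.dumps(arguments, sort_keys=True))
        self.seen[key] = self.seen.get(key, 0) + 1
        if self.seen[key] > self.max_repeats:
            return f"tool {tool_name} called with identical arguments {self.seen[key]} times"
        if self.turns > self.max_turns:
            return f"exceeded {self.max_turns} turns"
        return None
```

When `should_stop` returns a reason, the agent can surface it to the user or escalate instead of calling the tool again.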
Tool calling issues:
- Run analyze_tool_calls.py with --schema
- Validate that tool descriptions are clear
- Check tool implementation for bugs
- Test tools independently
Performance problems:
- Run analyze_performance.py
- Check token usage and context length
- Review tool execution times
- Consider model choice and infrastructure
Example Debugging Session
User reports: "Agent keeps asking for the same information repeatedly"
Analysis approach:
- Collect conversation log
- Run detect_loops.py → Confirms ping-pong pattern detected
- Run analyze_conversation.py → Shows high repeated content
- Review conversation → Agent not retaining context from earlier messages
- Consult debugging-patterns.md → Matches "State Management Issues"
- Solution: Add explicit state tracking to system prompt, include conversation summary
- Test: Verify agent now references earlier information
- Document: Record fix and add to monitoring
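The "explicit state tracking" fix from this session could look something like the following sketch, which appends facts the user has already supplied, plus an optional running summary, to the system prompt. The function name and the `known_facts` dictionary are hypothetical; how facts get extracted from the conversation is left to your agent.

```python
def build_system_prompt(base_prompt, known_facts, summary=""):
    """Append already-collected facts and a summary so the model doesn't re-ask.

    known_facts: hypothetical dict of field -> value gathered so far."""
    lines = [base_prompt.rstrip()]
    if known_facts:
        lines.append("")
        lines.append("Information already provided by the user (do not ask for it again):")
        for field, value in sorted(known_facts.items()):
            lines.append(f"- {field}: {value}")
    if summary:
        lines.append("")
        lines.append(f"Conversation so far: {summary}")
    return "\n".join(lines)
```

Rebuilding the prompt each turn keeps the state block current, and the test step then checks that the agent references these facts instead of asking again.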
Resources
scripts/
Analysis utilities that can be run directly on log files:
- analyze_conversation.py - General conversation analysis
- detect_loops.py - Loop and pattern detection
- analyze_tool_calls.py - Tool usage analysis and validation
- analyze_performance.py - Performance and latency analysis
references/
In-depth debugging knowledge:
- debugging-patterns.md - Common failure modes and solutions (read when interpreting analysis results)
- agent-best-practices.md - Design and implementation best practices (read when providing recommendations)