Agent Implementation Review

Review AI agent implementations for architectural best practices.

Target: $ARGUMENTS (path to agent project or codebase)

When to Use This Skill

Auditing existing agent implementations
Designing new agent architectures
Reviewing agent code for production readiness
Evaluating multi-agent system designs
Assessing agent reliability and observability

Review Process

1. Discover - Explore folder structure at $ARGUMENTS
2. Analyze - Check against architecture patterns
3. Evaluate - Score each category
4. Report - Generate findings with recommendations

Folder Structure Best Practices

Recommended Agent Project Structure

agent-project/
├── src/
│   ├── agents/              # Agent definitions
│   │   ├── base.py          # Base agent class
│   │   ├── planner.py       # Planning agent
│   │   └── executor.py      # Execution agent
│   ├── tools/               # Tool implementations
│   │   ├── __init__.py
│   │   ├── base.py          # Tool base class/interface
│   │   ├── search.py        # Search tool
│   │   └── code.py          # Code execution tool
│   ├── memory/              # Memory/state management
│   │   ├── short_term.py    # Conversation context
│   │   ├── long_term.py     # Persistent storage
│   │   └── vector_store.py  # Embeddings/RAG
│   ├── prompts/             # Prompt templates
│   │   ├── system.py        # System prompts
│   │   └── templates/       # Jinja/string templates
│   ├── orchestration/       # Multi-agent coordination
│   │   ├── router.py        # Request routing
│   │   └── workflow.py      # Agent workflows
│   ├── models/              # Data models/schemas
│   │   ├── messages.py      # Message types
│   │   └── state.py         # State schemas
│   └── utils/               # Shared utilities
│       ├── logging.py       # Structured logging
│       └── retry.py         # Retry logic
├── config/                  # Configuration
│   ├── default.yaml         # Default settings
│   └── prompts/             # External prompt files
├── tests/                   # Test suite
│   ├── unit/
│   ├── integration/
│   └── fixtures/            # Test data
└── scripts/                 # CLI/automation

Structure Checklist

| Component | Required | Check |
|-----------|----------|-------|
| Agent definitions separated | Yes | [ ] |
| Tools in dedicated module | Yes | [ ] |
| Prompts externalized | Recommended | [ ] |
| Configuration separated | Yes | [ ] |
| Tests present | Yes | [ ] |
| Clear separation of concerns | Yes | [ ] |
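
Most rows of this checklist can be pre-screened mechanically during the Discover step. The following is a minimal sketch, assuming the recommended layout above; the `CHECKS` mapping and `check_structure` are hypothetical names, and "Clear separation of concerns" is left out because it needs human judgment.

```python
from pathlib import Path

# Hypothetical mapping from checklist rows to candidate directories,
# taken from the recommended project layout.
CHECKS = {
    "Agent definitions separated": ["src/agents"],
    "Tools in dedicated module": ["src/tools"],
    "Prompts externalized": ["src/prompts", "config/prompts"],
    "Configuration separated": ["config"],
    "Tests present": ["tests"],
}


def check_structure(root: str) -> dict[str, bool]:
    """Return, per checklist row, whether any candidate directory exists."""
    base = Path(root)
    return {
        name: any((base / rel).is_dir() for rel in candidates)
        for name, candidates in CHECKS.items()
    }
```

A `False` entry is only a prompt to look closer: a project may meet the intent of a row with a different directory name.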

Design Pattern Checklist

1. Tool Design

Required Patterns:

Tools have clear input/output schemas
Tool errors return structured error responses
Tools are stateless (no side effects on agent state)
Tool timeouts are configured
Tools validate inputs before execution

BAD:

def search(query):
    return requests.get(f"https://api.com?q={query}").json()

GOOD:

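from pydantic import BaseModel, Field, ValidationError
from requests.exceptions import Timeout  # assumes the client raises requests' Timeout

# BaseTool and ToolResult are the project's own types (e.g. src/tools/base.py).
class SearchTool(BaseTool):
    name = "search"
    description = "Search the web for information"

    class InputSchema(BaseModel):
        query: str = Field(..., min_length=1, max_length=500)

    def execute(self, query: str) -> ToolResult:
        try:
            # Validate inputs before touching the network
            params = self.InputSchema(query=query)
            response = self.client.search(params.query, timeout=10)
            return ToolResult(success=True, data=response)
        except ValidationError as e:
            return ToolResult(success=False, error=f"Invalid input: {e}")
        except Timeout:
            return ToolResult(success=False, error="Search timed out")
        except Exception as e:
            return ToolResult(success=False, error=str(e))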
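
The GOOD example relies on `BaseTool` and `ToolResult` without defining them. One possible shape for those types is sketched below using only the standard library; the field names follow the example, while `EchoTool` is a hypothetical tool that exists only to exercise the interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ToolResult:
    """Structured outcome of a tool call: either data or an error message."""
    success: bool
    data: Any = None
    error: Optional[str] = None


class BaseTool(ABC):
    """Common interface: every tool has a name, a description, and execute()."""
    name: str = ""
    description: str = ""

    @abstractmethod
    def execute(self, **kwargs: Any) -> ToolResult:
        ...


class EchoTool(BaseTool):
    """Hypothetical tool used only to demonstrate the interface."""
    name = "echo"
    description = "Return the input text unchanged"

    def execute(self, text: str = "") -> ToolResult:
        # Validate before doing any work, and report failure as data
        # instead of raising, so the agent loop can react to it.
        if not text:
            return ToolResult(success=False, error="text must be non-empty")
        return ToolResult(success=True, data=text)
```

Returning a frozen result object keeps tools stateless from the agent's point of view: the caller decides what to do with the outcome.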

2. Agent Loop

Required Patterns:

Clear think → act → observe cycle
Maximum iteration limit
Graceful termination conditions
State preserved between iterations
Interrupt/cancel capability

GOOD:

class Agent:
    MAX_ITERATIONS = 10

    async def run(self, task: str) -> AgentResult:
        state = AgentState(task=task)

        for i in range(self.MAX_ITERATIONS):
            if self._should_stop(state):
                break

            # Think
            action = await self.plan(state)

            # Act
            result = await self.execute(action)

            # Observe
            state = self.update_state(state, result)

        return self.finalize(state)
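
The loop above does not show how cancellation or hitting the iteration limit is reported. The sketch below is one way to make both outcomes explicit; `LoopState`, `CancellableLoop`, and the status strings are hypothetical, and `step` stands in for a full think → act → observe cycle.

```python
import asyncio
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LoopState:
    iteration: int = 0
    status: str = "running"


class CancellableLoop:
    """Loop skeleton that stops on completion, cancellation, or the limit."""

    def __init__(self, max_iterations: int = 10) -> None:
        self.max_iterations = max_iterations
        self._cancelled = False

    def cancel(self) -> None:
        # Checked at the top of every iteration, so a cancel request takes
        # effect before the next step starts.
        self._cancelled = True

    async def step(self, state: LoopState) -> LoopState:
        # Placeholder for think -> act -> observe; it never finishes by itself.
        await asyncio.sleep(0)
        return replace(state, iteration=state.iteration + 1)

    async def run(self) -> LoopState:
        state = LoopState()
        for _ in range(self.max_iterations):
            if self._cancelled:
                return replace(state, status="cancelled")
            state = await self.step(state)
            if state.status != "running":
                return state
        # The loop ran out of iterations without a terminal status.
        return replace(state, status="max_iterations_reached")
```

Recording why the loop ended lets callers and tests tell a finished task from an exhausted budget.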

3. Memory Management

Required Patterns:

Conversation history with size limits
Summarization for long conversations
Clear memory lifecycle (create, read, update, delete)
Persistent storage for long-term memory
Vector store for semantic retrieval (if RAG)

Memory Types:

| Type | Purpose | Persistence |
|------|---------|-------------|
| Working | Current task context | Session |
| Short-term | Recent conversation | Session |
| Long-term | User preferences, facts | Persistent |
| Episodic | Past task summaries | Persistent |
| Semantic | Embeddings/RAG | Persistent |
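
The first two required patterns, size-limited history and summarization, can be combined in one small class. This is a sketch under stated assumptions: `ShortTermMemory` and `naive_summarize` are hypothetical names, and the summarizer is a stand-in for what would normally be a model call.

```python
from collections import deque
from typing import Callable


class ShortTermMemory:
    """Keeps the most recent messages; older ones fold into a running summary."""

    def __init__(self, max_messages: int,
                 summarize: Callable[[str, str], str]) -> None:
        self.max_messages = max_messages
        self.summarize = summarize  # (current_summary, evicted_message) -> summary
        self.summary = ""
        self.messages: deque[str] = deque()

    def add(self, message: str) -> None:
        self.messages.append(message)
        # Evict from the oldest end until the size limit holds again.
        while len(self.messages) > self.max_messages:
            evicted = self.messages.popleft()
            self.summary = self.summarize(self.summary, evicted)

    def context(self) -> list[str]:
        """Summary (if any) followed by the retained messages."""
        prefix = [f"Summary: {self.summary}"] if self.summary else []
        return prefix + list(self.messages)


def naive_summarize(summary: str, evicted: str) -> str:
    # Stand-in for a model call: keep the first 20 characters of each message.
    return (summary + " | " if summary else "") + evicted[:20]
```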

4. Error Handling

Required Patterns:

Structured error types (not generic exceptions)
Retry with exponential backoff for transient errors
Graceful degradation (fallback behaviors)
Error context preserved for debugging
User-friendly error messages

Error Categories:

| Category | Retry | Action |
|----------|-------|--------|
| Rate limit | Yes | Exponential backoff |
| Timeout | Yes | Retry with longer timeout |
| Auth failure | No | Fail with clear message |
| Invalid input | No | Return validation error |
| Tool failure | Maybe | Try alternative tool |
| Model error | Yes | Retry or fallback model |

GOOD:

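from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

class AgentError(Exception):
    def __init__(self, message: str, code: str, recoverable: bool = False):
        super().__init__(message)  # keeps str(err) and tracebacks informative
        self.message = message
        self.code = code
        self.recoverable = recoverable

# RateLimitError and AuthError come from the model client library in use.
@retry(
    retry=retry_if_exception_type(RateLimitError),
    wait=wait_exponential(multiplier=1, max=60),
    stop=stop_after_attempt(3)
)
async def call_model(self, messages: list) -> str:
    try:
        return await self.client.complete(messages)
    except RateLimitError:
        raise  # Let retry handle it
    except AuthError as e:
        # Chain the cause so the original error context is preserved
        raise AgentError("Authentication failed", "AUTH_ERROR", recoverable=False) from e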
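
The Retry column of the Error Categories table can also be encoded as data, so the retry decision lives in one place. The sketch below uses only the standard library and does not reproduce the decorator-based example; the category keys, `CategorizedError`, and `call_with_retry` are hypothetical, and "Tool failure" (Maybe) is treated as non-retryable on the assumption that the caller tries an alternative tool instead.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Retry column of the Error Categories table, keyed by hypothetical names.
RETRYABLE = {
    "rate_limit": True,
    "timeout": True,
    "auth_failure": False,
    "invalid_input": False,
    "tool_failure": False,  # "Maybe": assumed to be handled by switching tools
    "model_error": True,
}


class CategorizedError(Exception):
    """Hypothetical error type carrying one of the category keys above."""

    def __init__(self, category: str, message: str) -> None:
        super().__init__(message)
        self.category = category


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def call_with_retry(fn: Callable[[], T], attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    for attempt in range(attempts):
        try:
            return fn()
        except CategorizedError as err:
            is_last = attempt == attempts - 1
            if is_last or not RETRYABLE.get(err.category, False):
                raise  # non-retryable, or retry budget exhausted
            sleep(backoff_delay(attempt))
    raise RuntimeError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps tests fast, since they can record the delays instead of waiting for them.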

5. State Management

Required Patterns:

Immutable state updates (new state object per update)
State schema validation
State serialization for persistence
Clear state transitions
State versioning for migrations

GOOD:

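from dataclasses import dataclass, replace
from typing import Literal

# Message and ToolResult are the project's own types (e.g. src/models/).
@dataclass(frozen=True)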
class AgentState:
    task: str
    messages: tuple[Message, ...]
    tool_results: tuple[ToolResult, ...]
    iteration: int = 0
    status: Literal["running", "completed", "failed"] = "running"

    def with_message(self, message: Message) -> "AgentState":
        return replace(self, messages=self.messages + (message,))

    def with_tool_result(self, result: ToolResult) -> "AgentState":
        return replace(self, tool_results=self.tool_results + (result,))
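
Serialization and versioning are listed as required patterns but not shown. A minimal sketch follows, assuming JSON persistence and an integer schema version; `PersistedState`, `SCHEMA_VERSION`, and the premise that version 1 lacked the `status` field are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass

SCHEMA_VERSION = 2  # hypothetical current version


@dataclass(frozen=True)
class PersistedState:
    task: str
    iteration: int = 0
    status: str = "running"


def dump_state(state: PersistedState) -> str:
    """Serialize with an explicit schema version alongside the payload."""
    return json.dumps({"version": SCHEMA_VERSION, "state": asdict(state)})


def migrate(payload: dict) -> dict:
    """Upgrade older payloads step by step; v1 is assumed to lack `status`."""
    version = payload.get("version", 1)
    state = dict(payload["state"])
    if version == 1:
        state.setdefault("status", "running")
        version = 2
    if version != SCHEMA_VERSION:
        raise ValueError(f"unsupported state version: {version}")
    return state


def load_state(raw: str) -> PersistedState:
    return PersistedState(**migrate(json.loads(raw)))
```

Storing the version next to the payload means old snapshots remain loadable after the schema changes.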

6. Multi-Agent Coordination

Patterns (if applicable):

Clear agent roles and responsibilities
Message passing protocol defined
Conflict resolution strategy
Supervisor/orchestrator pattern
Shared state management

Coordination Patterns:

| Pattern | Use Case |
|---------|----------|
| Supervisor | One agent routes to specialists |
| Pipeline | Sequential agent processing |
| Debate | Multiple agents propose, one decides |
| Swarm | Autonomous agents, shared goals |
| Hierarchical | Manager → workers structure |
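
As an illustration of the Supervisor row, the sketch below routes each task to exactly one registered specialist. All names are hypothetical, specialists are reduced to plain callables, and `keyword_route` stands in for what would usually be a model-based routing decision.

```python
from typing import Callable, Dict

Specialist = Callable[[str], str]


class Supervisor:
    """Routes each task to one registered specialist chosen by a routing function."""

    def __init__(self, route: Callable[[str], str]) -> None:
        self.route = route  # task -> specialist name
        self.specialists: Dict[str, Specialist] = {}

    def register(self, name: str, specialist: Specialist) -> None:
        self.specialists[name] = specialist

    def handle(self, task: str) -> str:
        name = self.route(task)
        if name not in self.specialists:
            # Fail loudly instead of silently dropping the task.
            raise KeyError(f"no specialist registered for role {name!r}")
        return self.specialists[name](task)


def keyword_route(task: str) -> str:
    # Stand-in for a model-based router.
    return "coder" if "code" in task.lower() else "researcher"
```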

7. Prompt Management

Required Patterns:

System prompts externalized (not hardcoded)
Prompt versioning
Variables/templating for dynamic content
Prompt testing/validation
Clear prompt documentation

GOOD:

# prompts/system.yaml
agent_system_prompt:
  version: "1.2"
  template: |
    You are a helpful assistant with access to these tools:
    {% for tool in tools %}
    - {{ tool.name }}: {{ tool.description }}
    {% endfor %}

    Current date: {{ current_date }}
    User preferences: {{ user_prefs }}
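
Versioned templates can be loaded into a small value object that renders variables and exposes a fingerprint for snapshot tests. To stay dependency-free, this sketch swaps the Jinja syntax shown above for the standard library's `string.Template` (`$name` placeholders); `PromptTemplate` and `SYSTEM` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class PromptTemplate:
    name: str
    version: str
    template: str

    def render(self, **variables: str) -> str:
        # substitute() raises KeyError when a variable is missing,
        # which surfaces incomplete prompts early.
        return Template(self.template).substitute(**variables)

    def fingerprint(self) -> str:
        """Stable hash of version + template text, usable in a snapshot test."""
        raw = f"{self.version}\n{self.template}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()


SYSTEM = PromptTemplate(
    name="agent_system_prompt",
    version="1.2",
    template="You are a helpful assistant.\nCurrent date: $current_date",
)
```

A snapshot test can store `SYSTEM.fingerprint()` and fail when the template or version changes without the stored value being updated.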

8. Observability

Required Patterns:

Logging Checklist:

| Event | Log Level | Required Fields |
|-------|-----------|-----------------|
| Agent start | INFO | task_id, user_id, task |
| Tool call | DEBUG | tool_name, inputs, duration |
| Model call | DEBUG | model, tokens_in, tokens_out, latency |
| Error | ERROR | error_code, message, stack_trace |
| Agent complete | INFO | task_id, status, total_duration, total_tokens |

GOOD:

import logging

logger = logging.getLogger(__name__)

logger.info("agent_started", extra={
    "task_id": task_id,
    "user_id": user_id,
    "task_type": task.type,
})

logger.debug("tool_executed", extra={
    "task_id": task_id,
    "tool": tool.name,
    "duration_ms": duration,
    "success": result.success,
})
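
With the standard `logging` module, fields passed through `extra` become attributes on the log record but are not printed unless the formatter includes them. The sketch below shows one way to emit them as JSON; `JsonFormatter` and `make_logger` are hypothetical names, and projects that use a structured-logging library would not need this.

```python
import io
import json
import logging

# Attribute names present on every LogRecord; anything else came from `extra`.
_STANDARD = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message",
    "asctime",
}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including `extra` fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "event": record.getMessage()}
        payload.update(
            {k: v for k, v in vars(record).items() if k not in _STANDARD}
        )
        # default=str keeps non-JSON values (e.g. datetimes) from raising.
        return json.dumps(payload, default=str)


def make_logger(stream: io.StringIO) -> logging.Logger:
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("agent.sketch")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger
```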

Configuration Best Practices

Required Configuration

| Setting | Type | Description |
|---------|------|-------------|
| `model` | string | Model identifier |
| `max_iterations` | int | Loop limit |
| `timeout_seconds` | int | Overall timeout |
| `tool_timeout` | int | Per-tool timeout |
| `max_tokens` | int | Response limit |
| `temperature` | float | Model temperature |
| `retry_attempts` | int | Retry count |

Configuration Hierarchy

Settings resolve in this order, highest precedence first:

1. Environment variables (secrets, deployment-specific)
2. Config files (default.yaml, production.yaml)
3. Code defaults (fallbacks only)

GOOD:

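# pydantic v2: BaseSettings lives in the separate pydantic-settings package.
# On pydantic v1, import BaseSettings from pydantic and use an inner `class Config` instead.
from pydantic_settings import BaseSettings, SettingsConfigDict

class AgentConfig(BaseSettings):
    # Values can be overridden via AGENT_MODEL, AGENT_MAX_ITERATIONS, ... or a .env file
    model_config = SettingsConfigDict(env_prefix="AGENT_", env_file=".env")

    model: str = "claude-3-sonnet"
    max_iterations: int = 10
    timeout_seconds: int = 300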
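
The configuration hierarchy can also be written out by hand, which makes the precedence explicit. This sketch assumes the list is ordered from highest to lowest precedence (environment, then config file, then code defaults); `resolve_config` and `DEFAULTS` are hypothetical, and the file values are passed in as an already-parsed mapping.

```python
import os
from typing import Any, Mapping, Optional

DEFAULTS: dict[str, Any] = {"max_iterations": 10, "timeout_seconds": 300}


def resolve_config(file_values: Mapping[str, Any],
                   env: Optional[Mapping[str, str]] = None,
                   prefix: str = "AGENT_") -> dict[str, Any]:
    """Merge settings so that environment > config file > code defaults."""
    env = os.environ if env is None else env
    merged = {**DEFAULTS, **file_values}
    for key, current in merged.items():
        raw = env.get(prefix + key.upper())
        if raw is not None:
            # Coerce the environment string to the type already in use.
            # This is fine for int, float, and str; booleans need explicit parsing.
            merged[key] = type(current)(raw)
    return merged
```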

Testing Patterns

Test Categories

| Type | Coverage | Purpose |
|------|----------|---------|
| Unit | Tools, utilities | Isolated component tests |
| Integration | Agent + tools | End-to-end flows |
| Snapshot | Prompts | Detect prompt regressions |
| Eval | Agent responses | Quality benchmarks |

Required Tests

GOOD:

import asyncio

def test_search_tool_timeout():
    tool = SearchTool(timeout=0.001)
    result = tool.execute("test query")
    assert not result.success
    # SearchTool reports "Search timed out", so match on "timed out"
    assert "timed out" in result.error.lower()

def test_agent_max_iterations():
    agent = Agent(max_iterations=3)
    # Mock tool that never completes
    agent.tools = [InfiniteLoopTool()]
    # Agent.run is a coroutine, so drive it with an event loop
    result = asyncio.run(agent.run("impossible task"))
    assert result.iterations == 3
    assert result.status == "max_iterations_reached"

Review Output Format

## Agent Review: [project-name]

### Summary
[1-2 sentence overview]

### Architecture Score

| Category | Score | Notes |
|----------|-------|-------|
| Folder Structure | X/5 | |
| Tool Design | X/5 | |
| Agent Loop | X/5 | |
| Memory Management | X/5 | |
| Error Handling | X/5 | |
| State Management | X/5 | |
| Observability | X/5 | |
| Testing | X/5 | |
| **Overall** | **X/5** | |

### Critical Issues
- [ ] [Issue] - Location: [file]

### Recommendations
- [ ] [Recommendation] - Priority: [High/Medium/Low]

### Strengths
- [What the implementation does well]
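
The Architecture Score table in this template can be generated from per-category scores. The sketch below assumes the Overall row is the arithmetic mean of the eight categories, which the template does not specify; `render_scores` and `CATEGORIES` are hypothetical names.

```python
CATEGORIES = [
    "Folder Structure", "Tool Design", "Agent Loop", "Memory Management",
    "Error Handling", "State Management", "Observability", "Testing",
]


def render_scores(scores: dict[str, int]) -> str:
    """Render the score table; Overall is the mean of all categories (an assumption)."""
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"missing scores: {missing}")
    for category in CATEGORIES:
        if not 0 <= scores[category] <= 5:
            raise ValueError(f"{category}: score must be between 0 and 5")
    lines = ["| Category | Score | Notes |", "|----------|-------|-------|"]
    lines += [f"| {c} | {scores[c]}/5 | |" for c in CATEGORIES]
    overall = sum(scores[c] for c in CATEGORIES) / len(CATEGORIES)
    lines.append(f"| **Overall** | **{overall:.1f}/5** | |")
    return "\n".join(lines)
```

A weighted mean, or capping the overall score when a critical issue exists, would be equally defensible; the point is only that the rule should be stated.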

Anti-Patterns to Flag

| Anti-Pattern | Problem | Fix |
|--------------|---------|-----|
| God Agent | Single agent does everything | Split by responsibility |
| Infinite Loop | No termination condition | Add max iterations |
| Silent Failures | Errors swallowed | Structured error handling |
| Hardcoded Prompts | Prompts in code | Externalize to files |
| No Observability | Can't debug production | Add structured logging |
| Mutable State | Race conditions, bugs | Immutable state updates |
| No Timeouts | Hanging requests | Configure all timeouts |
| Missing Validation | Invalid inputs accepted | Schema validation |
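
Some of these anti-patterns can be flagged with a static scan before the manual review. As one example, the sketch below looks for the "No Timeouts" case in Python sources by finding `requests.<method>(...)` calls that pass no `timeout=` keyword. It is a heuristic with a hypothetical function name: it misses aliased imports and session objects, and it flags calls that supply the timeout through `**kwargs`.

```python
import ast

HTTP_METHODS = {"get", "post", "put", "delete", "patch", "head", "request"}


def find_requests_without_timeout(source: str) -> list[int]:
    """Return line numbers of `requests.<method>(...)` calls lacking timeout=."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_requests_call = (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "requests"
            and func.attr in HTTP_METHODS
        )
        has_timeout = any(kw.arg == "timeout" for kw in node.keywords)
        if is_requests_call and not has_timeout:
            hits.append(node.lineno)
    return sorted(hits)
```

Run against the BAD search example from the Tool Design section, this reports the line containing `requests.get(...)`.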
