# Multi-Agent Architecture Patterns

Multi-agent architectures distribute work across multiple LLM instances, each with its own context window. The critical insight: sub-agents exist primarily to isolate context, not to replicate human role divisions.
## Why Multi-Agent?

**Context bottleneck:** Single agents fill their context with history, documents, and tool outputs. Performance degrades via the lost-in-the-middle effect and attention scarcity.

**Token economics:**

| Architecture | Token Multiplier |
|---|---|
| Single agent chat | 1× baseline |
| Single agent + tools | ~4× baseline |
| Multi-agent system | ~15× baseline |

**Parallelization:** Research tasks can search multiple sources simultaneously, so total time approaches the longest subtask, not the sum of all subtasks.
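Fan-out parallelism can be sketched with `asyncio`. Here `search_source` is a hypothetical stand-in for a sub-agent's search call, and the sleeps simulate differing latencies; total wall time approaches the slowest call:

```python
import asyncio

# Hypothetical stand-in for a sub-agent search call; the sleep
# simulates network/model latency of varying length.
async def search_source(source: str, latency: float) -> str:
    await asyncio.sleep(latency)
    return f"results from {source}"

async def parallel_research(tasks: dict[str, float]) -> list[str]:
    # Launch all sub-agent searches concurrently; gather preserves
    # the input order in its result list.
    return await asyncio.gather(
        *(search_source(src, lat) for src, lat in tasks.items())
    )

results = asyncio.run(parallel_research(
    {"web": 0.03, "docs": 0.01, "news": 0.02}
))
```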
## Architectural Patterns

### Pattern 1: Supervisor/Orchestrator

```
User Query -> Supervisor -> [Specialist, Specialist] -> Aggregation -> Output
```

**Use when:** clear decomposition, coordination needed, human oversight important.
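The supervisor flow can be sketched as a dispatch-and-aggregate loop. The `researcher` and `analyzer` callables and the `(specialist, subtask)` plan format are hypothetical illustrations, not a specific framework's API:

```python
# Hypothetical specialist stand-ins; in practice each would be an
# LLM sub-agent invocation with its own context window.
def researcher(task: str) -> str:
    return f"findings for {task!r}"

def analyzer(task: str) -> str:
    return f"analysis of {task!r}"

SPECIALISTS = {"research": researcher, "analyze": analyzer}

def supervise(query: str, plan: list[tuple[str, str]]) -> str:
    # Dispatch each (specialist, subtask) step, then aggregate
    # the specialist outputs into a single response.
    results = [SPECIALISTS[name](subtask) for name, subtask in plan]
    return " | ".join(results)

answer = supervise(
    "market report",
    [("research", "competitor pricing"), ("analyze", "price trends")],
)
```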
**The telephone game problem:** supervisors paraphrase sub-agent responses and introduce errors in the retelling.

**Fix:** a `forward_message` tool lets sub-agents respond to the user directly:

```python
def forward_message(message: str, to_user: bool = True) -> dict:
    """Forward a sub-agent response directly to the user,
    bypassing supervisor paraphrasing."""
    if to_user:
        return {"type": "direct_response", "content": message}
    # Otherwise route the message back into the supervisor loop.
    return {"type": "internal", "content": message}
```
### Pattern 2: Peer-to-Peer/Swarm

```python
def transfer_to_agent_b():
    return agent_b  # Handoff via function return

agent_a = Agent(name="Agent A", functions=[transfer_to_agent_b])
```

**Use when:** flexible exploration, rigid planning counterproductive, emergent requirements.
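Executed end to end, handoff-by-return looks roughly like this. The `Agent` dataclass and `run_step` loop are minimal hypothetical stand-ins, not a specific swarm framework's API:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    # Minimal illustrative agent: a name plus its callable tools.
    name: str
    functions: list[Callable] = field(default_factory=list)

agent_b = Agent(name="Agent B")

def transfer_to_agent_b():
    return agent_b  # Returning an Agent signals a handoff

agent_a = Agent(name="Agent A", functions=[transfer_to_agent_b])

def run_step(agent: Agent) -> Agent:
    # If any tool call returns an Agent, control transfers to it;
    # otherwise the current agent stays active.
    for fn in agent.functions:
        result = fn()
        if isinstance(result, Agent):
            return result
    return agent

active = run_step(agent_a)
```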
### Pattern 3: Hierarchical

```
Strategy Layer -> Planning Layer -> Execution Layer
```

**Use when:** large-scale projects, enterprise workflows, clear separation of concerns.
## Context Isolation

The primary purpose of multi-agent design is context isolation: each sub-agent works in a fresh context window scoped to its subtask.

Mechanisms:
- **Full context delegation:** complex tasks needing full understanding
- **Instruction passing:** simple, well-defined subtasks
- **File system memory:** shared state without context bloat
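File system memory can be as simple as JSON files in a shared workspace directory, read and written by agents instead of being carried in context. A sketch, where the `{agent}_{key}.json` naming scheme is an illustrative assumption:

```python
import json
import tempfile
from pathlib import Path

# Shared workspace on disk; agents exchange state through files
# instead of stuffing it into each other's context windows.
workspace = Path(tempfile.mkdtemp())

def write_memory(agent: str, key: str, value: dict) -> Path:
    # Naming scheme is an illustrative assumption.
    path = workspace / f"{agent}_{key}.json"
    path.write_text(json.dumps(value))
    return path

def read_memory(agent: str, key: str) -> dict:
    return json.loads((workspace / f"{agent}_{key}.json").read_text())

write_memory("researcher", "sources", {"urls": ["https://example.com"]})
shared = read_memory("researcher", "sources")
```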
## Consensus and Coordination

**Weighted voting:** weight each agent's vote by its confidence or domain expertise.
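A minimal sketch of weighted voting, assuming each agent reports an answer together with a confidence weight:

```python
from collections import defaultdict

def weighted_vote(votes: list[tuple[str, float]]) -> str:
    # Sum the weights behind each candidate answer and return
    # the answer with the highest total weight.
    totals: dict[str, float] = defaultdict(float)
    for answer, weight in votes:
        totals[answer] += weight
    return max(totals, key=totals.get)

winner = weighted_vote([
    ("option_a", 0.9),  # one high-confidence specialist
    ("option_b", 0.4),  # two low-confidence agents agree,
    ("option_b", 0.3),  # but their combined weight still loses
])
```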
**Debate protocols:** agents critique each other's outputs. Adversarial critique often yields higher accuracy than collaborative consensus.
**Trigger-based intervention:**

- Stall triggers: fire when no progress is detected
- Sycophancy triggers: fire when an agent mimics others without independent reasoning
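A stall trigger can be as simple as comparing recent snapshots of the shared task state: if nothing has changed for a few rounds, escalate. The window size and string-snapshot representation are illustrative assumptions:

```python
def detect_stall(state_history: list[str], window: int = 3) -> bool:
    # A stall is `window` consecutive identical state snapshots;
    # shorter histories cannot yet prove a stall.
    if len(state_history) < window:
        return False
    recent = state_history[-window:]
    return all(s == recent[0] for s in recent)

# Three identical "draft" snapshots in a row -> stalled.
stalled = detect_stall(["plan", "draft", "draft", "draft"])
```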
## Failure Modes

| Failure | Mitigation |
|---|---|
| Supervisor Bottleneck | Output schema constraints, checkpointing |
| Coordination Overhead | Clear handoff protocols, batch results |
| Divergence | Objective boundaries, convergence checks |
| Error Propagation | Output validation, retry with circuit breakers |
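Retry with a circuit breaker can be sketched as follows: after a threshold of consecutive failures, the breaker opens and refuses further calls, so one failing sub-agent stops propagating errors downstream. The `CircuitBreaker` class and its threshold are illustrative assumptions:

```python
class CircuitBreaker:
    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0  # consecutive failure count

    def call(self, fn, *args):
        # Once open, refuse calls instead of retrying forever.
        if self.failures >= self.max_failures:
            raise RuntimeError("circuit open: agent disabled")
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # any success resets the count
        return result

breaker = CircuitBreaker(max_failures=2)

def flaky():
    raise ValueError("sub-agent failed")

for _ in range(2):
    try:
        breaker.call(flaky)  # retry path: failures accumulate
    except ValueError:
        pass

try:
    breaker.call(flaky)
    state = "closed"
except RuntimeError:
    state = "open"  # breaker tripped after 2 consecutive failures
```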
## Example: Research Team

```
Supervisor
├── Researcher (web search, document retrieval)
├── Analyzer (data analysis, statistics)
├── Fact-checker (verification, validation)
└── Writer (report generation)
```
## Best Practices

- Design for context isolation as primary benefit
- Choose pattern based on coordination needs, not org metaphor
- Implement explicit handoff protocols with state passing
- Use weighted voting or debate for consensus
- Monitor for supervisor bottlenecks
- Validate outputs before passing between agents
- Set time-to-live limits to prevent infinite loops
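The time-to-live limit from the last point can be sketched as a guard on both wall-clock time and iteration count; `run_with_ttl` and its defaults are hypothetical:

```python
import time

def run_with_ttl(step, max_seconds: float = 30.0, max_steps: int = 50):
    # Bound both wall-clock time and iteration count so a
    # coordination loop cannot run forever.
    deadline = time.monotonic() + max_seconds
    for i in range(max_steps):
        if time.monotonic() > deadline:
            return ("timeout", i)
        result = step(i)
        if result is not None:  # step signals completion
            return ("done", result)
    return ("step_limit", max_steps)

# Toy step function that completes on the fourth iteration.
status = run_with_ttl(lambda i: "answer" if i == 3 else None)
```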