multi-agent-patterns
Multi-Agent Patterns Skill
Overview
This skill addresses multi-agent system design, covering scenarios where supervisor patterns, swarm architectures, or agent coordination strategies are needed. Core insight: "Sub-agents exist primarily to isolate context, not to anthropomorphize role division."
Quick Start
- Identify need - Why multiple agents? (context limits, parallelism, specialization)
- Choose pattern - Supervisor, peer-to-peer, or hierarchical
- Design communication - Message passing, handoffs, state sharing
- Implement safeguards - Validation, timeouts, conflict resolution
- Monitor - Token usage, bottlenecks, failures
When to Use
- Context window limits prevent single-agent solutions
- Tasks benefit from parallel execution
- Different domains require specialized knowledge
- Complex workflows need coordination
- Resilience through redundancy is required
Three Primary Patterns
1. Supervisor/Orchestrator
Structure: Central coordinator delegates to specialists and synthesizes results.
          [Supervisor]
          /     |     \
[Agent A]  [Agent B]  [Agent C]
    ↑          ↑          ↑
    └──────────┴──────────┘
        Results flow up
Best for:
- Tasks with clear decomposition
- Human oversight needs
- Sequential dependencies
- Quality control requirements
Key consideration: The "telephone game problem" emerges when supervisors paraphrase sub-agent responses incorrectly.
Solution: Implement a forward_message tool that lets a sub-agent's response reach the user directly:
def forward_message(agent_id: str, message: str, to: str = "user") -> dict:
    """Forward an agent's message directly, without supervisor interpretation."""
    return {"from": agent_id, "to": to, "message": message, "forwarded": True}
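As a rough illustration of how a supervisor might avoid the telephone game, the sketch below delegates subtasks and passes each result through verbatim. It restates forward_message so the block is self-contained; `Specialist` and `run_supervisor` are hypothetical names introduced here, and specialists are assumed to be plain callables rather than real model calls.

```python
from typing import Callable, Dict, List

# Hypothetical specialist type: takes a subtask string, returns its answer.
Specialist = Callable[[str], str]


def forward_message(agent_id: str, message: str, to: str = "user") -> dict:
    """Pass a sub-agent's message through unchanged."""
    return {"from": agent_id, "to": to, "message": message, "forwarded": True}


def run_supervisor(
    subtasks: Dict[str, str], specialists: Dict[str, Specialist]
) -> List[dict]:
    """Delegate each subtask and forward the raw output without paraphrasing."""
    results = []
    for agent_id, subtask in subtasks.items():
        output = specialists[agent_id](subtask)
        results.append(forward_message(agent_id, output))
    return results


if __name__ == "__main__":
    stub_specialists = {"researcher": lambda task: f"findings for: {task}"}
    print(run_supervisor({"researcher": "agent patterns"}, stub_specialists))
```

The supervisor still decides who does what, but it never rewrites what a specialist said.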
2. Peer-to-Peer/Swarm
Structure: No central control; agents communicate directly through protocols.
[Agent A] ←→ [Agent B]
    ↑↓           ↑↓
[Agent C] ←→ [Agent D]
Best for:
- Flexible exploration
- Emergent problem-solving
- Parallel processing
- Resilient architectures
Key requirements:
- Predefined communication protocols
- Explicit handoff mechanisms
- Shared state management
- Conflict resolution rules
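Without a central coordinator, conflict resolution has to be a rule every peer applies identically. The sketch below shows one possible rule, assuming each write carries a numeric priority: highest priority wins, and ties go to the lexicographically smallest agent ID. `Proposal` and `resolve` are hypothetical names, not part of any particular framework.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Proposal:
    agent_id: str
    key: str
    value: str
    priority: int  # assumption: higher number wins


def resolve(proposals: List[Proposal]) -> Dict[str, str]:
    """Pick one value per key: highest priority first, then smallest agent_id."""
    winners: Dict[str, Proposal] = {}
    ordered = sorted(proposals, key=lambda p: (-p.priority, p.agent_id))
    for proposal in ordered:
        # The first proposal seen for a key is the winner under this ordering.
        winners.setdefault(proposal.key, proposal)
    return {key: p.value for key, p in winners.items()}


if __name__ == "__main__":
    print(resolve([
        Proposal("agent_c", "summary", "draft C", 2),
        Proposal("agent_a", "summary", "draft A", 1),
        Proposal("agent_b", "summary", "draft B", 2),
    ]))
```

Because the rule is deterministic, any peer that sees the same set of proposals arrives at the same result.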
3. Hierarchical
Structure: Layers of agents with strategy, planning, and execution tiers.
     [Strategy Layer]
            ↓
     [Planning Layer]
       /    |    \
[Exec A] [Exec B] [Exec C]
Best for:
- Complex organizational workflows
- Multi-level abstraction
- Clear separation of concerns
- Enterprise-scale systems
Layer responsibilities:
- Strategy: Goals, priorities, resource allocation
- Planning: Task decomposition, scheduling, coordination
- Execution: Actual work, reporting, feedback
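The layer responsibilities above can be sketched as three functions that hand work downward. This is a toy illustration with stubbed logic: the function names, the integer priority scheme, and the fixed two-step decomposition are all assumptions made for the example.

```python
from typing import Dict, List


def strategy_layer(goals: Dict[str, int]) -> List[str]:
    """Order goals by priority (assumption: higher number = more important)."""
    return sorted(goals, key=lambda goal: -goals[goal])


def planning_layer(goal: str, steps: int = 2) -> List[str]:
    """Decompose a goal into subtasks (placeholder decomposition)."""
    return [f"{goal}: step {i}" for i in range(1, steps + 1)]


def execution_layer(subtask: str) -> str:
    """Do the work and report back (stubbed)."""
    return f"done({subtask})"


def run(goals: Dict[str, int]) -> List[str]:
    """Run goals top-down: strategy orders, planning splits, execution reports."""
    reports = []
    for goal in strategy_layer(goals):
        for subtask in planning_layer(goal):
            reports.append(execution_layer(subtask))
    return reports


if __name__ == "__main__":
    for report in run({"ship feature": 1, "fix outage": 2}):
        print(report)
```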
Token Economics
Reality check: Multi-agent systems are token-hungry. A full swarm can consume roughly 15x the tokens of a single-agent baseline, and even two or three agents typically cost several times more.
| Approach | Token Multiplier | Use Case |
|---|---|---|
| Single Agent | 1x | Simple, focused tasks |
| 2-3 Agents | 3-5x | Moderate complexity |
| Full Swarm | 10-20x | Complex, parallel work |
Optimization strategies:
- Model selection often provides larger gains than more agents
- Use smaller models for routine tasks
- Reserve large models for synthesis and decisions
- Implement aggressive context compression
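To make the model-selection advice concrete, the sketch below routes routine tasks to a small model and reserves a large one for synthesis and decisions, then estimates cost. The tier names and per-1K-token prices are made-up placeholders, not real provider pricing.

```python
from typing import List, Tuple

# Hypothetical prices per 1K tokens; substitute real pricing before relying on this.
PRICE_PER_1K = {"small": 0.1, "large": 1.0}

# Assumption: only these task types justify the large model.
LARGE_MODEL_TASKS = {"synthesis", "decision"}


def pick_model(task_type: str) -> str:
    """Route routine work to the small model, synthesis and decisions to the large one."""
    return "large" if task_type in LARGE_MODEL_TASKS else "small"


def estimate_cost(tasks: List[Tuple[str, int]]) -> float:
    """Sum the cost of (task_type, token_count) pairs under the routing rule."""
    return sum(
        PRICE_PER_1K[pick_model(task_type)] * tokens / 1000
        for task_type, tokens in tasks
    )


if __name__ == "__main__":
    workload = [("routine", 2000), ("routine", 3000), ("synthesis", 1000)]
    print(f"estimated cost: {estimate_cost(workload):.2f}")
```

Running the same workload entirely on the large tier is an easy comparison point for how much routing saves.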
Communication Patterns
Message Passing
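from dataclasses import dataclass
from typing import Literal

@dataclass
class AgentMessage:
    sender: str
    recipient: str
    content: str
    message_type: Literal["request", "response", "broadcast"]
    requires_ack: bool = False  # True when the sender needs delivery confirmation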
Handoff Protocol
from dataclasses import dataclass

@dataclass
class Handoff:
    from_agent: str
    to_agent: str
    context: dict  # Compressed relevant state
    task: str
    expected_output: str
    timeout_seconds: int = 300
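One way the timeout on a handoff might be enforced is to run the receiving agent in a worker thread and stop waiting once the deadline passes. In the sketch below, `run_handoff` and the worker signature are hypothetical; note that the thread itself is not cancelled, the caller simply stops waiting for it.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable


def run_handoff(
    worker: Callable[[str, dict], str],
    task: str,
    context: dict,
    timeout_seconds: float = 300,
) -> dict:
    """Run the receiving agent and give up waiting after timeout_seconds."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(worker, task, context)
    try:
        output = future.result(timeout=timeout_seconds)
        return {"status": "ok", "output": output}
    except FutureTimeout:
        # The worker thread keeps running in the background; we only stop waiting.
        return {"status": "timeout", "output": None}
    finally:
        pool.shutdown(wait=False)


if __name__ == "__main__":
    def summarizer(task: str, context: dict) -> str:
        return f"{task} using {len(context)} context keys"

    print(run_handoff(summarizer, "summarize", {"doc": "..."}, timeout_seconds=5))
```

A timed-out handoff is returned as a status rather than raised, so the caller can decide whether to retry, split the task, or escalate.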
State Sharing
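from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class SharedState:
    version: int
    last_updated: datetime
    data: dict
    lock_holder: Optional[str] = None  # agent currently allowed to write

    def acquire_lock(self, agent_id: str) -> bool: ...
    def release_lock(self, agent_id: str) -> bool: ...
    def update(self, agent_id: str, changes: dict) -> bool: ...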
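The method stubs above leave the locking behaviour open. A minimal in-memory sketch of one possible implementation follows, assuming cooperative agents in a single process. It is not thread-safe and is not a distributed lock; the default field values are added here for convenience.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class SharedState:
    version: int = 0
    last_updated: datetime = field(default_factory=_now)
    data: dict = field(default_factory=dict)
    lock_holder: Optional[str] = None

    def acquire_lock(self, agent_id: str) -> bool:
        """Grant the lock if it is free or already held by this agent."""
        if self.lock_holder in (None, agent_id):
            self.lock_holder = agent_id
            return True
        return False

    def release_lock(self, agent_id: str) -> bool:
        """Only the current holder may release the lock."""
        if self.lock_holder != agent_id:
            return False
        self.lock_holder = None
        return True

    def update(self, agent_id: str, changes: dict) -> bool:
        """Apply changes and bump the version, but only for the lock holder."""
        if self.lock_holder != agent_id:
            return False
        self.data.update(changes)
        self.version += 1
        self.last_updated = _now()
        return True


if __name__ == "__main__":
    state = SharedState()
    state.acquire_lock("agent_a")
    state.update("agent_a", {"status": "drafting"})
    print(state.version, state.data)
```

The version counter lets readers detect that the state changed between two reads.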
Implementation Guidance
Validation Requirements
- Validate outputs before inter-agent transfer
- Check message format and completeness
- Verify agent capabilities before assignment
- Validate state consistency after updates
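The format and completeness check from the list above might look like the following sketch. It assumes messages are plain dictionaries with the same fields as AgentMessage; `validate_message` and the two constant sets are hypothetical names.

```python
from typing import List

REQUIRED_FIELDS = {"sender", "recipient", "content", "message_type"}
ALLOWED_TYPES = {"request", "response", "broadcast"}


def validate_message(msg: dict) -> List[str]:
    """Return a list of problems; an empty list means the message may be sent."""
    errors = []
    missing = REQUIRED_FIELDS - msg.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if "message_type" in msg and msg["message_type"] not in ALLOWED_TYPES:
        errors.append(f"unknown message_type: {msg['message_type']!r}")
    if "content" in msg and not str(msg["content"]).strip():
        errors.append("content is empty")
    return errors


if __name__ == "__main__":
    draft = {"sender": "planner", "recipient": "coder", "content": " ",
             "message_type": "command"}
    for problem in validate_message(draft):
        print(problem)
```

Returning all problems at once, rather than failing on the first, gives the sending agent enough information to repair the message in one pass.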
Consensus Mechanisms
| Mechanism | Description | Best For |
|---|---|---|
| Simple Majority | >50% agreement | Quick decisions |
| Weighted Voting | Votes weighted by confidence | Quality-sensitive |
| Quorum | Minimum respondents required | Fault tolerance |
| Leader Election | Designated decision maker | Speed |
Recommendation: Implement weighted voting rather than simple majority:
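def weighted_consensus(votes: List[Vote]) -> Decision:
    """Average vote values, weighting each by the voter's confidence."""
    weighted_sum = sum(v.confidence * v.value for v in votes)
    total_weight = sum(v.confidence for v in votes)
    if total_weight == 0:
        raise ValueError("no votes with non-zero confidence")
    return Decision(value=weighted_sum / total_weight)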
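For a version that runs end to end, the sketch below fills in the `Vote` and `Decision` types and adds a quorum check. The field layout, the 0-to-1 vote encoding, the `min_votes` parameter, and the approval threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vote:
    agent_id: str
    value: float       # assumed encoding: 1.0 = approve, 0.0 = reject
    confidence: float  # weight, assumed to lie in [0, 1]


@dataclass
class Decision:
    value: float
    approved: bool


def weighted_consensus(
    votes: List[Vote], min_votes: int = 2, threshold: float = 0.5
) -> Decision:
    """Confidence-weighted average of votes, with a minimum-respondent quorum."""
    if len(votes) < min_votes:
        raise ValueError("quorum not met")
    total_weight = sum(v.confidence for v in votes)
    if total_weight == 0:
        raise ValueError("all votes have zero confidence")
    value = sum(v.confidence * v.value for v in votes) / total_weight
    return Decision(value=value, approved=value > threshold)


if __name__ == "__main__":
    ballots = [Vote("reviewer", 1.0, 0.9), Vote("linter", 0.0, 0.3)]
    print(weighted_consensus(ballots))
```

With these two ballots a simple majority would tie at one vote each, while the weighted result leans toward the more confident reviewer.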
Safeguards
- Execution TTL - Prevent infinite loops:
  max_execution_time = 300  # seconds
  max_iterations = 100
- Checkpoint Monitoring - Detect supervisor bottlenecks:
  checkpoint_interval = 30  # seconds
  alert_threshold = 3  # missed checkpoints
- Circuit Breaker - Handle cascading failures:
  failure_threshold = 3
  recovery_timeout = 60  # seconds
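The circuit breaker settings can be read as a small state machine: after failure_threshold consecutive failures, calls to an agent are blocked until recovery_timeout has elapsed. The sketch below is one possible reading, not a prescribed implementation; the class, its method names, and the injectable `clock` parameter (added for testability) are assumptions.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        recovery_timeout: float = 60.0,  # seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        """True if a call may proceed (closed, or open long enough to retry)."""
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.recovery_timeout

    def record(self, success: bool) -> None:
        """Report a call outcome; open the circuit after repeated failures."""
        if success:
            self.failures = 0
            self.opened_at = None
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


if __name__ == "__main__":
    breaker = CircuitBreaker()
    for _ in range(3):
        breaker.record(success=False)
    print("calls allowed:", breaker.allow())
```

Callers check `allow()` before dispatching to an agent and call `record()` afterwards, so a failing agent stops receiving work instead of dragging its callers down with it.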
Best Practices
Do
- Start with simplest pattern that works
- Define explicit handoff protocols
- Include state management from the start
- Monitor token usage per agent
- Implement graceful degradation
- Log all inter-agent communication
Don't
- Use multi-agent for single-agent problems
- Assume agents will coordinate implicitly
- Ignore token costs during design
- Skip validation between agents
- Create deeply nested hierarchies
- Forget timeout handling
Error Handling
| Error | Cause | Solution |
|---|---|---|
| Agent timeout | Task too complex | Break into subtasks, extend timeout |
| Conflicting outputs | Ambiguous task | Clarify requirements, add validation |
| Lost messages | Network/state issues | Implement acknowledgments, retry |
| Infinite loop | Missing termination | Add TTL, iteration limits |
| Supervisor bottleneck | Too many reports | Add intermediate aggregators |
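For the lost-message row, acknowledgments and retries could be combined roughly as follows. The sketch assumes the transport is a callable that returns True once the recipient acknowledges; `send_with_retry` is a hypothetical helper, and a real system would likely add backoff between attempts.

```python
from typing import Callable


def send_with_retry(
    send: Callable[[str], bool], payload: str, max_attempts: int = 3
) -> int:
    """Resend until acknowledged; return attempts used or raise if never acked."""
    for attempt in range(1, max_attempts + 1):
        if send(payload):
            return attempt
    raise TimeoutError(f"no acknowledgment after {max_attempts} attempts")


if __name__ == "__main__":
    outcomes = iter([False, True])  # first delivery is lost, second is acked

    def flaky_send(payload: str) -> bool:
        return next(outcomes)

    print("attempts used:", send_with_retry(flaky_send, "status update"))
```

Capping the attempts matters as much as retrying: an unbounded retry loop is itself one of the infinite-loop failures listed in the table.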
Metrics
| Metric | Target | Description |
|---|---|---|
| Task completion rate | >95% | Successfully completed tasks |
| Token efficiency | >0.5 | Output value / tokens used |
| Coordination overhead | <30% | Tokens for coordination vs. work |
| Agent utilization | >70% | Active time vs. waiting |
| Error rate | <5% | Failed inter-agent operations |
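Several of these targets can be checked mechanically from counters. The sketch below assumes coordination overhead means coordination tokens as a share of all tokens, which is one reading of the table; the function name and argument names are hypothetical.

```python
def health_report(
    completed: int,
    attempted: int,
    coordination_tokens: int,
    work_tokens: int,
    failed_ops: int,
    total_ops: int,
) -> dict:
    """Compute three of the metrics above and compare them to their targets."""
    completion_rate = completed / attempted if attempted else 0.0
    total_tokens = coordination_tokens + work_tokens
    # Assumption: overhead is coordination tokens divided by all tokens spent.
    overhead = coordination_tokens / total_tokens if total_tokens else 0.0
    error_rate = failed_ops / total_ops if total_ops else 0.0
    return {
        "completion_rate": completion_rate,
        "coordination_overhead": overhead,
        "error_rate": error_rate,
        "meets_targets": (
            completion_rate > 0.95 and overhead < 0.30 and error_rate < 0.05
        ),
    }


if __name__ == "__main__":
    print(health_report(97, 100, 2000, 8000, 1, 50))
```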
Pattern Selection Guide
Is context window sufficient?
├── Yes → Single agent
└── No → Are tasks parallelizable?
    ├── Yes → Can agents work independently?
    │   ├── Yes → Peer-to-peer
    │   └── No → Supervisor with parallel workers
    └── No → Is there clear hierarchy?
        ├── Yes → Hierarchical
        └── No → Supervisor/Orchestrator
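The same decision tree can be written as a small function, which is handy for documenting or testing a selection policy. The function and parameter names are hypothetical; the branches mirror the tree one for one.

```python
def select_pattern(
    context_sufficient: bool,
    parallelizable: bool = False,
    independent: bool = False,
    clear_hierarchy: bool = False,
) -> str:
    """Walk the pattern selection tree and return the recommended pattern."""
    if context_sufficient:
        return "Single agent"
    if parallelizable:
        if independent:
            return "Peer-to-peer"
        return "Supervisor with parallel workers"
    if clear_hierarchy:
        return "Hierarchical"
    return "Supervisor/Orchestrator"


if __name__ == "__main__":
    print(select_pattern(context_sufficient=False, parallelizable=True,
                         independent=False))
```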
Related Skills
- memory-systems - Cross-session persistence
- parallel-dispatch - Concurrent agent execution
- subagent-driven - Task execution pattern
Version History
- 1.0.0 (2026-01-19): Initial release adapted from Agent-Skills-for-Context-Engineering