orchestrating-agents
Orchestrating Agents
This skill enables programmatic invocations for advanced workflows including parallel processing, task delegation, multi-agent analysis, and cost-optimized cross-vendor routing. Supports four CLI backends:
| CLI | Command | Cost Model | Ecosystem Access | Notes |
|---|---|---|---|---|
| Claude Code | claude |
Per-token (Anthropic API) | Native | Default orchestrator |
| Cursor Agent | agent |
Flat subscription | 37 models (GPT-5.x, Claude 4.x, Gemini 3.x, Grok) | Skill discovery buggy — use as leaf executor |
| Gemini CLI | gemini |
Subscription | Full (loads from ~/.agents/skills/) |
Full ecosystem access |
| Codex CLI | codex |
Subscription | Full (scans repo + ~/.codex/skills/) |
Built-in review command |
When to Use This Skill
Primary use cases:
- Parallel sub-tasks: Break complex analysis into simultaneous independent streams
- Multi-perspective analysis: Get 3-5 different expert viewpoints concurrently
- Cost-optimized delegation: Route tasks to the cheapest capable backend
- Cross-vendor workflows: Use Gemini/Codex for T2 tasks, Claude for T3 validation
- Recursive workflows: Orchestrator coordinating multiple API instances
- High-volume processing: Batch process multiple items concurrently
Trigger patterns:
- "Parallel analysis", "multi-perspective review", "concurrent processing"
- "Delegate subtasks", "coordinate multiple agents"
- "Optimize token cost", "use cheaper model for this"
- "Run analyses from different perspectives"
Cost-Optimized Tier Routing
Route tasks to the cheapest backend that can handle them. See I14-MATO charter for full tier classification.
Tier Model
| Tier | Label | Route To | Use When |
|---|---|---|---|
| T1 | Mechanical | Local scripts, linters, tsc |
Deterministic: formatting, validation, file search |
| T2 | Analytical | gemini, codex, agent, claude --model haiku |
Pattern-following: docs review, style checks, boilerplate, summarization |
| T3 | Strategic | claude --model sonnet or claude --model opus |
Novel judgment: architecture, TDD coaching, security analysis |
Delegation Decision
Is this deterministic? ──yes──► T1 (local script)
│ no
Does it follow established patterns? ──yes──► T2 (gemini/codex/agent/haiku)
│ no
Requires novel judgment? ──yes──► T3 (claude sonnet/opus)
Validation Sandwich Pattern
Cheap agents generate, expensive agents validate. This preserves quality while reducing cost.
T2 Agent (cheap) T3 Agent (expensive)
│ │
│ Produce work │
│────────────────────────►│
│ │ Validate work
│ │ (cheaper than
│ │ produce + validate)
│ Accept / Reject │
│◄────────────────────────│
Works because validation processes only the diff/findings, not the full reasoning chain.
CLI Quick Reference
Claude Code (claude)
# Non-interactive with model selection
claude -p "analyze this code" --model sonnet
claude -p "quick summary" --model haiku
claude -p "complex architecture review" --model opus
# With budget cap
claude -p "task" --model haiku --max-budget-usd 0.05
# JSON output for programmatic consumption
claude -p "list issues" --model sonnet --output-format json
Key flags: -p (print/non-interactive), --model (haiku/sonnet/opus), --max-budget-usd, --output-format (text/json/stream-json).
Cursor Agent (agent)
# Non-interactive with model selection (37 models available)
agent -p "review this diff for style" --model gpt-5.3-codex
agent -p "generate test data" --model gemini-3-flash
agent -p "summarize changes" --model sonnet-4.6
# List available models
agent --list-models
# Read-only modes
agent -p "explain this function" --mode ask
agent -p "propose a refactoring plan" --mode plan
Key flags: -p (print/non-interactive), --model, --mode (plan/ask), --output-format (text/json/stream-json), --force/--yolo (auto-approve).
Known limitations (as of 2026-02): Cannot consistently find user skills. Missing Task tool for subagent dispatch. Use as leaf executor only — embed all context in prompt. Requires --trust flag for non-interactive use. Subject to monthly Pro usage limits — all models rate-limited when quota exhausted.
Gemini CLI (gemini)
# Non-interactive
gemini -p "summarize this file" < file.ts
gemini -p "generate REST endpoint boilerplate" -m gemini-2.5-pro
# With output format
gemini -p "list code issues" -o json
# Auto-approve mode (for trusted operations)
gemini -p "format this code" --approval-mode yolo
Key flags: -p/--prompt (non-interactive), -m/--model, -o/--output-format (text/json/stream-json), --approval-mode (default/auto_edit/yolo/plan).
Ecosystem access: Loads skills from ~/.agents/skills/ and ~/.gemini/skills/. Reads AGENTS.md. Full agent/skill/command discovery confirmed.
Auth note: Requires interactive terminal for OAuth consent on first use. Pre-authenticate by running gemini interactively before using in non-interactive delegation. Cannot be invoked non-interactively from another agent subprocess without prior auth.
Codex CLI (codex)
# Non-interactive execution
codex exec "summarize this code" < file.ts
# Built-in code review
codex exec review
# With model and sandbox control
codex exec -m o3 "generate test data" --sandbox read-only
# Full auto (sandboxed, no approval needed)
codex exec --full-auto "format and lint this file"
Key flags: exec (non-interactive), review (built-in code review), -m/--model, --sandbox (read-only/workspace-write/danger-full-access), --full-auto.
Ecosystem access: Scans repo for agents/skills/commands. Loads skills from ~/.codex/skills/. Full discovery confirmed.
Cross-Vendor Delegation Examples
Example 1: Docs Review — T2 First Pass + T3 Validation
# T2: Gemini does structural/grammar check (subscription cost)
gemini -p "Review this document for spelling, grammar, broken links, and structural completeness. Output a findings list." < README.md > /tmp/docs-t2-findings.txt
# T3: Claude validates content accuracy (per-token cost, but smaller input)
claude -p "Review this document for content accuracy, API correctness, and conceptual clarity. Also review these T2 findings and filter out false positives: $(cat /tmp/docs-t2-findings.txt)" --model sonnet < README.md
Example 2: Code Review — Style Pass + Architecture Pass
# T2: Codex does style/lint review (subscription cost, has built-in review)
codex exec review > /tmp/codex-review.txt
# T3: Claude does architectural review with Codex findings as context
git diff HEAD~1 | claude -p "Review this diff for architectural concerns, subtle bugs, and design pattern issues. Codex already found these style issues (ignore duplicates): $(cat /tmp/codex-review.txt)" --model sonnet
Example 3: Research Delegation
# T2: Gemini gathers raw research (subscription cost)
gemini -p "Research the latest best practices for TypeScript monorepo tooling in 2026. List tools, trade-offs, and recommendations with links." -o text > /tmp/research-raw.txt
# T3: Claude synthesizes and evaluates (per-token, but smaller focused input)
claude -p "Evaluate this research for our project. Given our constraints (TypeScript strict, Vitest, pnpm), which recommendation fits best and why? Research: $(cat /tmp/research-raw.txt)" --model sonnet
Example 4: Cursor Agent as Multi-Model Proxy
# Use Cursor's flat subscription to access GPT-5.3 for free
agent -p "Generate 20 realistic test user records as JSON with name, email, role, created_at" --model gpt-5.3-codex
# Use Cursor to access Gemini 3 Flash for summarization
agent -p "Summarize the key changes in this diff" --model gemini-3-flash < diff.txt
# Use Cursor to access Grok for a different perspective
agent -p "What are the security implications of this code?" --model grok < auth.ts
Programmatic Invocation (Python)
Single Invocation
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent / "skills" / "orchestrating-agents" / "scripts"))
from claude_client import invoke_claude
response = invoke_claude(
prompt="Analyze this code for security vulnerabilities: ...",
)
print(response)
Parallel Multi-Perspective Analysis
from claude_client import invoke_parallel
prompts = [
{
"prompt": "Analyze from security perspective: ...",
"system": "You are a security expert"
},
{
"prompt": "Analyze from performance perspective: ...",
"system": "You are a performance optimization expert"
},
{
"prompt": "Analyze from maintainability perspective: ...",
"system": "You are a software architecture expert"
}
]
results = invoke_parallel(prompts)
for i, result in enumerate(results):
print(f"\n=== Perspective {i+1} ===")
print(result)
Parallel with Shared Context
from claude_client import invoke_parallel
base_context = """
<codebase>
...large codebase or documentation...
</codebase>
"""
prompts = [
{"prompt": "Find security vulnerabilities in the authentication module"},
{"prompt": "Identify performance bottlenecks in the API layer"},
{"prompt": "Suggest refactoring opportunities in the database layer"}
]
results = invoke_parallel(prompts, shared_system=base_context)
Multi-Turn Conversation
from claude_client import ConversationThread
agent = ConversationThread(
system="You are a code refactoring expert with access to the codebase",
cache_system=True
)
response1 = agent.send("Analyze the UserAuth class for issues")
response2 = agent.send("How would you refactor the login method?")
response3 = agent.send("Show me the refactored code")
Streaming and Interruptible Operations
from claude_client import invoke_claude_streaming, invoke_parallel_interruptible, InterruptToken
# Streaming
response = invoke_claude_streaming(
"Write a comprehensive security analysis...",
callback=lambda chunk: print(chunk, end='', flush=True)
)
# Interruptible parallel
token = InterruptToken()
results = invoke_parallel_interruptible(prompts=[...], interrupt_token=token)
# Call token.interrupt() to cancel
Core Functions
invoke_claude()
invoke_claude(
prompt: str | list[dict],
model: str = "claude-sonnet-4-5-20250929",
system: str | list[dict] | None = None,
max_tokens: int = 4096,
temperature: float = 1.0,
streaming: bool = False,
cache_system: bool = False,
cache_prompt: bool = False,
messages: list[dict] | None = None,
**kwargs
) -> str
Note: cache_system, cache_prompt ignored for CLI backends. Optional timeout in kwargs (default 300).
invoke_parallel()
invoke_parallel(
prompts: list[dict],
max_workers: int = 5,
shared_system: str | list[dict] | None = None,
) -> list[str]
ConversationThread
thread = ConversationThread(system="...", cache_system=True)
response = thread.send("message")
Methods: send(), get_messages(), clear(), __len__().
Setup
Prerequisites: Install at least one supported CLI. All four are recommended for cost optimization.
# Claude Code (default orchestrator)
npm install -g @anthropic-ai/claude-code
claude login
# Cursor Agent (flat-rate multi-model proxy)
# Visit https://cursor.com/install for installation instructions
agent login
# Gemini CLI
npm install -g @anthropic-ai/gemini-cli # or via Google's install method
gemini # auth via browser on first run
# Codex CLI
npm install -g @openai/codex
codex login
Verify all backends:
claude -p "Reply OK" --output-format text
agent -p "Reply OK" --output-format text
gemini -p "Reply OK" -o text
codex exec "Reply OK"
Auto-detection: The Python client tries claude → codex → gemini → cursor in order. To force a backend, use invoke_cli(prompt, backend="codex") or any registered backend key.
Pre-flight Backend Check
Verify backend availability once and cache for 1 hour. Use at session start or before dispatch.
from preflight import check_backends
available = check_backends() # {"claude": True, "codex": True, "gemini": False, "cursor": False}
available = check_backends(force=True) # bypass cache, re-probe all
- Probes via
shutil.which(fast, no subprocess) - Cache stored in
/tmp/preflight_backends_cache.jsonwith owner-only permissions - Symlink-safe writes (checks for symlinks before writing)
- Best-effort: cache write failures are silently ignored
/dispatch Command
Route tasks to the cheapest capable backend via /dispatch <task description>. See commands/dispatch/dispatch.md.
Classification: T1 (local scripts) → T2 (gemini/codex) → T3 (claude sonnet/opus).
Fallback chain: gemini → codex → claude-haiku → claude-sonnet.
Cross-Vendor Telemetry
Best-effort invocation telemetry via telemetry_helper.py. POSTs events to Tinybird when configured.
from telemetry_helper import emit_invocation
emit_invocation(backend="codex", prompt_length=150, duration_ms=1200, success=True)
Requires env vars: TB_INGEST_TOKEN, TB_INGEST_URL (HTTPS only — silently refuses HTTP).
Payload: backend, prompt_length, duration_ms, success, timestamp.
Integrated into invoke_cli: Telemetry emits on all exit paths (success, non-zero exit, timeout, FileNotFoundError) to avoid survivorship bias.
Error Handling
from claude_client import invoke_claude, ClaudeInvocationError
try:
response = invoke_claude("Your prompt here")
except ClaudeInvocationError as e:
print(f"API Error: {e}")
print(f"Status: {e.status_code}")
print(f"Details: {e.details}")
except ValueError as e:
print(f"Configuration Error: {e}")
Common errors:
- No CLI found: Install the relevant CLI (see Setup)
- Not authenticated: Run login for the relevant CLI
- Timeout: Increase
timeoutin kwargs (default 300s) - Non-zero exit: Check stderr on
ClaudeInvocationError
Best Practices
- Route by tier — Use the cheapest backend that can handle the task
- Validate with T3 — Cheap agents generate, Claude validates (validation sandwich)
- Use parallel invocations for independent tasks only — Don't parallelize sequential dependencies
- Set appropriate system prompts — Define clear roles/expertise for each instance
- Embed context for Cursor — Cursor can't reliably discover skills; pass full context in prompt
- Test with small batches first — Verify prompts work before scaling
- Prefer
codex exec review— Built-in code review is free with subscription
See Also
- references/api-reference.md — Detailed Python API documentation
- references/workflows.md — Multi-expert review, parallel analysis, recursive delegation
- I14-MATO Charter (under
{CANONICAL_ROOT}/charters/, per/docs/layout) — Full tier classification and roadmap - Claude Code CLI — Official Claude Code docs
- Cursor Agent CLI — Official Cursor CLI docs