orchestrating-agents
Orchestrating Agents
This skill enables programmatic invocations for advanced workflows including parallel processing, task delegation, multi-agent analysis, and cost-optimized cross-vendor routing. Supports four CLI backends:
| CLI | Command | Cost Model | Ecosystem Access | Notes |
|---|---|---|---|---|
| Claude Code | claude |
Per-token (Anthropic API) | Native | Default orchestrator |
| Cursor Agent | agent |
Flat subscription | 37 models (GPT-5.x, Claude 4.x, Gemini 3.x, Grok) | Skill discovery buggy — use as leaf executor |
| Gemini CLI | gemini |
Subscription | Full (loads from ~/.agents/skills/) |
Full ecosystem access |
| Codex CLI | codex |
Subscription | Full (scans repo + ~/.codex/skills/) |
Built-in review command |
When to Use This Skill
Primary use cases:
- Parallel sub-tasks: Break complex analysis into simultaneous independent streams
- Multi-perspective analysis: Get 3-5 different expert viewpoints concurrently
- Cost-optimized delegation: Route tasks to the cheapest capable backend
- Cross-vendor workflows: Use Gemini/Codex for T2 tasks, Claude for T3 validation
- Recursive workflows: Orchestrator coordinating multiple API instances
- High-volume processing: Batch process multiple items concurrently
Trigger patterns:
- "Parallel analysis", "multi-perspective review", "concurrent processing"
- "Delegate subtasks", "coordinate multiple agents"
- "Optimize token cost", "use cheaper model for this"
- "Run analyses from different perspectives"
Cost-Optimized Tier Routing
Route tasks to the cheapest backend that can handle them. See I14-MATO charter for full tier classification.
Tier Model
| Tier | Label | Route To | Use When |
|---|---|---|---|
| T1 | Mechanical | Local scripts, linters, tsc |
Deterministic: formatting, validation, file search |
| T2 | Analytical | gemini, codex, agent, claude --model haiku |
Pattern-following: docs review, style checks, boilerplate, summarization |
| T3 | Strategic | claude --model sonnet or claude --model opus |
Novel judgment: architecture, TDD coaching, security analysis |
Delegation Decision
Is this deterministic? ──yes──► T1 (local script)
│ no
Does it follow established patterns? ──yes──► T2 (gemini/codex/agent/haiku)
│ no
Requires novel judgment? ──yes──► T3 (claude sonnet/opus)
Validation Sandwich Pattern
Cheap agents generate, expensive agents validate. This preserves quality while reducing cost.
T2 Agent (cheap) T3 Agent (expensive)
│ │
│ Produce work │
│────────────────────────►│
│ │ Validate work
│ │ (cheaper than
│ │ produce + validate)
│ Accept / Reject │
│◄────────────────────────│
Works because validation processes only the diff/findings, not the full reasoning chain.
CLI Quick Reference
Claude Code (claude)
# Non-interactive with model selection
claude -p "analyze this code" --model sonnet
claude -p "quick summary" --model haiku
claude -p "complex architecture review" --model opus
# With budget cap
claude -p "task" --model haiku --max-budget-usd 0.05
# JSON output for programmatic consumption
claude -p "list issues" --model sonnet --output-format json
Key flags: -p (print/non-interactive), --model (haiku/sonnet/opus), --max-budget-usd, --output-format (text/json/stream-json).
Cursor Agent (agent)
# Non-interactive with model selection (37 models available)
agent -p "review this diff for style" --model gpt-5.3-codex
agent -p "generate test data" --model gemini-3-flash
agent -p "summarize changes" --model sonnet-4.6
# List available models
agent --list-models
# Read-only modes
agent -p "explain this function" --mode ask
agent -p "propose a refactoring plan" --mode plan
Key flags: -p (print/non-interactive), --model, --mode (plan/ask), --output-format (text/json/stream-json), --force/--yolo (auto-approve).
Known limitations (as of 2026-02): Cannot consistently find user skills. Missing Task tool for subagent dispatch. Use as leaf executor only — embed all context in prompt. Requires --trust flag for non-interactive use. Subject to monthly Pro usage limits — all models rate-limited when quota exhausted.
Gemini CLI (gemini)
# Non-interactive
gemini -p "summarize this file" < file.ts
gemini -p "generate REST endpoint boilerplate" -m gemini-2.5-pro
# With output format
gemini -p "list code issues" -o json
# Auto-approve mode (for trusted operations)
gemini -p "format this code" --approval-mode yolo
Key flags: -p/--prompt (non-interactive), -m/--model, -o/--output-format (text/json/stream-json), --approval-mode (default/auto_edit/yolo/plan).
Ecosystem access: Loads skills from ~/.agents/skills/ and ~/.gemini/skills/. Reads AGENTS.md. Full agent/skill/command discovery confirmed.
Auth note: Requires interactive terminal for OAuth consent on first use. Pre-authenticate by running gemini interactively before using in non-interactive delegation. Cannot be invoked non-interactively from another agent subprocess without prior auth.
Codex CLI (codex)
# Non-interactive execution
codex exec "summarize this code" < file.ts
# Built-in code review
codex exec review
# With model and sandbox control
codex exec -m o3 "generate test data" --sandbox read-only
# Full auto (sandboxed, no approval needed)
codex exec --full-auto "format and lint this file"
Key flags: exec (non-interactive), review (built-in code review), -m/--model, --sandbox (read-only/workspace-write/danger-full-access), --full-auto.
Ecosystem access: Scans repo for agents/skills/commands. Loads skills from ~/.codex/skills/. Full discovery confirmed.
Cross-Vendor Delegation Examples
Example 1: Docs Review — T2 First Pass + T3 Validation
# T2: Gemini does structural/grammar check (subscription cost)
gemini -p "Review this document for spelling, grammar, broken links, and structural completeness. Output a findings list." < README.md > /tmp/docs-t2-findings.txt
# T3: Claude validates content accuracy (per-token cost, but smaller input)
claude -p "Review this document for content accuracy, API correctness, and conceptual clarity. Also review these T2 findings and filter out false positives: $(cat /tmp/docs-t2-findings.txt)" --model sonnet < README.md
Example 2: Code Review — Style Pass + Architecture Pass
# T2: Codex does style/lint review (subscription cost, has built-in review)
codex exec review > /tmp/codex-review.txt
# T3: Claude does architectural review with Codex findings as context
git diff HEAD~1 | claude -p "Review this diff for architectural concerns, subtle bugs, and design pattern issues. Codex already found these style issues (ignore duplicates): $(cat /tmp/codex-review.txt)" --model sonnet
Example 3: Research Delegation
# T2: Gemini gathers raw research (subscription cost)
gemini -p "Research the latest best practices for TypeScript monorepo tooling in 2026. List tools, trade-offs, and recommendations with links." -o text > /tmp/research-raw.txt
# T3: Claude synthesizes and evaluates (per-token, but smaller focused input)
claude -p "Evaluate this research for our project. Given our constraints (TypeScript strict, Vitest, pnpm), which recommendation fits best and why? Research: $(cat /tmp/research-raw.txt)" --model sonnet
Example 4: Cursor Agent as Multi-Model Proxy
# Use Cursor's flat subscription to access GPT-5.3 for free
agent -p "Generate 20 realistic test user records as JSON with name, email, role, created_at" --model gpt-5.3-codex
# Use Cursor to access Gemini 3 Flash for summarization
agent -p "Summarize the key changes in this diff" --model gemini-3-flash < diff.txt
# Use Cursor to access Grok for a different perspective
agent -p "What are the security implications of this code?" --model grok < auth.ts
Programmatic Invocation (Python)
Single Invocation
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent / "skills" / "orchestrating-agents" / "scripts"))
from claude_client import invoke_claude
response = invoke_claude(
prompt="Analyze this code for security vulnerabilities: ...",
)
print(response)
Parallel Multi-Perspective Analysis
from claude_client import invoke_parallel
prompts = [
{
"prompt": "Analyze from security perspective: ...",
"system": "You are a security expert"
},
{
"prompt": "Analyze from performance perspective: ...",
"system": "You are a performance optimization expert"
},
{
"prompt": "Analyze from maintainability perspective: ...",
"system": "You are a software architecture expert"
}
]
results = invoke_parallel(prompts)
for i, result in enumerate(results):
print(f"\n=== Perspective {i+1} ===")
print(result)
Parallel with Shared Context
from claude_client import invoke_parallel
base_context = """
<codebase>
...large codebase or documentation...
</codebase>
"""
prompts = [
{"prompt": "Find security vulnerabilities in the authentication module"},
{"prompt": "Identify performance bottlenecks in the API layer"},
{"prompt": "Suggest refactoring opportunities in the database layer"}
]
results = invoke_parallel(prompts, shared_system=base_context)
Multi-Turn Conversation
from claude_client import ConversationThread
agent = ConversationThread(
system="You are a code refactoring expert with access to the codebase",
cache_system=True
)
response1 = agent.send("Analyze the UserAuth class for issues")
response2 = agent.send("How would you refactor the login method?")
response3 = agent.send("Show me the refactored code")
Streaming and Interruptible Operations
from claude_client import invoke_claude_streaming, invoke_parallel_interruptible, InterruptToken
# Streaming
response = invoke_claude_streaming(
"Write a comprehensive security analysis...",
callback=lambda chunk: print(chunk, end='', flush=True)
)
# Interruptible parallel
token = InterruptToken()
results = invoke_parallel_interruptible(prompts=[...], interrupt_token=token)
# Call token.interrupt() to cancel
Core Functions
invoke_claude()
invoke_claude(
prompt: str | list[dict],
model: str = "claude-sonnet-4-5-20250929",
system: str | list[dict] | None = None,
max_tokens: int = 4096,
temperature: float = 1.0,
streaming: bool = False,
cache_system: bool = False,
cache_prompt: bool = False,
messages: list[dict] | None = None,
**kwargs
) -> str
Note: cache_system, cache_prompt ignored for CLI backends. Optional timeout in kwargs (default 300).
invoke_parallel()
invoke_parallel(
prompts: list[dict],
max_workers: int = 5,
shared_system: str | list[dict] | None = None,
) -> list[str]
ConversationThread
thread = ConversationThread(system="...", cache_system=True)
response = thread.send("message")
Methods: send(), get_messages(), clear(), __len__().
Setup
Prerequisites: Install at least one supported CLI. All four are recommended for cost optimization.
# Claude Code (default orchestrator)
npm install -g @anthropic-ai/claude-code
claude login
# Cursor Agent (flat-rate multi-model proxy)
# Visit https://cursor.com/install for installation instructions
agent login
# Gemini CLI
npm install -g @anthropic-ai/gemini-cli # or via Google's install method
gemini # auth via browser on first run
# Codex CLI
npm install -g @openai/codex
codex login
Verify all backends:
claude -p "Reply OK" --output-format text
agent -p "Reply OK" --output-format text
gemini -p "Reply OK" -o text
codex exec "Reply OK"
Auto-detection: The Python client tries claude → codex → gemini → cursor in order. To force a backend, use invoke_cli(prompt, backend="codex") or any registered backend key.
Pre-flight Backend Check
Verify backend availability once and cache for 1 hour. Use at session start or before dispatch.
from preflight import check_backends
available = check_backends() # {"claude": True, "codex": True, "gemini": False, "cursor": False}
available = check_backends(force=True) # bypass cache, re-probe all
- Probes via
shutil.which(fast, no subprocess) - Cache stored in
/tmp/preflight_backends_cache.jsonwith owner-only permissions - Symlink-safe writes (checks for symlinks before writing)
- Best-effort: cache write failures are silently ignored
/dispatch Command
Route tasks to the cheapest capable backend via /dispatch <task description>. See commands/dispatch/dispatch.md.
Classification: T1 (local scripts) → T2 (gemini/codex) → T3 (claude sonnet/opus).
Fallback chain: gemini → codex → claude-haiku → claude-sonnet.
Cross-Vendor Telemetry
Best-effort invocation telemetry via telemetry_helper.py. POSTs events to Tinybird when configured.
from telemetry_helper import emit_invocation
emit_invocation(backend="codex", prompt_length=150, duration_ms=1200, success=True)
Requires env vars: TB_INGEST_TOKEN, TB_INGEST_URL (HTTPS only — silently refuses HTTP).
Payload: backend, prompt_length, duration_ms, success, timestamp.
Integrated into invoke_cli: Telemetry emits on all exit paths (success, non-zero exit, timeout, FileNotFoundError) to avoid survivorship bias.
Error Handling
from claude_client import invoke_claude, ClaudeInvocationError
try:
response = invoke_claude("Your prompt here")
except ClaudeInvocationError as e:
print(f"API Error: {e}")
print(f"Status: {e.status_code}")
print(f"Details: {e.details}")
except ValueError as e:
print(f"Configuration Error: {e}")
Common errors:
- No CLI found: Install the relevant CLI (see Setup)
- Not authenticated: Run login for the relevant CLI
- Timeout: Increase
timeoutin kwargs (default 300s) - Non-zero exit: Check stderr on
ClaudeInvocationError
Best Practices
- Route by tier — Use the cheapest backend that can handle the task
- Validate with T3 — Cheap agents generate, Claude validates (validation sandwich)
- Use parallel invocations for independent tasks only — Don't parallelize sequential dependencies
- Set appropriate system prompts — Define clear roles/expertise for each instance
- Embed context for Cursor — Cursor can't reliably discover skills; pass full context in prompt
- Test with small batches first — Verify prompts work before scaling
- Prefer
codex exec review— Built-in code review is free with subscription
See Also
- references/api-reference.md — Detailed Python API documentation
- references/workflows.md — Multi-expert review, parallel analysis, recursive delegation
- I14-MATO Charter (under
{CANONICAL_ROOT}/charters/, per/docs/layout) — Full tier classification and roadmap - Claude Code CLI — Official Claude Code docs
- Cursor Agent CLI — Official Cursor CLI docs
More from arielperez82/agents-and-skills
algorithmic-art
Creating algorithmic art using p5.js with seeded randomness and interactive parameter exploration. Use this when users request creating art using code, generative art, algorithmic art, flow fields, or particle systems. Create original algorithmic art rather than copying existing artists' work to avoid copyright violations.
16problem-solving
Apply systematic problem-solving techniques for complexity spirals (simplification
15markdown-syntax-fundamentals
Use when writing or editing markdown files. Covers headings, text formatting, lists, links, images, code blocks, and blockquotes.
15research
Use when you need to research, analyze, and plan technical solutions that are scalable, secure, and maintainable.
14brainstorming
You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation.
14docs-seeker
Search technical documentation using executable scripts to detect query
14