model-routing
Model Routing Intelligence
Select the right Claude model for each task to optimize the cost/quality tradeoff.
Goal
Eliminate wasted spend by routing tasks to the cheapest model that produces acceptable quality, while ensuring complex tasks get the reasoning depth they need.
Decision Matrix
Task → Model mapping
| Task Type | Recommended Model | Reasoning |
|---|---|---|
| Architecture decisions | Opus 4.6 | Needs deep multi-step reasoning, hidden coupling detection |
| Complex debugging | Opus 4.6 | Root cause analysis requires holding many hypotheses |
| Security review | Opus 4.6 | Must not miss subtle vulnerabilities |
| Standard implementation | Sonnet 4.6 | Best balance of speed, quality, and cost for code generation |
| Code review | Sonnet 4.6 | Good pattern recognition at reasonable cost |
| Refactoring | Sonnet 4.6 | Mechanical transformations with quality checks |
| Test writing | Sonnet 4.6 | Formulaic but needs understanding of code under test |
| File search / grep | Haiku 4.5 | Simple lookup, no deep reasoning needed |
| Documentation lookup | Haiku 4.5 | Reading and summarizing existing content |
| Commit message generation | Haiku 4.5 | Short, formulaic output |
| Simple Q&A | Haiku 4.5 | Direct answers, no complex analysis |
| Research subagents | Haiku 4.5 | Exploration tasks that return summaries |
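The mapping above can be expressed as a plain lookup table. The following is a minimal sketch, not part of any Claude tooling: `TASK_MODEL`, `DEFAULT_MODEL`, and `route_task` are hypothetical names, and falling back to Sonnet for unlisted task types is an assumption.

```python
# Task-to-model lookup mirroring the decision matrix.
# All names here are illustrative; nothing in this block is a real Claude API.
TASK_MODEL = {
    "architecture decisions": "Opus 4.6",
    "complex debugging": "Opus 4.6",
    "security review": "Opus 4.6",
    "standard implementation": "Sonnet 4.6",
    "code review": "Sonnet 4.6",
    "refactoring": "Sonnet 4.6",
    "test writing": "Sonnet 4.6",
    "file search / grep": "Haiku 4.5",
    "documentation lookup": "Haiku 4.5",
    "commit message generation": "Haiku 4.5",
    "simple q&a": "Haiku 4.5",
    "research subagents": "Haiku 4.5",
}

# Assumption: unknown task types fall back to the middle tier.
DEFAULT_MODEL = "Sonnet 4.6"


def route_task(task_type: str) -> str:
    """Return the recommended model for a task type (case-insensitive)."""
    return TASK_MODEL.get(task_type.strip().lower(), DEFAULT_MODEL)


if __name__ == "__main__":
    for task in ("Security review", "Commit message generation", "Data migration"):
        print(f"{task}: {route_task(task)}")
```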
Complexity signals
Use these signals to decide when to escalate from Sonnet to Opus:
- Multiple interacting systems or modules
- Non-obvious failure modes
- "Why does this work?" questions
- Tasks where a wrong answer is expensive to fix
- Cross-cutting concerns (auth, caching, observability)
- Migration or backward-compatibility requirements
Use these signals to downgrade from Sonnet to Haiku:
- Single-file changes
- Mechanical transformations (rename, reformat)
- Reading and summarizing (no generation)
- Answering factual questions about code
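One way to apply these signals is a small rule that starts from Sonnet and moves up or down. This is a sketch under stated assumptions: the signal identifiers are hypothetical labels for the bullets above, any escalation signal wins over downgrade signals, and a task is downgraded only when every signal present is a downgrade signal.

```python
from __future__ import annotations

# Hypothetical identifiers for the escalation bullets (Sonnet -> Opus).
ESCALATE_SIGNALS = {
    "multiple_systems",
    "non_obvious_failure_modes",
    "why_does_this_work",
    "expensive_if_wrong",
    "cross_cutting_concern",
    "migration_or_backcompat",
}

# Hypothetical identifiers for the downgrade bullets (Sonnet -> Haiku).
DOWNGRADE_SIGNALS = {
    "single_file",
    "mechanical_transformation",
    "read_and_summarize",
    "factual_question",
}


def adjust_model(signals: set[str]) -> str:
    """Pick a tier from observed signals, defaulting to Sonnet."""
    # Assumption: a single escalation signal is enough to justify Opus.
    if signals & ESCALATE_SIGNALS:
        return "Opus"
    # Assumption: downgrade only when all observed signals point to simple work.
    if signals and signals <= DOWNGRADE_SIGNALS:
        return "Haiku"
    return "Sonnet"


if __name__ == "__main__":
    print(adjust_model({"single_file", "mechanical_transformation"}))
    print(adjust_model({"single_file", "cross_cutting_concern"}))
    print(adjust_model(set()))
```

Letting escalation take priority reflects the guidance that a wrong answer on a complex task costs more to fix than the extra model spend.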
Cost Tables
Per-token pricing (USD per million tokens)
| Model | Input | Output | Cache Write | Cache Read |
|---|---|---|---|---|
| Opus 4.6 | $15.00 | $75.00 | $18.75 | $1.50 |
| Sonnet 4.6 | $3.00 | $15.00 | $3.75 | $0.30 |
| Haiku 4.5 | $0.80 | $4.00 | $1.00 | $0.08 |
Cost multipliers
| Comparison | Input | Output |
|---|---|---|
| Opus vs Sonnet | 5x | 5x |
| Sonnet vs Haiku | 3.75x | 3.75x |
| Opus vs Haiku | 18.75x | 18.75x |
Typical session costs
| Task | Model | Est. Tokens (in/out) | Est. Cost |
|---|---|---|---|
| Simple bug fix | Sonnet | 50k/10k | ~$0.30 |
| Feature implementation | Sonnet | 200k/50k | ~$1.35 |
| Architecture review | Opus | 200k/30k | ~$5.25 |
| Quick lookup | Haiku | 20k/2k | ~$0.02 |
| Research subagent | Haiku | 80k/10k | ~$0.10 |
| Full code review (council) | Mixed | 500k/100k | ~$3-8 |
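The single-model estimates above follow directly from the per-token pricing table. A minimal sketch of the arithmetic, using the input and output prices listed in this document and ignoring cache pricing (`PRICE_PER_MTOK` and `session_cost` are hypothetical names):

```python
# (input, output) prices in USD per million tokens, copied from the pricing table.
PRICE_PER_MTOK = {
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku": (0.80, 4.00),
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate session cost in USD, assuming no prompt-cache reads or writes."""
    in_price, out_price = PRICE_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    rows = [
        ("Simple bug fix", "sonnet", 50_000, 10_000),
        ("Feature implementation", "sonnet", 200_000, 50_000),
        ("Architecture review", "opus", 200_000, 30_000),
        ("Quick lookup", "haiku", 20_000, 2_000),
        ("Research subagent", "haiku", 80_000, 10_000),
    ]
    for task, model, tokens_in, tokens_out in rows:
        print(f"{task}: ${session_cost(model, tokens_in, tokens_out):.2f}")
```

For example, the feature implementation row is 200k input tokens at $3.00 plus 50k output tokens at $15.00 per million, which comes to $0.60 + $0.75 = $1.35.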
Subagent Model Assignment
Orchestration patterns
When using cc-orchestrate or spawning subagents, assign models by role:
Research agents → Haiku (cheap exploration, summary return)
Implementation agents → Sonnet (code generation quality)
Review/audit agents → Sonnet or Opus (depends on risk)
Architecture agents → Opus (deep reasoning required)
Example: builder-validator template
builder agent → Sonnet 4.6 (writes code)
validator agent → Sonnet 4.6 (reviews code)
Example: research-council template
researcher agents (3x) → Haiku 4.5 (parallel exploration)
synthesizer agent → Sonnet 4.6 (combines findings)
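The two templates can be written down as data so that the model assigned to each spawned agent is explicit. This is an illustrative sketch only: `AgentSpec`, `TEMPLATES`, and `expand` are hypothetical names and do not reflect the actual configuration format of cc-orchestrate.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """One role in a template; count > 1 means parallel copies of the agent."""
    name: str
    model: str
    count: int = 1


# Hypothetical encoding of the two example templates.
TEMPLATES = {
    "builder-validator": [
        AgentSpec("builder", "Sonnet 4.6"),
        AgentSpec("validator", "Sonnet 4.6"),
    ],
    "research-council": [
        AgentSpec("researcher", "Haiku 4.5", count=3),
        AgentSpec("synthesizer", "Sonnet 4.6"),
    ],
}


def expand(template: str) -> list[tuple[str, str]]:
    """Return one (agent_name, model) pair per agent that would be spawned."""
    agents = []
    for spec in TEMPLATES[template]:
        for i in range(spec.count):
            # Number the copies only when a role is replicated.
            name = f"{spec.name}-{i + 1}" if spec.count > 1 else spec.name
            agents.append((name, spec.model))
    return agents


if __name__ == "__main__":
    for agent, model in expand("research-council"):
        print(f"{agent} -> {model}")
```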
Budget Planning
Setting a session budget
Before starting a task, estimate cost:
- Classify the task using the decision matrix above
- Estimate token volume based on file count and task scope
- Calculate cost using the pricing table
- Set model with /model or claude -m
Token estimation rules of thumb
| Content Type | Tokens per Line |
|---|---|
| TypeScript/JavaScript | ~10 |
| Python | ~8 |
| JSON/YAML | ~6 |
| Markdown | ~5 |
| Minified code | ~15 |
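These per-line figures can be turned into a rough token estimate for a set of files. A minimal sketch, assuming you already have line counts per content type (`TOKENS_PER_LINE` and `estimate_tokens` are hypothetical names, and the result is only as accurate as the rules of thumb above):

```python
from __future__ import annotations

# Approximate tokens per line, copied from the rules-of-thumb table.
TOKENS_PER_LINE = {
    "typescript": 10,
    "javascript": 10,
    "python": 8,
    "json": 6,
    "yaml": 6,
    "markdown": 5,
    "minified": 15,
}


def estimate_tokens(line_counts: dict[str, int]) -> int:
    """Estimate total tokens from a mapping of content type to line count."""
    return sum(TOKENS_PER_LINE[kind] * lines for kind, lines in line_counts.items())


if __name__ == "__main__":
    # Example: 1,000 lines of Python plus 200 lines of Markdown.
    print(estimate_tokens({"python": 1000, "markdown": 200}))  # 9000
```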
Cost control techniques
- Start with Haiku for research, switch to Sonnet for implementation
- Use subagents to isolate expensive research from main context
- Compact early at 60-70% context to avoid expensive re-reads
- Limit tool output: avoid cat-ing entire large files; use Grep with limits
- Batch related tasks to benefit from prompt caching (cache read = 10% of input cost)
- Use --max-turns in headless mode to cap automated sessions
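To see why batching for cache hits matters, compare re-sending the same prompt prefix on every turn with and without caching. The sketch below uses the Sonnet prices from the pricing table and assumes the cache stays warm for the whole session (no expiry between turns); `prefix_cost` is a hypothetical helper.

```python
# Sonnet prices in USD per million tokens, copied from the pricing table.
INPUT = 3.00
CACHE_WRITE = 3.75
CACHE_READ = 0.30


def prefix_cost(prefix_tokens: int, turns: int, cached: bool) -> float:
    """Cost in USD of sending the same prefix on every turn of a session."""
    if turns < 1:
        raise ValueError("turns must be at least 1")
    if not cached:
        # Every turn pays the full input price for the prefix.
        return turns * prefix_tokens * INPUT / 1_000_000
    # Assumption: the first turn writes the cache and all later turns read it.
    write = prefix_tokens * CACHE_WRITE
    reads = (turns - 1) * prefix_tokens * CACHE_READ
    return (write + reads) / 1_000_000


if __name__ == "__main__":
    uncached = prefix_cost(100_000, 10, cached=False)
    cached = prefix_cost(100_000, 10, cached=True)
    print(f"uncached: ${uncached:.3f}, cached: ${cached:.3f}")
```

With a 100k-token prefix over 10 turns, the uncached cost is $3.00 and the cached cost is $0.375 for the write plus $0.27 for nine reads, or $0.645 in total. For a single turn, caching costs more because the write is priced above normal input.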
Model switching workflow
# Start with research on Haiku
/model claude-haiku-4-5-20251001
# "Find all files related to auth, summarize the architecture"
# Switch to Sonnet for implementation
/model claude-sonnet-4-6
# "Implement the new auth middleware based on the research above"
# Switch to Opus for the tricky part
/model claude-opus-4-6
# "Review the session handling for race conditions and edge cases"
Environment Variables
CLAUDE_MODEL=claude-sonnet-4-6 # Default model for sessions
ANTHROPIC_MODEL=claude-sonnet-4-6 # Alternative env var
Settings Configuration
{
"model": "claude-sonnet-4-6",
"smallFastModel": "claude-haiku-4-5-20251001"
}
The smallFastModel is used for internal operations like skill matching and context compression. Keep it on Haiku for cost efficiency.
Anti-patterns
- Using Opus for everything — 5x the cost of Sonnet with marginal quality improvement on simple tasks
- Using Haiku for complex implementation — saves money but produces lower-quality code that needs more iterations
- Not using subagents — research in main context inflates token count for every subsequent turn
- Re-reading large files — each read costs tokens; anchor important content instead
- Ignoring cache hits — restructure prompts to maximize cache read tokens (10% of input cost)