Model Routing Intelligence

Select the right Claude model for each task to optimize the cost/quality tradeoff.

Goal

Eliminate wasted spend by routing tasks to the cheapest model that produces acceptable quality, while ensuring complex tasks get the reasoning depth they need.

Decision Matrix

Task → Model mapping

Task Type	Recommended Model	Reasoning
Architecture decisions	Opus 4.6	Needs deep multi-step reasoning, hidden coupling detection
Complex debugging	Opus 4.6	Root cause analysis requires holding many hypotheses
Security review	Opus 4.6	Must not miss subtle vulnerabilities
Standard implementation	Sonnet 4.6	Best balance of speed, quality, and cost for code generation
Code review	Sonnet 4.6	Good pattern recognition at reasonable cost
Refactoring	Sonnet 4.6	Mechanical transformations with quality checks
Test writing	Sonnet 4.6	Formulaic but needs understanding of code under test
File search / grep	Haiku 4.5	Simple lookup, no deep reasoning needed
Documentation lookup	Haiku 4.5	Reading and summarizing existing content
Commit message generation	Haiku 4.5	Short, formulaic output
Simple Q&A	Haiku 4.5	Direct answers, no complex analysis
Research subagents	Haiku 4.5	Exploration tasks that return summaries
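
The mapping above can be expressed as a simple lookup table. This is a minimal sketch: the task-type keys and the `recommend_model` helper are hypothetical names chosen for illustration (not part of Claude Code), and the Sonnet fallback for unknown task types is an assumption.

```python
# Hypothetical task-type keys mirroring the table above; not a Claude Code API.
TASK_MODEL = {
    "architecture": "Opus 4.6",
    "complex_debugging": "Opus 4.6",
    "security_review": "Opus 4.6",
    "implementation": "Sonnet 4.6",
    "code_review": "Sonnet 4.6",
    "refactoring": "Sonnet 4.6",
    "test_writing": "Sonnet 4.6",
    "file_search": "Haiku 4.5",
    "doc_lookup": "Haiku 4.5",
    "commit_message": "Haiku 4.5",
    "simple_qa": "Haiku 4.5",
    "research_subagent": "Haiku 4.5",
}


def recommend_model(task_type: str, default: str = "Sonnet 4.6") -> str:
    """Return the recommended model for a task type (assumed Sonnet fallback)."""
    return TASK_MODEL.get(task_type, default)


print(recommend_model("security_review"))  # Opus 4.6
print(recommend_model("commit_message"))   # Haiku 4.5
```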

Complexity signals

Use these signals to decide when to escalate from Sonnet to Opus:

Multiple interacting systems or modules
Non-obvious failure modes
"Why does this work?" questions
Tasks where a wrong answer is expensive to fix
Cross-cutting concerns (auth, caching, observability)
Migration or backward-compatibility requirements

Use these signals to downgrade from Sonnet to Haiku:

Single-file changes
Mechanical transformations (rename, reformat)
Reading and summarizing existing content (no new code generated)
Answering factual questions about code
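
One way to apply these signals is a small routing function. The sketch below uses illustrative signal labels for the bullets above, and it assumes that any escalation signal outranks a downgrade signal; the source does not say how to resolve a conflict, so treat that precedence as a design choice.

```python
# Signal names are illustrative labels for the bullets above, not a fixed vocabulary.
ESCALATE = {
    "multi_system", "non_obvious_failure", "why_does_this_work",
    "expensive_if_wrong", "cross_cutting", "migration",
}
DOWNGRADE = {"single_file", "mechanical", "read_and_summarize", "factual_question"}


def route(signals: set[str]) -> str:
    """Pick a tier from observed signals; escalation wins over downgrade (assumption)."""
    if signals & ESCALATE:
        return "Opus"
    if signals & DOWNGRADE:
        return "Haiku"
    return "Sonnet"  # default tier when no signal applies


print(route({"single_file", "mechanical"}))  # Haiku
print(route({"single_file", "migration"}))   # Opus
```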

Cost Tables

Per-token pricing (USD per million tokens)

Model	Input	Output	Cache Write	Cache Read
Opus 4.6	$15.00	$75.00	$18.75	$1.50
Sonnet 4.6	$3.00	$15.00	$3.75	$0.30
Haiku 4.5	$0.80	$4.00	$1.00	$0.08

Cost multipliers

Comparison	Input	Output
Opus vs Sonnet	5x	5x
Sonnet vs Haiku	3.75x	3.75x
Opus vs Haiku	18.75x	18.75x

Typical session costs

Task	Model	Est. Tokens (in/out)	Est. Cost
Simple bug fix	Sonnet	50k/10k	~$0.30
Feature implementation	Sonnet	200k/50k	~$1.35
Architecture review	Opus	200k/30k	~$5.25
Quick lookup	Haiku	20k/2k	~$0.02
Research subagent	Haiku	80k/10k	~$0.10
Full code review (council)	Mixed	500k/100k	~$3-8
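
The single-model estimates above follow directly from the per-token prices. A minimal sketch of the arithmetic, assuming no prompt caching; `session_cost` is a hypothetical helper name and the prices are copied from the pricing table in this document.

```python
PRICES = {  # USD per million tokens (input, output), from the pricing table
    "Opus": (15.00, 75.00),
    "Sonnet": (3.00, 15.00),
    "Haiku": (0.80, 4.00),
}


def session_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Estimated cost in USD, ignoring prompt caching."""
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


print(round(session_cost("Sonnet", 50_000, 10_000), 2))   # 0.3
print(round(session_cost("Sonnet", 200_000, 50_000), 2))  # 1.35
print(round(session_cost("Opus", 200_000, 30_000), 2))    # 5.25
```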

Subagent Model Assignment

Orchestration patterns

When using cc-orchestrate or spawning subagents, assign models by role:

Research agents     → Haiku (cheap exploration, summary return)
Implementation agents → Sonnet (code generation quality)
Review/audit agents → Sonnet or Opus (depends on risk)
Architecture agents → Opus (deep reasoning required)
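
The role assignments above could be encoded as follows. This is only a sketch: `model_for_role` and the `high_risk` flag are hypothetical and are not part of cc-orchestrate. The flag stands in for the "depends on risk" judgment for review and audit agents.

```python
def model_for_role(role: str, high_risk: bool = False) -> str:
    """Map an agent role to a model tier; `high_risk` only affects review/audit."""
    if role == "research":
        return "Haiku"
    if role == "implementation":
        return "Sonnet"
    if role in ("review", "audit"):
        return "Opus" if high_risk else "Sonnet"
    if role == "architecture":
        return "Opus"
    raise ValueError(f"unknown role: {role}")


print(model_for_role("research"))                # Haiku
print(model_for_role("audit", high_risk=True))   # Opus
```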

Example: builder-validator template

builder agent   → Sonnet 4.6 (writes code)
validator agent → Sonnet 4.6 (reviews code)

Example: research-council template

researcher agents (3x) → Haiku 4.5 (parallel exploration)
synthesizer agent      → Sonnet 4.6 (combines findings)

Budget Planning

Setting a session budget

Before starting a task, estimate cost:

Classify the task using the decision matrix above
Estimate token volume based on file count and task scope
Calculate cost using the pricing table
Set the model with /model inside a session, or with claude --model at launch

Token estimation rules of thumb

Content Type	Tokens per Line
TypeScript/JavaScript	~10
Python	~8
JSON/YAML	~6
Markdown	~5
Minified code	~15
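
These rules of thumb can be turned into a rough input-token estimate from line counts. A minimal sketch; the content-type keys and the `estimate_tokens` helper are hypothetical, and the example line counts are made up for illustration.

```python
TOKENS_PER_LINE = {  # rules of thumb from the table above
    "typescript": 10, "javascript": 10, "python": 8,
    "json": 6, "yaml": 6, "markdown": 5, "minified": 15,
}


def estimate_tokens(line_counts: dict[str, int]) -> int:
    """Sum rough token counts for files grouped by content type."""
    return sum(TOKENS_PER_LINE[kind] * lines for kind, lines in line_counts.items())


# Hypothetical task: 3,000 lines of TypeScript plus 500 lines of JSON config.
print(estimate_tokens({"typescript": 3000, "json": 500}))  # 33000
```

The result is only the cost of reading the files once; repeated reads and tool output add to it.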

Cost control techniques

Start with Haiku for research, switch to Sonnet for implementation
Use subagents to isolate expensive research from main context
Compact early, at 60-70% context usage, to avoid expensive re-reads
Limit tool output — avoid cat-ing entire large files; use Grep with limits
Batch related tasks to benefit from prompt caching (cache read = 10% of input cost)
Use --max-turns in headless mode to cap automated sessions
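
To see what the 10% cache-read rate means in practice, the sketch below compares input cost with and without cache hits. The defaults are the Sonnet 4.6 rates from the pricing table. The 80% cached fraction is a made-up figure for illustration, and the one-time cache-write cost is ignored to keep the example short.

```python
def input_cost(tokens: int, cached_fraction: float,
               price_in: float = 3.00, price_cache_read: float = 0.30) -> float:
    """Input cost in USD when `cached_fraction` of the tokens are cache reads.

    Prices are USD per million tokens. Cache-write cost is not modeled.
    """
    cached = tokens * cached_fraction
    fresh = tokens - cached
    return (fresh * price_in + cached * price_cache_read) / 1_000_000


print(round(input_cost(200_000, 0.0), 2))  # 0.6
print(round(input_cost(200_000, 0.8), 2))  # 0.17
```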

Model switching workflow

# Start with research on Haiku
/model claude-haiku-4-5-20251001
# "Find all files related to auth, summarize the architecture"

# Switch to Sonnet for implementation
/model claude-sonnet-4-6
# "Implement the new auth middleware based on the research above"

# Switch to Opus for the tricky part
/model claude-opus-4-6
# "Review the session handling for race conditions and edge cases"

Environment Variables

CLAUDE_MODEL=claude-sonnet-4-6          # Default model for sessions
ANTHROPIC_MODEL=claude-sonnet-4-6       # Alternative env var

Settings Configuration

{
  "model": "claude-sonnet-4-6",
  "smallFastModel": "claude-haiku-4-5-20251001"
}

The smallFastModel is used for internal operations like skill matching and context compression. Keep it on Haiku for cost efficiency.

Anti-patterns

Using Opus for everything — 5x the cost of Sonnet with marginal quality improvement on simple tasks
Using Haiku for complex implementation — saves money but produces lower-quality code that needs more iterations
Not using subagents — research in main context inflates token count for every subsequent turn
Re-reading large files — each read costs tokens; anchor important content instead
Ignoring cache hits — restructure prompts to maximize cache read tokens (10% of input cost)
