model-routing

Installation
SKILL.md

Model Routing Intelligence

Select the right Claude model for each task to optimize the cost/quality tradeoff.

Goal

Eliminate wasted spend by routing tasks to the cheapest model that produces acceptable quality, while ensuring complex tasks get the reasoning depth they need.

Decision Matrix

Task → Model mapping

Task Type Recommended Model Reasoning
Architecture decisions Opus 4.6 Needs deep multi-step reasoning, hidden coupling detection
Complex debugging Opus 4.6 Root cause analysis requires holding many hypotheses
Security review Opus 4.6 Must not miss subtle vulnerabilities
Standard implementation Sonnet 4.6 Best balance of speed, quality, and cost for code generation
Code review Sonnet 4.6 Good pattern recognition at reasonable cost
Refactoring Sonnet 4.6 Mechanical transformations with quality checks
Test writing Sonnet 4.6 Formulaic but needs understanding of code under test
File search / grep Haiku 4.5 Simple lookup, no deep reasoning needed
Documentation lookup Haiku 4.5 Reading and summarizing existing content
Commit message generation Haiku 4.5 Short, formulaic output
Simple Q&A Haiku 4.5 Direct answers, no complex analysis
Research subagents Haiku 4.5 Exploration tasks that return summaries

Complexity signals

Use these signals to decide when to escalate from Sonnet to Opus:

  • Multiple interacting systems or modules
  • Non-obvious failure modes
  • "Why does this work?" questions
  • Tasks where a wrong answer is expensive to fix
  • Cross-cutting concerns (auth, caching, observability)
  • Migration or backward-compatibility requirements

Use these signals to downgrade from Sonnet to Haiku:

  • Single-file changes
  • Mechanical transformations (rename, reformat)
  • Reading and summarizing (no generation)
  • Answering factual questions about code

Cost Tables

Per-token pricing (USD per million tokens)

Model Input Output Cache Write Cache Read
Opus 4.6 $15.00 $75.00 $18.75 $1.50
Sonnet 4.6 $3.00 $15.00 $3.75 $0.30
Haiku 4.5 $0.80 $4.00 $1.00 $0.08

Cost multipliers

Comparison Input Output
Opus vs Sonnet 5x 5x
Sonnet vs Haiku 3.75x 3.75x
Opus vs Haiku 18.75x 18.75x

Typical session costs

Task Model Est. Tokens (in/out) Est. Cost
Simple bug fix Sonnet 50k/10k ~$0.30
Feature implementation Sonnet 200k/50k ~$1.35
Architecture review Opus 200k/30k ~$5.25
Quick lookup Haiku 20k/2k ~$0.02
Research subagent Haiku 80k/10k ~$0.10
Full code review (council) Mixed 500k/100k ~$3-8

Subagent Model Assignment

Orchestration patterns

When using cc-orchestrate or spawning subagents, assign models by role:

Research agents     → Haiku (cheap exploration, summary return)
Implementation agents → Sonnet (code generation quality)
Review/audit agents → Sonnet or Opus (depends on risk)
Architecture agents → Opus (deep reasoning required)

Example: builder-validator template

builder agent   → Sonnet 4.6 (writes code)
validator agent → Sonnet 4.6 (reviews code)

Example: research-council template

researcher agents (3x) → Haiku 4.5 (parallel exploration)
synthesizer agent      → Sonnet 4.6 (combines findings)

Budget Planning

Setting a session budget

Before starting a task, estimate cost:

  1. Classify the task using the decision matrix above
  2. Estimate token volume based on file count and task scope
  3. Calculate cost using the pricing table
  4. Set model with /model or claude -m

Token estimation rules of thumb

Content Type Tokens per Line
TypeScript/JavaScript ~10
Python ~8
JSON/YAML ~6
Markdown ~5
Minified code ~15

Cost control techniques

  1. Start with Haiku for research, switch to Sonnet for implementation
  2. Use subagents to isolate expensive research from main context
  3. Compact early at 60-70% context to avoid expensive re-reads
  4. Limit tool output — avoid cat-ing entire large files; use Grep with limits
  5. Batch related tasks to benefit from prompt caching (cache read = 10% of input cost)
  6. Use --max-turns in headless mode to cap automated sessions

Model switching workflow

# Start with research on Haiku
/model claude-haiku-4-5-20251001
# "Find all files related to auth, summarize the architecture"

# Switch to Sonnet for implementation
/model claude-sonnet-4-6
# "Implement the new auth middleware based on the research above"

# Switch to Opus for the tricky part
/model claude-opus-4-6
# "Review the session handling for race conditions and edge cases"

Environment Variables

CLAUDE_MODEL=claude-sonnet-4-6          # Default model for sessions
ANTHROPIC_MODEL=claude-sonnet-4-6       # Alternative env var

Settings Configuration

{
  "model": "claude-sonnet-4-6",
  "smallFastModel": "claude-haiku-4-5-20251001"
}

The smallFastModel is used for internal operations like skill matching and context compression. Keep it on Haiku for cost efficiency.

Anti-patterns

  • Using Opus for everything — 5x the cost of Sonnet with marginal quality improvement on simple tasks
  • Using Haiku for complex implementation — saves money but produces lower-quality code that needs more iterations
  • Not using subagents — research in main context inflates token count for every subsequent turn
  • Re-reading large files — each read costs tokens; anchor important content instead
  • Ignoring cache hits — restructure prompts to maximize cache read tokens (10% of input cost)
Related skills
Installs
4
GitHub Stars
11
First Seen
Mar 21, 2026