
LLM Router

Selects the most cost-effective LLM for each task. Model choice is often the single biggest cost lever in multi-agent systems: intelligent routing can save 45-85% of spend while retaining 95%+ of top-model quality.


When to Use

Use for:

  • Deciding which model to call for a specific task
  • Assigning models to DAG nodes in agent workflows
  • Optimizing LLM API costs across a system
  • Building cascading try-cheap-first patterns

NOT for:

  • Prompt engineering (use prompt-engineer)
  • Model fine-tuning or training
  • Comparing model architectures (academic research)

Routing Decision Tree

```mermaid
flowchart TD
  A{Task type?} -->|Classify / validate / format / extract| T1["Tier 1: Haiku, GPT-4o-mini (~$0.001)"]
  A -->|Write / implement / review / synthesize| T2["Tier 2: Sonnet, GPT-4o (~$0.01)"]
  A -->|Reason / architect / judge / decompose| T3["Tier 3: Opus, o1 (~$0.10)"]

  T1 --> Q1{Quality sufficient?}
  Q1 -->|Yes| Done1[Use cheap model]
  Q1 -->|No| T2

  T2 --> Q2{Quality sufficient?}
  Q2 -->|Yes| Done2[Use balanced model]
  Q2 -->|No| T3

  T3 --> Done3[Use top-tier model]
```

Tier Assignment Table

| Task Type | Tier | Models | Cost/Call (USD, approx.) | Why This Tier |
|---|---|---|---|---|
| Classify input type | 1 | Haiku, GPT-4o-mini | ~$0.001 | Deterministic categorization |
| Validate schema/format | 1 | Haiku, GPT-4o-mini | ~$0.001 | Mechanical checking |
| Format output / template | 1 | Haiku, GPT-4o-mini | ~$0.001 | Structured transformation |
| Extract structured data | 1 | Haiku, GPT-4o-mini | ~$0.001 | Pattern matching |
| Summarize text | 1-2 | Haiku → Sonnet | ~$0.001-0.01 | Short summaries: Haiku; nuanced: Sonnet |
| Write content/docs | 2 | Sonnet, GPT-4o | ~$0.01 | Creative quality matters |
| Implement code | 2 | Sonnet, GPT-4o | ~$0.01 | Correctness + style |
| Review code/diffs | 2 | Sonnet, GPT-4o | ~$0.01 | Needs judgment, not just pattern matching |
| Research synthesis | 2 | Sonnet, GPT-4o | ~$0.01 | Multi-source reasoning |
| Decompose ambiguous problem | 3 | Opus, o1 | ~$0.10 | Requires deep understanding |
| Design architecture | 3 | Opus, o1 | ~$0.10 | Complex system reasoning |
| Judge output quality | 3 | Opus, o1 | ~$0.10 | Meta-reasoning about quality |
| Plan multi-step strategy | 3 | Opus, o1 | ~$0.10 | Long-horizon planning |
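A static lookup is the simplest way to encode this table in a router. The sketch below is illustrative only: the snake_case task-type keys and the caller-supplied `nuanced` flag for summaries are hypothetical names, and defaulting unknown task types to Tier 2 is an assumption rather than something the table specifies.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Model tiers from the decision tree (approximate cost per call)."""
    CHEAP = 1      # Haiku, GPT-4o-mini (~$0.001)
    BALANCED = 2   # Sonnet, GPT-4o (~$0.01)
    PREMIUM = 3    # Opus, o1 (~$0.10)


# Hypothetical keys standing in for the rows of the tier table.
TIER_BY_TASK = {
    "classify": Tier.CHEAP,
    "validate": Tier.CHEAP,
    "format": Tier.CHEAP,
    "extract": Tier.CHEAP,
    "write": Tier.BALANCED,
    "implement": Tier.BALANCED,
    "review": Tier.BALANCED,
    "research_synthesis": Tier.BALANCED,
    "decompose": Tier.PREMIUM,
    "design_architecture": Tier.PREMIUM,
    "judge": Tier.PREMIUM,
    "plan": Tier.PREMIUM,
}


def assign_tier(task_type: str, nuanced: bool = False) -> Tier:
    """Return the starting tier for a task type, following the table."""
    if task_type == "summarize":
        # The 1-2 row: short summaries stay on Tier 1, nuanced ones go to Tier 2.
        return Tier.BALANCED if nuanced else Tier.CHEAP
    # Assumption: unknown task types start at the middle tier.
    return TIER_BY_TASK.get(task_type, Tier.BALANCED)
```

For example, `assign_tier("summarize", nuanced=True)` returns `Tier.BALANCED`, while `assign_tier("classify")` returns `Tier.CHEAP`.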

Three Routing Strategies

Strategy 1: Static Tier Assignment (Start Here)

Assign model by task type at DAG design time. No runtime logic. Gets 60-70% of possible savings.

```yaml
nodes:
  - id: classify
    model: claude-haiku-4-5     # Tier 1: ~$0.001/call
  - id: implement
    model: claude-sonnet-4-5    # Tier 2: ~$0.01/call
  - id: evaluate
    model: claude-opus-4-5      # Tier 3: ~$0.10/call
```

Strategy 2: Cascading (Try Cheap First)

Try the cheap model first; if its output falls below a quality threshold, escalate. This adds ~1s of latency but saves 50-80% on nodes where the cheap model succeeds.

1. Execute with the Tier 1 model
2. Run a quick quality check (also Tier 1, ~$0.001)
3. If quality ≥ threshold → done
4. If quality < threshold → re-execute with Tier 2, checking again and escalating to Tier 3 if needed

Best for nodes where you're genuinely unsure which tier is needed.
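The cascade steps map onto a short loop. In this sketch, `call_model` and `check_quality` are hypothetical hooks you would back with your own LLM client and a Tier 1 grading prompt, and the 0.8 threshold is an arbitrary placeholder. The stand-in functions at the bottom make no API calls; they only simulate a node where the Tier 1 output fails the check.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical hooks: wire these to your own LLM client and quality checker.
CallModel = Callable[[str, str], str]       # (model, prompt) -> output
CheckQuality = Callable[[str, str], float]  # (prompt, output) -> score in [0, 1]


def cascade(
    prompt: str,
    models: Sequence[str],
    call_model: CallModel,
    check_quality: CheckQuality,
    threshold: float = 0.8,
) -> Tuple[str, str]:
    """Try models cheapest-first and return (model, output) for the first that passes.

    The last (strongest) model's output is returned without a check,
    because there is nothing left to escalate to.
    """
    if not models:
        raise ValueError("cascade needs at least one model")
    for model in models[:-1]:
        output = call_model(model, prompt)
        if check_quality(prompt, output) >= threshold:
            return model, output
    strongest = models[-1]
    return strongest, call_model(strongest, prompt)


# Usage with stand-in functions (no real API calls).
calls: List[str] = []


def fake_call(model: str, prompt: str) -> str:
    calls.append(model)
    return f"{model}: answer"


def fake_check(prompt: str, output: str) -> float:
    # Simulate a prompt where only the Sonnet output clears the bar.
    return 0.9 if output.startswith("claude-sonnet") else 0.3


chosen, output = cascade(
    "Review this diff",
    ["claude-haiku-4-5", "claude-sonnet-4-5", "claude-opus-4-5"],
    fake_call,
    fake_check,
)
print(chosen, calls)
```

Here the Haiku attempt is rejected, Sonnet passes, and Opus is never called, so `calls` ends up holding only the two models that actually ran.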

Strategy 3: Adaptive (Learn from History)

Record success/failure per task type per model. Over time, the router learns:

  • "Classification nodes always succeed on Haiku" → stay cheap
  • "Code review nodes fail on Haiku 40% of the time" → upgrade to Sonnet
  • "Architecture nodes succeed on Sonnet 90% of the time" → don't need Opus

Reaches 75-85% savings once it has ~100 recorded executions to learn from.
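One way to implement this is a success counter per (task type, model) pair. The sketch assumes you already have a binary success signal for each run (for example, the cascade's quality check); `min_success`, `min_samples`, and the cheapest-first exploration policy are illustrative choices, not values the text prescribes. With this policy an unseen task type starts on the cheapest model, so pairing it with cascading lets those early failures escalate within the same run.

```python
from typing import Dict, List, Tuple


class AdaptiveRouter:
    """Route each task type to the cheapest model with a good enough track record.

    `models` is ordered cheapest-first. A model with fewer than `min_samples`
    recorded outcomes for a task type is still being evaluated, so it is
    returned (to keep collecting data). A model with enough samples is kept
    only if its success rate reaches `min_success`; otherwise the router
    moves up to the next model.
    """

    def __init__(self, models: List[str], min_success: float = 0.9, min_samples: int = 20):
        if not models:
            raise ValueError("need at least one model")
        self.models = models
        self.min_success = min_success
        self.min_samples = min_samples
        # (task_type, model) -> (successes, attempts)
        self.stats: Dict[Tuple[str, str], Tuple[int, int]] = {}

    def record(self, task_type: str, model: str, success: bool) -> None:
        ok, n = self.stats.get((task_type, model), (0, 0))
        self.stats[(task_type, model)] = (ok + int(success), n + 1)

    def success_rate(self, task_type: str, model: str) -> float:
        ok, n = self.stats.get((task_type, model), (0, 0))
        return ok / n if n else 0.0

    def choose(self, task_type: str) -> str:
        for model in self.models:
            ok, n = self.stats.get((task_type, model), (0, 0))
            if n < self.min_samples:
                return model  # not enough evidence yet: keep trying this model
            if ok / n >= self.min_success:
                return model  # proven on this task type: stay cheap
        return self.models[-1]  # every cheaper model fell short


router = AdaptiveRouter(
    ["claude-haiku-4-5", "claude-sonnet-4-5", "claude-opus-4-5"], min_samples=10
)
for _ in range(10):
    router.record("classify", "claude-haiku-4-5", success=True)
for i in range(10):
    router.record("code_review", "claude-haiku-4-5", success=i >= 4)  # 60% success
print(router.choose("classify"), router.choose("code_review"))
```

Classification stays on Haiku, while code review, at a 60% success rate, moves up to Sonnet, mirroring the examples in the list above.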


Provider Selection

Once model tier is chosen, select the provider:

| Model Class | Provider Options | Selection Criteria |
|---|---|---|
| Haiku-class | Anthropic, AWS Bedrock, GCP Vertex | Latency, regional availability |
| Sonnet-class | Anthropic, AWS Bedrock, GCP Vertex | Cost, rate limits |
| Opus-class | Anthropic, AWS Bedrock, GCP Vertex | Rate limits, regional availability |
| GPT-4o-class | OpenAI, Azure OpenAI | Rate limits, compliance |
| Open-source | Ollama (local), Together.ai, Fireworks | Cost ($0 per call when run locally), latency, GPU availability |
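Provider selection can be sketched as filter-then-rank. In this sketch the `ProviderOption` fields, the provider names, and every number in the usage lines are hypothetical placeholders; substitute your own measured latency, negotiated pricing, and quota headroom. Hard requirements (compliance, region, rate-limit headroom) filter the candidates, and the preferred soft criterion picks among the survivors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProviderOption:
    """One way to reach a given model class. All numbers are yours to measure."""
    name: str
    regions: List[str]
    p50_latency_ms: float
    usd_per_call: float
    rpm_headroom: int          # requests/minute still available under your quota
    compliant: bool = True     # meets your data-residency / compliance rules


def select_provider(
    options: List[ProviderOption],
    region: str,
    min_rpm: int = 1,
    prefer: str = "latency",
) -> Optional[ProviderOption]:
    """Filter on hard constraints, then rank by the preferred criterion."""
    eligible = [
        o for o in options
        if o.compliant and region in o.regions and o.rpm_headroom >= min_rpm
    ]
    if not eligible:
        return None
    if prefer == "cost":
        return min(eligible, key=lambda o: (o.usd_per_call, o.p50_latency_ms))
    return min(eligible, key=lambda o: (o.p50_latency_ms, o.usd_per_call))


# Placeholder providers and numbers, for illustration only.
options = [
    ProviderOption("provider-a", ["us-east"], 300.0, 0.010, rpm_headroom=50),
    ProviderOption("provider-b", ["us-east", "eu-west"], 450.0, 0.009, rpm_headroom=500),
    ProviderOption("provider-c", ["eu-west"], 200.0, 0.010, rpm_headroom=500, compliant=False),
]
print(select_provider(options, "us-east").name)
print(select_provider(options, "us-east", prefer="cost").name)
```

Returning `None` when nothing qualifies leaves the fallback decision (queue, retry, or change tier) to the caller.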

Cost Impact Example

10-node DAG, "refactor a codebase":

| Strategy | Mix | Cost | Savings vs. All Opus |
|---|---|---|---|
| All Opus | 10× $0.10 | $1.00 | baseline |
| All Sonnet | 10× $0.01 | $0.10 | 90% |
| Static tiers | 4× Haiku + 4× Sonnet + 2× Opus | $0.24 | 76% |
| Cascading | 6× Haiku + 3× Sonnet + 1× Opus | $0.14 | 86% |
| Adaptive (trained) | Dynamic | ~$0.08 | 92% |
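The arithmetic behind the fixed-mix rows can be reproduced in a few lines. This sketch uses the approximate per-call costs from the tier table and counts only the final model call per node, which is how the cascading row reaches $0.14 (6 × $0.001 + 3 × $0.01 + 1 × $0.10 = $0.136). The model keys are shorthand chosen for the example.

```python
from typing import Dict

# Approximate USD cost per call, from the tier table.
COST_PER_CALL: Dict[str, float] = {"haiku": 0.001, "sonnet": 0.01, "opus": 0.10}


def dag_cost(mix: Dict[str, int]) -> float:
    """Total cost of one DAG run, given {model: number_of_calls}."""
    return sum(COST_PER_CALL[model] * calls for model, calls in mix.items())


def savings(baseline: float, cost: float) -> float:
    """Fraction saved relative to the baseline cost."""
    return 1 - cost / baseline


baseline = dag_cost({"opus": 10})  # All Opus
for label, mix in [
    ("All Sonnet", {"sonnet": 10}),
    ("Static tiers", {"haiku": 4, "sonnet": 4, "opus": 2}),
    ("Cascading", {"haiku": 6, "sonnet": 3, "opus": 1}),
]:
    cost = dag_cost(mix)
    print(f"{label}: ${cost:.3f} ({savings(baseline, cost):.0%} saved)")
```

To include the cascade's Tier 1 quality checks, add them as extra Haiku calls: one check per node in this 10-node DAG adds 10 Haiku calls, or about $0.01.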

Anti-Patterns

Always Use the Best Model

Wrong: Route everything to Opus/o1 "for quality." Reality: 60%+ of typical DAG nodes are classification, validation, or formatting, tasks where Haiku performs on par with Opus. You're burning money.

Always Use the Cheapest Model

Wrong: Route everything to Haiku "for cost." Reality: Complex reasoning, architecture design, and quality judgment genuinely need stronger models. Haiku will produce plausible-looking but subtly wrong output on hard tasks.

Ignoring Latency

Wrong: Optimizing only for cost and ignoring that Opus can take 5-10x longer than Haiku to respond. Reality: In a 10-node DAG, model choice affects total execution time as much as cost. Route time-critical paths to faster models.

No Feedback Loop

Wrong: Setting model tiers once and never adjusting. Reality: As models improve (Haiku gets smarter every generation), tasks that needed Sonnet last month may work on Haiku today. Record outcomes and adapt.
