intelligent-router
# Intelligent Router

## Quick Setup

New to this skill? Start here:
- Copy `config.json.example` to `config.json` (or customize the included `config.json`)
- Edit `config.json` to add your available models with their costs and tiers
- Test the CLI: `python scripts/router.py classify "your task description"`
- Use in your agent: reference models from your config when spawning sub-agents
Example config entry:

```json
{
  "id": "anthropic/claude-sonnet-4",
  "alias": "Claude Sonnet",
  "tier": "COMPLEX",
  "input_cost_per_m": 3.00,
  "output_cost_per_m": 15.00,
  "capabilities": ["text", "code", "reasoning", "vision"]
}
```
## Overview

This skill teaches AI agents how to intelligently route sub-agent tasks to different LLM models based on task complexity, cost, and capability requirements. The goal is to reduce costs by routing simple tasks to cheaper models while preserving quality for complex work.

## When to Use This Skill

Use this skill whenever you:
- Spawn sub-agents or delegate tasks to other models
- Need to choose between different LLM options
- Want to optimize costs without sacrificing quality
- Handle tasks with varying complexity levels
- Need to estimate costs before execution
## Core Routing Logic

### 1. Four-Tier Classification System

Tasks are classified into four tiers based on complexity and risk:
| Tier | Description | Example Tasks | Model Characteristics |
|---|---|---|---|
| 🟢 SIMPLE | Routine, low-risk operations | Monitoring, status checks, API calls, summarization | Cheapest available, good for repetitive tasks |
| 🟡 MEDIUM | Moderate complexity work | Code fixes, research, small patches, data analysis | Balanced cost/quality, good general purpose |
| 🟠 COMPLEX | Multi-component development | Feature builds, debugging, architecture, multi-file changes | High-quality reasoning, excellent code generation |
| 🔴 CRITICAL | High-stakes operations | Security audits, production deploys, financial operations | Best available model, maximum reliability |
### 2. Model Selection Strategy

Configure your available models in `config.json` and assign them to tiers. The router will automatically recommend models based on task classification.

Configuration pattern:
```json
{
  "models": [
    {
      "id": "{provider}/{model-name}",
      "alias": "Human-Friendly Name",
      "tier": "SIMPLE|MEDIUM|COMPLEX|CRITICAL",
      "provider": "anthropic|openai|google|local|etc",
      "input_cost_per_m": 0.50,
      "output_cost_per_m": 1.50,
      "context_window": 128000,
      "capabilities": ["text", "code", "reasoning", "vision"],
      "notes": "Optional notes about strengths/limitations"
    }
  ]
}
```
Tier recommendations:
- SIMPLE tier: Models under $0.50/M input, optimized for speed/cost
- MEDIUM tier: Models $0.50-$3.00/M input, good at coding/analysis
- COMPLEX tier: Models $3.00-$5.00/M input, excellent reasoning
- CRITICAL tier: Best available models, cost secondary to quality
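Tier assignment plus the cost fields are enough to implement the lookup itself. Below is a minimal sketch of tier-based selection, assuming the field names from the configuration pattern above; `load_models` and `pick_model` are hypothetical helper names, not part of `router.py`:

```python
import json

def load_models(config_path="config.json"):
    """Load the model list from a router config file."""
    with open(config_path) as f:
        return json.load(f)["models"]

def pick_model(models, tier):
    """Return the cheapest configured model in the given tier."""
    candidates = [m for m in models if m["tier"] == tier]
    if not candidates:
        raise ValueError(f"no model configured for tier {tier}")
    # Rank by input cost, breaking ties on output cost
    return min(candidates, key=lambda m: (m["input_cost_per_m"], m["output_cost_per_m"]))
```

Ranking by input cost keeps high-volume tiers on the cheapest qualifying model; a capability filter could be added when a task needs vision or function calling.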
### 3. Coding Task Routing (Important!)

For coding tasks specifically:

- Simple code tasks (lint fixes, small patches, single-file changes):
  - Use a MEDIUM tier model as the primary coder
  - Consider spawning a SIMPLE tier model as QA reviewer
  - Cost check: only use coder + QA if combined cost < using COMPLEX tier directly
- Complex code tasks (multi-file builds, architecture, debugging):
  - Use a COMPLEX or CRITICAL tier model directly
  - Skip delegation; premium models are more reliable and cost-effective for complex work
  - QA review is unnecessary when using top-tier models
Decision flow for coding:

```
IF task is simple code (lint, patch, single file):
    → {medium_model} as coder + optional {simple_model} QA
    → Only if (coder + QA cost) < {complex_model} solo

IF task is complex code (multi-file, architecture):
    → {complex_model} or {critical_model} directly
    → Skip delegation, skip QA — the model IS the quality
```
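The coder-plus-QA cost check can be sketched numerically. The per-million rates below are illustrative assumptions, not provider prices, and `run_cost` / `delegation_beats_solo` are hypothetical names:

```python
def run_cost(in_tokens, out_tokens, in_rate, out_rate):
    """Cost in USD for one run; rates are $ per million tokens."""
    return in_tokens / 1e6 * in_rate + out_tokens / 1e6 * out_rate

def delegation_beats_solo(in_tokens, out_tokens):
    """Compare MEDIUM coder + SIMPLE QA against one COMPLEX-tier run.

    Illustrative rates ($ per million tokens):
    MEDIUM $0.15/$0.60, SIMPLE $0.10/$0.40, COMPLEX $3.00/$15.00.
    """
    coder = run_cost(in_tokens, out_tokens, 0.15, 0.60)
    qa = run_cost(in_tokens // 2, out_tokens // 4, 0.10, 0.40)  # QA reviews a smaller diff
    solo = run_cost(in_tokens, out_tokens, 3.00, 15.00)
    return coder + qa < solo
```

With a wide price gap like this the delegation route usually wins; the check matters most when the MEDIUM and COMPLEX tiers are close in price.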
### 4. Usage Pattern

When spawning sub-agents, use the `model` parameter with IDs from your `config.json`:
```python
# Use the router CLI to classify first (optional but recommended)
# $ python scripts/router.py classify "check GitHub notifications"
# → Recommends: SIMPLE tier, {simple_model}

# Simple task — monitoring
sessions_spawn(
    task="Check GitHub notifications and summarize recent activity",
    model="{simple_model}",  # use your configured SIMPLE tier model ID
    label="github-monitor"
)

# Medium task — code fix
sessions_spawn(
    task="Fix lint errors in utils.js and write changes to disk",
    model="{medium_model}",  # use your configured MEDIUM tier model ID
    label="lint-fix"
)

# Complex task — feature development
sessions_spawn(
    task="Build authentication system with JWT, middleware, tests",
    model="{complex_model}",  # use your configured COMPLEX tier model ID
    label="auth-feature"
)

# Critical task — security audit
sessions_spawn(
    task="Security audit of authentication code for vulnerabilities",
    model="{critical_model}",  # use your configured CRITICAL tier model ID
    label="security-audit"
)
```
### 5. Cost Awareness

Understanding cost structures helps optimize routing decisions.

Typical cost ranges per million tokens (input / output):

- SIMPLE tier: $0.10-$0.50 / $0.10-$1.50
- MEDIUM tier: $0.40-$3.00 / $0.40-$15.00
- COMPLEX tier: $3.00-$5.00 / $1.30-$25.00
- CRITICAL tier: $5.00+ / $25.00+
Cost estimation:

```shell
# Estimate cost before running
python scripts/router.py cost-estimate "build authentication system"
# Output: Tier: COMPLEX, Est. cost: $0.024 USD
```
Rule of thumb:
- High-volume repetitive tasks → cheaper models
- One-off complex critical work → premium models
- When in doubt → estimate cost of both options and compare
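An estimate like this is linear in token counts. A sketch, using the COMPLEX-tier example rates from this document ($3.00 / $15.00 per million) and an assumed 5000 / 3000 input/output token split:

```python
def estimate_cost(in_tokens, out_tokens, in_rate_per_m, out_rate_per_m):
    """Estimated USD cost; rates are $ per million tokens."""
    return in_tokens / 1e6 * in_rate_per_m + out_tokens / 1e6 * out_rate_per_m

# 5000 input / 3000 output tokens at COMPLEX-tier rates ($3.00 / $15.00 per M)
print(round(estimate_cost(5000, 3000, 3.00, 15.00), 3))  # prints 0.06
```

At these rates the input side contributes $0.015 and the output side $0.045, so output pricing dominates for generation-heavy tasks.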
### 6. Fallback & Escalation Strategy

If a model produces unsatisfactory results:

- Identify the issue: Model limitation vs. task misclassification
- Escalate one tier: Try the next tier up for the same task
- Document failures: Note model-specific limitations for future routing
- Consider capabilities: Check if the model has required capabilities (vision, function-calling, etc.)
- Review classification: Was the task properly classified initially?

Escalation path:

SIMPLE → MEDIUM → COMPLEX → CRITICAL
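The escalation path maps to a one-line helper (a minimal sketch; the function name is illustrative):

```python
TIER_ORDER = ["SIMPLE", "MEDIUM", "COMPLEX", "CRITICAL"]

def escalate(tier):
    """Return the next tier up; CRITICAL is the ceiling."""
    idx = TIER_ORDER.index(tier)
    return TIER_ORDER[min(idx + 1, len(TIER_ORDER) - 1)]
```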
### 7. Decision Heuristics

Quick classification rules for common patterns:
SIMPLE tier indicators:
- Keywords: check, monitor, fetch, get, status, list, summarize
- High-frequency operations (heartbeats, polling)
- Well-defined API calls with minimal logic
- Data extraction without analysis
MEDIUM tier indicators:
- Keywords: fix, patch, update, research, analyze, test
- Code changes under ~50 lines
- Single-file modifications
- Research and documentation tasks
COMPLEX tier indicators:
- Keywords: build, create, architect, debug, design, integrate
- Multi-file changes or new features
- Complex debugging or troubleshooting
- System design and architecture work
CRITICAL tier indicators:
- Keywords: security, production, deploy, financial, audit
- Security-sensitive operations
- Production deployments
- Financial or legal analysis
- High-stakes decision-making
When in doubt: Go one tier up. Under-speccing costs more in retries than over-speccing costs in model price.
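The keyword indicators above can be turned into a first-pass classifier. A sketch only; substring keyword matching is a rough heuristic, and `router.py classify` may work differently. High-stakes CRITICAL keywords are checked first so they win over everything else:

```python
# Checked in priority order: high-stakes keywords override cheaper tiers
TIER_KEYWORDS = [
    ("CRITICAL", ["security", "production", "deploy", "financial", "audit"]),
    ("COMPLEX",  ["build", "create", "architect", "debug", "design", "integrate"]),
    ("MEDIUM",   ["fix", "patch", "update", "research", "analyze", "test"]),
    ("SIMPLE",   ["check", "monitor", "fetch", "get", "status", "list", "summarize"]),
]

def classify(task):
    """First-pass tier classification by keyword."""
    words = task.lower()
    for tier, keywords in TIER_KEYWORDS:
        if any(k in words for k in keywords):
            return tier
    return "MEDIUM"  # when in doubt, don't default to the cheapest tier
```

The no-match default is MEDIUM rather than SIMPLE, in line with the "go one tier up" rule.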
### 8. Extended Thinking Modes

Some models support extended thinking/reasoning, which improves quality but increases cost.

Models with thinking support:

- Anthropic Claude models: use `thinking="on"` or `thinking="budget_tokens:5000"`
- DeepSeek R1 variants: built-in chain-of-thought reasoning
- OpenAI o1/o3 models: native reasoning capabilities
When to use thinking:
- COMPLEX tier tasks requiring deep reasoning
- CRITICAL tier tasks where accuracy is paramount
- Multi-step logical problems
- Architecture and design decisions
When to avoid thinking:
- SIMPLE tier tasks (wasteful)
- MEDIUM tier routine operations
- High-frequency repetitive tasks
- Tasks where thinking tokens would unnecessarily multiply the cost 2-5x
```python
# Enable thinking for complex architectural work
sessions_spawn(
    task="Design scalable microservices architecture for payment system",
    model="{complex_model}",
    thinking="on",  # or thinking="budget_tokens:5000"
    label="architecture-design"
)
```
## Advanced Patterns

### Pattern 1: Two-Phase Processing

For large or uncertain tasks, use a cheaper model for initial work, then refine with a better model.

Note: sub-agents are asynchronous — results come back as notifications, not synchronous returns.
```python
# Phase 1: Draft with cheaper model
sessions_spawn(
    task="Draft initial API design document outline",
    model="{simple_model}",
    label="draft-phase"
)

# Wait for draft-phase to complete and write its output...

# Phase 2: Refine with capable model (after Phase 1 finishes)
sessions_spawn(
    task="Review and refine the draft at /tmp/api-draft.md, add detailed specs",
    model="{medium_model}",
    label="refine-phase"
)
```

Savings: process bulk content with the cheap model and use the expensive model only for refinement.
### Pattern 2: Batch Processing

Group multiple similar SIMPLE tasks together to reduce overhead:
```python
# Instead of spawning 10 separate agents:
tasks = [
    "Check server1 status",
    "Check server2 status",
    # ... 10 tasks
]

# Batch them:
sessions_spawn(
    task=f"Run these checks: {', '.join(tasks)}. Report any issues.",
    model="{simple_model}",
    label="batch-monitoring"
)
```
### Pattern 3: Tiered Escalation

Start with MEDIUM tier, escalate to COMPLEX if needed:

```python
# Try MEDIUM first
sessions_spawn(
    task="Debug intermittent test failures in test_auth.py",
    model="{medium_model}",
    label="debug-attempt-1"
)

# If insufficient, escalate (pseudocode: check the sub-agent's result first):
if debug_failed:
    sessions_spawn(
        task="Deep debug of test_auth.py failures (previous attempt incomplete)",
        model="{complex_model}",
        label="debug-attempt-2"
    )
```
### Pattern 4: Cost-Benefit Analysis

Before routing, consider:

- Criticality: How bad is failure? → Higher criticality = higher tier
- Cost delta: What's the price difference between tiers? → Small delta = lean toward higher tier
- Retry costs: Will failures require retries? → High retry cost = start with higher tier
- Time sensitivity: How urgent is completion? → Urgent = higher tier for speed/reliability
## Using the Router CLI

The included `router.py` script helps with classification and cost estimation:

```shell
# Classify a task and get a model recommendation
python scripts/router.py classify "fix authentication bug in login.py"
# Output: Classification: MEDIUM
#         Recommended Model: {medium_model}

# List all configured models
python scripts/router.py models
# Output: models grouped by tier

# Check configuration health
python scripts/router.py health
# Output: validates config.json structure

# Estimate task cost
python scripts/router.py cost-estimate "build payment processing system"
# Output: Tier: COMPLEX, Est. tokens: 5000/3000, Cost: $0.060 USD
```

Integration tip: run `router.py classify` before spawning agents to validate your tier selection.
## Configuration Guide

### Setting Up Your Models

- Inventory your models: List all LLM providers and models you have access to
- Gather pricing: Find input/output costs per million tokens from provider docs
- Assign tiers: Map models to SIMPLE/MEDIUM/COMPLEX/CRITICAL based on capability
- Document capabilities: Note what each model can do (vision, function-calling, etc.)
- Add notes: Include limitations or special characteristics
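A hand-edited model list can be sanity-checked in the same spirit as `router.py health`. This is a sketch assuming the field names from the configuration pattern above; `validate_models` is a hypothetical helper, not part of the shipped script:

```python
REQUIRED_FIELDS = {"id", "alias", "tier", "input_cost_per_m", "output_cost_per_m", "capabilities"}
VALID_TIERS = {"SIMPLE", "MEDIUM", "COMPLEX", "CRITICAL"}

def validate_models(models):
    """Return a list of problems; an empty list means the config looks healthy."""
    problems = []
    for m in models:
        missing = REQUIRED_FIELDS - m.keys()
        if missing:
            problems.append(f"{m.get('id', '?')}: missing {sorted(missing)}")
        if m.get("tier") not in VALID_TIERS:
            problems.append(f"{m.get('id', '?')}: invalid tier {m.get('tier')!r}")
    # Every tier should have at least one model
    covered = {m.get("tier") for m in models}
    for tier in sorted(VALID_TIERS - covered):
        problems.append(f"no model configured for tier {tier}")
    return problems
```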
### Example Multi-Provider Config

```json
{
  "models": [
    {
      "id": "local/ollama-qwen-1.5b",
      "alias": "Local Qwen",
      "tier": "SIMPLE",
      "provider": "ollama",
      "input_cost_per_m": 0.00,
      "output_cost_per_m": 0.00,
      "context_window": 32768,
      "capabilities": ["text"],
      "notes": "Free local model, good for testing and simple tasks"
    },
    {
      "id": "openai/gpt-4o-mini",
      "alias": "GPT-4o Mini",
      "tier": "MEDIUM",
      "provider": "openai",
      "input_cost_per_m": 0.15,
      "output_cost_per_m": 0.60,
      "context_window": 128000,
      "capabilities": ["text", "code", "vision"],
      "notes": "Great balance of cost and capability"
    },
    {
      "id": "anthropic/claude-sonnet-4",
      "alias": "Claude Sonnet",
      "tier": "COMPLEX",
      "provider": "anthropic",
      "input_cost_per_m": 3.00,
      "output_cost_per_m": 15.00,
      "context_window": 200000,
      "capabilities": ["text", "code", "reasoning", "vision"],
      "notes": "Excellent for complex coding and analysis"
    },
    {
      "id": "anthropic/claude-opus-4",
      "alias": "Claude Opus",
      "tier": "CRITICAL",
      "provider": "anthropic",
      "input_cost_per_m": 15.00,
      "output_cost_per_m": 75.00,
      "context_window": 200000,
      "capabilities": ["text", "code", "reasoning", "vision", "function-calling"],
      "notes": "Best available for critical operations"
    }
  ]
}
```
### Validation Checklist

- At least one model per tier (SIMPLE, MEDIUM, COMPLEX, CRITICAL)
- All models have required fields (id, alias, tier, costs, capabilities)
- Model IDs match your actual provider/model format
- Costs are accurate per million tokens
- Tiers make sense relative to each other (SIMPLE cheaper than CRITICAL)
- Run `python scripts/router.py health` to validate
## Resources

For additional guidance:

- `references/model-catalog.md` - Guide to evaluating and selecting models for each tier
- `references/examples.md` - Real-world routing patterns and examples
- `config.json` - Your model configuration (customize this!)
## Quick Reference Card

Classification:

- "check/monitor/fetch" → SIMPLE tier
- "fix/patch/research" → MEDIUM tier
- "build/debug/architect" → COMPLEX tier
- "security/production/financial" → CRITICAL tier

Coding tasks:

- Simple code → {medium_model} + optional {simple_model} QA
- Complex code → {complex_model} or {critical_model} directly

Cost optimization:

- High-volume tasks → cheaper models
- One-off complex tasks → premium models
- When uncertain → estimate both options

General rule: When in doubt, go one tier up — retries cost more than the quality premium.