OpenRouter API for AI Agents

Expert guidance for AI agents integrating with the OpenRouter API, which provides unified access to 400+ models from 90+ providers.

When to use this skill:

  • Making chat completions via OpenRouter API
  • Selecting appropriate models and variants
  • Implementing streaming responses
  • Using tool/function calling
  • Enforcing structured outputs
  • Integrating web search
  • Handling multimodal inputs (images, audio, video, PDFs)
  • Managing model routing and fallbacks
  • Handling errors and retries
  • Optimizing cost and performance

API Basics

Making a Request

Endpoint: POST https://openrouter.ai/api/v1/chat/completions

Headers (required):

{
  'Authorization': `Bearer ${apiKey}`,
  'Content-Type': 'application/json',
  // Optional: for app attribution
  'HTTP-Referer': 'https://your-app.com',
  'X-Title': 'Your App Name'
}

Minimal request structure:

const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'user', content: 'Your prompt here' }
    ]
  })
});

Response Structure

Non-streaming response:

{
  "id": "gen-abc123",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Response text here"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "model": "anthropic/claude-3.5-sonnet"
}

Key fields:

  • choices[0].message.content - The assistant's response
  • choices[0].finish_reason - Why generation stopped (stop, length, tool_calls, etc.)
  • usage - Token counts and cost information
  • model - Actual model used (may differ from requested)

When to Use Streaming vs Non-Streaming

Use streaming (stream: true) when:

  • Real-time responses needed (chat interfaces, interactive tools)
  • Latency matters (user-facing applications)
  • Large responses expected (long-form content)
  • Want to show progressive output

Use non-streaming when:

  • Processing in background (batch jobs, async tasks)
  • Need complete response before processing
  • Building an API endpoint
  • Response is short (few tokens)

Streaming basics:

const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: { /* ... */ },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: '...' }],
    stream: true
  })
});

const decoder = new TextDecoder();
let buffer = '';

// Node 18+: response.body is async-iterable; older browsers need getReader()
for await (const chunk of response.body) {
  // A network chunk may end mid-line, so keep the incomplete tail in the buffer
  buffer += decoder.decode(chunk, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? '';

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6); // Remove 'data: '
    if (data === '[DONE]') continue;

    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      // Accumulate or display content
    }
  }
}

Model Selection

Model Identifier Format

Format: provider/model-name[:variant]

Examples:

  • anthropic/claude-3.5-sonnet - Specific model
  • openai/gpt-4o:online - With web search enabled
  • google/gemini-2.0-flash:free - Free tier variant
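The identifier format can be split apart with a small helper (a sketch; `parseModelId` is not part of any SDK):

```typescript
interface ModelId {
  provider: string;
  model: string;
  variant?: string;
}

// Parse "provider/model-name[:variant]" into its parts.
// Assumes a single '/' separates provider from model, as in the examples above.
function parseModelId(id: string): ModelId {
  const slash = id.indexOf('/');
  if (slash === -1) throw new Error(`Expected provider/model, got: ${id}`);
  const provider = id.slice(0, slash);
  const rest = id.slice(slash + 1);
  const colon = rest.lastIndexOf(':');
  if (colon === -1) return { provider, model: rest };
  return { provider, model: rest.slice(0, colon), variant: rest.slice(colon + 1) };
}
```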

Model Variants and When to Use Them

Variants, when to use them, and tradeoffs:

  • :free - Cost is the primary concern, testing, prototyping. Tradeoffs: rate limits, lower-quality models
  • :online - Need current information or real-time data. Tradeoffs: higher cost, web-search latency
  • :extended - Large context window needed. Tradeoffs: may be slower, higher cost
  • :thinking - Complex reasoning, multi-step problems. Tradeoffs: higher token usage, slower
  • :nitro - Speed is critical. Tradeoffs: may trade off quality
  • :exacto - Need a specific provider. Tradeoffs: no fallbacks, may be less available

Default Model Choices by Task

General purpose: anthropic/claude-3.5-sonnet or openai/gpt-4o

  • Balanced quality, speed, cost
  • Good for most tasks

Coding: anthropic/claude-3.5-sonnet or openai/gpt-4o

  • Strong code generation and understanding
  • Good reasoning

Complex reasoning: anthropic/claude-opus-4:thinking or openai/o3

  • Deep reasoning capabilities
  • Higher cost, slower

Fast responses: openai/gpt-4o-mini:nitro or google/gemini-2.0-flash

  • Minimal latency
  • Good for real-time applications

Cost-sensitive: google/gemini-2.0-flash:free or meta-llama/llama-3.1-70b:free

  • No cost with limits
  • Good for high-volume, lower-complexity tasks

Current information: anthropic/claude-3.5-sonnet:online or google/gemini-2.5-pro:online

  • Web search built-in
  • Real-time data

Large context: anthropic/claude-3.5-sonnet:extended or google/gemini-2.5-pro:extended

  • 200K+ context windows
  • Document analysis, codebase understanding

Provider Routing Preferences

Default behavior: OpenRouter automatically selects best provider

Explicit provider order:

{
  provider: {
    order: ['anthropic', 'openai', 'google'],
    allow_fallbacks: true,
    sort: 'price' // 'price', 'latency', or 'throughput'
  }
}

When to set provider order:

  • Have preferred provider arrangements
  • Need to optimize for specific metric (cost, speed)
  • Want to exclude certain providers
  • Have BYOK (Bring Your Own Key) for specific providers

Model Fallbacks

Automatic fallback - try multiple models in order:

{
  models: [
    'anthropic/claude-3.5-sonnet',
    'openai/gpt-4o',
    'google/gemini-2.0-flash'
  ]
}

When to use fallbacks:

  • High reliability required
  • Multiple providers acceptable
  • Want graceful degradation
  • Avoid single point of failure

Fallback behavior:

  • Tries the first model
  • Falls back to the next on error (5xx, 429, timeout)
  • Uses whichever model succeeds first
  • Returns the model actually used in the response's model field

Parameters You Need

Core Parameters

model (string, optional)

  • Which model to use
  • Default: user's default model
  • Always specify for consistency

messages (Message[], required)

  • Conversation history
  • Structure: { role: 'user'|'assistant'|'system', content: string | ContentPart[] }
  • For multimodal: content can be array of text and image_url parts

stream (boolean, default: false)

  • Enable Server-Sent Events streaming
  • Use for real-time responses

temperature (float, 0.0-2.0, default: 1.0)

  • Controls randomness
  • 0.0-0.3: Deterministic, factual responses (code, precise answers)
  • 0.4-0.7: Balanced (general use)
  • 0.8-1.2: Creative (brainstorming, creative writing)
  • 1.3-2.0: Highly creative, unpredictable (experimental)

max_tokens (integer, optional)

  • Maximum tokens to generate
  • Always set to control cost and prevent runaway responses
  • Typical: 100-500 for short, 1000-2000 for long responses
  • Model limit: context_length - prompt_length

top_p (float, 0.0-1.0, default: 1.0)

  • Nucleus sampling - limits to top probability mass
  • Use instead of temperature when you want predictable diversity
  • 0.9-0.95: Common settings for quality

top_k (integer, 0+, default: 0/disabled)

  • Limit to K most likely tokens
  • 1: Always most likely (deterministic)
  • 40-50: Balanced
  • Not available for OpenAI models

Sampling Strategy Guidelines

  • Code generation: temperature: 0.1-0.3, top_p: 0.95
  • Factual responses: temperature: 0.0-0.2
  • Creative writing: temperature: 0.8-1.2
  • Brainstorming: temperature: 1.0-1.5
  • Chat: temperature: 0.6-0.8
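These guidelines can be centralized in a small helper (a sketch; the task names are illustrative and the values are midpoints of the ranges above):

```typescript
type Task = 'code' | 'factual' | 'creative' | 'brainstorm' | 'chat';

// Map a task category to sampling parameters, per the guidelines above.
function samplingParams(task: Task): { temperature: number; top_p?: number } {
  switch (task) {
    case 'code':       return { temperature: 0.2, top_p: 0.95 };
    case 'factual':    return { temperature: 0.1 };
    case 'creative':   return { temperature: 1.0 };
    case 'brainstorm': return { temperature: 1.2 };
    default:           return { temperature: 0.7 }; // 'chat'
  }
}
```

Spread the result into the request body, e.g. `{ model, messages, ...samplingParams('code') }`.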

Tool Calling Parameters

tools (Tool[], default: [])

  • Available functions for model to call
  • Structure:
{
  type: 'function',
  function: {
    name: 'function_name',
    description: 'What it does',
    parameters: { /* JSON Schema */ }
  }
}

tool_choice (string | object, default: 'auto')

  • Control when tools are called
  • 'auto': Model decides (default)
  • 'none': Never call tools
  • 'required': Must call a tool
  • { type: 'function', function: { name: 'specific_tool' } }: Force specific tool

parallel_tool_calls (boolean, default: true)

  • Allow multiple tools simultaneously
  • Set false for sequential execution

When to use tools:

  • Need to query external APIs (weather, search, database)
  • Need to perform calculations or data processing
  • Building agentic systems
  • Need structured data extraction
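For instance, a hypothetical get_weather tool (the name, description, and parameters here are illustrative, not a real API) would be declared and attached to a request like this:

```typescript
// Tool declaration following the structure shown above
const tools = [{
  type: 'function' as const,
  function: {
    name: 'get_weather',
    description: 'Get the current weather for a city',
    parameters: {
      type: 'object',
      properties: {
        city: { type: 'string', description: 'City name, e.g. "Paris"' },
        unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },
      },
      required: ['city'],
    },
  },
}];

// The request body carries the tools array alongside the messages
const body = {
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Weather in Paris?' }],
  tools,
  tool_choice: 'auto',
};
```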

Structured Output Parameters

response_format (object, optional)

  • Enforce specific output format

JSON object mode:

{ type: 'json_object' }
  • Model returns valid JSON
  • Must also instruct model in system message

JSON Schema mode (strict):

{
  type: 'json_schema',
  json_schema: {
    name: 'schema_name',
    strict: true,
    schema: { /* JSON Schema */ }
  }
}
  • Model returns JSON matching exact schema
  • Use when structure is critical (APIs, data processing)

When to use structured outputs:

  • Need predictable response format
  • Integrating with systems (APIs, databases)
  • Data extraction
  • Form filling

Web Search Parameters

Enable via model variant (simplest):

{ model: 'anthropic/claude-3.5-sonnet:online' }

Enable via plugin:

{
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5
  }]
}

When to use web search:

  • Need current information (news, prices, events)
  • User asks about recent developments
  • Need factual verification
  • Topic requires real-time data

Other Important Parameters

user (string, optional)

  • Stable identifier for end-user
  • Set when you have user IDs
  • Helps with abuse detection and caching

session_id (string, optional)

  • Group related requests
  • Set for conversation tracking
  • Improves caching and observability

metadata (Record<string, string>, optional)

  • Custom metadata (max 16 key-value pairs)
  • Use for analytics and tracking
  • Keys: max 64 chars, Values: max 512 chars

stop (string | string[], optional)

  • Stop sequences to halt generation
  • Common: ['\n\n', '###', 'END']

Handling Responses

Non-Streaming Responses

Extract content:

const response = await fetch(/* ... */);
const data = await response.json();

const content = data.choices[0].message.content;
const finishReason = data.choices[0].finish_reason;
const usage = data.usage;

Check for tool calls:

const toolCalls = data.choices[0].message.tool_calls;
if (toolCalls) {
  // Model wants to call tools
  for (const toolCall of toolCalls) {
    const { name, arguments: args } = toolCall.function;
    const parsedArgs = JSON.parse(args);
    // Execute tool...
  }
}

Streaming Responses

Process SSE stream:

let fullContent = '';
const response = await fetch(/* ... */);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // A network chunk may end mid-line, so keep the incomplete tail in the buffer
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? '';

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6);
    if (data === '[DONE]') continue;

    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      fullContent += content;
      // Process incrementally...
    }

    // The final chunk carries usage information
    if (parsed.usage) {
      console.log('Usage:', parsed.usage);
    }
  }
}

Handle streaming tool calls:

// Tool calls stream across multiple chunks; `chunks` stands in for the
// parsed SSE objects produced by a loop like the one above
let currentToolCall = null;
let toolArgs = '';

for (const parsed of chunks) {
  const toolCallChunk = parsed.choices?.[0]?.delta?.tool_calls?.[0];

  if (toolCallChunk?.function?.name) {
    currentToolCall = { id: toolCallChunk.id, ...toolCallChunk.function };
  }

  if (toolCallChunk?.function?.arguments) {
    toolArgs += toolCallChunk.function.arguments;
  }

  if (parsed.choices?.[0]?.finish_reason === 'tool_calls' && currentToolCall) {
    // Complete tool call
    currentToolCall.arguments = toolArgs;
    // Execute tool...
  }
}

Usage and Cost Tracking

const { usage } = data;
console.log(`Prompt: ${usage.prompt_tokens}`);
console.log(`Completion: ${usage.completion_tokens}`);
console.log(`Total: ${usage.total_tokens}`);

// Cost (if available)
if (usage.cost) {
  console.log(`Cost: $${usage.cost.toFixed(6)}`);
}

// Detailed breakdown
console.log(usage.prompt_tokens_details);
console.log(usage.completion_tokens_details);

Error Handling

Common HTTP Status Codes

400 Bad Request

  • Invalid request format
  • Missing required fields
  • Parameter out of range
  • Fix: Validate request structure and parameters

401 Unauthorized

  • Missing or invalid API key
  • Fix: Check API key format and permissions

402 Payment Required

  • Insufficient credits
  • Fix: Add credits to account

403 Forbidden

  • Insufficient permissions
  • Model not allowed
  • Fix: Check guardrails, model access, API key permissions

408 Request Timeout

  • Request took too long
  • Fix: Reduce prompt length, use streaming, try simpler model

429 Rate Limited

  • Too many requests
  • Fix: Implement exponential backoff, reduce request rate

502 Bad Gateway

  • Provider error
  • Fix: Use model fallbacks, retry with different model

503 Service Unavailable

  • Service overloaded
  • Fix: Retry with backoff, use fallbacks

Retry Strategy

Exponential backoff:

const backoff = (attempt) =>
  new Promise(resolve => setTimeout(resolve, Math.min(1000 * 2 ** attempt, 10000)));

async function requestWithRetry(url, options, maxRetries = 3) {
  let lastError;

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    let response;
    try {
      response = await fetch(url, options);
    } catch (error) {
      // Network error: retry with backoff
      lastError = error;
      await backoff(attempt);
      continue;
    }

    if (response.ok) {
      return await response.json();
    }

    // Retry on timeout, rate limit, or server errors
    if (response.status === 408 || response.status === 429 || response.status >= 500) {
      await backoff(attempt);
      continue;
    }

    // Client errors (400, 401, 402, 403) won't succeed on retry
    throw new Error(`Request failed with status ${response.status}`);
  }

  throw lastError ?? new Error('Max retries exceeded');
}

Retryable status codes: 408, 429, 502, 503
Do not retry: 400, 401, 402, 403
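This distinction fits in a single predicate (a small sketch matching the status codes listed here):

```typescript
// 408/429/5xx are transient; 4xx client errors (bad request, auth, credits,
// permissions) will fail the same way on every retry.
function isRetryable(status: number): boolean {
  return status === 408 || status === 429 || status >= 500;
}
```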

Graceful Degradation

Use model fallbacks:

{
  models: [
    'anthropic/claude-3.5-sonnet',  // Primary
    'openai/gpt-4o',                // Fallback 1
    'google/gemini-2.0-flash'        // Fallback 2
  ]
}

Handle partial failures:

  • Log errors but continue
  • Fall back to simpler features
  • Use cached responses when available
  • Provide degraded experience rather than failing completely
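One way to sketch these points together: try each model in turn through an injected call function, falling back to a cached answer if all fail (`callModel` and the cache argument are assumptions for illustration, not OpenRouter features):

```typescript
type CallModel = (model: string) => Promise<string>;

// Try each model in order; serve a cached answer if every model fails.
async function withFallbacks(
  models: string[],
  callModel: CallModel,
  cached?: string,
): Promise<string> {
  for (const model of models) {
    try {
      return await callModel(model);
    } catch (err) {
      console.error(`Model ${model} failed:`, err); // log, then continue
    }
  }
  if (cached !== undefined) return cached; // degraded: stale but not broken
  throw new Error('All models failed and no cached response is available');
}
```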

Advanced Features

When to Use Tool Calling

Good use cases:

  • Querying external APIs (weather, stock prices, databases)
  • Performing calculations or data processing
  • Extracting structured data from unstructured text
  • Building agentic systems with multiple steps
  • When decisions require external information

Implementation pattern:

  1. Define tools with clear descriptions and parameters
  2. Send request with tools array
  3. Check if tool_calls present in response
  4. Execute tools with parsed arguments
  5. Send tool results back in a new request
  6. Repeat until model provides final answer
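The six steps above can be sketched as a loop; `callModel` and `executeTool` are injected stand-ins for the HTTP request and your tool implementations:

```typescript
interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}

interface Message {
  role: string;
  content: string | null;
  tool_call_id?: string;
  tool_calls?: ToolCall[];
}

// Drive the model until it answers without requesting tools.
async function agenticLoop(
  messages: Message[],
  callModel: (messages: Message[]) => Promise<Message>,
  executeTool: (name: string, args: unknown) => Promise<string>,
  maxTurns = 5,
): Promise<string> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const assistant = await callModel(messages);
    messages.push(assistant);

    // No tool calls: the assistant's content is the final answer
    if (!assistant.tool_calls?.length) return assistant.content ?? '';

    // Execute each requested tool and append its result as a 'tool' message
    for (const call of assistant.tool_calls) {
      const result = await executeTool(
        call.function.name,
        JSON.parse(call.function.arguments),
      );
      messages.push({ role: 'tool', content: result, tool_call_id: call.id });
    }
  }
  throw new Error('Max turns exceeded without a final answer');
}
```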

See: references/ADVANCED_PATTERNS.md for complete agentic loop implementation

When to Use Structured Outputs

Good use cases:

  • API responses (need specific schema)
  • Data extraction (forms, documents)
  • Configuration files (JSON, YAML)
  • Database operations (structured queries)
  • When downstream processing requires specific format

Implementation pattern:

  1. Define JSON Schema for desired output
  2. Set response_format: { type: 'json_schema', json_schema: { ... } }
  3. Instruct model to produce JSON (system or user message)
  4. Validate response against schema
  5. Handle parsing errors gracefully
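A minimal sketch of these steps, with a hand-rolled required-keys check standing in for a full schema validator (the `person` schema is illustrative; production code would use a validator such as Ajv):

```typescript
// Step 1: define the JSON Schema for the desired output
const schema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    age: { type: 'number' },
  },
  required: ['name', 'age'],
  additionalProperties: false,
};

// Step 2: this object goes in the request as response_format
const responseFormat = {
  type: 'json_schema',
  json_schema: { name: 'person', strict: true, schema },
};

// Steps 4-5: parse the model's reply and check required keys, failing loudly
function validate(raw: string): { name: string; age: number } {
  let parsed: any;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error('Model did not return valid JSON');
  }
  for (const key of schema.required) {
    if (!(key in parsed)) throw new Error(`Missing required key: ${key}`);
  }
  return parsed;
}
```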

Add response healing for robustness:

{
  response_format: { /* ... */ },
  plugins: [{ id: 'response-healing' }]
}

When to Use Web Search

Good use cases:

  • User asks about recent events, news, or current data
  • Need verification of facts
  • Questions with time-sensitive information
  • Topic requires up-to-date information
  • User explicitly requests current information

Simple implementation (variant):

{
  model: 'anthropic/claude-3.5-sonnet:online'
}

Advanced implementation (plugin):

{
  model: 'openrouter/auto',
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5,
    engine: 'exa' // or 'native'
  }]
}

When to Use Multimodal Inputs

Images (vision):

  • OCR, image understanding, visual analysis
  • Models: openai/gpt-4o, anthropic/claude-3.5-sonnet, google/gemini-2.5-pro

Audio:

  • Speech-to-text, audio analysis
  • Models with audio support

Video:

  • Video understanding, frame analysis
  • Models with video support

PDFs:

  • Document parsing, content extraction
  • Requires file-parser plugin

Implementation: See references/ADVANCED_PATTERNS.md for multimodal patterns


Best Practices for AI

Default Model Selection

Start with: anthropic/claude-3.5-sonnet or openai/gpt-4o

  • Good balance of quality, speed, cost
  • Strong at most tasks
  • Wide compatibility

Switch based on needs:

  • Need speed → openai/gpt-4o-mini:nitro or google/gemini-2.0-flash
  • Complex reasoning → anthropic/claude-opus-4:thinking
  • Need web search → :online variant
  • Large context → :extended variant
  • Cost-sensitive → :free variant

Default Parameters

{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [...],
  temperature: 0.6,  // Balanced creativity
  max_tokens: 1000,   // Reasonable length
  top_p: 0.95        // Common for quality
}

Adjust based on task:

  • Code: temperature: 0.2
  • Creative: temperature: 1.0
  • Factual: temperature: 0.0-0.3

When to Prefer Streaming

Always prefer streaming when:

  • User-facing (chat, interactive tools)
  • Response length unknown
  • Want progressive feedback
  • Latency matters

Use non-streaming when:

  • Batch processing
  • Need complete response before acting
  • Building API endpoints
  • Very short responses (< 50 tokens)

When to Enable Specific Features

  • Tools: enable when you need external data or actions
  • Structured outputs: enable when response format matters
  • Web search: enable when current information is needed
  • Streaming: enable for user-facing, real-time responses
  • Model fallbacks: enable when reliability is critical
  • Provider routing: enable when you have provider preferences or constraints

Cost Optimization Patterns

Use free models for:

  • Testing and prototyping
  • Low-complexity tasks
  • High-volume, low-value operations

Use routing to optimize:

{
  provider: {
    order: ['openai', 'anthropic'],
    sort: 'price',  // Optimize for cost
    allow_fallbacks: true
  }
}

  • Set max_tokens to prevent runaway responses
  • Use caching via the user and session_id parameters
  • Enable prompt caching when supported

Performance Optimization

Reduce latency:

  • Use :nitro variants for speed
  • Use streaming for perceived speed
  • Set user ID for caching benefits
  • Choose faster models (mini, flash) when quality allows

Increase throughput:

  • Use provider routing with sort: 'throughput'
  • Parallelize independent requests
  • Use streaming to reduce wait time

Optimize for specific metrics:

{
  provider: {
    sort: 'latency'  // or 'price' or 'throughput'
  }
}

Progressive Disclosure

For detailed reference information, consult:

Parameters Reference

File: references/PARAMETERS.md

  • Complete parameter reference (50+ parameters)
  • Types, ranges, defaults
  • Parameter support by model
  • Usage examples

Error Codes Reference

File: references/ERROR_CODES.md

  • All HTTP status codes
  • Error response structure
  • Error metadata types
  • Native finish reasons
  • Retry strategies

Model Selection Guide

File: references/MODEL_SELECTION.md

  • Model families and capabilities
  • Model variants explained
  • Selection criteria by use case
  • Model capability matrix
  • Provider routing preferences

Routing Strategies

File: references/ROUTING_STRATEGIES.md

  • Model fallbacks configuration
  • Provider selection patterns
  • Auto router setup
  • Routing by use case (cost, latency, quality)

Advanced Patterns

File: references/ADVANCED_PATTERNS.md

  • Tool calling with agentic loops
  • Structured outputs implementation
  • Web search integration
  • Multimodal handling
  • Streaming patterns
  • Framework integrations

Working Examples

File: references/EXAMPLES.md

  • TypeScript patterns for common tasks
  • Python examples
  • cURL examples
  • Advanced patterns
  • Framework integration examples

Ready-to-Use Templates

Directory: templates/

  • basic-request.ts - Minimal working request
  • streaming-request.ts - SSE streaming with cancellation
  • tool-calling.ts - Complete agentic loop with tools
  • structured-output.ts - JSON Schema enforcement
  • error-handling.ts - Robust retry logic

Quick Reference

Minimal Request

{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Your prompt' }]
}

With Streaming

{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  stream: true
}

With Tools

{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  tools: [{ type: 'function', function: { name, description, parameters } }],
  tool_choice: 'auto'
}

With Structured Output

{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'system', content: 'Output JSON only...' }],
  response_format: { type: 'json_object' }
}

With Web Search

{
  model: 'anthropic/claude-3.5-sonnet:online',
  messages: [{ role: 'user', content: '...' }]
}

With Model Fallbacks

{
  models: ['anthropic/claude-3.5-sonnet', 'openai/gpt-4o'],
  messages: [{ role: 'user', content: '...' }]
}

Remember: OpenRouter is OpenAI-compatible. Use the OpenAI SDK with baseURL: 'https://openrouter.ai/api/v1' for a familiar experience.
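For example, with the official openai npm package (a configuration sketch; assumes the package is installed and an OPENROUTER_API_KEY environment variable is set):

```typescript
import OpenAI from 'openai';

// Point the OpenAI SDK at OpenRouter's OpenAI-compatible endpoint
const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
});

const completion = await client.chat.completions.create({
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(completion.choices[0].message.content);
```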
