OpenRouter API for AI Agents

Expert guidance for AI agents integrating with OpenRouter API - unified access to 400+ models from 90+ providers.

When to use this skill:

Making chat completions via OpenRouter API
Selecting appropriate models and variants
Implementing streaming responses
Using tool/function calling
Enforcing structured outputs
Integrating web search
Handling multimodal inputs (images, audio, video, PDFs)
Managing model routing and fallbacks
Handling errors and retries
Optimizing cost and performance

API Basics

Making a Request

Endpoint: POST https://openrouter.ai/api/v1/chat/completions

Headers (required):

{
  'Authorization': `Bearer ${apiKey}`,
  'Content-Type': 'application/json',
  // Optional: for app attribution
  'HTTP-Referer': 'https://your-app.com',
  'X-Title': 'Your App Name'
}

Minimal request structure:

const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'user', content: 'Your prompt here' }
    ]
  })
});

Response Structure

Non-streaming response:

{
  "id": "gen-abc123",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Response text here"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "model": "anthropic/claude-3.5-sonnet"
}

Key fields:

choices[0].message.content - The assistant's response
choices[0].finish_reason - Why generation stopped (stop, length, tool_calls, etc.)
usage - Token counts and cost information
model - Actual model used (may differ from requested)

When to Use Streaming vs Non-Streaming

Use streaming (stream: true) when:

Real-time responses needed (chat interfaces, interactive tools)
Latency matters (user-facing applications)
Large responses expected (long-form content)
Want to show progressive output

Use non-streaming when:

Processing in background (batch jobs, async tasks)
Need complete response before processing
Building to an API/endpoint
Response is short (few tokens)

Streaming basics:

const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: { /* ... */ },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: '...' }],
    stream: true
  })
});

for await (const chunk of response.body) {
  const text = new TextDecoder().decode(chunk);
  const lines = text.split('\n').filter(line => line.startsWith('data: '));

  for (const line of lines) {
    const data = line.slice(6); // Remove 'data: '
    if (data === '[DONE]') break;

    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      // Accumulate or display content
    }
  }
}

Model Selection

Model Identifier Format

Format: provider/model-name[:variant]

Examples:

anthropic/claude-3.5-sonnet - Specific model
openai/gpt-4o:online - With web search enabled
google/gemini-2.0-flash:free - Free tier variant

Model Variants and When to Use Them

Variant	Use When	Tradeoffs
`:free`	Cost is primary concern, testing, prototyping	Rate limits, lower quality models
`:online`	Need current information, real-time data	Higher cost, web search latency
`:extended`	Large context window needed	May be slower, higher cost
`:thinking`	Complex reasoning, multi-step problems	Higher token usage, slower
`:nitro`	Speed is critical	May have quality tradeoffs
`:exacto`	Need specific provider	No fallbacks, may be less available

Default Model Choices by Task

General purpose: anthropic/claude-3.5-sonnet or openai/gpt-4o

Balanced quality, speed, cost
Good for most tasks

Coding: anthropic/claude-3.5-sonnet or openai/gpt-4o

Strong code generation and understanding
Good reasoning

Complex reasoning: anthropic/claude-opus-4:thinking or openai/o3

Deep reasoning capabilities
Higher cost, slower

Fast responses: openai/gpt-4o-mini:nitro or google/gemini-2.0-flash

Minimal latency
Good for real-time applications

Cost-sensitive: google/gemini-2.0-flash:free or meta-llama/llama-3.1-70b:free

No cost with limits
Good for high-volume, lower-complexity tasks

Current information: anthropic/claude-3.5-sonnet:online or google/gemini-2.5-pro:online

Web search built-in
Real-time data

Large context: anthropic/claude-3.5-sonnet:extended or google/gemini-2.5-pro:extended

200K+ context windows
Document analysis, codebase understanding

Provider Routing Preferences

Default behavior: OpenRouter automatically selects best provider

Explicit provider order:

{
  provider: {
    order: ['anthropic', 'openai', 'google'],
    allow_fallbacks: true,
    sort: 'price' // 'price', 'latency', or 'throughput'
  }
}

When to set provider order:

Have preferred provider arrangements
Need to optimize for specific metric (cost, speed)
Want to exclude certain providers
Have BYOK (Bring Your Own Key) for specific providers

Model Fallbacks

Automatic fallback - try multiple models in order:

{
  models: [
    'anthropic/claude-3.5-sonnet',
    'openai/gpt-4o',
    'google/gemini-2.0-flash'
  ]
}

When to use fallbacks:

High reliability required
Multiple providers acceptable
Want graceful degradation
Avoid single point of failure

Fallback behavior:

Tries first model
Falls to next on error (5xx, 429, timeout)
Uses whichever succeeds
Returns which model was used in model field

Parameters You Need

Core Parameters

model (string, optional)

Which model to use
Default: user's default model
Always specify for consistency

messages (Message[], required)

Conversation history
Structure: { role: 'user'|'assistant'|'system', content: string | ContentPart[] }
For multimodal: content can be array of text and image_url parts

stream (boolean, default: false)

Enable Server-Sent Events streaming
Use for real-time responses

temperature (float, 0.0-2.0, default: 1.0)

Controls randomness
0.0-0.3: Deterministic, factual responses (code, precise answers)
0.4-0.7: Balanced (general use)
0.8-1.2: Creative (brainstorming, creative writing)
1.3-2.0: Highly creative, unpredictable (experimental)

max_tokens (integer, optional)

Maximum tokens to generate
Always set to control cost and prevent runaway responses
Typical: 100-500 for short, 1000-2000 for long responses
Model limit: context_length - prompt_length

top_p (float, 0.0-1.0, default: 1.0)

Nucleus sampling - limits to top probability mass
Use instead of temperature when you want predictable diversity
0.9-0.95: Common settings for quality

top_k (integer, 0+, default: 0/disabled)

Limit to K most likely tokens
1: Always most likely (deterministic)
40-50: Balanced
Not available for OpenAI models

Sampling Strategy Guidelines

For code generation: temperature: 0.1-0.3, top_p: 0.95 For factual responses: temperature: 0.0-0.2 For creative writing: temperature: 0.8-1.2 For brainstorming: temperature: 1.0-1.5 For chat: temperature: 0.6-0.8

Tool Calling Parameters

tools (Tool[], default: [])

Available functions for model to call
Structure:

{
  type: 'function',
  function: {
    name: 'function_name',
    description: 'What it does',
    parameters: { /* JSON Schema */ }
  }
}

tool_choice (string | object, default: 'auto')

Control when tools are called
'auto': Model decides (default)
'none': Never call tools
'required': Must call a tool
{ type: 'function', function: { name: 'specific_tool' } }: Force specific tool

parallel_tool_calls (boolean, default: true)

Allow multiple tools simultaneously
Set false for sequential execution

When to use tools:

Need to query external APIs (weather, search, database)
Need to perform calculations or data processing
Building agentic systems
Need structured data extraction

Structured Output Parameters

response_format (object, optional)

Enforce specific output format

JSON object mode:

{ type: 'json_object' }

Model returns valid JSON
Must also instruct model in system message

JSON Schema mode (strict):

{
  type: 'json_schema',
  json_schema: {
    name: 'schema_name',
    strict: true,
    schema: { /* JSON Schema */ }
  }
}

Model returns JSON matching exact schema
Use when structure is critical (APIs, data processing)

When to use structured outputs:

Need predictable response format
Integrating with systems (APIs, databases)
Data extraction
Form filling

Web Search Parameters

Enable via model variant (simplest):

{ model: 'anthropic/claude-3.5-sonnet:online' }

Enable via plugin:

{
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5
  }]
}

When to use web search:

Need current information (news, prices, events)
User asks about recent developments
Need factual verification
Topic requires real-time data

Other Important Parameters

user (string, optional)

Stable identifier for end-user
Set when you have user IDs
Helps with abuse detection and caching

session_id (string, optional)

Group related requests
Set for conversation tracking
Improves caching and observability

metadata (Record<string, string>, optional)

Custom metadata (max 16 key-value pairs)
Use for analytics and tracking
Keys: max 64 chars, Values: max 512 chars

stop (string | string[], optional)

Stop sequences to halt generation
Common: ['\n\n', '###', 'END']

Handling Responses

Non-Streaming Responses

Extract content:

const response = await fetch(/* ... */);
const data = await response.json();

const content = data.choices[0].message.content;
const finishReason = data.choices[0].finish_reason;
const usage = data.usage;

Check for tool calls:

const toolCalls = data.choices[0].message.tool_calls;
if (toolCalls) {
  // Model wants to call tools
  for (const toolCall of toolCalls) {
    const { name, arguments: args } = toolCall.function;
    const parsedArgs = JSON.parse(args);
    // Execute tool...
  }
}

Streaming Responses

Process SSE stream:

let fullContent = '';
const response = await fetch(/* ... */);

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  const chunk = decoder.decode(value);
  const lines = chunk.split('\n').filter(line => line.startsWith('data: '));

  for (const line of lines) {
    const data = line.slice(6);
    if (data === '[DONE]') break;

    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      fullContent += content;
      // Process incrementally...
    }

    // Handle usage in final chunk
    if (parsed.usage) {
      console.log('Usage:', parsed.usage);
    }
  }
}

Handle streaming tool calls:

// Tool calls stream across multiple chunks
let currentToolCall = null;
let toolArgs = '';

for (const parsed of chunks) {
  const toolCallChunk = parsed.choices?.[0]?.delta?.tool_calls?.[0];

  if (toolCallChunk?.function?.name) {
    currentToolCall = { id: toolCallChunk.id, ...toolCallChunk.function };
  }

  if (toolCallChunk?.function?.arguments) {
    toolArgs += toolCallChunk.function.arguments;
  }

  if (parsed.choices?.[0]?.finish_reason === 'tool_calls' && currentToolCall) {
    // Complete tool call
    currentToolCall.arguments = toolArgs;
    // Execute tool...
  }
}

Usage and Cost Tracking

const { usage } = data;
console.log(`Prompt: ${usage.prompt_tokens}`);
console.log(`Completion: ${usage.completion_tokens}`);
console.log(`Total: ${usage.total_tokens}`);

// Cost (if available)
if (usage.cost) {
  console.log(`Cost: $${usage.cost.toFixed(6)}`);
}

// Detailed breakdown
console.log(usage.prompt_tokens_details);
console.log(usage.completion_tokens_details);

Error Handling

Common HTTP Status Codes

400 Bad Request

Invalid request format
Missing required fields
Parameter out of range
Fix: Validate request structure and parameters

401 Unauthorized

Missing or invalid API key
Fix: Check API key format and permissions

403 Forbidden

Insufficient permissions
Model not allowed
Fix: Check guardrails, model access, API key permissions

402 Payment Required

Insufficient credits
Fix: Add credits to account

408 Request Timeout

Request took too long
Fix: Reduce prompt length, use streaming, try simpler model

429 Rate Limited

Too many requests
Fix: Implement exponential backoff, reduce request rate

502 Bad Gateway

Provider error
Fix: Use model fallbacks, retry with different model

503 Service Unavailable

Service overloaded
Fix: Retry with backoff, use fallbacks

Retry Strategy

Exponential backoff:

async function requestWithRetry(url, body, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch(url, body);

      if (response.ok) {
        return await response.json();
      }

      // Retry on rate limit or server errors
      if (response.status === 429 || response.status >= 500) {
        const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }

      // Don't retry other errors
      return response;
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}

Retryable status codes: 408, 429, 502, 503 Do not retry: 400, 401, 403, 402

Graceful Degradation

Use model fallbacks:

{
  models: [
    'anthropic/claude-3.5-sonnet',  // Primary
    'openai/gpt-4o',                // Fallback 1
    'google/gemini-2.0-flash'        // Fallback 2
  ]
}

Handle partial failures:

Log errors but continue
Fall back to simpler features
Use cached responses when available
Provide degraded experience rather than failing completely

Advanced Features

When to Use Tool Calling

Good use cases:

Querying external APIs (weather, stock prices, databases)
Performing calculations or data processing
Extracting structured data from unstructured text
Building agentic systems with multiple steps
When decisions require external information

Implementation pattern:

Define tools with clear descriptions and parameters
Send request with tools array
Check if tool_calls present in response
Execute tools with parsed arguments
Send tool results back in a new request
Repeat until model provides final answer

See: references/ADVANCED_PATTERNS.md for complete agentic loop implementation

When to Use Structured Outputs

Good use cases:

API responses (need specific schema)
Data extraction (forms, documents)
Configuration files (JSON, YAML)
Database operations (structured queries)
When downstream processing requires specific format

Implementation pattern:

Define JSON Schema for desired output
Set response_format: { type: 'json_schema', json_schema: { ... } }
Instruct model to produce JSON (system or user message)
Validate response against schema
Handle parsing errors gracefully

Add response healing for robustness:

{
  response_format: { /* ... */ },
  plugins: [{ id: 'response-healing' }]
}

When to Use Web Search

Good use cases:

User asks about recent events, news, or current data
Need verification of facts
Questions with time-sensitive information
Topic requires up-to-date information
User explicitly requests current information

Simple implementation (variant):

{
  model: 'anthropic/claude-3.5-sonnet:online'
}

Advanced implementation (plugin):

{
  model: 'openrouter.ai/auto',
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5,
    engine: 'exa' // or 'native'
  }]
}

When to Use Multimodal Inputs

Images (vision):

OCR, image understanding, visual analysis
Models: openai/gpt-4o, anthropic/claude-3.5-sonnet, google/gemini-2.5-pro

Audio:

Speech-to-text, audio analysis
Models with audio support

Video:

Video understanding, frame analysis
Models with video support

PDFs:

Document parsing, content extraction
Requires file-parser plugin

Implementation: See references/ADVANCED_PATTERNS.md for multimodal patterns

Best Practices for AI

Default Model Selection

Start with: anthropic/claude-3.5-sonnet or openai/gpt-4o

Good balance of quality, speed, cost
Strong at most tasks
Wide compatibility

Switch based on needs:

Need speed → openai/gpt-4o-mini:nitro or google/gemini-2.0-flash
Complex reasoning → anthropic/claude-opus-4:thinking
Need web search → :online variant
Large context → :extended variant
Cost-sensitive → :free variant

Default Parameters

{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [...],
  temperature: 0.6,  // Balanced creativity
  max_tokens: 1000,   // Reasonable length
  top_p: 0.95        // Common for quality
}

Adjust based on task:

Code: temperature: 0.2
Creative: temperature: 1.0
Factual: temperature: 0.0-0.3

When to Prefer Streaming

Always prefer streaming when:

User-facing (chat, interactive tools)
Response length unknown
Want progressive feedback
Latency matters

Use non-streaming when:

Batch processing
Need complete response before acting
Building API endpoints
Very short responses (< 50 tokens)

When to Enable Specific Features

Tools: Enable when you need external data or actions Structured outputs: Enable when response format matters Web search: Enable when current information needed Streaming: Enable for user-facing, real-time responses Model fallbacks: Enable when reliability critical Provider routing: Enable when you have preferences or constraints

Cost Optimization Patterns

Use free models for:

Testing and prototyping
Low-complexity tasks
High-volume, low-value operations

Use routing to optimize:

{
  provider: {
    order: ['openai', 'anthropic'],
    sort: 'price',  // Optimize for cost
    allow_fallbacks: true
  }
}

Set max_tokens to prevent runaway responses Use caching via user and session_id parameters Enable prompt caching when supported

Performance Optimization

Reduce latency:

Use :nitro variants for speed
Use streaming for perceived speed
Set user ID for caching benefits
Choose faster models (mini, flash) when quality allows

Increase throughput:

Use provider routing with sort: 'throughput'
Parallelize independent requests
Use streaming to reduce wait time

Optimize for specific metrics:

{
  provider: {
    sort: 'latency'  // or 'price' or 'throughput'
  }
}

Progressive Disclosure

For detailed reference information, consult:

Parameters Reference

File: references/PARAMETERS.md

Complete parameter reference (50+ parameters)
Types, ranges, defaults
Parameter support by model
Usage examples

Error Codes Reference

File: references/ERROR_CODES.md

All HTTP status codes
Error response structure
Error metadata types
Native finish reasons
Retry strategies

Model Selection Guide

File: references/MODEL_SELECTION.md

Model families and capabilities
Model variants explained
Selection criteria by use case
Model capability matrix
Provider routing preferences

Routing Strategies

File: references/ROUTING_STRATEGIES.md

Model fallbacks configuration
Provider selection patterns
Auto router setup
Routing by use case (cost, latency, quality)

Advanced Patterns

File: references/ADVANCED_PATTERNS.md

Tool calling with agentic loops
Structured outputs implementation
Web search integration
Multimodal handling
Streaming patterns
Framework integrations

Working Examples

File: references/EXAMPLES.md

TypeScript patterns for common tasks
Python examples
cURL examples
Advanced patterns
Framework integration examples

Ready-to-Use Templates

Directory: templates/

basic-request.ts - Minimal working request
streaming-request.ts - SSE streaming with cancellation
tool-calling.ts - Complete agentic loop with tools
structured-output.ts - JSON Schema enforcement
error-handling.ts - Robust retry logic

Quick Reference

Minimal Request

{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Your prompt' }]
}

With Streaming

{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  stream: true
}

With Tools

{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  tools: [{ type: 'function', function: { name, description, parameters } }],
  tool_choice: 'auto'
}

With Structured Output

{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'system', content: 'Output JSON only...' }],
  response_format: { type: 'json_object' }
}

With Web Search

{
  model: 'anthropic/claude-3.5-sonnet:online',
  messages: [{ role: 'user', content: '...' }]
}

With Model Fallbacks

{
  models: ['anthropic/claude-3.5-sonnet', 'openai/gpt-4o'],
  messages: [{ role: 'user', content: '...' }]
}

Remember: OpenRouter is OpenAI-compatible. Use the OpenAI SDK with baseURL: 'https://openrouter.ai/api/v1' for a familiar experience.