openrouter
OpenRouter API for AI Agents
Expert guidance for AI agents integrating with the OpenRouter API, which provides unified access to 400+ models from 90+ providers.
When to use this skill:
- Making chat completions via OpenRouter API
- Selecting appropriate models and variants
- Implementing streaming responses
- Using tool/function calling
- Enforcing structured outputs
- Integrating web search
- Handling multimodal inputs (images, audio, video, PDFs)
- Managing model routing and fallbacks
- Handling errors and retries
- Optimizing cost and performance
API Basics
Making a Request
Endpoint: POST https://openrouter.ai/api/v1/chat/completions
Headers (Authorization and Content-Type are required; the attribution headers are optional):
{
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json',
// Optional: for app attribution
'HTTP-Referer': 'https://your-app.com',
'X-Title': 'Your App Name'
}
Minimal request structure:
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-3.5-sonnet',
messages: [
{ role: 'user', content: 'Your prompt here' }
]
})
});
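The header and body rules above can be collected into one small builder. This is a minimal sketch, not part of OpenRouter's API: buildChatRequest and its attribution argument are hypothetical names introduced here for illustration.

```javascript
// Hypothetical helper (not an OpenRouter API): assembles the URL and fetch
// options for a chat completion so the request can be inspected before sending.
const OPENROUTER_CHAT_URL = 'https://openrouter.ai/api/v1/chat/completions';

function buildChatRequest(apiKey, body, attribution = {}) {
  if (!apiKey) throw new Error('apiKey is required');
  if (!Array.isArray(body.messages) || body.messages.length === 0) {
    throw new Error('messages must be a non-empty array');
  }
  const headers = {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  };
  // Optional app-attribution headers are only sent when provided.
  if (attribution.referer) headers['HTTP-Referer'] = attribution.referer;
  if (attribution.title) headers['X-Title'] = attribution.title;
  return {
    url: OPENROUTER_CHAT_URL,
    options: { method: 'POST', headers, body: JSON.stringify(body) },
  };
}

// Usage (performs a network call, so it is left commented out):
// const { url, options } = buildChatRequest(apiKey, { model, messages });
// const response = await fetch(url, options);
```

Keeping request construction separate from fetch makes it possible to log or unit-test the payload without spending credits.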
Response Structure
Non-streaming response:
{
"id": "gen-abc123",
"choices": [{
"message": {
"role": "assistant",
"content": "Response text here"
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 20,
"total_tokens": 30
},
"model": "anthropic/claude-3.5-sonnet"
}
Key fields:
- choices[0].message.content - The assistant's response
- choices[0].finish_reason - Why generation stopped (stop, length, tool_calls, etc.)
- usage - Token counts and cost information
- model - Actual model used (may differ from requested)
When to Use Streaming vs Non-Streaming
Use streaming (stream: true) when:
- Real-time responses needed (chat interfaces, interactive tools)
- Latency matters (user-facing applications)
- Large responses expected (long-form content)
- Want to show progressive output
Use non-streaming when:
- Processing in background (batch jobs, async tasks)
- Need complete response before processing
- Building to an API/endpoint
- Response is short (few tokens)
Streaming basics:
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: { /* ... */ },
body: JSON.stringify({
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: '...' }],
stream: true
})
});
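// Async iteration over response.body works in Node 18+; in older browsers use getReader()
const decoder = new TextDecoder();
let buffer = ''; // Holds a partial line until the rest of it arrives
let finished = false;
for await (const chunk of response.body) {
  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(chunk, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // The last element may be an incomplete line
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue; // Skip comments and blank lines
    const data = line.slice(6).trim(); // Remove 'data: '
    if (data === '[DONE]') { finished = true; break; }
    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      // Accumulate or display content
    }
  }
  if (finished) break; // Leave the outer loop too, not just the inner one
}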
Model Selection
Model Identifier Format
Format: provider/model-name[:variant]
Examples:
- anthropic/claude-3.5-sonnet - Specific model
- openai/gpt-4o:online - With web search enabled
- google/gemini-2.0-flash:free - Free tier variant
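The provider/model-name[:variant] format can be split mechanically. The sketch below assumes identifiers contain exactly one variant suffix at most; parseModelId and withVariant are hypothetical helper names, not part of any OpenRouter SDK.

```javascript
// Hypothetical helpers for the provider/model-name[:variant] format.
function parseModelId(id) {
  const match = /^([^\/:\s]+)\/([^:\s]+)(?::([^:\s]+))?$/.exec(id);
  if (!match) throw new Error(`Invalid model identifier: ${id}`);
  // variant defaults to null when no ":suffix" is present
  const [, provider, name, variant = null] = match;
  return { provider, name, variant };
}

// Replace (or drop, when variant is falsy) the variant suffix of an identifier.
function withVariant(id, variant) {
  const { provider, name } = parseModelId(id);
  return variant ? `${provider}/${name}:${variant}` : `${provider}/${name}`;
}

console.log(parseModelId('openai/gpt-4o:online'));
console.log(withVariant('google/gemini-2.0-flash:free', 'nitro'));
```

Validating identifiers locally catches typos such as a missing provider prefix before a request is sent.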
Model Variants and When to Use Them
| Variant | Use When | Tradeoffs |
|---|---|---|
| :free | Cost is primary concern, testing, prototyping | Rate limits, lower quality models |
| :online | Need current information, real-time data | Higher cost, web search latency |
| :extended | Large context window needed | May be slower, higher cost |
| :thinking | Complex reasoning, multi-step problems | Higher token usage, slower |
| :nitro | Speed is critical | May have quality tradeoffs |
| :exacto | Need specific provider | No fallbacks, may be less available |
Default Model Choices by Task
General purpose: anthropic/claude-3.5-sonnet or openai/gpt-4o
- Balanced quality, speed, cost
- Good for most tasks
Coding: anthropic/claude-3.5-sonnet or openai/gpt-4o
- Strong code generation and understanding
- Good reasoning
Complex reasoning: anthropic/claude-opus-4:thinking or openai/o3
- Deep reasoning capabilities
- Higher cost, slower
Fast responses: openai/gpt-4o-mini:nitro or google/gemini-2.0-flash
- Minimal latency
- Good for real-time applications
Cost-sensitive: google/gemini-2.0-flash:free or meta-llama/llama-3.1-70b:free
- No cost with limits
- Good for high-volume, lower-complexity tasks
Current information: anthropic/claude-3.5-sonnet:online or google/gemini-2.5-pro:online
- Web search built-in
- Real-time data
Large context: anthropic/claude-3.5-sonnet:extended or google/gemini-2.5-pro:extended
- 200K+ context windows
- Document analysis, codebase understanding
Provider Routing Preferences
Default behavior: OpenRouter automatically selects best provider
Explicit provider order:
{
provider: {
order: ['anthropic', 'openai', 'google'],
allow_fallbacks: true,
sort: 'price' // 'price', 'latency', or 'throughput'
}
}
When to set provider order:
- Have preferred provider arrangements
- Need to optimize for specific metric (cost, speed)
- Want to exclude certain providers
- Have BYOK (Bring Your Own Key) for specific providers
Model Fallbacks
Automatic fallback - try multiple models in order:
{
models: [
'anthropic/claude-3.5-sonnet',
'openai/gpt-4o',
'google/gemini-2.0-flash'
]
}
When to use fallbacks:
- High reliability required
- Multiple providers acceptable
- Want graceful degradation
- Avoid single point of failure
Fallback behavior:
- Tries first model
- Falls to next on error (5xx, 429, timeout)
- Uses whichever succeeds
- Returns which model was used in the model field
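A minimal sketch of building a fallback request and checking which model answered. The helper names are hypothetical, and the comparison assumes the response's model field matches one of the requested identifiers exactly, which may not hold if a provider reports a versioned name.

```javascript
// Hypothetical helper: request body that tries each model in order.
function buildFallbackBody(models, messages) {
  if (!Array.isArray(models) || models.length === 0) {
    throw new Error('models must be a non-empty array');
  }
  return { models, messages };
}

// Hypothetical helper: compares the response's model field with the request.
// Assumption: identifiers match exactly; otherwise position is reported as -1.
function describeFallback(requestedModels, responseData) {
  const used = responseData.model;
  const position = requestedModels.indexOf(used);
  return { used, position, fellBack: position > 0, unrecognized: position === -1 };
}

const models = ['anthropic/claude-3.5-sonnet', 'openai/gpt-4o'];
const body = buildFallbackBody(models, [{ role: 'user', content: '...' }]);
console.log(body.models.length); // 2
console.log(describeFallback(models, { model: 'openai/gpt-4o' }).fellBack); // true
```

Logging the fellBack flag gives a cheap signal for how often the primary model is failing.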
Parameters You Need
Core Parameters
model (string, optional)
- Which model to use
- Default: user's default model
- Always specify for consistency
messages (Message[], required)
- Conversation history
- Structure:
{ role: 'user'|'assistant'|'system', content: string | ContentPart[] }
- For multimodal: content can be array of text and image_url parts
stream (boolean, default: false)
- Enable Server-Sent Events streaming
- Use for real-time responses
temperature (float, 0.0-2.0, default: 1.0)
- Controls randomness
- 0.0-0.3: Deterministic, factual responses (code, precise answers)
- 0.4-0.7: Balanced (general use)
- 0.8-1.2: Creative (brainstorming, creative writing)
- 1.3-2.0: Highly creative, unpredictable (experimental)
max_tokens (integer, optional)
- Maximum tokens to generate
- Always set to control cost and prevent runaway responses
- Typical: 100-500 for short, 1000-2000 for long responses
- Model limit: context_length - prompt_length
top_p (float, 0.0-1.0, default: 1.0)
- Nucleus sampling - limits to top probability mass
- Use instead of temperature when you want predictable diversity
- 0.9-0.95: Common settings for quality
top_k (integer, 0+, default: 0/disabled)
- Limit to K most likely tokens
- 1: Always most likely (deterministic)
- 40-50: Balanced
- Not available for OpenAI models
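The max_tokens limit above (context_length - prompt_length) is simple arithmetic. As a sketch, with clampMaxTokens as a hypothetical helper and the token counts assumed to come from your own tokenizer or a previous usage report:

```javascript
// Hypothetical helper: caps a desired max_tokens at the room left in the
// context window (context_length - prompt_length).
function clampMaxTokens(contextLength, promptTokens, desired) {
  const available = contextLength - promptTokens;
  if (available <= 0) {
    throw new Error('Prompt already fills the context window');
  }
  return Math.min(desired, available);
}

console.log(clampMaxTokens(8192, 8000, 1000));   // 192: only 192 tokens of room remain
console.log(clampMaxTokens(200000, 1500, 1000)); // 1000: the desired value fits
```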
Sampling Strategy Guidelines
For code generation: temperature: 0.1-0.3, top_p: 0.95
For factual responses: temperature: 0.0-0.2
For creative writing: temperature: 0.8-1.2
For brainstorming: temperature: 1.0-1.5
For chat: temperature: 0.6-0.8
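These guidelines can be kept in one lookup table. The values below are single picks from inside the ranges listed above, chosen for illustration only; SAMPLING_PRESETS and withSampling are hypothetical names.

```javascript
// Illustrative presets: one value chosen from each range in the guidelines.
const SAMPLING_PRESETS = Object.freeze({
  code: { temperature: 0.2, top_p: 0.95 },
  factual: { temperature: 0.1 },
  creative: { temperature: 1.0 },
  brainstorming: { temperature: 1.2 },
  chat: { temperature: 0.7 },
});

// Merge a preset into a request body; explicit values in body take precedence.
function withSampling(task, body) {
  if (!Object.prototype.hasOwnProperty.call(SAMPLING_PRESETS, task)) {
    throw new Error(`Unknown task: ${task}`);
  }
  return { ...SAMPLING_PRESETS[task], ...body };
}

const request = withSampling('code', {
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
});
console.log(request.temperature, request.top_p); // 0.2 0.95
```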
Tool Calling Parameters
tools (Tool[], default: [])
- Available functions for model to call
- Structure:
{
type: 'function',
function: {
name: 'function_name',
description: 'What it does',
parameters: { /* JSON Schema */ }
}
}
tool_choice (string | object, default: 'auto')
- Control when tools are called
- 'auto': Model decides (default)
- 'none': Never call tools
- 'required': Must call a tool
- { type: 'function', function: { name: 'specific_tool' } }: Force specific tool
parallel_tool_calls (boolean, default: true)
- Allow multiple tools simultaneously
- Set false for sequential execution
When to use tools:
- Need to query external APIs (weather, search, database)
- Need to perform calculations or data processing
- Building agentic systems
- Need structured data extraction
Structured Output Parameters
response_format (object, optional)
- Enforce specific output format
JSON object mode:
{ type: 'json_object' }
- Model returns valid JSON
- Must also instruct model in system message
JSON Schema mode (strict):
{
type: 'json_schema',
json_schema: {
name: 'schema_name',
strict: true,
schema: { /* JSON Schema */ }
}
}
- Model returns JSON matching exact schema
- Use when structure is critical (APIs, data processing)
When to use structured outputs:
- Need predictable response format
- Integrating with systems (APIs, databases)
- Data extraction
- Form filling
Web Search Parameters
Enable via model variant (simplest):
{ model: 'anthropic/claude-3.5-sonnet:online' }
Enable via plugin:
{
plugins: [{
id: 'web',
enabled: true,
max_results: 5
}]
}
When to use web search:
- Need current information (news, prices, events)
- User asks about recent developments
- Need factual verification
- Topic requires real-time data
Other Important Parameters
user (string, optional)
- Stable identifier for end-user
- Set when you have user IDs
- Helps with abuse detection and caching
session_id (string, optional)
- Group related requests
- Set for conversation tracking
- Improves caching and observability
metadata (Record<string, string>, optional)
- Custom metadata (max 16 key-value pairs)
- Use for analytics and tracking
- Keys: max 64 chars, Values: max 512 chars
stop (string | string[], optional)
- Stop sequences to halt generation
- Common:
['\n\n', '###', 'END']
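The metadata limits above (16 pairs, 64-character keys, 512-character values) are easy to check before sending. A minimal sketch: validateMetadata is a hypothetical helper, and it measures length in JavaScript string units, which is an assumption about how the limits are counted.

```javascript
// Hypothetical pre-flight check for the metadata limits listed above.
// Returns a list of problems; an empty list means the object looks valid.
function validateMetadata(metadata) {
  const problems = [];
  const entries = Object.entries(metadata);
  if (entries.length > 16) {
    problems.push(`too many pairs: ${entries.length} > 16`);
  }
  for (const [key, value] of entries) {
    if (key.length > 64) {
      problems.push(`key longer than 64 chars: ${key.slice(0, 16)}...`);
    }
    if (typeof value !== 'string') {
      problems.push(`value for "${key.slice(0, 16)}" must be a string`);
    } else if (value.length > 512) {
      problems.push(`value longer than 512 chars for "${key.slice(0, 16)}"`);
    }
  }
  return problems;
}

console.log(validateMetadata({ env: 'prod', feature: 'summaries' })); // []
```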
Handling Responses
Non-Streaming Responses
Extract content:
const response = await fetch(/* ... */);
const data = await response.json();
const content = data.choices[0].message.content;
const finishReason = data.choices[0].finish_reason;
const usage = data.usage;
Check for tool calls:
const toolCalls = data.choices[0].message.tool_calls;
if (toolCalls) {
// Model wants to call tools
for (const toolCall of toolCalls) {
const { name, arguments: args } = toolCall.function;
const parsedArgs = JSON.parse(args);
// Execute tool...
}
}
Streaming Responses
Process SSE stream:
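let fullContent = '';
const response = await fetch(/* ... */);
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ''; // Holds a partial line until the rest of it arrives
let finished = false;
while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // The last element may be an incomplete line
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue; // Skip comments and blank lines
    const data = line.slice(6).trim();
    if (data === '[DONE]') { finished = true; break; } // Ends the outer loop too
    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      fullContent += content;
      // Process incrementally...
    }
    // Handle usage in final chunk
    if (parsed.usage) {
      console.log('Usage:', parsed.usage);
    }
  }
}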
Handle streaming tool calls:
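// Tool calls stream across multiple chunks; each delta carries an index saying
// which call it extends. `chunks` is the sequence of parsed SSE payloads.
const toolCalls = []; // Indexed by delta.tool_calls[].index
for (const parsed of chunks) {
  const choice = parsed.choices?.[0];
  for (const delta of choice?.delta?.tool_calls ?? []) {
    const call = (toolCalls[delta.index ?? 0] ??= { id: '', name: '', arguments: '' });
    if (delta.id) call.id = delta.id;
    if (delta.function?.name) call.name = delta.function.name;
    // Arguments arrive as JSON fragments; concatenate before parsing
    if (delta.function?.arguments) call.arguments += delta.function.arguments;
  }
  if (choice?.finish_reason === 'tool_calls') {
    // All tool calls are complete
    for (const call of toolCalls.filter(Boolean)) {
      const args = JSON.parse(call.arguments || '{}');
      // Execute tool call.name with args...
    }
  }
}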
Usage and Cost Tracking
const { usage } = data;
console.log(`Prompt: ${usage.prompt_tokens}`);
console.log(`Completion: ${usage.completion_tokens}`);
console.log(`Total: ${usage.total_tokens}`);
// Cost (if available)
if (usage.cost) {
console.log(`Cost: $${usage.cost.toFixed(6)}`);
}
// Detailed breakdown
console.log(usage.prompt_tokens_details);
console.log(usage.completion_tokens_details);
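For multi-request workflows it can help to total usage across calls. A minimal sketch, assuming each item is the usage object from one response and that cost may be absent; sumUsage is a hypothetical helper.

```javascript
// Hypothetical helper: totals token counts and cost across several responses.
// Missing fields (for example cost) are treated as 0.
function sumUsage(usages) {
  const total = { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0, cost: 0 };
  for (const usage of usages) {
    if (!usage) continue; // Skip responses that carried no usage object
    total.prompt_tokens += usage.prompt_tokens ?? 0;
    total.completion_tokens += usage.completion_tokens ?? 0;
    total.total_tokens += usage.total_tokens ?? 0;
    total.cost += usage.cost ?? 0;
  }
  return total;
}

const session = sumUsage([
  { prompt_tokens: 10, completion_tokens: 20, total_tokens: 30, cost: 0.001 },
  { prompt_tokens: 5, completion_tokens: 5, total_tokens: 10 },
]);
console.log(session.total_tokens); // 40
```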
Error Handling
Common HTTP Status Codes
400 Bad Request
- Invalid request format
- Missing required fields
- Parameter out of range
- Fix: Validate request structure and parameters
401 Unauthorized
- Missing or invalid API key
- Fix: Check API key format and permissions
403 Forbidden
- Insufficient permissions
- Model not allowed
- Fix: Check guardrails, model access, API key permissions
402 Payment Required
- Insufficient credits
- Fix: Add credits to account
408 Request Timeout
- Request took too long
- Fix: Reduce prompt length, use streaming, try simpler model
429 Rate Limited
- Too many requests
- Fix: Implement exponential backoff, reduce request rate
502 Bad Gateway
- Provider error
- Fix: Use model fallbacks, retry with different model
503 Service Unavailable
- Service overloaded
- Fix: Retry with backoff, use fallbacks
Retry Strategy
Exponential backoff:
// `options` is the fetch init object: { method, headers, body }
async function requestWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const isLastAttempt = attempt === maxRetries - 1;
    try {
      const response = await fetch(url, options);
      if (response.ok) {
        return await response.json();
      }
      // Retry on timeout, rate limit, or server errors
      const retryable = response.status === 408 || response.status === 429 || response.status >= 500;
      // Hand other errors, and the final failed attempt, back to the caller
      if (!retryable || isLastAttempt) {
        return response;
      }
    } catch (error) {
      // Network failure: rethrow once retries are exhausted
      if (isLastAttempt) throw error;
    }
    // Exponential backoff: 1s, 2s, 4s, ... capped at 10s
    const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
    await new Promise(resolve => setTimeout(resolve, delay));
  }
}
Retryable status codes: 408, 429, 502, 503
Do not retry: 400, 401, 402, 403
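The retry decision and the delay calculation can be factored into two pure functions. This sketch uses the status list above; the random jitter is an addition not described in this document (a common way to stop many clients retrying in lockstep), and both function names are hypothetical.

```javascript
// Status codes this document lists as retryable.
const RETRYABLE_STATUSES = new Set([408, 429, 502, 503]);

function isRetryable(status) {
  return RETRYABLE_STATUSES.has(status);
}

// Exponential backoff with "full jitter" (an addition, not from this document):
// a random delay in [0, min(baseMs * 2^attempt, maxMs)).
// `random` is injectable so the function can be tested deterministically.
function backoffDelay(attempt, { baseMs = 1000, maxMs = 10000, random = Math.random } = {}) {
  const ceiling = Math.min(baseMs * 2 ** attempt, maxMs);
  return Math.floor(random() * ceiling);
}

console.log(isRetryable(429)); // true
console.log(isRetryable(401)); // false
console.log(backoffDelay(2, { random: () => 0.5 })); // 2000
```

Without jitter the delays would be exactly 1s, 2s, 4s and so on; passing random: () => 1 approximates that behavior.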
Graceful Degradation
Use model fallbacks:
{
models: [
'anthropic/claude-3.5-sonnet', // Primary
'openai/gpt-4o', // Fallback 1
'google/gemini-2.0-flash' // Fallback 2
]
}
Handle partial failures:
- Log errors but continue
- Fall back to simpler features
- Use cached responses when available
- Provide degraded experience rather than failing completely
Advanced Features
When to Use Tool Calling
Good use cases:
- Querying external APIs (weather, stock prices, databases)
- Performing calculations or data processing
- Extracting structured data from unstructured text
- Building agentic systems with multiple steps
- When decisions require external information
Implementation pattern:
- Define tools with clear descriptions and parameters
- Send request with tools array
- Check if tool_calls present in response
- Execute tools with parsed arguments
- Send tool results back in a new request
- Repeat until model provides final answer
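The steps above can be sketched as a loop. This is an illustration under stated assumptions, not the implementation in references/ADVANCED_PATTERNS.md: runToolLoop, callModel, and handlers are hypothetical names, callModel stands in for your own function that posts to the chat completions endpoint and returns the parsed JSON, and tool results use the OpenAI-compatible { role: 'tool', tool_call_id, content } message shape.

```javascript
// Hypothetical agentic loop. callModel({ messages, tools }) must return the
// parsed chat completion JSON; handlers maps tool names to functions.
async function runToolLoop({ callModel, tools, handlers, messages, maxSteps = 5 }) {
  const history = [...messages];
  for (let step = 0; step < maxSteps; step++) {
    const data = await callModel({ messages: history, tools });
    const message = data.choices[0].message;
    history.push(message); // Keep the assistant turn, including any tool_calls
    const toolCalls = message.tool_calls ?? [];
    if (toolCalls.length === 0) {
      return { content: message.content, history }; // Final answer
    }
    for (const call of toolCalls) {
      let result;
      try {
        const handler = handlers[call.function.name];
        if (!handler) throw new Error(`Unknown tool: ${call.function.name}`);
        result = await handler(JSON.parse(call.function.arguments));
      } catch (error) {
        // Report failures to the model instead of aborting the loop
        result = { error: String(error.message ?? error) };
      }
      history.push({
        role: 'tool',
        tool_call_id: call.id,
        content: JSON.stringify(result),
      });
    }
  }
  throw new Error(`No final answer after ${maxSteps} steps`);
}
```

The maxSteps cap is a design choice to keep a confused model from looping indefinitely and running up cost.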
See: references/ADVANCED_PATTERNS.md for complete agentic loop implementation
When to Use Structured Outputs
Good use cases:
- API responses (need specific schema)
- Data extraction (forms, documents)
- Configuration files (JSON, YAML)
- Database operations (structured queries)
- When downstream processing requires specific format
Implementation pattern:
- Define JSON Schema for desired output
- Set response_format: { type: 'json_schema', json_schema: { ... } }
- Instruct model to produce JSON (system or user message)
- Validate response against schema
- Handle parsing errors gracefully
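A minimal sketch of the pattern above. Both helper names are hypothetical, and parseStructured only checks that the output is a JSON object containing the schema's required keys; it is not a full JSON Schema validator, so a dedicated validation library is the better choice when structure is critical.

```javascript
// Hypothetical helper: wraps a JSON Schema in the strict response_format shape.
function buildJsonSchemaFormat(name, schema) {
  return { type: 'json_schema', json_schema: { name, strict: true, schema } };
}

// Hypothetical helper: parses model output and performs a shallow check of
// the schema's top-level "required" keys. Returns a result object, never throws.
function parseStructured(content, schema) {
  let value;
  try {
    value = JSON.parse(content);
  } catch (error) {
    return { ok: false, error: `Invalid JSON: ${error.message}` };
  }
  if (value === null || typeof value !== 'object' || Array.isArray(value)) {
    return { ok: false, error: 'Expected a JSON object' };
  }
  const missing = (schema.required ?? []).filter(key => !(key in value));
  if (missing.length > 0) {
    return { ok: false, error: `Missing required keys: ${missing.join(', ')}` };
  }
  return { ok: true, value };
}

const schema = {
  type: 'object',
  properties: { city: { type: 'string' }, temp_c: { type: 'number' } },
  required: ['city', 'temp_c'],
  additionalProperties: false,
};
console.log(buildJsonSchemaFormat('weather', schema).json_schema.strict); // true
console.log(parseStructured('{"city":"Oslo","temp_c":4}', schema).ok); // true
console.log(parseStructured('{"city":"Oslo"}', schema).error); // Missing required keys: temp_c
```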
Add response healing for robustness:
{
response_format: { /* ... */ },
plugins: [{ id: 'response-healing' }]
}
When to Use Web Search
Good use cases:
- User asks about recent events, news, or current data
- Need verification of facts
- Questions with time-sensitive information
- Topic requires up-to-date information
- User explicitly requests current information
Simple implementation (variant):
{
model: 'anthropic/claude-3.5-sonnet:online'
}
Advanced implementation (plugin):
{
model: 'openrouter/auto',
plugins: [{
id: 'web',
enabled: true,
max_results: 5,
engine: 'exa' // or 'native'
}]
}
When to Use Multimodal Inputs
Images (vision):
- OCR, image understanding, visual analysis
- Models: openai/gpt-4o, anthropic/claude-3.5-sonnet, google/gemini-2.5-pro
Audio:
- Speech-to-text, audio analysis
- Models with audio support
Video:
- Video understanding, frame analysis
- Models with video support
PDFs:
- Document parsing, content extraction
- Requires file-parser plugin
Implementation: See references/ADVANCED_PATTERNS.md for multimodal patterns
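For the image case, a message with text and image_url content parts might be built as follows. This is a sketch using the OpenAI-compatible content-part shape; imageMessage and toDataUrl are hypothetical helpers, and the example URL is a placeholder.

```javascript
// Hypothetical helper: user message combining a text part and an image part.
function imageMessage(text, imageUrl) {
  return {
    role: 'user',
    content: [
      { type: 'text', text },
      { type: 'image_url', image_url: { url: imageUrl } },
    ],
  };
}

// Hypothetical helper: wraps already base64-encoded bytes in a data URL so a
// local image can be sent inline instead of by public URL.
function toDataUrl(base64, mimeType) {
  return `data:${mimeType};base64,${base64}`;
}

const body = {
  model: 'openai/gpt-4o',
  messages: [imageMessage('What is in this image?', 'https://example.com/cat.png')],
};
console.log(body.messages[0].content.length); // 2
```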
Best Practices for AI
Default Model Selection
Start with: anthropic/claude-3.5-sonnet or openai/gpt-4o
- Good balance of quality, speed, cost
- Strong at most tasks
- Wide compatibility
Switch based on needs:
- Need speed → openai/gpt-4o-mini:nitro or google/gemini-2.0-flash
- Complex reasoning → anthropic/claude-opus-4:thinking
- Need web search → :online variant
- Large context → :extended variant
- Cost-sensitive → :free variant
Default Parameters
{
model: 'anthropic/claude-3.5-sonnet',
messages: [...],
temperature: 0.6, // Balanced creativity
max_tokens: 1000, // Reasonable length
top_p: 0.95 // Common for quality
}
Adjust based on task:
- Code: temperature: 0.2
- Creative: temperature: 1.0
- Factual: temperature: 0.0-0.3
When to Prefer Streaming
Always prefer streaming when:
- User-facing (chat, interactive tools)
- Response length unknown
- Want progressive feedback
- Latency matters
Use non-streaming when:
- Batch processing
- Need complete response before acting
- Building API endpoints
- Very short responses (< 50 tokens)
When to Enable Specific Features
Tools: Enable when you need external data or actions
Structured outputs: Enable when response format matters
Web search: Enable when current information is needed
Streaming: Enable for user-facing, real-time responses
Model fallbacks: Enable when reliability is critical
Provider routing: Enable when you have preferences or constraints
Cost Optimization Patterns
Use free models for:
- Testing and prototyping
- Low-complexity tasks
- High-volume, low-value operations
Use routing to optimize:
{
provider: {
order: ['openai', 'anthropic'],
sort: 'price', // Optimize for cost
allow_fallbacks: true
}
}
Set max_tokens to prevent runaway responses
Use caching via user and session_id parameters
Enable prompt caching when supported
Performance Optimization
Reduce latency:
- Use :nitro variants for speed
- Use streaming for perceived speed
- Set user ID for caching benefits
- Choose faster models (mini, flash) when quality allows
Increase throughput:
- Use provider routing with sort: 'throughput'
- Parallelize independent requests
- Use streaming to reduce wait time
Optimize for specific metrics:
{
provider: {
sort: 'latency' // or 'price' or 'throughput'
}
}
Progressive Disclosure
For detailed reference information, consult:
Parameters Reference
File: references/PARAMETERS.md
- Complete parameter reference (50+ parameters)
- Types, ranges, defaults
- Parameter support by model
- Usage examples
Error Codes Reference
File: references/ERROR_CODES.md
- All HTTP status codes
- Error response structure
- Error metadata types
- Native finish reasons
- Retry strategies
Model Selection Guide
File: references/MODEL_SELECTION.md
- Model families and capabilities
- Model variants explained
- Selection criteria by use case
- Model capability matrix
- Provider routing preferences
Routing Strategies
File: references/ROUTING_STRATEGIES.md
- Model fallbacks configuration
- Provider selection patterns
- Auto router setup
- Routing by use case (cost, latency, quality)
Advanced Patterns
File: references/ADVANCED_PATTERNS.md
- Tool calling with agentic loops
- Structured outputs implementation
- Web search integration
- Multimodal handling
- Streaming patterns
- Framework integrations
Working Examples
File: references/EXAMPLES.md
- TypeScript patterns for common tasks
- Python examples
- cURL examples
- Advanced patterns
- Framework integration examples
Ready-to-Use Templates
Directory: templates/
- basic-request.ts - Minimal working request
- streaming-request.ts - SSE streaming with cancellation
- tool-calling.ts - Complete agentic loop with tools
- structured-output.ts - JSON Schema enforcement
- error-handling.ts - Robust retry logic
Quick Reference
Minimal Request
{
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: 'Your prompt' }]
}
With Streaming
{
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: '...' }],
stream: true
}
With Tools
{
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: '...' }],
tools: [{ type: 'function', function: { name, description, parameters } }],
tool_choice: 'auto'
}
With Structured Output
{
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'system', content: 'Output JSON only...' }],
response_format: { type: 'json_object' }
}
With Web Search
{
model: 'anthropic/claude-3.5-sonnet:online',
messages: [{ role: 'user', content: '...' }]
}
With Model Fallbacks
{
models: ['anthropic/claude-3.5-sonnet', 'openai/gpt-4o'],
messages: [{ role: 'user', content: '...' }]
}
Remember: OpenRouter is OpenAI-compatible. Use the OpenAI SDK with baseURL: 'https://openrouter.ai/api/v1' for a familiar experience.