# LiteLLM
Expert guidance for unified LLM API access across providers.
## Triggers
Use this skill when:
- Calling multiple LLM providers with a unified interface
- Building multi-provider AI applications
- Implementing fallbacks and retries across providers
- Deploying an LLM proxy or gateway
- Managing costs across different LLM providers
- Keywords: litellm, unified api, multi-provider, llm proxy, gateway, fallback
## Installation

```bash
pip install litellm
```
## Quick Start

```python
from litellm import completion

# OpenAI
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Anthropic
response = completion(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Azure OpenAI
response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    api_base="https://my-resource.openai.azure.com",
    api_key="your-key",
    api_version="2024-02-01"
)

# Ollama
response = completion(
    model="ollama/llama3.1",
    messages=[{"role": "user", "content": "Hello!"}],
    api_base="http://localhost:11434"
)
```
## Streaming

```python
from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
## Async

```python
import asyncio
from litellm import acompletion

async def main():
    response = await acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```
## Embeddings

```python
from litellm import embedding

# OpenAI
response = embedding(
    model="text-embedding-3-small",
    input=["Hello world"]
)

# Cohere
response = embedding(
    model="cohere/embed-english-v3.0",
    input=["Hello world"]
)

embeddings = response.data[0].embedding
```
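Whichever provider produced them, the embeddings come back as plain float lists, so similarity math is provider-agnostic. A minimal cosine-similarity sketch using only the standard library:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# e.g. compare two texts embedded with the same model:
# va = embedding(model="text-embedding-3-small", input=["cat"]).data[0].embedding
# vb = embedding(model="text-embedding-3-small", input=["dog"]).data[0].embedding
# print(cosine_similarity(va, vb))
```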
## Function Calling

```python
from litellm import completion

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }
}]

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto"
)

if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        print(f"Function: {tool_call.function.name}")
        print(f"Arguments: {tool_call.function.arguments}")
```
## Fallbacks & Retries

```python
import litellm
from litellm import completion

# Optional: verbose debug logging, useful for watching fallbacks fire
litellm.set_verbose = True

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    fallbacks=["claude-3-5-sonnet-20241022", "gpt-3.5-turbo"],
    num_retries=3
)
```
## Router (Load Balancing)

```python
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "gpt-4",
            "litellm_params": {
                "model": "azure/gpt-4-deployment",
                "api_base": "https://us-east.openai.azure.com",
                "api_key": "key1"
            }
        },
        {
            "model_name": "gpt-4",
            "litellm_params": {
                "model": "azure/gpt-4-deployment",
                "api_base": "https://us-west.openai.azure.com",
                "api_key": "key2"
            }
        }
    ],
    routing_strategy="least-busy"  # or "simple-shuffle", "latency-based-routing"
)

response = router.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
```
## Proxy Server

### Configuration (config.yaml)

```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://my-resource.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
  - model_name: claude
    litellm_params:
      model: claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY

litellm_settings:
  drop_params: true
  set_verbose: false

general_settings:
  master_key: sk-1234
  database_url: postgresql://user:pass@localhost/litellm
```
### Run Proxy

```bash
# Start server
litellm --config config.yaml --port 4000

# Or with Docker
docker run -p 4000:4000 \
  -v $(pwd)/config.yaml:/app/config.yaml \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml
```
### Use Proxy

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",
    api_key="sk-1234"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
```
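Because the proxy speaks the OpenAI REST protocol, any HTTP client works, not just the OpenAI SDK. A standard-library-only sketch against the proxy address and master key configured above (the `chat` helper is illustrative):

```python
import json
import urllib.request

def chat(prompt: str, model: str = "gpt-4") -> str:
    """POST a chat completion to a local LiteLLM proxy and return the reply."""
    req = urllib.request.Request(
        "http://localhost:4000/v1/chat/completions",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }).encode(),
        headers={
            "Authorization": "Bearer sk-1234",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# print(chat("Hello"))
```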
## Cost Tracking

```python
import litellm
from litellm import completion

litellm.success_callback = ["langfuse"]  # or "langsmith", "helicone"

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

# Access cost
print(f"Cost: ${response._hidden_params['response_cost']}")
```
## Budget Management

```python
from litellm import BudgetManager, completion

budget = BudgetManager(project_name="my-project")

# Set a budget for a user
budget.create_budget(
    total_budget=100,  # $100
    user="user-123",
    duration="monthly"
)

# Check before the request, record cost after it
if budget.get_current_cost(user="user-123") <= budget.get_total_budget("user-123"):
    response = completion(model="gpt-4o", messages=[...])
    budget.update_cost(completion_obj=response, user="user-123")
```
## Model Aliases

```python
import litellm
from litellm import completion

litellm.model_alias_map = {
    "fast": "gpt-3.5-turbo",
    "smart": "gpt-4o",
    "cheap": "claude-3-haiku-20240307"
}

response = completion(
    model="smart",  # resolves to gpt-4o
    messages=[{"role": "user", "content": "Hello"}]
)
```
## Resources

- LiteLLM documentation: https://docs.litellm.ai
- Source repository: https://github.com/BerriAI/litellm