Chat Completions — Sarvam AI
[!IMPORTANT] Auth: api-subscription-key header, NOT Authorization: Bearer. Base URL: https://api.sarvam.ai/v1
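The header rule above can be sketched as a raw HTTP request built with the Python standard library. This is only an illustration: the `/chat/completions` path appended to the base URL and the OpenAI-style JSON body are assumptions not confirmed on this page, and `build_headers`, `build_request`, and `send` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://api.sarvam.ai/v1"


def build_headers(api_key: str) -> dict:
    """The key goes in api-subscription-key, not in Authorization: Bearer."""
    return {
        "api-subscription-key": api_key,
        "Content-Type": "application/json",
    }


def build_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    # Assumption: the endpoint is BASE_URL + "/chat/completions" and accepts
    # an OpenAI-style body. Check the official docs before relying on this.
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=build_headers(api_key),
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Perform the call and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

In practice the SDK examples further down are the simpler route; a raw request like this is mainly useful for checking which header a proxy or gateway is forwarding.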
Models
| Model | Context | Best For |
|---|---|---|
| sarvam-105b | 128K | Complex reasoning, coding, agentic workflows |
| sarvam-30b | 64K | Real-time chat, voice agents, conversational AI |
| sarvam-105b-32k | 32K | Cost-efficient 105B |
| sarvam-30b-16k | 16K | Cost-efficient 30B |
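One way to use the table is to pick the smallest context window that still fits the request, on the assumption that the "cost-efficient" variants are the ones to prefer when they suffice. The sketch below encodes the table as data. `pick_model` and `CONTEXT_K` are hypothetical names, and the "K" figures are treated as rough units rather than exact token counts.

```python
# Context windows in thousands of tokens, copied from the table above.
CONTEXT_K = {
    "sarvam-105b": 128,
    "sarvam-30b": 64,
    "sarvam-105b-32k": 32,
    "sarvam-30b-16k": 16,
}


def pick_model(family: str, needed_k: int) -> str:
    """Return the smallest-context model in a family that fits needed_k.

    family is "105b" or "30b"; needed_k is the estimated prompt plus
    completion size in thousands of tokens.
    """
    candidates = [
        (ctx, name)
        for name, ctx in CONTEXT_K.items()
        if name.startswith(f"sarvam-{family}") and ctx >= needed_k
    ]
    if not candidates:
        raise ValueError(f"no {family} model offers {needed_k}K of context")
    # Tuples sort by context size first, so min() yields the tightest fit.
    return min(candidates)[1]
```

For example, `pick_model("30b", 10)` selects `sarvam-30b-16k`, while a 40K request in the same family falls through to `sarvam-30b`.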
Quick Start (Python)
from sarvamai import SarvamAI

client = SarvamAI()
response = client.chat.completions(
    model="sarvam-30b",
    # Hindi for "What is the capital of India?"
    messages=[{"role": "user", "content": "भारत की राजधानी क्या है?"}]
)
print(response.choices[0].message.content)
Streaming (Python)
for chunk in client.chat.completions(
    model="sarvam-30b",
    messages=[{"role": "user", "content": "Write a poem about India"}],
    stream=True
):
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
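When the full text is needed after streaming (for logging or post-processing), the same loop can accumulate the deltas instead of printing them. A minimal sketch, assuming chunks have the `choices[0].delta.content` shape used above; `collect_stream` and `fake_chunk` are hypothetical helpers, and `fake_chunk` exists only so the logic can be exercised without an API call.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text deltas of a streamed completion into one string."""
    parts = []
    for chunk in chunks:
        # Skip chunks with no choices or with an empty/None delta.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for an SDK chunk object, for offline testing only."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
```

With a real client, the call would look like `collect_stream(client.chat.completions(model="sarvam-30b", messages=[...], stream=True))`.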
Quick Start (JavaScript/TypeScript)
import { SarvamAIClient } from "sarvamai";

const client = new SarvamAIClient({ apiSubscriptionKey: "YOUR_SARVAM_API_KEY" });
const response = await client.chat.completions({
  model: "sarvam-30b",
  // Hindi for "What is the capital of India?"
  messages: [{ role: "user", content: "भारत की राजधानी क्या है?" }]
});
console.log(response.choices[0].message.content);
OpenAI-Compatible (both languages)
from openai import OpenAI
client = OpenAI(api_key="your-key", base_url="https://api.sarvam.ai/v1")
response = client.chat.completions.create(model="sarvam-30b", messages=[...])
Gotchas
| Gotcha | Detail |
|---|---|
| SDK method | Python: client.chat.completions(...), JS: client.chat.completions({...}) — no .create() in either. OpenAI SDK uses .create() as usual. |
| JS constructor | new SarvamAIClient({ apiSubscriptionKey: "..." }) — NOT SarvamAI(). Key is passed explicitly. |
| content can be None | Models produce reasoning_content before content. If max_tokens is too low, reasoning consumes the budget and content is None. Omit max_tokens or set it to 500 or more. Check reasoning_content as a fallback. |
| reasoning_effort | Set reasoning_effort to "low", "medium", or "high" for thinking mode. NOT thinking=True. |
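The "content can be None" row can be handled with a small guard. This is a sketch rather than SDK-provided behavior: `extract_text` is a hypothetical helper, and it assumes `reasoning_content` sits on the same message object as `content`.

```python
from types import SimpleNamespace


def extract_text(response):
    """Return the answer text, falling back to reasoning_content if needed."""
    message = response.choices[0].message
    if message.content:
        return message.content
    # Assumption: reasoning_content is an attribute of the message object.
    # getattr keeps this safe if the field is absent entirely.
    return getattr(message, "reasoning_content", None)


# Offline stand-in for a response whose token budget ran out during reasoning.
truncated = SimpleNamespace(
    choices=[SimpleNamespace(
        message=SimpleNamespace(content=None, reasoning_content="thinking...")
    )]
)
```

A fallback like this only recovers the partial reasoning text; if the final answer matters, raising or omitting max_tokens as the table suggests is the actual fix.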
Full Docs
Fetch detailed parameters, tool calling, streaming, and examples from:
- https://docs.sarvam.ai/llms.txt — comprehensive docs index
- Chat Completion Guide
- Model Specs
- Rate Limits