# Voice Agents — Sarvam AI
> [!IMPORTANT]
> **Auth:** `api-subscription-key` header — NOT `Authorization: Bearer`. Env var: `SARVAM_API_KEY`
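Since every Sarvam request carries this header, a minimal sketch of building it (only the header name and env var come from the note above; the request call in the comment is illustrative):

```python
import os

# Sarvam auth uses a custom header, not a Bearer token.
# The key is read from the SARVAM_API_KEY environment variable.
headers = {"api-subscription-key": os.environ.get("SARVAM_API_KEY", "")}

# Attach these headers to any Sarvam API request,
# e.g. requests.post(url, headers=headers, json=payload).
```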
## LiveKit Quick Start
```shell
pip install livekit-agents livekit-plugins-sarvam livekit-plugins-silero
```
```python
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import sarvam, silero


class VoiceAssistant(Agent):
    def __init__(self):
        super().__init__(
            vad=silero.VAD.load(),
            stt=sarvam.STT(model="saaras:v3"),
            llm=sarvam.LLM(model="sarvam-30b"),
            tts=sarvam.TTS(model="bulbul:v3", voice="shubh"),
        )

    async def on_enter(self, session: AgentSession):
        # Hindi greeting: "Hello! How can I help you?"
        await session.say("नमस्ते! मैं आपकी कैसे मदद कर सकती हूं?")


async def entrypoint(ctx: JobContext):
    agent = VoiceAssistant()
    await agent.start(ctx)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
## Pipecat Quick Start
```shell
pip install "pipecat-ai[sarvam,silero,daily]"
```
```python
from pipecat.pipeline import Pipeline
from pipecat.services.sarvam import SarvamSTT, SarvamTTS, SarvamLLM
from pipecat.vad.silero import SileroVAD
from pipecat.transports.local import LocalAudioTransport

transport = LocalAudioTransport()

pipeline = Pipeline([
    transport.input(),                  # mic audio in
    SileroVAD(),                        # detect speech segments
    SarvamSTT(model="saaras:v3"),
    SarvamLLM(model="sarvam-30b", system_prompt="You are a helpful voice assistant."),
    SarvamTTS(model="bulbul:v3", voice="shubh"),
    transport.output(),                 # speaker audio out
])
```
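Pipecat wires these stages as a linear chain of frame processors: frames flow in one end, each stage transforms them or emits new ones, and audio flows out the other. A stdlib toy of that dataflow (not Pipecat's real classes — just the chaining idea):

```python
from typing import Callable, List

Frame = str  # toy stand-in for Pipecat's audio/text frame types


def run_chain(stages: List[Callable[[Frame], Frame]], frame: Frame) -> Frame:
    """Pass a frame through each stage in order, like a Pipecat pipeline."""
    for stage in stages:
        frame = stage(frame)
    return frame


# Toy STT -> LLM -> TTS: transcribe, generate a reply, "synthesize" it.
stt = lambda audio: audio.strip()        # audio -> text
llm = lambda text: f"You said: {text}"   # text -> reply
tts = lambda reply: f"<audio:{reply}>"   # reply -> audio
result = run_chain([stt, llm, tts], "  hello  ")
```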
## JavaScript/TypeScript Note
LiveKit and Pipecat agents are Python-only. For JS/TS voice pipelines, use the individual SDK methods directly:
```typescript
import { SarvamAIClient } from "sarvamai";

const client = new SarvamAIClient({ apiSubscriptionKey: "YOUR_SARVAM_API_KEY" });

// STT: client.speechToText.transcribe({...})
// TTS: client.textToSpeech.convertStream({...}) // returns BinaryResponse
// LLM: client.chat.completions({...})
```
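To make the wiring concrete, here is one voice turn sketched against a minimal client interface, so it runs without the SDK. The `VoiceClient` interface and its method names below are hypothetical stand-ins, not sarvamai's actual types:

```typescript
// Hypothetical interface standing in for the sarvamai client methods
// listed above; the real SDK's types and signatures differ.
interface VoiceClient {
  transcribe(audio: Uint8Array): Promise<string>; // STT
  complete(prompt: string): Promise<string>;      // LLM
  synthesize(text: string): Promise<Uint8Array>;  // TTS
}

// One turn of a voice loop: audio in -> text -> reply -> audio out.
async function voiceTurn(client: VoiceClient, audio: Uint8Array): Promise<Uint8Array> {
  const userText = await client.transcribe(audio);
  const reply = await client.complete(userText);
  return client.synthesize(reply);
}
```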
## Gotchas
| Gotcha | Detail |
|---|---|
| Use `sarvam-30b` | Best latency for voice. Only use `sarvam-105b` when reasoning quality matters more than speed. |
| `max_tokens` budget | Sarvam models reason internally. Don't set a low `max_tokens` or `content` will be `None`. Omit it or set 500+. |
| TTS pitch/loudness | NOT supported on Bulbul v3 — API returns 400. Only `pace` works. |
| STT WebSocket codecs | Only wav/pcm — no MP3/AAC/OGG for streaming. |
| HTTP stream for TTS | `convert_stream` returns binary audio directly (no base64), better for pipelines. |
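Since the pitch/loudness gotcha only surfaces as a runtime 400, a small client-side guard can catch it earlier. A sketch with a hypothetical helper name — this is not part of any Sarvam SDK:

```python
# Hypothetical guard (not a Sarvam SDK function): drop TTS parameters that
# Bulbul v3 rejects with HTTP 400, keeping supported ones like `pace`.
UNSUPPORTED_TTS_PARAMS = {"pitch", "loudness"}


def clean_tts_params(params: dict) -> dict:
    return {k: v for k, v in params.items() if k not in UNSUPPORTED_TTS_PARAMS}


safe = clean_tts_params({"pace": 1.1, "pitch": 0.5, "loudness": 2.0})
```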
## Full Docs
Fetch framework integration guides, environment setup, and advanced patterns from:
- https://docs.sarvam.ai/llms.txt — comprehensive docs index
- LiveKit Guide
- Pipecat Guide
- Rate Limits
## More from sarvamai/skills
- **speech-to-text** — Transcribe audio to text using Sarvam AI's Saaras model. Handles speech recognition, transcription, and voice interfaces for 23 Indian languages. Supports 5 output modes, auto language detection, WebSocket streaming, and batch diarization. Use when converting speech to text or building voice-enabled apps.
- **text-to-speech** — Convert text to natural speech using Sarvam AI's Bulbul v3 model. Handles audio generation, voiceovers, and voice interfaces for 11 Indian languages with 30+ voices. Supports REST, HTTP streaming, WebSocket, and pronunciation dictionaries. Use when generating spoken audio from text.
- **translate** — Translate text between English and Indian languages using Sarvam AI (Sarvam-Translate, Mayura). Handles content translation and app localization across 22+ languages with mode control, script options, and numeral formats. Use when translating or localizing content for Indian users.
- **chat** — Chat completions using Sarvam AI LLMs (Sarvam-105B, Sarvam-30B). Handles AI chat, text generation, reasoning, coding, and multilingual conversations in Indian languages. OpenAI-compatible API. Use when building chatbots, Q&A systems, agents, or any LLM feature targeting Indian users.