Voice Agents — Sarvam AI

[!IMPORTANT] Auth: api-subscription-key header — NOT Authorization: Bearer. Env var: SARVAM_API_KEY
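For raw HTTP calls made outside the SDKs, the header rule above can be sketched as follows. This is a minimal illustration, not part of any Sarvam SDK: `sarvam_headers` is a hypothetical helper name, and only the header name and the environment variable come from the note above.

```python
import os


def sarvam_headers(api_key=None):
    """Build auth headers for a raw HTTP request (hypothetical helper)."""
    # Prefer an explicit key, otherwise fall back to the documented env var.
    key = api_key or os.environ.get("SARVAM_API_KEY")
    if not key:
        raise RuntimeError("Set SARVAM_API_KEY or pass api_key explicitly")
    # The key travels in its own header; no "Authorization: Bearer ..." is sent.
    return {"api-subscription-key": key}
```

Pass the returned dict as the `headers` argument of whichever HTTP client you use.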

LiveKit Quick Start

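pip install livekit-agents livekit-plugins-sarvam livekit-plugins-silero

from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import sarvam, silero

class VoiceAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            # Agent requires instructions; they act as the LLM system prompt.
            instructions="You are a helpful voice assistant.",
            vad=silero.VAD.load(),
            # hi-IN matches the Hindi greeting below; change it for other languages.
            stt=sarvam.STT(language="hi-IN", model="saaras:v3"),
            llm=sarvam.LLM(model="sarvam-30b"),
            tts=sarvam.TTS(target_language_code="hi-IN", model="bulbul:v3", speaker="shubh"),
        )

    async def on_enter(self) -> None:
        # Greeting: "Hello! How can I help you?"
        await self.session.say("नमस्ते! मैं आपकी कैसे मदद कर सकती हूं?")

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    # The session runs the VAD -> STT -> LLM -> TTS loop for this room.
    session = AgentSession()
    await session.start(agent=VoiceAssistant(), room=ctx.room)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))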

Pipecat Quick Start

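pip install "pipecat-ai[sarvam,silero,daily]"

import asyncio

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
# Import paths for services, VAD, and transports differ across Pipecat releases;
# adjust these three to match the installed version.
from pipecat.services.sarvam import SarvamSTT, SarvamTTS, SarvamLLM
from pipecat.vad.silero import SileroVAD
from pipecat.transports.local import LocalAudioTransport

transport = LocalAudioTransport()
# Frames flow top to bottom: mic audio -> VAD -> STT -> LLM -> TTS -> speaker.
pipeline = Pipeline([
    transport.input(), SileroVAD(),
    SarvamSTT(model="saaras:v3"),
    SarvamLLM(model="sarvam-30b", system_prompt="You are a helpful voice assistant."),
    SarvamTTS(model="bulbul:v3", voice="shubh"),
    transport.output(),
])

async def main():
    # A pipeline does nothing until a runner executes it as a task.
    await PipelineRunner().run(PipelineTask(pipeline))

if __name__ == "__main__":
    asyncio.run(main())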

JavaScript/TypeScript Note

The LiveKit and Pipecat integrations shown above are Python-only. For JS/TS voice pipelines, call the individual SDK methods directly:

import { SarvamAIClient } from "sarvamai";
const client = new SarvamAIClient({ apiSubscriptionKey: "YOUR_SARVAM_API_KEY" });

// STT: client.speechToText.transcribe({...})
// TTS: client.textToSpeech.convertStream({...})  // returns BinaryResponse
// LLM: client.chat.completions({...})

Gotchas

| Gotcha | Detail |
| --- | --- |
| Use sarvam-30b | Best latency for voice. Only use sarvam-105b when reasoning quality matters more than speed. |
| max_tokens budget | Sarvam models reason internally. Don't set a low max_tokens or content will be None. Omit it or set 500+. |
| TTS pitch/loudness | NOT supported on Bulbul v3; the API returns 400. Only pace works. |
| STT WebSocket codecs | Only wav/pcm; no MP3/AAC/OGG for streaming. |
| HTTP Stream for TTS | convert_stream returns binary audio directly (no base64), which is better for pipelines. |
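Several of these gotchas can be caught before a request is sent. The sketch below is one way to encode them as client-side guards. All function and constant names are hypothetical and not part of any Sarvam SDK. The fallback text is an arbitrary choice. The codec set, the rejected TTS options, and the model names are taken from the table above.

```python
STREAMING_STT_CODECS = {"wav", "pcm"}
BULBUL_V3_UNSUPPORTED = {"pitch", "loudness"}


def pick_llm(reasoning_over_speed=False):
    """Default to the low-latency model; opt in to the larger one."""
    return "sarvam-105b" if reasoning_over_speed else "sarvam-30b"


def check_streaming_codec(codec):
    """Reject codecs that streaming STT does not accept."""
    normalized = codec.lower().lstrip(".")
    if normalized not in STREAMING_STT_CODECS:
        raise ValueError(f"streaming STT accepts only wav/pcm, got {codec!r}")
    return normalized


def bulbul_v3_options(**options):
    """Fail locally instead of waiting for a 400 from the API."""
    rejected = sorted(k for k in options if k in BULBUL_V3_UNSUPPORTED)
    if rejected:
        raise ValueError(f"not supported on Bulbul v3: {rejected}")
    return options


def reply_text(content, fallback="Sorry, could you say that again?"):
    """Guard against content being None when the token budget ran out."""
    return content if content else fallback
```

A caller would run `check_streaming_codec` and `bulbul_v3_options` while building a request, and pass the LLM response content through `reply_text` before handing it to TTS.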

Full Docs

Fetch framework integration guides, environment setup, and advanced patterns from:

Repository
sarvamai/skills
First Seen
Feb 12, 2026