skills/sarvamai/skills/text-to-speech

text-to-speech

SKILL.md

Text-to-Speech — Bulbul

[!IMPORTANT] Auth: api-subscription-key header — NOT Authorization: Bearer. Base URL: https://api.sarvam.ai/v1

Model

bulbul:v3 — 11 languages, 30+ voices (default: shubh), REST/HTTP stream/WebSocket.

Quick Start (Python)

from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI()

response = client.text_to_speech.convert(
    text="नमस्ते, आप कैसे हैं?",
    target_language_code="hi-IN",
    model="bulbul:v3",
    speaker="shubh"
)
save(response, "output.wav")

# HTTP Stream (lower latency, binary audio)
chunks = []
for chunk in client.text_to_speech.convert_stream(
    text="Hello from Sarvam AI",
    target_language_code="en-IN",
    speaker="shubh",
    model="bulbul:v3"
):
    chunks.append(chunk)
audio = b"".join(chunks)

Quick Start (JavaScript/TypeScript)

import { SarvamAIClient } from "sarvamai";
import { writeFile } from "fs/promises";

const client = new SarvamAIClient({ apiSubscriptionKey: "YOUR_SARVAM_API_KEY" });

// REST
const response = await client.textToSpeech.convert({
    text: "नमस्ते, आप कैसे हैं?",
    target_language_code: "hi-IN",
    model: "bulbul:v3",
    speaker: "shubh"
});

// HTTP Stream (lower latency, returns BinaryResponse)
const streamResponse = await client.textToSpeech.convertStream({
    text: "Hello from Sarvam AI",
    target_language_code: "en-IN",
    speaker: "shubh",
    model: "bulbul:v3"
});
const bytes = await streamResponse.bytes();
await writeFile("output.wav", bytes);

WebSocket Streaming

import asyncio
from sarvamai import AsyncSarvamAI

async def tts_stream():
    client = AsyncSarvamAI()
    async with client.text_to_speech_streaming.connect(model="bulbul:v3") as ws:
        await ws.configure(target_language_code="hi-IN", speaker="shubh")
        await ws.convert("Your text here")
        await ws.flush()
        async for message in ws:
            pass  # base64 audio chunks

asyncio.run(tts_stream())

Character Limits

Method Max Text
REST (convert) 2,500 chars
HTTP Stream (convert_stream) 3,500 chars
WebSocket 2,500 chars/msg

Gotchas

Gotcha Detail
JS method name client.textToSpeech.convert({...}) and .convertStream({...}) — camelCase. Stream returns BinaryResponse with .stream(), .bytes(), .blob().
pitch/loudness rejected SDK accepts these but API returns 400 for v3. Only pace (0.5–2.0) works.
v2 voices incompatible anushka, abhilash, arya, etc. don't work with v3. Use shubh (default).
Sample rate >24kHz 32kHz, 44.1kHz, 48kHz only via REST, not streaming.
REST response Base64-encoded audio in response.audios[0]. Use sarvamai.play.save() or base64.b64decode().
Pronunciation dictionary dict_id param teaches custom word pronunciations. Create via client.pronunciation_dictionary.create(file=f).

Full Docs

Fetch voice catalog, streaming protocol, pronunciation dictionary CRUD, and codec options from:

Weekly Installs
18
Repository
sarvamai/skills
GitHub Stars
45
First Seen
Feb 8, 2026
Installed on
gemini-cli18
amp18
github-copilot18
opencode18
codex18
kimi-cli18