# Text-to-Speech: Bulbul
> [!IMPORTANT]
> **Auth:** `api-subscription-key` header, NOT `Authorization: Bearer`. Base URL: `https://api.sarvam.ai/v1`

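
The header rule also applies to raw HTTP calls. Below is a minimal sketch that uses only the Python standard library. It builds the request but does not send it. `build_tts_request` is a hypothetical helper. The `/text-to-speech` path appended to the base URL is an assumption, so check the API reference for the exact endpoint. The JSON field names mirror the SDK parameters used in the Quick Start.

```python
import json
import urllib.request

BASE_URL = "https://api.sarvam.ai/v1"  # base URL as stated in the Auth note


def build_tts_request(api_key: str, text: str, language: str = "hi-IN",
                      speaker: str = "shubh") -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) an authenticated TTS request."""
    # ASSUMPTION: the endpoint path below is illustrative, not confirmed.
    url = BASE_URL + "/text-to-speech"
    body = json.dumps({
        "text": text,
        "target_language_code": language,
        "speaker": speaker,
        "model": "bulbul:v3",
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Sarvam expects this header, NOT "Authorization: Bearer ..."
            "api-subscription-key": api_key,
            "Content-Type": "application/json",
        },
    )
```

To send the request, pass it to `urllib.request.urlopen(req)`. `urllib` stores the header as `Api-subscription-key`. That is harmless, because HTTP header names are case-insensitive.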
## Model

`bulbul:v3`: 11 languages, 30+ voices (default: `shubh`), available over REST, HTTP stream, and WebSocket.
## Quick Start (Python)
```python
from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# REST: audio comes back base64-encoded in response.audios[0]
response = client.text_to_speech.convert(
    text="नमस्ते, आप कैसे हैं?",
    target_language_code="hi-IN",
    model="bulbul:v3",
    speaker="shubh",
)
save(response, "output.wav")

# HTTP Stream (lower latency): yields raw binary audio chunks
chunks = []
for chunk in client.text_to_speech.convert_stream(
    text="Hello from Sarvam AI",
    target_language_code="en-IN",
    speaker="shubh",
    model="bulbul:v3",
):
    chunks.append(chunk)
audio = b"".join(chunks)
```
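
The example above joins every chunk in memory, which is fine for short clips. For longer text, you can write each chunk to disk as it arrives instead. `write_audio_stream` is a hypothetical helper that accepts any iterable of bytes, so you can pass the `convert_stream(...)` call straight in.

```python
from typing import Iterable


def write_audio_stream(chunks: Iterable[bytes], path: str) -> int:
    """Hypothetical helper: write streamed audio chunks to `path`, return the byte count."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            # Write each chunk immediately instead of holding the whole clip in memory.
            f.write(chunk)
            total += len(chunk)
    return total
```

For example, call `write_audio_stream(client.text_to_speech.convert_stream(text=..., target_language_code="en-IN", speaker="shubh", model="bulbul:v3"), path)`. Choose a file extension that matches the codec the stream returns.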
## Quick Start (JavaScript/TypeScript)
```typescript
import { SarvamAIClient } from "sarvamai";
import { writeFile } from "fs/promises";

const client = new SarvamAIClient({ apiSubscriptionKey: "YOUR_SARVAM_API_KEY" });

// REST
const response = await client.textToSpeech.convert({
  text: "नमस्ते, आप कैसे हैं?",
  target_language_code: "hi-IN",
  model: "bulbul:v3",
  speaker: "shubh"
});

// HTTP Stream (lower latency, returns BinaryResponse)
const streamResponse = await client.textToSpeech.convertStream({
  text: "Hello from Sarvam AI",
  target_language_code: "en-IN",
  speaker: "shubh",
  model: "bulbul:v3"
});
const bytes = await streamResponse.bytes();
await writeFile("output.wav", bytes);
```
## WebSocket Streaming
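```python
import asyncio

from sarvamai import AsyncSarvamAI


async def tts_stream():
    client = AsyncSarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")
    async with client.text_to_speech_streaming.connect(model="bulbul:v3") as ws:
        # Set language and voice before sending any text
        await ws.configure(target_language_code="hi-IN", speaker="shubh")
        await ws.convert("Your text here")
        # Force synthesis of any text still buffered on the server
        await ws.flush()
        async for message in ws:
            pass  # audio messages carry base64-encoded chunks


asyncio.run(tts_stream())
```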
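
The loop above leaves the audio handling empty. The sketch below turns chunks into playable bytes. It assumes you have already collected each message's base64 audio string into a list. Which attribute holds that string depends on the SDK's message types and is not shown here. `decode_audio_chunks` is a hypothetical helper.

```python
import base64
from typing import Iterable


def decode_audio_chunks(b64_chunks: Iterable[str]) -> bytes:
    """Hypothetical helper: decode base64 audio chunks in arrival order and join them."""
    # Decode each chunk on its own. Concatenating the base64 strings first is unsafe:
    # a chunk ending in "=" padding truncates or corrupts the combined decode.
    return b"".join(base64.b64decode(chunk) for chunk in b64_chunks)
```

The same decoding applies to the REST response, whose audio sits in `response.audios[0]` as noted in the gotchas.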
## Character Limits

| Method | Max Text |
|---|---|
| REST (`convert`) | 2,500 chars |
| HTTP Stream (`convert_stream`) | 3,500 chars |
| WebSocket | 2,500 chars/msg |
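
Text longer than these limits has to be split on the client side. The sketch below assumes the API counts characters the way Python's `len()` does (Unicode code points). `chunk_text` is a hypothetical helper. It packs whole sentences, splitting after `.`, `!`, `?`, and the Devanagari danda `।`, and hard-wraps only a sentence that alone exceeds the limit.

```python
import re
from typing import List

# Limits from the Character Limits table (WebSocket: per message).
MAX_CHARS = {"rest": 2500, "http_stream": 3500, "websocket": 2500}


def chunk_text(text: str, limit: int = MAX_CHARS["rest"]) -> List[str]:
    """Hypothetical helper: split text into pieces of at most `limit` characters."""
    # Split after sentence-ending punctuation, including the danda (।).
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is longer than the limit on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Otherwise pack sentences together while they still fit.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Hard wraps fall on code-point boundaries, so they can split a Devanagari conjunct. Keeping individual sentences below the limit avoids this.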
## Gotchas

| Gotcha | Detail |
|---|---|
| JS method names | `client.textToSpeech.convert({...})` and `.convertStream({...})` use camelCase. The stream call returns a `BinaryResponse` with `.stream()`, `.bytes()`, and `.blob()`. |
| `pitch`/`loudness` rejected | The SDK accepts these, but the API returns 400 for v3. Only `pace` (0.5–2.0) works. |
| v2 voices incompatible | `anushka`, `abhilash`, `arya`, etc. don't work with v3. Use `shubh` (default). |
| Sample rate >24 kHz | 32 kHz, 44.1 kHz, and 48 kHz are available only via REST, not streaming. |
| REST response | Base64-encoded audio is in `response.audios[0]`. Use `sarvamai.play.save()` or `base64.b64decode()`. |
| Pronunciation dictionary | The `dict_id` param teaches custom word pronunciations. Create a dictionary via `client.pronunciation_dictionary.create(file=f)`. |
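
You can enforce the first three gotchas before a request leaves the client. The sketch below makes three assumptions. `v3_tts_kwargs` is a hypothetical helper. It drops `pitch` and `loudness` with a warning rather than failing. Its v2 voice list holds only the three names the table gives, and since the table says "etc.", that list is incomplete.

```python
import warnings
from typing import Any, Dict

# From the gotchas table: the API returns 400 for these on bulbul:v3.
V3_REJECTED_PARAMS = {"pitch", "loudness"}
# Only pace is honored, within this inclusive range.
PACE_MIN, PACE_MAX = 0.5, 2.0
# Partial list: the table names these v2-only voices and says "etc."
KNOWN_V2_VOICES = {"anushka", "abhilash", "arya"}


def v3_tts_kwargs(**kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: sanitize keyword arguments for a bulbul:v3 TTS call."""
    clean = dict(kwargs)
    # Drop parameters v3 rejects, warning so the caller notices.
    for name in sorted(V3_REJECTED_PARAMS.intersection(clean)):
        warnings.warn(f"dropping {name!r}: rejected by bulbul:v3", stacklevel=2)
        del clean[name]

    # Default to the v3 voice and refuse known v2 voices.
    speaker = clean.setdefault("speaker", "shubh")
    if speaker in KNOWN_V2_VOICES:
        raise ValueError(f"speaker {speaker!r} is a v2 voice; use a v3 voice such as 'shubh'")

    # Validate pace against the documented range.
    pace = clean.get("pace")
    if pace is not None and not (PACE_MIN <= pace <= PACE_MAX):
        raise ValueError(f"pace must be in [{PACE_MIN}, {PACE_MAX}], got {pace}")

    clean.setdefault("model", "bulbul:v3")
    return clean
```

Pass the result straight into the SDK, for example `client.text_to_speech.convert(**v3_tts_kwargs(text="...", target_language_code="hi-IN", pace=1.2))`. Dropping rejected parameters instead of raising keeps older call sites working. Switch to raising if you would rather catch them.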
## Full Docs
Fetch voice catalog, streaming protocol, pronunciation dictionary CRUD, and codec options from:
- https://docs.sarvam.ai/llms.txt — comprehensive docs index
- TTS Overview
- Voice Catalog
- HTTP Stream
- Pronunciation Dictionary
- Rate Limits
Repository: `sarvamai/skills` (first seen Feb 8, 2026)