assemblyai
AssemblyAI Speech-to-Text and Voice AI
AssemblyAI provides speech-to-text APIs, audio intelligence models, and an LLM Gateway for applying language models to transcripts. This skill corrects common mistakes that training data gets wrong — deprecated APIs, discontinued SDKs, and non-obvious auth patterns.
Authentication
All endpoints use the same header:
Authorization: YOUR_API_KEY
NOT Authorization: Bearer ... — just the raw API key, no Bearer prefix. This is the #1 mistake.
Base URLs
| Service | US | EU |
|---|---|---|
| REST API | https://api.assemblyai.com |
https://api.eu.assemblyai.com |
| LLM Gateway | https://llm-gateway.assemblyai.com/v1 |
https://llm-gateway.eu.assemblyai.com/v1 |
| Streaming v3 | wss://streaming.assemblyai.com/v3/ws |
wss://streaming.eu.assemblyai.com/v3/ws |
| Streaming v2 (legacy) | wss://api.assemblyai.com/v2/realtime/ws |
— |
| Voice Agent API | wss://agents.assemblyai.com/v1/ws |
— |
SDKs
| Language | Package | Status |
|---|---|---|
| Python | pip install assemblyai |
Active |
| JavaScript/TypeScript | npm i assemblyai |
Active |
| Ruby | assemblyai gem |
Active |
| Java | assemblyai-java-sdk |
Discontinued April 2025 |
| Go | assemblyai-go-sdk |
Discontinued April 2025 |
| C# .NET | AssemblyAI NuGet |
Discontinued April 2025 |
Only Python, JS/TS, and Ruby SDKs are maintained. For Java, Go, or C#, use the REST API directly.
Speech-to-Text Models
Pre-Recorded
| Model | Languages | Best For |
|---|---|---|
| Universal-3 Pro | 6 (en, es, de, fr, pt, it) | Highest accuracy, promptable transcription, keyterms up to 1,000 words |
| Universal-2 | 99 | Broadest language coverage, keyterms up to 200 words |
Use speech_models as a priority list with fallback: ["universal-3-pro", "universal-2"].
Streaming
| Model | Languages | Best For |
|---|---|---|
| universal-streaming-english | 1 (English) | Voice agents, ~300ms latency |
| universal-streaming-multilingual | 6 | Per-utterance language detection |
| whisper-rt | 99+ | Broadest streaming language support, auto-detect only |
| u3-rt-pro | 6 | Voice agents — punctuation-based turn detection, promptable |
Medical Mode (Add-On)
domain: "medical-v1" enables Medical Mode — an add-on that improves accuracy for medical terminology (medications, procedures, conditions, dosages). Works with both pre-recorded and streaming models.
- Pre-recorded: Universal-3 Pro (
domain: "medical-v1"in request body), Universal-2 - Streaming: u3-rt-pro, universal-streaming-english, universal-streaming-multilingual
- Supported languages: English, Spanish, German, French (4 languages only)
- Billed as a separate add-on. If used with an unsupported language, the API ignores
domainand returns a warning — transcript still completes and you are NOT charged for Medical Mode.
Prompting (Universal-3 Pro only)
Two mutually exclusive customization parameters:
prompt(string, up to 1500 words): Natural language instructions for transcription stylekeyterms_prompt(string[], up to 1000 terms): Domain vocabulary for proper nouns, brands, technical terms
Prompting best practices:
- Use positive, authoritative instructions — NEVER use negative phrasing ("Don't", "Avoid", "Never") as the model gets confused
- Limit to 3-6 instructions for best results
- Prefix critical instructions with "Non-negotiable:" or "Required:"
LeMUR is Deprecated
LeMUR is deprecated (sunset March 31, 2026 — already sunset). Use the LLM Gateway instead. The LLM Gateway is an OpenAI-compatible API. Key difference: you pass transcript text directly in messages (no transcript_ids). Transcribe first, then include transcript.text in your prompt.
See references/llm-gateway.md for models, tool calling, structured outputs, and examples.
Key Gotchas
| Gotcha | Details |
|---|---|
prompt + keyterms_prompt |
Mutually exclusive — use one or the other |
summarization / auto_chapters |
Deprecated. Use LLM Gateway instead (transcribe → send text to LLM) |
| PII redaction scope | Only redacts words in text — other feature outputs (entities, summaries) may still expose sensitive data |
| Upload key scoping | Files uploaded with one API key project cannot be transcribed with a different project's key |
| Structured outputs | Supported by OpenAI, Gemini, Claude 4.5+, Qwen, and Kimi — Claude 3.x does NOT support json_schema structured outputs |
| U3 Pro turn detection | Uses punctuation (. ? !), NOT confidence thresholds — end_of_turn_confidence_threshold has no effect |
| Negative prompts | Never use "Don't" or "Avoid" in prompts — rephrase as positive instructions |
| PII audio redaction method | override_audio_redaction_method: "silence" replaces PII with silence instead of default beep |
| Language detection | Requires minimum 15 seconds of spoken audio for reliable results |
| LLM Gateway EU region | Only Anthropic Claude and Google Gemini models available — OpenAI models are NOT supported in EU |
| Disfluencies | disfluencies: true works on Universal-2 only; for U3 Pro, use prompting instead |
| Medical Mode unsupported language | API silently skips Medical Mode and does not charge for it — check for warning in response |
| Voice Agent API URL | The Voice Agent endpoint is wss://agents.assemblyai.com/v1/ws — NOT /v1/voice (renamed April 2026), /v1/realtime (older), or speech-to-speech.us.assemblyai.com (very old) |
Voice Agent tool.call field |
The argument dict is named arguments, not args (renamed April 2026) |
| Voice Agent turn detection fields | Use min_silence (default 600ms) and max_silence (default 1500ms) under session.input.turn_detection — min_turn_silence/max_turn_silence are the streaming/LiveKit/Pipecat field names, not Voice Agent API |
Common Mistakes
| Mistake | Correction |
|---|---|
Authorization: Bearer KEY |
Authorization: KEY (no Bearer prefix) — BUT the Voice Agent API (agents.assemblyai.com) uses Authorization: Bearer KEY |
| Using LeMUR API | Deprecated. Use LLM Gateway instead |
Using summarization or auto_chapters |
Deprecated. Use LLM Gateway instead (transcribe then summarize via LLM) |
LeMUR transcript_ids with LLM Gateway |
Pass transcript text in messages, not IDs |
anthropic/claude-... model IDs |
No provider prefix: claude-sonnet-4-5-20250929 not anthropic/claude-sonnet-4-5-20250929 |
| Using Java/Go/C# SDKs | Discontinued. Use Python, JS/TS, Ruby, or raw API |
word_boost parameter |
Use keyterms_prompt instead |
| Hardcoding v2 streaming URL | v3 (/v3/ws) is current; v2 still works but is legacy |
Omitting speech_models / speech_model |
Required — no default exists. Omitting causes the request to fail. Use ["universal-3-pro", "universal-2"] for pre-recorded, "u3-rt-pro" for streaming |
aai.SpeechModel.universal_3_pro in Python SDK |
Use raw strings: "universal-3-pro", "universal-2" — these enum aliases don't exist in the SDK |
S2S session.update without "session" key |
Must wrap config: {"type":"session.update","session":{...}} |
S2S tool schema using {"function":{...}} nesting |
S2S tools are flat: {"type":"function","name":"...","description":"...","parameters":{...}} |
| Voice Agent S2S URL | Correct URL: wss://agents.assemblyai.com/v1/ws — not /v1/voice (renamed April 2026), /v1/realtime (older), or speech-to-speech.us.assemblyai.com (very old) |
Voice Agent tool.call args field |
Renamed to arguments — event["arguments"] is the parameter dict |
Medical Mode domain: "medical" |
Correct value is domain: "medical-v1" |
LLM Gateway tool result role: "function_call_output" |
Correct role is "tool" — use {"role": "tool", "tool_call_id": "...", "content": "..."} |
Reference Files
Read the relevant reference file based on what the user needs:
| File | When to read |
|---|---|
references/python-sdk.md |
Python SDK patterns and examples |
references/js-sdk.md |
JavaScript/TypeScript SDK patterns |
references/streaming.md |
Real-time/streaming STT, v3 protocol, temp tokens, error codes |
references/voice-agents.md |
Voice agent integrations: LiveKit, Pipecat, turn detection, latency optimization |
references/llm-gateway.md |
Applying LLMs to transcripts, tool calling, available models |
references/speech-understanding.md |
Translation, speaker identification, custom formatting |
references/audio-intelligence.md |
PII redaction, diarization, summarization, sentiment, chapters |
references/api-reference.md |
Full parameter list, export endpoints, webhooks, upload, PII policies |