# deepgram-js-speech-to-text

Using Deepgram Speech-to-Text (JavaScript / TypeScript SDK).

Basic transcription for prerecorded audio (REST) or live audio (WebSocket) via `/v1/listen`.
## When to use this product
- REST (`client.listen.v1.media.transcribeUrl` / `transcribeFile`) — one-shot transcription of a finished URL or file. Good for batch jobs, caption generation, offline processing.
- WebSocket (`client.listen.v1.createConnection()` / `connect()`) — continuous streaming transcription. Good for live captions, microphone audio, telephony streams, browser or Node realtime apps.
Use a different skill when:
- You also want summaries, topics, intents, sentiment, language detection, or redaction guidance on the same `/v1/listen` call → `deepgram-js-audio-intelligence`.
- You need Flux turn-taking and end-of-turn events on `/v2/listen` → `deepgram-js-conversational-stt`.
- You need a full interactive assistant with STT + LLM + TTS over one socket → `deepgram-js-voice-agent`.
## Authentication
require("dotenv").config();
const { DeepgramClient } = require("@deepgram/sdk");
const deepgramClient = new DeepgramClient({
apiKey: process.env.DEEPGRAM_API_KEY,
});
Use the exported `DeepgramClient` from `src/CustomClient.ts`, not `DefaultDeepgramClient`. The wrapper adds the required `Token` auth prefix, session headers, and patched WebSocket behavior.
## Quick start — REST (prerecorded URL)
From `examples/04-transcription-prerecorded-url.ts`:
```ts
const data = await deepgramClient.listen.v1.media.transcribeUrl({
  url: "https://dpgr.am/spacewalk.wav",
  model: "nova-3",
  language: "en",
  punctuate: true,
  paragraphs: true,
  utterances: true,
});

console.log(
  "Transcription:",
  data.results?.channels?.[0]?.alternatives?.[0]?.transcript,
);
```
## Quick start — REST (prerecorded file)
From `examples/05-transcription-prerecorded-file.ts`:
```ts
const { createReadStream } = require("fs");

const data = await deepgramClient.listen.v1.media.transcribeFile(
  createReadStream("./examples/spacewalk.wav"),
  {
    model: "nova-3",
    language: "en",
    punctuate: true,
    paragraphs: true,
    utterances: true,
    smart_format: true,
  },
);
```
`transcribeFile(...)` accepts multiple upload shapes in this SDK: `fs.ReadStream`, `Buffer`, `ReadableStream`, `Blob`, `File`, `ArrayBuffer`, and `Uint8Array` (see `examples/23-file-upload-types.ts`).
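For instance, a minimal sketch of one alternative shape — a `Buffer` read synchronously from disk. The option names simply reuse the file example above and are illustrative, not required:

```ts
// Sketch: the same call with a Buffer instead of a ReadStream.
const { readFileSync } = require("node:fs");

const audioBuffer = readFileSync("./examples/spacewalk.wav");
const bufferResult = await deepgramClient.listen.v1.media.transcribeFile(audioBuffer, {
  model: "nova-3",
  smart_format: true,
});
console.log(bufferResult.results?.channels?.[0]?.alternatives?.[0]?.transcript);
```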
## Quick start — WebSocket (live streaming)
From `examples/07-transcription-live-websocket.ts`:
```ts
const deepgramConnection = await deepgramClient.listen.v1.createConnection({
  model: "nova-3",
  language: "en",
  punctuate: "true",
  interim_results: "true",
});

deepgramConnection.on("message", (data) => {
  if (data.type === "Results") {
    console.log("Transcript:", data);
  }
});

deepgramConnection.connect();
await deepgramConnection.waitForOpen();

// Swap this for a mic capture (e.g. `node-microphone` / `MediaRecorder`)
// in real apps; the repo examples use `createReadStream` over a sample WAV.
const { createReadStream } = require("node:fs");
const audioStream = createReadStream("samples/spacewalk.wav");

audioStream.on("data", (chunk) => {
  deepgramConnection.sendMedia(chunk);
});

audioStream.on("end", () => {
  deepgramConnection.sendFinalize({ type: "Finalize" });
});
```
The repo examples use the two-step socket flow: `createConnection()` → register handlers → `connect()` → `waitForOpen()`.
## Key parameters / API surface
- REST: `model`, `language`, `punctuate`, `smart_format`, `paragraphs`, `utterances`, `multichannel`, `numerals`, `search`, `keyterm`, `keywords`, `encoding`, `sample_rate`, `callback`, `tag`.
- WSS connect args (`src/api/resources/listen/resources/v1/client/Client.ts`): `model` is required; common realtime flags include `language`, `interim_results`, `endpointing`, `utterance_end_ms`, `vad_events`, `encoding`, `sample_rate`, `multichannel`, `punctuate`, `smart_format`.
- WSS client messages (`src/api/resources/listen/resources/v1/client/Socket.ts`): `sendMedia(...)`, `sendFinalize(...)`, `sendCloseStream(...)`, `sendKeepAlive(...)`.
- WSS server events: `Results`, `Metadata`, `UtteranceEnd`, `SpeechStarted` (see the sketch after this list).
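A minimal sketch tying several of the realtime flags above to the server events they drive. The specific values (`utterance_end_ms: "1000"`, `linear16` at 16 kHz) are illustrative assumptions, not requirements:

```ts
// Sketch: realtime flags plus handlers for the documented server events.
const connection = await deepgramClient.listen.v1.createConnection({
  model: "nova-3",           // required
  language: "en",
  interim_results: "true",
  utterance_end_ms: "1000",  // illustrative value
  vad_events: "true",
  encoding: "linear16",      // must match the raw PCM you actually send
  sample_rate: "16000",
});

connection.on("message", (data) => {
  switch (data.type) {
    case "Results":
      // Interim and final transcript payloads.
      break;
    case "UtteranceEnd":
      // Silence gap reached after speech (driven by utterance_end_ms).
      break;
    case "SpeechStarted":
      // Voice activity detected (driven by vad_events).
      break;
    case "Metadata":
      // Stream/request metadata.
      break;
  }
});

connection.connect();
await connection.waitForOpen();
```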
## API reference (layered)
- In-repo reference: `reference.md` → Listen V1 Media for REST; WSS behavior lives in `src/CustomClient.ts` and `src/api/resources/listen/resources/v1/client/{Client,Socket}.ts`.
- Canonical OpenAPI (REST): https://developers.deepgram.com/openapi.yaml
- Canonical AsyncAPI (WSS): https://developers.deepgram.com/asyncapi.yaml
- Context7: library ID `/llmstxt/developers_deepgram_llms_txt`
- Product docs:
## Gotchas
- Use `DeepgramClient`, not `DefaultDeepgramClient`. The custom wrapper adds `Token` auth, session IDs, browser WS auth protocols, and patched sockets.
- Repo examples are two-stage for WSS. `createConnection()` does not open the socket; call `connect()` and usually `waitForOpen()`.
- Finalize before closing v1 streams. `sendFinalize({ type: "Finalize" })` flushes the final partial.
- Keep idle streams alive. Use audio or `sendKeepAlive({ type: "KeepAlive" })` on long pauses (see the sketch after this list).
- Raw audio metadata must match reality. If you send PCM, `encoding` and `sample_rate` must match the bytes.
- Browser auth differs from Node auth. In browsers, the wrapper moves auth/session info into WebSocket subprotocols because custom headers are unavailable.
- Use `/v2/listen` only for Flux. If you need turn-aware conversational STT, switch skills instead of forcing v1.
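A minimal sketch of the keep-alive and shutdown gotchas, assuming an open `deepgramConnection` from the quick start. The 5-second interval and the `{ type: "CloseStream" }` payload for `sendCloseStream(...)` are assumptions, not verified against this SDK:

```ts
// Sketch: keep a quiet stream alive, then shut it down in order.
const keepAlive = setInterval(() => {
  deepgramConnection.sendKeepAlive({ type: "KeepAlive" }); // during long pauses
}, 5000); // assumed interval

// ...when the audio source is exhausted:
clearInterval(keepAlive);
deepgramConnection.sendFinalize({ type: "Finalize" });       // flush the final partial
deepgramConnection.sendCloseStream({ type: "CloseStream" }); // assumed payload shape
```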
## Example files in this repo
- `examples/04-transcription-prerecorded-url.ts`
- `examples/05-transcription-prerecorded-file.ts`
- `examples/06-transcription-prerecorded-callback.ts`
- `examples/07-transcription-live-websocket.ts`
- `examples/08-transcription-captions.ts`
- `examples/23-file-upload-types.ts`
- `examples/27-deepgram-session-header.ts`
## Central product skills
For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:
```sh
npx skills add deepgram/skills
```
This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).