
Using Deepgram Text-to-Speech (JavaScript / TypeScript SDK)

Convert text to audio with one-shot REST generation or low-latency streaming synthesis via /v1/speak.

When to use this product

  • REST (client.speak.v1.audio.generate) — render finished text into an audio response. Best for downloadable files, pre-generated prompts, batch synthesis.
  • WebSocket (client.speak.v1.createConnection() / connect()) — stream text in and receive audio out with lower latency. Best when an LLM is still producing tokens.

Use a different skill when:

  • You need the agent to also listen, think, and handle barge-in → deepgram-js-voice-agent.

Authentication

require("dotenv").config();

const { DeepgramClient } = require("@deepgram/sdk");

const deepgramClient = new DeepgramClient({
  apiKey: process.env.DEEPGRAM_API_KEY,
});

The repo examples use require("../dist/cjs/index.js"), but application code should normally import from @deepgram/sdk.
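
In ESM or TypeScript projects, the equivalent import (assuming the same named export the CJS example above destructures) is:

import { DeepgramClient } from "@deepgram/sdk";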

Quick start — REST (one-shot)

From examples/10-text-to-speech-single.ts:

const data = await deepgramClient.speak.v1.audio.generate({
  text: "Hello, this is a test of Deepgram's text-to-speech API.",
  model: "aura-2-thalia-en",
  encoding: "linear16",
  container: "wav",
});

console.log("Audio generated successfully", data);

generate(...) returns a BinaryResponse, not JSON. See examples/25-binary-response.ts for .stream(), .arrayBuffer(), .blob(), and .bytes() handling.
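
For example, a minimal sketch that persists the one-shot result to disk, using the arrayBuffer() surface shown in examples/25-binary-response.ts (the output path is illustrative):

const fs = require("node:fs/promises");

// Drain the binary body into a Buffer and write it out as a WAV file,
// matching the encoding/container requested above.
const audio = Buffer.from(await data.arrayBuffer());
await fs.writeFile("output.wav", audio);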

Quick start — WebSocket (streaming)

From examples/11-text-to-speech-streaming.ts:

const deepgramConnection = await deepgramClient.speak.v1.createConnection({
  model: "aura-2-thalia-en",
  encoding: "linear16",
});

deepgramConnection.on("message", (data) => {
  if (typeof data === "string" || data instanceof ArrayBuffer || data instanceof Blob) {
    console.log("Audio received");
  } else if (data.type === "Flushed") {
    deepgramConnection.close();
  }
});

deepgramConnection.connect();
await deepgramConnection.waitForOpen();

deepgramConnection.sendText({ type: "Speak", text: "Hello from streaming TTS." });
deepgramConnection.sendFlush({ type: "Flush" });
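
In a real application, replace the logging handler above with one that accumulates the binary payloads and writes them out once Flushed arrives. A minimal sketch (the file name is illustrative; string payloads, possible in some runtimes, are omitted):

const fs = require("node:fs");
const chunks = [];

deepgramConnection.on("message", async (data) => {
  if (data instanceof Blob) {
    chunks.push(Buffer.from(await data.arrayBuffer()));
  } else if (data instanceof ArrayBuffer) {
    chunks.push(Buffer.from(data));
  } else if (data.type === "Flushed") {
    // linear16 payloads are headerless PCM; prepend a WAV header or
    // use a player that accepts raw 16-bit PCM.
    fs.writeFileSync("stream-output.raw", Buffer.concat(chunks));
    deepgramConnection.close();
  }
});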

Key parameters / API surface

  • REST & WSS: model, encoding, sample_rate, container, bit_rate, callback, callback_method, tag, mip_opt_out.
  • REST response surface (examples/25-binary-response.ts): response.stream(), response.arrayBuffer(), response.blob(), response.bytes(), response.bodyUsed.
  • WSS client messages (src/api/resources/speak/resources/v1/client/Socket.ts): sendText(...), sendFlush(...), sendClear(...), sendClose(...); see the sketch after this list.
  • WSS server events: binary audio payloads plus Metadata, Flushed, Cleared, Warning.
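
For instance, to drop in-flight audio on barge-in and then end the session cleanly (a sketch; the payload shapes are assumed to mirror the Flush message used above):

// Discard audio Deepgram has synthesized but not yet delivered.
deepgramConnection.sendClear({ type: "Clear" });

// When the session is finished:
deepgramConnection.sendClose({ type: "Close" });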

Limitations

Unlike the Python SDK, this repo does not include a hand-written TextBuilder helper. If you want incremental token buffering before sendText(...), build that helper in your application layer.
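
A minimal application-layer sketch (all names here are illustrative, not SDK API) that buffers streamed LLM tokens and forwards complete sentences to the socket:

// Illustrative helper, not part of the SDK: accumulate streamed
// tokens and emit whole sentences to the TTS connection.
function createSentenceBuffer(connection) {
  let pending = "";
  return {
    push(token) {
      pending += token;
      // Emit everything up to the last sentence-ending punctuation mark.
      const match = pending.match(/^([\s\S]*[.!?])\s*([\s\S]*)$/);
      if (match) {
        connection.sendText({ type: "Speak", text: match[1] });
        pending = match[2];
      }
    },
    end() {
      if (pending.trim()) {
        connection.sendText({ type: "Speak", text: pending });
      }
      connection.sendFlush({ type: "Flush" });
    },
  };
}

Feed push(token) from the LLM stream and call end() when generation completes.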

API reference (layered)

  1. In-repo reference: reference.md ("Speak V1 Audio" section) for REST; WSS behavior lives in src/CustomClient.ts and src/api/resources/speak/resources/v1/client/{Client,Socket}.ts.
  2. Canonical OpenAPI (REST): https://developers.deepgram.com/openapi.yaml
  3. Canonical AsyncAPI (WSS): https://developers.deepgram.com/asyncapi.yaml
  4. Context7: library ID /llmstxt/developers_deepgram_llms_txt
  5. Product docs: https://developers.deepgram.com/docs

Gotchas

  1. REST returns binary, not JSON. Treat the result like a streamed/binary body.
  2. Use the custom client wrapper. src/CustomClient.ts patches binary WebSocket handling; the generated socket assumes JSON too aggressively.
  3. createConnection() is lazy. Register handlers, then call connect() and waitForOpen().
  4. Send Flush after your text. Without sendFlush({ type: "Flush" }), trailing audio may not be emitted promptly.
  5. Streaming text is structured JSON. Send { type: "Speak", text }, not a raw string.
  6. Audio payload shape varies by runtime. The same handler may receive string, ArrayBuffer, or Blob.
  7. Pick encoding/container/sample rate that match your sink. Mismatches show up as static, silence, or unplayable files.

Example files in this repo

  • examples/10-text-to-speech-single.ts
  • examples/11-text-to-speech-streaming.ts
  • examples/25-binary-response.ts

Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

npx skills add deepgram/skills

This SDK ships language-idiomatic code skills; deepgram/skills ships cross-language product knowledge (see api, docs, recipes, examples, starters, setup-mcp).
