Deepgram Reference Architecture

Overview

Reference architectures for scalable transcription systems built on Deepgram: a synchronous API for short files, an asynchronous queue (BullMQ) for batch processing, WebSocket streaming for real-time transcription, and a hybrid router that chooses between them, plus guidance on scaling to an enterprise multi-region deployment.

Prerequisites

Deepgram API access
Redis for queue-based patterns
WebSocket support for streaming
Monitoring infrastructure

Instructions

Step 1: Choose Architecture Pattern

Select a pattern based on the use case: the sync API for files under 60s, the async queue for batch or long files, streaming for real-time transcription, or the hybrid router for mixed workloads.

Step 2: Implement Synchronous Pattern

Expose an Express endpoint that calls the Deepgram API directly and stores the result in a database. This pattern is best for short audio where low latency matters.
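
As a minimal sketch of the call such an endpoint would make, the code below builds a request for Deepgram's pre-recorded `/v1/listen` endpoint from a hosted audio URL and extracts the top transcript from the response. The helper names (`buildListenRequest`, `extractTranscript`, `transcribeUrl`), the `nova-2` default model, and the `smart_format` option are illustrative assumptions. The Express route and the database write are left out, and the HTTP client is passed in as a parameter so the logic can be exercised without network access.

```typescript
// Only the response fields this sketch reads; the real payload has more.
interface ListenResponse {
  results?: {
    channels?: { alternatives?: { transcript?: string }[] }[];
  };
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a POST request asking Deepgram to transcribe audio hosted at audioUrl.
// The default model name is an assumption; use whichever model the account supports.
function buildListenRequest(apiKey: string, audioUrl: string, model = "nova-2"): HttpRequest {
  const query = `model=${encodeURIComponent(model)}&smart_format=true`;
  return {
    url: `https://api.deepgram.com/v1/listen?${query}`,
    method: "POST",
    headers: {
      Authorization: `Token ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url: audioUrl }),
  };
}

// Return the first channel's top transcript, or "" when the response has none.
function extractTranscript(response: ListenResponse): string {
  return response.results?.channels?.[0]?.alternatives?.[0]?.transcript ?? "";
}

// Hypothetical minimal HTTP client type, so the caller decides what performs the request.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Send the request and return the transcript; throws on a non-2xx status.
async function transcribeUrl(fetchImpl: FetchLike, apiKey: string, audioUrl: string): Promise<string> {
  const { url, ...init } = buildListenRequest(apiKey, audioUrl);
  const res = await fetchImpl(url, init);
  if (!res.ok) throw new Error(`Deepgram request failed with status ${res.status}`);
  return extractTranscript((await res.json()) as ListenResponse);
}
```

In a runtime that provides a global `fetch`, it could be passed as `fetchImpl`; an Express handler would then await `transcribeUrl` and persist the returned string.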

Step 3: Implement Async Queue Pattern

Use BullMQ with Redis for job queuing. Configure workers with a concurrency of 10 and retry failed jobs for up to 3 attempts with exponential backoff. Notify clients when a job completes.
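
The settings above can be written down as plain configuration. The sketch below also computes the wait before each retry under a doubling schedule, which is the usual meaning of exponential backoff. The 1000 ms base delay and the `retryDelays` helper are assumptions for illustration, and the exact formula BullMQ applies should be checked against its documentation.

```typescript
// Worker concurrency from the step above.
const WORKER_CONCURRENCY = 10;

// Per-job retry settings, in the shape BullMQ job options use.
// The 1000 ms base delay is an assumption; the source does not specify one.
const jobOptions = {
  attempts: 3,
  backoff: { type: "exponential" as const, delay: 1000 },
};

// Delay before each retry under a doubling schedule: base, 2x base, 4x base, ...
// `attempts` counts the first try, so there are `attempts - 1` retries.
function retryDelays(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < attempts - 1; retry++) {
    delays.push(baseDelayMs * 2 ** retry);
  }
  return delays;
}
```

With BullMQ, `jobOptions` would typically be passed as the third argument to `queue.add(name, data, jobOptions)` and the concurrency as `new Worker(name, processor, { connection, concurrency: WORKER_CONCURRENCY })`.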

Step 4: Implement Streaming Pattern

Create a WebSocket server that proxies audio from the client to the Deepgram Live API and forwards transcripts, including interim results, back to the client in real time.
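
The two pieces of proxy logic that do not depend on a particular WebSocket library can be sketched as pure functions: building the upstream Deepgram URL with interim results enabled, and turning a raw Deepgram message into the payload forwarded to the client. The `ClientTranscript` shape and the function names are hypothetical, and only the message fields read here are modeled.

```typescript
interface LiveOptions {
  encoding: string;
  sampleRate: number;
  interimResults: boolean;
}

// Build the upstream streaming URL for Deepgram's live endpoint.
function buildLiveUrl(opts: LiveOptions): string {
  const query = [
    `encoding=${encodeURIComponent(opts.encoding)}`,
    `sample_rate=${opts.sampleRate}`,
    `interim_results=${opts.interimResults}`,
  ].join("&");
  return `wss://api.deepgram.com/v1/listen?${query}`;
}

// Only the fields of a live message that this sketch reads.
interface LiveMessage {
  type?: string;
  is_final?: boolean;
  channel?: { alternatives?: { transcript?: string }[] };
}

// Hypothetical payload sent on to the browser or app client.
interface ClientTranscript {
  text: string;
  isFinal: boolean;
}

// Convert one raw upstream message into a client payload.
// Returns null for invalid JSON, non-transcript messages, and empty transcripts.
function toClientTranscript(raw: string): ClientTranscript | null {
  let msg: LiveMessage | null;
  try {
    msg = JSON.parse(raw) as LiveMessage | null;
  } catch {
    return null;
  }
  if (msg === null || msg.type !== "Results") return null;
  const text = msg.channel?.alternatives?.[0]?.transcript;
  if (typeof text !== "string" || text === "") return null;
  return { text, isFinal: msg.is_final === true };
}
```

A proxy would open a socket to `buildLiveUrl(...)`, write each binary audio frame from the client to it, and send every non-null `toClientTranscript` result back; clients can render payloads with `isFinal: false` as provisional text.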

Step 5: Build Hybrid Router

Auto-select the pattern from the audio duration: sync for audio under 60s, async for audio over 300s. Allow an explicit mode override via a request parameter.
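
The routing rule can be sketched as a single function. The thresholds come from the step above; the behavior for durations between 60s and 300s is not specified there, so this sketch exposes it as a `midRangeMode` parameter that defaults to async, which is purely an assumption. The `Mode` type and `selectMode` name are illustrative.

```typescript
type Mode = "sync" | "async" | "streaming";

const SYNC_BELOW_SECONDS = 60;
const ASYNC_ABOVE_SECONDS = 300;

// Choose a processing mode for a request.
// An explicit override always wins; otherwise the audio duration decides.
function selectMode(
  durationSeconds: number,
  override?: Mode,
  midRangeMode: Mode = "async", // assumption: the 60-300s range is unspecified in the source
): Mode {
  if (override !== undefined) return override;
  if (durationSeconds < SYNC_BELOW_SECONDS) return "sync";
  if (durationSeconds > ASYNC_ABOVE_SECONDS) return "async";
  return midRangeMode;
}
```

For example, `selectMode(30)` yields `"sync"`, `selectMode(600)` yields `"async"`, and `selectMode(600, "sync")` honors the override and yields `"sync"`.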

Step 6: Scale to Enterprise

Deploy across multiple regions behind a load balancer. Use a Redis cluster for cross-region coordination. Configure a worker pool per region with a concurrency of 20 and 5 retries.
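
One way to picture the per-region pools and failover routing is the sketch below. The `RegionPool` shape, the `healthy` flag, and the region names used in the usage note are hypothetical; how health is actually determined (health checks, load balancer state) is outside this example.

```typescript
interface RegionPool {
  region: string;
  concurrency: number;
  retries: number;
  healthy: boolean;
}

// Create a pool with the settings from the step above.
function makePool(region: string): RegionPool {
  return { region, concurrency: 20, retries: 5, healthy: true };
}

// Prefer the requested region when it is healthy; otherwise fall back to the
// first healthy pool in list order. Returns undefined if no region is healthy.
function pickRegion(pools: RegionPool[], preferred: string): RegionPool | undefined {
  const healthy = pools.filter((p) => p.healthy);
  return healthy.find((p) => p.region === preferred) ?? healthy[0];
}
```

With pools for `"us-east"` and `"eu-west"`, a request preferring `"eu-west"` is served there while that pool is healthy and moves to `"us-east"` once it is marked unhealthy.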

See detailed implementation for advanced patterns.

Output

Synchronous transcription endpoint
Queue-based async processing pipeline
Real-time WebSocket streaming server
Hybrid router with auto-selection
Enterprise multi-region architecture
Prometheus monitoring integration

Error Handling

Issue	Cause	Solution
Timeout on large files	Long audio sent through the sync pattern	Switch to async queue
WebSocket disconnect	Network issue	Auto-reconnect with backoff
Queue backlog	Worker overload	Scale workers, increase concurrency
Region failover	Regional outage	Route to healthy region
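
For the WebSocket disconnect case, a reconnect delay with capped exponential backoff might look like the sketch below. The random jitter factor, the option names, and the example values are assumptions added so that many clients do not reconnect at the same instant; the source only calls for auto-reconnect with backoff.

```typescript
interface BackoffOptions {
  baseMs: number; // delay ceiling for the first reconnect attempt
  maxMs: number;  // upper bound on the ceiling for any attempt
}

// Delay before reconnect attempt `attempt` (0-based).
// The ceiling doubles per attempt up to maxMs, then is scaled by a random
// factor in [0, 1). `random` is injectable so the result can be made deterministic.
function reconnectDelayMs(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

A client would wait `reconnectDelayMs(attempt, { baseMs: 500, maxMs: 30000 })` milliseconds before each reconnect attempt and reset `attempt` to 0 after a successful connection.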

Examples

Architecture Selection Guide

Pattern	Best For	Latency	Throughput
Sync API	Short files (<60s)	Low	Low
Async Queue	Batch processing	Medium	High
Streaming	Live transcription	Real-time	Medium
Hybrid	Mixed workloads	Varies	High
