deepgram-reference-architecture
Deepgram Reference Architecture
Overview
Reference architectures for scalable Deepgram transcription systems: a synchronous API for short files, an async queue (BullMQ) for batch processing, WebSocket streaming for real-time audio, and a hybrid router for mixed workloads, plus an enterprise multi-region deployment.
Prerequisites
- Deepgram API access
- Redis for queue-based patterns
- WebSocket support for streaming
- Monitoring infrastructure (e.g., Prometheus)
Instructions
Step 1: Choose Architecture Pattern
Select a pattern based on the use case: the sync API for files under 60 seconds, the async queue for batch jobs or long files, streaming for real-time transcription, or the hybrid router for mixed workloads.
Step 2: Implement Synchronous Pattern
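Expose an Express endpoint that calls the Deepgram pre-recorded API directly and stores the result in a database. Because the HTTP request stays open until Deepgram returns the transcript, this pattern is best for short audio with low-latency requirements.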
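A minimal sketch of the Deepgram call inside that endpoint follows, assuming the audio is reachable by URL and sent to the pre-recorded endpoint `https://api.deepgram.com/v1/listen` with `Token` authorization. The Express route and database write are omitted. The default model `nova-2`, the helper names, and the `FetchLike` type are illustrative assumptions, not taken from the original text.

```typescript
// Sketch of the sync pattern's Deepgram request (assumed endpoint and model).
const DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen";

interface ListenOptions {
  model?: string;        // assumed default below: "nova-2"
  smartFormat?: boolean; // maps to the smart_format query parameter
}

interface ListenInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Only the fields this sketch reads from the JSON response.
interface PrerecordedResponse {
  results?: { channels?: { alternatives?: { transcript?: string }[] }[] };
}

// Hypothetical minimal fetch signature, so the global fetch can be injected.
type FetchLike = (
  url: string,
  init: ListenInit,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

function buildListenRequest(audioUrl: string, apiKey: string, opts: ListenOptions = {}): { url: string; init: ListenInit } {
  let query = `model=${encodeURIComponent(opts.model ?? "nova-2")}`;
  if (opts.smartFormat) query += "&smart_format=true";
  return {
    url: `${DEEPGRAM_LISTEN_URL}?${query}`,
    init: {
      method: "POST",
      headers: { Authorization: `Token ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify({ url: audioUrl }), // Deepgram fetches the hosted audio itself
    },
  };
}

// First alternative of the first channel, or "" if the shape is missing.
function extractTranscript(body: PrerecordedResponse): string {
  return body.results?.channels?.[0]?.alternatives?.[0]?.transcript ?? "";
}

// The call the Express handler would await before saving the result.
async function transcribeSync(audioUrl: string, apiKey: string, fetchFn: FetchLike): Promise<string> {
  const { url, init } = buildListenRequest(audioUrl, apiKey, { smartFormat: true });
  const res = await fetchFn(url, init);
  if (!res.ok) throw new Error(`Deepgram request failed with status ${res.status}`);
  return extractTranscript((await res.json()) as PrerecordedResponse);
}
```

In Node 18+ the global `fetch` can be passed as `fetchFn`. Injecting it keeps the request-building logic testable without network access.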
Step 3: Implement Async Queue Pattern
Use BullMQ with Redis for job queuing. Configure workers with a concurrency of 10 and give jobs 3 attempts with exponential backoff. Notify clients when a job completes.
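The settings above can be expressed as plain option objects. This sketch assumes BullMQ's documented option names: `attempts` and `backoff` on jobs, `concurrency` on workers. It also assumes BullMQ's documented exponential rule of `2^(attemptsMade - 1) * delay`. The 1000 ms base delay and the constant names are hypothetical. The real `Queue`/`Worker` wiring needs a Redis connection, so it appears only as comments.

```typescript
// Step 3 settings; field names follow BullMQ job/worker options.
const TRANSCRIPTION_JOB_OPTIONS = {
  attempts: 3, // total tries, including the first one
  backoff: { type: "exponential", delay: 1_000 }, // base delay in ms (assumed value)
} as const;

const TRANSCRIPTION_WORKER_OPTIONS = { concurrency: 10 } as const;

// Exponential rule as documented by BullMQ: 2^(attemptsMade - 1) * delay.
function exponentialDelay(attemptsMade: number, baseDelayMs: number): number {
  return Math.round(2 ** (attemptsMade - 1) * baseDelayMs);
}

// Delay before each retry for a job with `attempts` total tries.
function retrySchedule(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let made = 1; made < attempts; made++) {
    delays.push(exponentialDelay(made, baseDelayMs));
  }
  return delays;
}

// Wiring with bullmq (not executed here; requires Redis):
//   const queue = new Queue("transcription", { connection });
//   await queue.add("transcribe", { audioUrl, requestId }, TRANSCRIPTION_JOB_OPTIONS);
//   new Worker("transcription", processJob, { connection, ...TRANSCRIPTION_WORKER_OPTIONS });
// The processor would call Deepgram, store the transcript, then notify the client.
```

With these values, a failing job is retried after about 1 s and then 2 s before it is marked failed. Raising `attempts` lengthens this schedule geometrically, so keep it small for user-facing work.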
Step 4: Implement Streaming Pattern
Create a WebSocket server that proxies audio from the client to the Deepgram Live (streaming) API and forwards transcripts, including interim results, back to the client in real time.
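Two parts of the proxy are pure logic, and this sketch shows both. It builds the Deepgram Live URL with interim results enabled, and it converts Deepgram `Results` messages into a compact payload for the browser. Socket plumbing is omitted. With the `ws` package, the upstream connection could be opened with an `Authorization: Token ...` header. The `linear16`/16 kHz audio format and the payload shape sent to clients are assumptions.

```typescript
// Pure helpers for a Deepgram Live proxy (socket handling omitted).
const DEEPGRAM_LIVE_URL = "wss://api.deepgram.com/v1/listen";

// Sent upstream when the client disconnects so Deepgram flushes final results.
const CLOSE_STREAM_MESSAGE = JSON.stringify({ type: "CloseStream" });

interface LiveParams {
  encoding: string;   // e.g. "linear16" for raw 16-bit PCM (assumed client format)
  sampleRate: number; // e.g. 16000
}

function buildLiveUrl(p: LiveParams): string {
  return (
    `${DEEPGRAM_LIVE_URL}?encoding=${encodeURIComponent(p.encoding)}` +
    `&sample_rate=${p.sampleRate}&interim_results=true`
  );
}

// Only the fields of a Deepgram live message that this sketch reads.
interface DeepgramLiveMessage {
  type?: string;
  is_final?: boolean;
  speech_final?: boolean;
  channel?: { alternatives?: { transcript?: string }[] };
}

// Hypothetical payload forwarded to the browser.
interface ClientTranscript {
  transcript: string;
  isFinal: boolean;     // false for interim results that may still change
  speechFinal: boolean; // Deepgram's endpointing flag
}

function toClientMessage(raw: string): ClientTranscript | null {
  let msg: DeepgramLiveMessage | null;
  try {
    msg = JSON.parse(raw) as DeepgramLiveMessage | null;
  } catch {
    return null; // not JSON: ignore
  }
  if (!msg || msg.type !== "Results") return null; // skip Metadata, UtteranceEnd, etc.
  return {
    transcript: msg.channel?.alternatives?.[0]?.transcript ?? "",
    isFinal: msg.is_final === true,
    speechFinal: msg.speech_final === true,
  };
}
```

In the proxy, binary audio frames from the client pass through unchanged. Each upstream text message goes through `toClientMessage`, and non-null results are sent to the client as JSON.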
Step 5: Build Hybrid Router
Auto-select the pattern based on audio duration: sync for audio under 60 s, async for audio over 300 s. Allow an explicit mode override via a request parameter.
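A sketch of the router under stated assumptions follows. The text does not say how audio between 60 and 300 s is routed, so that band is a configurable `middleBand` setting, defaulting to async here as a guess that avoids sync timeouts. Audio of unknown duration is also sent to async by assumption. Live audio goes to streaming. The names `routeTranscription`, `RouteRequest` and `RouterConfig` are hypothetical.

```typescript
// Hybrid router: explicit override first, then live flag, then duration.
type Mode = "sync" | "async" | "streaming";

interface RouteRequest {
  durationSec?: number; // duration of a recorded file, if known
  live?: boolean;       // true for real-time audio
  mode?: Mode;          // explicit override from the request parameter
}

interface RouterConfig {
  syncMaxSec: number;  // below this: sync (60 s per the text)
  asyncMinSec: number; // above this: async (300 s per the text)
  middleBand: Mode;    // assumption: routing for the unspecified 60 to 300 s band
}

const DEFAULT_ROUTER_CONFIG: RouterConfig = { syncMaxSec: 60, asyncMinSec: 300, middleBand: "async" };

function routeTranscription(req: RouteRequest, cfg: RouterConfig = DEFAULT_ROUTER_CONFIG): Mode {
  if (req.mode) return req.mode;   // explicit override always wins
  if (req.live) return "streaming"; // live audio has no fixed duration
  const d = req.durationSec;
  if (d === undefined || !Number.isFinite(d)) return "async"; // assumption: unknown length
  if (d < cfg.syncMaxSec) return "sync";
  if (d > cfg.asyncMinSec) return "async";
  return cfg.middleBand;
}
```

Keeping the thresholds in a config object makes it possible to tune the middle band from observed sync latencies without changing the routing code.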
Step 6: Scale to Enterprise
Deploy to multiple regions behind a load balancer. Use a Redis cluster for cross-region coordination, and configure a worker pool in each region with a concurrency of 20 and 5 retries.
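One piece of the multi-region setup is choosing a region per request. This minimal sketch prefers the caller's home region when it is healthy and otherwise fails over to the healthy region with the lowest latency. The region names, health flags and latency figures are hypothetical inputs, for example from load-balancer health checks. `REGION_WORKER_OPTIONS` only restates the per-region concurrency of 20 from the text.

```typescript
// Region selection with failover for a multi-region deployment (sketch).
interface RegionStatus {
  name: string;      // hypothetical region id, e.g. "us-east"
  healthy: boolean;  // from health checks (assumed source)
  latencyMs: number; // recent measured latency (assumed source)
}

// Per-region worker pool setting from Step 6.
const REGION_WORKER_OPTIONS = { concurrency: 20 } as const;

function selectRegion(regions: RegionStatus[], preferred?: string): RegionStatus | null {
  const healthy = regions.filter((r) => r.healthy);
  if (healthy.length === 0) return null; // total outage: caller must queue or fail
  const home = healthy.find((r) => r.name === preferred);
  if (home) return home; // stay in the home region while it is healthy
  // Failover: the healthy region with the lowest latency.
  return healthy.reduce((best, r) => (r.latencyMs < best.latencyMs ? r : best));
}
```

Preferring the home region keeps traffic stable. Without that preference, small latency fluctuations would move requests between regions.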
See detailed implementation for advanced patterns.
Output
- Synchronous transcription endpoint
- Queue-based async processing pipeline
- Real-time WebSocket streaming server
- Hybrid router with auto-selection
- Enterprise multi-region architecture
- Prometheus monitoring integration
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Timeout on large files | Long audio sent through the sync pattern | Route it to the async queue |
| WebSocket disconnect | Network issue | Auto-reconnect with backoff |
| Queue backlog | Worker overload | Scale workers, increase concurrency |
| Region failover | Regional outage | Route to healthy region |
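The "auto-reconnect with backoff" remedy can be sketched as a delay function. This version uses capped exponential backoff with "equal jitter": half of each delay is fixed and half is random, which spreads out clients that disconnected at the same moment. The base delay, cap and attempt limit are assumed values, not taken from the text. The random source is injectable so the behavior can be tested.

```typescript
// Reconnect delay for a dropped streaming WebSocket (capped exponential + equal jitter).
interface ReconnectPolicy {
  baseMs: number;      // scale of the first delay
  maxMs: number;       // cap on the exponential term
  maxAttempts: number; // stop reconnecting after this many tries
}

const DEFAULT_RECONNECT_POLICY: ReconnectPolicy = { baseMs: 500, maxMs: 10_000, maxAttempts: 5 };

// Delay in ms before reconnect attempt `attempt` (0-based), or null to give up.
function reconnectDelay(
  attempt: number,
  policy: ReconnectPolicy = DEFAULT_RECONNECT_POLICY,
  random: () => number = Math.random,
): number | null {
  if (attempt >= policy.maxAttempts) return null;
  const capped = Math.min(policy.maxMs, policy.baseMs * 2 ** attempt);
  return Math.round(capped / 2 + random() * (capped / 2)); // half fixed, half random
}
```

A client would call this from its `close` handler, schedule the next connection with `setTimeout`, and reset `attempt` to 0 after a successful open.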
Examples
Architecture Selection Guide
| Pattern | Best For | Latency | Throughput |
|---|---|---|---|
| Sync API | Short files (<60s) | Low | Low |
| Async Queue | Batch processing | Medium | High |
| Streaming | Live transcription | Real-time | Medium |
| Hybrid | Mixed workloads | Varies | High |