voice-ai
- STT - Deepgram Nova-3 streaming transcription (~150ms)
- LLM - Groq llama-3.1-8b-instant for fastest inference (~220ms)
- TTS - Cartesia Sonic for ultra-realistic voice (~90ms)
- Telephony - Twilio Media Streams for real-time bidirectional audio
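As a rough sanity check on the figures above, the per-stage latencies can be summed into an end-to-end budget. This is a minimal sketch: the dictionary keys and helper names are hypothetical, and it assumes the three stages run strictly one after another. It leaves out Twilio network transit and any overlap gained from streaming, so the real figure will differ.

```python
# Approximate per-stage latencies in milliseconds, copied from the stack list.
# The key names are illustrative labels, not identifiers from any SDK.
STAGE_LATENCY_MS = {
    "stt_deepgram_nova_3": 150,
    "llm_groq_llama_3_1_8b_instant": 220,
    "tts_cartesia_sonic": 90,
}


def pipeline_latency_ms(stages: dict) -> int:
    """Sum stage latencies, assuming stages run sequentially with no overlap."""
    return sum(stages.values())


def slowest_stage(stages: dict) -> str:
    """Return the name of the stage that contributes the most latency."""
    return max(stages, key=stages.get)


if __name__ == "__main__":
    print(pipeline_latency_ms(STAGE_LATENCY_MS))  # 460
    print(slowest_stage(STAGE_LATENCY_MS))        # llm_groq_llama_3_1_8b_instant
```

Under these assumptions the LLM stage dominates the budget, which is why the response-length and prompt choices for voice matter most for perceived latency.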
CRITICAL: NO OPENAI - Never use `from openai import OpenAI`.
Key deliverables:
- Streaming STT with voice activity detection
- Low-latency LLM responses optimized for voice
- Expressive TTS with emotion controls
- Twilio Media Streams WebSocket handler
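The Twilio Media Streams handler in the last deliverable mostly comes down to parsing and building JSON messages on the WebSocket. The sketch below covers only that message layer, using the standard library. The function names and the returned dictionary shape are hypothetical. It assumes Twilio's JSON framing, where `start` carries a `streamSid` and `media` carries base64-encoded audio (8 kHz mu-law by default). The WebSocket server itself, and the STT/LLM/TTS calls, are omitted.

```python
import base64
import json


def parse_twilio_message(raw: str) -> dict:
    """Decode one Media Streams WebSocket text frame into a simple dict.

    Hypothetical helper: the return shape is an illustration, not a Twilio API.
    """
    msg = json.loads(raw)
    event = msg.get("event")
    if event == "start":
        # The stream SID is needed later to address outbound audio.
        return {"type": "start", "stream_sid": msg["start"]["streamSid"]}
    if event == "media":
        # The payload is base64-encoded audio; decode it to raw bytes for STT.
        return {"type": "media", "audio": base64.b64decode(msg["media"]["payload"])}
    if event == "stop":
        return {"type": "stop"}
    # Other events (e.g. "connected", "mark") are passed through untouched.
    return {"type": "ignored", "event": event}


def build_media_message(stream_sid: str, audio: bytes) -> str:
    """Build an outbound media frame that plays audio back to the caller."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(audio).decode("ascii")},
    })


if __name__ == "__main__":
    frame = build_media_message("MZ123", b"\xff\xff\xff")  # "MZ123" is a made-up SID
    print(parse_twilio_message(frame)["audio"] == b"\xff\xff\xff")
```

In a full handler, bytes from `media` frames would be forwarded to the streaming STT connection, and synthesized TTS audio would be wrapped with `build_media_message` before being sent back over the same socket.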