When to Use

You are building a real-time voice or multimodal AI application that uses Daily or Pipecat-style transports.
You need guidance on low-latency audio, video, text, and AI service orchestration in one pipeline.
You want a capability reference before choosing services, transports, or workflow patterns for an interactive agent.

Capabilities

Pipecat enables agents to build production-ready voice and multimodal AI applications with real-time processing. Agents can orchestrate complex AI service pipelines that handle audio, video, and text simultaneously while maintaining ultra-low latency (500-800ms round-trip). The framework abstracts away the complexity of coordinating multiple AI services, network transports, and audio processing, allowing agents to focus on application logic.

Key capabilities include:

Real-time voice conversations with natural turn-taking and interruption handling
Multimodal processing combining audio, video, images, and text
Integration with 50+ AI services (LLMs, speech recognition, text-to-speech, vision models)
Function calling for external API integration and tool use
Automatic conversation context management with optional summarization
Multiple transport options (WebRTC, WebSocket, Daily, Twilio, Telnyx, etc.)
Production deployment across cloud platforms with built-in scaling

daily

When to Use

Capabilities