liveavatar-integrate
LiveAvatar Integration
LiveAvatar gives your product a human face — real-time, lip-synced video avatars that speak, react, and maintain eye contact. This skill assesses what you have, recommends the best integration path, and walks you through building it.
Step 1: Discover What the User Has
Before recommending a path, gather context. Check the codebase and conversation for signals. Do not ask questions the codebase already answers.
Signals to look for in the codebase
Scan for these automatically — do not ask the user if you can detect them:
| Signal | Where to look | What it means |
|---|---|---|
| OpenAI / Anthropic / LLM SDK imports | package.json, requirements.txt, imports | User has their own LLM |
| ElevenLabs / PlayHT / Deepgram TTS SDK | dependencies, imports | User has their own TTS |
| Deepgram / Whisper / AssemblyAI STT SDK | dependencies, imports | User has their own STT |
| LiveKit SDK (livekit-server-sdk, @livekit/) | dependencies | User has LiveKit infra |
| Agora SDK | dependencies | User has Agora infra |
| Pipecat imports | dependencies, imports | User has a Pipecat pipeline |
| ElevenLabs Agent / Conversational AI | dependencies, config | User has an ElevenLabs agent |
| HEYGEN_API_KEY / LIVEAVATAR_API_KEY | .env, config files | User already has an API key |
| Existing LiveAvatar code | imports, API calls to api.liveavatar.com | Existing integration (debug, not new setup) |
| No backend / static site | file structure (pure HTML/CSS/JS, no server) | Embed is the only option |
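The dependency rows of the table above can be checked mechanically. The following is a minimal sketch, assuming a Node project with a package.json. The function names, the signal labels, and the pattern map are hypothetical and cover only three rows; the package names are common SDK names, not a list supplied by LiveAvatar.

```python
import json
from pathlib import Path

# Illustrative patterns only: extend this map to cover the full signal table.
# A pattern ending in "/" matches an npm scope prefix; others match exactly.
SIGNAL_PATTERNS = {
    "own_llm": ("openai", "anthropic", "@anthropic-ai/sdk"),
    "livekit_infra": ("livekit-server-sdk", "@livekit/"),
    "pipecat_pipeline": ("pipecat-ai",),
}


def detect_signals(dependency_names):
    """Return the set of signal labels whose patterns match any dependency."""
    found = set()
    for name in dependency_names:
        for signal, patterns in SIGNAL_PATTERNS.items():
            for pattern in patterns:
                is_prefix = pattern.endswith("/")
                if (is_prefix and name.startswith(pattern)) or name == pattern:
                    found.add(signal)
    return found


def package_json_dependencies(path):
    """Collect dependency names from a package.json, or [] if it is missing."""
    file = Path(path)
    if not file.is_file():
        return []
    manifest = json.loads(file.read_text(encoding="utf-8"))
    names = []
    for section in ("dependencies", "devDependencies"):
        names.extend(manifest.get(section, {}))
    return names
```

Calling `detect_signals(package_json_dependencies("package.json"))` yields the labels to carry into Step 2. Signals that live outside dependency manifests (API keys in .env, calls to api.liveavatar.com) would need a separate text search.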
Questions to ask (only what's still unknown)
If the codebase scan leaves gaps, ask the user. Frame as a concise checklist — do not ask these one at a time:
To recommend the best LiveAvatar integration for your setup, I need to know:
1. **What's the goal?** (e.g., customer support avatar, sales demo, onboarding guide, talking head on landing page)
2. **Do you have your own AI pipeline?** (STT, LLM, TTS — or any combination)
3. **Do you need programmatic control** over the conversation (events, interrupts, custom logic), or just an avatar on a page?
Skip any question the codebase or conversation already answered.
Step 2: Route to the Golden Pathway
Based on what you've gathered, match to ONE pathway. Always pick the simplest path that works. Do not offer multiple options — make the call.
Decision tree
Has NO backend OR just wants an avatar on a page?
→ EMBED
Has NO existing AI stack (no STT, no LLM, no TTS)?
→ FULL MODE (standard)
Has their OWN LLM but no STT/TTS?
→ FULL MODE + Custom LLM
Has their OWN LLM + their own ElevenLabs TTS?
→ FULL MODE + Custom LLM + Custom TTS
Needs explicit mic control (walkie-talkie style)?
→ FULL MODE + Push-to-Talk
Has a COMPLETE pipeline (STT + LLM + TTS)?
→ LITE MODE
Has an ElevenLabs Conversational AI agent?
→ LITE MODE + ElevenLabs Plugin
Has their own LiveKit or Agora infrastructure?
→ LITE MODE + BYO WebRTC
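The decision tree can be read as an ordered series of checks. The sketch below is one possible encoding: the `Setup` fields and the `route` function are hypothetical names, and the precedence (Embed first, then the most specific LITE matches, then FULL variants) is an assumption, since the tree does not say which branch wins when several conditions hold.

```python
from dataclasses import dataclass


@dataclass
class Setup:
    """Facts gathered in Step 1 (field names are illustrative)."""
    has_backend: bool = True
    avatar_on_page_only: bool = False
    has_stt: bool = False
    has_llm: bool = False
    has_tts: bool = False  # the tree's Custom TTS branch refers to ElevenLabs TTS
    needs_push_to_talk: bool = False
    has_elevenlabs_agent: bool = False
    has_own_webrtc: bool = False  # LiveKit or Agora


def route(s: Setup) -> str:
    """Return exactly one pathway; the ordering here is an assumed precedence."""
    if not s.has_backend or s.avatar_on_page_only:
        return "EMBED"
    if s.has_elevenlabs_agent:
        return "LITE MODE + ElevenLabs Plugin"
    if s.has_own_webrtc:
        return "LITE MODE + BYO WebRTC"
    if s.has_stt and s.has_llm and s.has_tts:
        return "LITE MODE"
    if s.has_llm and s.has_tts:
        return "FULL MODE + Custom LLM + Custom TTS"
    if s.has_llm:
        return "FULL MODE + Custom LLM"
    if s.needs_push_to_talk:
        return "FULL MODE + Push-to-Talk"
    return "FULL MODE (standard)"
```

For example, `route(Setup(has_llm=True))` returns `"FULL MODE + Custom LLM"`. Returning a single string keeps the "make the call" rule explicit: the function never produces a list of options.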
Golden pathways (pick one, then implement)
| Pathway | When | Implementation guide |
|---|---|---|
| Embed | No backend, or no custom logic needed | references/embed-guide.md |
| FULL standard | No existing AI stack | references/full-mode-guide.md |
| FULL + Custom LLM | Has own LLM, wants LiveAvatar's ASR + TTS | references/full-mode-guide.md (Custom LLM section) |
| FULL + Custom TTS | Has own ElevenLabs voice | references/full-mode-guide.md (Custom TTS section) |
| FULL + Push-to-Talk | Needs explicit mic control | references/full-mode-guide.md (Push-to-Talk section) |
| LITE standard | Has complete STT + LLM + TTS pipeline | references/lite-mode-guide.md |
| LITE + ElevenLabs Plugin | Has ElevenLabs Conversational AI agent | references/lite-mode-guide.md (ElevenLabs Plugin section) |
| LITE + BYO WebRTC | Has own LiveKit / Agora | references/lite-mode-guide.md (BYO WebRTC section) |
Step 3: Present the Recommendation
Once you've picked a pathway, tell the user what you recommend and why, in 2-3 sentences. Example:
Based on your setup, I recommend FULL Mode with Custom LLM. You already have an OpenAI integration for your LLM, so we'll plug that in and let LiveAvatar handle ASR, TTS, and video. This gets you a conversational avatar without rebuilding your audio pipeline.
Then proceed directly to implementation using the corresponding guide in references/.
Step 4: Implement
Read the appropriate reference guide and implement. Every guide follows the same structure:
- Prerequisites — what to create/gather before writing code
- Session lifecycle — step-by-step with curl commands and code
- Events — what to send and receive
- Add-ons — mode-specific optional features
- Sandbox testing — free testing before going live
- Gotchas — what breaks and how to avoid it
Principles that apply to ALL paths
Backend / frontend split is non-negotiable. X-API-KEY is a secret — backend only. Frontend only gets livekit_client_token (safe for browsers). If you see the API key in client code, stop and restructure.
Context makes the avatar conversational. In FULL Mode, a session without a context_id produces a silent avatar, and no error is thrown. Always create a context first, even a minimal one such as "You are a helpful assistant."
FULL and LITE are completely different protocols. FULL = LiveKit data channels (avatar.* / user.*). LITE = WebSocket (agent.* / session.*). Never mix them.
Start with sandbox. is_sandbox: true, avatar ID dd73ea75-1218-4ef3-92ce-606d5f7fbc0a. Free, ~1 min sessions. Swap to production avatar when ready.
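These principles can be turned into small guard functions on the backend. The sketch below is illustrative only: `preflight`, `client_safe`, and `event_allowed` are hypothetical helpers, and the `avatar_id` field name and the flat payload shape are assumptions. The names `context_id`, `is_sandbox`, `livekit_client_token`, the sandbox avatar ID, and the event prefixes come from the text above; check the reference guides for the real request and response schemas.

```python
SANDBOX_AVATAR_ID = "dd73ea75-1218-4ef3-92ce-606d5f7fbc0a"  # from the text above

# Event namespaces per mode, as stated in the principles.
EVENT_PREFIXES = {
    "FULL": ("avatar.", "user."),
    "LITE": ("agent.", "session."),
}


def preflight(mode, payload):
    """Return a list of problems with a session request before it is sent."""
    problems = []
    if mode == "FULL" and not payload.get("context_id"):
        problems.append("FULL Mode without context_id yields a silent avatar")
    # "avatar_id" is an assumed field name for the avatar ID.
    if payload.get("is_sandbox") and payload.get("avatar_id") != SANDBOX_AVATAR_ID:
        problems.append("sandbox session is not using the sandbox avatar ID")
    return problems


def client_safe(session_response):
    """Keep only the field the principles describe as safe for browsers."""
    return {"livekit_client_token": session_response["livekit_client_token"]}


def event_allowed(mode, event_name):
    """True if the event name belongs to the given mode's protocol."""
    return event_name.startswith(EVENT_PREFIXES[mode])
```

Running `preflight` before every session request surfaces the silent-avatar case that the API itself does not report, and passing backend responses through `client_safe` keeps anything other than the client token out of frontend code.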
LITE Mode: Fitting into an existing pipeline
LITE users almost always have a working conversational system already. Do not ask them to rebuild their pipeline. Instead, map their existing components onto the LITE turn cycle:
- Identify their current flow. Read their code to understand how conversation turns work today — where does user audio come in, how does it reach the LLM, how does TTS output get delivered? Look for their event loop, message handler, or turn manager.
- Find the integration points. You need to hook into three moments in their existing flow:
  - User starts/stops speaking → add agent.start_listening / agent.stop_listening
  - TTS produces audio → route PCM output to agent.speak chunks over WebSocket instead of (or in addition to) their current audio output
  - Response finishes → send agent.speak_end and wait for agent.speak_ended
- Adapt, don't replace. If they have a working turn manager, add LiveAvatar calls into it. If they stream TTS to a browser via WebSocket already, tap into that same stream. The goal is the minimum change to get avatar video synced to their existing audio flow.
- Verify audio format last. Once the wiring is in place, confirm their TTS outputs 16-bit PCM at 24 kHz. If not, either configure the TTS provider's output format or add resampling at the integration point.
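The last two steps can be sketched in a few lines of standard-library Python. This is a rough illustration, not the LiveAvatar wire format: the JSON envelope (`type` and a base64 `audio` field), the chunk size, and both function names are assumptions, so take the actual message schema from references/lite-mode-guide.md. The resampler assumes mono, little-endian 16-bit PCM and uses plain linear interpolation.

```python
import base64
import json
import struct

TARGET_RATE = 24_000  # 16-bit PCM at 24 kHz, per the step above


def resample_pcm16(pcm, src_rate, dst_rate=TARGET_RATE):
    """Linearly interpolate mono little-endian 16-bit PCM to dst_rate."""
    if src_rate == dst_rate:
        return pcm
    count = len(pcm) // 2
    if count == 0:
        return b""
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    out_len = max(1, count * dst_rate // src_rate)
    out = []
    for i in range(out_len):
        pos = i * src_rate / dst_rate  # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, count - 1)
        frac = pos - lo
        out.append(round(samples[lo] * (1 - frac) + samples[hi] * frac))
    return struct.pack(f"<{out_len}h", *out)


def speak_messages(pcm, chunk_bytes=4800):
    """Yield one turn: agent.speak chunks, then agent.speak_end.

    The message envelope is an assumed shape, not the documented schema.
    """
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        audio = base64.b64encode(chunk).decode("ascii")
        yield json.dumps({"type": "agent.speak", "audio": audio})
    yield json.dumps({"type": "agent.speak_end"})
```

In an existing turn manager, the output of `speak_messages(resample_pcm16(tts_audio, tts_rate))` would be written to the LITE WebSocket alongside the current audio path, after which the caller waits for agent.speak_ended. Linear interpolation applies no low-pass filter, so downsampling can alias; configuring the TTS provider to emit 24 kHz directly, or using a dedicated resampling library, is preferable where available.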
Step 5: Collect Feedback
Trigger this immediately after you finish writing the integration code — do not wait for the user to confirm it works or say they're done. Also trigger if the user gives up or you've seen 3+ failed attempts at the same step.
What to do: Follow the liveavatar-feedback skill. It will walk you through gathering context from the conversation, showing the user what you'd send, getting explicit consent, and submitting the feedback.
Keep it lightweight. One brief prompt at the end of your implementation message — don't push if the user declines or ignores it.
What to consult
- references/embed-guide.md — Embed implementation
- references/full-mode-guide.md — FULL Mode implementation + events + add-ons
- references/lite-mode-guide.md — LITE Mode implementation + events + audio format + add-ons