convo-agent-skills
Convo Agent Skills
Build Agora Conversational AI apps in Next.js. Pair with next-best-practices for Next.js conventions.
Discovery — Ask Before Building
IMPORTANT: Before loading any references or writing any code, you MUST use the AskUserQuestion tool to gather requirements. Do NOT skip these questions.
Use AskUserQuestion with these questions:
Question 1:
- question: "What type of agent are you building?"
- header: "Agent type"
- multiSelect: false
- options:
- Voice-only (Real-time voice conversation with AI agent)
- Video + voice (Voice agent with camera/video support)
- Text chat (RTM-based text messaging with AI)
- Full-featured (Voice, video, text, and all features)
Question 2:
- question: "Which additional features do you need?"
- header: "Features"
- multiSelect: true
- options:
- Live transcript / subtitles (Real-time speech-to-text display)
- Settings UI (LLM, TTS, ASR configuration panel)
- Avatar (HeyGen, Anam, or Akool avatar integration)
- Host controls (Mute/unmute participants)
Question 3:
- question: "Are you building from scratch or adding to an existing project?"
- header: "Starting point"
- multiSelect: false
- options:
- From scratch (New Next.js project, scaffold everything)
- Existing project (Add Agora features to an existing Next.js app)
Wait for all answers before proceeding. Use the answers to determine which references to load from the Loading Paths table below.
Loading Paths
Based on the answers, load ONLY the needed references, one at a time as the user progresses:
| Goal | References to load (in order) |
|---|---|
| Voice-only agent | 00 → 01 → 02 → 05 |
| Video + voice agent | 00 → 01 → 02 → 03 → 05 |
| Text chat with AI | 00 → 01 → 04 → 05 |
| Full-featured | 00 → 01 → 02 → 03 → 04 → 05 → 06 |
| + Transcript/subtitles | add 06 (and 04 if RTM mode) |
| + Settings UI | add 07 |
| + Host mute controls | add 08 (requires 04) |
| + Avatar | add 09 (requires 03 + 05) |
| + MCP / advanced | add 10 |
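The table above can be read as a lookup from the discovery answers to an ordered list of references. The following is a minimal sketch of that lookup, not part of the skill itself: the type names, `BASE_PATHS`, `FEATURE_REFS`, and `loadingPath` are hypothetical, and only the reference numbers come from the table. It assumes the avatar add-on also pulls in 02, because 03 depends on it, and it does not model the "04 if RTM mode" case for transcripts.

```typescript
// Hypothetical names; only the reference numbers come from the Loading Paths table.
type AgentType = "voice" | "video" | "text" | "full";
type Feature = "transcript" | "settings" | "hostControls" | "avatar" | "advanced";

// Base path for each agent type (Question 1).
const BASE_PATHS: Record<AgentType, string[]> = {
  voice: ["00", "01", "02", "05"],
  video: ["00", "01", "02", "03", "05"],
  text: ["00", "01", "04", "05"],
  full: ["00", "01", "02", "03", "04", "05", "06"],
};

// References each add-on brings along, including its stated requirements.
const FEATURE_REFS: Record<Feature, string[]> = {
  transcript: ["06"], // RTM-mode transcripts would also need 04 (not modeled here)
  settings: ["07"],
  hostControls: ["04", "08"],
  avatar: ["02", "03", "05", "09"], // 02 added by assumption: 03 extends 02
  advanced: ["10"],
};

function loadingPath(agent: AgentType, features: Feature[]): string[] {
  const refs = new Set<string>(BASE_PATHS[agent]);
  features.forEach((feature) => {
    FEATURE_REFS[feature].forEach((ref) => refs.add(ref));
  });
  // Zero-padded two-digit IDs sort correctly as plain strings.
  return Array.from(refs).sort();
}
```

For example, `loadingPath("voice", ["transcript", "settings"])` yields 00, 01, 02, 05, 06, 07, which would then be read one at a time rather than all at once.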
Dependency Graph
00-core-setup ← everything starts here
01-token-auth ← required by all features
02-rtc-voice ← requires 01
03-rtc-video ← requires 02 (extends voice with camera)
04-rtm-messaging ← requires 01 (independent of RTC)
05-agent-lifecycle ← requires 01 + (02 or 04)
06-transcript ← requires 05 + (02 or 04)
07-settings ← requires 05
08-host-controls ← requires 04
09-avatar ← requires 03 + 05
10-advanced ← requires 05
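One way to sanity-check a proposed load order against this graph is sketched below. The `Requirement` shape, `REQUIRES`, and `firstViolation` are hypothetical names introduced for illustration. The dependency data is transcribed from the graph, with "everything starts here" read as 01 depending on 00.

```typescript
// Hypothetical helper; the dependency data mirrors the graph above.
interface Requirement {
  all: string[];    // every one of these must already be loaded
  anyOf?: string[]; // at least one of these must already be loaded
}

const REQUIRES: Record<string, Requirement> = {
  "00": { all: [] },
  "01": { all: ["00"] }, // assumption: "everything starts here" means 00 comes first
  "02": { all: ["01"] },
  "03": { all: ["02"] },
  "04": { all: ["01"] },
  "05": { all: ["01"], anyOf: ["02", "04"] },
  "06": { all: ["05"], anyOf: ["02", "04"] },
  "07": { all: ["05"] },
  "08": { all: ["04"] },
  "09": { all: ["03", "05"] },
  "10": { all: ["05"] },
};

// Returns a description of the first unmet dependency, or null if the order is valid.
function firstViolation(order: string[]): string | null {
  const loaded = new Set<string>();
  for (const ref of order) {
    const req = REQUIRES[ref];
    if (!req) return `unknown reference ${ref}`;
    const missing = req.all.filter((dep) => !loaded.has(dep));
    if (missing.length > 0) return `${ref} needs ${missing.join(", ")} first`;
    if (req.anyOf && !req.anyOf.some((dep) => loaded.has(dep))) {
      return `${ref} needs one of ${req.anyOf.join(" or ")} first`;
    }
    loaded.add(ref);
  }
  return null;
}
```

Under these assumptions, the voice-only path 00, 01, 02, 05 passes, while 00, 01, 05 is rejected because neither 02 nor 04 has been loaded.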
How to Load
Read references one at a time as the user progresses:
Read: <skill-path>/references/<filename>.md
For copy-paste code files:
Read: <skill-path>/snippets/<filename>
IMPORTANT: Do NOT load all references upfront. Load the next one only when the user is ready.
Critical Architecture Patterns
Embed these in every implementation:
- Module-scoped singletons: RTC/RTM clients and track refs MUST be module-scoped (outside the hook function), NOT `useRef` inside hooks. Multiple components call `useAgora()`; each gets its own hook instance, so `useRef` state is NOT shared.
- Track lifecycle: To guarantee hardware release (mic/camera indicator off), call `track.getMediaStreamTrack()?.stop()` (browser-native, guaranteed), then `track.stop()`, then `track.close()`.
- Server key injection: Client sends the sentinel `"__USE_SERVER__"` for API keys. Server reads the real keys from env and replaces the sentinel. Keys never appear in client code or the browser.
Common Post-Generation Issues
Fix these immediately if present in generated code:
- Agent UID "0": Invite route MUST return `responseData.agent_uid` from Agora, NOT the input `"0"`
- Agent in participant list: Filter `agentRtcUid` + `agentAvatarRtcUid` in `handleUserPublished` (see 02)
- Empty transcript: Do NOT skip `data.object` messages in the RTM handler; route them to `processTranscriptMessage` (see 04)
- Video toggle error: Clean up the existing track before creating a new one (see 03)
- No chat: `sendChatMessage` must publish to the agent UID with `channelType: "USER"` (see 04)
- Duplicate transcripts: Use the `addedTurnIds` Set + separate user/agent turn tracking (see 06)
- Chat echo: Track sent messages in the `recentlySentMessages` Set (see 04)
- Agent name 409: Generate unique names of the form `${baseName}-${timestamp}-${random}` (see 05)
- Wrong field names: User uses `final`, agent uses `turn_status` (NOT `is_final`) (see 06)
- Controls layout: mic/camera/end-call in the center; agent/settings on the right
Skill Boundaries
- This skill: Agora SDK patterns, token generation, agent API, transcript, RTM, avatar
- next-best-practices: API route structure, RSC boundaries, error handling, performance