podcast
When to Use
- User wants to create a podcast episode on any topic
- User provides a URL or text and wants it turned into a podcast discussion
- User asks for a "debate", "dialogue", or "discussion" format
- User says "podcast", "播客", or "录一期节目"
When NOT to Use
- User wants text-to-speech reading (use
/speech) - User wants an explainer video with visuals (use
/explainer) - User wants to generate an image (use
/image-gen) - User only wants to extract content from a URL without generating audio (use
/content-parser)
Purpose
Generate podcast episodes with 1-2 AI speakers discussing a topic. Supports quick overviews, deep analysis, and debate formats. Input can be a topic description, URL(s), or text. Output is a full audio episode with transcript.
Hard Constraints
- Always check CLI auth following
shared/cli-authentication.md - Follow
shared/cli-patterns.mdfor command execution and error handling - Never hardcode speaker IDs in API calls — use built-in defaults from
shared/speaker-selection.mdas fallback only; fetch from the speakers API when the user wants to change voice - Never fabricate CLI commands or parameters
- Always read config following
shared/config-pattern.mdbefore any interaction - Always follow
shared/speaker-selection.mdfor speaker selection (text table + free-text input) - Never save files to
~/Downloads/or.listenhub/— save artifacts to the current working directory with friendly topic-based names (seeshared/config-pattern.md§ Artifact Naming)
Step -1: CLI Auth Check
Follow shared/cli-authentication.md § Auth Check. If the CLI is not installed or the user is not logged in, auto-install and auto-login — never ask the user to run commands manually.
Step 0: Config Setup
Follow shared/config-pattern.md Step 0 (Zero-Question Boot).
If file doesn't exist — silently create with defaults and proceed:
mkdir -p ".listenhub/podcast"
echo '{"outputMode":"inline","language":null,"defaultMode":"quick","defaultSpeakers":{}}' > ".listenhub/podcast/config.json"
CONFIG_PATH=".listenhub/podcast/config.json"
CONFIG=$(cat "$CONFIG_PATH")
Do NOT ask any setup questions. Proceed directly to the Interaction Flow.
If file exists — read config silently and proceed:
CONFIG_PATH=".listenhub/podcast/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/podcast/config.json"
CONFIG=$(cat "$CONFIG_PATH")
Setup Flow (user-initiated reconfigure only)
Only run when the user explicitly asks to reconfigure. Display current settings:
当前配置 (podcast):
输出方式:{inline / download / both}
语言偏好:{zh / en / 未设置}
默认模式:{quick / deep / debate / 未设置}
默认主播:{speakerName(s) / 使用内置默认}
Then ask these questions in order and save:
-
outputMode: Follow
shared/output-mode.md§ Setup Flow Question. -
Language (optional): "默认语言?"
- "中文 (zh)"
- "English (en)"
- "每次手动选择" → keep
null
-
Mode (optional): "默认播客模式?"
- "Quick — 简短概述"
- "Deep — 深度分析"
- "Debate — 辩论对话"
- "每次手动选择" → keep
null
After collecting answers, save immediately:
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
# Save language if user chose one (not "每次手动选择")
if [ "$LANGUAGE" != "null" ]; then
NEW_CONFIG=$(echo "$NEW_CONFIG" | jq --arg lang "$LANGUAGE" '. + {"language": $lang}')
fi
# Save mode if user chose one
if [ "$MODE" != "null" ]; then
NEW_CONFIG=$(echo "$NEW_CONFIG" | jq --arg mode "$MODE" '. + {"defaultMode": $mode}')
fi
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
Interaction Flow
Step 1: Topic + Reference Materials
Ask topic and optional reference materials together in a single question using AskUserQuestion with two sub-questions, or a single free-text prompt:
What topic would you like to turn into a podcast? If you have reference materials (URLs or text), include them here too.
Accept: topic description, URL(s), pasted text, or any combination.
Examples of valid input:
- "AI developments in 2026"
- "https://example.com/article — discuss this"
- "The pros and cons of remote work. Reference: https://study.com/remote-work-2026"
Step 2: Mode
Default: "quick" — skip this question unless:
config.defaultModeis set to something else → use that value silently- User explicitly mentioned a mode keyword in Step 1 (e.g. "deep dive", "debate", "in depth") → infer mode from intent
Only ask this question if the user's intent is ambiguous AND no default is configured. In most cases, just use "quick".
Step 3: Language
Default: match the user's interaction language. Detect from the language the user used in Step 1:
- If the user wrote in Chinese →
zh - If the user wrote in English →
en - If
config.languageis set → use that value
Never ask this question. Always infer silently. Show in the confirmation summary so the user can override if needed.
Step 4: Speaker Count
Default: 2 speakers (dialogue) — the most common and engaging format.
Skip this question. Debate mode requires 2 speakers. For quick/deep, default to 2 speakers as well.
Only use 1 speaker if the user explicitly requests a monologue or solo format.
Step 5: Speaker Selection
Follow shared/speaker-selection.md:
- If
config.defaultSpeakers.{language}is set → use saved speakers silently - If not set → use built-in defaults from
shared/speaker-selection.md(no question asked) - Show the speaker(s) in the confirmation summary — user can change from there if desired
- Only show the full speaker list if the user explicitly asks to change voices
For 2-speaker mode (dialogue/debate): use Primary + Secondary defaults for the language.
Step 6: Confirm & Generate
Summarize all choices:
Ready to generate podcast:
Topic: {topic}
Mode: {mode}
Language: {language}
Speakers: {speaker name(s)}
References: {yes/no + brief description}
Proceed?
Wait for explicit confirmation before calling any CLI command. The user can adjust any parameter here before confirming.
Workflow
Generation
-
Submit (background): Run the CLI command with
run_in_background: trueandtimeout: 360000:listenhub podcast create \ --query "{topic}" \ --source-url "{url}" \ --source-text "{text}" \ --mode {quick|deep|debate} \ --lang {en|zh|ja} \ --speaker "{name}" \ --speaker "{name2}" \ --jsonFlag notes:
--query— the topic or question to discuss--source-url— repeatable, one per URL reference--source-text— repeatable, one per text block reference--mode— one ofquick,deep,debate--lang— language code--speaker— repeatable (max 2); use speaker display names--speaker-id— alternative to--speaker; use speaker IDs instead of names- Omit
--source-url/--source-textif the user provided no references
The CLI handles polling internally and returns the final result when generation completes.
-
Tell the user the task is submitted and that they will be notified when it finishes.
-
When notified of completion, Present result:
Parse the CLI JSON output to extract fields:
audioUrl,subtitlesUrl,audioDuration,credits.Read
OUTPUT_MODEfrom config. Followshared/output-mode.mdfor behavior.inlineorboth: DisplayaudioUrlas a clickable link.Present:
播客已生成! 在线收听:{audioUrl} 字幕:{subtitlesUrl}(如有) 时长:{audioDuration / 1000}s 消耗积分:{credits}downloadorboth: Also download the file. Generate a topic slug followingshared/config-pattern.md§ Artifact Naming.SLUG="{topic-slug}" # e.g. "ai-developments" NAME="${SLUG}-podcast.mp3" # Dedup: if file exists, append -2, -3, etc. BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2 while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done curl -sS -o "$NAME" "{audioUrl}"Present:
已保存到当前目录: {NAME} -
Offer to show transcript or provide download URL on request
After Successful Generation
Update config with the choices made this session:
NEW_CONFIG=$(echo "$CONFIG" | jq \
--arg lang "{language}" \
--arg mode "{mode}" \
--argjson speakers '{"{language}": ["{speakerId}"]}' \
'. + {"language": $lang, "defaultMode": $mode, "defaultSpeakers": (.defaultSpeakers + $speakers)}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
API Reference
- Speaker list:
shared/cli-speakers.md - Speaker selection guide:
shared/speaker-selection.md - CLI patterns:
shared/cli-patterns.md - CLI authentication:
shared/cli-authentication.md - Config pattern:
shared/config-pattern.md
Composability
- Invokes: speakers API (for speaker selection)
- Invoked by: content-planner (Phase 3)
Example
User: "Make a podcast about the latest AI developments"
Agent workflow:
- Detect: podcast request, topic = "latest AI developments", no references
- Infer: mode = "quick" (default), language = "en" (user wrote in English), 2 speakers (default)
- Show confirmation summary → user confirms
listenhub podcast create \
--query "The latest AI developments" \
--mode deep \
--lang en \
--speaker "Mars" \
--speaker "Mia" \
--json
Wait for CLI to return result, then present with title and listen link.