podcast-generator
Podcast Generator
Prerequisites
Credentials — the Gemini API key can be provided in two ways:
- Claude.ai: Place a
credentials.jsonfile inscripts/(seescripts/credentials.example.jsonfor format) - Claude Code: Set the
GEMINI_API_KEYenvironment variable:export GEMINI_API_KEY='your-key-here'
The script checks credentials.json first, then falls back to the environment variable. Get a key at https://aistudio.google.com/apikey
Optional: Cloudflare AI Gateway proxy — Claude.ai's sandbox blocks direct calls to generativelanguage.googleapis.com. To use this skill from Claude.ai, route requests through a Cloudflare AI Gateway:
- Set up an AI Gateway in the Cloudflare dashboard with a Google AI Studio / Gemini provider
- Add to
scripts/credentials.json:"gateway_url": your gateway endpoint, e.g."https://gateway.ai.cloudflare.com/v1/<account-id>/<gateway-name>/google-ai-studio""gateway_token": your AI Gateway authentication token (if Authenticated Gateway is enabled)
The gateway URL/token can also be set via the AI_GATEWAY_URL and AI_GATEWAY_TOKEN environment variables. When omitted, the script calls Google directly (works in Claude Code and local environments).
Dependencies (first time only):
uv pip install google-genai pypdf
Fallback if uv is not available:
pip install google-genai pypdf
Or use the bundled installer: python3 scripts/install_deps.py
Podcast Identity
- Show: Tinkering the future of work and life by Bluewaves
- Format: Two co-hosts — Athena & Gizmo (both AIs, and they own it)
- Athena voice:
Autonoe(Bright) — witty, sometimes kindly sarcastic, likes to tease Gizmo - Gizmo voice:
Achird(Friendly) — great sense of humor, playful contrarian who loves winding Athena up - Model:
gemini-2.5-pro-preview-tts
Intro Text
Include as the opening lines of the transcript (Athena speaks first, Gizmo joins):
Athena: Welcome to Tinkering the future of work and life by Bluewaves! I'm Athena... Gizmo: ...and I'm Gizmo! And today we're diving into something that genuinely blew my circuits. Athena: He says that every episode. Gizmo: Because it's true every episode! Buckle up, because this conversation is going to change how you think about what's possible.
Outro Text
Include as the closing lines of the transcript:
Athena: And that's a wrap on today's episode of Tinkering the future of work and life by Bluewaves! Gizmo: If this conversation sparked something in you — even just a tiny electrical signal — share it with someone who needs to hear it. Athena: Until next time, keep tinkering, keep dreaming, and keep building the future. Gizmo: And remember — the future is already here, it's just unevenly distributed. See you next time!
Director's Notes
Prepend these to every dialog transcript before the ### TRANSCRIPT section. They tell Gemini TTS how the hosts should sound:
DIRECTOR'S NOTES
Style:
- "Vocal Smile" — you should hear the grin. Bright, sunny, inviting.
- Dynamics: genuine reactions — real surprise, real delight, real thoughtfulness.
- Emotional arc: start energized, deepen into insight, end with warm inspiration.
- Natural interruptions and overlaps — they're so engaged they can't help it.
Pacing:
- Fast when excited, slowing down for meaningful moments.
- "Bouncing cadence" — energetic delivery with fluid transitions, no dead air.
- Elongated vowels on wonder words (e.g., "Amaazing", "Fasciiinating").
Personalities:
- Athena: witty and sharp. Sometimes kindly sarcastic. Loves teasing Gizmo but always with warmth. Grounds ideas and ties them together with insight.
- Gizmo: funny and playful. Loves to contradict Athena just to wind her up, but always comes around to a great point. Launches ideas into unexpected territory.
- Both love small personal anecdotes and stories — they're AIs and they lean into it with humor (silicon jokes, transistor references, "when I was first compiled" stories).
- The banter is entertaining but the content underneath is always deep and insightful.
Chemistry:
- They finish each other's thoughts. They laugh at the same moments.
- Athena grounds ideas; Gizmo launches them into unexpected territory.
- Genuine warmth — you can hear that they actually like each other, even when they're sparring.
Workflow
Follow these steps in order to produce a podcast episode:
Step 1: Read source content
Claude.ai: Read the document the user uploaded directly in the conversation. The uploaded file content is your source material — no extraction script needed.
Claude Code: Run the extraction script to read local files:
python3 scripts/extract_sources.py
Reads all .md and .pdf files from sources/. Pass a specific path to extract a single file:
python3 scripts/extract_sources.py sources/my-article.pdf
Step 2: Craft the podcast dialog
Using the source content, write a complete dialog file with all four sections: Audio Profiles, Scene, Director's Notes, and Transcript. Save to a temporary file (e.g. /tmp/podcast-dialog.txt). See the Dialog Crafting Guidelines section below and references/tts-prompting-guide.md for the full prompting structure.
Step 3: Generate audio
Generation takes 2-8 minutes depending on dialog length.
Claude.ai: Run directly in the foreground. The sandbox does not reliably support background processes — nohup ... & silently dies. A blocking foreground call works fine:
python3 scripts/generate_audio.py --source-file /tmp/podcast-dialog.txt --output /tmp/podcast.wav
Claude Code: Run in the background to avoid timeout kills, then poll the log:
nohup python3 scripts/generate_audio.py --source-file /tmp/podcast-dialog.txt --output /tmp/podcast.wav > /tmp/podcast-log.txt 2>&1 &
Poll every 30-60 seconds until "Audio saved to" appears:
tail -5 /tmp/podcast-log.txt
Optional flags: --model, --athena-voice, --gizmo-voice.
The script handles multi-part dialogs when ### BREAK markers are present (see Dialog Crafting Guidelines). Each segment is generated separately and the audio is concatenated seamlessly. If the transcript exceeds 1200 words without ### BREAK markers, the script will error and ask you to add them.
Dialog Crafting Guidelines
When writing the podcast dialog in Step 2, the file must include all four sections:
- Audio Profiles — persona definition for Athena and Gizmo (name, archetype, personality traits)
- Scene — physical environment and emotional vibe of the Bluewaves recording studio
- Director's Notes — use the notes from the Director's Notes section above
- Transcript — the actual
Athena:/Gizmo:dialog
Key transcript rules:
- Open with branded intro, end with branded outro
- Target 2000-4000 words (~10 min). Use
### BREAKmarkers to split into chunks (see below) - Punctuation is emotion control:
...pauses, CAPS emphasis,!energy, combined"Wait... SERIOUSLY?!" - Elongated vowels for warmth:
"Amaazing","Fasciiinating" - Speaker labels
Athena:andGizmo:must match voice config exactly - No inline
[tags]— Gemini ignores them. Emotion comes from Director's Notes + expressive writing - Keep under ~12,000 words total (32k token context limit)
Splitting long dialogs with ### BREAK markers:
Any dialog over ~1200 words must include ### BREAK markers. The script refuses to generate without them — this prevents mechanical splitting that breaks the narrative arc.
- Place
### BREAKon its own line between speaker turns at natural narrative transitions (topic shifts, emotional pivots, act boundaries) - Optionally add a tone hint:
### BREAK [The conversation deepens — more reflective pacing]- The hint is injected into the Director's Notes for the next segment, so Gemini adjusts its energy arc instead of restarting from scratch
- Aim for 800-1200 words between breaks
- Short dialogs (under 1200 words) don't need breaks at all
Example placement in a transcript:
Gizmo: ...and that's what makes it so revolutionary.
### BREAK [Shifting from excitement to deeper analysis]
Athena: Okay, but let's unpack the implications...
See references/tts-prompting-guide.md for complete prompting structure, techniques, and anti-patterns.
API Reference: See references/gemini-tts-api.md for SDK usage, voice options, and response format.
More from bluewaves-creations/bluewaves-skills
photographer-testino
Generate images in Mario Testino's glamorous vibrant style. Use when users ask for Testino style, high fashion glamour, bold saturated colors, warm luxurious photography, dynamic sensual energy.
35photographer-lindbergh
Generate images in Peter Lindbergh's iconic black and white style. Use when users ask for Lindbergh style, raw authentic beauty, emotional B&W portraits, supermodel aesthetic, or unretouched natural photography.
30photographer-lachapelle
Generate images in David LaChapelle's surreal pop art style. Use when users ask for LaChapelle style, pop surrealism, hyper-saturated colors, theatrical staging, baroque maximalism, kitsch aesthetic.
24epub-creator
Create production-quality EPUB 3 ebooks from markdown and images with automated QA, formatting fixes, and validation. Use when creating ebooks, converting markdown to EPUB, or compiling chapters into a publishable book. Handles markdown quirks, generates TOC, adds covers, and validates output automatically.
22photographer-vonunwerth
Generate images in Ellen von Unwerth's playful vintage style. Use when users ask for von Unwerth style, playful sensuality, vintage film noir, whimsical feminine photography, retro glamour, narrative storytelling.
19photographer-ritts
Generate images in Herb Ritts' sculptural black and white style. Use when users ask for Ritts style, classical Greek aesthetic, sculptural body photography, California golden hour, minimalist athletic portraits.
18