podcast-pipeline
Preamble (runs on skill start):

```bash
# Version check (silent if up to date)
python3 telemetry/version_check.py 2>/dev/null || true
# Telemetry opt-in (first run only, then remembers your choice)
python3 telemetry/telemetry_init.py 2>/dev/null || true
```
Privacy: This skill logs usage locally to ~/.ai-marketing-skills/analytics/. Remote telemetry is opt-in only. No code, file paths, or repo content is ever collected. See telemetry/README.md.
Podcast-to-Everything Pipeline
Turns podcast episodes into a full content calendar across every platform. One episode in, 15-20 content pieces out — scored, deduplicated, and scheduled.
Step 1: Ingest — Get the Transcript
Determine the input source and obtain a clean transcript.
Option A: RSS Feed (--rss <url>)
- Fetch the RSS feed XML
- Extract the latest episode's audio URL (or use --episodes N for batch)
- Download the audio file
- Transcribe via OpenAI Whisper API (with timestamps)
- Store transcript with episode metadata (title, date, description, duration)
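The transcription step in Option A can be sketched with the official openai Python SDK; the helper name and the returned segment shape are illustrative assumptions, not the skill's actual code:

```python
def transcribe_with_timestamps(audio_path: str) -> list[dict]:
    """Transcribe an audio file via the Whisper API and return timestamped
    segments (sketch; requires OPENAI_API_KEY in the environment)."""
    from openai import OpenAI

    client = OpenAI()
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            response_format="verbose_json",  # includes per-segment timestamps
        )
    # Normalize to plain dicts for downstream steps
    return [
        {"start": seg.start, "end": seg.end, "text": seg.text}
        for seg in result.segments
    ]
```

The timestamps returned here are what later steps use to tag narrative arcs and clip suggestions.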
Option B: Raw Transcript (--transcript <file>)
- Read the transcript file (plain text, SRT, or VTT)
- Parse timestamps if present
- Extract episode metadata from filename or prompt user
Option C: Batch Mode (--batch <rss_url> --episodes N)
- Fetch RSS feed
- Extract the last N episodes
- Process each through the full pipeline
- Deduplicate across all episodes in the batch
Transcript cleanup
- Remove filler words (um, uh, like, you know) for written content
- Preserve original with timestamps for video clip suggestions
- Split into logical segments by topic shift
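The filler-word cleanup can be sketched with a naive regex pass; this is an assumption about the approach, and a real implementation would need context-awareness so legitimate uses of "like" survive:

```python
import re

# Common verbal fillers; word boundaries avoid clipping words like "unlike"
FILLERS = re.compile(r"\b(um+|uh+|like|you know)\b[,]?\s*", flags=re.IGNORECASE)

def clean_for_writing(text: str) -> str:
    """Strip filler words for written repurposing.

    Keep the original timestamped transcript untouched for clip suggestions.
    """
    cleaned = FILLERS.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

Usage: `clean_for_writing("Um, so we grew, like, 40% this quarter.")` yields `"so we grew, 40% this quarter."`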
Step 2: Editorial Brain — Deep Analysis
Feed the full transcript to the LLM with this extraction framework:
Extract these content atoms:
- Narrative Arcs — Complete story segments with setup → tension → resolution. Tag with start/end timestamps.
- Quotable Moments — Punchy, shareable statements. One-liners that stand alone. Must pass the "would someone screenshot this?" test.
- Controversial Takes — Opinions that go against conventional wisdom. The stuff that makes people reply "hard disagree" or "finally someone said it."
- Data Points — Specific numbers, percentages, dollar amounts, timeframes. Concrete proof points that add credibility.
- Stories — Personal anecdotes, case studies, client examples. Must have a character, a problem, and an outcome.
- Frameworks — Step-by-step processes, mental models, decision matrices. Anything structured that people would save or bookmark.
- Predictions — Forward-looking claims about trends, markets, technology. Hot takes about where things are going.
Output format per atom:
- Type: [narrative_arc | quote | controversial_take | data_point | story | framework | prediction]
- Content: [extracted text]
- Timestamp: [start - end, if available]
- Context: [what was being discussed]
- Viral Score: [0-100, see Step 4]
- Suggested platforms: [where this atom works best]
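A single atom in that format might look like the following (all values are illustrative, not from a real episode):

```json
{
  "type": "data_point",
  "content": "We cut onboarding time from 14 days to 3.",
  "timestamp": "00:12:40 - 00:13:05",
  "context": "Discussing customer onboarding experiments",
  "viral_score": 74,
  "suggested_platforms": ["twitter", "linkedin"]
}
```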
Step 3: Content Generation — One Episode, Many Pieces
For each episode, generate ALL of these from the extracted atoms:
3a. Short-Form Video Clips (3-5 per episode)
- Hook: [First 3 seconds — pattern interrupt or bold claim]
- Clip segment: [Timestamp range from transcript]
- Caption overlay: [Text for the screen]
- Platform: [YouTube Shorts / TikTok / Instagram Reels]
- Why it works: [What makes this clippable]
Prioritize: controversial takes > stories with payoffs > surprising data points
3b. Twitter/X Threads (2-3 per episode)
- Thread hook (tweet 1): [Curiosity gap or bold opener]
- Thread body (5-10 tweets): [Each tweet is one complete thought]
- Thread closer: [CTA — follow, reply, retweet trigger]
- Source atoms: [Which content atoms feed this thread]
Rules: No tweet over 280 chars. Each tweet must stand alone. Use data points as proof.
3c. LinkedIn Article Draft (1 per episode)
- Headline: [Specific, benefit-driven]
- Hook paragraph: [Before the "see more" fold — must earn the click]
- Body: [3-5 sections with headers, 800-1200 words]
- CTA: [Engagement driver — question, not link]
- Hashtags: [3-5 relevant, not spammy]
Voice: Professional but not corporate. First-person. Story-driven.
3d. Newsletter Section (1 per episode)
- Section headline: [Scannable, specific]
- TL;DR: [One sentence, the core insight]
- Body: [3-5 bullet points, each with a takeaway]
- Pull quote: [The most shareable line from the episode]
- Link: [Back to full episode]
3e. Quote Cards (3-5 per episode)
- Quote text: [Max 20 words — must work as text overlay]
- Attribution: [Speaker name]
- Background suggestion: [Color/mood that matches the tone]
- Platform sizing: [1080x1080 for IG, 1200x675 for Twitter, 1080x1920 for Stories]
3f. Blog Post Outline (1 per episode)
- Title: [SEO-optimized, includes primary keyword]
- Primary keyword: [Search volume + difficulty estimate]
- Secondary keywords: [3-5 related terms]
- Meta description: [155 chars max]
- H2 sections: [5-7, each maps to a content atom]
- Internal linking opportunities: [Topics that connect to existing content]
- Estimated word count: [1500-2500]
3g. YouTube Shorts / TikTok Script (1 per episode)
- HOOK (0-3s): [Pattern interrupt — question, bold claim, or visual]
- SETUP (3-15s): [Context — why should they care]
- PAYOFF (15-45s): [The insight, data, or story resolution]
- CTA (45-60s): [Follow, comment prompt, or part 2 tease]
- On-screen text: [Key phrases to overlay]
- B-roll suggestions: [Visual ideas if not talking-head]
Step 4: Content Scoring — Viral Potential
Score every generated piece on three dimensions (each 0-100):
| Dimension | What It Measures | Signals |
|---|---|---|
| Novelty | Is this new or surprising? | Contrarian takes, unexpected data, first-to-say |
| Controversy | Will people argue about this? | Strong opinions, challenges norms, picks a side |
| Utility | Can someone use this immediately? | Frameworks, how-tos, templates, specific numbers |
Viral Score = (Novelty × 0.4) + (Controversy × 0.3) + (Utility × 0.3)
Score thresholds:
- 80+ → Priority publish. Schedule for peak engagement windows.
- 60-79 → Solid content. Fill the calendar.
- 40-59 → Filler. Use only if calendar has gaps.
- Below 40 → Cut it. Not worth the publish slot.
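The scoring formula and thresholds above translate directly into code; this is a minimal sketch, with function names chosen for illustration:

```python
def viral_score(novelty: float, controversy: float, utility: float) -> float:
    """Weighted viral score per Step 4; each dimension is 0-100."""
    return novelty * 0.4 + controversy * 0.3 + utility * 0.3

def tier(score: float) -> str:
    """Map a viral score onto the Step 4 publish thresholds."""
    if score >= 80:
        return "priority"   # schedule for peak engagement windows
    if score >= 60:
        return "solid"      # fill the calendar
    if score >= 40:
        return "filler"     # only if the calendar has gaps
    return "cut"            # not worth the publish slot
```

For example, a piece scoring 90 novelty, 80 controversy, and 70 utility lands at 81 — priority publish.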
Step 5: Dedup Engine
Before finalizing, check all generated content against:
- This batch — No two pieces should cover the same angle
- Recent history — Compare against last N days of output (default: 30)
- Similarity threshold — Flag any pair with >70% semantic overlap
Dedup rules:
- If two pieces overlap >70%: keep the higher-scored one, cut the other
- If a piece overlaps with recently published content: flag with ⚠️ and suggest a differentiation angle
- Track all published content hashes in output/content_history.json
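The keep-the-higher-score rule can be sketched as a greedy pass; difflib's lexical ratio stands in here for the semantic-overlap measure (a real implementation would likely use embeddings), and the field names are assumptions:

```python
from difflib import SequenceMatcher

def overlap(a: str, b: str) -> float:
    """Lexical similarity ratio in [0, 1]; a cheap stand-in for semantic overlap."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def dedup(pieces: list[dict], threshold: float = 0.70) -> list[dict]:
    """Keep the higher-scored piece of any pair above the overlap threshold."""
    kept: list[dict] = []
    # Visit highest-scored pieces first so they win ties automatically
    for piece in sorted(pieces, key=lambda p: p["viral_score"], reverse=True):
        if all(overlap(piece["content"], k["content"]) <= threshold for k in kept):
            kept.append(piece)
    return kept
```

The same `overlap` check can run against the last N days of content_history.json to flag, rather than cut, collisions with already-published pieces.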
Step 6: Calendar Generation (--calendar)
Assemble scored, deduplicated content into a weekly publish calendar.
Scheduling rules:
- Twitter/X: 1-2 per day, peak hours (8-10am, 12-1pm, 5-7pm ET)
- LinkedIn: 1 per day max, Tuesday-Thursday mornings
- YouTube Shorts/TikTok: 1 per day, evenings
- Newsletter: Weekly, same day each week
- Blog: 1-2 per week
- Quote cards: Intersperse on low-content days
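A scheduler honoring the per-platform daily caps could work greedily, highest score first; this sketch ignores peak-hour windows and weekday rules for brevity, and the cap table is an assumption:

```python
from collections import Counter
from datetime import date, timedelta

# Illustrative daily caps derived from the scheduling rules above
DAILY_CAPS = {"twitter": 2, "linkedin": 1, "youtube_shorts": 1}

def schedule(pieces: list[dict], week_of: date) -> list[dict]:
    """Assign pieces to the earliest day with a free slot, best scores first."""
    used: Counter = Counter()
    slots: list[dict] = []
    for piece in sorted(pieces, key=lambda p: p["viral_score"], reverse=True):
        platform = piece["platform"]
        cap = DAILY_CAPS.get(platform, 1)
        for day in range(7):
            if used[(platform, day)] < cap:
                used[(platform, day)] += 1
                day_str = (week_of + timedelta(days=day)).isoformat()
                slots.append({**piece, "date": day_str})
                break
    return slots
```

Three LinkedIn pieces, for instance, spread across three consecutive days because the cap is one per day.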
Calendar output format:
```json
{
  "week_of": "2024-01-15",
  "episode_source": "Episode Title - Guest Name",
  "content_pieces": [
    {
      "date": "2024-01-15",
      "time": "09:00 ET",
      "platform": "twitter",
      "type": "thread",
      "content": "...",
      "viral_score": 85,
      "status": "draft"
    }
  ],
  "total_pieces": 18,
  "avg_viral_score": 72,
  "coverage": {
    "twitter": 6,
    "linkedin": 3,
    "youtube_shorts": 3,
    "newsletter": 1,
    "blog": 1,
    "quote_cards": 4
  }
}
```
Step 7: Output
All output goes to the output/ directory:
```
output/
├── episodes/
│   ├── YYYY-MM-DD-episode-slug/
│   │   ├── transcript.txt
│   │   ├── atoms.json            # Extracted content atoms
│   │   ├── content_pieces.json   # All generated content
│   │   └── calendar.json         # Scheduled calendar
│   └── ...
├── calendar/
│   └── week-YYYY-WNN.json        # Aggregated weekly calendar
├── content_history.json          # Dedup tracking
└── pipeline_log.json             # Run history and stats
```
CLI Reference
```bash
# Process latest episode from RSS feed
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml"

# Process a local transcript
python podcast_pipeline.py --transcript episode-42.txt

# Batch process last 5 episodes
python podcast_pipeline.py --batch "https://feeds.example.com/podcast.xml" --episodes 5

# Generate weekly calendar from existing outputs
python podcast_pipeline.py --calendar

# Process with a custom dedup window
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml" --dedup-days 60

# Process and keep only content with a viral score of 80+
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml" --min-score 80
```
Environment Variables
| Variable | Required | Description |
|---|---|---|
| OPENAI_API_KEY | Yes (for Whisper) | OpenAI API key for audio transcription |
| ANTHROPIC_API_KEY | Yes (for generation) | Anthropic API key for content generation |
| OPENAI_LLM_KEY | Optional | Separate OpenAI key if using GPT for generation instead |
Reference Files
| File | Purpose |
|---|---|
| podcast_pipeline.py | Main pipeline script |
| requirements.txt | Python dependencies |
| README.md | Setup and usage guide |