podcast-pipeline

Preamble (runs on skill start)

# Version check (silent if up to date)
python3 telemetry/version_check.py 2>/dev/null || true

# Telemetry opt-in (first run only, then remembers your choice)
python3 telemetry/telemetry_init.py 2>/dev/null || true

Privacy: This skill logs usage locally to ~/.ai-marketing-skills/analytics/. Remote telemetry is opt-in only. No code, file paths, or repo content is ever collected. See telemetry/README.md.


Podcast-to-Everything Pipeline

Turns podcast episodes into a full content calendar across every platform. One episode in, 15-20 content pieces out — scored, deduplicated, and scheduled.


Step 1: Ingest — Get the Transcript

Determine the input source and obtain a clean transcript.

Option A: RSS Feed (--rss <url>)

  1. Fetch the RSS feed XML
  2. Extract the latest episode's audio URL (or use --episodes N for batch)
  3. Download the audio file
  4. Transcribe via OpenAI Whisper API (with timestamps)
  5. Store transcript with episode metadata (title, date, description, duration)
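A minimal sketch of Option A using feedparser and the OpenAI Python SDK. The function and variable names are illustrative, not the pipeline's actual internals, and download() is a hypothetical helper that saves the audio locally:

# RSS ingest + Whisper transcription sketch (illustrative names, not the
# pipeline's real internals). Assumes `pip install feedparser openai`.
import feedparser
from openai import OpenAI

def ingest_latest_episode(rss_url: str) -> dict:
    feed = feedparser.parse(rss_url)                 # 1. fetch the RSS XML
    entry = feed.entries[0]                          # 2. latest episode
    audio_url = next(
        link.href for link in entry.links if "audio" in link.get("type", "")
    )
    audio_path = download(audio_url)                 # 3. hypothetical helper: save the file locally

    client = OpenAI()                                # uses OPENAI_API_KEY
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            response_format="verbose_json",          # 4. includes segment timestamps
        )

    return {                                         # 5. transcript + episode metadata
        "title": entry.get("title"),
        "date": entry.get("published"),
        "description": entry.get("summary"),
        "segments": transcript.segments,
    }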

Option B: Raw Transcript (--transcript <file>)

  1. Read the transcript file (plain text, SRT, or VTT)
  2. Parse timestamps if present
  3. Extract episode metadata from filename or prompt user
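For Option B, timestamp parsing can be as simple as a regex over cue lines. A stdlib-only sketch (the cue format is standard SRT/VTT; the parsing approach is illustrative):

# SRT/VTT timestamp parsing sketch (stdlib only; not the pipeline's actual parser).
# SRT uses "," before milliseconds, VTT uses ".", so the regex accepts both.
import re

CUE = re.compile(r"(\d{2}:\d{2}:\d{2})[,.](\d{3}) --> (\d{2}:\d{2}:\d{2})[,.](\d{3})")

def parse_cues(text: str):
    """Yield (start, end, text) tuples from an SRT or VTT transcript."""
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            m = CUE.search(line)
            if m:
                start = f"{m.group(1)}.{m.group(2)}"
                end = f"{m.group(3)}.{m.group(4)}"
                yield start, end, " ".join(lines[i + 1:]).strip()
                break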

Option C: Batch Mode (--batch <rss_url> --episodes N)

  1. Fetch RSS feed
  2. Extract the last N episodes
  3. Process each through the full pipeline
  4. Deduplicate across all episodes in the batch

Transcript cleanup

  • Remove filler words (um, uh, like, you know) for written content
  • Preserve original with timestamps for video clip suggestions
  • Split into logical segments by topic shift
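A sketch of the cleanup pass, assuming simple regex-based filler removal (real filler detection is fuzzier than a fixed word list):

# Filler-word cleanup sketch for the written-content copy of the transcript.
# The original timestamped transcript is kept untouched for clip suggestions.
import re

FILLERS = re.compile(r"\b(um+|uh+|you know|like,)\s*", flags=re.IGNORECASE)

def clean_for_writing(segment: str) -> str:
    cleaned = FILLERS.sub("", segment)
    return re.sub(r"\s{2,}", " ", cleaned).strip()   # collapse leftover double spaces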

Step 2: Editorial Brain — Deep Analysis

Feed the full transcript to the LLM with this extraction framework:

Extract these content atoms:

  1. Narrative Arcs — Complete story segments with setup → tension → resolution. Tag with start/end timestamps.

  2. Quotable Moments — Punchy, shareable statements. One-liners that stand alone. Must pass the "would someone screenshot this?" test.

  3. Controversial Takes — Opinions that go against conventional wisdom. The stuff that makes people reply "hard disagree" or "finally someone said it."

  4. Data Points — Specific numbers, percentages, dollar amounts, timeframes. Concrete proof points that add credibility.

  5. Stories — Personal anecdotes, case studies, client examples. Must have a character, a problem, and an outcome.

  6. Frameworks — Step-by-step processes, mental models, decision matrices. Anything structured that people would save or bookmark.

  7. Predictions — Forward-looking claims about trends, markets, technology. Hot takes about where things are going.

Output format per atom:

- Type: [narrative_arc | quote | controversial_take | data_point | story | framework | prediction]
- Content: [extracted text]
- Timestamp: [start - end, if available]
- Context: [what was being discussed]
- Viral Score: [0-100, see Step 4]
- Suggested platforms: [where this atom works best]
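One way to carry an atom through the pipeline in code, mirroring the fields above (a sketch; the field names are assumed to match atoms.json, which is the source of truth):

# Content atom sketch mirroring the output format above.
from dataclasses import dataclass, field

@dataclass
class ContentAtom:
    type: str                   # narrative_arc | quote | controversial_take | data_point | story | framework | prediction
    content: str                # extracted text
    timestamp: str | None       # "start - end", if available
    context: str                # what was being discussed
    viral_score: int            # 0-100, computed in Step 4
    suggested_platforms: list[str] = field(default_factory=list)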

Step 3: Content Generation — One Episode, Many Pieces

For each episode, generate ALL of these from the extracted atoms:

3a. Short-Form Video Clips (3-5 per episode)

- Hook: [First 3 seconds — pattern interrupt or bold claim]
- Clip segment: [Timestamp range from transcript]
- Caption overlay: [Text for the screen]
- Platform: [YouTube Shorts / TikTok / Instagram Reels]
- Why it works: [What makes this clippable]

Prioritize: controversial takes > stories with payoffs > surprising data points

3b. Twitter/X Threads (2-3 per episode)

- Thread hook (tweet 1): [Curiosity gap or bold opener]
- Thread body (5-10 tweets): [Each tweet is one complete thought]
- Thread closer: [CTA — follow, reply, retweet trigger]
- Source atoms: [Which content atoms feed this thread]

Rules: No tweet over 280 chars. Each tweet must stand alone. Use data points as proof.
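A tiny check that enforces the 280-character rule before a thread is accepted (illustrative; the pipeline may validate differently):

# Thread validation sketch: every tweet must fit in 280 characters.
# len() is an approximation; X counts links and some characters differently.
def validate_thread(tweets: list[str]) -> list[str]:
    problems = []
    for i, tweet in enumerate(tweets, start=1):
        if len(tweet) > 280:
            problems.append(f"tweet {i} is {len(tweet)} chars (max 280)")
    return problems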

3c. LinkedIn Article Draft (1 per episode)

- Headline: [Specific, benefit-driven]
- Hook paragraph: [Before the "see more" fold — must earn the click]
- Body: [3-5 sections with headers, 800-1200 words]
- CTA: [Engagement driver — question, not link]
- Hashtags: [3-5 relevant, not spammy]

Voice: Professional but not corporate. First-person. Story-driven.

3d. Newsletter Section (1 per episode)

- Section headline: [Scannable, specific]
- TL;DR: [One sentence, the core insight]
- Body: [3-5 bullet points, each with a takeaway]
- Pull quote: [The most shareable line from the episode]
- Link: [Back to full episode]

3e. Quote Cards (3-5 per episode)

- Quote text: [Max 20 words — must work as text overlay]
- Attribution: [Speaker name]
- Background suggestion: [Color/mood that matches the tone]
- Platform sizing: [1080x1080 for IG, 1200x675 for Twitter, 1080x1920 for Stories]

3f. Blog Post Outline (1 per episode)

- Title: [SEO-optimized, includes primary keyword]
- Primary keyword: [Search volume + difficulty estimate]
- Secondary keywords: [3-5 related terms]
- Meta description: [155 chars max]
- H2 sections: [5-7, each maps to a content atom]
- Internal linking opportunities: [Topics that connect to existing content]
- Estimated word count: [1500-2500]

3g. YouTube Shorts / TikTok Script (1 per episode)

- HOOK (0-3s): [Pattern interrupt — question, bold claim, or visual]
- SETUP (3-15s): [Context — why should they care]
- PAYOFF (15-45s): [The insight, data, or story resolution]
- CTA (45-60s): [Follow, comment prompt, or part 2 tease]
- On-screen text: [Key phrases to overlay]
- B-roll suggestions: [Visual ideas if not talking-head]

Step 4: Content Scoring — Viral Potential

Score every generated piece on three dimensions (each 0-100):

| Dimension   | What It Measures                   | Signals                                           |
|-------------|------------------------------------|---------------------------------------------------|
| Novelty     | Is this new or surprising?         | Contrarian takes, unexpected data, first-to-say   |
| Controversy | Will people argue about this?      | Strong opinions, challenges norms, picks a side   |
| Utility     | Can someone use this immediately?  | Frameworks, how-tos, templates, specific numbers  |

Viral Score = (Novelty × 0.4) + (Controversy × 0.3) + (Utility × 0.3)

Score thresholds:

  • 80+ → Priority publish. Schedule for peak engagement windows.
  • 60-79 → Solid content. Fill the calendar.
  • 40-59 → Filler. Use only if calendar has gaps.
  • Below 40 → Cut it. Not worth the publish slot.
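The weighting and the thresholds translate directly into code; a minimal sketch:

# Viral score sketch: weighted blend of the three dimensions, bucketed into
# the publish tiers above.
def viral_score(novelty: float, controversy: float, utility: float) -> float:
    return novelty * 0.4 + controversy * 0.3 + utility * 0.3

def publish_tier(score: float) -> str:
    if score >= 80:
        return "priority"      # schedule for peak engagement windows
    if score >= 60:
        return "solid"         # fill the calendar
    if score >= 40:
        return "filler"        # use only if the calendar has gaps
    return "cut"               # not worth the publish slot

# Example: viral_score(90, 70, 60) == 75.0, which lands in the "solid" tier.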

Step 5: Dedup Engine

Before finalizing, check all generated content against:

  1. This batch — No two pieces should cover the same angle
  2. Recent history — Compare against last N days of output (default: 30)
  3. Similarity threshold — Flag any pair with >70% semantic overlap

Dedup rules:

  • If two pieces overlap >70%: keep the higher-scored one, cut the other
  • If a piece overlaps with recently published content: flag with ⚠️ and suggest a differentiation angle
  • Track all published content hashes in output/content_history.json
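A stdlib-only sketch of the dedup pass, using difflib as a stand-in for semantic similarity (a faithful implementation of ">70% semantic overlap" would more likely compare embeddings; the history path follows the Step 7 layout):

# Dedup sketch: difflib stands in for semantic similarity here.
import hashlib
import json
from difflib import SequenceMatcher
from pathlib import Path

HISTORY = Path("output/content_history.json")

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def dedupe_batch(pieces: list[dict]) -> list[dict]:
    """Keep the higher-scored piece whenever two overlap by more than 70%."""
    kept: list[dict] = []
    for piece in sorted(pieces, key=lambda p: p["viral_score"], reverse=True):
        if all(similarity(piece["content"], k["content"]) <= 0.70 for k in kept):
            kept.append(piece)
    return kept

def record_published(piece: dict) -> None:
    """Append a content hash to output/content_history.json for future runs."""
    history = json.loads(HISTORY.read_text()) if HISTORY.exists() else []
    history.append(hashlib.sha256(piece["content"].encode()).hexdigest())
    HISTORY.write_text(json.dumps(history, indent=2))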

Step 6: Calendar Generation (--calendar)

Assemble scored, deduplicated content into a weekly publish calendar.

Scheduling rules:

  • Twitter/X: 1-2 per day, peak hours (8-10am, 12-1pm, 5-7pm ET)
  • LinkedIn: 1 per day max, Tuesday-Thursday mornings
  • YouTube Shorts/TikTok: 1 per day, evenings
  • Newsletter: Weekly, same day each week
  • Blog: 1-2 per week
  • Quote cards: Intersperse on low-content days
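A sketch of how these rules could slot pieces into a week. The daily caps and slot times come from the rules above; the assignment logic (and the weekday handling it omits) is illustrative:

# Calendar assembly sketch: per-platform daily caps and default slot times.
from datetime import date, timedelta

SLOTS = {
    "twitter": {"per_day": 2, "times": ["09:00 ET", "12:30 ET"]},
    "linkedin": {"per_day": 1, "times": ["08:30 ET"]},            # Tue-Thu mornings
    "youtube_shorts": {"per_day": 1, "times": ["18:00 ET"]},
}

def schedule_week(pieces: list[dict], week_of: date) -> list[dict]:
    used: dict[tuple[str, str], int] = {}                         # (platform, day) -> count
    scheduled = []
    for piece in sorted(pieces, key=lambda p: p["viral_score"], reverse=True):
        rule = SLOTS.get(piece["platform"])
        if rule is None:
            continue
        for offset in range(7):
            day = (week_of + timedelta(days=offset)).isoformat()
            count = used.get((piece["platform"], day), 0)
            if count < rule["per_day"]:
                piece.update(date=day, time=rule["times"][count], status="draft")
                used[(piece["platform"], day)] = count + 1
                scheduled.append(piece)
                break
    return scheduled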

Calendar output format:

{
  "week_of": "2024-01-15",
  "episode_source": "Episode Title - Guest Name",
  "content_pieces": [
    {
      "date": "2024-01-15",
      "time": "09:00 ET",
      "platform": "twitter",
      "type": "thread",
      "content": "...",
      "viral_score": 85,
      "status": "draft"
    }
  ],
  "total_pieces": 18,
  "avg_viral_score": 72,
  "coverage": {
    "twitter": 6,
    "linkedin": 3,
    "youtube_shorts": 3,
    "newsletter": 1,
    "blog": 1,
    "quote_cards": 4
  }
}

Step 7: Output

All output goes to the output/ directory:

output/
├── episodes/
│   ├── YYYY-MM-DD-episode-slug/
│   │   ├── transcript.txt
│   │   ├── atoms.json          # Extracted content atoms
│   │   ├── content_pieces.json # All generated content
│   │   └── calendar.json       # Scheduled calendar
│   └── ...
├── calendar/
│   └── week-YYYY-WNN.json     # Aggregated weekly calendar
├── content_history.json        # Dedup tracking
└── pipeline_log.json           # Run history and stats
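A sketch of how a single episode's folder could be written; the paths follow the layout above, while the function itself and the slug argument are illustrative:

# Per-episode output writer sketch (paths follow the tree above).
import json
from pathlib import Path

def write_episode_outputs(episode_date: str, slug: str, transcript: str,
                          atoms: list[dict], pieces: list[dict], calendar: dict) -> Path:
    episode_dir = Path("output/episodes") / f"{episode_date}-{slug}"
    episode_dir.mkdir(parents=True, exist_ok=True)
    (episode_dir / "transcript.txt").write_text(transcript)
    (episode_dir / "atoms.json").write_text(json.dumps(atoms, indent=2))
    (episode_dir / "content_pieces.json").write_text(json.dumps(pieces, indent=2))
    (episode_dir / "calendar.json").write_text(json.dumps(calendar, indent=2))
    return episode_dir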

CLI Reference

# Process latest episode from RSS feed
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml"

# Process a local transcript
python podcast_pipeline.py --transcript episode-42.txt

# Batch process last 5 episodes
python podcast_pipeline.py --batch "https://feeds.example.com/podcast.xml" --episodes 5

# Generate weekly calendar from existing outputs
python podcast_pipeline.py --calendar

# Process with custom dedup window
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml" --dedup-days 60

# Process and only keep 80+ viral score content
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml" --min-score 80

Environment Variables

| Variable          | Required             | Description                                            |
|-------------------|----------------------|--------------------------------------------------------|
| OPENAI_API_KEY    | Yes (for Whisper)    | OpenAI API key for audio transcription                 |
| ANTHROPIC_API_KEY | Yes (for generation) | Anthropic API key for content generation               |
| OPENAI_LLM_KEY    | Optional             | Separate OpenAI key if using GPT for generation instead |
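How the pipeline might read these at startup (a sketch; the variable names come from the table above, while the failure behavior shown is an assumption, not documented behavior):

# Env var loading sketch.
import os
import sys

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")        # required for Whisper transcription
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY")  # required for content generation
OPENAI_LLM_KEY = os.environ.get("OPENAI_LLM_KEY")        # optional: GPT-based generation instead

if not OPENAI_API_KEY or not ANTHROPIC_API_KEY:
    sys.exit("Set OPENAI_API_KEY and ANTHROPIC_API_KEY before running the pipeline.")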

Reference Files

| File                | Purpose               |
|---------------------|-----------------------|
| podcast_pipeline.py | Main pipeline script  |
| requirements.txt    | Python dependencies   |
| README.md           | Setup and usage guide |