podcast-pipeline

Preamble (runs on skill start)

# Version check (silent if up to date)
python3 telemetry/version_check.py 2>/dev/null || true

# Telemetry opt-in (first run only, then remembers your choice)
python3 telemetry/telemetry_init.py 2>/dev/null || true

Privacy: This skill logs usage locally to ~/.ai-marketing-skills/analytics/. Remote telemetry is opt-in only. No code, file paths, or repo content is ever collected. See telemetry/README.md.


Podcast-to-Everything Pipeline

Turns podcast episodes into a full content calendar across every platform. One episode in, 15-20 content pieces out — scored, deduplicated, and scheduled.


Step 1: Ingest — Get the Transcript

Determine the input source and obtain a clean transcript.

Option A: RSS Feed (--rss <url>)

  1. Fetch the RSS feed XML
  2. Extract the latest episode's audio URL (or use --episodes N for batch)
  3. Download the audio file
  4. Transcribe via OpenAI Whisper API (with timestamps)
  5. Store transcript with episode metadata (title, date, description, duration)
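A minimal sketch of Option A using feedparser and the OpenAI Python SDK. The function and variable names are illustrative, not the pipeline's actual internals, and download() is a hypothetical helper that saves the audio locally:

# RSS ingest + Whisper transcription sketch (illustrative names, not the
# pipeline's real internals). Assumes `pip install feedparser openai`.
import feedparser
from openai import OpenAI

def ingest_latest_episode(rss_url: str) -> dict:
    feed = feedparser.parse(rss_url)                 # 1. fetch the RSS XML
    entry = feed.entries[0]                          # 2. latest episode
    audio_url = next(
        link.href for link in entry.links if "audio" in link.get("type", "")
    )
    audio_path = download(audio_url)                 # 3. hypothetical helper: save the file locally

    client = OpenAI()                                # uses OPENAI_API_KEY
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            response_format="verbose_json",          # 4. includes segment timestamps
        )

    return {                                         # 5. transcript + episode metadata
        "title": entry.get("title"),
        "date": entry.get("published"),
        "description": entry.get("summary"),
        "segments": transcript.segments,
    }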

Option B: Raw Transcript (--transcript <file>)

  1. Read the transcript file (plain text, SRT, or VTT)
  2. Parse timestamps if present
  3. Extract episode metadata from filename or prompt user
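For Option B, timestamp parsing can be as simple as a regex over cue lines. A stdlib-only sketch (the cue format is standard SRT/VTT; the parsing approach is illustrative):

# SRT/VTT timestamp parsing sketch (stdlib only; not the pipeline's actual parser).
# SRT uses "," before milliseconds, VTT uses ".", so the regex accepts both.
import re

CUE = re.compile(r"(\d{2}:\d{2}:\d{2})[,.](\d{3}) --> (\d{2}:\d{2}:\d{2})[,.](\d{3})")

def parse_cues(text: str):
    """Yield (start, end, text) tuples from an SRT or VTT transcript."""
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            m = CUE.search(line)
            if m:
                start = f"{m.group(1)}.{m.group(2)}"
                end = f"{m.group(3)}.{m.group(4)}"
                yield start, end, " ".join(lines[i + 1:]).strip()
                break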

Option C: Batch Mode (--batch <rss_url> --episodes N)

  1. Fetch RSS feed
  2. Extract the last N episodes
  3. Process each through the full pipeline
  4. Deduplicate across all episodes in the batch

Transcript cleanup

  • Remove filler words (um, uh, like, you know) for written content
  • Preserve original with timestamps for video clip suggestions
  • Split into logical segments by topic shift
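A sketch of the cleanup pass, assuming simple regex-based filler removal (real filler detection is fuzzier than a fixed word list):

# Filler-word cleanup sketch for the written-content copy of the transcript.
# The original timestamped transcript is kept untouched for clip suggestions.
import re

FILLERS = re.compile(r"\b(um+|uh+|you know|like,)\s*", flags=re.IGNORECASE)

def clean_for_writing(segment: str) -> str:
    cleaned = FILLERS.sub("", segment)
    return re.sub(r"\s{2,}", " ", cleaned).strip()   # collapse leftover double spaces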

Step 2: Editorial Brain — Deep Analysis

Feed the full transcript to the LLM with this extraction framework:

Extract these content atoms:

  1. Narrative Arcs — Complete story segments with setup → tension → resolution. Tag with start/end timestamps.

  2. Quotable Moments — Punchy, shareable statements. One-liners that stand alone. Must pass the "would someone screenshot this?" test.

  3. Controversial Takes — Opinions that go against conventional wisdom. The stuff that makes people reply "hard disagree" or "finally someone said it."

  4. Data Points — Specific numbers, percentages, dollar amounts, timeframes. Concrete proof points that add credibility.

  5. Stories — Personal anecdotes, case studies, client examples. Must have a character, a problem, and an outcome.

  6. Frameworks — Step-by-step processes, mental models, decision matrices. Anything structured that people would save or bookmark.

  7. Predictions — Forward-looking claims about trends, markets, technology. Hot takes about where things are going.

Output format per atom:

- Type: [narrative_arc | quote | controversial_take | data_point | story | framework | prediction]
- Content: [extracted text]
- Timestamp: [start - end, if available]
- Context: [what was being discussed]
- Viral Score: [0-100, see Step 4]
- Suggested platforms: [where this atom works best]
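One way to carry an atom through the pipeline in code, mirroring the fields above (a sketch; the field names are assumed to match atoms.json, which is the source of truth):

# Content atom sketch mirroring the output format above.
from dataclasses import dataclass, field

@dataclass
class ContentAtom:
    type: str                   # narrative_arc | quote | controversial_take | data_point | story | framework | prediction
    content: str                # extracted text
    timestamp: str | None       # "start - end", if available
    context: str                # what was being discussed
    viral_score: int            # 0-100, computed in Step 4
    suggested_platforms: list[str] = field(default_factory=list)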

Step 3: Content Generation — One Episode, Many Pieces

For each episode, generate ALL of these from the extracted atoms:

3a. Short-Form Video Clips (3-5 per episode)

- Hook: [First 3 seconds — pattern interrupt or bold claim]
- Clip segment: [Timestamp range from transcript]
- Caption overlay: [Text for the screen]
- Platform: [YouTube Shorts / TikTok / Instagram Reels]
- Why it works: [What makes this clippable]

Prioritize: controversial takes > stories with payoffs > surprising data points

3b. Twitter/X Threads (2-3 per episode)

- Thread hook (tweet 1): [Curiosity gap or bold opener]
- Thread body (5-10 tweets): [Each tweet is one complete thought]
- Thread closer: [CTA — follow, reply, retweet trigger]
- Source atoms: [Which content atoms feed this thread]

Rules: No tweet over 280 chars. Each tweet must stand alone. Use data points as proof.
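A tiny check that enforces the 280-character rule before a thread is accepted (illustrative; the pipeline may validate differently):

# Thread validation sketch: every tweet must fit in 280 characters.
# len() is an approximation; X counts links and some characters differently.
def validate_thread(tweets: list[str]) -> list[str]:
    problems = []
    for i, tweet in enumerate(tweets, start=1):
        if len(tweet) > 280:
            problems.append(f"tweet {i} is {len(tweet)} chars (max 280)")
    return problems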

3c. LinkedIn Article Draft (1 per episode)

- Headline: [Specific, benefit-driven]
- Hook paragraph: [Before the "see more" fold — must earn the click]
- Body: [3-5 sections with headers, 800-1200 words]
- CTA: [Engagement driver — question, not link]
- Hashtags: [3-5 relevant, not spammy]

Voice: Professional but not corporate. First-person. Story-driven.

3d. Newsletter Section (1 per episode)

- Section headline: [Scannable, specific]
- TL;DR: [One sentence, the core insight]
- Body: [3-5 bullet points, each with a takeaway]
- Pull quote: [The most shareable line from the episode]
- Link: [Back to full episode]

3e. Quote Cards (3-5 per episode)

- Quote text: [Max 20 words — must work as text overlay]
- Attribution: [Speaker name]
- Background suggestion: [Color/mood that matches the tone]
- Platform sizing: [1080x1080 for IG, 1200x675 for Twitter, 1080x1920 for Stories]

3f. Blog Post Outline (1 per episode)

- Title: [SEO-optimized, includes primary keyword]
- Primary keyword: [Search volume + difficulty estimate]
- Secondary keywords: [3-5 related terms]
- Meta description: [155 chars max]
- H2 sections: [5-7, each maps to a content atom]
- Internal linking opportunities: [Topics that connect to existing content]
- Estimated word count: [1500-2500]

3g. YouTube Shorts / TikTok Script (1 per episode)

- HOOK (0-3s): [Pattern interrupt — question, bold claim, or visual]
- SETUP (3-15s): [Context — why should they care]
- PAYOFF (15-45s): [The insight, data, or story resolution]
- CTA (45-60s): [Follow, comment prompt, or part 2 tease]
- On-screen text: [Key phrases to overlay]
- B-roll suggestions: [Visual ideas if not talking-head]

Step 4: Content Scoring — Viral Potential

Score every generated piece on three dimensions (each 0-100):

| Dimension   | What It Measures                   | Signals                                           |
|-------------|------------------------------------|---------------------------------------------------|
| Novelty     | Is this new or surprising?         | Contrarian takes, unexpected data, first-to-say   |
| Controversy | Will people argue about this?      | Strong opinions, challenges norms, picks a side   |
| Utility     | Can someone use this immediately?  | Frameworks, how-tos, templates, specific numbers  |

Viral Score = (Novelty × 0.4) + (Controversy × 0.3) + (Utility × 0.3)

Score thresholds:

  • 80+ → Priority publish. Schedule for peak engagement windows.
  • 60-79 → Solid content. Fill the calendar.
  • 40-59 → Filler. Use only if calendar has gaps.
  • Below 40 → Cut it. Not worth the publish slot.
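The weighting and the thresholds translate directly into code; a minimal sketch:

# Viral score sketch: weighted blend of the three dimensions, bucketed into
# the publish tiers above.
def viral_score(novelty: float, controversy: float, utility: float) -> float:
    return novelty * 0.4 + controversy * 0.3 + utility * 0.3

def publish_tier(score: float) -> str:
    if score >= 80:
        return "priority"      # schedule for peak engagement windows
    if score >= 60:
        return "solid"         # fill the calendar
    if score >= 40:
        return "filler"        # use only if the calendar has gaps
    return "cut"               # not worth the publish slot

# Example: viral_score(90, 70, 60) == 75.0, which lands in the "solid" tier.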

Step 5: Dedup Engine

Before finalizing, check all generated content against:

  1. This batch — No two pieces should cover the same angle
  2. Recent history — Compare against last N days of output (default: 30)
  3. Similarity threshold — Flag any pair with >70% semantic overlap

Dedup rules:

  • If two pieces overlap >70%: keep the higher-scored one, cut the other
  • If a piece overlaps with recently published content: flag with ⚠️ and suggest a differentiation angle
  • Track all published content hashes in output/content_history.json
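A stdlib-only sketch of the dedup pass, using difflib as a stand-in for semantic similarity (a faithful implementation of ">70% semantic overlap" would more likely compare embeddings; the history path follows the Step 7 layout):

# Dedup sketch: difflib stands in for semantic similarity here.
import hashlib
import json
from difflib import SequenceMatcher
from pathlib import Path

HISTORY = Path("output/content_history.json")

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def dedupe_batch(pieces: list[dict]) -> list[dict]:
    """Keep the higher-scored piece whenever two overlap by more than 70%."""
    kept: list[dict] = []
    for piece in sorted(pieces, key=lambda p: p["viral_score"], reverse=True):
        if all(similarity(piece["content"], k["content"]) <= 0.70 for k in kept):
            kept.append(piece)
    return kept

def record_published(piece: dict) -> None:
    """Append a content hash to output/content_history.json for future runs."""
    history = json.loads(HISTORY.read_text()) if HISTORY.exists() else []
    history.append(hashlib.sha256(piece["content"].encode()).hexdigest())
    HISTORY.write_text(json.dumps(history, indent=2))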

Step 6: Calendar Generation (--calendar)

Assemble scored, deduplicated content into a weekly publish calendar.

Scheduling rules:

  • Twitter/X: 1-2 per day, peak hours (8-10am, 12-1pm, 5-7pm ET)
  • LinkedIn: 1 per day max, Tuesday-Thursday mornings
  • YouTube Shorts/TikTok: 1 per day, evenings
  • Newsletter: Weekly, same day each week
  • Blog: 1-2 per week
  • Quote cards: Intersperse on low-content days
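A sketch of how these rules could slot pieces into a week. The daily caps and slot times come from the rules above; the assignment logic (and the weekday handling it omits) is illustrative:

# Calendar assembly sketch: per-platform daily caps and default slot times.
from datetime import date, timedelta

SLOTS = {
    "twitter": {"per_day": 2, "times": ["09:00 ET", "12:30 ET"]},
    "linkedin": {"per_day": 1, "times": ["08:30 ET"]},            # Tue-Thu mornings
    "youtube_shorts": {"per_day": 1, "times": ["18:00 ET"]},
}

def schedule_week(pieces: list[dict], week_of: date) -> list[dict]:
    used: dict[tuple[str, str], int] = {}                         # (platform, day) -> count
    scheduled = []
    for piece in sorted(pieces, key=lambda p: p["viral_score"], reverse=True):
        rule = SLOTS.get(piece["platform"])
        if rule is None:
            continue
        for offset in range(7):
            day = (week_of + timedelta(days=offset)).isoformat()
            count = used.get((piece["platform"], day), 0)
            if count < rule["per_day"]:
                piece.update(date=day, time=rule["times"][count], status="draft")
                used[(piece["platform"], day)] = count + 1
                scheduled.append(piece)
                break
    return scheduled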

Calendar output format:

{
  "week_of": "2024-01-15",
  "episode_source": "Episode Title - Guest Name",
  "content_pieces": [
    {
      "date": "2024-01-15",
      "time": "09:00 ET",
      "platform": "twitter",
      "type": "thread",
      "content": "...",
      "viral_score": 85,
      "status": "draft"
    }
  ],
  "total_pieces": 18,
  "avg_viral_score": 72,
  "coverage": {
    "twitter": 6,
    "linkedin": 3,
    "youtube_shorts": 3,
    "newsletter": 1,
    "blog": 1,
    "quote_cards": 4
  }
}

Step 7: Output

All output goes to the output/ directory:

output/
├── episodes/
│   ├── YYYY-MM-DD-episode-slug/
│   │   ├── transcript.txt
│   │   ├── atoms.json          # Extracted content atoms
│   │   ├── content_pieces.json # All generated content
│   │   └── calendar.json       # Scheduled calendar
│   └── ...
├── calendar/
│   └── week-YYYY-WNN.json     # Aggregated weekly calendar
├── content_history.json        # Dedup tracking
└── pipeline_log.json           # Run history and stats
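A sketch of how a single episode's folder could be written; the paths follow the layout above, while the function itself and the slug argument are illustrative:

# Per-episode output writer sketch (paths follow the tree above).
import json
from pathlib import Path

def write_episode_outputs(episode_date: str, slug: str, transcript: str,
                          atoms: list[dict], pieces: list[dict], calendar: dict) -> Path:
    episode_dir = Path("output/episodes") / f"{episode_date}-{slug}"
    episode_dir.mkdir(parents=True, exist_ok=True)
    (episode_dir / "transcript.txt").write_text(transcript)
    (episode_dir / "atoms.json").write_text(json.dumps(atoms, indent=2))
    (episode_dir / "content_pieces.json").write_text(json.dumps(pieces, indent=2))
    (episode_dir / "calendar.json").write_text(json.dumps(calendar, indent=2))
    return episode_dir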

CLI Reference

# Process latest episode from RSS feed
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml"

# Process a local transcript
python podcast_pipeline.py --transcript episode-42.txt

# Batch process last 5 episodes
python podcast_pipeline.py --batch "https://feeds.example.com/podcast.xml" --episodes 5

# Generate weekly calendar from existing outputs
python podcast_pipeline.py --calendar

# Process with custom dedup window
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml" --dedup-days 60

# Process and only keep 80+ viral score content
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml" --min-score 80

Environment Variables

| Variable          | Required             | Description                                            |
|-------------------|----------------------|--------------------------------------------------------|
| OPENAI_API_KEY    | Yes (for Whisper)    | OpenAI API key for audio transcription                 |
| ANTHROPIC_API_KEY | Yes (for generation) | Anthropic API key for content generation               |
| OPENAI_LLM_KEY    | Optional             | Separate OpenAI key if using GPT for generation instead |
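How the pipeline might read these at startup (a sketch; the variable names come from the table above, while the failure behavior shown is an assumption, not documented behavior):

# Env var loading sketch.
import os
import sys

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")        # required for Whisper transcription
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY")  # required for content generation
OPENAI_LLM_KEY = os.environ.get("OPENAI_LLM_KEY")        # optional: GPT-based generation instead

if not OPENAI_API_KEY or not ANTHROPIC_API_KEY:
    sys.exit("Set OPENAI_API_KEY and ANTHROPIC_API_KEY before running the pipeline.")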

Reference Files

| File                | Purpose               |
|---------------------|-----------------------|
| podcast_pipeline.py | Main pipeline script  |
| requirements.txt    | Python dependencies   |
| README.md           | Setup and usage guide |