markdown-video

SKILL.md

Markdown Video Skill

Convert markdown slides to presentation video with AI-generated visuals and TTS audio narration.

When to Use This Skill

Activate this skill when the user:

  • Asks to create video from markdown slides
  • Requests to convert presentation to MP4 format
  • Wants to generate narrated video from slides
  • Needs automated slide-to-video conversion

Key Features

  • Gemini AI-generated visuals: High-quality slide images with full emoji and Korean support
  • OpenAI TTS narration: Natural voice from speaker notes
  • Delta updates: Only regenerates changed slides (saves time and API costs)
  • Multiple visual styles: technical-diagram, professional, vibrant-cartoon, watercolor

Input Requirements

  • Markdown file with speaker notes marked with ^ prefix
  • GEMINI_API_KEY environment variable for image generation
  • OPENAI_API_KEY environment variable for TTS audio

Output Specifications

  • MP4 video: 1920x1080 (Full HD)
  • Duration: Each slide displays for duration of its audio narration
  • File naming: {input_filename}.mp4

Workflow

Step 1: Generate Audio Files

cd "{slides_directory}"
python /Users/lifidea/.claude/skills/markdown-video/generate_audio.py "{slides_filename}" --output-dir "audio"

Delta update: Only regenerates audio for slides with changed speaker notes.

  • Use --force to regenerate all audio files

Output:

  • audio/slide_0.mp3, slide_1.mp3, ... (0-indexed)
  • Cache file: audio/.audio_cache.json

Step 2: Generate Slide Images with Gemini

cd "{slides_directory}"
python /Users/lifidea/.claude/skills/markdown-video/create_slides_gemini.py "{slides_filename}" \
  --output-dir "slides-gemini" \
  --style "technical-diagram" \
  --auto-approve

Delta update: Only regenerates images for slides with changed content.

  • Use --force to regenerate all slide images

Style Options:

Style Description Best For
technical-diagram Clean lines, infographic icons, muted blue/gray Technical, education
professional Minimalist, geometric shapes Corporate, formal
vibrant-cartoon Bright gradients, flat design Marketing, startups
watercolor Soft pastels, organic shapes Creative, personal

Other Parameters:

  • --model: Gemini model (default: gemini-3-pro-image-preview)
  • --aspect-ratio: 16:9 (default), 1:1, 9:16, 4:3, 3:4
  • --start-from N: Resume from slide N
  • --dry-run: Preview prompts without generating

Output:

  • slides-gemini/1.jpeg, 2.jpeg, ... (1-indexed)
  • Cache file: slides-gemini/.slides_cache.json

Step 3: Create Final Video

cd "{slides_directory}"
python /Users/lifidea/.claude/skills/markdown-video/slides_to_video.py \
  --slides-dir "slides-gemini" \
  --audio-dir "audio" \
  --output "{output_filename}.mp4"

Delta Updates

Both audio and image generation support delta updates - only regenerating what changed.

How It Works

  1. Content hashing: Each slide's content is hashed (MD5)
  2. Cache storage: Hashes stored in .audio_cache.json / .slides_cache.json
  3. Change detection: On subsequent runs, only changed slides are regenerated
  4. File verification: Also checks if output file exists

Example Output

✅ Found 20 slides
   20 slides with speaker notes

✨ Delta update: 17 slides unchanged, 3 to regenerate

🎵 Generating 3 audio files...
Progress |████████████████████████████████████████| 3/3 (100.0%)

✅ Audio generation complete!
   Generated: 3/3 files
   Unchanged: 17 files (skipped)

Force Regeneration

To ignore cache and regenerate everything:

# Force regenerate all audio
python generate_audio.py "slides.md" --output-dir "audio" --force

# Force regenerate all images
python create_slides_gemini.py "slides.md" --output-dir "slides-gemini" --force

Quick Reference

Full Workflow (First Run)

cd "{slides_directory}"

# Step 1: Generate audio
python /Users/lifidea/.claude/skills/markdown-video/generate_audio.py "slides.md" --output-dir "audio"

# Step 2: Generate slide images
python /Users/lifidea/.claude/skills/markdown-video/create_slides_gemini.py "slides.md" \
  --output-dir "slides-gemini" \
  --style "technical-diagram" \
  --auto-approve

# Step 3: Create video
python /Users/lifidea/.claude/skills/markdown-video/slides_to_video.py \
  --slides-dir "slides-gemini" \
  --audio-dir "audio" \
  --output "presentation.mp4"

Update Workflow (After Changes)

Same commands - delta updates are automatic:

# Only regenerates changed slides
python generate_audio.py "slides.md" --output-dir "audio"
python create_slides_gemini.py "slides.md" --output-dir "slides-gemini" --auto-approve
python slides_to_video.py --slides-dir "slides-gemini" --audio-dir "audio" --output "presentation.mp4"

Requirements

System Dependencies

  • Python 3.7+
  • ffmpeg: brew install ffmpeg

Python Packages

pip install Pillow requests google-genai

Environment Variables

export OPENAI_API_KEY="sk-..."
export GEMINI_API_KEY="..."

Cost Estimation

Component Cost Example (20 slides)
Gemini images ~$0.04/slide ~$0.80
OpenAI TTS ~$0.015/1K chars ~$0.50
Total ~$1.30

With delta updates, subsequent runs only cost for changed slides.


Error Handling

No speaker notes found

  • Slides need ^ prefixed speaker notes for narration
  • Example: ^ This is the speaker note for this slide.

Pronunciation problems

  • Replace technical terms with phonetic equivalents in speaker notes
  • Test with --dry-run first

API errors

  • Check API key environment variables
  • Gemini rate limits: script includes 1-second delay between generations

Quality Checklist

Before marking complete:

  • OpenAI and Gemini API keys configured
  • Markdown file has speaker notes with ^ prefix
  • Audio files generated successfully
  • Slide images generated successfully
  • Video plays correctly with synced audio
  • Resolution is 1920x1080

Image Generation Mode

Two approaches for generating visuals:

Mode Script Best For
Slide-by-Slide create_slides_gemini.py Standard presentations, precise control
Section-based generate_section_images.py Long presentations, infographic style

When to Use Section-Based

  • Presentations with 20+ slides
  • Content naturally groups into logical sections
  • Prefer infographic overview per section vs. individual slides
  • Want to reduce API costs (fewer images)

Section-Based Workflow (Alternative)

For presentations with many slides, generate one infographic image per section instead of per slide.

Comparison

Aspect Slide-by-Slide Section-Based
Images 1 per slide 1 per section
Audio Per slide Per slide → merged by section
Review Direct in markdown Video script document
Best for Short presentations Long presentations (20+ slides)

Step 1: Generate Audio Files

Same as standard workflow:

cd "{slides_directory}"
python /Users/lifidea/.claude/skills/markdown-video/generate_audio.py "{slides_filename}" --output-dir "audio"

Step 2: Generate Section Infographic Images

cd "{slides_directory}"
python /Users/lifidea/.claude/skills/markdown-video/generate_section_images.py "{slides_filename}" \
  --output-dir "slides-section" \
  --style "infographic"

Style Options:

Style Description
infographic Clean professional with icons (default)
professional Minimalist corporate design
vibrant Bright gradients for marketing
technical Flowcharts and technical diagrams

Other Parameters:

  • --start-from N: Resume from section N
  • --force: Regenerate all images
  • --dry-run: Preview sections without generating
  • --delay N: Seconds between API calls (default: 2.0)

Step 3: Create Video Script (Optional)

Generate a markdown document for reviewing narration:

cd "{slides_directory}"
python /Users/lifidea/.claude/skills/markdown-video/create_video_script.py "{slides_filename}" \
  --output "video_script.md" \
  --image-dir "slides-section"

The script document shows:

  • Section images embedded
  • Speaker notes for each slide
  • Easy editing format

Step 4: Review & Edit Narration

  1. Open video_script.md
  2. Review narration in blockquotes
  3. Edit directly in the document
  4. Update original markdown file with changes
  5. Regenerate audio for changed slides:
    python generate_audio.py "slides.md" --output-dir "audio"
    

Step 5: Create Section-Based Video

cd "{slides_directory}"
python /Users/lifidea/.claude/skills/markdown-video/create_section_video.py \
  --slides "{slides_filename}" \
  --audio-dir "audio" \
  --image-dir "slides-section" \
  --output "presentation.mp4"

With config file (for custom section mappings):

python create_section_video.py \
  --config "sections.json" \
  --audio-dir "audio" \
  --image-dir "slides-section" \
  --output "presentation.mp4"

Config file format:

{
  "sections": [
    {"id": 0, "name": "title", "audio_slides": [0]},
    {"id": 1, "name": "introduction", "audio_slides": [1, 2, 3]},
    {"id": 2, "name": "main_content", "audio_slides": [4, 5, 6, 7]}
  ]
}

Section-Based Quick Reference

cd "{slides_directory}"

# Step 1: Generate audio
python /Users/lifidea/.claude/skills/markdown-video/generate_audio.py "slides.md" --output-dir "audio"

# Step 2: Generate section images
python /Users/lifidea/.claude/skills/markdown-video/generate_section_images.py "slides.md" \
  --output-dir "slides-section" \
  --style "infographic"

# Step 3 (optional): Create review document
python /Users/lifidea/.claude/skills/markdown-video/create_video_script.py "slides.md" \
  --output "video_script.md" \
  --image-dir "slides-section"

# Step 4: Create final video
python /Users/lifidea/.claude/skills/markdown-video/create_section_video.py \
  --slides "slides.md" \
  --audio-dir "audio" \
  --image-dir "slides-section" \
  --output "presentation.mp4"
Weekly Installs
20
GitHub Stars
36
First Seen
Jan 24, 2026
Installed on
gemini-cli17
codex16
opencode16
cursor15
claude-code14
github-copilot13