Instagram Reel Extractor

Extract the spoken transcript, metadata, music, comments, and key frames from an Instagram Reel.

Prerequisites

The following must be installed on the user's machine:

  • yt-dlp — brew install yt-dlp
  • ffmpeg — brew install ffmpeg
  • Python packages — pip3 install openai-whisper ShazamAPI

Or run the setup script: bash setup.sh

If any dependency is missing, tell the user which ones to install before proceeding.
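The dependency check above can be sketched as a small preflight helper. The tool and package names are the ones listed; the function name itself is illustrative, not part of the script:

```python
import shutil
from importlib.util import find_spec

def missing_dependencies():
    """Return the prerequisites from the list above that are not installed."""
    missing = []
    for tool in ("yt-dlp", "ffmpeg"):          # CLI tools, looked up on PATH
        if shutil.which(tool) is None:
            missing.append(tool)
    for module in ("whisper", "ShazamAPI"):    # import names of the pip packages
        if find_spec(module) is None:
            missing.append(module)
    return missing
```

If the returned list is non-empty, report those names to the user before proceeding.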

Usage

Run the extraction script with the Reel URL:

python3 scripts/extract_reel.py "REEL_URL"

Save the extraction to a specific directory:

python3 scripts/extract_reel.py "REEL_URL" --save-dir ~/notes/reels

For better transcription accuracy (slower, uses more memory):

python3 scripts/extract_reel.py "REEL_URL" --whisper-model small

For raw JSON output:

python3 scripts/extract_reel.py "REEL_URL" --json

Skip frame extraction (faster, audio/metadata only):

python3 scripts/extract_reel.py "REEL_URL" --no-frames

Adjust frame interval (default every 2 seconds):

python3 scripts/extract_reel.py "REEL_URL" --frame-interval 1.0
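A per-interval frame grab like this typically maps to a single ffmpeg fps filter. A minimal sketch of how --frame-interval could translate into an ffmpeg command (the function name and output filename pattern are illustrative, not taken from the script):

```python
from pathlib import Path

def frame_command(video, out_dir, interval=2.0):
    """Build an ffmpeg argument list that grabs one frame every `interval` seconds."""
    out_dir = Path(out_dir)
    return [
        "ffmpeg", "-i", str(video),
        "-vf", f"fps=1/{interval}",   # one output frame per interval
        "-q:v", "2",                  # high-quality JPEG output
        str(out_dir / "frame_%04d.jpg"),
    ]
```

Passing --frame-interval 1.0 would then produce one JPG per second of video instead of one every two seconds.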

What Gets Extracted

  • Original URL — link back to the source reel (always included)
  • Creator — username and handle
  • Metrics — likes and comment count (as of extraction date)
  • Music — song title, artist, album, genre, Shazam link (via audio fingerprinting)
  • Caption — the original post caption
  • Hashtags — all tags from the post
  • Top Comments — comment text with author and like count
  • Duration — video length
  • Upload date
  • Transcript — full spoken text from the audio, with timestamps
  • Language — auto-detected language of the speech
  • Key frames — screenshots extracted every N seconds, saved as JPGs with timestamps

Output

The script outputs a structured markdown summary with the original reel URL at the top. Present this to the user without altering the transcript wording, though you may clean up line breaks and paragraph flow for readability.

The frame images are saved to a frames/ subdirectory inside the working directory. When presenting the extraction, use the Read tool to view the frame images so you can describe what's happening visually at each timestamp (talking head, B-roll, text overlay, product shot, etc.).

When --save-dir is provided, the extraction is automatically saved as <creator>-reel-<upload_date>.md. If a file already exists, a number is appended (e.g., garyvee-reel-2026-04-08-2.md).
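The collision handling described above can be sketched as a small pathlib helper (the function name is illustrative; the filename pattern is the one the script uses):

```python
from pathlib import Path

def unique_save_path(save_dir, creator, upload_date):
    """Return <creator>-reel-<upload_date>.md, appending -2, -3, ... on collision."""
    save_dir = Path(save_dir)
    stem = f"{creator}-reel-{upload_date}"
    path = save_dir / f"{stem}.md"
    n = 2
    while path.exists():
        path = save_dir / f"{stem}-{n}.md"
        n += 1
    return path
```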

Configuring a Default Save Location

To always save extractions to a specific folder, tell Claude: "save reel extractions to ~/my/folder". Claude will pass --save-dir automatically on future runs.

Whisper Model Sizes

Model    Speed     Accuracy  Memory
tiny     Fastest   Lower     ~1 GB
base     Fast      Good      ~1 GB
small    Moderate  Better    ~2 GB
medium   Slow      Great     ~5 GB
large    Slowest   Best      ~10 GB

Default is base — a good balance for short-form reel content. Suggest small if the user reports transcription quality issues.

Troubleshooting

  • Login required errors: Some reels may require authentication. Pass --cookies-from chrome (or firefox) to use browser cookies.
  • No speech detected: The reel may be music-only or use on-screen text instead of speech. Let the user know.
  • Slow transcription: Whisper runs on CPU by default, and larger models are markedly slower. For faster runs, suggest the tiny model.
  • Music not identified: Shazam works best with distinct background music. Speech-heavy reels with no music will return no match.
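For the "no speech detected" case, a simple word-count heuristic over the transcript segments is often enough to decide whether to tell the user the reel is music-only. This helper is hypothetical (not part of the script) and assumes segments are (timestamp, text) pairs:

```python
def looks_speechless(segments, min_words=3):
    """Heuristic: treat a transcript with almost no words as 'no speech detected'."""
    words = sum(len(text.split()) for _, text in segments)
    return words < min_words
```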
Installs: 17 · GitHub Stars: 6 · First Seen: Apr 8, 2026