transcribe-video
Transcribe Video
Extract transcript text from a local video file. The skill checks for embedded subtitles first (faster and more accurate), and only falls back to API-based speech recognition if none are found.
Step 1: Identify the video file
Confirm the video file path with the user. Supported formats: mp4, mkv, mov, avi, webm, and any format ffmpeg can handle.
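Optionally, before branching, confirm that ffmpeg can actually read the file. A minimal sketch, assuming ffprobe is on PATH (the function name is illustrative):

import json, subprocess

def probe_format(video_path: str) -> dict:
    # Ask ffprobe for container-level metadata; a non-zero exit code
    # means ffmpeg cannot read the file at all.
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-show_format", "-of", "json", video_path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)["format"]  # e.g. format_name, duration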
Step 2: Check for embedded subtitles
ffprobe -v quiet -select_streams s -show_entries stream=index,codec_name:stream_tags=language,title -of json "<video_path>"
- If subtitle streams exist → go to Step 3a (extract embedded subtitles)
- If no subtitle streams → go to Step 3b (API transcription)
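A minimal sketch of this check in Python, mirroring the ffprobe call above (the helper name and return shape are illustrative):

import json, subprocess

def list_subtitle_streams(video_path: str) -> list:
    # Returns one dict per embedded subtitle track, including its index and
    # language tag, e.g. {"index": 2, "codec_name": "subrip", "tags": {"language": "eng"}}.
    cmd = ["ffprobe", "-v", "quiet", "-select_streams", "s",
           "-show_entries", "stream=index,codec_name:stream_tags=language,title",
           "-of", "json", video_path]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out).get("streams", [])

# Non-empty result -> Step 3a; empty -> Step 3b. The language tags also help
# pick the right track when several exist (Step 3a).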
Step 3a: Extract embedded subtitles
If multiple subtitle tracks exist, prefer the one matching the video's primary language or ask the user which track to use.
# Extract as SRT (stream index 0 for first subtitle track; adjust if needed)
ffmpeg -i "<video_path>" -map 0:s:0 -c:s srt "<output_path>.srt" -y
After extraction, convert the SRT to clean text (a cleanup sketch follows this step):
- Remove sequence numbers
- Remove timestamp lines (lines matching \d{2}:\d{2}:\d{2})
- Remove HTML-like tags (<i>, </i>, etc.)
- Join the remaining non-empty lines
Save the clean transcript to <video_name>.txt next to the video file. Done — skip Step 3b.
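A minimal sketch of that cleanup, following the rules above (the function name and the choice to join with newlines are illustrative):

import re
from pathlib import Path

def srt_to_text(srt_path: str) -> str:
    kept = []
    for line in Path(srt_path).read_text(encoding="utf-8", errors="replace").splitlines():
        line = line.strip()
        if not line or line.isdigit():
            continue                                   # blank separators and sequence numbers
        if re.search(r"\d{2}:\d{2}:\d{2}", line):
            continue                                   # timestamp lines
        line = re.sub(r"<[^>]+>", "", line).strip()    # HTML-like tags such as <i>...</i>
        if line:
            kept.append(line)
    return "\n".join(kept)

# Example: Path(video_path).with_suffix(".txt").write_text(srt_to_text(srt_path), encoding="utf-8")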
Step 3b: API-based transcription
Use the bundled transcription script. It reads credentials from ~/.transcribe_video.env.
Prerequisites check
- Verify the env file exists:
  test -f ~/.transcribe_video.env && echo "OK" || echo "MISSING"
- If MISSING, tell the user to create ~/.transcribe_video.env with:
  OPENAI_API_KEY=your-key-here
  # Optional Base URL:
  # OPENAI_API_BASE=https://<base-url>/v1/
  # Optional Model Name:
  # TRANSCRIBE_MODEL=gpt-4o-transcribe
  Wait for the user to confirm before proceeding.
- Verify dependencies:
  python3 -c "from openai import OpenAI; from dotenv import load_dotenv; print('OK')" 2>&1
  If missing:
  pip install openai python-dotenv
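To check the file's contents rather than just its existence, a small sketch using python-dotenv (the key names match the template above):

from pathlib import Path
from dotenv import dotenv_values

env_path = Path.home() / ".transcribe_video.env"
cfg = dotenv_values(env_path) if env_path.exists() else {}
if not cfg.get("OPENAI_API_KEY"):
    print(f"MISSING: create {env_path} with OPENAI_API_KEY before proceeding")
# OPENAI_API_BASE and TRANSCRIBE_MODEL are optional overrides.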
Run transcription
python3 <skill_directory>/scripts/transcribe.py "<video_path>"
The script extracts audio (WAV, 16kHz mono), sends it to the API, and saves the transcript to <video_name>.txt next to the video file.
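The script's internals are not reproduced here; for reference, a rough sketch of an equivalent pipeline, assuming the openai and python-dotenv packages and the env variables described above (scripts/transcribe.py may differ in detail):

import os, subprocess, tempfile
from pathlib import Path
from dotenv import load_dotenv
from openai import OpenAI

def transcribe(video_path: str) -> Path:
    load_dotenv(Path.home() / ".transcribe_video.env")
    client = OpenAI(base_url=os.getenv("OPENAI_API_BASE"))  # default endpoint if unset
    with tempfile.TemporaryDirectory() as tmp:
        wav = Path(tmp) / "audio.wav"
        # Extract 16 kHz mono WAV audio; long videos may need chunking to stay
        # under the API's upload size limit.
        subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1",
                        "-ar", "16000", str(wav)], check=True)
        with open(wav, "rb") as audio:
            result = client.audio.transcriptions.create(
                model=os.getenv("TRANSCRIBE_MODEL", "gpt-4o-transcribe"), file=audio)
    out = Path(video_path).with_suffix(".txt")
    out.write_text(result.text, encoding="utf-8")
    return out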
Step 4: Report results
Tell the user:
- Where the transcript file was saved
- How many lines / approximate word count
- Whether it came from embedded subtitles or API transcription
- A preview of the first few lines of the transcript
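A small sketch for gathering those numbers and the preview from the saved transcript (the function name is illustrative):

from pathlib import Path

def report(video_path: str, source: str) -> None:
    # source is "embedded subtitles" or "API transcription"
    transcript = Path(video_path).with_suffix(".txt")    # <video_name>.txt next to the video
    text = transcript.read_text(encoding="utf-8")
    lines = [l for l in text.splitlines() if l.strip()]
    print(f"Transcript ({source}) saved to {transcript}")
    print(f"{len(lines)} lines, ~{len(text.split())} words")
    print("\n".join(lines[:5]))                          # first few lines as a preview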