media-files-conversion-ffmpeg
FFmpeg Skill
Natural language FFmpeg operations with opinionated best practices for speed, reliability, and quality.
Core Principles
- Stream copy by default - When converting video containers, use -c copy to avoid re-encoding (10x faster, no quality loss)
- CPU-only encoding - No GPU encoders (more reliable, hardware-agnostic)
- Fast presets - Default to the fast preset for x264/x265 (good balance of speed/quality)
- Context-aware audio - Ask about transcription when converting to MP3 (Whisper prefers 16kHz mono)
Common Operations
Audio Extraction
When extracting audio from video:
# Standard audio extraction (stream copy; assumes the source audio track is AAC)
ffmpeg -i input.mp4 -vn -c:a copy output.aac
# For transcription (Whisper-optimized)
ffmpeg -i input.mp4 -vn -ar 16000 -ac 1 -c:a libmp3lame -b:a 64k output.mp3
Always ask: "Is this for transcription?" before choosing the format.
Video Conversion
When converting video containers (e.g., MKV → MP4):
# Fast conversion (stream copy - no re-encoding)
ffmpeg -i input.mkv -c copy output.mp4
If stream copy fails (incompatible codecs), fall back to re-encoding:
# Fallback: re-encode with fast preset
ffmpeg -i input.mkv -c:v libx264 -preset fast -crf 23 -c:a aac -b:a 128k output.mp4
Trimming
When trimming video (preserving quality):
# Fast trim (stream copy)
ffmpeg -ss 00:01:30 -to 00:02:45 -i input.mp4 -c copy output.mp4
# Accurate trim (re-encode if needed)
ffmpeg -i input.mp4 -ss 00:01:30 -to 00:02:45 -c:v libx264 -preset fast -crf 23 -c:a copy output.mp4
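The two trim modes differ in where -ss/-to sit relative to -i and in whether the video is re-encoded. A minimal Python sketch of that choice follows; build_trim_cmd is a hypothetical helper written for illustration and is not taken from scripts/ffmpeg_helper.py.

```python
def build_trim_cmd(src, dst, start, end, accurate=False):
    """Return an ffmpeg argument list that trims src to [start, end]."""
    if accurate:
        # Output-side seeking: ffmpeg decodes up to the start point and the
        # video is re-encoded, so the cut does not have to land on a keyframe.
        return ["ffmpeg", "-i", src, "-ss", start, "-to", end,
                "-c:v", "libx264", "-preset", "fast", "-crf", "23",
                "-c:a", "copy", dst]
    # Input-side seeking with stream copy: fast and lossless, but the start
    # of the clip snaps to a nearby keyframe.
    return ["ffmpeg", "-ss", start, "-to", end, "-i", src, "-c", "copy", dst]


if __name__ == "__main__":
    print(" ".join(build_trim_cmd("input.mp4", "output.mp4", "00:01:30", "00:02:45")))
```

Returning a list rather than a string keeps the command safe to pass to subprocess.run without shell quoting.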
Resizing
When resizing video:
# Resize to 720p (maintains aspect ratio)
ffmpeg -i input.mp4 -vf scale=-2:720 -c:v libx264 -preset fast -crf 23 -c:a copy output.mp4
# Resize to specific width (maintains aspect ratio)
ffmpeg -i input.mp4 -vf scale=1280:-2 -c:v libx264 -preset fast -crf 23 -c:a copy output.mp4
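The -2 in these filters asks ffmpeg to derive the missing dimension from the aspect ratio and keep it divisible by 2, which libx264 needs for the common yuv420p pixel format. The sketch below approximates that calculation in Python; scaled_width is a hypothetical name, and the exact rounding ffmpeg applies may differ in edge cases.

```python
def scaled_width(in_w, in_h, target_h, multiple=2):
    """Approximate the width chosen for scale=-2:target_h."""
    # Width that preserves the input aspect ratio at the target height.
    exact = target_h * in_w / in_h
    # Round to the nearest multiple so the encoder gets an even dimension.
    return max(multiple, round(exact / multiple) * multiple)


if __name__ == "__main__":
    print(scaled_width(1920, 1080, 720))
```

For a 1920x1080 source this gives 1280, matching the usual 1280x720 output.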
Compression
For WhatsApp/Telegram:
# Aggressive compression (aims for under 50MB; CRF does not guarantee a file size)
ffmpeg -i input.mp4 -vf scale=-2:480 -c:v libx264 -preset fast -crf 28 -c:a aac -b:a 64k -ac 1 output.mp4
General compression:
# Balanced compression
ffmpeg -i input.mp4 -c:v libx264 -preset fast -crf 28 -c:a aac -b:a 96k output.mp4
Decision Tree
Audio Conversion
- Ask: "What's the purpose?" (listening, transcription, archival)
- For transcription: Use 16kHz mono MP3
- For listening: Use AAC or high-quality MP3
- For archival: Use FLAC or original codec with stream copy
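The audio branch above can be sketched as a small lookup from purpose to encoder arguments. This is an illustrative sketch only: audio_args is a hypothetical function, and the 192k listening bitrate and the .m4a extension are assumptions not stated in this document.

```python
def audio_args(purpose):
    """Map a stated purpose to (ffmpeg audio arguments, output extension)."""
    if purpose == "transcription":
        # 16 kHz mono MP3, as recommended above for Whisper.
        return ["-vn", "-ar", "16000", "-ac", "1",
                "-c:a", "libmp3lame", "-b:a", "64k"], ".mp3"
    if purpose == "listening":
        # Assumed bitrate and container; adjust to taste.
        return ["-vn", "-c:a", "aac", "-b:a", "192k"], ".m4a"
    if purpose == "archival":
        # Lossless FLAC; stream copy of the original codec is the alternative.
        return ["-vn", "-c:a", "flac"], ".flac"
    raise ValueError(f"unknown purpose: {purpose!r}")


if __name__ == "__main__":
    args, ext = audio_args("transcription")
    print(["ffmpeg", "-i", "input.mp4", *args, "output" + ext])
```

Raising on an unknown purpose forces the caller to ask the user rather than silently picking a format.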
Video Conversion
- Check: Is this just a container change? (e.g., MKV → MP4)
  - YES → Use stream copy
  - NO → Continue
- Check: Does the user want quality preservation or compression?
  - Preservation → Use CRF 18-23
  - Compression → Use CRF 26-30
- Check: Is speed critical?
  - YES → Use the veryfast preset
  - NO → Use the fast preset
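The three checks in the video decision tree can be sketched as one function. video_args is a hypothetical name; the specific CRF values (20 and 28) are assumed picks from inside the ranges given above, and the audio settings are borrowed from the re-encoding fallback shown earlier.

```python
def video_args(container_only, goal="preservation", speed_critical=False):
    """Return ffmpeg codec arguments following the video decision tree."""
    if container_only:
        # Container change only: no re-encoding needed.
        return ["-c", "copy"]
    if goal not in ("preservation", "compression"):
        raise ValueError(f"unknown goal: {goal!r}")
    # Assumed values within the documented ranges (18-23 and 26-30).
    crf = "20" if goal == "preservation" else "28"
    preset = "veryfast" if speed_critical else "fast"
    return ["-c:v", "libx264", "-preset", preset, "-crf", crf,
            "-c:a", "aac", "-b:a", "128k"]


if __name__ == "__main__":
    print(video_args(False, goal="compression", speed_critical=True))
```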
Helper Script
Use scripts/ffmpeg_helper.py for common operations:
# Extract audio (prompts for the purpose)
python3 scripts/ffmpeg_helper.py extract-audio input.mp4 --ask-purpose
# Convert video (smart defaults)
python3 scripts/ffmpeg_helper.py convert input.mkv output.mp4
# Trim video (fast mode by default)
python3 scripts/ffmpeg_helper.py trim input.mp4 output.mp4 --start 00:01:30 --end 00:02:45
# Resize video
python3 scripts/ffmpeg_helper.py resize input.mp4 output.mp4 --height 720
# Compress for messaging
python3 scripts/ffmpeg_helper.py compress input.mp4 output.mp4 --target whatsapp
Presets Reference
See references/presets.md for detailed preset explanations and use cases.
Error Handling
When stream copy fails:
- Inform the user: "Stream copy failed (incompatible codecs). Re-encoding with fast preset..."
- Retry with re-encoding
- Show the command used for transparency
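The three steps above can be sketched in Python as follows. convert_with_fallback is a hypothetical function, not the helper script's actual implementation. The -y flag is an added assumption so that the retry can overwrite any partial file left by the failed attempt; note that it also overwrites an existing output without asking.

```python
import shlex
import subprocess


def convert_with_fallback(src, dst, run=subprocess.run, notify=print):
    """Try stream copy first; on failure, tell the user and re-encode."""
    copy_cmd = ["ffmpeg", "-y", "-i", src, "-c", "copy", dst]
    reencode_cmd = ["ffmpeg", "-y", "-i", src,
                    "-c:v", "libx264", "-preset", "fast", "-crf", "23",
                    "-c:a", "aac", "-b:a", "128k", dst]
    if run(copy_cmd).returncode == 0:
        return copy_cmd
    # Step 1: inform the user.
    notify("Stream copy failed (incompatible codecs). Re-encoding with fast preset...")
    # Step 2: retry with re-encoding.
    result = run(reencode_cmd)
    # Step 3: show the command used for transparency.
    notify("Command used: " + shlex.join(reencode_cmd))
    if result.returncode != 0:
        raise RuntimeError("re-encoding failed as well")
    return reencode_cmd
```

The run and notify parameters default to subprocess.run and print, and can be swapped for stubs when testing without ffmpeg installed.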
When output is too large:
- Suggest compression options
- Offer CRF adjustment (higher = smaller file)
- Offer resolution downscaling
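One way to order these suggestions is to raise CRF first and only then step down the resolution. The sketch below illustrates that policy. The function name, the CRF step of 2, the cap of 30 (the top of the compression range above), and the 1080/720/480 ladder are all assumptions chosen for illustration.

```python
MAX_CRF = 30                 # upper end of the compression range above
HEIGHTS = [1080, 720, 480]   # assumed resolution ladder, largest first


def next_smaller_settings(crf, height):
    """Suggest the next (crf, height) to try, or None if nothing is left."""
    if crf < MAX_CRF:
        # Higher CRF means a smaller file at lower quality.
        return min(crf + 2, MAX_CRF), height
    # CRF is already at the cap: drop to the next lower resolution.
    smaller = [h for h in HEIGHTS if h < height]
    if smaller:
        return crf, smaller[0]
    return None


if __name__ == "__main__":
    print(next_smaller_settings(28, 1080))
```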