whisper - Local Speech-to-Text & Subtitles
The whisper module provides high-performance local speech recognition built on whisper.cpp. It covers model management, transcription, subtitle generation, and merging subtitles into video.
When to Activate
- When the user wants to transcribe an audio file into text.
- When generating .srt subtitle files from audio/video.
- When merging generated subtitles into a video file.
- When performing real-time speech-to-text using LiveKit or Streaming.
Core Principles & Rules
- Local Processing: Emphasize that transcription happens locally without uploading data.
- Model Selection: Allow users to choose from different model sizes (tiny, base, small, medium, large) for speed vs. accuracy.
- File Integrity: Ensure input audio files are accessible.
Additional Scenarios
- SRT Generation: Use dictate --srt to create industry-standard subtitle files.
- Video Integration: Use merge to embed subtitles into a video stream.
Patterns & Examples
Simple Transcription
# Interactively choose a model and transcribe an audio file
x whisper ./meeting_record.mp3
Generate Subtitles
# Create an SRT subtitle file from audio
x whisper dictate --srt -o my_subtitles ./interview.wav
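When several recordings need subtitles, the single command above can be wrapped in a loop. The following is a minimal sketch, not part of the module: batch_srt and the RUN variable are hypothetical names, and it assumes that -o takes an output base name as in the example above. By default it only prints the commands it would run.

```shell
# Hypothetical helper: print (or run) one dictate command per .wav file in a directory.
# Set RUN=1 to execute the commands; the default is a dry run that only prints them.
batch_srt() {
    dir=$1
    for audio in "$dir"/*.wav; do
        [ -e "$audio" ] || continue        # skip when the glob matches nothing
        base=${audio%.wav}                 # assumed: -o takes a base name without extension
        if [ "${RUN:-0}" = 1 ]; then
            x whisper dictate --srt -o "$base" "$audio"
        else
            printf 'x whisper dictate --srt -o %s %s\n' "$base" "$audio"
        fi
    done
}
```

Calling batch_srt ./recordings lists the planned commands, which makes it easy to review them before rerunning with RUN=1.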
Merge Subtitles
# Embed an SRT file into a video
x whisper merge ./subtitles.srt ./video.mp4
Checklist
- Confirm if the user has downloaded the required whisper model.
- Verify the audio file format is supported by whisper.cpp.
- Check if ffmpeg is available for the merge subcommand.
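Parts of this checklist can be automated before calling merge. The sketch below is illustrative only: preflight_merge is a hypothetical helper, not something the whisper module provides, and it covers only file accessibility and the presence of ffmpeg on PATH. It does not check model downloads or audio formats.

```shell
# Hypothetical helper: check merge prerequisites from the checklist.
# Returns 0 if both files are readable and ffmpeg is on PATH,
# 1 for a missing or unreadable file, 2 when ffmpeg is not found.
preflight_merge() {
    srt_file=$1
    video_file=$2

    # Both inputs must exist and be readable.
    for f in "$srt_file" "$video_file"; do
        if [ ! -r "$f" ]; then
            printf 'missing or unreadable file: %s\n' "$f" >&2
            return 1
        fi
    done

    # The checklist names ffmpeg as a prerequisite for merge.
    if ! command -v ffmpeg >/dev/null 2>&1; then
        printf 'ffmpeg not found on PATH\n' >&2
        return 2
    fi
    return 0
}

# Example use (left commented out so the block has no side effects):
# preflight_merge ./subtitles.srt ./video.mp4 && x whisper merge ./subtitles.srt ./video.mp4
```

Distinct return codes let a caller tell the user whether to fix a path or install ffmpeg.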