The Agent Skills Directory

[EXTERNAL_DOWNLOADS]: The skill fetches necessary tools and models from external repositories:
Downloads FFmpeg source code from ffmpeg.org and pre-built binaries for macOS from martin-riedl.de.
Clones the whisper.cpp source code from the official ggml-org GitHub repository.
Downloads the jq utility from the official jqlang GitHub releases.
Fetches various Whisper models from Hugging Face repositories (ggerganov, BELLE-2, and kotoba-tech).
All downloads are directed to official, trusted, or widely recognized community sources for the respective tools.
[COMMAND_EXECUTION]: Local system commands are used for media processing and build tasks:
Invokes ffmpeg to convert input audio files to 16kHz mono WAV format.
Executes whisper-cli to perform local inference for speech-to-text.
Runs cmake and make to compile whisper.cpp locally if the binary is missing.
Uses shasum to verify model integrity by comparing calculated SHA256 hashes against hardcoded values in scripts/_model_common.sh.
[PROMPT_INJECTION]: The skill presents an indirect prompt injection surface (Category 8):
Ingestion points: Processes user-provided audio files through transcribe.sh to generate text.
Boundary markers: The resulting transcription output (JSON and TXT) does not contain delimiters or system warnings to isolate the transcribed content.
Capability inventory: The skill writes transcription files to the data/ directory and utilizes temporary directories for processing.
Sanitization: No sanitization is performed on the transcribed text before it is returned.
Risk: This represents a standard risk for transcription tasks where spoken instructions in source audio could influence downstream reasoning if not properly handled by the consuming agent.

mk-youtube-audio-transcribe