mk-youtube-audio-transcribe
Pass
Audited by Gen Agent Trust Hub on Feb 25, 2026
Risk Level: SAFEEXTERNAL_DOWNLOADSCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
- [EXTERNAL_DOWNLOADS]: The skill fetches necessary tools and models from external repositories:
- Downloads FFmpeg source code from ffmpeg.org and pre-built binaries for macOS from martin-riedl.de.
- Clones the whisper.cpp source code from the official ggml-org GitHub repository.
- Downloads the jq utility from the official jqlang GitHub releases.
- Fetches various Whisper models from Hugging Face repositories (ggerganov, BELLE-2, and kotoba-tech).
- All downloads are directed to official, trusted, or widely recognized community sources for the respective tools.
- [COMMAND_EXECUTION]: Local system commands are used for media processing and build tasks:
- Invokes
ffmpegto convert input audio files to 16kHz mono WAV format. - Executes
whisper-clito perform local inference for speech-to-text. - Runs
cmakeandmaketo compile whisper.cpp locally if the binary is missing. - Uses
shasumto verify model integrity by comparing calculated SHA256 hashes against hardcoded values inscripts/_model_common.sh. - [PROMPT_INJECTION]: The skill presents an indirect prompt injection surface (Category 8):
- Ingestion points: Processes user-provided audio files through
transcribe.shto generate text. - Boundary markers: The resulting transcription output (JSON and TXT) does not contain delimiters or system warnings to isolate the transcribed content.
- Capability inventory: The skill writes transcription files to the
data/directory and utilizes temporary directories for processing. - Sanitization: No sanitization is performed on the transcribed text before it is returned.
- Risk: This represents a standard risk for transcription tasks where spoken instructions in source audio could influence downstream reasoning if not properly handled by the consuming agent.
Audit Metadata