mk-youtube-audio-transcribe

Pass

Audited by Gen Agent Trust Hub on Feb 25, 2026

Risk Level: SAFEEXTERNAL_DOWNLOADSCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [EXTERNAL_DOWNLOADS]: The skill fetches necessary tools and models from external repositories:
  • Downloads FFmpeg source code from ffmpeg.org and pre-built binaries for macOS from martin-riedl.de.
  • Clones the whisper.cpp source code from the official ggml-org GitHub repository.
  • Downloads the jq utility from the official jqlang GitHub releases.
  • Fetches various Whisper models from Hugging Face repositories (ggerganov, BELLE-2, and kotoba-tech).
  • All downloads are directed to official, trusted, or widely recognized community sources for the respective tools.
  • [COMMAND_EXECUTION]: Local system commands are used for media processing and build tasks:
  • Invokes ffmpeg to convert input audio files to 16kHz mono WAV format.
  • Executes whisper-cli to perform local inference for speech-to-text.
  • Runs cmake and make to compile whisper.cpp locally if the binary is missing.
  • Uses shasum to verify model integrity by comparing calculated SHA256 hashes against hardcoded values in scripts/_model_common.sh.
  • [PROMPT_INJECTION]: The skill presents an indirect prompt injection surface (Category 8):
  • Ingestion points: Processes user-provided audio files through transcribe.sh to generate text.
  • Boundary markers: The resulting transcription output (JSON and TXT) does not contain delimiters or system warnings to isolate the transcribed content.
  • Capability inventory: The skill writes transcription files to the data/ directory and utilizes temporary directories for processing.
  • Sanitization: No sanitization is performed on the transcribed text before it is returned.
  • Risk: This represents a standard risk for transcription tasks where spoken instructions in source audio could influence downstream reasoning if not properly handled by the consuming agent.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 25, 2026, 12:00 PM