The Agent Skills Directory

[COMMAND_EXECUTION]: The skill utilizes the 'agent-media' CLI tool to perform audio transcription and video-to-audio extraction tasks as its primary function.
[EXTERNAL_DOWNLOADS]: When using the 'local' provider, the skill fetches machine learning models (~100MB) from Hugging Face's official repositories. This is a standard and documented behavior for the Transformers.js library mentioned.
[DATA_EXFILTRATION]: The skill integrates with well-known third-party services (Fal, Replicate, Runpod) for processing audio files. These operations are performed using official API keys and target established technology platforms.
[PROMPT_INJECTION]: The skill processes untrusted audio data from local paths or URLs. While this creates a surface for indirect prompt injection (where spoken audio might contain instructions), the risk is inherent to media processing and mitigated by the agent's structured handling of the transcription output.

audio-transcribe