The Agent Skills Directory

[COMMAND_EXECUTION] (MEDIUM): The scripts scripts/speech-to-text.sh and scripts/text-to-speech.sh build their JSON payloads with unquoted shell heredocs (cat <<EOF). Because an unquoted heredoc undergoes parameter expansion and command substitution, user-provided input (via --text, --audio-url, or --model) containing shell metacharacters such as $(...) or backticks can be executed by the host shell when the payload is assigned to a variable. For example, passing --text "$(whoami)" would run the whoami command on the system.
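One way to keep raw input out of the payload template is to escape each value as a JSON string before embedding it. The sketch below is a minimal illustration for bash, not the skill's actual code: json_escape and build_tts_payload are hypothetical helper names, and the "text" field name is an assumption. It covers backslashes, double quotes, newlines, carriage returns, and tabs only; other control characters would need \u escapes or a dedicated tool such as jq.

```bash
#!/usr/bin/env bash

# Hypothetical helper: escape a value for use inside a JSON string literal.
# Handles only \, ", newline, carriage return, and tab (an assumption that
# the input contains no other control characters).
json_escape() {
  local s=$1
  s=${s//\\/\\\\}       # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}       # double quote
  s=${s//$'\n'/\\n}     # newline
  s=${s//$'\r'/\\r}     # carriage return
  s=${s//$'\t'/\\t}     # tab
  printf '%s' "$s"
}

# Hypothetical helper: build a payload with an assumed "text" field.
# printf inserts the escaped value as data; it is never re-parsed by the shell.
build_tts_payload() {
  local text=$1
  printf '{"text":"%s"}' "$(json_escape "$text")"
}
```

Called as build_tts_payload 'say "hi" $(whoami)', the function emits the $(whoami) text literally inside the JSON string, with the inner quotes escaped.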
[CREDENTIALS_UNSAFE] (MEDIUM): The --add-fal-key option in both scripts writes the FAL_KEY to a .env file in the skill's directory. Storing credentials in plain text on the filesystem risks exposing them if the agent environment is compromised or shared.
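If the key has to stay in a .env file, restricting the file to its owner narrows the exposure on shared systems. The following is a minimal sketch under that assumption; save_fal_key is a hypothetical helper, not a function from the skill, and it offers no protection against a compromised process running as the same user.

```bash
#!/usr/bin/env bash

# Hypothetical helper: write FAL_KEY to an env file readable only by its owner.
# Usage: save_fal_key <key> [env_file]   (env_file defaults to .env)
save_fal_key() {
  local key=$1
  local env_file=${2:-.env}
  # Run in a subshell so the restrictive umask does not leak into the caller.
  (
    umask 077
    printf 'FAL_KEY=%s\n' "$key" > "$env_file"
  )
  # Tighten permissions even if the file already existed with looser ones.
  chmod 600 "$env_file"
}
```

A secrets manager or an environment variable injected at runtime would avoid the on-disk copy entirely; the sketch only reduces who can read it.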
[PROMPT_INJECTION] (LOW): The skill is susceptible to indirect prompt injection (Category 8c). It retrieves and transcribes audio from external URLs supplied via --audio-url. If an attacker supplies an audio file that transcribes into malicious instructions, the agent may follow those instructions in subsequent steps.
Ingestion points: scripts/speech-to-text.sh (via --audio-url argument)
Boundary markers: None detected in the prompt logic for the transcription output.
Capability inventory: The skill can execute shell commands via curl and manage local files (.env).
Sanitization: No sanitization is performed on the transcription text before it is returned to the agent.
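The missing boundary markers noted above could be added by wrapping the transcription output before it is returned to the agent. This is a minimal sketch assuming bash; wrap_transcript and the two marker strings are hypothetical names chosen for illustration, and markers only help if the agent's prompt tells it to treat the delimited text as data rather than instructions.

```bash
#!/usr/bin/env bash

# Hypothetical marker strings; any unlikely-to-occur tokens would do.
BEGIN_MARK='BEGIN_UNTRUSTED_TRANSCRIPT'
END_MARK='END_UNTRUSTED_TRANSCRIPT'

# Hypothetical helper: print the transcript between boundary markers.
wrap_transcript() {
  local transcript=$1
  # Remove marker look-alikes so untrusted text cannot close the block early.
  transcript=${transcript//"$BEGIN_MARK"/}
  transcript=${transcript//"$END_MARK"/}
  printf '%s\n%s\n%s\n' "$BEGIN_MARK" "$transcript" "$END_MARK"
}
```

Stripping the marker strings from the transcript is a simple form of sanitization; it does not detect malicious instructions, it only keeps the boundaries unambiguous.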

fal-audio