skill-comply

Fail

Audited by Gen Agent Trust Hub on Mar 23, 2026

Risk Level: HIGH
Findings: COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The script scripts/runner.py iterates through a list of setup_commands and executes each one with subprocess.run. These commands are parsed directly from LLM output generated during the scenario-creation phase.
  • [REMOTE_CODE_EXECUTION]: The setup_commands executed in scripts/runner.py are dynamically generated by an LLM in scripts/scenario_generator.py. Because the tool neither validates nor sanitizes these commands before execution, it can run arbitrary shell commands supplied by the model. The risk is heightened because the LLM is prompted with potentially untrusted skill files provided by the user.
  • [PROMPT_INJECTION]: The tool is vulnerable to indirect prompt injection because it incorporates external skill-file content directly into the prompts used for specification and scenario generation.
  • Ingestion points: Skill files provided as command-line arguments are read and interpolated into prompt templates in scripts/spec_generator.py and scripts/scenario_generator.py.
  • Boundary markers: The skill content is wrapped in triple-dash (---) delimiters in the prompt templates, which may not be sufficient to prevent instructions in the skill file from escaping the intended context.
  • Capability inventory: The tool can execute shell commands via subprocess.run in scripts/runner.py based on LLM output.
  • Sanitization: No validation, sanitization, or filtering is performed on the input skill content or the resulting LLM-generated setup commands.
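The vulnerable pattern described above can be illustrated with a minimal sketch. The function name and command format here are hypothetical (the actual code lives in scripts/runner.py and was not reproduced in this report); the sketch only shows the shape of the flaw: model output is split into lines and each line is handed to the shell unvalidated.

```python
import subprocess

def run_setup_commands(llm_output: str) -> list[str]:
    """Hypothetical sketch of the flaw: every non-empty line of
    LLM-generated text is executed as a shell command, unvalidated."""
    setup_commands = [line.strip() for line in llm_output.splitlines() if line.strip()]
    for cmd in setup_commands:
        # shell=True plus unvalidated model output is the core RCE risk:
        # any command the model emits (or that an injected skill file
        # coaxes it to emit) runs with the tool's full privileges.
        subprocess.run(cmd, shell=True, check=True)
    return setup_commands
```

Because the skill file feeds the same prompt that produces these commands, a malicious skill can steer the model into emitting an arbitrary payload that this loop will execute.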
Recommendations
  • Validate LLM-generated setup_commands against an allowlist of permitted executables (or require human review) before executing them in scripts/runner.py.
  • Treat skill-file content as untrusted data: avoid interpolating it directly into prompt templates, and use boundary delimiters more robust than triple dashes.
  • Avoid shell execution of model output entirely; if execution is unavoidable, run commands without shell=True, in a sandbox, with minimal privileges.
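A hedged sketch of the first two mitigations, assuming a hypothetical allowlist and helper name (neither appears in the audited repository): commands are tokenized with shlex, checked against permitted binaries, and executed without shell interpretation.

```python
import shlex
import subprocess

# Hypothetical allowlist: only these executables may appear in
# LLM-generated setup commands; everything else is rejected.
ALLOWED_BINARIES = {"pip", "git", "mkdir"}

def run_validated(command: str) -> list[str]:
    """Reject any LLM-generated command whose executable is not allowlisted,
    then run it with shell=False so metacharacters (; | && $()) are inert."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_BINARIES:
        raise ValueError(f"Rejected LLM-generated command: {command!r}")
    subprocess.run(argv, shell=False, check=True)
    return argv
```

This does not remove the prompt-injection surface, but it bounds the blast radius: an injected skill file can no longer escalate to arbitrary command execution.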
Audit Metadata
Risk Level
HIGH
Analyzed
Mar 23, 2026, 06:20 AM