experiment-code
Fail
Audited by Gen Agent Trust Hub on Feb 20, 2026
Risk Level: HIGH (COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION)
Full Analysis
- [COMMAND_EXECUTION] (HIGH): The skill uses `subprocess.run` within its core experiment loop (Pattern 1 in `references/code-patterns.md`) to execute generated scripts like `experiment.py`. This provides the AI with a direct mechanism to run arbitrary code on the local machine.
- [REMOTE_CODE_EXECUTION] (HIGH): The skill facilitates an automated cycle where code is generated from model responses and then executed. Because the code is derived from untrusted user input (`$1`), it acts as a vector for remote code execution.
- [PROMPT_INJECTION] (LOW): This skill is vulnerable to indirect prompt injection.
  1. Ingestion points: The `$1` argument (research plan) and error messages from failed runs.
  2. Boundary markers: None; there are no delimiters used to isolate user data from instructions in the prompt templates.
  3. Capability inventory: High-privilege actions including `subprocess.run` and broad file system access.
  4. Sanitization: None; generated code is executed immediately without static analysis, linting, or sandboxing.
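The analysis above names two missing controls: boundary markers around untrusted input and static analysis before execution. The following is a minimal sketch of what each could look like, not the skill's actual code. The names `find_violations`, `wrap_untrusted`, `BLOCKED_MODULES`, `BLOCKED_CALLS`, and the `research_plan` tag are all hypothetical, and the denylist contents are an assumption chosen for illustration.

```python
import ast

# Hypothetical denylists for illustration; not taken from the audited skill.
BLOCKED_MODULES = {"os", "subprocess", "socket", "shutil", "ctypes"}
BLOCKED_CALLS = {"eval", "exec", "compile", "__import__"}


def find_violations(source: str) -> list[str]:
    """Statically screen generated code; an empty list means nothing was flagged."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import a.b` is checked by its top-level package name.
            for alias in node.names:
                root = alias.name.split(".")[0]
                if root in BLOCKED_MODULES:
                    findings.append(f"line {node.lineno}: import of {root}")
        elif isinstance(node, ast.ImportFrom):
            root = (node.module or "").split(".")[0]
            if root in BLOCKED_MODULES:
                findings.append(f"line {node.lineno}: import from {root}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only direct calls by bare name are caught, e.g. eval(...).
            if node.func.id in BLOCKED_CALLS:
                findings.append(f"line {node.lineno}: call to {node.func.id}")
    return findings


def wrap_untrusted(text: str, tag: str = "research_plan") -> str:
    """Enclose untrusted text in boundary markers for a prompt template."""
    closing = f"</{tag}>"
    cleaned = text
    # Repeat until stable so that removing one closing tag cannot form another.
    while closing in cleaned:
        cleaned = cleaned.replace(closing, "")
    return f"<{tag}>\n{cleaned}\n{closing}"
```

A caller would run `find_violations` on the generated script and skip execution when the list is non-empty. A denylist of this kind is easy to bypass (for example through `importlib` or attribute lookups), so it narrows the attack surface but does not substitute for the sandboxing the audit also finds missing.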
Recommendations
- AI detected serious security threats
Audit Metadata