experiment-code

Fail

Audited by Gen Agent Trust Hub on Feb 20, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION] (HIGH): The skill uses subprocess.run within its core experiment loop (Pattern 1 in references/code-patterns.md) to execute generated scripts like experiment.py. This provides the AI with a direct mechanism to run arbitrary code on the local machine.
  • [REMOTE_CODE_EXECUTION] (HIGH): The skill facilitates an automated cycle where code is generated from model responses and then executed. Because the code is derived from untrusted user input ($1), it acts as a vector for remote code execution.
  • [PROMPT_INJECTION] (LOW): This skill is vulnerable to indirect prompt injection. 1. Ingestion points: The $1 argument (research plan) and error messages from failed runs. 2. Boundary markers: None; there are no delimiters used to isolate user data from instructions in the prompt templates. 3. Capability inventory: High-privilege actions including subprocess.run and broad file system access. 4. Sanitization: None; generated code is executed immediately without static analysis, linting, or sandboxing.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 20, 2026, 05:23 AM