experiment-code
Fail
Audited by Gen Agent Trust Hub on Feb 22, 2026
Risk Level: HIGH (COMMAND_EXECUTION, REMOTE_CODE_EXECUTION)
Full Analysis
- Command Execution (HIGH): The skill explicitly provides implementation patterns (e.g., Pattern 1 in `code-patterns.md`) that use `subprocess.run(['python', 'experiment.py', ...])` to execute code generated by the agent. This execution happens on the host environment with no documented sandboxing or safety boundaries.
- Indirect Prompt Injection (LOW): The skill accepts research plans and ideas as input (`$1`), which are then interpolated into system prompts (e.g., Prompt 1 in `experiment-prompts.md`).
- Ingestion points: Input parameter `$1` (research plan/idea) in `SKILL.md`.
- Boundary markers: None. The input is placed directly into the `{idea}` and `{title}` placeholders in prompt templates.
- Capability inventory: File system read/write (`json.load`, `open`, `notes.txt`) and arbitrary command execution via `subprocess.run` in `code-patterns.md`.
- Sanitization: There is no logic to sanitize user input or to validate the generated code before it is executed.
- Dynamic Execution (HIGH): The skill's primary workflow involves dynamically creating Python scripts (`experiment.py`, `plot.py`), which are subsequently executed and iteratively modified based on runtime errors and performance scores.
Recommendations
- AI analysis detected serious security threats in this skill.
Audit Metadata