run

Warn

Audited by Gen Agent Trust Hub on Mar 13, 2026

Risk Level: MEDIUM (COMMAND_EXECUTION, PROMPT_INJECTION)
Full Analysis
  • [COMMAND_EXECUTION]: The skill executes shell commands like cat, git checkout, and python using variables {domain} and {name}. The lack of sanitization or validation on these variables could allow an attacker to perform path traversal or command injection by providing specially crafted experiment identifiers.
  • [PROMPT_INJECTION]: Indirect prompt injection surface detected.
      • Ingestion points: Data is read from .autoresearch/{domain}/{name}/config.cfg, program.md, and results.tsv.
      • Boundary markers: There are no boundary markers or instructions to ignore embedded commands within the ingested files.
      • Capability inventory: The skill has the ability to execute shell commands and write to the filesystem.
      • Sanitization: Content from the ingested files directly influences the agent's decision-making and code modification logic without any sanitization.
  • [COMMAND_EXECUTION]: The skill modifies target source files and then executes them via an evaluation script (run_experiment.py). This capability allows for the execution of agent-generated code, which constitutes a significant security risk if the agent's instructions are influenced by malicious data.
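The first two findings can be illustrated with a minimal Python sketch. It is not the skill's actual code. The helper names (`safe_experiment_dir`, `wrap_untrusted`), the identifier allow-list, and the marker format are all hypothetical, chosen only to show what validating {domain} and {name} and adding boundary markers around ingested files might look like. Boundary markers reduce the indirect injection surface but do not eliminate it.

```python
import re
import secrets
from pathlib import Path

# Hypothetical allow-list: starts alphanumeric, then letters, digits, "_" or "-".
# This rejects "..", "/", spaces, and shell metacharacters such as ";" or "$".
_SAFE_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,63}")


def safe_experiment_dir(root, domain, name):
    """Return root/domain/name, or raise ValueError for unsafe identifiers."""
    for label, value in (("domain", domain), ("name", name)):
        if not _SAFE_ID.fullmatch(value):
            raise ValueError(f"unsafe {label}: {value!r}")
    base = Path(root).resolve()
    target = (base / domain / name).resolve()
    # Defense in depth: a symlink could still point outside the root.
    if not target.is_relative_to(base):  # requires Python 3.9+
        raise ValueError("path escapes the experiment root")
    return target


def wrap_untrusted(label, text):
    """Fence ingested file content so a prompt can mark it as data only.

    A random nonce in the markers makes it hard for the file content
    to forge a matching end marker.
    """
    nonce = secrets.token_hex(8)
    return (
        f"<<<UNTRUSTED {label} {nonce}>>>\n"
        f"{text}\n"
        f"<<<END UNTRUSTED {label} {nonce}>>>"
    )
```

With identifiers validated this way, a command such as cat can be passed to `subprocess.run` as an argument list, for example `["cat", str(exp_dir / "config.cfg")]`, rather than as an interpolated shell string, so the values are never parsed by a shell.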
Audit Metadata
Risk Level: MEDIUM
Analyzed: Mar 13, 2026, 10:00 PM