autoresearch

Warn

Audited by Gen Agent Trust Hub on Mar 18, 2026

Risk Level: MEDIUM (COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION)
Full Analysis
  • [COMMAND_EXECUTION]: The skill executes arbitrary shell commands provided by the user (METRIC_COMMAND) in an autonomous loop without a pause for approval between iterations. Evidence: 'Phase 3, Step 4: RUN - Execute the metric command. Redirect output to run.log'.
  • [REMOTE_CODE_EXECUTION]: The agent is instructed to autonomously generate and write code to the local filesystem and then execute that code via a metric command. This creates a cycle where generated code is run without manual review. Evidence: 'Phase 3, Step 2: EDIT - Modify the in-scope file(s)... Step 4: RUN - Execute the metric command'.
  • [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection because it reads and analyzes code from the repository ('in-scope files') and uses that analysis to generate the next iteration of logic. Malicious instructions embedded in the codebase could influence the agent's behavior during its autonomous 'THINK' phase. Evidence: 'Phase 2, Step 2: Read in-scope files' and 'Phase 3, Step 1: THINK - Analyze previous results and the current code'.
  • [INDIRECT_PROMPT_INJECTION_MANDATORY_EVIDENCE]: Ingestion points: 'Phase 2, Step 2: Read in-scope files' (SKILL.md). Boundary markers: None present. Capability inventory: 'Phase 3, Step 2: EDIT (Modify files)' and 'Phase 3, Step 4: RUN (Shell command execution)' (SKILL.md). Sanitization: No sanitization or filtering of file content is described before analysis.
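
For illustration only, the two gaps named above (no boundary markers around ingested file content, and no approval pause before the metric command runs) could be addressed along the following lines. This is a minimal sketch, not part of the audited skill: the function names wrap_untrusted and run_with_approval, the marker strings, and the prompt text are all hypothetical. Only METRIC_COMMAND and run.log come from the findings.

```python
import shlex
import subprocess
from typing import Callable, Optional

# Hypothetical marker strings; any pair unlikely to occur in real code would do.
BEGIN = "<<<BEGIN_UNTRUSTED_FILE>>>"
END = "<<<END_UNTRUSTED_FILE>>>"


def wrap_untrusted(path: str, text: str) -> str:
    """Delimit file content so it can be presented as data, not instructions."""
    # Remove any copy of the markers inside the content so the content
    # cannot close the block early and smuggle text outside it.
    safe = text.replace(BEGIN, "[marker removed]").replace(END, "[marker removed]")
    return f"{BEGIN} path={path}\n{safe}\n{END}"


def run_with_approval(
    metric_command: str,
    approve: Callable[[str], str] = input,
    log_path: str = "run.log",
) -> Optional[int]:
    """Run the metric command only after an explicit 'y'; return its exit code."""
    answer = approve(f"Run {metric_command!r}? [y/N] ")
    if answer.strip().lower() != "y":
        return None  # declined: nothing is executed
    # shlex.split with no shell=True avoids shell interpretation of the string.
    # Assumption: the metric command needs no pipes or shell redirection.
    with open(log_path, "w") as log:
        result = subprocess.run(
            shlex.split(metric_command), stdout=log, stderr=subprocess.STDOUT
        )
    return result.returncode
```

Markers of this kind only label the content as untrusted; they do not by themselves stop a model from following instructions inside it, so the approval step remains the stronger control in this sketch.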
Audit Metadata
Risk Level: MEDIUM
Analyzed: Mar 18, 2026, 11:58 PM