agent-eval-harness

Warn

Audited by Gen Agent Trust Hub on Feb 28, 2026

Risk Level: MEDIUM (COMMAND_EXECUTION, EXTERNAL_DOWNLOADS, PROMPT_INJECTION)
Full Analysis
  • [COMMAND_EXECUTION]: The run command includes --simple and --shell modes that execute prompt content directly in a shell environment. The documentation explicitly warns that malicious prompt text could escape quoting and execute arbitrary commands, posing a risk if untrusted prompts are processed.
  • [COMMAND_EXECUTION]: The harness is designed to execute arbitrary local scripts or executables that the user supplies via the --grader and --schema arguments. The documented example scripts use powerful capabilities such as Bun's shell (Bun.$) to perform file system operations and run tests.
  • [EXTERNAL_DOWNLOADS]: The documentation for Docker setup (docker-evals.md) provides instructions to download and execute scripts directly from the internet, specifically using curl -fsSL https://claude.ai/install.sh | bash to install the Claude CLI. While this targets a well-known service (Anthropic), the pattern involves executing remote code locally.
  • [INDIRECT_PROMPT_INJECTION]: The tool's primary purpose is to process agent trajectories, which are untrusted external data. This data is passed to grader scripts that may have significant system privileges, creating an attack surface where an agent's output could influence the grader's execution.
      • Ingestion points: Agent outputs and trajectories are read from results.jsonl and prompts.jsonl files.
      • Boundary markers: No specific delimiters or boundary markers are enforced by the harness; isolation depends on the user's implementation of graders.
      • Capability inventory: The harness and its graders can perform subprocess calls (Bun.$), network requests (via SDKs), and file system writes.
      • Sanitization: The harness does not appear to sanitize the captured trajectories before passing them to the grading logic.
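
The boundary-marker and sanitization gaps noted above could be narrowed inside a user-written grader. The following is a minimal sketch, not part of agent-eval-harness: the marker strings and the helper names (stripControlChars, neutralizeMarkers, wrapUntrusted) are hypothetical and chosen only for illustration. It strips control characters from a captured trajectory, neutralizes any embedded copy of the markers, and wraps the result so downstream grading logic can tell untrusted text from instructions.

```typescript
// Hypothetical helpers: none of these names come from agent-eval-harness.
const OPEN = "<<<UNTRUSTED_TRAJECTORY";
const CLOSE = "UNTRUSTED_TRAJECTORY>>>";

/** Remove control characters, keeping only tab and newline. */
function stripControlChars(text: string): string {
  return text.replace(/[\u0000-\u0008\u000B-\u001F\u007F]/g, "");
}

/** Replace any copy of the markers inside the payload so it cannot close the block early. */
function neutralizeMarkers(text: string): string {
  return text
    .split(OPEN).join("[marker removed]")
    .split(CLOSE).join("[marker removed]");
}

/** Wrap an untrusted trajectory between explicit boundary markers. */
function wrapUntrusted(trajectory: string): string {
  // Strip first, then neutralize, so removing a control character
  // cannot reassemble a marker that was split around it.
  const cleaned = neutralizeMarkers(stripControlChars(trajectory));
  return `${OPEN}\n${cleaned}\n${CLOSE}`;
}
```

A grader would call wrapUntrusted on each trajectory string before interpolating it into a judge prompt or a log line. This only reduces the attack surface; it does not make it safe to pass trajectory text to a shell or to run graders with more privileges than they need.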
Audit Metadata
Risk Level: MEDIUM
Analyzed: Feb 28, 2026, 09:25 AM