agent-eval-harness
Warn
Audited by Gen Agent Trust Hub on Feb 28, 2026
Risk Level: MEDIUM (COMMAND_EXECUTION, EXTERNAL_DOWNLOADS, PROMPT_INJECTION)
Full Analysis
- [COMMAND_EXECUTION]: The `run` command includes `--simple` and `--shell` modes that execute prompt content directly in a shell environment. The documentation explicitly warns that malicious prompt text could escape quoting and execute arbitrary commands, posing a risk if untrusted prompts are processed.
- [COMMAND_EXECUTION]: The harness is designed to execute arbitrary local scripts or executables provided by the user via the `--grader` and `--schema` arguments. These scripts are shown using powerful capabilities like Bun's shell (`Bun.$`) to perform file system operations and run tests.
- [EXTERNAL_DOWNLOADS]: The documentation for Docker setup (`docker-evals.md`) provides instructions to download and execute scripts directly from the internet, specifically using `curl -fsSL https://claude.ai/install.sh | bash` to install the Claude CLI. While this targets a well-known service (Anthropic), the pattern involves executing remote code locally.
- [INDIRECT_PROMPT_INJECTION]: The tool's primary purpose is to process agent trajectories, which are untrusted external data. This data is passed to grader scripts that may have significant system privileges, creating an attack surface where an agent's output could influence the grader's execution.
  - Ingestion points: Agent outputs and trajectories are read from `results.jsonl` and `prompts.jsonl` files.
  - Boundary markers: No specific delimiters or boundary markers are enforced by the harness; isolation depends on the user's implementation of graders.
  - Capability inventory: The harness and its graders can perform subprocess calls (`Bun.$`), network requests (via SDKs), and file system writes.
  - Sanitization: The harness does not appear to sanitize the captured trajectories before passing them to the grading logic.
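Since the harness enforces neither boundary markers nor sanitization, a grader author could add both. The following is a minimal sketch of one way to do that, not part of the harness itself. The record fields `id` and `output`, the marker format, and the helper names `parseResultsLine`, `stripControlChars`, and `wrapUntrusted` are all assumptions made for illustration; the real `results.jsonl` schema may differ.

```typescript
// Hypothetical shape of one results.jsonl record; the real schema may differ.
interface TrajectoryRecord {
  id: string;
  output: string;
}

// Parse one JSONL line and validate the fields before using them.
function parseResultsLine(line: string): TrajectoryRecord {
  const parsed: unknown = JSON.parse(line);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("record is not a JSON object");
  }
  const { id, output } = parsed as Record<string, unknown>;
  if (typeof id !== "string" || typeof output !== "string") {
    throw new Error("record must have string 'id' and 'output' fields");
  }
  return { id, output };
}

// Drop ASCII control characters, keeping tab, newline, and carriage return.
function stripControlChars(text: string): string {
  return text.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, "");
}

// Wrap untrusted text in markers keyed by a caller-supplied nonce.
// The nonce should come from a cryptographically secure source and be
// generated fresh per record, so agent output cannot predict the markers.
function wrapUntrusted(text: string, nonce: string): string {
  if (nonce.length === 0) {
    throw new Error("nonce must not be empty");
  }
  const cleaned = stripControlChars(text);
  if (cleaned.includes(nonce)) {
    // Refuse rather than risk a forged closing marker.
    throw new Error("untrusted text contains the boundary nonce");
  }
  return `<<<UNTRUSTED ${nonce}>>>\n${cleaned}\n<<<END ${nonce}>>>`;
}
```

A grader would then hand only the wrapped text to any downstream model or template, and pass it to subprocesses as a discrete argument or over stdin rather than splicing it into a command string. Delimiters reduce the attack surface but do not remove it; a grader that handles untrusted trajectories should still run with as few privileges as practical.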
Audit Metadata