agent-eval-harness

Warn

Audited by Snyk on Feb 28, 2026

Risk Level: MEDIUM
Full Analysis

MEDIUM W011: Third-party content exposure detected (indirect prompt injection risk).

  • Third-party content exposure detected (high risk: 0.90). The SKILL.md shows the harness executes prompts that can invoke external web/tool calls (the capture output example includes a "WebSearch" tool_call) and also documents --simple/--shell modes and LLM-as-judge graders that run external commands or fetch content, so it can ingest untrusted public web/user-generated content that may influence agent decisions.

MEDIUM W012: Unverifiable external dependency detected (runtime URL that controls agent).

  • Potentially malicious external URL detected (high risk: 1.00). The Dockerfile and docs contain a direct remote-install command that is executed during setup—RUN curl -fsSL https://claude.ai/install.sh | bash—which fetches and runs remote code (the Claude CLI installer) that the harness recommends/relies on for the Claude headless adapter.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Feb 28, 2026, 09:25 AM