agent-eval-harness
Fail
Audited by Socket on Feb 28, 2026
1 alert found:
Malware (HIGH): SKILL.md
The analyzed report describes a legitimate evaluation harness with a broad but standard surface for executing external adapters and graders. The main security concerns are its handling of API keys and its shell-based execution modes, which can enable command execution or data leakage if prompts or adapters are untrusted. No evidence of malicious code or hidden payloads is present; the risk stems from configuration and workflow choices. Recommended mitigations include managed secret storage, keeping keys out of logs, sandboxing or strict input validation for shell modes, and vetting graders and adapters to prevent data exfiltration. Overall, the assessment is cautious but acceptable, with moderate risk.
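Two of the recommended mitigations (keeping keys out of logs and avoiding shell interpretation when running adapters) can be illustrated with a minimal sketch. It is not the harness's own API: the names `redact`, `RedactSecretsFilter`, `run_adapter`, and `_SECRET_PATTERN` are hypothetical, and the secret pattern (values after `api_key=` or `Bearer `) is an assumption that would need to match whatever key formats a real deployment uses. It does not provide sandboxing.

```python
import logging
import re
import subprocess
import sys

# Hypothetical pattern: treats anything after "api_key=" / "api-key=" or
# "Bearer " (up to whitespace or "&") as a secret. Adjust to real key formats.
_SECRET_PATTERN = re.compile(r"(api[_-]?key=|Bearer\s+)[^\s&]+", re.IGNORECASE)


def redact(text: str) -> str:
    """Replace secret values with a placeholder, keeping the prefix."""
    return _SECRET_PATTERN.sub(r"\1[REDACTED]", text)


class RedactSecretsFilter(logging.Filter):
    """Logging filter that redacts secrets from the fully formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format msg % args first so secrets passed as arguments are caught too.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # never drop the record, only rewrite it


def run_adapter(argv: list, timeout: float = 30.0) -> tuple:
    """Run an external adapter as an argument list, without invoking a shell.

    Because shell=False (the default), shell metacharacters such as $(...)
    or ; in arguments are passed to the program literally, not executed.
    Raises subprocess.TimeoutExpired if the adapter exceeds `timeout`.
    """
    if not isinstance(argv, list) or not argv or not all(isinstance(a, str) for a in argv):
        raise ValueError("argv must be a non-empty list of strings")
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stdout


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Attach to the handler so records from child loggers are redacted as well.
    logging.getLogger().handlers[0].addFilter(RedactSecretsFilter())
    logging.info("calling grader with api_key=%s", "sk-example")
    code, out = run_adapter([sys.executable, "-c", "print('adapter ok')"])
    logging.info("adapter exited %d: %s", code, out.strip())
```

Passing an argument list rather than a command string is the design choice that matters here: an untrusted prompt that ends up as an argument cannot chain extra commands. Adapters that genuinely require shell features would still need the sandboxing or strict validation the audit recommends.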
Confidence: 95%, Severity: 90%
Audit Metadata