eval-harness
Pass
Audited by Gen Agent Trust Hub on Mar 12, 2026
Risk Level: SAFE
Full Analysis
- [SAFE]: The skill is a documentation-only framework providing guidelines for testing and evaluation. It contains no executable scripts, hardcoded credentials, or obfuscated content.
- [COMMAND_EXECUTION]: The documentation references standard development commands such as
npm test,npm run build, andgrep. These are used as illustrative examples for deterministic 'Code-Based Graders' to verify project state and are appropriate for the skill's stated purpose. - [INDIRECT_PROMPT_INJECTION]: The skill defines an attack surface by design, as it involves an agent reading and evaluating external data (code and task outputs).
- Ingestion points: Reads eval definitions from
.claude/evals/*.mdand project source files viaRead,Grep, andGlobtools. - Boundary markers: None explicitly defined in the framework templates to separate instructions from evaluated data.
- Capability inventory: Requests
Bash,Write,Edit,Read,Grep, andGlobtools inSKILL.mdto perform evaluations. - Sanitization: None specified; the framework relies on the user to define safe test scripts and includes a 'Best Practice' recommendation for human review of security-sensitive checks.
Audit Metadata