eval-harness

Pass

Audited by Gen Agent Trust Hub on Mar 12, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill is a documentation-only framework providing guidelines for testing and evaluation. It contains no executable scripts, hardcoded credentials, or obfuscated content.
  • [COMMAND_EXECUTION]: The documentation references standard development commands such as npm test, npm run build, and grep. These are used as illustrative examples for deterministic 'Code-Based Graders' to verify project state and are appropriate for the skill's stated purpose.
  • [INDIRECT_PROMPT_INJECTION]: The skill defines an attack surface by design, as it involves an agent reading and evaluating external data (code and task outputs).
  • Ingestion points: Reads eval definitions from .claude/evals/*.md and project source files via Read, Grep, and Glob tools.
  • Boundary markers: None explicitly defined in the framework templates to separate instructions from evaluated data.
  • Capability inventory: Requests Bash, Write, Edit, Read, Grep, and Glob tools in SKILL.md to perform evaluations.
  • Sanitization: None specified; the framework relies on the user to define safe test scripts and includes a 'Best Practice' recommendation for human review of security-sensitive checks.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 12, 2026, 08:39 AM