The Agent Skills Directory

[SAFE]: The skill is a documentation-only framework providing guidelines for testing and evaluation. It contains no executable scripts, hardcoded credentials, or obfuscated content.
[COMMAND_EXECUTION]: The documentation references standard development commands such as npm test, npm run build, and grep. These are used as illustrative examples for deterministic 'Code-Based Graders' to verify project state and are appropriate for the skill's stated purpose.
[INDIRECT_PROMPT_INJECTION]: The skill defines an attack surface by design, as it involves an agent reading and evaluating external data (code and task outputs).
Ingestion points: Reads eval definitions from .claude/evals/*.md and project source files via Read, Grep, and Glob tools.
Boundary markers: None explicitly defined in the framework templates to separate instructions from evaluated data.
Capability inventory: Requests Bash, Write, Edit, Read, Grep, and Glob tools in SKILL.md to perform evaluations.
Sanitization: None specified; the framework relies on the user to define safe test scripts and includes a 'Best Practice' recommendation for human review of security-sensitive checks.

eval-harness