eval-harness
Pass
Audited by Gen Agent Trust Hub on Mar 4, 2026
Risk Level: SAFEPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
- [PROMPT_INJECTION]: The skill introduces an indirect prompt injection surface through the 'Model-Based Grader' and 'Capability Eval' features. These mechanisms ingest external code and task outputs to be processed by the agent, which could contain malicious instructions designed to manipulate the evaluation results.
- Ingestion points: Processing of external data within
[MODEL GRADER PROMPT]and[CAPABILITY EVAL]blocks as described inSKILL.md. - Boundary markers: The framework uses Markdown headers and code block delimiters to separate evaluation logic from data.
- Capability inventory: The skill utilizes
Bash,Read,Write,Edit,Grep, andGlobtools to perform its functions. - Sanitization: There is no mention of input sanitization or explicit 'ignore previous instructions' markers for the data being evaluated.
- [COMMAND_EXECUTION]: The skill leverages the
Bashtool to perform deterministic checks, such as runningnpm test,npm run build, andgrep. While these are powerful capabilities, they are standard for development evaluation workflows and are used here to automate verification against success criteria.
Audit Metadata