ai-evals
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFEPROMPT_INJECTIONDATA_EXFILTRATION
Full Analysis
- PROMPT_INJECTION (LOW): Indirect Prompt Injection surface detected in the LLM-as-judge evaluation flow.
- Ingestion points:
references/TEMPLATES.mdprovides a prompt skeleton for 'LLM-as-judge' that interpolates untrusted data from<test case input>and<model output>. - Boundary markers: Absent. The template relies on simple placeholders without robust delimiters or instructions to ignore embedded commands within the content being evaluated.
- Capability inventory: The judge's output (JSON) is designed to influence automated ship/no-ship decisions or product iteration loops.
- Sanitization: The skill contains manual checklists in
references/CHECKLISTS.mdandreferences/INTAKE.mdadvising on anonymization, but lacks automated sanitization for the prompt construction. - DATA_EXFILTRATION (LOW): Potential for sensitive data exposure during the evaluation process.
- The skill's workflow (
SKILL.mdandreferences/INTAKE.md) encourages gathering real user logs and examples. While it advises on anonymization, the process of sending this data to external LLM providers for judging constitutes a low-level risk of data exposure if users do not strictly follow the manual redaction guidelines.
Audit Metadata