skills/oldwinter/skills/ai-evals/Gen Agent Trust Hub

ai-evals

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFEPROMPT_INJECTIONDATA_EXFILTRATION
Full Analysis
  • PROMPT_INJECTION (LOW): Indirect Prompt Injection surface detected in the LLM-as-judge evaluation flow.
  • Ingestion points: references/TEMPLATES.md provides a prompt skeleton for 'LLM-as-judge' that interpolates untrusted data from <test case input> and <model output>.
  • Boundary markers: Absent. The template relies on simple placeholders without robust delimiters or instructions to ignore embedded commands within the content being evaluated.
  • Capability inventory: The judge's output (JSON) is designed to influence automated ship/no-ship decisions or product iteration loops.
  • Sanitization: The skill contains manual checklists in references/CHECKLISTS.md and references/INTAKE.md advising on anonymization, but lacks automated sanitization for the prompt construction.
  • DATA_EXFILTRATION (LOW): Potential for sensitive data exposure during the evaluation process.
  • The skill's workflow (SKILL.md and references/INTAKE.md) encourages gathering real user logs and examples. While it advises on anonymization, the process of sending this data to external LLM providers for judging constitutes a low-level risk of data exposure if users do not strictly follow the manual redaction guidelines.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 17, 2026, 06:25 PM