advanced-evaluation

Pass

Audited by Gen Agent Trust Hub on Apr 10, 2026

Risk Level: SAFEPROMPT_INJECTION
Full Analysis
  • [PROMPT_INJECTION]: The skill is designed to process untrusted external data (outputs from other LLMs) to perform evaluation. This creates a surface for indirect prompt injection where the data being judged could attempt to influence the evaluator agent's behavior.
  • Ingestion points: Untrusted LLM responses are passed into evaluation prompts in references/implementation-patterns.md and scripts/evaluation_example.py.
  • Boundary markers: The implementation patterns in references/full-guide.md and scripts/evaluation_example.py use clear structural delimiters (e.g., markdown headers like '## Response to Evaluate') to isolate untrusted content from the instructions.
  • Capability inventory: Across all scripts and guides, the skill focuses on data processing, scoring logic, and comparison. There are no subprocess calls, file-write operations, or network exfiltration capabilities applied to the untrusted data.
  • Sanitization: The skill relies on structural delimiters and prompt instructions (like 'Do NOT prefer responses because they are longer') rather than programmatic sanitization or escaping of the input data.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 10, 2026, 09:45 AM