advanced-evaluation
Pass
Audited by Gen Agent Trust Hub on Apr 10, 2026
Risk Level: SAFEPROMPT_INJECTION
Full Analysis
- [PROMPT_INJECTION]: The skill is designed to process untrusted external data (outputs from other LLMs) to perform evaluation. This creates a surface for indirect prompt injection where the data being judged could attempt to influence the evaluator agent's behavior.
- Ingestion points: Untrusted LLM responses are passed into evaluation prompts in
references/implementation-patterns.mdandscripts/evaluation_example.py. - Boundary markers: The implementation patterns in
references/full-guide.mdandscripts/evaluation_example.pyuse clear structural delimiters (e.g., markdown headers like '## Response to Evaluate') to isolate untrusted content from the instructions. - Capability inventory: Across all scripts and guides, the skill focuses on data processing, scoring logic, and comparison. There are no subprocess calls, file-write operations, or network exfiltration capabilities applied to the untrusted data.
- Sanitization: The skill relies on structural delimiters and prompt instructions (like 'Do NOT prefer responses because they are longer') rather than programmatic sanitization or escaping of the input data.
Audit Metadata