llm-evaluation
Pass
Audited by Gen Agent Trust Hub on Apr 18, 2026
Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
- [PROMPT_INJECTION]: The skill implements 'LLM-as-Judge' evaluation patterns in
SKILL.mdthat are susceptible to indirect prompt injection.\n - Ingestion points: The
llm_judge_qualityandcompare_responsesfunctions inSKILL.mdingest untrusted model outputs into judge prompts.\n - Boundary markers: The prompts do not use delimiters or instructions to ignore embedded commands within the untrusted model outputs.\n
- Capability inventory: The functions use the
openailibrary to call chat completion models with the interpolated prompts.\n - Sanitization: No sanitization or validation of the ingested content is performed before it is used to construct LLM prompts.\n- [EXTERNAL_DOWNLOADS]: The skill uses the
transformersandbert_scorelibraries to download and utilize pre-trained evaluation models from Microsoft's official repositories.
Audit Metadata