llm-evaluation
Pass
Audited by Gen Agent Trust Hub on Mar 8, 2026
Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
- [PROMPT_INJECTION]: The skill implements evaluation patterns where untrusted data (model responses and test inputs) is directly interpolated into prompts for a judge LLM, creating a surface for indirect prompt injection.
- Ingestion points: Data from
test_casesand generated model outputs (response,response_a,response_b) are ingested into prompt templates in thellm_judge_qualityandcompare_responsesfunctions inSKILL.md. - Boundary markers: The templates use simple textual headers like 'Question:' and 'Response:' but lack robust delimiters or specific instructions to disregard instructions embedded within the evaluated text.
- Capability inventory: The skill utilizes LLM API calls via
openai.ChatCompletion.createto process the prompts. - Sanitization: There is no evidence of sanitization, escaping, or schema validation for the input strings before they are interpolated into the judge prompts.
- [EXTERNAL_DOWNLOADS]: The skill utilizes well-known machine learning libraries for evaluation which download external assets.
- Evidence: Uses
transformersanddetoxifyto load pre-trained models. These downloads target established repositories like Hugging Face. - Evidence: Uses
bert_scorewhich fetches the 'microsoft/deberta-xlarge-mnli' model from Microsoft's official repository.
Audit Metadata