llm-evaluation

Pass

Audited by Gen Agent Trust Hub on Mar 8, 2026

Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
  • [PROMPT_INJECTION]: The skill implements evaluation patterns where untrusted data (model responses and test inputs) is directly interpolated into prompts for a judge LLM, creating a surface for indirect prompt injection.
  • Ingestion points: Data from test_cases and generated model outputs (response, response_a, response_b) are ingested into prompt templates in the llm_judge_quality and compare_responses functions in SKILL.md.
  • Boundary markers: The templates use simple textual headers like 'Question:' and 'Response:' but lack robust delimiters or specific instructions to disregard instructions embedded within the evaluated text.
  • Capability inventory: The skill utilizes LLM API calls via openai.ChatCompletion.create to process the prompts.
  • Sanitization: There is no evidence of sanitization, escaping, or schema validation for the input strings before they are interpolated into the judge prompts.
  • [EXTERNAL_DOWNLOADS]: The skill utilizes well-known machine learning libraries for evaluation which download external assets.
  • Evidence: Uses transformers and detoxify to load pre-trained models. These downloads target established repositories like Hugging Face.
  • Evidence: Uses bert_score which fetches the 'microsoft/deberta-xlarge-mnli' model from Microsoft's official repository.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 8, 2026, 07:40 AM