llm-evaluation

Pass

Audited by Gen Agent Trust Hub on Mar 8, 2026

Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection within the 'LLM-as-Judge' implementation patterns.
  • Ingestion points: Untrusted model responses and user inputs are directly interpolated into evaluation prompts within the llm_judge_quality and compare_responses functions.
  • Boundary markers: The prompts do not utilize clear delimiters (e.g., XML tags or isolated blocks) to separate instructions from the data being evaluated, which could allow a malicious model response to influence the judge's scoring or reasoning.
  • Capability inventory: The skill is configured to interact with the OpenAI API and execute code in an environment equipped with various data science and NLP libraries.
  • Sanitization: No input validation or filtering is performed on the content before it is processed by the judging model.
  • [EXTERNAL_DOWNLOADS]: The skill references and downloads pre-trained models from trusted repositories.
  • Evidence: Fetches the deberta-xlarge-mnli and deberta-large-mnli models from Microsoft's official organization on Hugging Face for metrics calculation. This is standard behavior for the intended functionality and follows established security protocols for trusted vendors.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 8, 2026, 12:11 AM