llm-evaluation
Pass
Audited by Gen Agent Trust Hub on Mar 8, 2026
Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
- [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection within the 'LLM-as-Judge' implementation patterns.
- Ingestion points: Untrusted model responses and user inputs are directly interpolated into evaluation prompts within the
llm_judge_qualityandcompare_responsesfunctions. - Boundary markers: The prompts do not utilize clear delimiters (e.g., XML tags or isolated blocks) to separate instructions from the data being evaluated, which could allow a malicious model response to influence the judge's scoring or reasoning.
- Capability inventory: The skill is configured to interact with the OpenAI API and execute code in an environment equipped with various data science and NLP libraries.
- Sanitization: No input validation or filtering is performed on the content before it is processed by the judging model.
- [EXTERNAL_DOWNLOADS]: The skill references and downloads pre-trained models from trusted repositories.
- Evidence: Fetches the
deberta-xlarge-mnlianddeberta-large-mnlimodels from Microsoft's official organization on Hugging Face for metrics calculation. This is standard behavior for the intended functionality and follows established security protocols for trusted vendors.
Audit Metadata