llm-evaluation

Pass

Audited by Gen Agent Trust Hub on May 7, 2026

Risk Level: SAFE
Full Analysis
  • [PROMPT_INJECTION]: The skill contains 'LLM-as-Judge' implementation examples where model outputs are interpolated into evaluation prompts. This creates an inherent surface for indirect prompt injection if the evaluated content contains adversarial instructions. However, these are standard benchmarking patterns and do not target the agent's own safety guidelines.
  • Ingestion points: The response and question variables in the llm_judge_quality and compare_responses functions in SKILL.md.
  • Boundary markers: Absent; responses are directly embedded in the f-string prompt.
  • Capability inventory: The snippets demonstrate calls to the OpenAI ChatCompletion API.
  • Sanitization: None provided in the simplified code examples.
  • [EXTERNAL_DOWNLOADS]: The code snippets reference downloading and using the microsoft/deberta-large-mnli and microsoft/deberta-xlarge-mnli models via the Hugging Face transformers library. These are official models from a well-known organization and are standard for Natural Language Inference tasks.
Audit Metadata
Risk Level
SAFE
Analyzed
May 7, 2026, 03:31 PM