llm-evaluation

Pass

Audited by Gen Agent Trust Hub on Apr 18, 2026

Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
  • [PROMPT_INJECTION]: The skill implements 'LLM-as-Judge' evaluation patterns in SKILL.md that are susceptible to indirect prompt injection.\n
  • Ingestion points: The llm_judge_quality and compare_responses functions in SKILL.md ingest untrusted model outputs into judge prompts.\n
  • Boundary markers: The prompts do not use delimiters or instructions to ignore embedded commands within the untrusted model outputs.\n
  • Capability inventory: The functions use the openai library to call chat completion models with the interpolated prompts.\n
  • Sanitization: No sanitization or validation of the ingested content is performed before it is used to construct LLM prompts.\n- [EXTERNAL_DOWNLOADS]: The skill uses the transformers and bert_score libraries to download and utilize pre-trained evaluation models from Microsoft's official repositories.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 18, 2026, 03:40 AM