llm-evaluation
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
- Indirect Prompt Injection (LOW): Vulnerability identified where untrusted content can manipulate the results of the 'judge' model.\n
- Ingestion points: Untrusted data enters via parameters such as
input_text,output_text, andcontextinscripts/evaluator-template.pyandexamples/evaluation-patterns.md.\n - Boundary markers: Absent. The code does not use delimiters (e.g., XML tags or triple quotes) or explicit 'ignore embedded instructions' warnings to isolate the text being evaluated.\n
- Capability inventory: Evaluation scores produced by LLM calls directly drive 'Quality Gate' logic and model preference selection, which affects downstream application behavior.\n
- Sanitization: Absent. Data is used directly in f-strings with only simple length truncation for token management, offering no protection against instruction hijacking.\n- External Downloads (SAFE): The skill utilizes standard, industry-standard libraries for its stated purpose.\n
- Evidence: Mentions and utilizes
ragas,numpy,scipy,langsmith,langfuse, anddatasetsfor evaluation metrics and observability tracking.
Audit Metadata