The Agent Skills Directory

Indirect Prompt Injection (LOW): Vulnerability identified where untrusted content can manipulate the results of the 'judge' model.\n
Ingestion points: Untrusted data enters via parameters such as input_text, output_text, and context in scripts/evaluator-template.py and examples/evaluation-patterns.md.\n
Boundary markers: Absent. The code does not use delimiters (e.g., XML tags or triple quotes) or explicit 'ignore embedded instructions' warnings to isolate the text being evaluated.\n
Capability inventory: Evaluation scores produced by LLM calls directly drive 'Quality Gate' logic and model preference selection, which affects downstream application behavior.\n
Sanitization: Absent. Data is used directly in f-strings with only simple length truncation for token management, offering no protection against instruction hijacking.\n- External Downloads (SAFE): The skill utilizes standard, industry-standard libraries for its stated purpose.\n
Evidence: Mentions and utilizes ragas, numpy, scipy, langsmith, langfuse, and datasets for evaluation metrics and observability tracking.

llm-evaluation