The Agent Skills Directory

[PROMPT_INJECTION]: The skill implements evaluation patterns where untrusted data (model responses and test inputs) is directly interpolated into prompts for a judge LLM, creating a surface for indirect prompt injection.
Ingestion points: Data from test_cases and generated model outputs (response, response_a, response_b) are ingested into prompt templates in the llm_judge_quality and compare_responses functions in SKILL.md.
Boundary markers: The templates use simple textual headers like 'Question:' and 'Response:' but lack robust delimiters or specific instructions to disregard instructions embedded within the evaluated text.
Capability inventory: The skill utilizes LLM API calls via openai.ChatCompletion.create to process the prompts.
Sanitization: There is no evidence of sanitization, escaping, or schema validation for the input strings before they are interpolated into the judge prompts.
[EXTERNAL_DOWNLOADS]: The skill utilizes well-known machine learning libraries for evaluation which download external assets.
Evidence: Uses transformers and detoxify to load pre-trained models. These downloads target established repositories like Hugging Face.
Evidence: Uses bert_score which fetches the 'microsoft/deberta-xlarge-mnli' model from Microsoft's official repository.

llm-evaluation