The Agent Skills Directory

[PROMPT_INJECTION] (MEDIUM): The score_with_llm function in SKILL.md implements an 'LLM as judge' pattern that is vulnerable to indirect prompt injection.
Ingestion points: It processes untrusted model outputs (actual) and dataset entries (expected) which are interpolated into the prompt.
Boundary markers: No delimiters or isolation techniques are used in the prompt template to separate data from instructions.
Capability inventory: The resulting scores can trigger sys.exit(1), potentially failing automated builds and affecting the deployment process.
Sanitization: The input is interpolated directly into the judge's prompt without any sanitization or validation of its content.

evaluation-harness