The Agent Skills Directory

[Indirect Prompt Injection] (LOW): The skill implements an evaluation pattern that ingests untrusted output from other LLMs.
Ingestion points: Untrusted data is processed via the {output} variable in prompts defined in reference/llm-as-judge-guide.md.
Boundary markers: The templates use Markdown headers to delimit untrusted content from judge instructions.
Capability inventory: The skill is authorized to use the Bash and Write tools according to the SKILL.md frontmatter.
Sanitization: Implementation guides suggest using JSON schema validation with pydantic or zod to ensure the integrity of the judge model output.
[COMMAND_EXECUTION] (SAFE): The skill permits the use of the Bash tool for running local evaluation scripts, which is appropriate for a testing and performance benchmarking toolset.

grey-haven-evaluation