The Agent Skills Directory

[PROMPT_INJECTION]: The skill defines a 'Model-Based Grader' workflow where the agent is instructed to evaluate code changes or task outputs. This creates a surface for indirect prompt injection if the content being evaluated contains malicious instructions designed to manipulate the grader's judgment or trigger unintended tool use.
Ingestion points: Task outputs, code files, and test results processed during the '/eval check' and '/eval report' workflows.
Boundary markers: Uses specific tags such as '[MODEL GRADER PROMPT]' to delineate instructions, but does not implement strict isolation or 'ignore' directives for the data being graded.
Capability inventory: The skill requires 'Read', 'Write', 'Edit', 'Bash', 'Grep', and 'Glob' tools, providing significant system access that could be exploited if an injection succeeds.
Sanitization: No explicit sanitization, filtering, or instruction-ignoring protocols are defined in the framework to protect the grading prompt from embedded data.

eval-harness