The Agent Skills Directory

[Prompt Injection] (LOW): The skill describes patterns for 'LLM-as-judge' and self-reflection, creating a surface for indirect prompt injection.
Ingestion points: task and output variables are interpolated directly into prompts in reflect_and_refine, evaluate, and evaluate_outcome functions.
Boundary markers: Absent. The prompts do not use delimiters (like triple quotes or XML tags) or system instructions to prevent the evaluator from obeying commands embedded in the data being evaluated.
Capability inventory: The skill involves calling an LLM in a loop and potentially executing code via run_tests.
Sanitization: None provided.
[Command Execution] (LOW): The CodeReflector pattern (Pattern 3) explicitly suggests a workflow that executes dynamically generated Python code and tests via a run_tests function. While the skill only provides the pattern and not the implementation of run_tests, it encourages the execution of untrusted AI-generated content which requires strict sandboxing to avoid host compromise.

agentic-eval