The Agent Skills Directory

[COMMAND_EXECUTION]: The script scripts/run_eval.py uses subprocess.Popen to invoke the claude CLI tool on the host system. Additionally, eval-viewer/generate_review.py executes subprocess.run with the lsof command to manage local network ports for the evaluation viewer.
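For illustration, the kind of process invocation described in this finding can be sketched as follows. This is a minimal sketch, not the code in scripts/run_eval.py or eval-viewer/generate_review.py: the helper names run_command and lsof_port_argv are hypothetical, and the choice of passing an argument list (no shell) with a timeout is an assumption rather than something the audit confirms.

```python
import subprocess
import sys


def run_command(argv, timeout=30.0):
    """Run argv without a shell and return (exit_code, combined_output)."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Kill the child and collect whatever output it produced.
        proc.kill()
        out, _ = proc.communicate()
    return proc.returncode, out


def lsof_port_argv(port):
    """Build an lsof argument list (hypothetical helper).

    -t prints only process IDs; -i :PORT selects sockets on that port.
    """
    return ["lsof", "-t", "-i", f":{int(port)}"]


if __name__ == "__main__":
    # Harmless demonstration: run the current Python interpreter.
    print(run_command([sys.executable, "-c", "print('ok')"]))
```

Passing a list rather than a shell string keeps arguments from being reinterpreted by a shell, which matters when any part of the command line derives from evaluated content.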
[REMOTE_CODE_EXECUTION]: The skill dynamically constructs and writes temporary skill definition files to the project's .claude/commands/ directory. These files contain logic and metadata that are interpreted and executed by the claude CLI during trigger evaluation runs.
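The write-then-interpret pattern in this finding might look roughly like the sketch below. It assumes the temporary definition is a single Markdown file and that it should be removed after the run; the function name temporary_command_file, the .md extension, and the cleanup behavior are illustrative assumptions, not details taken from the skill.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_command_file(project_root, name, body):
    """Write a definition file under .claude/commands/ and remove it on exit."""
    commands_dir = Path(project_root) / ".claude" / "commands"
    commands_dir.mkdir(parents=True, exist_ok=True)
    path = commands_dir / f"{name}.md"  # extension is an assumption
    path.write_text(body, encoding="utf-8")
    try:
        yield path
    finally:
        # Delete the file even if the evaluation run raised.
        path.unlink(missing_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with temporary_command_file(root, "demo", "example body") as p:
            print(p.exists())
        print(p.exists())
```

Whatever text ends up in body is later read by the CLI, so the file content carries the same trust level as its least trusted input.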
[EXTERNAL_DOWNLOADS]: The evaluation report template (eval-viewer/viewer.html) fetches the SheetJS (xlsx) library from cdn.sheetjs.com via a script tag. This external dependency is used to render spreadsheet data within the local browser interface.
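One way to surface this kind of dependency during review is to list the remote hosts referenced by script tags in an HTML file. The sketch below uses only the Python standard library; the function name external_script_hosts and the sample URL path are hypothetical, and only the host cdn.sheetjs.com comes from the finding.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ScriptSrcCollector(HTMLParser):
    """Collect the src attribute of every script tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def external_script_hosts(html):
    """Return hostnames of script sources that point at a remote host."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    hosts = []
    for src in collector.sources:
        host = urlparse(src).hostname  # None for relative paths
        if host:
            hosts.append(host)
    return hosts


if __name__ == "__main__":
    # The path below is a placeholder, not the real file name.
    sample = (
        '<script src="https://cdn.sheetjs.com/example.js"></script>'
        '<script src="local.js"></script>'
    )
    print(external_script_hosts(sample))
```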
[PROMPT_INJECTION]: The skill is designed to process and benchmark untrusted skill drafts and test prompts, presenting an attack surface for indirect prompt injection.
Ingestion points: Evaluation prompts in evals/evals.json and skill instructions in SKILL.md are processed and executed.
Boundary markers: The skill uses standard Markdown and YAML delimiters which do not prevent the execution of instructions embedded within the untrusted content.
Capability inventory: The system can execute local shell commands, write files to the local filesystem, and spawn autonomous subagents.
Sanitization: No content sanitization or validation is evident for user-provided prompts or skill instructions before the execution engine processes them.
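As a point of comparison for the boundary-marker observation above, the sketch below wraps untrusted text in delimiters that carry a random nonce, so embedded content cannot predict and forge the closing marker. This is an illustrative mitigation under stated assumptions, not something the skill implements: the function name wrap_untrusted and the marker format are hypothetical, and delimiters of this kind reduce ambiguity without guaranteeing that a model ignores embedded instructions.

```python
import secrets


def wrap_untrusted(text, label="untrusted"):
    """Surround text with nonce-bearing markers (hypothetical format)."""
    nonce = secrets.token_hex(8)  # 16 hex characters, fresh per call
    open_marker = f"<<{label}:{nonce}>>"
    close_marker = f"<<end:{nonce}>>"
    return f"{open_marker}\n{text}\n{close_marker}"


if __name__ == "__main__":
    print(wrap_untrusted("Ignore previous instructions."))
```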

skill-creator