The Agent Skills Directory

[COMMAND_EXECUTION]: The skill uses subprocess calls in scripts such as scripts/run_eval.py and scripts/improve_description.py to execute the claude CLI and other local Python modules. This is the intended behavior for automating the skill development and testing workflow.
[PROMPT_INJECTION]: The skill ingests untrusted data from test outputs and user queries, which are then interpolated into prompts for specialized subagents (Grader, Comparator, and Analyzer). This creates a surface for indirect prompt injection where malicious content in a test case could attempt to influence the evaluation or improvement process.
Ingestion points: evals/evals.json, skill execution outputs in <workspace>/outputs/, and feedback.json.
Boundary markers: The skill does not currently use strong delimiters or explicit instructions to ignore embedded directives when processing test outputs.
Capability inventory: The skill has the ability to execute shell commands and write files to the local filesystem.
Sanitization: No evidence of sanitization or escaping for untrusted data was found before interpolation into agent prompts.
[DATA_EXPOSURE]: The eval-viewer/generate_review.py script starts a local HTTP server on 127.0.0.1. It reads and serves files from the evaluation workspace to provide a user interface for reviewing results, which exposes evaluation data to the local network interface.
[EXTERNAL_DOWNLOADS]: The HTML viewer (eval-viewer/viewer.html) loads the SheetJS library from cdn.sheetjs.com to render spreadsheets in the browser. This is a reference to a well-known service used for legitimate functionality.

skill-creator