skill-creator

Pass

Audited by Gen Agent Trust Hub on Mar 20, 2026

Risk Level: SAFE
Tags: COMMAND_EXECUTION, EXTERNAL_DOWNLOADS
Full Analysis
  • [COMMAND_EXECUTION]: The skill uses the subprocess module in scripts/run_eval.py to execute the claude CLI. This is used to spawn subagents that test the behavior of newly created or modified skills in an isolated environment.
  • [EXTERNAL_DOWNLOADS]: The scripts/improve_description.py and scripts/run_loop.py scripts use the official anthropic Python SDK to call Anthropic's API, automatically optimizing skill descriptions based on evaluation results.
  • [DYNAMIC_EXECUTION]: The skill dynamically executes prompts and logic defined in user-provided evals.json files. This is a primary feature of the tool, allowing developers to verify skill performance across various scenarios.
  • [INDIRECT_PROMPT_INJECTION]: The skill's architecture is designed to ingest and process untrusted evaluation data, which is then executed by an agent.
      • Ingestion points: Evaluation prompts are read from evals/evals.json and passed to subagents.
      • Boundary markers: The skill instructions emphasize the use of isolated workspace directories and subagents to keep the testing harness separate from the skill being tested.
      • Capability inventory: The skill can run shell commands (claude CLI), make network requests (Anthropic API), and manage files and directories within the project workspace.
      • Sanitization: The skill does not sanitize the content of evaluation prompts, since the intended use case is executing those prompts for testing.
  • [DATA_EXPOSURE]: The eval-viewer/generate_review.py script starts a local development server on 127.0.0.1:3117 to host a review dashboard. Access is restricted to the local machine to allow the user to review qualitative outputs and quantitative benchmarks.
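The eval flow described in the findings above (untrusted prompts from evals/evals.json, executed by a claude CLI subagent in an isolated workspace) can be sketched roughly as follows. The evals.json shape, the helper names, and the exact CLI invocation are assumptions for illustration, not the audited script itself:

```python
import json
import subprocess
import tempfile
from pathlib import Path

def build_command(eval_case: dict) -> list[str]:
    """Build the claude CLI invocation for one eval case.

    The prompt comes straight from untrusted evals.json; executing it is the
    point of the harness, so no sanitization is applied here (assumed flags).
    """
    return ["claude", "-p", eval_case["prompt"]]

def run_eval(eval_case: dict) -> subprocess.CompletedProcess:
    """Spawn a subagent in a fresh, throwaway workspace directory."""
    workspace = Path(tempfile.mkdtemp(prefix="skill-eval-"))
    return subprocess.run(
        build_command(eval_case),
        cwd=workspace,          # keep side effects inside the isolated dir
        capture_output=True,
        text=True,
    )

# Eval cases as they might appear in evals/evals.json (shape is an assumption):
evals = json.loads('[{"name": "basic", "prompt": "Use the skill to summarize README.md"}]')
```

The per-case temporary workspace mirrors the isolation boundary the audit describes: the harness deliberately runs attacker-controllable prompts, and separation comes from where the subagent runs, not from filtering what it is told.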
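The loopback-only dashboard noted in the last finding can be illustrated with a minimal standard-library sketch; the handler setup and function name are assumptions, and generate_review.py may differ:

```python
import functools
import http.server

HOST, PORT = "127.0.0.1", 3117  # loopback only: unreachable from other machines

def serve_dashboard(directory: str) -> http.server.ThreadingHTTPServer:
    """Serve the review dashboard from `directory` to the local machine only."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Binding to 127.0.0.1 (rather than 0.0.0.0) is what restricts access
    # to localhost, matching the audit's DATA_EXPOSURE assessment.
    return http.server.ThreadingHTTPServer((HOST, PORT), handler)

# server = serve_dashboard("eval-viewer/out")  # hypothetical output dir
# server.serve_forever()  # then open http://127.0.0.1:3117 in a browser
```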
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 20, 2026, 05:14 PM