skill-creator

Pass

Audited by Gen Agent Trust Hub on Mar 20, 2026

Risk Level: SAFE
Tags: COMMAND_EXECUTION, EXTERNAL_DOWNLOADS
Full Analysis
  • [COMMAND_EXECUTION]: The skill uses the subprocess module in scripts/run_eval.py to execute the claude CLI. This is used to spawn subagents that test the behavior of newly created or modified skills in an isolated environment.
  • [EXTERNAL_DOWNLOADS]: The scripts/improve_description.py and scripts/run_loop.py scripts use the official anthropic Python SDK to call Anthropic's API, automatically optimizing skill descriptions based on evaluation results.
  • [DYNAMIC_EXECUTION]: The skill dynamically executes prompts and logic defined in user-provided evals.json files. This is a primary feature of the tool, allowing developers to verify skill performance across various scenarios.
  • [INDIRECT_PROMPT_INJECTION]: The skill's architecture is designed to ingest and process untrusted evaluation data, which is then executed by an agent.
      • Ingestion points: Evaluation prompts are read from evals/evals.json and passed to subagents.
      • Boundary markers: The skill instructions emphasize the use of isolated workspace directories and subagents to keep the testing harness separate from the skill being tested.
      • Capability inventory: The skill can run shell commands (claude CLI), make network requests (Anthropic API), and manage files and directories within the project workspace.
      • Sanitization: The skill does not sanitize the content of evaluation prompts, since the intended use case is executing those prompts for testing.
  • [DATA_EXPOSURE]: The eval-viewer/generate_review.py script starts a local development server on 127.0.0.1:3117 to host a review dashboard. Access is restricted to the local machine to allow the user to review qualitative outputs and quantitative benchmarks.
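The eval flow described in the findings above (untrusted prompts from evals/evals.json, executed by a claude CLI subagent in an isolated workspace) can be sketched roughly as follows. The evals.json shape, the helper names, and the exact CLI invocation are assumptions for illustration, not the audited script itself:

```python
import json
import subprocess
import tempfile
from pathlib import Path

def build_command(eval_case: dict) -> list[str]:
    """Build the claude CLI invocation for one eval case.

    The prompt comes straight from untrusted evals.json; executing it is the
    point of the harness, so no sanitization is applied here (assumed flags).
    """
    return ["claude", "-p", eval_case["prompt"]]

def run_eval(eval_case: dict) -> subprocess.CompletedProcess:
    """Spawn a subagent in a fresh, throwaway workspace directory."""
    workspace = Path(tempfile.mkdtemp(prefix="skill-eval-"))
    return subprocess.run(
        build_command(eval_case),
        cwd=workspace,          # keep side effects inside the isolated dir
        capture_output=True,
        text=True,
    )

# Eval cases as they might appear in evals/evals.json (shape is an assumption):
evals = json.loads('[{"name": "basic", "prompt": "Use the skill to summarize README.md"}]')
```

The per-case temporary workspace mirrors the isolation boundary the audit describes: the harness deliberately runs attacker-controllable prompts, and separation comes from where the subagent runs, not from filtering what it is told.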
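The loopback-only dashboard noted in the last finding can be illustrated with a minimal standard-library sketch; the handler setup and function name are assumptions, and generate_review.py may differ:

```python
import functools
import http.server

HOST, PORT = "127.0.0.1", 3117  # loopback only: unreachable from other machines

def serve_dashboard(directory: str) -> http.server.ThreadingHTTPServer:
    """Serve the review dashboard from `directory` to the local machine only."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Binding to 127.0.0.1 (rather than 0.0.0.0) is what restricts access
    # to localhost, matching the audit's DATA_EXPOSURE assessment.
    return http.server.ThreadingHTTPServer((HOST, PORT), handler)

# server = serve_dashboard("eval-viewer/out")  # hypothetical output dir
# server.serve_forever()  # then open http://127.0.0.1:3117 in a browser
```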
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 20, 2026, 05:14 PM