skill-creator
Pass
Audited by Gen Agent Trust Hub on Mar 20, 2026
Risk Level: SAFE (flagged behaviors: COMMAND_EXECUTION, EXTERNAL_DOWNLOADS)
Full Analysis
- [COMMAND_EXECUTION]: The skill uses the `subprocess` module in `scripts/run_eval.py` to execute the `claude` CLI. This is used to spawn subagents that test the behavior of newly created or modified skills in an isolated environment.
- [EXTERNAL_DOWNLOADS]: The `scripts/improve_description.py` and `scripts/run_loop.py` scripts use the official `anthropic` Python SDK to communicate with Anthropic's API, automatically optimizing skill descriptions based on evaluation results.
- [DYNAMIC_EXECUTION]: The skill dynamically executes prompts and logic defined in user-provided `evals.json` files. This is a primary feature of the tool, allowing developers to verify skill performance across various scenarios.
- [INDIRECT_PROMPT_INJECTION]: The skill's architecture is designed to ingest and process untrusted evaluation data, which is then executed by an agent.
  - Ingestion points: Evaluation prompts are read from `evals/evals.json` and passed to subagents.
  - Boundary markers: The skill instructions emphasize the use of isolated workspace directories and subagents to keep the testing harness separate from the skill being tested.
  - Capability inventory: The skill can run shell commands (the `claude` CLI), perform network requests (Anthropic API), and manage files and directories within the project workspace.
  - Sanitization: The skill does not sanitize the content of evaluation prompts, since the intended use case is executing those prompts for testing.
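
The subprocess-based subagent spawning flagged above can be sketched as follows. This is a minimal illustration, not the actual contents of `scripts/run_eval.py`: the function names, CLI flags, and timeout value are assumptions.

```python
import subprocess

def build_eval_command(prompt: str) -> list[str]:
    # Hypothetical invocation; the real flags used by scripts/run_eval.py
    # are not shown in this audit. "-p" runs the claude CLI in
    # non-interactive (print) mode with the given prompt.
    return ["claude", "-p", prompt]

def run_subagent_eval(prompt: str, workspace: str) -> str:
    # Spawn a subagent inside an isolated workspace directory and capture
    # its transcript, mirroring the isolation described in the findings.
    result = subprocess.run(
        build_eval_command(prompt),
        cwd=workspace,          # the eval workspace, not the harness directory
        capture_output=True,
        text=True,
        timeout=600,            # assumed guard against hung subagents
    )
    return result.stdout
```

Running the subagent with `cwd` set to a dedicated workspace is one way to realize the boundary between the testing harness and the skill under test.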
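
The EXTERNAL_DOWNLOADS finding (API calls via the `anthropic` SDK) might look roughly like this sketch. The prompt wording, model name, and function names are assumptions; the audit does not include the source of `scripts/improve_description.py`.

```python
def build_improvement_prompt(description: str, failures: list[str]) -> str:
    # Illustrative prompt construction; the actual wording used by the
    # script is not shown in this audit.
    failure_list = "\n".join(f"- {f}" for f in failures)
    return (
        "This skill description produced the following evaluation failures:\n"
        f"{failure_list}\n\n"
        f"Current description:\n{description}\n\n"
        "Rewrite the description so that an agent reliably selects the skill."
    )

def improve_description(description: str, failures: list[str]) -> str:
    # Requires the official SDK (pip install anthropic) and an
    # ANTHROPIC_API_KEY in the environment; the model name is an assumption.
    import anthropic
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[
            {"role": "user",
             "content": build_improvement_prompt(description, failures)},
        ],
    )
    return message.content[0].text
```

This is the only network egress the audit identifies: evaluation results go out to Anthropic's API and a rewritten description comes back.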
- [DATA_EXPOSURE]: The `eval-viewer/generate_review.py` script starts a local development server on `127.0.0.1:3117` to host a review dashboard. Access is restricted to the local machine, allowing the user to review qualitative outputs and quantitative benchmarks.
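
The loopback-only server behind the DATA_EXPOSURE finding can be approximated with the standard library. The function name and directory handling are assumptions, since the audit does not include the source of `eval-viewer/generate_review.py`.

```python
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

def serve_review_dashboard(directory: str, port: int = 3117) -> HTTPServer:
    # Binding to 127.0.0.1 (rather than 0.0.0.0) is what restricts the
    # dashboard to the local machine, as the finding notes.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return HTTPServer(("127.0.0.1", port), handler)

# Call .serve_forever() on the returned server to start handling requests.
```

Because the socket is bound to the loopback interface, the dashboard is unreachable from other hosts even if the port is not firewalled.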
Audit Metadata