skill-creator

Pass

Audited by Gen Agent Trust Hub on Mar 16, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill implements a robust 'skill-creator' workflow that includes drafting, testing with subagents, human feedback loops, and automated optimization of skill descriptions.
  • [SAFE]: Dependencies are handled by scripts and utilities shipped within the skill's own package (e.g., scripts/aggregate_benchmark.py, scripts/run_loop.py, eval-viewer/generate_review.py).
  • [SAFE]: The subagent spawning mechanism is used legitimately for running parallel evaluations (with-skill vs. baseline) to measure performance improvements accurately.
  • [SAFE]: Data handling is restricted to the local workspace for storing test outputs, grading results, and benchmark statistics. No sensitive file access or non-whitelisted network operations were found.
  • [SAFE]: The description optimization loop uses legitimate calls to the Anthropic API (via standard clients) to propose and test improved skill descriptions based on evaluation failures.
  • [SAFE]: Security instructions within the skill body (Principle of Lack of Surprise) explicitly prohibit the creation of malicious skills, exploit code, or tools for unauthorized access.
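
The with-skill vs. baseline comparison described in the findings above can be illustrated with a minimal sketch. This is not the code in scripts/aggregate_benchmark.py, which this report does not reproduce. The function names (`pass_rate`, `compare_runs`) and the pass/fail lists are hypothetical and serve only to show what measuring an improvement over a baseline might look like.

```python
from statistics import mean

# Hypothetical helper: not taken from the audited skill's scripts.
def pass_rate(results):
    """Return the fraction of passing runs in a list of booleans."""
    if not results:
        return 0.0
    return mean(1.0 if passed else 0.0 for passed in results)

# Hypothetical helper: compares two sets of graded evaluation runs.
def compare_runs(with_skill, baseline):
    """Summarize pass rates for both configurations and their difference."""
    with_rate = pass_rate(with_skill)
    base_rate = pass_rate(baseline)
    return {
        "with_skill": with_rate,
        "baseline": base_rate,
        "delta": with_rate - base_rate,
    }

# Illustrative data only: made-up grading outcomes, not audit results.
summary = compare_runs(
    with_skill=[True, True, False, True],
    baseline=[True, False, False, True],
)
print(summary)
```

A positive `delta` would suggest the skill helps on the sampled tasks; with this few runs the figure is only indicative.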
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 16, 2026, 07:57 PM