skill-creator

Pass

Audited by Gen Agent Trust Hub on Mar 28, 2026

Risk Level: SAFE
Risk categories analyzed: COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, EXTERNAL_DOWNLOADS, DATA_EXFILTRATION, PROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill executes several local Python scripts (e.g., run_eval.py, run_loop.py, package_skill.py) to manage benchmark data and package files. It also invokes the claude CLI via subprocess calls to run triggering tests and generate improved skill descriptions (a subprocess sketch follows this list).
  • [REMOTE_CODE_EXECUTION]: To verify the behavior of newly created or modified skills, the skill spawns parallel subagents via the TaskCreate tool to execute test prompts in an isolated environment.
  • [EXTERNAL_DOWNLOADS]: The evaluation viewer loads the SheetJS library from a public CDN (cdn.sheetjs.com) so users can inspect spreadsheet outputs directly within the generated HTML report (see the CDN sketch below).
  • [DATA_EXFILTRATION]: The skill requires PII (full name and email address) from the user to populate the created-by metadata field in the generated skill's YAML frontmatter, which is then stored on the local file system (see the frontmatter sketch below).
  • [PROMPT_INJECTION]: The skill has an indirect prompt injection surface: it processes user-provided test prompts and feedback, which are then interpolated into prompts for subagents and LLM-based optimization cycles (see the interpolation sketch below).
  • Ingestion points: Reads evaluation prompts from evals.json and qualitative feedback from feedback.json (via a local web server).
  • Boundary markers: Uses XML-style tags (e.g., <skill_content>, <scores_summary>) to structure data provided to the model.
  • Capability inventory: Includes the ability to execute shell commands, write to the file system, and spawn subagent tasks.
  • Sanitization: Implements standard HTML escaping in the evaluation viewer UI and relies on model guardrails during the skill generation process (see the escaping sketch below).
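The COMMAND_EXECUTION finding centers on child-process calls to the claude CLI. A minimal sketch of what a triggering test might look like, assuming a one-shot -p prompt flag and a hypothetical test prompt; only the CLI name and the use of subprocess come from the findings:

```python
import subprocess

def run_triggering_test(test_prompt: str, timeout: int = 120) -> str:
    """Invoke the claude CLI non-interactively and return its stdout.

    The -p (one-shot prompt) flag is an assumption for illustration; only the
    claude CLI and subprocess usage are taken from the audit findings.
    """
    result = subprocess.run(
        ["claude", "-p", test_prompt],  # spawns a child process: the COMMAND_EXECUTION surface
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    result.check_returncode()  # fail loudly on non-zero exit codes
    return result.stdout

if __name__ == "__main__":
    # Hypothetical triggering test: does the skill activate for this request?
    print(run_triggering_test("Create a new skill that lints YAML frontmatter."))
```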
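For the EXTERNAL_DOWNLOADS finding, the relevant mechanism is a script tag in the generated report that points at cdn.sheetjs.com. A sketch of how the viewer's HTML shell might embed it; only the host is confirmed by the audit, and the file path is an assumption based on SheetJS's usual distribution layout:

```python
# Only the cdn.sheetjs.com host appears in the findings; the exact path below
# is an assumption for illustration.
SHEETJS_CDN_URL = "https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js"

VIEWER_SHELL = """<!doctype html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Evaluation Viewer</title>
    <!-- EXTERNAL_DOWNLOADS: fetched from a public CDN when the report is opened -->
    <script src="{cdn_url}"></script>
  </head>
  <body>
    <div id="sheet-preview"></div>
  </body>
</html>
"""

def render_viewer_shell() -> str:
    """Return the HTML shell for the evaluation viewer (spreadsheet rendering omitted)."""
    return VIEWER_SHELL.format(cdn_url=SHEETJS_CDN_URL)
```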
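The DATA_EXFILTRATION finding is narrow: the PII stays on the local file system, written into the generated skill's frontmatter. A sketch of that write, with hypothetical field names and filename; only the created-by field and the name/email inputs come from the findings:

```python
from pathlib import Path

# The created-by field and the name/email inputs are from the findings; the
# surrounding frontmatter fields and the SKILL.md filename are assumptions.
FRONTMATTER = """---
name: {skill_name}
description: {description}
created-by: {full_name} <{email}>
---
"""

def write_skill_stub(skill_dir: Path, skill_name: str, description: str,
                     full_name: str, email: str) -> Path:
    """Write a skill stub whose frontmatter records the author's PII locally."""
    skill_md = skill_dir / "SKILL.md"
    skill_md.write_text(FRONTMATTER.format(
        skill_name=skill_name,
        description=description,
        full_name=full_name,  # PII persisted to disk, not transmitted elsewhere
        email=email,
    ))
    return skill_md
```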
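The PROMPT_INJECTION finding and the boundary-marker note describe the same pipeline: untrusted text from feedback.json is wrapped in XML-style tags and handed to the model. A minimal sketch, assuming a simple feedback schema and a <user_feedback> tag; the audit itself names only <skill_content> and <scores_summary>:

```python
import json
from pathlib import Path

def build_optimization_prompt(skill_md: Path, feedback_path: Path) -> str:
    """Assemble the prompt for an LLM-based optimization pass.

    feedback.json is user-controlled, so anything inside <user_feedback> is an
    injection surface; the boundary tags only mark where untrusted text starts
    and ends, they do not neutralize instructions embedded in it.
    """
    feedback = json.loads(feedback_path.read_text())          # untrusted input
    notes = "\n".join(item["comment"] for item in feedback)   # hypothetical schema

    return (
        "Improve this skill's description using the feedback below.\n"
        f"<skill_content>\n{skill_md.read_text()}\n</skill_content>\n"
        f"<user_feedback>\n{notes}\n</user_feedback>"
    )
```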
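Finally, the sanitization credited to the viewer UI amounts to escaping untrusted strings before they are written into the report. A sketch using the standard library; the row structure is assumed:

```python
import html

def render_feedback_row(author: str, comment: str) -> str:
    """Render one feedback row for the evaluation viewer.

    html.escape converts <, >, &, and quotes so markup in user comments is
    displayed as text rather than interpreted by the browser.
    """
    return (
        "<tr>"
        f"<td>{html.escape(author)}</td>"
        f"<td>{html.escape(comment)}</td>"
        "</tr>"
    )
```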
Audit Metadata
  • Risk Level: SAFE
  • Analyzed: Mar 28, 2026, 04:44 PM