skill-creator

Pass

Audited by Gen Agent Trust Hub on Mar 26, 2026

Risk Level: SAFE
Findings: COMMAND_EXECUTION, EXTERNAL_DOWNLOADS, PROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The script scripts/run_eval.py executes the claude command-line interface via subprocess.Popen to test skill triggering behavior.
  • [COMMAND_EXECUTION]: The script eval-viewer/generate_review.py executes lsof and kill via subprocess.run to manage the local network port used for the evaluation viewer.
  • [EXTERNAL_DOWNLOADS]: The skill uses the official anthropic Python package to connect to the Anthropic API and optimize skill descriptions with large language models.
  • [EXTERNAL_DOWNLOADS]: The evaluation viewer template (eval-viewer/viewer.html) loads the SheetJS library from a well-known public CDN (cdn.sheetjs.com) to enable the rendering of spreadsheet files in the browser.
  • [PROMPT_INJECTION]: The skill processes user-defined test cases from evals.json and eval_set.json. These prompts are executed through the agent context, which creates a surface for indirect prompt injection if the test data is untrusted. This behavior is inherent to the skill's primary purpose as a development and testing tool.
      • Ingestion points: evals/evals.json and eval_set.json files processed in scripts/run_eval.py.
      • Boundary markers: Test queries are passed as arguments to the claude CLI.
      • Capability inventory: The skill can execute shell commands and modify files to facilitate the benchmarking process.
      • Sanitization: The scripts perform only structural parsing and formatting; the content of the test prompts is passed directly to the model for execution.
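The ingestion and sanitization notes above can be illustrated with a minimal sketch. It is not the code in scripts/run_eval.py: the "query" field name, the helper names, and the stand-in echo command are assumptions made for illustration, and the real claude CLI flags are deliberately not reproduced here. The sketch shows structure-only validation of test cases, followed by passing each prompt as a single argv element without a shell, so shell metacharacters in untrusted test data are not interpreted. It does not address prompt injection itself, since the prompt text still reaches the model unchanged.

```python
import json
import subprocess
import sys


def load_queries(raw_json: str) -> list[str]:
    """Validate structure only; prompt content is NOT sanitized.

    Assumes (hypothetically) a JSON list of objects with a string "query" field.
    """
    data = json.loads(raw_json)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of test cases")
    queries = []
    for item in data:
        if not isinstance(item, dict) or not isinstance(item.get("query"), str):
            raise ValueError("each test case needs a string 'query' field")
        queries.append(item["query"])
    return queries


def run_query(query: str, cli_prefix: list[str]) -> str:
    """Run a CLI with the query as one argument, without invoking a shell."""
    proc = subprocess.run(
        [*cli_prefix, query],  # list form: no shell parsing of the query
        capture_output=True,
        text=True,
        check=True,
        timeout=30,
    )
    return proc.stdout


# Hypothetical stand-in for the real CLI: a Python one-liner that echoes
# its first argument, so the sketch runs without any external tool.
ECHO_CLI = [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])"]

if __name__ == "__main__":
    for q in load_queries('[{"query": "hello; echo pwned $(id)"}]'):
        print(run_query(q, ECHO_CLI))
```

Because the argument list bypasses the shell, the string arrives at the child process byte for byte; any remaining risk lies in how the model acts on the prompt, which is the surface the finding describes.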
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 26, 2026, 05:37 PM