skill-creator

Warn

Audited by Gen Agent Trust Hub on Mar 6, 2026

Risk Level: MEDIUM
Findings: COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION, EXTERNAL_DOWNLOADS
Full Analysis
  • [COMMAND_EXECUTION]: The script scripts/run_eval.py uses the Python subprocess module to execute the claude CLI for evaluation runs. In addition, eval-viewer/generate_review.py invokes the lsof system utility to identify and manage processes occupying network ports.
  • [REMOTE_CODE_EXECUTION]: The skill implements a workflow that dynamically writes temporary instruction files to the .claude/commands/ directory and then triggers their execution via the claude CLI. This allows agent behaviors to be generated and applied at runtime from evaluation datasets.
  • [EXTERNAL_DOWNLOADS]: The skill communicates with Anthropic's official API via the anthropic Python client library in scripts/improve_description.py and scripts/run_loop.py to perform automated description optimization. Additionally, the visualization component in eval-viewer/viewer.html loads the SheetJS library from a public CDN (cdn.sheetjs.com) for spreadsheet processing.
  • [PROMPT_INJECTION]: The skill exposes an indirect prompt-injection surface: it ingests untrusted data from evaluation sets and user feedback files, and this data is interpolated into the description-optimization prompts in scripts/improve_description.py without explicit sanitization. XML-style boundary markers such as <current_description> are used to mitigate accidental misinterpretation, but they do not filter the content. Evidence chain: 1) ingestion points: eval_set.json and feedback.json; 2) boundary markers: XML tags in the optimizer prompt; 3) capability inventory: subprocess command execution, file-system modification, and network operations; 4) sanitization: no explicit validation or filtering observed for ingested content.
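The COMMAND_EXECUTION finding describes the common subprocess pattern: shell out to an external binary and capture its output. A minimal sketch of that pattern, assuming the audited scripts use a `subprocess.run`-style call (the helper name and exact claude flags below are illustrative, not taken from the audited code):

```python
import subprocess

def run_cli(cmd: list[str], timeout: int = 300) -> str:
    # Run an external command, capture stdout as text, and raise
    # CalledProcessError on a non-zero exit code.
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=True
    )
    return result.stdout
```

The audited scripts apply this pattern to the claude CLI, e.g. something like `run_cli(["claude", "-p", prompt])`; the exact invocation is not reproduced here. From a risk standpoint, passing a list (rather than `shell=True` with a string) avoids shell interpolation of untrusted arguments.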
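The REMOTE_CODE_EXECUTION finding hinges on writing instruction files that the claude CLI later executes as commands. A hedged sketch of that write step, with a hypothetical helper name (the actual skill's file layout and naming are not reproduced here):

```python
from pathlib import Path

def install_temp_command(commands_dir: Path, name: str, instructions: str) -> Path:
    # Write a temporary instruction file into the commands directory;
    # the claude CLI later picks it up and executes it as a command.
    # Whatever text lands in `instructions` becomes agent behavior,
    # which is why the audit flags this as a code-execution surface.
    commands_dir.mkdir(parents=True, exist_ok=True)
    cmd_file = commands_dir / f"{name}.md"
    cmd_file.write_text(instructions)
    return cmd_file
```

Because the instruction content can be derived from evaluation datasets, this path turns untrusted data into executable agent behavior unless the content is controlled.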
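The boundary-marker mitigation noted in the PROMPT_INJECTION finding can be sketched as follows. The function name and prompt wording are illustrative, not the audited code; only the `<current_description>` tag pattern comes from the report:

```python
def build_optimizer_prompt(current_description: str, feedback: str) -> str:
    # Wrap untrusted inputs in XML-style boundary tags so the model can
    # distinguish data from instructions. This reduces accidental
    # misinterpretation but does not sanitize the content itself:
    # injected instructions inside the tags still reach the model.
    return (
        "Improve the skill description below using the feedback.\n"
        "<current_description>\n"
        f"{current_description}\n"
        "</current_description>\n"
        "<feedback>\n"
        f"{feedback}\n"
        "</feedback>"
    )
```

This illustrates why the audit rates the surface MEDIUM rather than clean: the markers are a structural hint, not validation, and no filtering of the interpolated content was observed.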
Audit Metadata
  • Risk Level: MEDIUM
  • Analyzed: Mar 6, 2026, 11:01 AM