skill-creator
Pass
Audited by Gen Agent Trust Hub on Mar 19, 2026
Risk Level: SAFE | COMMAND_EXECUTION | EXTERNAL_DOWNLOADS | PROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The skill uses subprocesses to execute the `claude` CLI tool (`scripts/run_eval.py`) for testing skill triggering, and manages local server processes using `lsof` and `kill` (`eval-viewer/generate_review.py`). These operations are necessary for the tool's core functionality of benchmarking and serving the evaluation viewer.
- [EXTERNAL_DOWNLOADS]: The skill communicates with the Anthropic API to perform analysis and improve skill descriptions (`scripts/improve_description.py`). Additionally, the evaluation viewer template (`eval-viewer/viewer.html`) loads the SheetJS library from a well-known CDN for rendering spreadsheet data. All external references are to trusted organizations or established technology service providers.
- [PROMPT_INJECTION]: The skill exhibits an indirect prompt injection surface, as it ingests untrusted data from evaluation sets and user feedback, which are then processed by the agent.
  - Ingestion points: Evaluation queries are read from `eval_set.json` in `scripts/run_eval.py`, and user reviews are read from `feedback.json` in `eval-viewer/generate_review.py`.
  - Boundary markers: The skill uses XML-style tags such as `<skill_content>` and `<attempt>` in `scripts/improve_description.py` to delimit untrusted data within LLM prompts.
  - Capability inventory: Across its bundled scripts, the skill can execute the `claude` CLI, start a local HTTP server, and write files to the local workspace.
  - Sanitization: The skill employs standard JSON parsing and YAML block scalars to prevent structural breakages, relying on model reasoning for data interpretation.
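The boundary-marker and sanitization points above can be illustrated with a minimal sketch. It is not taken from the skill's scripts: the helper names `wrap_untrusted` and `load_queries`, the escaping strategy, and the assumed shape of the evaluation set (a JSON list of objects with a `query` string) are all hypothetical and chosen only for illustration.

```python
import json


def wrap_untrusted(tag: str, text: str) -> str:
    """Wrap untrusted text in an XML-style tag (hypothetical helper).

    Embedded copies of the opening or closing tag are escaped so the
    payload cannot close the delimited block early.
    """
    for marker in (f"<{tag}>", f"</{tag}>"):
        text = text.replace(marker, marker.replace("<", "&lt;"))
    return f"<{tag}>\n{text}\n</{tag}>"


def load_queries(raw: str) -> list[str]:
    """Parse an evaluation set with the standard JSON parser.

    Assumes (for illustration only) a list of objects with a string
    "query" field; anything else is dropped rather than passed on.
    """
    data = json.loads(raw)
    return [
        item["query"]
        for item in data
        if isinstance(item, dict) and isinstance(item.get("query"), str)
    ]


if __name__ == "__main__":
    queries = load_queries('[{"query": "make a skill"}, {"query": 3}]')
    print(wrap_untrusted("attempt", queries[0]))
```

Delimiting and structural parsing of this kind only keep the prompt well-formed. As the audit notes, interpreting the delimited content still depends on model reasoning.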
Audit Metadata