skill-eval

Pass

Audited by Gen Agent Trust Hub on Apr 9, 2026

Risk Level: SAFECOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill invokes local Python modules and system-level commands to automate skill testing and optimization. It executes scripts such as scripts.skill_eval.run_eval, run_loop.py, and quick_validate.py via the Bash tool. It also utilizes the claude -p command-line interface to perform live trigger testing and generate improvement proposals.
  • [PROMPT_INJECTION]: The skill has a significant attack surface for indirect prompt injection because it is designed to ingest and act upon the contents of other skills. Ingestion points: The skill reads SKILL.md files and associated scripts from user-provided paths, as well as test queries from evals.json files. Boundary markers: The instructions lack specific requirements for using delimiters or "ignore previous instructions" safety headers when passing external skill content to evaluation subagents. Capability inventory: The orchestrating agent has access to Bash, Write, Read, and Grep tools. Malicious instructions inside an ingested skill could potentially trick the agent into using these tools to perform unintended file modifications or command execution. Sanitization: There is no evidence of sanitization or filtering of the content extracted from target skills before it is processed by the AI for evaluation or optimization.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 9, 2026, 03:01 PM