The Agent Skills Directory

[COMMAND_EXECUTION]: The skill utilizes the subprocess module to execute shell commands via the claude CLI in several scripts, including scripts/run_eval.py, scripts/improve_description.py, and scripts/run_loop.py. This is the primary mechanism for running test cases and description optimization.
[REMOTE_CODE_EXECUTION]: Evaluation queries provided by the user (via evals.json or eval_set.json) are passed as arguments to claude -p in scripts/run_eval.py. This presents a risk of command injection if the user-controlled input contains malicious shell characters or triggers unexpected tool behavior within the agent environment.
[DATA_EXFILTRATION]: The script eval-viewer/generate_review.py recursively reads files from the designated workspace to embed them into a standalone HTML viewer. If the tool is misconfigured to target a sensitive directory, it could expose private files, credentials, or keys by reading them and rendering them in the browser-based review interface.
[PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection as it ingests untrusted data from test cases and user feedback and incorporates them into prompts for skill optimization.
Ingestion points: Evaluation queries and user feedback are read from eval_set.json and feedback.json respectively.
Boundary markers: The skill uses XML tags (e.g., <current_description>, <skill_content>) to separate untrusted data in the prompts, which is a best practice but does not entirely eliminate the risk of sophisticated injection attacks.
Capability inventory: The skill possesses the ability to execute shell commands and modify files in the project directory.
Sanitization: No explicit sanitization or filtering of the user-provided data was observed in the scripts processing these inputs.

skill-creator