skills/workersio/spec/skill-benchmark/Gen Agent Trust Hub

skill-benchmark

Fail

Audited by Gen Agent Trust Hub on Mar 11, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill uses a verification script (run_checks.py) that executes shell commands defined in task files using subprocess.run with shell=True, which can be exploited if task definitions are malicious.
  • [COMMAND_EXECUTION]: The skill mandates the use of the --dangerously-skip-permissions flag when invoking the claude CLI, which bypasses the standard requirement for human approval of agent tool calls.
  • [COMMAND_EXECUTION]: The execution logic unsets environment variables CLAUDECODE and CLAUDE_CODE_ENTRYPOINT to bypass safety-critical recursion and nesting limits for agent sessions.
  • [REMOTE_CODE_EXECUTION]: The benchmark runner executes prompts and verification steps that can contain arbitrary code; if the tasks or the skills being evaluated are untrusted, this creates a vector for malicious code execution on the host system.
  • [PROMPT_INJECTION]: The skill utilizes the --append-system-prompt feature to force specific instructions on nested agent sessions, overriding their default behavior and safety boundaries.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Mar 11, 2026, 11:26 PM