agentbench

Pass

Audited by Gen Agent Trust Hub on Mar 15, 2026

Risk Level: SAFE
Categories: COMMAND_EXECUTION, PROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill executes commands in the following ways:
      • Runs internal setup scripts with bash (e.g., tasks/{suite}/{task}/setup.sh) to prepare test workspaces. These scripts ship with the skill package and generate mock data or initialize local git repositories.
      • Uses openssl to generate HMAC-SHA256 signatures over benchmark results, so that leaderboard submissions can be checked for integrity.
      • Instructs the agent to use standard tools such as git, python3, jq, and bash to complete benchmarking tasks inside a controlled temporary directory.
  • [PROMPT_INJECTION]: The skill implements a task-based evaluation framework:
      • The agent is prompted to read and act on the user_message field of task definition files (task.yaml) distributed within the skill package.
      • This mechanism is the core functionality of the benchmarking suite. It is accompanied by explicit boundary instructions directing the agent to operate only within designated temporary workspace directories (/tmp/agentbench-task-*).
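Two of the mechanisms described above can be illustrated with a minimal sketch: signing results with HMAC-SHA256 and checking that a path stays inside a task workspace. This is not the skill's actual code. The skill itself is reported to use the openssl command line; Python's standard hmac module is swapped in here purely for illustration. The function names, the key, the result fields, and the canonical JSON encoding are all assumptions. The path check is purely lexical and does not resolve symlinks.

```python
import hashlib
import hmac
import json
from pathlib import PurePosixPath


def sign_results(results: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over a canonical JSON encoding.

    The canonical form (sorted keys, compact separators) is an assumption;
    the skill's real payload format is not described in the audit.
    """
    payload = json.dumps(results, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_results(results: dict, key: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_results(results, key)
    return hmac.compare_digest(expected, signature)


def in_task_workspace(path: str) -> bool:
    """Lexically check that path lies under /tmp/agentbench-task-*.

    Paths containing '..' are rejected outright. Symlinks are not resolved,
    so this is a sketch of the boundary rule, not a sandbox.
    """
    parts = PurePosixPath(path).parts
    if ".." in parts:
        return False
    return (
        len(parts) >= 3
        and parts[0] == "/"
        and parts[1] == "tmp"
        and parts[2].startswith("agentbench-task-")
    )


if __name__ == "__main__":
    # Hypothetical key and result fields, for demonstration only.
    demo_key = b"demo-key"
    demo_results = {"suite": "example", "score": 0.5}
    signature = sign_results(demo_results, demo_key)
    print(verify_results(demo_results, demo_key, signature))
    print(in_task_workspace("/tmp/agentbench-task-1/repo/file.txt"))
```

Any change to the signed results, or use of a different key, causes verification to fail, which is the integrity property the audit refers to. The boundary check reflects the instruction given to the agent; the audit does not state that the skill enforces it in code.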
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 15, 2026, 05:16 AM