agentbench
Pass
Audited by Gen Agent Trust Hub on Mar 15, 2026
Risk Level: SAFE, COMMAND_EXECUTION, PROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The skill performs the following command executions:
  - Executes internal setup scripts using `bash` (e.g., `tasks/{suite}/{task}/setup.sh`) to prepare test workspaces. These scripts are part of the skill package and generate mock data or initialize local git repositories.
  - Uses `openssl` to generate SHA-256 HMAC signatures for benchmark results to ensure integrity for leaderboard submissions.
  - Instructs the agent to use standard tools like `git`, `python3`, `jq`, and `bash` to complete benchmarking tasks within a controlled temporary directory.
- [PROMPT_INJECTION]: The skill implements a task-based evaluation framework:
  - The agent is prompted to read and execute the `user_message` from task definition files (`task.yaml`) distributed within the skill package.
  - This mechanism is the core functionality of the benchmarking suite and is accompanied by explicit boundary instructions directing the agent to operate only within designated temporary workspace directories (`/tmp/agentbench-task-*`).
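Two of the mechanisms noted in this analysis, HMAC signing of results and the workspace boundary, can be illustrated with a minimal sketch. This is not the skill's own code: the audit says the skill signs with `openssl`, whereas the sketch below swaps in Python's standard `hmac` module. The function names, the JSON canonicalization, and the purely lexical path check are all assumptions made for illustration; only the `/tmp/agentbench-task-*` pattern comes from the audit text.

```python
import hashlib
import hmac
import json
import posixpath
from pathlib import PurePosixPath

# Prefix taken from the audit text (/tmp/agentbench-task-*).
WORKSPACE_PREFIX = "agentbench-task-"


def sign_results(results: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over a canonical JSON encoding.

    Hypothetical helper: the real skill's payload format is not described
    in the audit.
    """
    # Sorted keys and fixed separators make the signed bytes deterministic.
    payload = json.dumps(results, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_results(results: dict, key: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign_results(results, key), signature)


def is_inside_workspace(path: str) -> bool:
    """Check that an absolute path lies under /tmp/agentbench-task-*.

    Assumption: this is a lexical check only. It collapses ".." segments
    but does not follow symlinks, so it is not a full sandbox.
    """
    parts = PurePosixPath(posixpath.normpath(path)).parts
    return (
        len(parts) >= 3
        and parts[0] == "/"
        and parts[1] == "tmp"
        and parts[2].startswith(WORKSPACE_PREFIX)
    )


if __name__ == "__main__":
    key = b"example-key"  # placeholder; a real key would not be hard-coded
    results = {"suite": "demo", "score": 0.9}
    signature = sign_results(results, key)
    print(verify_results(results, key, signature))
    print(is_inside_workspace("/tmp/agentbench-task-abc/repo/file.txt"))
    print(is_inside_workspace("/tmp/agentbench-task-abc/../../etc/passwd"))
```

A signature of this kind only shows that the results were not altered by someone who lacks the key. The path check shows why an instruction-level boundary is usually paired with path normalization, since a path containing `..` can name a location outside the workspace while still starting with the expected prefix.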
Audit Metadata