The Agent Skills Directory

[COMMAND_EXECUTION]: The skill performs the following command executions:
Executes internal setup scripts using bash (e.g., tasks/{suite}/{task}/setup.sh) to prepare test workspaces. These scripts are part of the skill package and generate mock data or initialize local git repositories.
Uses openssl to generate SHA-256 HMAC signatures for benchmark results to ensure integrity for leaderboard submissions.
Instructs the agent to use standard tools like git, python3, jq, and bash to complete benchmarking tasks within a controlled temporary directory.
[PROMPT_INJECTION]: The skill implements a task-based evaluation framework:
The agent is prompted to read and execute the user_message from task definition files (task.yaml) distributed within the skill package.
This mechanism is the core functionality of the benchmarking suite and is accompanied by explicit boundary instructions directing the agent to operate only within designated temporary workspace directories (/tmp/agentbench-task-*).

agentbench