The Agent Skills Directory

[COMMAND_EXECUTION]: The skill executes a complex bash orchestration script (run-benchmark.sh) and several Node.js scripts to manage isolated benchmarking environments, including filesystem operations and git worktree management.
[COMMAND_EXECUTION]: The benchmark runner invokes the claude CLI with the --dangerously-skip-permissions flag. This configuration bypasses the platform's security controls that normally require human authorization for sensitive operations like file writes or command execution, enabling the benchmarked agent to act autonomously.
[PROMPT_INJECTION]: The skill presents an indirect prompt injection surface by reading scenario data from references/goals.md and interpolating it into automated agent prompts without sanitization. Evidence: (1) Ingestion point: references/goals.md (2) Boundary markers: Absent (3) Capability inventory: Full shell and filesystem access via the claude CLI (4) Sanitization: Absent.

ln-840-benchmark-compare