benchmark-runner
Warn
Audited by Gen Agent Trust Hub on Mar 7, 2026
Risk Level: MEDIUM (COMMAND_EXECUTION, DATA_EXFILTRATION)
Full Analysis
- [COMMAND_EXECUTION]: The skill provides various shell scripts in `references/environment-capture.md` and `references/test-case-design.md` to collect system information and configure hardware. Evidence includes commands like `lscpu`, `sysctl`, `nproc`, `free`, `nvidia-smi`, `lsblk`, `sw_vers`, and `uname`. It also suggests using `sudo cpupower frequency-set --governor performance` to modify system-wide power management settings for benchmark stability.
- [DATA_EXFILTRATION]: The skill instructs the agent to capture the local software environment state, which can reveal configuration details of the host system. Evidence includes commands like `pip freeze` and `npm list` to export comprehensive lists of installed packages and versions.
- [REMOTE_CODE_EXECUTION]: The skill is designed to generate shell and Python scripts for benchmark reproduction. The `Output Format` in `SKILL.md` includes a `Reproduction` section containing generated shell commands intended to be executed on the host system.
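For context, the kind of environment capture flagged above can be sketched as follows. This is an illustrative sketch only, not the skill's actual script: the function name `capture_environment` and the `READ_ONLY_COMMANDS` list are hypothetical. It assumes that only the read-only commands named in the findings are run, without arguments unless the audit names them. The privileged `sudo cpupower frequency-set` call is deliberately left out. Results stay in a local dictionary and are not sent anywhere.

```python
import shutil
import subprocess

# Hypothetical allowlist: read-only commands named in the audit findings.
# The privileged `sudo cpupower ...` call is intentionally excluded.
READ_ONLY_COMMANDS = [
    ["uname"],
    ["nproc"],
    ["lscpu"],
    ["free"],
    ["lsblk"],
    ["sw_vers"],
    ["nvidia-smi"],
    ["pip", "freeze"],
]


def capture_environment(commands=READ_ONLY_COMMANDS, timeout=10):
    """Run each command whose executable exists.

    Returns a dict mapping the command string to its stdout, or to None
    when the tool is missing, fails, or times out.
    """
    results = {}
    for argv in commands:
        key = " ".join(argv)
        if shutil.which(argv[0]) is None:
            # Tool not available on this platform (e.g. sw_vers on Linux).
            results[key] = None
            continue
        try:
            proc = subprocess.run(
                argv,
                capture_output=True,
                text=True,
                timeout=timeout,
                check=False,
            )
        except (OSError, subprocess.TimeoutExpired):
            results[key] = None
            continue
        results[key] = proc.stdout.strip() if proc.returncode == 0 else None
    return results
```

Calling `capture_environment()` with no arguments returns one entry per command in the list. Reviewing that dictionary before sharing it is one way to limit the exposure described in the DATA_EXFILTRATION finding, since `pip freeze` output lists every installed package and version.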
Audit Metadata