benchmark-runner

Warn

Audited by Gen Agent Trust Hub on Mar 7, 2026

Risk Level: MEDIUM (COMMAND_EXECUTION, DATA_EXFILTRATION)
Full Analysis
  • [COMMAND_EXECUTION]: The skill provides various shell scripts in references/environment-capture.md and references/test-case-design.md to collect system information and configure hardware. Evidence includes commands like lscpu, sysctl, nproc, free, nvidia-smi, lsblk, sw_vers, and uname. It also suggests using sudo cpupower frequency-set --governor performance to modify system-wide power management settings for benchmark stability.
  • [DATA_EXFILTRATION]: The skill instructs the agent to capture the local software environment state, which can reveal configuration details of the host system. Evidence includes commands like pip freeze and npm list to export comprehensive lists of installed packages and versions.
  • [REMOTE_CODE_EXECUTION]: The skill is designed to generate shell and Python scripts for benchmark reproduction. The Output Format in SKILL.md includes a Reproduction section containing generated shell commands intended to be executed on the host system.
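To make the first finding concrete, here is a minimal sketch of what read-only environment capture could look like. It is not the skill's own script: the `capture_environment` function and the `READ_ONLY_COMMANDS` table are hypothetical names chosen for illustration, and the command arguments (`uname -a`, `free -h`) are assumptions. Only commands named in the audit are used, and tools missing on the current platform are skipped.

```python
import shutil
import subprocess

# Hypothetical table of read-only commands named in the audit.
# None of them needs elevated privileges or changes system state.
READ_ONLY_COMMANDS = {
    "uname": ["uname", "-a"],
    "nproc": ["nproc"],
    "lscpu": ["lscpu"],
    "free": ["free", "-h"],
    "lsblk": ["lsblk"],
    "sw_vers": ["sw_vers"],
}


def capture_environment(commands=READ_ONLY_COMMANDS, timeout=10):
    """Run each available command; return {name: stdout, or None if unavailable or failed}."""
    results = {}
    for name, argv in commands.items():
        # Skip tools that are not installed (e.g. sw_vers on Linux, lscpu on macOS).
        if shutil.which(argv[0]) is None:
            results[name] = None
            continue
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout, check=False
            )
        except (OSError, subprocess.TimeoutExpired):
            results[name] = None
            continue
        results[name] = proc.stdout.strip() if proc.returncode == 0 else None
    return results


if __name__ == "__main__":
    for name, output in capture_environment().items():
        print(f"== {name} ==")
        print(output if output is not None else "(not available)")
```

The sketch deliberately leaves out sudo cpupower frequency-set --governor performance, since that command changes system-wide state rather than reading it. The captured output still describes the host, so it is worth reviewing before it is shared.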
Audit Metadata
Risk Level: MEDIUM
Analyzed: Mar 7, 2026, 09:55 PM