benchmark-testing

Pass

Audited by Gen Agent Trust Hub on Mar 7, 2026

Risk Level: SAFECOMMAND_EXECUTIONEXTERNAL_DOWNLOADSREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [External Plugin Integration]: The skill installs a plugin from a trusted organization's repository using npx. This allows the benchmarking environment to stay current with development versions of the software.
  • [System Command Orchestration]: The workflow utilizes shell commands and terminal CLI tools to automate the creation of test directories and manage agent sessions. These operations are conducted within a user-defined testing base directory.
  • [Prompt Logic Guidance]: The skill provides prompt templates that include specific instructions to link projects to a designated team. These directives are intended to exercise the agent's integration and deployment capabilities during the benchmarking process.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 7, 2026, 08:48 AM