benchmark-testing
Pass
Audited by Gen Agent Trust Hub on Mar 7, 2026
Risk Level: SAFECOMMAND_EXECUTIONEXTERNAL_DOWNLOADSREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
- [External Plugin Integration]: The skill installs a plugin from a trusted organization's repository using npx. This allows the benchmarking environment to stay current with development versions of the software.
- [System Command Orchestration]: The workflow utilizes shell commands and terminal CLI tools to automate the creation of test directories and manage agent sessions. These operations are conducted within a user-defined testing base directory.
- [Prompt Logic Guidance]: The skill provides prompt templates that include specific instructions to link projects to a designated team. These directives are intended to exercise the agent's integration and deployment capabilities during the benchmarking process.
Audit Metadata