benchmark-agents
Pass
Audited by Gen Agent Trust Hub on Mar 15, 2026
Risk Level: SAFE | EXTERNAL_DOWNLOADS | REMOTE_CODE_EXECUTION | COMMAND_EXECUTION
Full Analysis
- Trusted Plugin Installation: The skill uses `npx` to download and install a plugin directly from the official Vercel Labs GitHub repository. This is a standard procedure for extending the functionality of the benchmarking environment with verified vendor tools.
- Terminal Session Automation: It employs `wezterm` to spawn interactive terminal panes. This approach is used to ensure that the AI agent can be tested in an interactive shell environment, which is necessary for certain plugin hooks to execute correctly during the evaluation process.
- File System and Directory Management: The skill manages project structures within a specific testing path (`~/dev/vercel-plugin-testing/`). It includes commands for creating, navigating, and cleaning up these directories using standard shell utilities like `mkdir` and `rm`.
- Automated Log Inspection: The instructions guide the agent to use tools like `find` and `grep` to monitor debug logs. This allows for automated verification of the agent's performance and the successful execution of system hooks without manual intervention.
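The directory handling and log inspection described above can be illustrated with a minimal Python sketch. It is not the skill's own code: the skill reportedly relies on `mkdir`, `rm`, `find`, and `grep` directly. The function names (`find_log_matches`, `demo`), the `*.log` glob, the project name, the `hook:` search pattern, and the log contents are all hypothetical and chosen only for illustration. The sketch works in a throwaway temporary directory rather than the real testing path.

```python
import re
import tempfile
from pathlib import Path


def find_log_matches(root: Path, pattern: str, glob: str = "*.log") -> list[tuple[Path, int, str]]:
    """Return (file, line number, line) for every matching line under root."""
    regex = re.compile(pattern)
    matches = []
    # Recursive glob: a rough stand-in for `find <root> -name '*.log'`.
    for log_file in sorted(root.rglob(glob)):
        if not log_file.is_file():
            continue
        with log_file.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                # Regex search per line: a rough stand-in for `grep <pattern>`.
                if regex.search(line):
                    matches.append((log_file, number, line.rstrip("\n")))
    return matches


def demo() -> list[tuple[str, int, str]]:
    """Create a scratch project, write a made-up debug log, and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "example-project"  # hypothetical project name
        project.mkdir(parents=True)              # comparable to `mkdir -p`
        (project / "debug.log").write_text(
            "session started\nhook:example fired\nsession ended\n",  # made-up log lines
            encoding="utf-8",
        )
        found = find_log_matches(project, r"hook:")
        # Leaving the `with` block deletes the scratch tree, comparable to `rm -rf`.
        return [(path.name, number, text) for path, number, text in found]


if __name__ == "__main__":
    for name, number, text in demo():
        print(f"{name}:{number}: {text}")
```

Returning structured tuples instead of printed text makes it straightforward to assert that an expected hook line appeared, which is the kind of automated verification the audit describes.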
Audit Metadata