benchmark-agents

Pass

Audited by Gen Agent Trust Hub on Mar 17, 2026

Risk Level: SAFE, EXTERNAL_DOWNLOADS, COMMAND_EXECUTION
Full Analysis
  • External Content Retrieval: The skill uses npx to download and execute a plugin directly from a Vercel-owned GitHub repository. This is a common pattern for bootstrapping development environments and, in this context, targets a verified and trusted source.
  • Local Workspace Management: Instructions include creating and managing directories within the user's home folder (~/dev/vercel-plugin-testing/) and accessing debug logs in ~/.claude/debug/. These operations are transparently defined and restricted to specific development paths for the purpose of monitoring agent performance.
  • Subprocess Orchestration: The skill uses wezterm cli spawn to launch interactive terminal sessions. Although this is command execution, its purpose is to create isolated environments for testing the agent's behavior in real-world scenarios, which is the primary function of the skill.
  • Dynamic Environment Discovery: A small Node.js snippet is used to programmatically identify the system's temporary directory. This is a standard utility pattern to ensure compatibility across different operating systems during the evaluation process.
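
The audit does not reproduce the skill's Node.js snippet, so the following is only a minimal sketch of what such environment discovery might look like. It assumes the snippet relies on the built-in `os.tmpdir()` call. The helper names `getTempDir` and `expandHome` are hypothetical, and the two home-relative paths are the ones named in the analysis above.

```javascript
// Sketch only: helper names are hypothetical, not taken from the audited skill.
const os = require("node:os");
const path = require("node:path");

// Resolve the system temporary directory in a cross-platform way.
// Assumption: the skill's snippet uses os.tmpdir() or something equivalent.
function getTempDir() {
  return os.tmpdir();
}

// Expand a leading "~/" against the current user's home directory.
// Paths that do not start with "~/" are returned unchanged.
function expandHome(p) {
  return p.startsWith("~/") ? path.join(os.homedir(), p.slice(2)) : p;
}

// The two locations the audit says the skill touches.
const workspaceDir = expandHome("~/dev/vercel-plugin-testing/");
const debugLogDir = expandHome("~/.claude/debug/");

console.log("temp dir:     ", getTempDir());
console.log("workspace dir:", workspaceDir);
console.log("debug log dir:", debugLogDir);
```

Resolving these paths at run time, rather than hard-coding them, is what keeps the evaluation portable across operating systems, which matches the rationale the audit gives.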
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 17, 2026, 09:21 AM