auto-arena
Pass
Audited by Gen Agent Trust Hub on Mar 7, 2026
Risk Level: SAFE (EXTERNAL_DOWNLOADS, COMMAND_EXECUTION)
Full Analysis
- [EXTERNAL_DOWNLOADS]: The skill requires the installation of `py-openjudge`, which is the official benchmarking library from the skill author agentscope-ai, and `matplotlib`, a well-known plotting library.
- [COMMAND_EXECUTION]: The skill operates via a CLI (e.g., `python -m cookbooks.auto_arena`) to run evaluation pipelines, handle checkpoints, and generate reports.
- [CREDENTIALS_UNSAFE]: While the skill requires LLM API keys, the provided documentation and configuration templates correctly promote the use of environment variable interpolation to avoid hardcoding credentials.
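As an illustration of the environment variable interpolation mentioned in the last finding, here is a minimal sketch of how `${VAR}` placeholders in a configuration could be resolved at load time. It is not taken from the skill or from `py-openjudge`; the function `interpolate_env`, the config keys, and the variable name `EXAMPLE_API_KEY` are all hypothetical and chosen only for this example.

```python
import os
import re

# Matches placeholders of the form ${NAME}. This syntax is an assumption;
# the skill's actual template format may differ.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate_env(value, env=None):
    """Return a copy of `value` with ${VAR} placeholders replaced from `env`.

    Walks nested dicts and lists. Raises KeyError when a referenced
    variable is missing, so an unset key fails loudly instead of being
    passed along as a literal placeholder.
    """
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {k: interpolate_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate_env(v, env) for v in value]
    if isinstance(value, str):
        def _substitute(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _PLACEHOLDER.sub(_substitute, value)
    return value


# Hypothetical config: the file on disk holds only the placeholder,
# never the secret itself.
config = {
    "endpoint": {
        "model": "example-model",
        "api_key": "${EXAMPLE_API_KEY}",
    }
}
```

Calling `interpolate_env(config)` after exporting `EXAMPLE_API_KEY` in the shell would yield a resolved copy while leaving the original template untouched, which keeps the secret out of version control.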
Audit Metadata