auto-arena

Pass

Audited by Gen Agent Trust Hub on Mar 7, 2026

Risk Level: SAFE · EXTERNAL_DOWNLOADS · COMMAND_EXECUTION
Full Analysis
  • [EXTERNAL_DOWNLOADS]: The skill requires installing two packages: py-openjudge, the official benchmarking library published by the skill's author, agentscope-ai; and matplotlib, a well-known plotting library.
  • [COMMAND_EXECUTION]: The skill operates via a CLI (e.g., python -m cookbooks.auto_arena) to run evaluation pipelines, handle checkpoints, and generate reports.
  • [CREDENTIALS_UNSAFE]: While the skill requires LLM API keys, the provided documentation and configuration templates correctly promote the use of environment variable interpolation to avoid hardcoding credentials.
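The environment-variable interpolation credited in the credentials finding can be illustrated with a minimal Python sketch. The ${VAR} placeholder syntax, the interpolate_env helper, and the config keys (judge_model, api_key, OPENAI_API_KEY) are hypothetical and are not taken from auto-arena or py-openjudge. The sketch only shows the general idea: the template stores a reference to a secret, and the real value is read from the environment when the config is loaded, so the key never appears in the file itself.

```python
import os
import re

# Matches ${VAR_NAME} placeholders. This syntax is an assumption for
# illustration, not necessarily what auto-arena's templates use.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate_env(value, env=None):
    """Recursively replace ${VAR} placeholders with environment values.

    Strings are substituted, dicts and lists are walked recursively,
    and any other value is returned unchanged.
    """
    env = os.environ if env is None else env
    if isinstance(value, str):
        def lookup(match):
            name = match.group(1)
            if name not in env:
                # Fail loudly instead of silently sending an empty key.
                raise KeyError(f"environment variable {name!r} is not set")
            return env[name]
        return _PLACEHOLDER.sub(lookup, value)
    if isinstance(value, dict):
        return {k: interpolate_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate_env(v, env) for v in value]
    return value


# Hypothetical config shape: the key names are illustrative only.
config_template = {
    "judge_model": {"name": "example-model", "api_key": "${OPENAI_API_KEY}"},
}

if __name__ == "__main__":
    # Pass an explicit env mapping so the demo does not need a real key.
    resolved = interpolate_env(config_template, env={"OPENAI_API_KEY": "sk-test"})
    print(resolved["judge_model"]["api_key"])  # prints "sk-test"
```

Raising on a missing variable, rather than substituting an empty string, is a design choice in this sketch: a misconfigured key then surfaces at load time instead of as an authentication error partway through an evaluation run.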
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 7, 2026, 03:41 AM