ref-hallucination-arena

Pass

Audited by Gen Agent Trust Hub on Mar 7, 2026

Risk Level: SAFE
Findings: EXTERNAL_DOWNLOADS, COMMAND_EXECUTION, PROMPT_INJECTION
Full Analysis
  • [EXTERNAL_DOWNLOADS]: The skill installs the py-openjudge and matplotlib packages from PyPI. py-openjudge is a vendor package from the skill's own author and supports the benchmarking logic.
  • [COMMAND_EXECUTION]: The skill runs its benchmarking pipeline with python -m, a common and transparent way to launch Python-based tools.
  • [PROMPT_INJECTION]: The skill lets users configure the system prompts of the models under evaluation. This capability is essential to its primary purpose: testing how different instructions affect model hallucination rates.
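For readers unfamiliar with the python -m invocation named under COMMAND_EXECUTION, here is a minimal sketch of why it counts as transparent: the full command line names the interpreter, the -m flag, and the exact module to run as __main__, so it can be logged and inspected before execution. This report does not give the skill's actual pipeline module name, so the sketch uses the standard-library module json.tool as a stand-in. The helper name run_module is hypothetical and is not part of py-openjudge.

```python
import subprocess
import sys
from typing import Optional


def run_module(module: str, *args: str, stdin: Optional[str] = None) -> subprocess.CompletedProcess:
    """Hypothetical helper: run `python -m <module> [args...]` with the current interpreter.

    The command is an explicit argument list (no shell), so the exact module
    and arguments being executed are visible in result.args.
    """
    cmd = [sys.executable, "-m", module, *args]
    # check=True raises CalledProcessError if the module exits non-zero.
    return subprocess.run(cmd, input=stdin, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in module: json.tool ships with CPython. It pretty-prints JSON read from stdin.
    result = run_module("json.tool", stdin='{"model": "example", "hallucinated": 0}')
    print(result.args[1:])  # the "-m" flag and the module name that was executed
    print(result.stdout)
```

Passing the command as a list rather than a shell string keeps the executed module unambiguous, which is the property an auditor relies on when judging this kind of invocation.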
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 7, 2026, 03:41 AM