ref-hallucination-arena
Pass
Audited by Gen Agent Trust Hub on Mar 7, 2026
Risk Level: SAFE
EXTERNAL_DOWNLOADS, COMMAND_EXECUTION, PROMPT_INJECTION
Full Analysis
- [EXTERNAL_DOWNLOADS]: The skill installs the `py-openjudge` and `matplotlib` packages from PyPI. `py-openjudge` is a vendor-related tool provided by the author to support the benchmarking logic.
- [COMMAND_EXECUTION]: It uses the `python -m` command to execute the benchmarking pipeline, which is a common and transparent practice for running Python-based tools.
- [PROMPT_INJECTION]: The skill enables users to configure system prompts for the models being evaluated. This functionality is essential for its primary purpose of testing how different instructions affect model hallucination rates.
Audit Metadata