The Agent Skills Directory

[EXTERNAL_DOWNLOADS]: The skill installs the py-openjudge and matplotlib packages from PyPI. py-openjudge is a tool published by the skill's author that supports the benchmarking logic.
[COMMAND_EXECUTION]: The skill runs the benchmarking pipeline as a module via python -m, which is a common and transparent way to invoke Python-based tools.
[PROMPT_INJECTION]: The skill enables users to configure system prompts for the models being evaluated. This functionality is essential for its primary purpose of testing how different instructions affect model hallucination rates.

ref-hallucination-arena