validate-evaluator

Pass

Audited by Gen Agent Trust Hub on Mar 3, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill uses established statistical methods (true positive rate (TPR), true negative rate (TNR), and the Rogan-Gladen correction) to validate the performance of an LLM-based evaluator. No prompt injection or data exfiltration vectors were identified in the code or instructions.
  • [EXTERNAL_DOWNLOADS]: The skill references standard Python libraries (scikit-learn, NumPy) and the judgy package. These are appropriate for the task and are sourced from official registries.
  • [COMMAND_EXECUTION]: The skill includes standard command-line instructions for package installation (pip install judgy).
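For context on the statistical methods named above, here is a minimal sketch of how TPR, TNR, and the Rogan-Gladen correction fit together. It assumes binary labels (1 = pass, 0 = fail), a small human-labeled set used to measure the evaluator, and an observed pass rate from the evaluator on unlabeled outputs. The function names `tpr_tnr` and `rogan_gladen` are hypothetical illustrations and are not the judgy package's API.

```python
def tpr_tnr(human, judge):
    """Return (TPR, TNR) of the evaluator's labels against human labels.

    Hypothetical helper: labels are assumed binary, 1 = pass, 0 = fail.
    """
    pairs = list(zip(human, judge))
    tp = sum(1 for h, j in pairs if h == 1 and j == 1)  # both say pass
    fn = sum(1 for h, j in pairs if h == 1 and j == 0)  # evaluator missed a pass
    tn = sum(1 for h, j in pairs if h == 0 and j == 0)  # both say fail
    fp = sum(1 for h, j in pairs if h == 0 and j == 1)  # evaluator passed a fail
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one human pass and one human fail")
    return tp / (tp + fn), tn / (tn + fp)


def rogan_gladen(observed_pass_rate, tpr, tnr):
    """Correct an observed pass rate for evaluator error (Rogan-Gladen).

    theta = (p_obs + TNR - 1) / (TPR + TNR - 1), clamped to [0, 1].
    """
    denom = tpr + tnr - 1
    if denom <= 0:
        raise ValueError("evaluator must beat chance (TPR + TNR > 1)")
    theta = (observed_pass_rate + tnr - 1) / denom
    # Clamping is a common convention; the raw estimate can leave [0, 1].
    return min(max(theta, 0.0), 1.0)


if __name__ == "__main__":
    human = [1, 1, 1, 1, 0, 0, 0, 0]
    judge = [1, 1, 1, 0, 0, 0, 0, 1]
    tpr, tnr = tpr_tnr(human, judge)
    print(tpr, tnr)                          # → 0.75 0.75
    print(rogan_gladen(0.625, tpr, tnr))     # → 0.75
```

In this toy example the evaluator passes 62.5% of unlabeled outputs, but because it both misses some true passes and accepts some true fails, the corrected estimate of the true pass rate is 75%.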
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 3, 2026, 11:36 PM