validate-evaluator
Pass
Audited by Gen Agent Trust Hub on Mar 3, 2026
Risk Level: SAFE
Full Analysis
- [SAFE]: The skill uses established statistical methods (TPR, TNR, Rogan-Gladen correction) to validate LLM performance. No prompt injection or data exfiltration vectors were identified in the code or instructions.
- [EXTERNAL_DOWNLOADS]: The skill references standard Python libraries (scikit-learn, NumPy) and the judgy package. These are appropriate for the task and are sourced from official registries.
- [COMMAND_EXECUTION]: Includes standard command-line instructions for package installation (pip install judgy).
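For context on the statistical methods named in the first finding, here is a minimal sketch of how TPR, TNR, and the Rogan-Gladen correction fit together. It is not the skill's code and does not use the judgy API; the function names `tpr_tnr` and `rogan_gladen` and the sample labels are hypothetical, chosen for illustration only.

```python
# Illustrative sketch only: plain-Python versions of the metrics named in the
# audit. Function names and sample data are assumptions, not the skill's code.

def tpr_tnr(labels, preds):
    """Return (TPR, TNR) of binary judge predictions against human labels."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


def rogan_gladen(observed_rate, tpr, tnr):
    """Correct an observed positive rate for known judge error rates.

    Rogan-Gladen estimator: (observed + TNR - 1) / (TPR + TNR - 1),
    clipped to the interval [0, 1].
    """
    denom = tpr + tnr - 1.0
    if denom <= 0:
        # A judge no better than chance cannot be corrected this way.
        raise ValueError("TPR + TNR must exceed 1 for the correction to apply")
    corrected = (observed_rate + tnr - 1.0) / denom
    return min(1.0, max(0.0, corrected))


if __name__ == "__main__":
    # Hypothetical human labels and judge predictions on a small labeled set.
    human = [1, 1, 1, 0, 0]
    judge = [1, 1, 0, 0, 1]
    tpr, tnr = tpr_tnr(human, judge)
    print(f"TPR={tpr:.3f} TNR={tnr:.3f}")

    # Hypothetical judge with TPR=0.9, TNR=0.8 reporting a 55% pass rate.
    print(rogan_gladen(0.55, tpr=0.9, tnr=0.8))
```

The correction is only defined when the judge performs better than chance (TPR + TNR > 1), which is why the sketch raises an error otherwise instead of returning a value.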
Audit Metadata