advanced-evaluation
Pass
Audited by Gen Agent Trust Hub on Apr 14, 2026
Risk Level: SAFE
Full Analysis
- [SAFE]: The skill is purely informational, providing methodologies, prompt templates, and theoretical frameworks for evaluating AI model outputs.
- [SAFE]: No executable scripts, binaries, or shell commands are included in the file.
- [SAFE]: External references are limited to academic papers on arXiv and reputable industry blogs, which are standard for documentation and research-oriented skills.
- [SAFE]: The prompt templates for direct scoring and pairwise comparison include specific instructions to mitigate length and position biases, which is a security best practice for building robust evaluation systems.
Audit Metadata