advanced-evaluation

Pass

Audited by Gen Agent Trust Hub on Apr 14, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill is purely informational, providing methodologies, prompt templates, and theoretical frameworks for evaluating AI model outputs.
  • [SAFE]: No executable scripts, binaries, or shell commands are included in the file.
  • [SAFE]: External references are limited to academic papers on arXiv and reputable industry blogs, which is standard for documentation and research-oriented skills.
  • [SAFE]: The prompt templates for direct scoring and pairwise comparison include explicit instructions to mitigate length and position bias, which is a best practice for building robust evaluation systems.
Audit Metadata
Risk Level: SAFE
Analyzed: Apr 14, 2026, 10:52 AM