agent-evaluation

Pass

Audited by Gen Agent Trust Hub on Apr 24, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill is purely instructional and documents best practices for evaluating AI agents. All code snippets it provides (for example, those that use subprocess to run tests or exec() to evaluate code) are clearly presented as examples intended for a sandboxed or testing environment. No malicious patterns, such as prompt injection, unauthorized data access, or obfuscation, were detected. All external links point to well-known research benchmarks and official documentation.
Audit Metadata
Risk Level: SAFE
Analyzed: Apr 24, 2026, 02:48 AM