agent-evaluation
Pass
Audited by Gen Agent Trust Hub on Apr 24, 2026
Risk Level: SAFE
Full Analysis
- [SAFE]: The skill is entirely instructional and documents best practices for AI agent evaluation. All provided code snippets, such as those demonstrating the use of subprocess for running tests or exec() for code evaluation, are clearly presented as examples within the context of a sandbox or testing environment. No malicious patterns, such as prompt injection, unauthorized data access, or obfuscation, were detected. All external links point to well-known research benchmarks and official documentation.
Audit Metadata