agent-evaluation
Pass
Audited by Gen Agent Trust Hub on Mar 6, 2026
Risk Level: SAFE
COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, EXTERNAL_DOWNLOADS
Full Analysis
- [COMMAND_EXECUTION]: The skill contains educational Python snippets illustrating the use of `subprocess.run` to execute external testing tools like `pytest` for grading coding tasks.
- [REMOTE_CODE_EXECUTION]: The examples demonstrate how to evaluate agent-generated Python code using the `exec()` function. This is provided for instructional purposes with an explicit note to perform the execution within a sandbox environment.
- [EXTERNAL_DOWNLOADS]: The documentation references trusted industry resources and well-known services, including Anthropic's engineering blog, SWE-bench, WebArena, and standard GitHub Actions for CI/CD integration.
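For readers unfamiliar with the pattern named in the COMMAND_EXECUTION finding, the following is a minimal sketch of grading a coding task by invoking an external test tool through `subprocess.run`. It is not taken from the audited skill: the helper names `run_grader` and `pytest_command`, the 60-second default timeout, and the pass/fail rule (exit code 0 means pass) are all assumptions made for illustration.

```python
import subprocess
import sys
from typing import List, Optional, Sequence


def run_grader(cmd: Sequence[str], cwd: Optional[str] = None,
               timeout: float = 60.0) -> bool:
    """Run an external grading command; return True if it exits with code 0.

    Hypothetical helper, not part of the audited skill.
    """
    try:
        result = subprocess.run(
            list(cmd),            # argument list, so no shell is involved
            cwd=cwd,
            capture_output=True,  # keep the tool's output out of our stdout
            text=True,
            timeout=timeout,      # a hung test run counts as a failure
            check=False,          # inspect returncode instead of raising
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def pytest_command(test_path: str) -> List[str]:
    """Build a pytest invocation for the current interpreter (assumed layout)."""
    return [sys.executable, "-m", "pytest", "-q", test_path]
```

A call such as `run_grader(pytest_command("tests/"), cwd=workdir)` would then grade a submission checked out in `workdir`, assuming `pytest` is installed there. Passing the command as a list rather than a shell string avoids shell interpolation of agent-controlled input, which is the main concern this finding category tracks.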
Audit Metadata