The Agent Skills Directory

- [COMMAND_EXECUTION]: The skill contains educational Python snippets illustrating the use of subprocess.run to execute external testing tools like pytest for grading coding tasks.
- [REMOTE_CODE_EXECUTION]: The examples demonstrate how to evaluate agent-generated Python code using the exec() function. This is provided for instructional purposes with an explicit note to perform the execution within a sandbox environment.
- [EXTERNAL_DOWNLOADS]: The documentation references trusted industry resources and well-known services, including Anthropic's engineering blog, SWE-bench, WebArena, and standard GitHub Actions for CI/CD integration.
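
The skill's own snippets are not reproduced on this page, so the following is only a minimal sketch of what subprocess-based grading of this kind might look like. The function names `grade_with_command` and `run_generated_code`, the result dictionary, and the timeout values are all hypothetical. The sketch runs agent-generated code in a child process instead of calling exec() in the grader's own interpreter; a child process with a timeout is not a sandbox, so untrusted code would still need real isolation such as a container or VM.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def grade_with_command(command, cwd=None, timeout=60):
    """Run a grading command and treat exit code 0 as a pass (hypothetical helper)."""
    try:
        completed = subprocess.run(
            command,
            cwd=cwd,
            capture_output=True,  # collect stdout/stderr for the grading report
            text=True,
            timeout=timeout,      # the child is killed if it runs too long
        )
    except subprocess.TimeoutExpired:
        return {"passed": False, "returncode": None, "output": "timed out"}
    return {
        "passed": completed.returncode == 0,
        "returncode": completed.returncode,
        "output": completed.stdout + completed.stderr,
    }


def run_generated_code(source, timeout=10):
    """Write agent-generated code to a temp file and run it in a child process.

    Assumption: this gives process separation only, not a security boundary.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(source)
        return grade_with_command(
            [sys.executable, str(script)], cwd=workdir, timeout=timeout
        )
```

Assuming pytest is installed and a `tests/` directory exists, a grading call could look like `grade_with_command([sys.executable, "-m", "pytest", "-q", "tests/"])`, since pytest exits with code 0 only when all collected tests pass.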

agent-evaluation