evals
Warn
Audited by Gen Agent Trust Hub on Mar 10, 2026
Risk Level: MEDIUMREMOTE_CODE_EXECUTIONCOMMAND_EXECUTIONPROMPT_INJECTIONDATA_EXFILTRATIONEXTERNAL_DOWNLOADS
Full Analysis
- [REMOTE_CODE_EXECUTION]: The documentation in
use-cases/coding-agents.mdprovides code patterns that use Python'sexec()function to validate code generated by AI models. This introduces a risk of arbitrary code execution if the agent-generated output is malicious. - [COMMAND_EXECUTION]: Multiple guides (e.g.,
use-cases/coding-agents.md,use-cases/testing-agent-skills.md, andrunning.md) demonstrate the use ofsubprocess.run()to execute system commands, test runners likepytest, and AI agent CLI interfaces such asclaudeandcodex. - [PROMPT_INJECTION]: The skill facilitates the processing and evaluation of untrusted agent outputs, creating a significant surface for indirect prompt injection. Ingestion points: Evaluation targets ingest data into
EvalContextfields (input,output,trace_data) from external agent interactions as described inSKILL.mdandtargets.md. Boundary markers: The examples use basic prompt structures for LLM judges but lack robust delimiters or specific instructions to ignore embedded instructions. Capability inventory: The framework is explicitly designed to supportexec()andsubprocesscalls based on the results of these evaluations. Sanitization: The provided examples do not include explicit input validation or sanitization for data before it is passed to execution sinks. - [DATA_EXFILTRATION]: The
ezvals servecommand launches a local HTTP server (defaulting to port 8000) to display evaluation results. While intended for local review, this exposes evaluation traces and potentially sensitive data on the local network. - [EXTERNAL_DOWNLOADS]: The skill documentation recommends installing the
ezvalslibrary from PyPI and the skill itself from the author's GitHub repository (camronh/evals-skill). These are documented as vendor-controlled resources.
Audit Metadata