
evals

Warn

Audited by Gen Agent Trust Hub on Mar 10, 2026

Risk Level: MEDIUM
REMOTE_CODE_EXECUTION, COMMAND_EXECUTION, PROMPT_INJECTION, DATA_EXFILTRATION, EXTERNAL_DOWNLOADS
Full Analysis
  • [REMOTE_CODE_EXECUTION]: The documentation in use-cases/coding-agents.md provides code patterns that use Python's exec() function to validate code generated by AI models. This introduces a risk of arbitrary code execution if the agent-generated output is malicious.
  • [COMMAND_EXECUTION]: Multiple guides (e.g., use-cases/coding-agents.md, use-cases/testing-agent-skills.md, and running.md) demonstrate the use of subprocess.run() to execute system commands, test runners such as pytest, and AI agent CLIs such as claude and codex.
  • [PROMPT_INJECTION]: The skill facilitates the processing and evaluation of untrusted agent outputs, creating a significant surface for indirect prompt injection.
      • Ingestion points: Evaluation targets ingest data into EvalContext fields (input, output, trace_data) from external agent interactions as described in SKILL.md and targets.md.
      • Boundary markers: The examples use basic prompt structures for LLM judges but lack robust delimiters or specific instructions to ignore embedded instructions.
      • Capability inventory: The framework is explicitly designed to support exec() and subprocess calls based on the results of these evaluations.
      • Sanitization: The provided examples do not include explicit input validation or sanitization for data before it is passed to execution sinks.
  • [DATA_EXFILTRATION]: The ezvals serve command launches a local HTTP server (defaulting to port 8000) to display evaluation results. While intended for local review, this exposes evaluation traces and potentially sensitive data on the local network.
  • [EXTERNAL_DOWNLOADS]: The skill documentation recommends installing the ezvals library from PyPI and the skill itself from the author's GitHub repository (camronh/evals-skill). These are documented as vendor-controlled resources.
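
The REMOTE_CODE_EXECUTION and COMMAND_EXECUTION findings above concern agent-generated code reaching exec() or subprocess.run() without isolation. As a rough illustration only, the sketch below runs generated code in a separate Python process with a timeout, a throwaway working directory, and a stripped environment, instead of calling exec() in the evaluator's own process. The helper name run_untrusted and its return shape are hypothetical; they are not part of ezvals or of the audited skill. Process separation of this kind is not a sandbox: the child can still read files and open network connections, so a container or VM would be needed for real isolation.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_untrusted(code: str, timeout_s: float = 5.0) -> dict:
    """Run model-generated code in a child interpreter (hypothetical helper).

    This limits damage to the evaluator process itself; it is NOT a sandbox.
    """
    # Pass through only what the interpreter needs to start, so API keys and
    # other secrets in the parent environment are not visible to the child.
    safe_env = {k: v for k, v in os.environ.items() if k in {"PATH", "SYSTEMROOT"}}

    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                # -I: isolated mode (ignores PYTHON* variables and user site-packages)
                [sys.executable, "-I", str(script)],
                cwd=workdir,
                env=safe_env,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising on timeout.
            return {"ok": False, "timed_out": True, "returncode": None,
                    "stdout": "", "stderr": ""}

    return {
        "ok": proc.returncode == 0,
        "timed_out": False,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


if __name__ == "__main__":
    print(run_untrusted("print(1 + 1)"))
```

An evaluator could then score the returned dict (exit code, captured output, timeout flag) rather than inspecting state mutated by an in-process exec() call.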
Audit Metadata
Risk Level: MEDIUM
Analyzed: Mar 10, 2026, 12:30 AM