evalview-agent-testing
Pass
Audited by Gen Agent Trust Hub on Mar 31, 2026
Risk Level: SAFECOMMAND_EXECUTIONEXTERNAL_DOWNLOADSDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
- [EXTERNAL_DOWNLOADS]: The skill installs the
evalviewPython package from PyPI (pip install "evalview>=0.5,<1"). This is a vendor-owned resource required for the skill's primary functionality. - [COMMAND_EXECUTION]: The skill uses the
Bashtool to perform administrative and testing tasks, such as initializing suites (evalview init), creating baselines (evalview snapshot), checking for regressions (evalview check), and running an MCP server (evalview mcp serve). It also includes agate_or_revertfeature that executes filesystem-level reverts usinggit checkout -- .when regressions are detected. - [DATA_EXFILTRATION]: The monitoring feature (
evalview monitor) allows the agent to send test results and alerts to an external Slack webhook URL. While documented as a notification feature, it involves transmitting internal testing data to a remote endpoint. - [PROMPT_INJECTION]: The skill presents a surface for indirect prompt injection because it processes and evaluates potentially untrusted data generated by other AI agents.
- Ingestion points: Agent inputs, tool call sequences, and natural language outputs are ingested during evaluation and diffing processes (
SKILL.md). - Boundary markers: No explicit delimiters or instructions are provided to the agent to distinguish between test logic and the untrusted content being evaluated.
- Capability inventory: The skill possesses the ability to execute shell commands (
Bash) and modify the local filesystem (Write). - Sanitization: No sanitization or escaping of the processed agent data is mentioned before it is scored or displayed in reports.
Audit Metadata