run-test-plan

Pass

Audited by Gen Agent Trust Hub on Apr 9, 2026

Risk Level: SAFECOMMAND_EXECUTIONDATA_EXFILTRATION
Full Analysis
  • [COMMAND_EXECUTION]: The skill reads a YAML test plan from docs/testing/test-plan.yaml and executes shell commands specified in the setup.commands field. This allows for the execution of arbitrary commands defined within a local configuration file.
  • [COMMAND_EXECUTION]: During the test execution phase (Step 4b), the skill runs curl commands and agent-browser CLI actions using parameters (methods, URLs, headers, and bodies) sourced directly from the YAML test plan.
  • [DATA_EXFILTRATION]: The skill performs curl requests to URLs defined in the test plan. A maliciously crafted test plan could point these requests to external attacker-controlled servers to exfiltrate data.
  • [DATA_EXFILTRATION]: Upon test failure (Step 6), the skill executes git diff to identify changes and captures the output, along with potentially sensitive API responses, into a failure report file (docs/testing/evidence/<test.id>-failure.md).
  • [COMMAND_EXECUTION]: The skill has a significant attack surface for indirect prompt injection via the docs/testing/test-plan.yaml file.
  • Ingestion points: The YAML test plan is read in Step 1.
  • Boundary markers: None; the skill lacks delimiters or instructions to ignore embedded commands in the YAML.
  • Capability inventory: Execution of shell commands, background process management (nohup), network requests (curl), and browser automation (agent-browser).
  • Sanitization: While the skill uses yaml.safe_load to parse the file, it performs no validation or sanitization of the commands or URLs contained within it before execution.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 9, 2026, 10:47 AM