webapp-testing
Fail
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: HIGH (COMMAND_EXECUTION, REMOTE_CODE_EXECUTION)
Full Analysis
- [COMMAND_EXECUTION] (HIGH): The helper script scripts/with_server.py uses subprocess.Popen(shell=True) to execute commands provided via the --server argument. This is a high-risk pattern that allows for arbitrary command execution if the agent provides unvalidated input.
- [COMMAND_EXECUTION] (HIGH): The script scripts/with_server.py uses subprocess.run() to execute trailing arguments as a separate process, facilitating the execution of arbitrary local scripts.
- [REMOTE_CODE_EXECUTION] (MEDIUM): Through Playwright, the skill can navigate to and interact with any URL. If directed to a malicious external site, the agent could be induced to perform actions based on attacker-controlled content.
- [DATA_EXFILTRATION] (LOW): Examples like console_logging.py capture browser logs and save them to /mnt/user-data/outputs/console.log, which could contain sensitive information from local applications.
- [INDIRECT_PROMPT_INJECTION] (LOW):
  - Ingestion points: Browser console logs and rendered DOM content are ingested via page.on("console", ...) and page.locator(...).all() in the example scripts.
  - Boundary markers: None present; the scripts treat content from the browser as trusted data to be logged or acted upon.
  - Capability inventory: The skill has the capability to run arbitrary shell commands via scripts/with_server.py and write to the filesystem.
  - Sanitization: No sanitization or escaping of browser-sourced data is performed before logging or processing.
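To make the missing boundary markers and sanitization concrete, the following is a minimal sketch of how browser-sourced text could be cleaned and delimited before it is logged or handed to an agent. It is not taken from the audited skill: the marker strings, the function names sanitize_browser_text and wrap_untrusted, and the 2000-character limit are all hypothetical choices made for illustration.

```python
import re

# Hypothetical marker strings; the audited scripts define nothing like this.
BEGIN_MARKER = "<<<UNTRUSTED_BROWSER_CONTENT>>>"
END_MARKER = "<<<END_UNTRUSTED_BROWSER_CONTENT>>>"

# All ASCII control characters except tab (covers CR, LF and ESC).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0a-\x1f\x7f]")


def sanitize_browser_text(text: str, max_len: int = 2000) -> str:
    """Replace control characters, break up marker look-alikes, truncate."""
    cleaned = _CONTROL_CHARS.sub(" ", text)
    # Keep the content from forging or closing the boundary markers.
    cleaned = cleaned.replace("<<<", "< < <").replace(">>>", "> > >")
    return cleaned[:max_len]


def wrap_untrusted(text: str) -> str:
    """Delimit sanitized browser content so consumers can treat it as data."""
    return f"{BEGIN_MARKER}\n{sanitize_browser_text(text)}\n{END_MARKER}"
```

In a Playwright console handler, the callback passed to page.on("console", ...) could write wrap_untrusted(msg.text) instead of the raw message. This only makes the trust boundary explicit; it does not by itself stop an agent from acting on the wrapped content.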
Recommendations
- AI detected serious security threats
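One possible mitigation for the subprocess.Popen(shell=True) finding is sketched below, assuming the server command can be expressed as a plain argument list. This is not the code in scripts/with_server.py: ALLOWED_EXECUTABLES, parse_server_command, and start_server are hypothetical names, and the allowlist contents are placeholders.

```python
import shlex
import subprocess

# Hypothetical allowlist of launchers; adjust to the project's real needs.
ALLOWED_EXECUTABLES = {"npm", "node", "python", "python3"}


def parse_server_command(command: str) -> list[str]:
    """Split a command string into argv and check the executable name."""
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty server command")
    if argv[0] not in ALLOWED_EXECUTABLES:
        raise ValueError(f"executable not allowed: {argv[0]!r}")
    return argv


def start_server(command: str) -> subprocess.Popen:
    """Start the server without a shell (shell=False is the default)."""
    # With an argument list, characters such as ';', '|' and '&&' are passed
    # to the program as literal arguments instead of being interpreted.
    return subprocess.Popen(parse_server_command(command))
```

For example, parse_server_command("npm run dev") yields ["npm", "run", "dev"]. Note the limits of this sketch: it removes shell metacharacter injection, but an allowlisted interpreter such as python can still run arbitrary code through its own arguments, so input validation remains necessary.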
Audit Metadata