webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: HIGH (COMMAND_EXECUTION, REMOTE_CODE_EXECUTION)
Full Analysis
  • [COMMAND_EXECUTION] (HIGH): The helper script scripts/with_server.py uses subprocess.Popen(shell=True) to execute commands provided via the --server argument. This is a high-risk pattern that allows for arbitrary command execution if the agent provides unvalidated input.
  • [COMMAND_EXECUTION] (HIGH): The script scripts/with_server.py uses subprocess.run() to execute trailing arguments as a separate process, facilitating the execution of arbitrary local scripts.
  • [REMOTE_CODE_EXECUTION] (MEDIUM): Through Playwright, the skill can navigate to and interact with any URL. If directed to a malicious external site, the agent could be induced to perform actions based on attacker-controlled content.
  • [DATA_EXFILTRATION] (LOW): Examples like console_logging.py capture browser logs and save them to /mnt/user-data/outputs/console.log, which could contain sensitive information from local applications.
  • [INDIRECT_PROMPT_INJECTION] (LOW):
      • Ingestion points: Browser console logs and rendered DOM content are ingested via page.on("console", ...) and page.locator(...).all() in the example scripts.
      • Boundary markers: None present; the scripts treat content from the browser as trusted data to be logged or acted upon.
      • Capability inventory: The skill can run arbitrary shell commands via scripts/with_server.py and write to the filesystem.
      • Sanitization: No sanitization or escaping of browser-sourced data is performed before logging or processing.
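The missing boundary markers and sanitization noted in the last finding could be addressed along the following lines. This is a minimal sketch, not code from the audited skill: the marker strings, the wrap_untrusted helper, and the max_len limit are all hypothetical names chosen for illustration.

```python
import re

# Hypothetical marker strings; the audited skill defines no such markers.
BEGIN = "<<<UNTRUSTED_BROWSER_CONTENT>>>"
END = "<<<END_UNTRUSTED_BROWSER_CONTENT>>>"

# Control characters other than tab (\x09) and newline (\x0a).
_CONTROL = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def wrap_untrusted(text: str, max_len: int = 2000) -> str:
    """Strip control characters and fence browser-sourced text with markers."""
    cleaned = _CONTROL.sub("", text)
    # Repeat until stable so nested look-alikes cannot reassemble a marker.
    while BEGIN in cleaned or END in cleaned:
        cleaned = cleaned.replace(BEGIN, "").replace(END, "")
    # Cap the length so a noisy page cannot flood the log.
    if len(cleaned) > max_len:
        cleaned = cleaned[:max_len] + " [truncated]"
    return f"{BEGIN}\n{cleaned}\n{END}"
```

In a Playwright script, a console handler could pass msg.text through wrap_untrusted before writing it to the log, so that anything reading the log later can tell where page-controlled text starts and ends.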
Recommendations
  • AI detected serious security threats
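One way to reduce the shell=True risk flagged for scripts/with_server.py is to split the command into an argument list, check the executable against an allowlist, and start it without a shell. The sketch below assumes this approach would suit the script; the ALLOWED set and both function names are hypothetical and do not come from the audited code.

```python
import shlex
import subprocess

# Hypothetical allowlist of executables a server command may start with.
ALLOWED = {"python", "python3", "npm", "node"}


def parse_server_command(command: str) -> list[str]:
    """Split a command string into argv and reject unexpected executables."""
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty server command")
    if argv[0] not in ALLOWED:
        raise ValueError(f"executable not allowed: {argv[0]!r}")
    return argv


def start_server(command: str) -> subprocess.Popen:
    """Start the server without a shell (shell=False is the default)."""
    # With an argument list and no shell, characters such as ';', '|' and
    # '&&' reach the program as literal arguments instead of being
    # interpreted as shell syntax.
    return subprocess.Popen(parse_server_command(command))
```

The trade-off is that commands relying on shell features, such as chaining with && or changing directory first, no longer work as a single string; a caller would need to pass a working directory through Popen's cwd parameter instead.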
Audit Metadata
  • Risk Level: HIGH
  • Analyzed: Feb 17, 2026, 06:47 PM