webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGH (COMMAND_EXECUTION, PROMPT_INJECTION)
Full Analysis
  • [COMMAND_EXECUTION] (HIGH): The script scripts/with_server.py uses subprocess.Popen with shell=True to execute server commands and subprocess.run to execute the final automation command. This allows for arbitrary shell command execution based on strings constructed by the agent.
  • [INDIRECT_PROMPT_INJECTION] (HIGH): The skill's core purpose involves ingesting untrusted data from web applications.
      • Ingestion points: Data enters the context via page.content(), page.on("console", ...) (captured in console_logging.py), and page.locator().inner_text() (used in element_discovery.py).
      • Boundary markers: No boundary markers or sanitization logic are present to separate application content from agent instructions.
      • Capability inventory: The agent can write and execute arbitrary Python code and run shell commands via with_server.py.
      • Sanitization: There is no evidence of sanitization for the content read from the web application before it is processed by the agent's logic.
  • [PROMPT_INJECTION] (MEDIUM): The SKILL.md file explicitly instructs the agent: 'DO NOT read the source until you try running the script first'. This encourages the agent to execute complex logic as a 'black-box' without security inspection, which is a tactic used to bypass review and hide malicious behavior.
  • [DATA_EXPOSURE] (LOW): The examples (console_logging.py, element_discovery.py) write application data and screenshots to /mnt/user-data/outputs/ and /tmp/. While these are local paths, if the application being tested contains sensitive information, it could be exposed to other processes or subsequent agent sessions.
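To illustrate the COMMAND_EXECUTION finding, the following is a minimal sketch of one common alternative to shell=True: splitting the command string into an argument list so that shell metacharacters stay literal. The function name run_without_shell is hypothetical and is not taken from scripts/with_server.py; whether that script can drop shell=True depends on whether its server commands rely on shell features such as pipes or `&&`.

```python
import shlex
import subprocess


def run_without_shell(command: str) -> subprocess.CompletedProcess:
    """Run a command string as an argv list instead of through a shell.

    Hypothetical helper for illustration only; not the skill's actual code.
    """
    # shlex.split honours quoting but does not interpret ; | && or $(...),
    # so those characters reach the program as plain arguments.
    argv = shlex.split(command)
    return subprocess.run(
        argv,
        shell=False,          # no shell is spawned, so nothing is chained
        capture_output=True,
        text=True,
        check=False,
    )
```

With this approach a string such as `python -c "print(1)" ; echo INJECTED` starts a single Python process that receives `;`, `echo`, and `INJECTED` as extra arguments, whereas the same string under shell=True would run two commands. This narrows the finding but does not remove it: the agent still chooses which program to launch.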
Recommendations
  • AI detected serious security threats
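As one possible mitigation for the missing boundary markers noted under INDIRECT_PROMPT_INJECTION, the sketch below wraps text read from a page in a randomly tagged delimiter before it is handed to the agent. The function name wrap_untrusted and the delimiter format are assumptions made for illustration; they are not part of the skill or of Playwright. Delimiters of this kind are generally understood to reduce, not eliminate, the risk of injected instructions being followed.

```python
import secrets


def wrap_untrusted(text: str, source: str) -> str:
    """Fence untrusted application content with a per-call random delimiter.

    Hypothetical helper for illustration only. `source` is a free-form label
    such as "page.content" that records where the text came from.
    """
    # A fresh random token per call means page content cannot predict the
    # closing marker and forge an early end of the untrusted region.
    token = secrets.token_hex(8)
    return (
        f"<<UNTRUSTED {source} {token}>>\n"
        f"{text}\n"
        f"<<END UNTRUSTED {token}>>"
    )
```

A caller would pass the result of, for example, page.locator(...).inner_text() through wrap_untrusted before adding it to the agent's context, and the agent's instructions would state that nothing between the markers is to be treated as a command.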
Audit Metadata
Risk Level: HIGH
Analyzed: Feb 16, 2026, 06:23 AM