webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTION
Full Analysis
  • COMMAND_EXECUTION (HIGH): The script scripts/with_server.py uses subprocess.Popen with shell=True to execute strings passed to the --server argument. This is a classic command injection vulnerability if the input is influenced by untrusted data.
  • REMOTE_CODE_EXECUTION (HIGH): The skill is designed to have the agent write and execute arbitrary Python Playwright scripts. Combined with the with_server.py utility, this provides a direct path for the agent to execute arbitrary system commands and interact with the network.
  • PROMPT_INJECTION (MEDIUM): The SKILL.md file contains a directive: "DO NOT read the source until you try running the script first". This instruction discourages the agent from performing security inspections of its own tools before execution, which is an adversarial pattern intended to hide malicious logic in scripts like with_server.py.
  • INDIRECT_PROMPT_INJECTION (LOW): The skill's core function is web reconnaissance (using page.content() and page.screenshot()). This creates a vulnerability surface where a malicious web application could inject instructions into the agent's context.
  • Ingestion points: page.goto(url) and page.content() are used to bring external web data into the agent's reasoning loop (found in SKILL.md and examples/element_discovery.py).
  • Boundary markers: Absent. No instructions are provided to the agent to treat website content as untrusted or to use delimiters.
  • Capability inventory: The agent can execute shell commands via with_server.py and perform file/network operations via Playwright and standard Python libraries.
  • Sanitization: Absent. There is no evidence of HTML sanitization or instruction filtering before the agent processes the DOM content.
  • DATA_EXFILTRATION (LOW): The skill includes examples like examples/console_logging.py that write logs to /mnt/user-data/outputs/, which could be used to store sensitive information captured from local web applications.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 17, 2026, 06:48 PM