webapp-testing

Fail

Audited by Gen Agent Trust Hub on Mar 10, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
  • [COMMAND_EXECUTION]: The script scripts/with_server.py uses subprocess.Popen with shell=True to execute server commands provided as command-line arguments. This pattern is highly susceptible to command injection if the arguments are influenced by untrusted data.
  • [REMOTE_CODE_EXECUTION]: The skill provides mechanisms to execute arbitrary shell commands and Python/TypeScript scripts, which constitutes a broad remote code execution surface.
  • [PROMPT_INJECTION]: The SKILL.md file contains instructions that explicitly tell the agent 'DO NOT read the source until you try running the script first'. This instruction discourages the agent from performing a security review of the code it is about to execute, which is a malicious instructional pattern.
  • [EXTERNAL_DOWNLOADS]: The skill directs the user to install the playwright package and its associated browser binaries from external repositories. While Playwright is a well-known tool, this contributes to the external dependency surface.
  • [PROMPT_INJECTION]: The skill is vulnerable to Indirect Prompt Injection as it processes untrusted data from web pages and has significant system capabilities.
  • Ingestion points: Web content is ingested via page.goto() and page.locator().inner_text() in SKILL.md and examples/element_discovery.py.
  • Boundary markers: Absent. There are no delimiters or instructions to the agent to ignore commands embedded in the web pages being tested.
  • Capability inventory: The skill can execute arbitrary shell commands via subprocess.Popen and subprocess.run in scripts/with_server.py, and can write files to the local system as seen in examples/console_logging.py.
  • Sanitization: Absent. No evidence of content escaping, validation, or filtering of external web data was found.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Mar 10, 2026, 03:56 AM