webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
  • COMMAND_EXECUTION (HIGH): The script scripts/with_server.py uses subprocess.Popen(shell=True) and subprocess.run() to execute strings provided via command-line arguments. This provides a direct interface for arbitrary shell command execution.
  • PROMPT_INJECTION (HIGH): The skill creates a significant Indirect Prompt Injection surface by processing untrusted web content. Evidence: 1. Ingestion points: page.content() and page.locator().all() in SKILL.md extract data from the browser. 2. Boundary markers: Absent; no delimiters or instructions are used to separate web content from agent instructions. 3. Capability inventory: High-privilege shell execution via with_server.py and local file writing via page.screenshot(). 4. Sanitization: Absent; no filtering or validation of the ingested DOM content is performed before the agent uses it to decide on subsequent actions.
  • REMOTE_CODE_EXECUTION (HIGH): The skill's core workflow involves the agent dynamically writing and executing Python Playwright scripts, which provides an attacker with a path to execute arbitrary code if they can influence the agent's logic through the browser content.
  • PROMPT_INJECTION (MEDIUM): The SKILL.md file contains a deceptive instruction: "DO NOT read the source until you try running the script first". This metadata poisoning discourages the agent from performing necessary security audits on the scripts it executes.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 06:41 AM