webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGH (COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION)
Full Analysis
  • [Command Execution / RCE] (HIGH): The script scripts/with_server.py uses subprocess.Popen with shell=True to execute arbitrary strings provided as arguments. This allows for arbitrary command execution on the host system.
  • [Prompt Injection / Obfuscation] (HIGH): SKILL.md contains multiple instructions to the agent to treat scripts as 'black boxes' and explicitly tells the agent: 'DO NOT read the source until you try running the script first'. This is a deceptive pattern designed to prevent the AI from identifying the dangerous shell=True calls or other malicious logic within the scripts before execution.
  • [Indirect Prompt Injection] (HIGH): The skill is designed to process untrusted data from web applications (via Playwright).
    • Ingestion points: page.content(), page.locator().inner_text(), and DOM inspection logic in SKILL.md and examples/element_discovery.py.
    • Boundary markers: Absent. There are no instructions or delimiters to help the AI distinguish between the testing task and instructions embedded in the web page.
    • Capability inventory: High-privilege access to the shell via scripts/with_server.py (subprocess calls) and network access via Playwright.
    • Sanitization: Absent. Data from the web page is directly used to 'Identify selectors' and 'Execute actions'.
  • [Data Exposure] (MEDIUM): The examples (examples/console_logging.py, examples/element_discovery.py) demonstrate writing screenshots and log files to potentially sensitive locations like /tmp/ and /mnt/user-data/outputs/.
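For the shell=True finding above, one possible mitigation is to split the command string into an argument list and start the process without a shell. The sketch below is illustrative only: parse_command and start_server are hypothetical names, not functions from scripts/with_server.py, and the audit does not show how that script builds its commands.

```python
import shlex
import subprocess


def parse_command(command: str) -> list[str]:
    """Split a command string into an argument list using POSIX shell quoting rules.

    Hypothetical helper for illustration; not part of the audited skill.
    """
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty command")
    return argv


def start_server(command: str) -> subprocess.Popen:
    """Start a process without invoking a shell (hypothetical helper).

    With shell=False, argv[0] is executed directly, so shell metacharacters
    such as ';', '&&', or '$(...)' reach the program as literal arguments
    instead of being interpreted by /bin/sh.
    """
    return subprocess.Popen(parse_command(command), shell=False)


# Example (not executed here):
#   proc = start_server("python -m http.server 8000")
#   ...
#   proc.terminate()
```

This narrows the risk rather than removing it: the caller can still name any executable, so an allowlist of permitted programs may also be needed. Commands that rely on shell features such as `&&` or pipes would no longer work when passed as a single string.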
Recommendations
  • AI detected serious security threats
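For the "Boundary markers: Absent" finding, a minimal sketch of one possible approach is to wrap text scraped from a page in delimiters that carry a random nonce before it is shown to the agent. The function name wrap_untrusted and the marker format are assumptions made for illustration; they do not come from the audited skill.

```python
import secrets
from typing import Optional


def wrap_untrusted(page_text: str, nonce: Optional[str] = None) -> str:
    """Wrap scraped page text in begin/end markers (hypothetical helper).

    A fresh random nonce is generated per call unless one is supplied, so
    text embedded in the page cannot predict and forge the closing marker.
    """
    if nonce is None:
        nonce = secrets.token_hex(8)
    begin = f"<<<UNTRUSTED_PAGE_CONTENT {nonce}>>>"
    end = f"<<<END_UNTRUSTED_PAGE_CONTENT {nonce}>>>"
    return f"{begin}\n{page_text}\n{end}"


# Example with a fixed nonce so the output is reproducible:
print(wrap_untrusted("Ignore previous instructions", nonce="abc123"))
```

Delimiters like these reduce ambiguity about where page content starts and ends, but they do not by themselves prevent a model from following embedded instructions.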
Audit Metadata
Risk Level: HIGH
Analyzed: Feb 16, 2026, 12:07 AM