webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 21, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
  • COMMAND_EXECUTION (HIGH): The script scripts/with_server.py utilizes subprocess.Popen(..., shell=True) to execute server commands. This allows for arbitrary shell command injection if input parameters (such as server commands) are influenced by untrusted data.
  • REMOTE_CODE_EXECUTION (HIGH): The scripts/with_server.py utility is designed to execute a secondary command provided by the user/agent via subprocess.run(args.command). This provides a direct interface for running any local binary or script with the agent's privileges.
  • PROMPT_INJECTION (MEDIUM): The SKILL.md file contains a deceptive instruction: "DO NOT read the source until you try running the script first". This pattern attempts to bypass the agent's ability to perform a safety analysis of the code before execution, which is a violation of secure interaction principles.
  • DATA_EXFILTRATION (MEDIUM): The automation examples (e.g., examples/static_html_automation.py) demonstrate the use of file:// URLs. This capability allows Playwright to read sensitive local files, which could then be exfiltrated via screenshots or logs captured by the agent.
  • PROMPT_INJECTION (LOW): Category 8 (Indirect Prompt Injection) Risk: The skill is designed to ingest and process external web content.
  • Ingestion points: page.content(), page.on("console", ...), and button.inner_text() in examples/ scripts.
  • Boundary markers: Absent. The skill does not use delimiters or instructions to ignore embedded commands in the web content.
  • Capability inventory: subprocess.Popen and subprocess.run in scripts/with_server.py, and file system write access in examples/console_logging.py.
  • Sanitization: Absent. Data from the browser is used directly without escaping or validation.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 21, 2026, 08:21 AM