webapp_testing

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
  • COMMAND_EXECUTION (HIGH): The script scripts/with_server.py utilizes subprocess.Popen with shell=True to execute strings provided via the --server command-line argument. It also uses subprocess.run to execute the trailing command sequence. This allows for arbitrary shell command execution with the privileges of the agent process. Evidence: scripts/with_server.py lines 86-90 and line 105.
  • REMOTE_CODE_EXECUTION (HIGH): The skill is designed to have the AI agent 'write native Python Playwright scripts' and execute them at runtime. This dynamic code generation and execution, when paired with the system-level command execution capabilities of the provided scripts, constitutes a remote code execution risk. Evidence: SKILL.md instruction block.
  • PROMPT_INJECTION (HIGH): The skill is highly vulnerable to Indirect Prompt Injection (Category 8). It lacks sanitization and boundary markers for external data.
  • Ingestion points: Untrusted data enters the agent context via page.goto() in examples/console_logging.py, examples/element_discovery.py, and examples/static_html_automation.py (which supports local file:// URLs).
  • Boundary markers: No delimiters or instructions to ignore embedded commands are present in the prompts.
  • Capability inventory: The skill has the capability to execute shell commands (subprocess.Popen in with_server.py), write files (open().write() in examples/console_logging.py), and capture system state (screenshots).
  • Sanitization: No escaping or validation of web content is performed before processing.
  • PROMPT_INJECTION (MEDIUM): The SKILL.md file contains a deceptive instruction: 'DO NOT read the source until you try running the script first'. This pattern attempts to discourage the agent or an auditor from inspecting the underlying code, potentially hiding malicious behavior in the large scripts it describes. Evidence: SKILL.md line 14.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 02:28 AM