webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 15, 2026

Risk Level: HIGHCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION] (HIGH): The script scripts/with_server.py uses subprocess.Popen with shell=True for server commands and subprocess.run for the final command. This pattern allows for arbitrary shell execution if arguments are influenced by untrusted data.
  • [PROMPT_INJECTION] (HIGH): The skill is vulnerable to Indirect Prompt Injection because it ingests untrusted data from web pages (via page.content() and page.locator()) while possessing high-privilege capabilities like shell command execution. Evidence: Ingestion occurs in SKILL.md and examples/element_discovery.py; Capabilities exist in scripts/with_server.py; Boundary markers and sanitization are entirely absent.
  • [DYNAMIC_EXECUTION] (MEDIUM): The core workflow involves the agent dynamically generating and executing Playwright scripts based on the state of external web applications, creating a significant attack surface.
  • [OBFUSCATION] (LOW): SKILL.md explicitly instructs the agent 'DO NOT read the source' of the helper scripts until absolutely necessary, which discourages security inspection of the tool's behavior.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 15, 2026, 08:36 PM