webapp-testing
Fail
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
- COMMAND_EXECUTION (HIGH): The script 'scripts/with_server.py' uses 'subprocess.Popen' with 'shell=True' to execute strings passed to the '--server' argument. This allows for arbitrary shell command execution on the host system.
- REMOTE_CODE_EXECUTION (HIGH): The skill's primary workflow involves the agent generating and executing Python Playwright scripts. Combined with the command execution capabilities in 'with_server.py', this creates a significant surface for executing arbitrary code.
- PROMPT_INJECTION (MEDIUM): 'SKILL.md' contains an instruction ('DO NOT read the source until you try running the script first') that explicitly directs the agent to avoid inspecting the implementation of its tools. This behaves as a bypass marker to prevent the agent from identifying insecure code like the 'shell=True' usage.
- PROMPT_INJECTION (LOW): The skill is vulnerable to Indirect Prompt Injection (Category 8).
- Ingestion points: Browser content via 'page.content()', 'page.locator()', and screenshots in 'element_discovery.py'.
- Boundary markers: Absent. There are no instructions to the agent to treat the content of the web applications it tests as untrusted.
- Capability inventory: The agent has the capability to execute shell commands ('with_server.py') and write files ('console_logging.py').
- Sanitization: Absent. The agent is encouraged to use discovered content to determine its next actions without any validation or escaping.
Recommendations
- AI detected serious security threats
Audit Metadata