webapp-testing
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGHCOMMAND_EXECUTIONPROMPT_INJECTIONREMOTE_CODE_EXECUTION
Full Analysis
- [COMMAND_EXECUTION] (HIGH): The script
scripts/with_server.pyusessubprocess.Popen(..., shell=True)to execute commands passed via the--serverargument. This pattern is highly susceptible to command injection if the arguments are derived from untrusted or unvalidated input. - [PROMPT_INJECTION] (HIGH): The skill exhibits a significant Indirect Prompt Injection surface (Category 8). It is designed to navigate to external/local web applications and ingest their rendered content, console logs, and DOM structure.
- Ingestion points: Web content is ingested via
page.content(),page.locator().all(), and screenshots as described inSKILL.mdandexamples/element_discovery.py. - Boundary markers: There are no instructions or delimiters defined to help the agent distinguish between its own instructions and potentially malicious instructions embedded in the HTML/JS of the application being tested.
- Capability inventory: The agent has the capability to execute shell commands (via
with_server.py) and perform file system operations (saving screenshots and logs inexamples/console_logging.py). - Sanitization: No sanitization or filtering of the external web content is performed before the agent uses it to 'Identify selectors' and 'Execute actions'.
- [REMOTE_CODE_EXECUTION] (MEDIUM): The skill's core workflow relies on the agent dynamically writing and executing native Python scripts. While this is the intended use case, the lack of isolation between the untrusted data ingestion (web scraping) and the code execution environment creates a path for an attacker to achieve code execution by poisoning the web pages the agent is instructed to test.
Recommendations
- AI detected serious security threats
Audit Metadata