webapp-testing
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
- [Command Execution / RCE] (HIGH): The script
scripts/with_server.pyusessubprocess.Popenwithshell=Trueto execute arbitrary strings provided as arguments. This allows for arbitrary command execution on the host system. - [Prompt Injection / Obfuscation] (HIGH):
SKILL.mdcontains multiple instructions to the agent to treat scripts as 'black boxes' and explicitly tells the agent: 'DO NOT read the source until you try running the script first'. This is a deceptive pattern designed to prevent the AI from identifying the dangerousshell=Truecalls or other malicious logic within the scripts before execution. - [Indirect Prompt Injection] (HIGH): The skill is designed to process untrusted data from web applications (via Playwright).
- Ingestion points:
page.content(),page.locator().inner_text(), and DOM inspection logic inSKILL.mdandexamples/element_discovery.py. - Boundary markers: Absent. There are no instructions or delimiters to help the AI distinguish between the testing task and instructions embedded in the web page.
- Capability inventory: High-privilege access to the shell via
scripts/with_server.py(subprocess calls) and network access via Playwright. - Sanitization: Absent. Data from the web page is directly used to 'Identify selectors' and 'Execute actions'.
- [Data Exposure] (MEDIUM): The examples (
examples/console_logging.py,examples/element_discovery.py) demonstrate writing screenshots and log files to potentially sensitive locations like/tmp/and/mnt/user-data/outputs/.
Recommendations
- AI detected serious security threats
Audit Metadata