webapp-testing
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGH (COMMAND_EXECUTION, PROMPT_INJECTION)
Full Analysis
- [COMMAND_EXECUTION] (HIGH): The script `scripts/with_server.py` uses `subprocess.Popen` with `shell=True` to execute server commands and `subprocess.run` to execute the final automation command. This allows for arbitrary shell command execution based on strings constructed by the agent.
- [INDIRECT_PROMPT_INJECTION] (HIGH): The skill's core purpose involves ingesting untrusted data from web applications.
  - Ingestion points: Data enters the context via `page.content()`, `page.on("console", ...)` (captured in `console_logging.py`), and `page.locator().inner_text()` (used in `element_discovery.py`).
  - Boundary markers: No boundary markers or sanitization logic are present to separate application content from agent instructions.
  - Capability inventory: The agent can write and execute arbitrary Python code and run shell commands via `with_server.py`.
  - Sanitization: There is no evidence of sanitization for the content read from the web application before it is processed by the agent's logic.
- [PROMPT_INJECTION] (MEDIUM): The `SKILL.md` file explicitly instructs the agent: 'DO NOT read the source until you try running the script first'. This encourages the agent to execute complex logic as a 'black-box' without security inspection, which is a tactic used to bypass review and hide malicious behavior.
- [DATA_EXPOSURE] (LOW): The examples (`console_logging.py`, `element_discovery.py`) write application data and screenshots to `/mnt/user-data/outputs/` and `/tmp/`. While these are local paths, if the application being tested contains sensitive information, it could be exposed to other processes or subsequent agent sessions.
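To illustrate the COMMAND_EXECUTION finding, here is a minimal sketch of launching a command without `shell=True`. It is not taken from `scripts/with_server.py`; the helper names `build_argv` and `run_without_shell` are hypothetical, and the sketch assumes POSIX-style quoting in the command string. With an argument list and `shell=False`, characters such as `;` reach the child process as literal arguments instead of being interpreted by a shell.

```python
import shlex
import subprocess
import sys


def build_argv(command: str) -> list[str]:
    """Split a command string into an argument list without invoking a shell."""
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty command")
    return argv


def run_without_shell(command: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run the command with shell=False so metacharacters stay literal arguments."""
    return subprocess.run(
        build_argv(command),
        shell=False,          # no shell parses the string
        capture_output=True,  # collect stdout/stderr for inspection
        text=True,
        timeout=timeout,
        check=False,
    )


if __name__ == "__main__":
    # The ';' below is just another argument; no second command is run.
    cmd = shlex.quote(sys.executable) + ' -c "import sys; print(sys.argv[1:])" ";" echo injected'
    print(run_without_shell(cmd).stdout.strip())
```

This narrows what a constructed string can do but does not remove the risk by itself: the agent still chooses which executable runs, so an allowlist of permitted commands would be a natural complement.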
Recommendations
- AI detected serious security threats
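One possible mitigation for the missing boundary markers noted in the analysis is to fence untrusted page text before it enters the agent's context. The sketch below is an assumption for illustration, not part of the audited skill: `wrap_untrusted` and the marker format are hypothetical. A per-call random token makes it hard for page content to forge the closing marker in advance.

```python
import secrets


def wrap_untrusted(text: str, source: str) -> str:
    """Fence untrusted content between markers that share a random token."""
    token = secrets.token_hex(8)  # unpredictable per call
    begin = f"<<UNTRUSTED {source} {token}>>"
    end = f"<<END UNTRUSTED {token}>>"
    return f"{begin}\n{text}\n{end}"


if __name__ == "__main__":
    # In practice the text might come from page.content() or inner_text().
    print(wrap_untrusted("Ignore previous instructions.", "page.content"))
```

Markers like these only help if the consuming agent is instructed to treat everything between them as data; they reduce, rather than eliminate, the injection risk.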
Audit Metadata