webapp-testing
Fail
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: HIGH (COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION)
Full Analysis
- [COMMAND_EXECUTION] (HIGH): The script scripts/with_server.py uses subprocess.Popen with shell=True to start servers and subprocess.run to execute automation commands. These commands are constructed from command-line arguments that can be influenced by the agent. Evidence: subprocess.Popen(server['cmd'], shell=True) and subprocess.run(args.command) in scripts/with_server.py.
- [REMOTE_CODE_EXECUTION] (MEDIUM): The skill is designed around the agent generating and executing dynamic Python code via Playwright to interact with web applications. Evidence: Procedural instructions in SKILL.md and multiple files in the examples/ directory.
- [PROMPT_INJECTION] (LOW): The skill is vulnerable to indirect prompt injection because it ingests untrusted content from web applications (HTML content and console logs) and includes it in the agent's context without sanitization or boundary markers. Ingestion points: page.content() and page.on('console', ...) in example scripts.
- [COMMAND_EXECUTION] (MEDIUM): Deceptive instructions are present in SKILL.md, which contains an instruction telling the agent 'DO NOT read the source until you try running the script first'. This instruction discourages the agent from identifying the unsafe use of shell=True in the helper scripts prior to execution.
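To illustrate the first COMMAND_EXECUTION finding, the sketch below shows one way a server command could be launched without shell=True. It is not the code in scripts/with_server.py; the helper names to_argv and start_server are hypothetical, and it assumes the server command is a plain program invocation that needs no shell features such as pipes, redirection, or `cd ... &&` chaining.

```python
import shlex
import subprocess


def to_argv(cmd: str) -> list[str]:
    """Split a command string into an argument list without using a shell.

    Hypothetical helper: shell metacharacters such as ';' or '&&' are kept
    as literal text inside arguments instead of being interpreted.
    """
    argv = shlex.split(cmd)
    if not argv:
        raise ValueError("empty command")
    return argv


def start_server(cmd: str) -> subprocess.Popen:
    """Start a server process from a command string with shell=False.

    Hypothetical helper: Popen receives a list, so no shell is spawned and
    the first element is executed directly as the program.
    """
    return subprocess.Popen(to_argv(cmd))
```

With this approach an input such as `echo hi; rm -rf /` is passed to `echo` as literal arguments, and the `rm` is never run as a second command. The trade-off is that commands which genuinely rely on shell syntax would no longer work and would need to be expressed another way, for example through the `cwd` parameter of subprocess.Popen.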
Recommendations
- AI detected serious security threats
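For the PROMPT_INJECTION finding, one possible mitigation is to wrap scraped page content in explicit boundary markers before it reaches the agent's context. The sketch below is an assumption about how that might look, not part of the audited skill: the function name wrap_untrusted and the marker format are hypothetical, and markers of this kind reduce the risk of indirect injection rather than eliminate it.

```python
import secrets


def wrap_untrusted(text: str, source: str) -> str:
    """Surround untrusted text with boundary markers that share a random tag.

    Hypothetical helper: the tag is generated per call, so content that was
    fetched before the call cannot contain a matching end marker.
    """
    tag = secrets.token_hex(8)  # 16 hex characters
    return (
        f"<<UNTRUSTED {source} {tag}>>\n"
        f"{text}\n"
        f"<<END UNTRUSTED {tag}>>"
    )
```

In a Playwright script this might be applied as `wrap_untrusted(page.content(), "page.content()")`, and likewise to the text of console messages, with the agent instructed to treat everything between the markers as data rather than instructions.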
Audit Metadata