webapp-testing
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION] (HIGH): The skill utilizes a helper script
scripts/with_server.pythat takes arbitrary strings as shell commands (e.g.,--server 'npm run dev'). This allows for arbitrary command execution if the input strings are influenced by untrusted sources. - [REMOTE_CODE_EXECUTION / DYNAMIC_EXECUTION] (HIGH): The core workflow requires the agent to 'write native Python Playwright scripts' and then execute them. While intended for testing, this capability allows for the creation and execution of any arbitrary Python code on the host system.
- [PROMPT_INJECTION] (HIGH): The skill is highly vulnerable to Indirect Prompt Injection (Category 8).
- Ingestion points: Data is ingested via
page.content(),page.goto(), andpage.locator()from potentially untrusted local or remote web applications. - Boundary markers: No boundary markers or 'ignore' instructions are provided to help the agent distinguish between application data and instructions.
- Capability inventory: The agent has full shell access via the server helper and the ability to execute generated Python scripts.
- Sanitization: There is no evidence of sanitization or validation of the content retrieved from the browser before it is used to influence the agent's next steps.
- [METADATA_POISONING] (MEDIUM): The instructions explicitly state: 'DO NOT read the source until you try running the script first.' This is a deceptive pattern that encourages the agent to bypass its own reasoning/safety checks and execute potentially malicious code blindly.
Recommendations
- AI detected serious security threats
Audit Metadata