webapp-testing
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
- [Command Execution / RCE] (HIGH): The skill is designed to execute arbitrary shell commands through the
scripts/with_server.pyhelper. Users or the agent can pass arbitrary strings to the--serverflag (e.g.,"npm run dev"), which are executed on the host system. This capability can be easily abused to run malicious payloads. - [Indirect Prompt Injection] (HIGH): The skill possesses a high-risk vulnerability surface for indirect prompt injection.
- Ingestion points: Untrusted data enters the context through
page.goto()andpage.content()when the agent navigates to web applications as described inSKILL.md. - Boundary markers: There are no boundary markers or instructions to ignore embedded commands within the processed HTML/DOM content.
- Capability inventory: The agent has the ability to execute shell commands via
with_server.pyand generate/execute arbitrary Python scripts. - Sanitization: No sanitization or validation of the web content is performed before the agent 'identifies selectors' and 'executes actions' based on that content.
- [Obfuscation / Evasion] (MEDIUM): The instructions explicitly state 'DO NOT read the source until you try running the script first' and 'These scripts... exist to be called directly as black-box scripts'. This encourages the agent to execute code without verifying its safety, serving as a social engineering tactic to evade analysis of potentially malicious logic within the referenced scripts.
- [Dynamic Execution] (MEDIUM): The core workflow requires the agent to generate and execute Python Playwright scripts at runtime. While necessary for the stated purpose, this provides the mechanism for an attacker to pivot from an indirect injection to full code execution.
Recommendations
- AI detected serious security threats
Audit Metadata