webapp-testing
Fail
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: HIGH (COMMAND_EXECUTION, REMOTE_CODE_EXECUTION)
Full Analysis
- [COMMAND_EXECUTION] (HIGH): The skill facilitates the execution of arbitrary shell commands via the `scripts/with_server.py` helper (e.g., passing `npm run dev` or other shell commands to the `--server` flag).
- [REMOTE_CODE_EXECUTION] (HIGH): The core functionality relies on the agent generating and executing custom Python scripts (`your_automation.py`) using the Playwright library.
- [INDIRECT_PROMPT_INJECTION] (HIGH): There is a significant vulnerability surface where the agent processes untrusted data from web applications.
  - Ingestion points: Untrusted data enters the agent context through `page.content()` and browser interaction in Playwright scripts (SKILL.md).
  - Boundary markers: None identified. There are no instructions to the agent to treat page content as untrusted data or use delimiters.
  - Capability inventory: The skill allows arbitrary subprocess execution (via `with_server.py`), file writing (via `page.screenshot`), and execution of generated Python code.
  - Sanitization: No evidence of sanitization or filtering of the web content before processing or using it to drive further actions.
- [PROMPT_INJECTION] (LOW): The instruction 'DO NOT read the source until you try running the script first' attempts to influence the agent's reasoning process but appears intended for token efficiency rather than malicious bypass.
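The "Boundary markers" and "Sanitization" findings above can be illustrated with a minimal sketch of what delimiting untrusted page content might look like. The helper name `wrap_untrusted`, the marker strings, and the `max_chars` limit are all hypothetical; none of them come from the audited skill, and this is one possible approach rather than a complete defense.

```python
import secrets


def wrap_untrusted(page_html: str, max_chars: int = 20_000) -> str:
    """Wrap page text in boundary markers so it can be labeled as data.

    Hypothetical helper: the marker format and size limit are assumptions.
    """
    # A random token makes it hard for page content to forge the end marker.
    token = secrets.token_hex(8)
    # Truncate so an oversized page cannot flood the agent's context.
    body = page_html[:max_chars]
    return (
        f"<<UNTRUSTED_PAGE_CONTENT {token}>>\n"
        f"{body}\n"
        f"<<END_UNTRUSTED_PAGE_CONTENT {token}>>"
    )
```

In a Playwright script this could be applied as `wrap_untrusted(page.content())` before the text is handed back to the agent. Markers alone do not neutralize injected instructions; they only give the agent an explicit boundary to reason about.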
Recommendations
- AI detected serious security threats
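As one illustration of how the COMMAND_EXECUTION finding might be narrowed, the sketch below validates a server command string before it is run. It is not taken from `with_server.py`, whose implementation is not shown in this audit; the function name `parse_server_command` and the allow-list contents are assumptions chosen for the example.

```python
import shlex

# Hypothetical allow-list of executables a server command may start with.
ALLOWED_EXECUTABLES = {"npm", "node", "python"}

# Characters that would let a single string chain or redirect shell commands.
SHELL_METACHARACTERS = set(";&|`$<>\n")


def parse_server_command(command: str) -> list[str]:
    """Split a command string into argv, rejecting shell syntax.

    Raises ValueError if the command contains shell metacharacters or
    does not start with an allow-listed executable.
    """
    if any(ch in SHELL_METACHARACTERS for ch in command):
        raise ValueError("shell metacharacters are not allowed")
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_EXECUTABLES:
        raise ValueError("executable is not on the allow-list")
    return argv
```

The returned list could then be passed to `subprocess.Popen(argv)` without `shell=True`, so the string is never interpreted by a shell. An allow-listed executable such as `npm` can still run arbitrary project scripts, so this reduces the attack surface rather than eliminating it.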
Audit Metadata