webapp-testing
Fail
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: HIGH (COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION, DATA_EXFILTRATION)
Full Analysis
- [COMMAND_EXECUTION] (HIGH): The skill provides a framework for executing arbitrary shell commands via the scripts/with_server.py helper. Specifically, the --server argument accepts strings that are likely executed via a shell (e.g., "npm run dev", "cd backend && python server.py"), enabling an attacker to run arbitrary commands if the user task is manipulated.
- [REMOTE_CODE_EXECUTION] (HIGH): The skill instructs the agent to write and execute native Python Playwright scripts at runtime based on user-provided tasks. Executing generated code based on untrusted input is a classic RCE vector.
- [PROMPT_INJECTION] (MEDIUM): The skill contains a deceptive instruction: "DO NOT read the source until you try running the script first". This is a 'black box' technique intended to bypass the agent's safety reasoning or inspection of executable code before runtime, effectively overriding standard safety protocols.
- [DATA_EXFILTRATION] (LOW): Playwright scripts can read local files (via the file:// protocol) and perform network requests. Combined with the ability to generate scripts from untrusted user input, this creates a data exfiltration path.
- [INDIRECT_PROMPT_INJECTION] (LOW): The skill has a significant attack surface because it ingests 'User tasks' and interpolates them into command-line arguments and script logic.
  - Ingestion points: User tasks (SKILL.md decision tree).
  - Boundary markers: Absent; there are no delimiters around the ingested data and no instruction to disregard directives embedded in it.
  - Capability inventory: Subprocess execution (via with_server.py), file system access, and network operations (via Playwright).
  - Sanitization: Absent; the skill does not suggest any validation or escaping of the user input before it is used in executable contexts.
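To make the COMMAND_EXECUTION and sanitization findings concrete, the sketch below shows one way a server command string could be validated and split into an argument list before it is launched. It is not the code in with_server.py, which this audit does not reproduce. The function name parse_server_command and the ALLOWED_EXECUTABLES set are hypothetical and chosen only for illustration.

```python
import shlex

# Hypothetical allowlist of executables a server command may start with.
ALLOWED_EXECUTABLES = {"npm", "node", "python"}

# Characters that shlex treats as shell punctuation when punctuation_chars=True.
SHELL_PUNCTUATION = set("();<>|&")


def parse_server_command(command: str) -> list[str]:
    """Split a command string into argv, rejecting shell operators.

    Conservative by design: a quoted argument made up only of
    punctuation characters (for example '&&') is rejected as well.
    """
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # split on whitespace and punctuation only
    lexer.commenters = ""          # do not silently drop text after '#'
    argv = list(lexer)

    if not argv:
        raise ValueError("empty command")
    for token in argv:
        if set(token) <= SHELL_PUNCTUATION:
            raise ValueError(f"shell operator {token!r} is not allowed")
    if argv[0] not in ALLOWED_EXECUTABLES:
        raise ValueError(f"executable {argv[0]!r} is not on the allowlist")
    return argv


if __name__ == "__main__":
    print(parse_server_command("npm run dev"))
    try:
        parse_server_command("cd backend && python server.py")
    except ValueError as exc:
        print("rejected:", exc)
```

The returned list could be passed to subprocess.Popen with shell left at its default of False, so that operators such as && or ; are never interpreted. An allowlist of this kind narrows the attack surface but does not remove it: an allowed interpreter such as python can still run arbitrary code through its own arguments.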
Recommendations
- AI detected serious security threats
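The DATA_EXFILTRATION finding above turns on generated scripts being free to open file:// URLs and contact arbitrary hosts. One possible guard, sketched here under assumptions, is to check each navigation target against an allowlist of schemes and hosts before the script uses it. The names is_allowed_target, ALLOWED_SCHEMES, and ALLOWED_HOSTS are hypothetical and are not part of the audited skill.

```python
from urllib.parse import urlsplit

# Hypothetical policy: only local web servers over HTTP(S) may be visited.
ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_HOSTS = {"localhost", "127.0.0.1"}


def is_allowed_target(url: str) -> bool:
    """Return True only for http(s) URLs that point at an allowed host."""
    parts = urlsplit(url)
    return parts.scheme in ALLOWED_SCHEMES and parts.hostname in ALLOWED_HOSTS


if __name__ == "__main__":
    for candidate in (
        "http://localhost:5173/",
        "file:///etc/passwd",
        "https://example.com/upload",
    ):
        print(candidate, is_allowed_target(candidate))
```

A generated script could call such a check before page.goto(url). It covers only the URLs the script passes explicitly; requests issued by the loaded page itself would need separate handling.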
Audit Metadata