webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTION
Full Analysis
  • [COMMAND_EXECUTION] (HIGH): The skill facilitates the execution of arbitrary shell commands via the scripts/with_server.py helper (e.g., passing npm run dev or other shell commands to the --server flag).
  • [REMOTE_CODE_EXECUTION] (HIGH): The core functionality relies on the agent generating and executing custom Python scripts (your_automation.py) using the Playwright library.
  • [INDIRECT_PROMPT_INJECTION] (HIGH): There is a significant vulnerability surface where the agent processes untrusted data from web applications.
  • Ingestion points: Untrusted data enters the agent context through page.content() and browser interaction in Playwright scripts (SKILL.md).
  • Boundary markers: None identified. There are no instructions to the agent to treat page content as untrusted data or use delimiters.
  • Capability inventory: The skill allows arbitrary subprocess execution (via with_server.py), file writing (via page.screenshot), and execution of generated Python code.
  • Sanitization: No evidence of sanitization or filtering of the web content before processing or using it to drive further actions.
  • [PROMPT_INJECTION] (LOW): The instruction 'DO NOT read the source until you try running the script first' attempts to influence the agent's reasoning process but appears intended for token efficiency rather than malicious bypass.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 17, 2026, 12:05 AM