webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION] (HIGH): The script scripts/with_server.py uses subprocess.Popen(server['cmd'], shell=True) where the command is derived directly from the --server argument. This allows for arbitrary command injection if the input to the script is not strictly controlled.
  • [REMOTE_CODE_EXECUTION] (HIGH): The skill's primary function is to have the agent write and execute new Python scripts based on external web content. This 'generate-and-execute' loop is a high-risk capability that can be hijacked by malicious data.
  • [PROMPT_INJECTION] (HIGH): The instructions in SKILL.md use a deceptive pattern by explicitly telling the agent: 'Do not read the source until you know a customized solution is absolutely necessary... These scripts... are there to be called directly as black box scripts.' This discourages the AI from performing safety self-inspections of the code it executes.
  • [PROMPT_INJECTION] (HIGH): Category 8: Indirect Prompt Injection Vulnerability.
  • Ingestion points: element_discovery.py and console_logging.py ingest untrusted data via page.content(), page.on("console"), and button.inner_text().
  • Boundary markers: None. The skill does not provide any delimiters or instructions to ignore commands found within the web pages being tested.
  • Capability inventory: The agent is authorized to use subprocess.run (via with_server.py) and write/execute arbitrary Playwright scripts on the local system.
  • Sanitization: None. The agent is encouraged to use discovered selectors and content directly to formulate subsequent automation logic.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 09:29 AM