webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTION
Full Analysis
  • COMMAND_EXECUTION (HIGH): The script scripts/with_server.py uses subprocess.Popen(server['cmd'], shell=True) and subprocess.run(args.command). Because these arguments are passed directly from the command line, any untrusted input processed by the agent and passed to this script could lead to arbitrary command execution on the host system.
  • PROMPT_INJECTION (MEDIUM): SKILL.md contains an instruction: 'DO NOT read the source until you try running the script first... They exist to be called directly as black-box scripts rather than ingested into your context window.' While framed as a context-window optimization, this discourages the agent from performing security self-inspection on scripts that contain high-risk shell execution logic.
  • INDIRECT_PROMPT_INJECTION (LOW): The skill is designed to interact with and test web applications using Playwright.
  • Ingestion points: page.goto(url) in examples/console_logging.py and examples/element_discovery.py ingest external, potentially attacker-controlled web content.
  • Boundary markers: Absent. There are no delimiters or instructions to ignore embedded commands in the processed HTML.
  • Capability inventory: The agent has access to subprocess.Popen (via with_server.py), file writing (via console_logging.py), and browser interaction (page.click, page.fill).
  • Sanitization: None. The content is processed directly to identify selectors and extract text.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 17, 2026, 06:22 PM