webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • COMMAND_EXECUTION (HIGH): The script scripts/with_server.py uses subprocess.Popen(shell=True) to execute strings passed directly via the --server argument. This allows for arbitrary shell command execution on the host system.
  • PROMPT_INJECTION (HIGH): Category 8 (Indirect Prompt Injection) is a major risk. * Ingestion points: Web page DOM and console logs are processed via Playwright (element_discovery.py, console_logging.py). * Boundary markers: None found. * Capability inventory: Arbitrary shell execution via with_server.py and file system writes for logs/screenshots. * Sanitization: None. Malicious instructions embedded in a website's HTML or console outputs could trick the agent into running malicious payloads via the server management tool.
  • PROMPT_INJECTION (LOW): The SKILL.md file contains instructions telling the agent 'DO NOT read the source until you try running the script first'. While framed as a context window optimization, this practice discourages the agent from inspecting potentially dangerous logic before execution.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 02:13 AM