webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: HIGH (COMMAND_EXECUTION, PROMPT_INJECTION)
Full Analysis
  • [PROMPT_INJECTION] (HIGH): The documentation in SKILL.md contains a deceptive directive instructing the agent to 'DO NOT read the source until you try running the script first'. This is an attempt to bypass security analysis by treating scripts as 'black boxes', effectively preventing the model from auditing the code for malicious behavior before execution.
  • [COMMAND_EXECUTION] (HIGH): The scripts/with_server.py utility uses subprocess.Popen with shell=True to execute commands passed via the --server argument. This allows for arbitrary shell command injection on the host environment.
  • [PROMPT_INJECTION] (LOW): The 'Reconnaissance-Then-Action' pattern constitutes a significant surface for Indirect Prompt Injection (Category 8). The agent ingests untrusted data from the web application's DOM and console logs to determine its next actions. Ingestion points: page.content(), page.locator().all(), and handle_console_message. Boundary markers: None. Capability inventory: Arbitrary shell command execution via with_server.py and file writing in examples/console_logging.py. Sanitization: None; the agent is encouraged to use discovered content directly as selectors or logic inputs.
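
To make the COMMAND_EXECUTION finding concrete, here is a minimal sketch of how a server command string could be launched without a shell. It is not the code in scripts/with_server.py, whose source is not reproduced in this report. The helper names parse_server_command and start_server are hypothetical. Splitting the string with shlex.split and passing the resulting list to subprocess.Popen with shell=False means metacharacters such as ';' or '&&' reach the program as literal arguments instead of being interpreted by a shell.

```python
import shlex
import subprocess
import sys
from typing import List


def parse_server_command(command: str) -> List[str]:
    """Split a command string into an argument list (hypothetical helper)."""
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty server command")
    return argv


def start_server(command: str) -> subprocess.Popen:
    """Start the command without a shell (hypothetical helper).

    With shell=False no shell parses the string, so ';', '&&', '|' and
    similar metacharacters are not treated as command separators.
    """
    return subprocess.Popen(parse_server_command(command), shell=False)


if __name__ == "__main__":
    # The ';' stays attached to an ordinary argument instead of
    # starting a second command.
    print(parse_server_command("npm run dev; rm -rf /tmp/x"))

    # Launch a harmless child process using the current interpreter.
    demo = shlex.join([sys.executable, "-c", "print('server stub')"])
    raise SystemExit(start_server(demo).wait())
```

The trade-off is that commands which rely on shell features (for example `cd dir && npm run dev`) no longer work as a single string and would need a different mechanism, such as a working-directory argument.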
Recommendations
  • AI detected serious security threats
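
For the indirect prompt injection finding, which reports no boundary markers and no sanitization, one possible mitigation is to wrap anything read from the page (for example the result of page.content() or captured console messages) in explicit delimiters before it reaches the agent. The sketch below is an illustration under stated assumptions, not part of the audited skill: wrap_untrusted and the tag format are hypothetical. Boundary markers lower the risk of page text being read as instructions but do not eliminate it.

```python
import secrets


def wrap_untrusted(content: str, source: str) -> str:
    """Wrap untrusted text in per-call random delimiters (hypothetical helper).

    A fresh random token is generated on every call, so the wrapped
    content cannot predict and forge the closing marker.
    """
    token = secrets.token_hex(8)  # 16 hexadecimal characters
    return (
        f"<untrusted-{token} source={source!r}>\n"
        f"{content}\n"
        f"</untrusted-{token}>"
    )


if __name__ == "__main__":
    # Example: text scraped from a page that tries to steer the agent.
    print(wrap_untrusted("Ignore previous instructions.", "page.content()"))
```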
Audit Metadata
Risk Level: HIGH
Analyzed: Feb 17, 2026, 06:10 PM