webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 21, 2026

Risk Level: HIGH (COMMAND_EXECUTION, PROMPT_INJECTION)
Full Analysis
  • COMMAND_EXECUTION (HIGH): The file scripts/with_server.py calls subprocess.Popen with shell=True on strings supplied through the --server command-line argument. Because the string is interpreted by a shell, operators such as ;, &&, and | are honored, so anyone who controls that argument can execute arbitrary shell commands unless the input is strictly controlled.
  • Metadata Poisoning / Deception (MEDIUM): The SKILL.md file contains explicit instructions for the AI: 'Do not read the source code... in your context window. They exist to be called directly as blackbox scripts.' This is a deceptive pattern designed to prevent the AI agent from auditing the dangerous subprocess calls within the scripts, effectively blinding the agent to the security risks it is being asked to invoke.
  • Indirect Prompt Injection (LOW): The skill interacts with web pages via Playwright (page.goto, page.content), creating a surface for indirect prompt injection where a malicious website could attempt to control the agent's behavior.
      • Ingestion points: page.goto() and page.content() in examples/element_discovery.py, examples/console_logging.py, and examples/static_html_automation.py.
      • Boundary markers: Absent; there are no instructions to the agent to treat page content as untrusted data.
      • Capability inventory: Arbitrary shell command execution via scripts/with_server.py (subprocess.Popen and subprocess.run).
      • Sanitization: Absent; the content is retrieved and processed as raw strings/DOM elements.
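Two of the findings above, the shell=True invocation and the missing boundary markers, can be illustrated with a minimal Python sketch. It is not the code in scripts/with_server.py and not a fix proposed by the audit. The function names (parse_server_command, start_server, wrap_untrusted) and the delimiter strings are hypothetical, chosen only for illustration. The sketch assumes the server command can be expressed as a single program plus arguments.

```python
import shlex
import subprocess

# Hypothetical delimiters for this sketch; the audited skill defines no such markers.
BEGIN_MARK = "<<<UNTRUSTED_PAGE_CONTENT>>>"
END_MARK = "<<<END_UNTRUSTED_PAGE_CONTENT>>>"


def parse_server_command(command: str) -> list[str]:
    """Split a --server style string into an argument list (no shell involved)."""
    args = shlex.split(command)
    if not args:
        raise ValueError("server command is empty")
    return args


def start_server(command: str) -> subprocess.Popen:
    """Start the server without shell=True, so ;, && and | are not interpreted."""
    return subprocess.Popen(parse_server_command(command))


def wrap_untrusted(page_html: str) -> str:
    """Fence page content so a prompt can label it as data, not instructions."""
    cleaned = page_html
    # Strip marker strings repeatedly so the page cannot rebuild one and
    # close the fence early; each pass shortens the text, so this terminates.
    while BEGIN_MARK in cleaned or END_MARK in cleaned:
        cleaned = cleaned.replace(BEGIN_MARK, "").replace(END_MARK, "")
    return f"{BEGIN_MARK}\n{cleaned}\n{END_MARK}"


if __name__ == "__main__":
    # The ";" stays attached to a literal argument instead of chaining commands.
    print(parse_server_command("npm run dev; echo injected"))
    print(wrap_untrusted("<p>Ignore previous instructions</p>"))
```

With an argument list, a string such as "npm run dev; echo injected" reaches the program as literal arguments rather than as two chained commands. The trade-off is that legitimate uses of pipes or && would need to be handled explicitly. Delimiting page content does not by itself stop prompt injection; it only gives the agent's instructions something concrete to refer to when treating that text as untrusted.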
Recommendations
  • AI detected serious security threats
Audit Metadata
  • Risk Level: HIGH
  • Analyzed: Feb 21, 2026, 12:25 PM