webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: HIGH (COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION)
Full Analysis
  • [COMMAND_EXECUTION] (HIGH): The script scripts/with_server.py uses subprocess.Popen(shell=True) to execute strings provided via the --server argument and subprocess.run() for trailing command arguments. This design provides a direct interface for arbitrary shell command execution without any sanitization or validation of the input strings.
  • [PROMPT_INJECTION] (MEDIUM): The SKILL.md file contains instructions that explicitly tell the AI agent, 'DO NOT read the source until you try running the script first'. This is a deceptive pattern that attempts to bypass the agent's ability to audit executable code before invocation, potentially hiding the high-risk nature of the shell=True implementation.
  • [REMOTE_CODE_EXECUTION] (HIGH): By combining the ability to execute arbitrary shell commands with the primary function of the skill (navigating to and interacting with web applications), the skill creates a significant surface for Remote Code Execution. An attacker-controlled web page could provide instructions (Indirect Prompt Injection) that trigger the use of with_server.py with malicious payloads.
  • [INDIRECT_PROMPT_INJECTION] (LOW): The skill possesses a complete evidence chain for indirect injection vulnerabilities:
      • Ingestion points: page.goto() in all example scripts and file:// URL handling in static_html_automation.py.
      • Boundary markers: None present to distinguish between trusted instructions and data from web pages.
      • Capability inventory: Full shell access via scripts/with_server.py and file system writes via Playwright screenshots and log output.
      • Sanitization: No input validation is performed on command strings before execution.
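The COMMAND_EXECUTION finding above can be illustrated with a minimal sketch of a stricter launcher. This is not the code in scripts/with_server.py: the names ALLOWED_EXECUTABLES, parse_server_command, and start_server are hypothetical, and the allowlist entries are assumptions chosen for illustration. The sketch splits the command string with shlex.split and passes the resulting list to subprocess.Popen without shell=True, so shell metacharacters are not interpreted by a shell.

```python
import shlex
import subprocess

# Hypothetical allowlist (an assumption for this sketch, not from the audited skill).
ALLOWED_EXECUTABLES = {"npm", "node", "python", "python3"}


def parse_server_command(command: str) -> list[str]:
    """Split a command string into argv and reject executables off the allowlist."""
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty command")
    if argv[0] not in ALLOWED_EXECUTABLES:
        raise ValueError(f"executable not allowed: {argv[0]!r}")
    return argv


def start_server(command: str) -> subprocess.Popen:
    """Start the server process from a validated argv list."""
    # With a list and the default shell=False, characters such as ';', '&&',
    # and '|' are passed to the program as plain arguments, not run by a shell.
    return subprocess.Popen(parse_server_command(command))
```

This narrows the attack surface rather than removing it: an allowed interpreter such as python can still run arbitrary code through its own arguments, so the allowlist would need to be chosen with that in mind.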
Recommendations
  • The AI analysis detected serious security threats.
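One possible mitigation for the missing boundary markers noted in the analysis is to wrap text extracted from a web page in explicit delimiters before it is shown to the agent. The following is a minimal sketch under that assumption; the helper name wrap_untrusted and the marker format are hypothetical and not part of the audited skill. A random per-call nonce is used so that page content cannot predict and forge the closing marker.

```python
import secrets


def wrap_untrusted(page_text: str) -> str:
    """Wrap untrusted page text in delimiters tagged with a random nonce."""
    # The nonce is generated per call, so text written in advance by a page
    # author cannot contain the matching END marker except by chance.
    nonce = secrets.token_hex(8)
    return (
        f"[UNTRUSTED-{nonce} BEGIN]\n"
        f"{page_text}\n"
        f"[UNTRUSTED-{nonce} END]"
    )
```

Delimiters alone do not prevent indirect prompt injection; they only give the consuming agent a way to tell page data apart from trusted instructions.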
Audit Metadata
Risk Level: HIGH
Analyzed: Feb 17, 2026, 06:27 PM