webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGH (COMMAND_EXECUTION, PROMPT_INJECTION, DATA_EXFILTRATION)
Full Analysis
  • [COMMAND_EXECUTION] (HIGH): The helper script scripts/with_server.py uses subprocess.Popen(..., shell=True) and subprocess.run() to execute command strings passed as arguments. There is no validation or sanitization of these commands. While intended for starting development servers (e.g., 'npm run dev'), an attacker could manipulate the agent into executing malicious shell commands.
  • [PROMPT_INJECTION] (HIGH): This skill is vulnerable to Indirect Prompt Injection due to its primary function of processing untrusted web content.
      • Ingestion points: The agent reads external content via page.goto(), page.content(), and element discovery in Playwright (e.g., in element_discovery.py).
      • Boundary markers: Absent. The agent is not instructed to ignore or isolate instructions found within the HTML/JS of the applications it tests.
      • Capability inventory: The agent possesses high-privilege capabilities including arbitrary shell execution (with_server.py), file system writes (logs and screenshots), and full browser control.
      • Sanitization: None. Content from the web application is directly processed and used to determine subsequent actions, such as identifying selectors or running scripts.
  • [DATA_EXFILTRATION] (MEDIUM): The skill demonstrates the use of file:// URLs in static_html_automation.py to access the local file system. A malicious web page could exploit this capability by instructing the agent to 'test' local files containing sensitive data (e.g., .env files or SSH keys) and then exfiltrate that data via a network request or by capturing it in logs/screenshots sent back to the attacker.
  • [OBFUSCATION] (LOW): The SKILL.md instructions explicitly tell the agent 'DO NOT read the source until you try running the script first.' While this is framed as a way to save context window space, it discourages the agent from performing safety checks on the internal logic of with_server.py before execution.
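To illustrate the COMMAND_EXECUTION finding above, the following is a minimal sketch of how a helper could start a server without shell=True. It is not the code in scripts/with_server.py. The names ALLOWED_EXECUTABLES, parse_server_command, and start_server are hypothetical, and the allowlist contents are an assumption chosen for the example.

```python
import shlex
import subprocess
from typing import List

# Hypothetical allowlist of executables permitted to start a dev server.
# The entries are an assumption for this sketch, not taken from the skill.
ALLOWED_EXECUTABLES = {"npm", "node", "python", "python3"}


def parse_server_command(command: str) -> List[str]:
    """Split a command string into argv and reject unapproved executables."""
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty command")
    if argv[0] not in ALLOWED_EXECUTABLES:
        raise ValueError(f"executable not allowed: {argv[0]!r}")
    return argv


def start_server(command: str) -> subprocess.Popen:
    """Start the server from an argv list, so no shell interprets the string."""
    # Without shell=True, characters such as ';', '|', and '&&' are passed to
    # the program as literal arguments instead of being run by a shell.
    return subprocess.Popen(parse_server_command(command))
```

With this approach, a string such as 'npm run dev; rm -rf /' is passed to npm as literal arguments rather than being chained by a shell. An allowlist on the executable name only narrows the attack surface: an allowed interpreter can still run arbitrary code through its own arguments, so this is a partial mitigation at best.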
Recommendations
  • AI detected serious security threats
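As an illustration of how the PROMPT_INJECTION and DATA_EXFILTRATION findings might be addressed, the sketch below checks a URL before navigation and wraps page content in boundary markers before it reaches the agent. It is not part of the audited skill. ALLOWED_SCHEMES, ALLOWED_HOSTS, MARKER, is_allowed_url, and wrap_untrusted are hypothetical names, and restricting navigation to local HTTP hosts is an assumption made for the example.

```python
from urllib.parse import urlparse

# Assumed policy for this sketch: only local development servers over HTTP(S).
ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_HOSTS = {"localhost", "127.0.0.1"}

# Hypothetical delimiter used to mark page content as untrusted data.
MARKER = "UNTRUSTED_PAGE_CONTENT"


def is_allowed_url(url: str) -> bool:
    """Return True only for http(s) URLs that point at an allowed local host."""
    parsed = urlparse(url)
    return parsed.scheme in ALLOWED_SCHEMES and parsed.hostname in ALLOWED_HOSTS


def wrap_untrusted(content: str) -> str:
    """Wrap page content in boundary markers so it can be treated as data."""
    # Remove any copy of the marker inside the content so the page cannot
    # close the boundary early.
    sanitized = content.replace(MARKER, "[marker removed]")
    return f"<<<{MARKER}\n{sanitized}\n{MARKER}>>>"
```

A caller would check is_allowed_url(url) before page.goto(url) and pass wrap_untrusted(page.content()) to the agent instead of the raw HTML. Under this policy file:// URLs are rejected outright; a variant that needs static HTML testing could instead permit file:// paths only beneath a dedicated fixtures directory. Boundary markers reduce the risk of indirect injection but do not eliminate it.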
Audit Metadata
Risk Level: HIGH
Analyzed: Feb 16, 2026, 11:36 AM