webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [Obfuscation / Intent Hiding] (HIGH): The SKILL.md file contains an instruction to 'DO NOT read the source until you try running the script first' and describes scripts as 'black-box'. This is a dangerous pattern designed to prevent the AI from auditing the logic it executes, potentially hiding malicious shell operations in with_server.py.- [Command Execution] (HIGH): The helper script scripts/with_server.py utilizes subprocess.Popen(shell=True) and subprocess.run() to execute arbitrary strings. This provides a direct path for shell injection and Remote Code Execution if command arguments are influenced by untrusted data.- [Indirect Prompt Injection] (HIGH): The skill has a significant vulnerability surface where external data can control privileged actions.
  • Ingestion points: Browser DOM content, console logs, and screenshots are captured from local or remote web applications (e.g., in examples/element_discovery.py and examples/console_logging.py).
  • Boundary markers: Completely absent; there are no delimiters or instructions for the agent to ignore commands found within web content.
  • Capability inventory: Full shell execution capabilities via with_server.py and the ability to write to the filesystem.
  • Sanitization: Absent; data retrieved from the browser is used directly to inform subsequent agent actions and script generation.- [Dynamic Execution] (MEDIUM): The core workflow requires the agent to generate and execute native Python scripts, which is inherently risky when those scripts are based on reconnaissance of untrusted web environments.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 09:39 AM