webapp-testing
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
- [Obfuscation / Intent Hiding] (HIGH): The SKILL.md file contains an instruction to 'DO NOT read the source until you try running the script first' and describes scripts as 'black-box'. This is a dangerous pattern designed to prevent the AI from auditing the logic it executes, potentially hiding malicious shell operations in with_server.py.- [Command Execution] (HIGH): The helper script
scripts/with_server.pyutilizessubprocess.Popen(shell=True)andsubprocess.run()to execute arbitrary strings. This provides a direct path for shell injection and Remote Code Execution if command arguments are influenced by untrusted data.- [Indirect Prompt Injection] (HIGH): The skill has a significant vulnerability surface where external data can control privileged actions. - Ingestion points: Browser DOM content, console logs, and screenshots are captured from local or remote web applications (e.g., in
examples/element_discovery.pyandexamples/console_logging.py). - Boundary markers: Completely absent; there are no delimiters or instructions for the agent to ignore commands found within web content.
- Capability inventory: Full shell execution capabilities via
with_server.pyand the ability to write to the filesystem. - Sanitization: Absent; data retrieved from the browser is used directly to inform subsequent agent actions and script generation.- [Dynamic Execution] (MEDIUM): The core workflow requires the agent to generate and execute native Python scripts, which is inherently risky when those scripts are based on reconnaissance of untrusted web environments.
Recommendations
- AI detected serious security threats
Audit Metadata