webapp-testing
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGHCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
- COMMAND_EXECUTION (HIGH): The script
scripts/with_server.pyusessubprocess.Popen(shell=True)to execute strings passed directly via the--serverargument. This allows for arbitrary shell command execution on the host system. - PROMPT_INJECTION (HIGH): Category 8 (Indirect Prompt Injection) is a major risk. * Ingestion points: Web page DOM and console logs are processed via Playwright (
element_discovery.py,console_logging.py). * Boundary markers: None found. * Capability inventory: Arbitrary shell execution viawith_server.pyand file system writes for logs/screenshots. * Sanitization: None. Malicious instructions embedded in a website's HTML or console outputs could trick the agent into running malicious payloads via the server management tool. - PROMPT_INJECTION (LOW): The
SKILL.mdfile contains instructions telling the agent 'DO NOT read the source until you try running the script first'. While framed as a context window optimization, this practice discourages the agent from inspecting potentially dangerous logic before execution.
Recommendations
- AI detected serious security threats
Audit Metadata