webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTION
Full Analysis
  • COMMAND_EXECUTION (HIGH): The script 'scripts/with_server.py' uses 'subprocess.Popen' with 'shell=True' and 'subprocess.run' to execute strings provided via command-line arguments. The 'SKILL.md' file guides the agent to pass arbitrary shell strings (e.g., 'npm run dev', 'cd backend && python server.py') to these functions, creating a direct command injection surface.
  • REMOTE_CODE_EXECUTION (HIGH): The skill is designed for the agent to dynamically generate and execute Python scripts using Playwright. This provides a high-capability environment for remote code execution.
  • OBFUSCATION (MEDIUM): 'SKILL.md' contains a directive ('DO NOT read the source until you try running the script first') that discourages security auditing by the agent and promotes the execution of unverified 'black-box' code, which is a significant security anti-pattern.
  • PROMPT_INJECTION (LOW): The 'Reconnaissance-Then-Action' workflow identifies a surface for indirect prompt injection (Category 8). Mandatory Evidence Chain: 1. Ingestion points: 'page.content()' and 'page.locator().all()' in 'examples/element_discovery.py' and 'SKILL.md'. 2. Boundary markers: Absent. 3. Capability inventory: Arbitrary shell execution via 'subprocess', file system access, and network operations via Playwright. 4. Sanitization: Absent. Malicious DOM content could be used to manipulate the agent's logic during the reconnaissance phase.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 17, 2026, 04:55 PM