webapp-testing
Fail
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTION
Full Analysis
- COMMAND_EXECUTION (HIGH): The script
scripts/with_server.pyusessubprocess.Popen(server['cmd'], shell=True)andsubprocess.run(args.command). Because these arguments are passed directly from the command line, any untrusted input processed by the agent and passed to this script could lead to arbitrary command execution on the host system. - PROMPT_INJECTION (MEDIUM):
SKILL.mdcontains an instruction: 'DO NOT read the source until you try running the script first... They exist to be called directly as black-box scripts rather than ingested into your context window.' While framed as a context-window optimization, this discourages the agent from performing security self-inspection on scripts that contain high-risk shell execution logic. - INDIRECT_PROMPT_INJECTION (LOW): The skill is designed to interact with and test web applications using Playwright.
- Ingestion points:
page.goto(url)inexamples/console_logging.pyandexamples/element_discovery.pyingest external, potentially attacker-controlled web content. - Boundary markers: Absent. There are no delimiters or instructions to ignore embedded commands in the processed HTML.
- Capability inventory: The agent has access to
subprocess.Popen(viawith_server.py), file writing (viaconsole_logging.py), and browser interaction (page.click,page.fill). - Sanitization: None. The content is processed directly to identify selectors and extract text.
Recommendations
- AI detected serious security threats
Audit Metadata