webapp-testing
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION] (HIGH): The script
scripts/with_server.pyusessubprocess.Popen(server['cmd'], shell=True)where the command is derived directly from the--serverargument. This allows for arbitrary command injection if the input to the script is not strictly controlled. - [REMOTE_CODE_EXECUTION] (HIGH): The skill's primary function is to have the agent write and execute new Python scripts based on external web content. This 'generate-and-execute' loop is a high-risk capability that can be hijacked by malicious data.
- [PROMPT_INJECTION] (HIGH): The instructions in
SKILL.mduse a deceptive pattern by explicitly telling the agent: 'Do not read the source until you know a customized solution is absolutely necessary... These scripts... are there to be called directly as black box scripts.' This discourages the AI from performing safety self-inspections of the code it executes. - [PROMPT_INJECTION] (HIGH): Category 8: Indirect Prompt Injection Vulnerability.
- Ingestion points:
element_discovery.pyandconsole_logging.pyingest untrusted data viapage.content(),page.on("console"), andbutton.inner_text(). - Boundary markers: None. The skill does not provide any delimiters or instructions to ignore commands found within the web pages being tested.
- Capability inventory: The agent is authorized to use
subprocess.run(viawith_server.py) and write/execute arbitrary Playwright scripts on the local system. - Sanitization: None. The agent is encouraged to use discovered selectors and content directly to formulate subsequent automation logic.
Recommendations
- AI detected serious security threats
Audit Metadata