webapp-testing
Fail
Audited by Gen Agent Trust Hub on Mar 2, 2026
Risk Level: HIGHCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The helper script
scripts/with_server.pyusessubprocess.Popenwithshell=Trueto execute commands provided via the--serverflag.\n - The use of
shell=Truewith arguments that can be influenced by the agent or external data allows for arbitrary shell command execution and shell metacharacter injection.\n - The script also executes the final user-provided command via
subprocess.run(args.command), providing an additional sink for code execution.\n- [PROMPT_INJECTION]: TheSKILL.mdfile contains instructions that explicitly tell the agent to treat its helper scripts as black boxes and avoid reading their source code.\n - Evidence: "DO NOT read the source until you try running the script first... They exist to be called directly as black-box scripts rather than ingested into your context window."\n
- This instruction serves to hide the insecure implementation of the command execution logic in
with_server.pyfrom the agent's analysis.\n- [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection attacks due to its design for interacting with untrusted web content.\n - Ingestion points: The agent reads untrusted data from web applications via
page.content(),page.locator().all(), andpage.screenshot()as seen inexamples/element_discovery.pyandSKILL.md.\n - Boundary markers: The skill lacks any boundary markers or instructions to treat web content as untrusted or to ignore instructions embedded within the target application.\n
- Capability inventory: The agent possesses high-impact capabilities, including arbitrary shell execution (via
with_server.py) and filesystem writes (via screenshots andexamples/console_logging.py).\n - Sanitization: There is no evidence of sanitization, escaping, or validation of the data extracted from the web pages before it is used to drive further agent actions.
Recommendations
- AI detected serious security threats
Audit Metadata