webapp-testing

Fail

Audited by Gen Agent Trust Hub on Mar 2, 2026

Risk Level: HIGHCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The helper script scripts/with_server.py uses subprocess.Popen with shell=True to execute commands provided via the --server flag.\n
  • The use of shell=True with arguments that can be influenced by the agent or external data allows for arbitrary shell command execution and shell metacharacter injection.\n
  • The script also executes the final user-provided command via subprocess.run(args.command), providing an additional sink for code execution.\n- [PROMPT_INJECTION]: The SKILL.md file contains instructions that explicitly tell the agent to treat its helper scripts as black boxes and avoid reading their source code.\n
  • Evidence: "DO NOT read the source until you try running the script first... They exist to be called directly as black-box scripts rather than ingested into your context window."\n
  • This instruction serves to hide the insecure implementation of the command execution logic in with_server.py from the agent's analysis.\n- [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection attacks due to its design for interacting with untrusted web content.\n
  • Ingestion points: The agent reads untrusted data from web applications via page.content(), page.locator().all(), and page.screenshot() as seen in examples/element_discovery.py and SKILL.md.\n
  • Boundary markers: The skill lacks any boundary markers or instructions to treat web content as untrusted or to ignore instructions embedded within the target application.\n
  • Capability inventory: The agent possesses high-impact capabilities, including arbitrary shell execution (via with_server.py) and filesystem writes (via screenshots and examples/console_logging.py).\n
  • Sanitization: There is no evidence of sanitization, escaping, or validation of the data extracted from the web pages before it is used to drive further agent actions.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Mar 2, 2026, 05:14 PM