webapp-testing

Fail

Audited by Gen Agent Trust Hub on Mar 1, 2026

Risk Level: HIGHCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The script scripts/with_server.py is designed to execute arbitrary shell commands provided via the --server argument using subprocess.Popen with shell=True. It also executes a secondary command provided as a positional argument using subprocess.run. This pattern allows an attacker who can influence the agent's parameters to execute unauthorized commands on the host system.
  • [PROMPT_INJECTION]: The SKILL.md file contains instructions that explicitly tell the agent not to read the source code of the scripts before running them ("DO NOT read the source until you try running the script first"). This discourages the model from performing security checks on the dangerous command execution logic within the helper scripts.
  • [PROMPT_INJECTION]: The skill is highly vulnerable to indirect prompt injection because its primary function is to scrape and interact with web application data which is then processed by the agent.
  • Ingestion points: Data is ingested from the browser via page.content(), button.inner_text(), and browser console logs in examples/element_discovery.py and examples/console_logging.py.
  • Boundary markers: There are no boundary markers or specific instructions to ignore malicious content found within the web pages or logs being tested.
  • Capability inventory: The agent has access to powerful capabilities, including arbitrary shell execution via scripts/with_server.py and the ability to write files to the local filesystem.
  • Sanitization: No sanitization or validation is performed on the data retrieved from the web browser before it is used by the agent to make decisions or identify selectors.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Mar 1, 2026, 05:30 PM