webapp-testing

Fail

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: HIGH (COMMAND_EXECUTION, REMOTE_CODE_EXECUTION)
Full Analysis
  • COMMAND_EXECUTION (HIGH): The script scripts/with_server.py uses subprocess.Popen with the shell=True parameter to execute commands provided via the --server argument. This allows for arbitrary shell command execution and is highly vulnerable to shell injection attacks.
  • SOCIAL_ENGINEERING (MEDIUM): The documentation in SKILL.md explicitly instructs the agent to avoid reading the source code of scripts before running them. This is an adversarial pattern intended to prevent the agent from identifying the dangerous execution logic within the helper scripts.
  • INDIRECT_PROMPT_INJECTION (LOW): The skill is designed to browse untrusted web pages and inspect their DOM content, which presents an attack surface where malicious websites could attempt to influence the agent's actions. Evidence chain:
      1. Ingestion point: page.goto() and page.content() in examples/element_discovery.py.
      2. Boundary markers: None present.
      3. Capability inventory: Arbitrary shell execution via with_server.py and file system writes.
      4. Sanitization: No sanitization of web content before it is used to decide on subsequent actions.
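The COMMAND_EXECUTION finding above can be illustrated with a minimal sketch. It is not the code in scripts/with_server.py, whose source is not reproduced in this report. The function names split_server_command and start_server are hypothetical. The sketch assumes the --server value is a single command with arguments, so it can be tokenized with shlex.split and launched without a shell.

```python
import shlex
import subprocess


def split_server_command(command: str) -> list[str]:
    """Tokenize a command string without letting a shell interpret it.

    Hypothetical helper: shell metacharacters such as ';', '&&' or '|'
    stay ordinary characters inside the resulting arguments.
    """
    return shlex.split(command)


def start_server(command: str, **popen_kwargs) -> subprocess.Popen:
    """Start the server process with shell=False (the Popen default).

    The first token is executed directly as the program and the rest are
    passed to it as literal arguments, so a value like
    "npm run dev; rm -rf /tmp/x" cannot chain a second command.
    """
    argv = split_server_command(command)
    return subprocess.Popen(argv, **popen_kwargs)


if __name__ == "__main__":
    # With shell=True this string would run two separate echo commands.
    # Without a shell, echo receives "; echo injected" as plain text.
    proc = start_server(
        "echo started; echo injected",
        stdout=subprocess.PIPE,
        text=True,
    )
    output, _ = proc.communicate()
    print(output, end="")
```

The trade-off is that shell features such as pipes, `&&` chaining and `cd dir && cmd` stop working in the --server value. If the skill relies on them, this approach would need a separate option for the working directory or an explicit allowlist of commands, neither of which is described in the audit.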
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level: HIGH
Analyzed: Feb 17, 2026, 06:15 PM