deep-research

Warn

Audited by Gen Agent Trust Hub on Apr 8, 2026

Risk Level: MEDIUMEXTERNAL_DOWNLOADSCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill exhibits a command injection vulnerability in the way it dispatches background tasks. In Phase 2, the Lead Researcher is instructed to dispatch the gemini-research.sh script by inlining the research query directly into a Bash command within double quotes. A specially crafted query containing shell metacharacters (e.g., semicolons, backticks, or command substitutions) could allow for the execution of arbitrary commands in the agent's environment.
  • [EXTERNAL_DOWNLOADS]: The skill dynamically downloads and executes the chromux browser tool from the vendor's NPM registry using npx. It also relies on the gemini-cli from Google, which is a well-known service. While these are vendor-provided or trusted tools, they involve external code execution at runtime.
  • [PROMPT_INJECTION]: The skill possesses an indirect prompt injection surface as it ingests large amounts of untrusted data from the web via search results and browser snapshots.
  • Ingestion points: Web results from WebSearch, full page content from WebFetch, and browser state snapshots from chromux.
  • Boundary markers: The instructions do not define any delimiters or provide explicit 'ignore' directives to the subagents regarding the content they fetch.
  • Capability inventory: The skill has extensive capabilities, including executing local shell scripts (Bash tool) and spawning new subagents (Agent tool).
  • Sanitization: No sanitization or filtering of the fetched web content is performed before it is processed by the agent or subagents.
  • [PROMPT_INJECTION]: The skill's 'Autopilot mode' (--auto flag) is a significant autonomy risk. When enabled, it instructs the agent to 'Skip ALL user confirmations' and run the entire research pipeline end-to-end. This concealment pattern reduces user oversight during critical operations like tool invocation and file writing.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Apr 8, 2026, 11:01 AM