gemini-computer-use
Gemini Computer Use
Quick start
- Copy the env template, set your API key, and source the file:
    cp env.example env.sh
    $EDITOR env.sh
    source env.sh
- Create a virtual environment and install dependencies:
    python -m venv .venv
    source .venv/bin/activate
    pip install google-genai playwright
    playwright install chromium
- Run the agent script with a prompt:
    python scripts/computer_use_agent.py \
      --prompt "Find the latest blog post title on example.com" \
      --start-url "https://example.com" \
      --turn-limit 6
Browser selection
- Default: Playwright's bundled Chromium (no env vars required).
- Choose a channel (Chrome/Edge) with COMPUTER_USE_BROWSER_CHANNEL.
- Use a custom Chromium-based executable (e.g., Brave) with COMPUTER_USE_BROWSER_EXECUTABLE.
If both are set, COMPUTER_USE_BROWSER_EXECUTABLE takes precedence.
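
The precedence rule above can be sketched as a small helper. This is an illustration only, not the code in scripts/computer_use_agent.py: the function name resolve_launch_kwargs is hypothetical, and it assumes the env vars map directly onto the executable_path and channel keyword arguments of Playwright's chromium.launch().

```python
import os
from typing import Mapping, Optional


def resolve_launch_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: turn the two env vars into chromium.launch() kwargs."""
    env = os.environ if env is None else env
    executable = env.get("COMPUTER_USE_BROWSER_EXECUTABLE", "").strip()
    channel = env.get("COMPUTER_USE_BROWSER_CHANNEL", "").strip()
    if executable:
        # An explicit executable takes precedence over a channel.
        return {"executable_path": executable}
    if channel:
        # Playwright channel names include "chrome" and "msedge".
        return {"channel": channel}
    # Neither variable set: fall back to Playwright's bundled Chromium.
    return {}
```

Under these assumptions, the result would be passed along as chromium.launch(**resolve_launch_kwargs()) inside a Playwright session.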
Core workflow (agent loop)
- Capture a screenshot and send the user goal + screenshot to the model.
- Parse function_call actions in the response.
- Execute each action in Playwright.
- If a safety_decision is require_confirmation, prompt the user before executing.
- Send function_response objects containing the latest URL + screenshot.
- Repeat until the model returns only text (no actions) or you hit the turn limit.
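
The steps above can be sketched as a plain control loop. Everything here is a hypothetical stand-in rather than the script's real code or the google-genai API: Action, ModelTurn, and the four callables (ask_model, execute, observe, confirm) are placeholders for the model call, the Playwright executor, the screenshot capture, and the user prompt. Skipping a declined action, rather than aborting the run, is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Action:
    """Hypothetical stand-in for one function_call from the model."""
    name: str
    args: dict = field(default_factory=dict)
    safety_decision: Optional[str] = None  # e.g. "require_confirmation"


@dataclass
class ModelTurn:
    """Hypothetical stand-in for one model response."""
    text: str = ""
    actions: list = field(default_factory=list)


def run_agent_loop(goal, ask_model, execute, observe, confirm, turn_limit=6):
    """Return the model's final text, or None if the turn limit is reached.

    ask_model(goal, observation, responses) -> ModelTurn
    execute(action) -> None   # performs the action in the browser
    observe() -> dict         # latest {"url": ..., "screenshot": ...}
    confirm(action) -> bool   # asks the user; True means proceed
    """
    observation = observe()
    responses = []
    for _ in range(turn_limit):
        turn = ask_model(goal, observation, responses)
        if not turn.actions:
            # Text only, no actions: the model considers the task finished.
            return turn.text
        responses = []
        for action in turn.actions:
            needs_ok = action.safety_decision == "require_confirmation"
            if needs_ok and not confirm(action):
                # Assumption: a declined action is reported back, not executed.
                responses.append({"name": action.name, "skipped": True})
                continue
            execute(action)
            observation = observe()
            # Each response carries the latest URL and screenshot.
            responses.append({"name": action.name, **observation})
    return None
```

Passing the collaborators in as callables keeps the loop testable without a browser or an API key; a real run would wire them to the model client and a Playwright page.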
Operational guidance
- Run in a sandboxed browser profile or container.
- Use --exclude to block risky actions you do not want the model to take.
- Keep the viewport at 1440x900 unless you have a reason to change it.
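
As a rough illustration of the last two points, the sketch below pairs a client-side guard for excluded actions with the recommended viewport size. The names DEFAULT_VIEWPORT and make_action_guard are hypothetical, the action name in the usage note is only an example, and how --exclude is actually parsed or enforced by the script is not shown here.

```python
from typing import Callable, Iterable

# Recommended viewport from the guidance above, in the shape Playwright's
# new_context(viewport=...) accepts.
DEFAULT_VIEWPORT = {"width": 1440, "height": 900}


def make_action_guard(excluded: Iterable[str]) -> Callable[[str], bool]:
    """Hypothetical helper: build a predicate that rejects excluded action names."""
    blocked = frozenset(name.strip().lower() for name in excluded if name.strip())

    def is_allowed(action_name: str) -> bool:
        # Compare case-insensitively so "Drag_And_Drop" matches "drag_and_drop".
        return action_name.strip().lower() not in blocked

    return is_allowed
```

For example, is_allowed = make_action_guard(["drag_and_drop"]) would let the executor refuse that action before it reaches the browser, as a second line of defence alongside whatever the script itself does with --exclude.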
Resources
- Script: scripts/computer_use_agent.py
- Reference notes: references/google-computer-use.md
- Env template: env.example