Computer Use Skill
Full desktop GUI control for headless Linux servers. Creates a virtual display (Xvfb + XFCE) so you can run and control desktop applications on VPS/cloud instances without a physical monitor.
Environment
- Display: :99
- Resolution: 1024x768 (XGA, Anthropic recommended)
- Desktop: XFCE4
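Because the display is fixed at 1024x768, valid pointer coordinates run from 0 to 1023 horizontally and 0 to 767 vertically. The sketch below guards against out-of-range coordinates before an action is sent. `in_bounds` and `is_uint` are hypothetical helpers, not part of the skill's scripts, and the size is hard-coded from the values above rather than queried from the display.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the skill's scripts).
# Screen size taken from the Environment section; on a live display the
# same numbers could be read with `xdotool getdisplaygeometry`.
SCREEN_W=${SCREEN_W:-1024}
SCREEN_H=${SCREEN_H:-768}

is_uint() {
    # Succeeds only for a non-empty string of digits.
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
}

in_bounds() {
    # usage: in_bounds X Y
    # Succeeds when 0 <= X < SCREEN_W and 0 <= Y < SCREEN_H.
    is_uint "$1" && is_uint "$2" || return 1
    [ "$1" -lt "$SCREEN_W" ] && [ "$2" -lt "$SCREEN_H" ]
}
```

A caller might write `in_bounds 512 384 && ./scripts/click.sh 512 384 left` so that a mistyped coordinate fails early instead of clicking somewhere unexpected.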
Quick Start
export DISPLAY=:99
# Take screenshot
./scripts/screenshot.sh
# Click at coordinates
./scripts/click.sh 512 384 left
# Type text
./scripts/type_text.sh "Hello world"
# Press key combo
./scripts/key.sh "ctrl+s"
# Scroll down
./scripts/scroll.sh down 5
Actions Reference
| Action | Script | Arguments | Description |
|---|---|---|---|
| screenshot | screenshot.sh | (none) | Capture screen → base64 PNG |
| cursor_position | cursor_position.sh | (none) | Get current mouse X,Y |
| mouse_move | mouse_move.sh | x y | Move mouse to coordinates |
| left_click | click.sh | x y left | Left click at coordinates |
| right_click | click.sh | x y right | Right click |
| middle_click | click.sh | x y middle | Middle click |
| double_click | click.sh | x y double | Double click |
| triple_click | click.sh | x y triple | Triple click (select line) |
| left_click_drag | drag.sh | x1 y1 x2 y2 | Drag from start to end |
| left_mouse_down | mouse_down.sh | (none) | Press mouse button |
| left_mouse_up | mouse_up.sh | (none) | Release mouse button |
| type | type_text.sh | "text" | Type text (50 char chunks, 12ms delay) |
| key | key.sh | "combo" | Press key (Return, ctrl+c, alt+F4) |
| hold_key | hold_key.sh | "key" secs | Hold key for duration |
| scroll | scroll.sh | dir amt [x y] | Scroll up/down/left/right |
| wait | wait.sh | seconds | Wait then screenshot |
| zoom | zoom.sh | x1 y1 x2 y2 | Cropped region screenshot |
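The mapping from action names to scripts in the table can be sketched as a small dispatcher. `run_action`, `SCRIPTS_DIR`, and `DRY_RUN` are hypothetical names introduced for illustration; the skill itself may wire actions to scripts differently. With `DRY_RUN` set, the function only prints the command it would run, which is handy for checking the mapping without a display.

```shell
#!/bin/sh
# Hypothetical dispatcher (not part of the skill's scripts): translate an
# action name from the table into the matching script invocation.
SCRIPTS_DIR=${SCRIPTS_DIR:-./scripts}

run_action() {
    # usage: run_action ACTION [ARGS...]
    action=$1
    shift
    case "$action" in
        screenshot|cursor_position|mouse_move|key|hold_key|scroll|wait|zoom)
            script="$action.sh" ;;            # script name matches the action
        left_click|right_click|middle_click|double_click|triple_click)
            script="click.sh"
            set -- "$@" "${action%_click}" ;; # append left/right/middle/double/triple
        left_click_drag) script="drag.sh" ;;
        left_mouse_down) script="mouse_down.sh" ;;
        left_mouse_up)   script="mouse_up.sh" ;;
        type)            script="type_text.sh" ;;
        *)
            echo "unknown action: $action" >&2
            return 1 ;;
    esac
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command line instead of executing it.
        if [ "$#" -gt 0 ]; then
            echo "$SCRIPTS_DIR/$script $*"
        else
            echo "$SCRIPTS_DIR/$script"
        fi
    else
        "$SCRIPTS_DIR/$script" "$@"
    fi
}
```

For example, `DRY_RUN=1 run_action double_click 100 200` prints `./scripts/click.sh 100 200 double`.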
Workflow Pattern
- Screenshot — Always start by seeing the screen
- Analyze — Identify UI elements and coordinates
- Act — Click, type, scroll
- Screenshot — Verify result
- Repeat
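One cycle of the pattern above can be sketched as a function that captures the screen, runs an action, captures again, and reports whether anything changed. This assumes `screenshot.sh` writes its base64 PNG to standard output, which the source does not state explicitly. `step`, `SCREENSHOT_CMD`, and `WORK_DIR` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical sketch of one "screenshot, act, screenshot" cycle.
# Assumption: the screenshot command prints the capture to stdout.
SCREENSHOT_CMD=${SCREENSHOT_CMD:-./scripts/screenshot.sh}
WORK_DIR=${WORK_DIR:-/tmp}

step() {
    # usage: step COMMAND [ARGS...]
    # e.g.   step ./scripts/click.sh 512 384 left
    "$SCREENSHOT_CMD" > "$WORK_DIR/before.b64" || return 1   # see the screen
    # Run the action; any output it produces is discarded here.
    "$@" > /dev/null || return 1
    "$SCREENSHOT_CMD" > "$WORK_DIR/after.b64" || return 1    # verify result
    # Succeed only when the two captures differ, i.e. the action had a
    # visible effect.
    ! cmp -s "$WORK_DIR/before.b64" "$WORK_DIR/after.b64"
}
```

A byte-for-byte comparison is a crude check: a clock or blinking cursor can change the screen without the action having worked, so the captured image still needs to be analyzed before the next step.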
Tips
- Screen is 1024x768, origin (0,0) at top-left
- Click to focus before typing in text fields
- Use ctrl+End to jump to page bottom in browsers
- Most actions auto-screenshot after 2 sec delay
- Long text is chunked (50 chars) with 12ms keystroke delay
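The chunking mentioned in the last tip can be illustrated as follows. This is a sketch of the idea, not the actual contents of `type_text.sh`: `chunk_text` and `type_chunked` are hypothetical names, and single-line input is assumed. `xdotool type --delay` sets the delay between keystrokes in milliseconds.

```shell
#!/bin/sh
# Hypothetical sketch: split text into 50-character chunks and type each
# chunk with a 12 ms delay between keystrokes. Assumes single-line input.
CHUNK_SIZE=50
KEY_DELAY_MS=12

chunk_text() {
    # Print the argument as lines of at most CHUNK_SIZE characters.
    rest=$1
    while [ -n "$rest" ]; do
        chunk=$(printf '%s' "$rest" | cut -c "1-$CHUNK_SIZE")
        [ -n "$chunk" ] || break          # guard against an empty chunk
        printf '%s\n' "$chunk"
        rest=${rest#"$chunk"}             # drop the chunk just emitted
    done
}

type_chunked() {
    # Requires xdotool and a running display (DISPLAY=:99).
    chunk_text "$1" | while IFS= read -r piece; do
        xdotool type --delay "$KEY_DELAY_MS" -- "$piece"
    done
}
```

Sending short chunks keeps each `xdotool` call brief, so a failure partway through long input loses at most one chunk.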
System Services
# Services auto-start on boot
sudo systemctl status virtual-desktop # Xvfb on :99
sudo systemctl status xfce-desktop # XFCE session
# Manual restart if needed
sudo systemctl restart virtual-desktop xfce-desktop
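A restart can be limited to services that are actually down. The sketch below uses `systemctl is-active --quiet`, which exits with status 0 only when the unit is active. The service names are the ones listed above; `ensure_service` and `ensure_desktop` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers: restart a service only when systemd reports that
# it is not active.
ensure_service() {
    # usage: ensure_service NAME
    if systemctl is-active --quiet "$1"; then
        echo "$1: active"
    else
        echo "$1: not active, restarting"
        sudo systemctl restart "$1"
    fi
}

ensure_desktop() {
    # Check the X server first: the XFCE session needs a display to attach to.
    ensure_service virtual-desktop && ensure_service xfce-desktop
}
```

Calling `ensure_desktop` before a session gives a quick health check without interrupting a desktop that is already running.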
Opening Applications
export DISPLAY=:99
chromium-browser --no-sandbox & # Web browser
xfce4-terminal & # Terminal
thunar & # File manager
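When applications are launched from a script rather than an interactive shell, it can help to detach them and keep the process ID for later cleanup. The following is a minimal sketch; `launch_app` is a hypothetical helper, and it falls back to `:99` only when `DISPLAY` is not already set.

```shell
#!/bin/sh
# Hypothetical helper: start a GUI program on the virtual display,
# detached from the calling shell, and print its process ID.
launch_app() {
    # usage: launch_app COMMAND [ARGS...]
    DISPLAY=${DISPLAY:-:99} nohup "$@" </dev/null >/dev/null 2>&1 &
    echo $!
}
```

For example, `pid=$(launch_app thunar)` starts the file manager and stores its PID, so `kill "$pid"` can close it afterwards.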
Requirements
System packages (install once):
sudo apt install -y xvfb xfce4 xfce4-terminal xdotool scrot imagemagick dbus-x11 chromium-browser
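To confirm the install succeeded, one option is to check that the expected commands are on `PATH`. `missing_commands` is a hypothetical helper. The command names in the usage comment are the binaries these packages normally provide on Debian/Ubuntu (for example `Xvfb` from `xvfb`, `convert` from `imagemagick`, `dbus-launch` from `dbus-x11`); treat that list as an assumption for other distributions.

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list that
# cannot be found on PATH. Prints nothing when all are present.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Usage (assumed binary names for the packages above):
#   missing_commands Xvfb startxfce4 xfce4-terminal xdotool scrot \
#       convert dbus-launch chromium-browser
```

An empty result means every listed command was found; any names printed point to packages that still need installing.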
Repository: openclaw/skills
First seen: Feb 3, 2026