skills/kernel/skills/kernel-computer-controls

kernel-computer-controls

SKILL.md

Computer Controls

OS-level mouse, keyboard, and screen control for precise browser interaction.

When to Use

Computer controls provide OS-level interaction with the browser VM, operating at the system level rather than through browser APIs. Use computer controls when:

  • Playwright/CDP isn't sufficient: When you need to interact with elements that are difficult to target via DOM selectors
  • Testing real user behavior: Simulating actual mouse movements, clicks, and keyboard input as a human would perform them
  • Interacting with browser UI: Clicking on browser chrome elements (address bar, extensions, menus) that aren't accessible via page automation
  • Handling complex interactions: Drag-and-drop operations, hover effects, or interactions that require precise coordinate-based control
  • Capturing visual output: Taking screenshots of the entire browser viewport or specific screen regions
  • Working with canvas/WebGL: Interacting with canvas elements or games where DOM-based automation doesn't work
  • Bypassing automation detection: Some anti-bot systems detect Playwright/CDP usage but not OS-level input

When NOT to use computer controls:

  • For standard web automation (form filling, clicking buttons) - use kernel browsers playwright execute instead for better reliability and speed
  • When you need to access page content or execute JavaScript - use Playwright execution
  • For headless browsers - computer controls require a GUI environment

Prerequisites

Load the kernel-cli skill for Kernel CLI installation and authentication.

Screenshots

Full Screenshot

kernel browsers computer screenshot <session_id> --to screenshot.png

Region Screenshot

kernel browsers computer screenshot <session_id> --to region.png --x 0 --y 0 --width 800 --height 600

Mouse Actions

Click

# Left click
kernel browsers computer click-mouse <session_id> --x 100 --y 200

# Right click (double)
kernel browsers computer click-mouse <session_id> --x 100 --y 200 --button right --num-clicks 2

Move

kernel browsers computer move-mouse <session_id> --x 500 --y 300

Drag

kernel browsers computer drag-mouse <session_id> --point 100,200 --point 200,300 --button left

Scroll

kernel browsers computer scroll <session_id> --x 300 --y 400 --delta-y 120

Keyboard Actions

Type Text

# Fast typing
kernel browsers computer type <session_id> --text "Hello, World!"

# Slow typing (100ms delay between chars)
kernel browsers computer type <session_id> --text "Slow typing" --delay 100

Press Keys

Key names follow X11 keysym definitions. Common keys include: Return, Tab, Escape, BackSpace, Delete, Home, End, Page_Up, Page_Down, Up, Down, Left, Right, Shift_L, Control_L, Alt_L, etc.

# Single key
kernel browsers computer press-key <session_id> --key Return

# Key combination
kernel browsers computer press-key <session_id> --key Control_L+t

# Complex combination with held keys
kernel browsers computer press-key <session_id> --key Control_L+Shift_L+Tab --hold-key Alt_L

Common Pattern: Navigate and Screenshot

SESSION=$(kernel browsers create -o json | jq -r '.session_id')

# Navigate using Playwright
kernel browsers playwright execute $SESSION 'await page.goto("https://kernel.sh")'

# Take screenshot
kernel browsers computer screenshot $SESSION --to kernel-homepage.png

# Cleanup
kernel browsers delete $SESSION --yes

MCP Tool: Use kernel:execute_playwright_code and kernel:take_screenshot for playwright execution and screenshots.

Weekly Installs
5
Repository
kernel/skills
GitHub Stars
3
First Seen
Jan 26, 2026
Installed on
opencode4
claude-code4
gemini-cli3
kode2
moltbot2
zencoder2