kernel-computer-controls
SKILL.md
Computer Controls
OS-level mouse, keyboard, and screen control for precise browser interaction.
When to Use
Computer controls provide OS-level interaction with the browser VM, operating at the system level rather than through browser APIs. Use computer controls when:
- Playwright/CDP isn't sufficient: When you need to interact with elements that are difficult to target via DOM selectors
- Testing real user behavior: Simulating actual mouse movements, clicks, and keyboard input as a human would perform them
- Interacting with browser UI: Clicking on browser chrome elements (address bar, extensions, menus) that aren't accessible via page automation
- Handling complex interactions: Drag-and-drop operations, hover effects, or interactions that require precise coordinate-based control
- Capturing visual output: Taking screenshots of the entire browser viewport or specific screen regions
- Working with canvas/WebGL: Interacting with canvas elements or games where DOM-based automation doesn't work
- Bypassing automation detection: Some anti-bot systems detect Playwright/CDP usage but not OS-level input
When NOT to use computer controls:
- For standard web automation (form filling, clicking buttons) - use
kernel browsers playwright executeinstead for better reliability and speed - When you need to access page content or execute JavaScript - use Playwright execution
- For headless browsers - computer controls require a GUI environment
Prerequisites
Load the kernel-cli skill for Kernel CLI installation and authentication.
Screenshots
Full Screenshot
kernel browsers computer screenshot <session_id> --to screenshot.png
Region Screenshot
kernel browsers computer screenshot <session_id> --to region.png --x 0 --y 0 --width 800 --height 600
Mouse Actions
Click
# Left click
kernel browsers computer click-mouse <session_id> --x 100 --y 200
# Right click (double)
kernel browsers computer click-mouse <session_id> --x 100 --y 200 --button right --num-clicks 2
Move
kernel browsers computer move-mouse <session_id> --x 500 --y 300
Drag
kernel browsers computer drag-mouse <session_id> --point 100,200 --point 200,300 --button left
Scroll
kernel browsers computer scroll <session_id> --x 300 --y 400 --delta-y 120
Keyboard Actions
Type Text
# Fast typing
kernel browsers computer type <session_id> --text "Hello, World!"
# Slow typing (100ms delay between chars)
kernel browsers computer type <session_id> --text "Slow typing" --delay 100
Press Keys
Key names follow X11 keysym definitions. Common keys include: Return, Tab, Escape, BackSpace, Delete, Home, End, Page_Up, Page_Down, Up, Down, Left, Right, Shift_L, Control_L, Alt_L, etc.
# Single key
kernel browsers computer press-key <session_id> --key Return
# Key combination
kernel browsers computer press-key <session_id> --key Control_L+t
# Complex combination with held keys
kernel browsers computer press-key <session_id> --key Control_L+Shift_L+Tab --hold-key Alt_L
Common Pattern: Navigate and Screenshot
SESSION=$(kernel browsers create -o json | jq -r '.session_id')
# Navigate using Playwright
kernel browsers playwright execute $SESSION 'await page.goto("https://kernel.sh")'
# Take screenshot
kernel browsers computer screenshot $SESSION --to kernel-homepage.png
# Cleanup
kernel browsers delete $SESSION --yes
MCP Tool: Use kernel:execute_playwright_code and kernel:take_screenshot for playwright execution and screenshots.
Weekly Installs
5
Repository
kernel/skillsGitHub Stars
3
First Seen
Jan 26, 2026
Security Audits
Installed on
opencode4
claude-code4
gemini-cli3
kode2
moltbot2
zencoder2