pi-computer-use
Skill by ara.so — Daily 2026 Skills collection.
pi-computer-use gives Pi agents a semantic computer-use surface for visible macOS windows. It prefers Accessibility (AX) targets (like @e1) over raw coordinates, returns semantic state after every action, and attaches screenshots only when AX coverage is too weak.
Installation
Via Pi (recommended)
pi install git:github.com/injaneity/pi-computer-use#v0.2.1
Pin to a specific version:
pi install -l git:github.com/injaneity/pi-computer-use#v0.2.1
Via npm
npm install @injaneity/pi-computer-use
# or pin a version
npm install @injaneity/pi-computer-use@0.2.1
Remove
pi remove git:github.com/injaneity/pi-computer-use#v0.2.1
npm remove @injaneity/pi-computer-use
First-Run Permissions
On the first session, macOS prompts for permissions for the helper at:
~/.pi/agent/helpers/pi-computer-use/bridge
Grant both:
- Accessibility — required for AX ref targeting
- Screen Recording — required for screenshots
How It Works
Three components:
- Pi extension (extensions/computer-use.ts): registers the public tools and the /computer-use command
- TypeScript bridge (src/bridge.ts): manages window state, AX refs, fallback policy, batching, and execution metadata
- Native Swift helper (native/macos/bridge.swift): talks to macOS Accessibility, ScreenCaptureKit, AppKit, and CoreGraphics
Available Tools
| Tool | Purpose |
|---|---|
| list_apps | List running apps |
| list_windows | List windows for an app |
| screenshot | Capture window + return AX state |
| click | Click element or coordinate |
| double_click | Double-click element or coordinate |
| move_mouse | Move cursor |
| drag | Drag from point to point |
| scroll | Scroll element or coordinate |
| keypress | Press key combination |
| type_text | Type raw text |
| set_text | Replace element value via AX |
| wait | Pause execution |
| arrange_window | Position/resize window |
| computer_actions | Batch multiple actions |
Core Workflow
Always start a session with screenshot to select the controlled window and obtain AX refs:
// 1. Discover apps and windows if target is ambiguous
list_apps()
list_windows({ app: "Safari" })
// 2. Select the window and get AX state
screenshot({ window: "@w1" })
// 3. Act on AX refs returned from screenshot
click({ window: "@w1", ref: "@e1" })
set_text({ ref: "@e2", text: "https://example.com" })
keypress({ keys: ["Enter"] })
AX Ref Targeting (Preferred)
AX refs like @e1, @e2 are returned by screenshot and carry capability metadata:
- canSetValue: supports set_text
- canPress: supports click
- canFocus: can receive focus
- canScroll: supports scroll
- adjust: supports value adjustment
// Click by AX ref — no coordinates needed
click({ ref: "@e1" })
// Scroll a specific element
scroll({ ref: "@e3", scrollY: 600 })
// Replace text field value atomically
set_text({ ref: "@e2", text: "hello world" })
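The capability flags listed above can drive the choice of tool. The following is a minimal sketch, not the bridge's actual logic: the `AxRef` and `ToolCall` shapes, the `planTextEntry` helper, and the `text` argument name for type_text are all assumptions made for illustration.

```typescript
// Hypothetical shape of one AX ref entry; the real payload may differ.
interface AxRef {
  ref: string; // e.g. "@e2"
  canSetValue?: boolean;
  canPress?: boolean;
  canFocus?: boolean;
  canScroll?: boolean;
}

// Hypothetical description of a tool call to issue next.
type ToolCall = { tool: string; args: Record<string, unknown> };

// Plan how to enter text into an element, preferring the atomic AX path.
function planTextEntry(target: AxRef, text: string): ToolCall[] {
  if (target.canSetValue) {
    // set_text replaces the value via AX in a single step.
    return [{ tool: "set_text", args: { ref: target.ref, text } }];
  }
  if (target.canPress || target.canFocus) {
    // Fall back to focusing the element, then typing raw text.
    return [
      { tool: "click", args: { ref: target.ref } },
      { tool: "type_text", args: { text } }, // argument name is assumed
    ];
  }
  throw new Error(`${target.ref} cannot accept text via AX`);
}
```

With this shape, a ref that reports canSetValue yields a single set_text call, while a ref that can only be focused yields a click followed by type_text.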
Coordinate Fallback
Use coordinates only when no suitable AX target exists. Always include stateId from the latest screenshot to guard against stale state:
click({ x: 320, y: 180, stateId: "abc123" })
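One way to enforce this rule on the caller side is to build click arguments through a small helper that refuses coordinate targets without a stateId. This is only a sketch: `ClickTarget` and `buildClickArgs` are hypothetical names, and the argument shapes simply mirror the examples in this document.

```typescript
// A click target is either an AX ref or a raw coordinate pair.
type ClickTarget = { ref: string } | { x: number; y: number };

// Build click arguments; coordinate clicks must carry the latest stateId.
function buildClickArgs(
  target: ClickTarget,
  stateId?: string
): Record<string, unknown> {
  if ("ref" in target) {
    // AX refs are preferred and need no coordinates.
    return { ref: target.ref };
  }
  if (!stateId) {
    throw new Error(
      "coordinate click requires a stateId from the latest screenshot"
    );
  }
  return { x: target.x, y: target.y, stateId };
}
```

Calling `buildClickArgs({ x: 320, y: 180 }, "abc123")` produces the same argument object as the coordinate example above, while omitting the stateId raises an error before any tool call is made.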
Batching Actions
Use computer_actions to batch obvious sequential steps. One semantic state update is returned after all actions:
computer_actions({
stateId: "abc123",
actions: [
{ type: "click", ref: "@e1" },
{ type: "set_text", ref: "@e2", text: "https://example.com" },
{ type: "keypress", keys: ["Enter"] }
]
})
Each action in the result includes execution metadata:
- stealth: background-safe AX path (no focus takeover)
- default: required focus or raw event fallback
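A caller may want to know how many batched actions stayed on the background-safe path. The sketch below assumes each result exposes an `execution` field holding "stealth" or "default"; that field name, the `ActionResult` shape, and `summarizeExecution` are hypothetical and not taken from the tool's documented output.

```typescript
// Hypothetical per-action result; only the two execution values come from the text.
interface ActionResult {
  type: string;
  execution: "stealth" | "default";
}

interface ExecutionSummary {
  stealth: number;
  fallback: number;
  fallbackActions: string[]; // action types that needed focus or raw events
}

// Count stealth vs. default executions in a batch result.
function summarizeExecution(results: ActionResult[]): ExecutionSummary {
  const summary: ExecutionSummary = {
    stealth: 0,
    fallback: 0,
    fallbackActions: [],
  };
  for (const result of results) {
    if (result.execution === "stealth") {
      summary.stealth += 1;
    } else {
      summary.fallback += 1;
      summary.fallbackActions.push(result.type);
    }
  }
  return summary;
}
```

A non-empty `fallbackActions` list indicates which steps would likely fail if strict AX mode were enabled.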
Window Management
// List windows for a specific app
list_windows({ app: "Finder" })
// Target a specific window in all subsequent calls
screenshot({ window: "@w2" })
// Arrange window by preset
arrange_window({ window: "@w1", preset: "left-half" })
// Arrange window with explicit frame
arrange_window({ window: "@w1", frame: { x: 0, y: 0, width: 1280, height: 800 } })
Screenshot Modes
Control when screenshots are attached with the image option:
screenshot({ window: "@w1", image: "auto" }) // default: attach when AX coverage is weak
screenshot({ window: "@w1", image: "always" }) // always attach
screenshot({ window: "@w1", image: "never" }) // never attach, AX state only
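The three modes reduce to a simple decision. The sketch below illustrates that decision only; `shouldAttachImage` is a hypothetical helper, and how the bridge actually judges AX coverage to be weak is not specified here, so it is passed in as a boolean.

```typescript
type ImageMode = "auto" | "always" | "never";

// Decide whether a screenshot image should accompany the AX state.
function shouldAttachImage(mode: ImageMode, axCoverageIsWeak: boolean): boolean {
  if (mode === "always") return true;
  if (mode === "never") return false;
  // "auto": attach only when AX coverage is too weak to act on.
  return axCoverageIsWeak;
}
```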
Common Patterns
Open URL in Safari
list_windows({ app: "Safari" })
screenshot({ window: "@w1" })
// @e1 = address bar (from AX state)
set_text({ ref: "@e1", text: "https://example.com" })
keypress({ keys: ["Enter"] })
Fill a Form
screenshot({ window: "@w1" })
// Use refs from AX state
set_text({ ref: "@e3", text: "Jane Doe" })
set_text({ ref: "@e4", text: "jane@example.com" })
click({ ref: "@e5" }) // Submit button
Keyboard Shortcut
keypress({ keys: ["Cmd", "T"] }) // New tab
keypress({ keys: ["Cmd", "Shift", "N"] }) // New incognito window (Chrome)
keypress({ keys: ["Escape"] })
Scroll a Page
scroll({ ref: "@e2", scrollY: 800 }) // Scroll element down
scroll({ ref: "@e2", scrollY: -400 }) // Scroll up
Drag and Drop
drag({ fromX: 100, fromY: 200, toX: 400, toY: 200 })
Strict AX Mode (Stealth / Background-Safe)
Enable strict AX mode to prevent focus changes, raw pointer events, raw keyboard events, and cursor takeover. All actions must succeed via background-safe AX paths:
// Via config (see Configuration section)
// Actions will report `stealth` in execution metadata when successful
If an action requires foreground focus while strict mode is active, it fails with a strict mode error.
Configuration
Inspect effective config in Pi:
/computer-use
Config can be set via config files or environment variable overrides. Key options:
| Option | Description |
|---|---|
| image | "auto", "always", or "never": screenshot attachment mode |
| strictAX | Enable background-safe strict AX mode |
| browser | Browser-aware targeting preference |
See docs/configuration.md for full config file format and environment variable overrides.
Development
# Install dependencies
npm install
# Run checks
npm test
# Run local checkout without loading installed copy
pi --no-extensions -e .
Benchmarks
# Default QA benchmark
npm run benchmark:qa
# Full benchmark (may open apps)
npm run benchmark:qa:full
See benchmarks/README.md for metrics, regression policy, and comparison workflow.
Troubleshooting
Permissions not granted
Re-run and grant both Accessibility and Screen Recording to:
~/.pi/agent/helpers/pi-computer-use/bridge
On macOS, open System Settings → Privacy & Security, then enable the helper under both Accessibility and Screen Recording.
AX refs are stale
Take a fresh screenshot to get updated stateId and new refs before acting. Stale-action detection uses stateId to reject outdated coordinates or refs.
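The idea behind stale-action detection can be illustrated on the caller side. This is a sketch of the concept, not the bridge's implementation: `Snapshot`, `PendingAction`, and `isStale` are hypothetical, and the real check happens inside the tool.

```typescript
// Hypothetical record of what the latest screenshot returned.
interface Snapshot {
  stateId: string;
  refs: Set<string>;
}

// Hypothetical action planned against an earlier state.
interface PendingAction {
  stateId: string; // stateId the action was planned against
  ref?: string; // optional AX ref such as "@e1"
}

// An action is stale if it was planned against an older state,
// or if it targets a ref that the latest state no longer contains.
function isStale(action: PendingAction, latest: Snapshot): boolean {
  if (action.stateId !== latest.stateId) return true;
  if (action.ref !== undefined && !latest.refs.has(action.ref)) return true;
  return false;
}
```

When `isStale` returns true, the safe response is the one described above: take a fresh screenshot and re-plan against the new refs.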
Browser window not targeted correctly
Use list_windows({ app: "Safari" }) (or Chrome/Firefox) first, then explicitly pass window: "@wN" to screenshot and subsequent actions.
Strict AX mode errors
An action could not complete via a background-safe AX path. Either disable strict mode or find an AX ref with canPress or canSetValue that supports the background path.
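To find such a ref, the AX state can be filtered by capability. The sketch below assumes refs arrive as objects carrying the capability flags; `AxRefCaps` and `strictModeCandidates` are hypothetical, and a matching flag does not guarantee that the background path succeeds.

```typescript
// Hypothetical subset of an AX ref entry relevant to strict mode.
interface AxRefCaps {
  ref: string;
  canPress?: boolean;
  canSetValue?: boolean;
}

// Return refs whose capability flags match the intended action.
function strictModeCandidates(
  refs: AxRefCaps[],
  action: "click" | "set_text"
): string[] {
  return refs
    .filter((r) =>
      action === "click" ? r.canPress === true : r.canSetValue === true
    )
    .map((r) => r.ref);
}
```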
Helper not found
Ensure Pi installed the native helper:
ls ~/.pi/agent/helpers/pi-computer-use/bridge
If missing, reinstall: pi install git:github.com/injaneity/pi-computer-use#v0.2.1
Key Concepts
- AX refs (@e1, @e2, …): semantic element handles from the macOS Accessibility API, stable within a state
- Window refs (@w1, @w2, …): stable handles from list_windows
- stateId: opaque ID from the latest screenshot; attach to coordinate-based actions to detect stale state
- stealth execution: action completed via AX without foregrounding the app or moving the real cursor
- semantic state: structured AX tree returned after every action, used instead of screenshots when coverage is sufficient