computer-use
Computer Use Skill
Step 1: Environment Setup
Before using Computer Use, ensure a properly sandboxed environment:
- Docker Container (Recommended):
  # Use Anthropic's reference container
  docker run -it --rm \
    -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
    -p 5900:5900 -p 8501:8501 \
    ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
- Virtual Machine: A dedicated VM with minimal privileges and an isolated network
- Never run on the host machine with access to sensitive data or credentials
Step 2: Tool Configuration
Configure the computer use tool with display settings:
const computerTool = {
type: 'computer_20250124', // or "computer_20251124" for Opus 4.5
name: 'computer',
display_width_px: 1024,
display_height_px: 768,
display_number: 1,
};
Resolution Guidelines:
- XGA (1024x768): Default, works well for most tasks
- WXGA (1280x800): Better for wide content
- 1920x1080: Only if needed, may reduce accuracy
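The guidelines above can also be applied programmatically. A minimal sketch (fitToTarget is a hypothetical helper, not part of the API) that fits an arbitrary physical display inside the recommended bounds while preserving aspect ratio, so Claude never sees a stretched image:

```javascript
// Fit the real display inside a target box (WXGA by default),
// preserving aspect ratio; never upscale beyond the native size.
function fitToTarget(actualW, actualH, maxW = 1280, maxH = 800) {
  const scale = Math.min(maxW / actualW, maxH / actualH, 1);
  return [Math.round(actualW * scale), Math.round(actualH * scale)];
}

// A 1920x1080 display fits as 1280x720; XGA passes through unchanged.
const [w, h] = fitToTarget(1920, 1080);
```

The resulting dimensions would go into `display_width_px` / `display_height_px` in the tool configuration below.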
Step 3: Agent Loop Implementation
The core pattern for computer use is an agent loop:
async function computerUseAgentLoop(task, maxIterations = 50) {
const messages = [{ role: 'user', content: task }];
for (let i = 0; i < maxIterations; i++) {
// 1. Call Claude with computer tool
const response = await anthropic.beta.messages.create({
model: 'claude-sonnet-4-20250514',
max_tokens: 4096,
tools: [computerTool],
messages,
betas: ['computer-use-2025-01-24'],
});
// 2. Check if task complete
if (response.stop_reason === 'end_turn') {
return extractFinalResult(response);
}
// 3. Process tool use requests
const toolResults = [];
for (const block of response.content) {
if (block.type === 'tool_use' && block.name === 'computer') {
const result = await executeComputerAction(block.input);
toolResults.push({
type: 'tool_result',
tool_use_id: block.id,
content: result,
});
}
}
// 4. Add assistant response and tool results
messages.push({ role: 'assistant', content: response.content });
messages.push({ role: 'user', content: toolResults });
}
throw new Error('Max iterations reached');
}
Step 4: Action Execution
Execute computer actions based on Claude's requests:
async function executeComputerAction(input) {
const { action, coordinate, text, scroll_direction, scroll_amount } = input;
switch (action) {
case 'screenshot':
return await captureScreenshot();
case 'left_click':
await click(coordinate[0], coordinate[1]);
return await captureScreenshot();
case 'type':
await typeText(text);
return await captureScreenshot();
case 'key':
await pressKey(text);
return await captureScreenshot();
case 'mouse_move':
await moveMouse(coordinate[0], coordinate[1]);
return await captureScreenshot();
case 'scroll':
await scroll(coordinate, scroll_direction, scroll_amount);
return await captureScreenshot();
case 'left_click_drag':
await drag(input.start_coordinate, coordinate);
return await captureScreenshot();
case 'wait':
await sleep(input.duration * 1000);
return await captureScreenshot();
default:
throw new Error(`Unknown action: ${action}`);
}
}
Step 5: Coordinate Scaling
When display resolution differs from tool configuration:
function scaleCoordinates(x, y, fromWidth, fromHeight, toWidth, toHeight) {
return [Math.round((x * toWidth) / fromWidth), Math.round((y * toHeight) / fromHeight)];
}
// Example: Scale Claude's coordinates from 1024x768 to an actual 1920x1080 display
const [scaledX, scaledY] = scaleCoordinates(
  500, 400,   // Claude's coordinates
  1024, 768,  // Tool configuration
  1920, 1080  // Actual display
);
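Rounding at the display edge can still produce an out-of-bounds click, which the API rejects as invalid coordinates. A small sketch (scaleAndClamp is a hypothetical helper) that combines the scaling above with clamping to the target bounds:

```javascript
// Scale Claude's coordinates to the real display, then clamp so
// rounding can never produce an off-screen click.
function scaleAndClamp(x, y, fromW, fromH, toW, toH) {
  const sx = Math.round((x * toW) / fromW);
  const sy = Math.round((y * toH) / fromH);
  return [
    Math.min(Math.max(sx, 0), toW - 1),
    Math.min(Math.max(sy, 0), toH - 1),
  ];
}

// Even the extreme corner of the tool's space stays inside the display.
const corner = scaleAndClamp(1024, 768, 1024, 768, 1920, 1080);
```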
</execution_process>
<best_practices>
- Sandboxed Execution: ALWAYS run in a Docker container or VM with minimal privileges. Never grant access to sensitive data, authentication credentials, or unrestricted internet.
- Human Confirmation: Implement human-in-the-loop confirmation for meaningful actions like form submissions, file deletions, or external communications.
- Prompt Injection Protection: Be aware that malicious content in screenshots can attempt to manipulate Claude. Validate actions against the original task.
- Resolution Consistency: Keep display resolution consistent throughout a session. XGA (1024x768) provides the best balance of accuracy and visibility.
- Screenshot After Actions: Always return a screenshot after each action so Claude can verify the result and determine next steps.
- Error Recovery: Implement graceful error handling. If an action fails, capture a screenshot and let Claude decide how to proceed.
- Rate Limiting: Add delays between rapid actions to allow the UI to update. Use the wait action when needed.
- Beta Headers: Always include the appropriate beta header for your model version.
</best_practices>
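The human-confirmation practice can be sketched as a pre-execution gate. The risk rules and names below are illustrative assumptions, not an official pattern; a real deployment would tailor them to the task:

```javascript
// Classify an action as risky before executing it, so a human
// can approve or reject it first. Rules here are illustrative.
const RISKY_KEYS = ['return', 'enter', 'ctrl+s'];

function requiresConfirmation(input) {
  if (input.action === 'key' && RISKY_KEYS.includes(input.text.toLowerCase())) {
    return true; // may submit a form or save a file
  }
  if (input.action === 'type' && /password|credit|ssn/i.test(input.text)) {
    return true; // typing sensitive-looking data
  }
  return false;
}
```

In the agent loop, call this before `executeComputerAction` and pause for operator approval when it returns true.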
<code_example> Complete Agent Loop Example (Node.js)
const Anthropic = require('@anthropic-ai/sdk');
const anthropic = new Anthropic();
// Tool configuration
const computerTool = {
type: 'computer_20250124',
name: 'computer',
display_width_px: 1024,
display_height_px: 768,
display_number: 1,
};
// Main agent loop
async function runComputerUseTask(task) {
console.log(`Starting task: ${task}`);
const messages = [{ role: 'user', content: task }];
let iterations = 0;
const maxIterations = 50;
while (iterations < maxIterations) {
iterations++;
console.log(`Iteration ${iterations}`);
const response = await anthropic.beta.messages.create({
model: 'claude-sonnet-4-20250514',
max_tokens: 4096,
tools: [computerTool],
messages,
betas: ['computer-use-2025-01-24'],
});
// Check for completion
if (response.stop_reason === 'end_turn') {
const textBlocks = response.content.filter(b => b.type === 'text');
return textBlocks.map(b => b.text).join('\n');
}
// Process tool calls
const toolResults = [];
for (const block of response.content) {
if (block.type === 'tool_use' && block.name === 'computer') {
console.log(`Action: ${block.input.action}`);
// Execute action and get screenshot
const screenshot = await executeAction(block.input);
toolResults.push({
type: 'tool_result',
tool_use_id: block.id,
content: [
{
type: 'image',
source: {
type: 'base64',
media_type: 'image/png',
data: screenshot,
},
},
],
});
}
}
messages.push({ role: 'assistant', content: response.content });
messages.push({ role: 'user', content: toolResults });
}
throw new Error('Task did not complete within iteration limit');
}
// Example usage
runComputerUseTask('Open the calculator app and compute 25 * 47')
.then(result => console.log('Result:', result))
.catch(err => console.error('Error:', err));
</code_example>
<code_example> Python Implementation
import anthropic
import base64
from typing import Any
client = anthropic.Anthropic()
COMPUTER_TOOL = {
"type": "computer_20250124",
"name": "computer",
"display_width_px": 1024,
"display_height_px": 768,
"display_number": 1
}
def run_computer_use_task(task: str, max_iterations: int = 50) -> str:
"""Execute a computer use task with agent loop."""
messages = [{"role": "user", "content": task}]
for iteration in range(max_iterations):
print(f"Iteration {iteration + 1}")
response = client.beta.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
tools=[COMPUTER_TOOL],
messages=messages,
betas=["computer-use-2025-01-24"]
)
# Check for completion
if response.stop_reason == "end_turn":
text_blocks = [b.text for b in response.content if b.type == "text"]
return "\n".join(text_blocks)
# Process tool calls
tool_results = []
for block in response.content:
if block.type == "tool_use" and block.name == "computer":
print(f"Action: {block.input['action']}")
# Execute action in your environment
screenshot_b64 = execute_action(block.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": [{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/png",
"data": screenshot_b64
}
}]
})
messages.append({"role": "assistant", "content": response.content})
messages.append({"role": "user", "content": tool_results})
raise Exception("Max iterations reached")
def execute_action(action_input: dict[str, Any]) -> str:
"""Execute computer action and return screenshot as base64."""
action = action_input["action"]
# Implement using your automation framework
# (pyautogui, pynput, or container-specific tools)
if action == "screenshot":
pass # Just capture
elif action == "left_click":
x, y = action_input["coordinate"]
# click(x, y)
elif action == "type":
text = action_input["text"]
# type_text(text)
elif action == "key":
key = action_input["text"]
# press_key(key)
# ... handle other actions
# Capture and return screenshot
return capture_screenshot_base64()
</code_example>
<usage_example> Available Actions Reference
// Basic actions (all versions)
{ "action": "screenshot" }
{ "action": "left_click", "coordinate": [500, 300] }
{ "action": "type", "text": "Hello, world!" }
{ "action": "key", "text": "ctrl+s" }
{ "action": "mouse_move", "coordinate": [500, 300] }
// Enhanced actions (computer_20250124)
{ "action": "scroll", "coordinate": [500, 400], "scroll_direction": "down", "scroll_amount": 3 }
{ "action": "left_click_drag", "start_coordinate": [100, 100], "coordinate": [300, 300] }
{ "action": "right_click", "coordinate": [500, 300] }
{ "action": "middle_click", "coordinate": [500, 300] }
{ "action": "double_click", "coordinate": [500, 300] }
{ "action": "triple_click", "coordinate": [500, 300] }
{ "action": "left_mouse_down", "coordinate": [500, 300] }
{ "action": "left_mouse_up", "coordinate": [500, 300] }
{ "action": "hold_key", "text": "shift", "duration": 1.0 }
{ "action": "wait", "duration": 2.0 }
// Opus 4.5 only (computer_20251124 with enable_zoom: true)
{ "action": "zoom", "coordinate": [500, 300], "zoom_direction": "in", "zoom_amount": 2 }
</usage_example>
<usage_example> Docker Reference Container
# Pull and run the Anthropic reference container
docker pull ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
# Run with API key
docker run -it --rm \
-e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
-v $(pwd)/output:/home/computeruse/output \
-p 5900:5900 \
-p 8501:8501 \
-p 6080:6080 \
-p 8080:8080 \
ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
# Access points:
# - VNC: localhost:5900 (password: "secret")
# - noVNC web: http://localhost:6080/vnc.html
# - Streamlit UI: http://localhost:8501
# - API: http://localhost:8080
</usage_example>
Tool Versions
| Version | Beta Header | Model Support | Key Features |
|---|---|---|---|
| computer_20250124 | computer-use-2025-01-24 | Sonnet, Haiku | Enhanced actions (scroll, drag, wait, etc.) |
| computer_20251124 | computer-use-2025-11-24 | Opus 4.5 only | Zoom action (requires enable_zoom: true) |
Security Requirements
CRITICAL: Sandboxing is Mandatory
Computer Use provides direct control over a computer environment. NEVER run without proper sandboxing:
- Use dedicated containers/VMs - Never on host machines with sensitive data
- Minimal privileges - No root access, limited filesystem access
- Network isolation - Restrict or block internet access
- No credentials - Never expose API keys, passwords, or tokens in the environment
- Human oversight - Require confirmation for destructive or external actions
Prompt Injection Risks
Malicious content displayed on screen can attempt to manipulate Claude:
- Validate that actions align with the original task
- Implement allowlists for permitted applications/websites
- Monitor for suspicious instruction patterns in screenshots
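One way to sketch the allowlist idea (the application names here are illustrative assumptions):

```javascript
// Verify the focused window against permitted applications before
// acting on it; anything else is treated as a potential injection vector.
const ALLOWED_APPS = ['calculator', 'text editor', 'firefox'];

function isAllowedWindow(windowTitle) {
  const title = windowTitle.toLowerCase();
  return ALLOWED_APPS.some(app => title.includes(app));
}
```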
Error Handling
| Error | Cause | Resolution |
|---|---|---|
| invalid_request_error | Missing beta header | Add betas: ["computer-use-2025-01-24"] |
| tool_use_error | Invalid coordinates | Ensure coordinates are within display bounds |
| rate_limit_error | Too many requests | Implement exponential backoff |
| Action has no effect | UI not ready | Add wait action before retrying |
| Wrong element clicked | Coordinate drift | Re-capture screenshot and recalculate |
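For rate limiting, a minimal exponential backoff sketch (the 429 status check assumes the SDK surfaces the HTTP status on thrown errors; adjust for your error shape):

```javascript
// Compute capped exponential backoff delays: 1s, 2s, 4s, ... up to maxMs.
function backoffDelays(attempts, baseMs = 1000, maxMs = 30000) {
  return Array.from({ length: attempts }, (_, i) =>
    Math.min(baseMs * 2 ** i, maxMs)
  );
}

// Retry a request on rate-limit errors, waiting between attempts.
async function withRetry(fn, attempts = 5) {
  const delays = backoffDelays(attempts);
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (err.status !== 429 || i === attempts - 1) throw err;
      await new Promise(r => setTimeout(r, delays[i]));
    }
  }
}
```

Wrap each `messages.create` call in `withRetry` inside the agent loop.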
Integration with Agents
Primary Agents
- developer: Automated testing, UI verification
- qa: End-to-end testing, visual regression
- devops-troubleshooter: System debugging, log inspection
Use Cases
- Automated form filling and data entry
- Application testing and QA automation
- Desktop application interaction
- Browser automation (when headless won't work)
- Legacy system integration
- Visual verification and screenshots
Memory Protocol (MANDATORY)
Before starting:
cat .claude/context/memory/learnings.md
Check for:
- Previous computer use configurations
- Known automation patterns
- Environment-specific settings
After completing:
- New pattern discovered -> .claude/context/memory/learnings.md
- Security concern found -> .claude/context/memory/issues.md
- Architecture decision -> .claude/context/memory/decisions.md
ASSUME INTERRUPTION: If it's not in memory, it didn't happen.
Related
- Anthropic Documentation: https://docs.anthropic.com/en/docs/build-with-claude/computer-use
- Reference Implementation: https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo
- API Reference: https://docs.anthropic.com/en/api/computer-use