Browser Control Skill
Goal
Finish the user’s real task reliably.
Prioritize successful completion and correct results over aggressive call minimization.
Operating Rules
- Start with the intended action directly (navigate/open/act/evaluate). Do not run status as a pre-check.
- Use snapshot only when refs are required for interaction (click/type/select/drag/scrollIntoView).
- Prefer evaluate for extraction. Return structured data in one comprehensive call when possible.
- Use condition waits by default (loadState/url → selector/text/textGone → fn). Avoid timeMs unless explicitly needed.
- Before clicking potentially off-screen elements, run act.scrollIntoView on the ref first.
- Keep context stable: once targetId is known, pass it in follow-up calls when supported.
- Avoid blind loops: every extra call must have a clear purpose.
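The condition-wait rule can be illustrated with a small request builder. This is a minimal sketch, not part of the tool: wait_request is a hypothetical helper, the 10000 ms default is an arbitrary choice, and it assumes that selector, text, and textGone are passed as keys of the wait request in the same way loadState and url are in the Quick Examples.

```shell
# Hypothetical helper (not part of browser-tool): print an act.wait request.
# Usage: wait_request FIELD VALUE [TIMEOUT_MS]
# FIELD is one wait condition, e.g. loadState, url, selector, text, textGone.
# Assumption: every condition is a plain string key on the wait request.
# VALUE is inserted as-is, so it must not contain double quotes or backslashes.
wait_request() {
  printf '{"action":"act","request":{"kind":"wait","%s":"%s","timeoutMs":%s}}\n' \
    "$1" "$2" "${3:-10000}"
}

# Intended use, cheapest condition first:
#   "$BROWSER_TOOL_CMD" "$(wait_request loadState domcontentloaded 15000)"
#   "$BROWSER_TOOL_CMD" "$(wait_request url checkout)"
#   "$BROWSER_TOOL_CMD" "$(wait_request selector '#order-total')"
```

Building the request in one place keeps every wait condition-based and makes a stray fixed sleep easy to spot in review.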
Reliability and Recovery
- If the tool reports Ref not found, do not reuse stale refs. Take one fresh snapshot, retry once with a ref from that snapshot, then stop if it still fails.
- For repeated failures with the same cause, stop and explain the blocker clearly instead of retrying endlessly.
- Connection recovery is built into the tool. Allow auto-recovery once; if still disconnected, instruct the user to install or connect the extension.
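The stale-ref rule above (one fresh snapshot, one retry, then stop) could be wrapped roughly as follows. This is a sketch under stated assumptions: BROWSER_TOOL_CMD holds the tool path, a stale ref shows up as the text "Ref not found" somewhere in the tool's output, and pick_ref is a hypothetical function you supply that reads snapshot output on stdin and prints the ref to use. None of these helpers ship with the tool.

```shell
# Hypothetical helper: click one ref and capture the tool's output.
click_ref() {
  "$BROWSER_TOOL_CMD" "{\"action\":\"act\",\"request\":{\"kind\":\"click\",\"ref\":\"$1\"}}" 2>&1
}

# Click REF; on "Ref not found", take ONE fresh snapshot, pick a new ref,
# retry ONCE, then give up with a clear message instead of looping.
click_with_one_retry() {
  out=$(click_ref "$1")
  case "$out" in
    *"Ref not found"*) ;;                    # stale ref: fall through to the retry
    *) printf '%s\n' "$out"; return 0 ;;     # anything else: hand the output back
  esac

  snap=$("$BROWSER_TOOL_CMD" '{"action":"snapshot","interactive":true}' 2>&1)
  new_ref=$(printf '%s\n' "$snap" | pick_ref)   # pick_ref: caller-supplied (assumption)

  out=$(click_ref "$new_ref")
  case "$out" in
    *"Ref not found"*)
      echo "Blocked: ref still not found after one fresh snapshot" >&2
      return 1 ;;
  esac
  printf '%s\n' "$out"
}
```

The helper never reuses the original ref after a failure, and the single retry is hard-coded rather than configurable so the call count stays bounded.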
Screenshot Policy
- Default: no screenshot.
- Use screenshots only when user asks, or when visual proof is required.
- Prefer element screenshots (ref or element) over full-page screenshots.
- Use full-page screenshots only for page-level evidence.
Recommended Flow
- Direct action first (navigate/open or immediate act/evaluate).
- If interaction needs refs, run snapshot (interactive: true preferred).
- Wait for readiness using act.wait with explicit conditions.
- Interact (scrollIntoView → click/type/select/drag as needed).
- Extract/verify with evaluate (preferred) or snapshot.
- Provide screenshot evidence only when necessary.
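Put together, the flow above might look like the sketch below. It uses only the request shapes shown in the Quick Examples. The function names are hypothetical, BROWSER_TOOL_CMD is assumed to hold the tool path, and the `&&` chaining assumes the tool exits non-zero on failure, which this document does not confirm. The flow is split in two because the ref is only known after the snapshot has been read.

```shell
# Hypothetical helper: run one act request of kind $1 against ref $2.
act_on_ref() {
  "$BROWSER_TOOL_CMD" "{\"action\":\"act\",\"request\":{\"kind\":\"$1\",\"ref\":\"$2\"}}"
}

# Steps 1-2: direct action first, then a snapshot because refs are needed.
# The URL is inserted as-is, so it must not contain double quotes.
open_and_snapshot() {
  "$BROWSER_TOOL_CMD" "{\"action\":\"navigate\",\"url\":\"$1\"}" &&
  "$BROWSER_TOOL_CMD" '{"action":"snapshot","interactive":true}'
}

# Steps 3-5: condition wait, scroll the ref into view, click, then extract
# structured data in a single evaluate call.
interact_and_extract() {
  "$BROWSER_TOOL_CMD" '{"action":"act","request":{"kind":"wait","loadState":"domcontentloaded","timeoutMs":15000}}' &&
  act_on_ref scrollIntoView "$1" &&
  act_on_ref click "$1" &&
  "$BROWSER_TOOL_CMD" '{"action":"act","request":{"kind":"evaluate","fn":"() => ({title: document.title, href: location.href})"}}'
}

# Intended use: read the snapshot output, choose a ref, then continue.
#   open_and_snapshot "https://example.com"
#   interact_and_extract e1
```

No screenshot is taken anywhere in the sketch, matching the default of providing visual evidence only when it is needed.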
Connection Handling
Connection recovery is built into the tool. On connection failure, let the tool auto-attach/launch/retry once. If still disconnected, stop and instruct the user to install/connect the extension.
Minimal CLI Usage
Use <BROWSER_TOOL_CMD> for commands:
- macOS/Linux: ~/.wegent-executor/bin/browser-tool
- Windows: ~/.wegent-executor/bin/browser-tool.cmd
<BROWSER_TOOL_CMD> '<json>'
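One way to resolve <BROWSER_TOOL_CMD> once per session is sketched below. resolve_browser_tool is a hypothetical helper, and detecting Windows through `uname -s` assumes a POSIX-style shell such as Git Bash, MSYS, or Cygwin. The two paths are the ones listed above.

```shell
# Hypothetical helper: print the browser-tool path for the current platform.
# Assumption: on Windows this runs in Git Bash, MSYS, or Cygwin, where
# `uname -s` starts with MINGW, MSYS, or CYGWIN respectively.
resolve_browser_tool() {
  case "$(uname -s)" in
    MINGW*|MSYS*|CYGWIN*) printf '%s\n' "$HOME/.wegent-executor/bin/browser-tool.cmd" ;;
    *)                    printf '%s\n' "$HOME/.wegent-executor/bin/browser-tool" ;;
  esac
}

# Resolve once and reuse the variable in every later call.
BROWSER_TOOL_CMD="$(resolve_browser_tool)"

# Intended use:
#   "$BROWSER_TOOL_CMD" '{"action":"navigate","url":"https://example.com"}'
```

Wrapping the JSON in single quotes, as the examples do, keeps the shell from interpreting the double quotes inside the request.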
Quick Examples
# Navigate directly
<BROWSER_TOOL_CMD> '{"action":"navigate","url":"https://example.com"}'
# Snapshot only when refs are needed
<BROWSER_TOOL_CMD> '{"action":"snapshot","interactive":true}'
# Act on ref
<BROWSER_TOOL_CMD> '{"action":"act","request":{"kind":"click","ref":"e1"}}'
# Ensure element is visible before click (recommended on long pages)
<BROWSER_TOOL_CMD> '{"action":"act","request":{"kind":"scrollIntoView","ref":"e1"}}'
# Condition wait (preferred over fixed sleep)
<BROWSER_TOOL_CMD> '{"action":"act","request":{"kind":"wait","loadState":"domcontentloaded","timeoutMs":15000}}'
# URL-based wait
<BROWSER_TOOL_CMD> '{"action":"act","request":{"kind":"wait","url":"checkout","timeoutMs":10000}}'
# Run JS in page context via act.evaluate (function or expression)
<BROWSER_TOOL_CMD> '{"action":"act","request":{"kind":"evaluate","fn":"() => ({title: document.title, href: location.href})"}}'
# Run JS against a target element ref via act.evaluate
<BROWSER_TOOL_CMD> '{"action":"act","request":{"kind":"evaluate","ref":"e1","fn":"(el) => ({text: el.textContent?.trim() || \"\"})"}}'
# Close current tab (or pass targetId)
<BROWSER_TOOL_CMD> '{"action":"act","request":{"kind":"close"}}'
# Element screenshot (prefer over full-page when only target proof is needed)
<BROWSER_TOOL_CMD> '{"action":"screenshot","ref":"e1","type":"jpeg"}'
# Comprehensive extraction in one evaluate
<BROWSER_TOOL_CMD> '{"action":"evaluate","expression":"(() => ({title:document.title,url:location.href}))()"}'
Repository
wecode-ai/wegent