computer-use-playbook
Computer Use Playbook
Overview
Use this skill for end-to-end computer automation across browser and desktop surfaces. Browser use is a major track, but not the only one. Prefer deterministic methods first, then escalate to visual/native automation only when required. For browser MCP workflows, treat tab_id as a required handle for all stateful actions.
Execution Mode (Default: Lesson-Lock)
When a matching topic already exists under references/learnings/<topic-slug>/lessons.md, run in lesson-lock mode.
- Execute the lesson checklist as written before trying any novel approach.
- Do not create a new topic slug when an existing topic clearly matches.
- Do not publish until all pre-publish lesson gates pass.
- If a lesson step fails, try only fallbacks documented in the same lessons.md first.
- Use experience-log.md only to fill missing detail, not to override lesson rules.
- If documented lesson paths fail and the task is not human-gated, run bounded self-learning attempts, then codify the winning pattern.
Precedence Ladder (No Ambiguity)
Always follow this order:
- Reuse existing lessons.md for the resolved topic slug.
- If a lesson step fails, use only fallbacks already documented in that same lessons.md.
- If no documented fallback works and no human gate is present, perform bounded self-learning to discover a reliable path.
- Once a reliable path is found, update lessons.md and experience-log.md so the next run is mechanical.
- Request human intervention only for login/2FA/CAPTCHA/security/policy gates or true hard blocks.
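The ladder can be read as a small decision function. The sketch below is only an illustration: RunState, its fields, and the NextAction names are hypothetical and are not defined anywhere in this playbook.

```python
from dataclasses import dataclass
from enum import Enum


class NextAction(Enum):
    FOLLOW_LESSON = "follow_lesson"
    USE_DOCUMENTED_FALLBACK = "use_documented_fallback"
    BOUNDED_SELF_LEARNING = "bounded_self_learning"
    REQUEST_HUMAN = "request_human"


@dataclass
class RunState:
    # Hypothetical fields chosen for this sketch.
    lesson_step_failed: bool = False
    untried_documented_fallbacks: int = 0
    human_gate_present: bool = False  # login/2FA/CAPTCHA/security/policy
    self_learning_attempts_left: int = 3


def next_action(state: RunState) -> NextAction:
    """Walk the ladder from top to bottom and return the first rung that applies."""
    if state.human_gate_present:
        # A human gate cannot be solved by lessons or fallbacks.
        return NextAction.REQUEST_HUMAN
    if not state.lesson_step_failed:
        return NextAction.FOLLOW_LESSON
    if state.untried_documented_fallbacks > 0:
        return NextAction.USE_DOCUMENTED_FALLBACK
    if state.self_learning_attempts_left > 0:
        return NextAction.BOUNDED_SELF_LEARNING
    # Nothing documented works and the attempt budget is spent: a true hard block.
    return NextAction.REQUEST_HUMAN
```

Updating lessons.md and experience-log.md after a successful self-learning attempt is a side effect rather than a branch, so it is left out of the function.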
Playbook Structure
- Browser use (primary for web tasks): browser MCP tools, DOM snapshots, scripts, screenshots.
- Filesystem use: shell-native operations for deterministic file/process work.
- Native desktop use: coordinate and window automation only when DOM/shell are insufficient.
- Human-in-the-loop checkpoints: login, CAPTCHA, security prompts, or policy-gated steps.
Decision Order
- Identify the active surface: browser page, filesystem/process, or native desktop UI.
- For browser pages, use browser MCP tools first and keep a strict tab_id contract.
- For filesystem/process work, use shell/system tools first (rg, ls, find, etc.).
- Escalate to vision or native UI automation only when deterministic methods are insufficient.
- If blocked by login, CAPTCHA, or security gates, switch to human-in-the-loop flow.
- Verify each critical step with state checks plus screenshot evidence.
Browser Automation (Major Track)
Use browser tools + DOM-first for browser flows. Avoid jumping to native desktop clicks while the target is still reachable by browser tools.
Preferred sequence:
- open_tab and capture the returned tab_id.
- navigate_to(tab_id, url) for explicit page transitions.
- dom_snapshot(tab_id, ...) or run_script(tab_id, ...) to identify the target.
- run_script(tab_id, ...) to perform the action (click/type/submit).
- read_page(tab_id, ...) / run_script(tab_id, ...) to verify URL/title/content.
- screenshot(tab_id, ...) as evidence.
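A minimal sketch of that sequence, written against a stand-in client. The tool names come from the playbook, but the Python signatures, the return values, and the click_and_verify helper are assumptions made for illustration; a real browser MCP client may expose them differently.

```python
import json
from typing import Any, Protocol


class BrowserTools(Protocol):
    """Tool names follow the playbook; signatures and return types are assumed."""

    def open_tab(self) -> str: ...
    def navigate_to(self, tab_id: str, url: str) -> Any: ...
    def run_script(self, tab_id: str, script: str) -> Any: ...
    def read_page(self, tab_id: str) -> dict: ...
    def screenshot(self, tab_id: str, path: str) -> str: ...


def click_and_verify(browser: BrowserTools, url: str, selector: str,
                     expected_title: str, shot_path: str) -> dict:
    """Hypothetical helper: every stateful call carries the same tab_id."""
    tab_id = browser.open_tab()                    # capture the handle
    browser.navigate_to(tab_id, url)               # explicit page transition
    js_selector = json.dumps(selector)             # quote safely for JavaScript
    count = browser.run_script(                    # identify the target
        tab_id, f"document.querySelectorAll({js_selector}).length")
    if count != 1:
        raise LookupError(f"expected one match for {selector!r}, got {count!r}")
    browser.run_script(                            # perform the action
        tab_id, f"document.querySelector({js_selector}).click()")
    page = browser.read_page(tab_id)               # state evidence
    shot = browser.screenshot(tab_id, shot_path)   # visual evidence
    return {
        "tab_id": tab_id,
        "verified": expected_title in page.get("title", ""),
        "screenshot": shot,
    }
```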
Session behavior guidance:
- always pass tab_id for navigate_to, read_page, screenshot, dom_snapshot, run_script, and close_tab.
- never rely on implicit active-tab behavior.
- if a click opens a new tab/window, call list_tabs, detect the new tab_id, and continue explicitly on that tab_id.
- keep a local map of purpose -> tab_id when handling multiple tabs.
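One way to detect the new tab is to diff the tab IDs reported before and after the click. The sketch assumes list_tabs yields plain tab ID strings, which the playbook does not specify; detect_new_tab is a hypothetical helper.

```python
from typing import Dict, Iterable


def detect_new_tab(before: Iterable[str], after: Iterable[str]) -> str:
    """Return the single tab ID present in `after` but not in `before`."""
    new_ids = set(after) - set(before)
    if len(new_ids) != 1:
        # Zero or several new tabs: refuse to guess which one to continue on.
        raise RuntimeError(f"expected exactly one new tab, found {sorted(new_ids)}")
    return next(iter(new_ids))


# Local purpose -> tab_id map, updated explicitly whenever a tab appears.
tabs: Dict[str, str] = {"compose": "tab-1"}
tabs["preview"] = detect_new_tab(before=["tab-1"], after=["tab-1", "tab-2"])
```

Raising on an ambiguous diff keeps the run from silently continuing on the wrong tab.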
Escalation triggers:
- dynamic overlays not stable via selectors,
- canvas/rendered controls,
- consent dialogs where selector path is inconsistent,
- native picker launched from browser (file upload dialog).
Do not overuse fallback:
- if a browser tool can do it, stay in browser tools.
- use native automation only for cross-app boundaries (OS dialogs, non-DOM UI).
File Explorer and Filesystem Automation
Prefer shell-native methods before GUI clicking.
Use shell when possible:
- search files: rg --files, find
- move/copy/rename: mv, cp, mkdir
- inspect metadata: ls -la, stat
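When these operations are driven from a script instead of an interactive shell, Python's standard library offers the same deterministic behavior. The sketch below is a swap-in for the shell commands listed above, not a replacement the playbook prescribes; the function names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def find_files(root: Path, pattern: str) -> List[Path]:
    """Recursive search, roughly what `find root -name pattern -type f` does."""
    return sorted(p for p in root.rglob(pattern) if p.is_file())


def move_into(src: Path, dest_dir: Path) -> Path:
    """Create the destination directory if needed, then move (mkdir -p + mv)."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(src), str(dest_dir / src.name)))


def describe(path: Path) -> dict:
    """Basic metadata, a subset of what `stat` reports."""
    info = path.stat()
    return {"size": info.st_size, "mtime": info.st_mtime, "is_dir": path.is_dir()}
```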
Use native UI only when the workflow is GUI-only:
- OS file picker from browser/app,
- drag-drop interactions not scriptable via API,
- app-specific explorer panes.
Native UI Automation
Use native UI automation for interactions outside application DOM/API.
Typical tools:
- xdotool for key/click/type,
- xprop / xwininfo for window targeting.
Guidelines:
- ensure window focus before typing,
- prefer keyboard-driven deterministic paths,
- keep retries bounded and observable,
- re-check application state after each action.
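These guidelines can be sketched as a bounded focus loop followed by a single typed action and a state check. The code shells out to xdotool (search --name, windowactivate --sync, type --delay); the helper names, the injectable run parameter, and the state_ok callback are assumptions of this sketch.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], str]


def run_command(args: Sequence[str]) -> str:
    """Run a command and return its stdout; raises if the exit code is non-zero."""
    return subprocess.run(list(args), check=True,
                          capture_output=True, text=True).stdout


def focus_window(name: str, run: Runner = run_command,
                 max_attempts: int = 3, pause_s: float = 0.5) -> Optional[str]:
    """Find a window by title and activate it, with bounded, logged retries."""
    for attempt in range(1, max_attempts + 1):
        try:
            ids = run(["xdotool", "search", "--name", name]).split()
        except subprocess.CalledProcessError:
            ids = []  # treat a failed search as "no window yet"
        if ids:
            run(["xdotool", "windowactivate", "--sync", ids[0]])
            return ids[0]
        print(f"focus attempt {attempt}/{max_attempts}: no window named {name!r}")
        time.sleep(pause_s)
    return None


def type_into(name: str, text: str, state_ok: Callable[[], bool],
              run: Runner = run_command, max_attempts: int = 3,
              pause_s: float = 0.5) -> bool:
    """Type only after focus is confirmed, then re-check application state."""
    if focus_window(name, run, max_attempts, pause_s) is None:
        return False
    run(["xdotool", "type", "--delay", "50", text])
    return state_ok()
```

Only the focus step is retried; typing runs once, so a failed state check cannot lead to duplicated input.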
Human-in-the-Loop Rules
Pause and ask for user intervention when blocked by:
- login/2FA challenges,
- CAPTCHA or anti-bot checkpoints,
- legal/security confirmation screens that require explicit human intent.
When waiting for user action:
- explain exactly what the user must do and where.
- issue an audible notification using speak so the user notices immediately.
- wait, then re-check state (url, title, element visibility, screenshot) before continuing.
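The wait-then-re-check loop might look like the sketch below. Here notify stands in for the speak tool and gate_cleared for whatever state check applies (URL, title, element visibility); both callbacks, the timeout values, and the function name are assumptions.

```python
import time
from typing import Callable


def wait_for_human(instructions: str,
                   notify: Callable[[str], object],
                   gate_cleared: Callable[[], bool],
                   timeout_s: float = 300.0,
                   poll_s: float = 5.0,
                   sleep: Callable[[float], object] = time.sleep) -> bool:
    """Tell the user what to do, then poll page state until the gate clears."""
    notify(instructions)  # e.g. forward the text to the speak tool
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if gate_cleared():
            return True
        sleep(poll_s)
    # One last check so a gate cleared during the final sleep is not missed.
    return gate_cleared()
```

A False result means the gate is still up after the timeout, which should be reported as a blocked state instead of being retried silently.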
Special Cases
Consent dialogs
- DOM-first click (Accept all / Reject all / localized variants).
- if selector fails but button is visible, use coordinate/native fallback.
- confirm modal is not visible and main interaction path works.
CAPTCHA / anti-bot challenges
- do not attempt bypass logic.
- capture evidence and report blocked state clearly.
- require human-in-the-loop completion.
- notify user with speak when intervention is required.
Login and account security gates
- try normal DOM steps first for username/password field fill and submit.
- if SSO, passkey, device approval, or 2FA requires human action, pause and request user action.
- after user confirms completion, re-snapshot and continue from verified page state.
File uploads
- use DOM file input assignment if available.
- if native picker opens, switch to native UI automation.
- verify upload appears in page/app state.
Verification Standard
Every important step should end with both:
- state evidence (URL/title/content/element state), and
- visual evidence (screenshot path).
If blocked, report:
- attempted method,
- blocker reason,
- evidence collected,
- next safe fallback.
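A possible shape for these records is sketched below. The class and field names are hypothetical; only the required contents (state evidence plus a screenshot path, and the four items of a blocked report) come from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepEvidence:
    step: str
    state: dict                            # URL/title/content/element state
    screenshot_path: Optional[str] = None  # visual evidence

    def is_complete(self) -> bool:
        """A step counts as verified only with both kinds of evidence."""
        return bool(self.state) and bool(self.screenshot_path)


@dataclass
class BlockedReport:
    attempted_method: str
    blocker_reason: str
    evidence: List[StepEvidence]
    next_safe_fallback: str

    def render(self) -> str:
        collected = "; ".join(
            f"{e.step} ({e.screenshot_path or 'no screenshot'})"
            for e in self.evidence
        )
        return "\n".join([
            f"attempted method: {self.attempted_method}",
            f"blocker reason: {self.blocker_reason}",
            f"evidence collected: {collected}",
            f"next safe fallback: {self.next_safe_fallback}",
        ])
```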
Learning Library Structure
Use references/learnings/ as the canonical knowledge base.
- references/learnings/index.md: topic registry and folder convention.
- references/learnings/general/: cross-task fallback logs.
- references/learnings/<topic-slug>/: topic-specific lessons and experience log.
Known canonical topic slugs:
- x-posting (X / Twitter / Expost publishing)
- linkedin-posting (LinkedIn posting/comments)
- google-flow
- xiaohongshu-posting
Topic folder convention:
- lessons.md for stable workflow rules.
- experience-log.md for incremental run learnings.
Continuous Learning Loop (Required)
Treat each real run as training data for future runs.
Priority contract:
- lessons.md is the source of truth for execution.
- experience-log.md is supporting evidence used to refine or extend lessons.
- If lesson and experience conflict, follow lessons.md and then update logs/lessons after the run.
Before starting similar work:
- Load references/learnings/index.md.
- Resolve the task to a canonical topic slug:
  - X/Twitter/Expost -> x-posting
  - LinkedIn/linkedin.com -> linkedin-posting
  - Google Flow/labs.google/fx/tools/flow -> google-flow
  - Xiaohongshu -> xiaohongshu-posting
- If the canonical topic exists, use it directly and do not create a variant slug.
- Load topic lessons.md first when present: references/learnings/<topic-slug>/lessons.md
- Load topic experience-log.md second when present: references/learnings/<topic-slug>/experience-log.md
- Load references/learnings/general/experience-log.md only as fallback context when topic files are missing or incomplete.
- If no topic folder exists, create it with lessons.md and experience-log.md, then run bounded self-learning to establish initial reliable lessons.
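Slug resolution and load order can be sketched as follows. The keyword table mirrors the mapping above, while the whole-word matching rule and both function names are assumptions; in practice the agent may resolve the topic by reading index.md instead of matching keywords.

```python
import re
from pathlib import Path
from typing import List, Optional

LEARNINGS_ROOT = Path("references/learnings")

# Keywords per canonical slug, taken from the mapping in the playbook.
SLUG_KEYWORDS = {
    "x-posting": ("x", "twitter", "expost"),
    "linkedin-posting": ("linkedin",),
    "google-flow": ("google flow", "labs.google/fx/tools/flow"),
    "xiaohongshu-posting": ("xiaohongshu",),
}


def resolve_topic_slug(task: str) -> Optional[str]:
    """Return the canonical slug whose keyword appears as a whole word."""
    text = task.lower()
    for slug, keywords in SLUG_KEYWORDS.items():
        for keyword in keywords:
            # Lookarounds stop "x" from matching inside words such as "fix".
            pattern = r"(?<![a-z0-9])" + re.escape(keyword) + r"(?![a-z0-9])"
            if re.search(pattern, text):
                return slug
    return None


def files_to_load(slug: str, root: Path = LEARNINGS_ROOT) -> List[Path]:
    """Lessons first, experience log second, general log only as a fallback."""
    topic = root / slug
    topic_files = [topic / "lessons.md", topic / "experience-log.md"]
    present = [p for p in topic_files if p.is_file()]
    if len(present) < len(topic_files):
        general = root / "general" / "experience-log.md"
        if general.is_file():
            present.append(general)
    return present
```

A None result from resolve_topic_slug corresponds to the last bullet: no canonical topic matches, so a new folder is created and bounded self-learning begins.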
Before execution:
- Extract an ordered run checklist from topic lessons.md with step IDs.
- Execute step-by-step and mark each step pass/fail using state evidence.
- For publish flows, allow one publish action only after all pre-publish gates are passed.
- Use experience logs only to fill gaps not covered by lessons.
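The publish gate can be expressed as a check over the run checklist. ChecklistStep, its fields, and may_publish are hypothetical names for this sketch; how steps are extracted from lessons.md depends on that file's layout and is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChecklistStep:
    step_id: str
    description: str
    pre_publish_gate: bool = False
    passed: Optional[bool] = None  # None means the step has not run yet


def may_publish(steps: List[ChecklistStep], publishes_done: int) -> bool:
    """Allow one publish action, and only after every pre-publish gate passed."""
    if publishes_done >= 1:
        return False  # never publish twice in one run
    gates = [s for s in steps if s.pre_publish_gate]
    return all(s.passed is True for s in gates)
```

A gate that has not run yet (passed is None) blocks publishing just like a failed one.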
During execution:
- Capture failure signal and the exact checklist step where it appears.
- If blocked, apply only documented lesson fallback paths before any new approach.
- Keep one-action-at-a-time execution where UI state is fragile.
- If no documented fallback works and the issue is not human-gated, run bounded self-learning:
- try at most 2-3 alternative deterministic paths,
- verify each attempt with explicit state evidence,
- stop once one reliable path is found.
- If blocked by login/2FA/CAPTCHA/security/policy gates, pause and request human intervention with evidence.
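Bounded self-learning can be sketched as a capped loop over candidate paths, each paired with its own state check. The Attempt tuple layout and the function name are assumptions; the cap of three follows the upper bound stated above.

```python
from typing import Callable, Optional, Sequence, Tuple

# (name, action, verify): verify must return True only on explicit state evidence.
Attempt = Tuple[str, Callable[[], object], Callable[[], bool]]


def bounded_self_learning(attempts: Sequence[Attempt],
                          max_attempts: int = 3) -> Optional[str]:
    """Try alternatives in order and stop at the first one that verifies."""
    for name, action, verify in list(attempts)[:max_attempts]:
        try:
            action()
        except Exception as exc:  # a failed path must not abort the whole run
            print(f"{name}: action raised {exc!r}")
            continue
        if verify():
            return name  # the winning path, ready to be codified in lessons.md
        print(f"{name}: state check failed")
    return None
```

A None result means the attempt budget is spent without a reliable path, which should be reported as a hard block.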
After completion (or meaningful failure):
- Append a short run note to references/learnings/<topic-slug>/experience-log.md.
- Include: date, context, failure signal, root cause, fix pattern, reusable rule.
- If a new reliable rule or fallback was discovered, promote it into topic lessons.md immediately.
- Keep entries concise and deduplicated by updating prior rules instead of adding noisy repeats.
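A run note with the six required fields could be appended as in the sketch below. The entry layout, the RunNote class, and the duplicate check (skip the append when the same reusable rule is already logged) are assumptions; the playbook does not fix a format for experience-log.md.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass
class RunNote:
    run_date: date
    context: str
    failure_signal: str
    root_cause: str
    fix_pattern: str
    reusable_rule: str

    def to_markdown(self) -> str:
        return (
            f"\n## {self.run_date.isoformat()}: {self.context}\n"
            f"- failure signal: {self.failure_signal}\n"
            f"- root cause: {self.root_cause}\n"
            f"- fix pattern: {self.fix_pattern}\n"
            f"- reusable rule: {self.reusable_rule}\n"
        )


def append_run_note(log_path: Path, note: RunNote) -> bool:
    """Append the note unless its reusable rule is already in the log."""
    existing = log_path.read_text(encoding="utf-8") if log_path.is_file() else ""
    if note.reusable_rule in existing:
        return False  # avoid noisy repeats; update the prior entry by hand
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(note.to_markdown())
    return True
```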
References
Load references/computer-use-techniques.md for command snippets and fallback templates.
Load references/learnings/index.md to select the right topic folder.
Load topic references/learnings/<topic-slug>/lessons.md first.
Load topic references/learnings/<topic-slug>/experience-log.md second.
Load references/learnings/general/experience-log.md only as fallback for cross-task patterns.
Load references/learnings/x-posting/lessons.md for all X/Twitter/Expost publishing.
Load references/learnings/linkedin-posting/lessons.md for all LinkedIn publishing.
Load references/learnings/google-flow/lessons.md when automating Google Flow video creation.
Load references/learnings/google-flow/experience-log.md after lessons for incremental learnings.