web-browser
browser-automation
Overview
Browser automation skill with two approaches:
agent-browser - Snapshot-based interaction model optimized for AI agents
- Compact element refs (@e1, @e2) reduce token usage dramatically
- Workflow: open → snapshot -i → interact with refs → re-snapshot
- Best for: dynamic exploration, form filling, scraping with unknown structure
playwright - Direct Playwright CLI and Node.js scripts
- Full Playwright API access via scripts
- Codegen for recording interactions
- Best for: scripted automation, testing, batch operations, complex workflows
Sub-skills
CRITICAL: You MUST load the appropriate sub-skill from the sub-skills/ directory based on user intent.
When to use each
| Sub-skill | When to use | Triggers |
|---|---|---|
| agent-browser.md | Interactive exploration, AI-driven navigation, unknown page structure | "navigate to", "fill this form", "click the button", "scrape this page", "explore the site" |
| playwright.md | Scripted automation, testing, batch screenshots, codegen | "write a script", "generate test", "batch screenshot", "record my actions", "create automation script" |
Default behavior
- If user intent is unclear, prefer agent-browser for interactive tasks
- If user asks for "a script" or "automation code", use playwright
- If user mentions "codegen" or "record", use playwright
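The default rules above can be sketched as a small routing function. This is only an illustration, not part of the skill: `choose_subskill` is a hypothetical name, and plain keyword matching is an assumed, deliberately crude stand-in for reading user intent.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): map a request to a sub-skill file.
choose_subskill() {
  local request
  # Lowercase the request so matching is case-insensitive.
  request=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$request" in
    # Script, automation-code, codegen, and record requests go to playwright.
    *codegen*|*record*|*"a script"*|*"automation code"*)
      echo "playwright.md" ;;
    # Anything else, including unclear intent, falls back to agent-browser.
    *)
      echo "agent-browser.md" ;;
  esac
}

choose_subskill "Record my actions on the login page"   # prints playwright.md
choose_subskill "Fill this form for me"                 # prints agent-browser.md
```

Putting the playwright keywords first and leaving agent-browser as the catch-all mirrors the rule that unclear intent defaults to agent-browser.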
Process
- Determine user intent from their request
- Load the appropriate sub-skill from sub-skills/
- Execute the sub-skill process
- Verify expected outcome was achieved
Resources
- sub-skills/: Approach-specific instructions
  - agent-browser.md: Snapshot/refs workflow with npx agent-browser
  - playwright.md: Playwright CLI and Node.js scripts
- references/agent-browser/: Deep-dive documentation for agent-browser
- templates/agent-browser/: Ready-to-use shell scripts for agent-browser
Quick reference
agent-browser (default for interactive tasks)
# Session isolation (generate random slug like bright-falcon)
npx agent-browser --session <slug> open https://example.com
npx agent-browser --session <slug> snapshot -i
npx agent-browser --session <slug> click @e1
npx agent-browser --session <slug> fill @e2 "text"
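The commands above use a `<slug>` placeholder for the session name. One possible way to generate an adjective-noun slug such as bright-falcon is sketched below. The function name `random_slug` and both word lists are assumptions made for illustration, and any scheme that yields a distinct name per session would serve.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an adjective-noun slug such as "bright-falcon".
# The word lists are illustrative only.
random_slug() {
  local adjectives=(bright calm swift quiet bold)
  local nouns=(falcon otter maple comet harbor)
  # $RANDOM is a bash built-in; the modulo picks one index per list.
  local adj=${adjectives[RANDOM % ${#adjectives[@]}]}
  local noun=${nouns[RANDOM % ${#nouns[@]}]}
  printf '%s-%s\n' "$adj" "$noun"
}

SESSION=$(random_slug)
echo "session: $SESSION"
# Every command in one flow then reuses the same slug, for example:
#   npx agent-browser --session "$SESSION" open https://example.com
#   npx agent-browser --session "$SESSION" snapshot -i
```

With only 25 combinations, collisions are likely under heavy parallel use, so longer word lists or a numeric suffix would be a reasonable extension.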
playwright (for scripts and codegen)
# Quick screenshot
npx playwright screenshot https://example.com output.png
# Record interactions as code
npx playwright codegen https://example.com
# PDF generation
npx playwright pdf https://example.com output.pdf
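For batch screenshots, the screenshot command above can be wrapped in a loop. The sketch below is one assumed arrangement rather than the skill's own template: `url_to_filename`, `batch_screenshot`, and the `DRY_RUN` variable are hypothetical names. With `DRY_RUN=1` it only prints the commands it would run, which makes the URL-to-filename mapping easy to check first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a URL into a filesystem-safe PNG filename.
url_to_filename() {
  local name
  # Drop the scheme, then replace anything outside [A-Za-z0-9._-] with "_".
  name=$(printf '%s' "$1" | sed -e 's|^[a-zA-Z]*://||' -e 's|[^A-Za-z0-9._-]|_|g')
  printf '%s.png\n' "$name"
}

# Print (DRY_RUN=1) or run one screenshot command per URL argument.
batch_screenshot() {
  local url out
  for url in "$@"; do
    out=$(url_to_filename "$url")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "npx playwright screenshot $url $out"
    else
      npx playwright screenshot "$url" "$out"
    fi
  done
}

# Dry run: shows the commands without launching a browser.
DRY_RUN=1 batch_screenshot https://example.com https://example.com/docs
```

Dropping the `DRY_RUN=1` prefix runs the real commands, which requires Playwright and its browsers to be installed.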