browser-use
Browser Automation
Automate browser interactions via Playwright MCP server.
Server Lifecycle
Start Server
# Using helper script (recommended)
bash scripts/start-server.sh
# Or manually
npx @playwright/mcp@latest --port 8808 --shared-browser-context &
Stop Server
# Using helper script (closes browser first)
bash scripts/stop-server.sh
# Or manually
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_close -p '{}'
pkill -f "@playwright/mcp"
When to Stop
- End of task: Stop when browser work is complete
- Long sessions: Keep running if doing multiple browser tasks
- Errors: Stop and restart if browser becomes unresponsive
Important: The --shared-browser-context flag is required to maintain browser state across multiple mcp-client.py calls. Without it, each call gets a fresh browser context.
Quick Reference
Navigation
# Go to URL
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_navigate \
-p '{"url": "https://example.com"}'
# Go back
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_navigate_back -p '{}'
Get Page State
# Accessibility snapshot (returns element refs for clicking/typing)
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_snapshot -p '{}'
# Screenshot
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_take_screenshot \
-p '{"type": "png", "fullPage": true}'
Interact with Elements
Use ref from snapshot output to target elements:
# Click element
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_click \
-p '{"element": "Submit button", "ref": "e42"}'
# Type text
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_type \
-p '{"element": "Search input", "ref": "e15", "text": "hello world", "submit": true}'
# Fill form (multiple fields)
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_fill_form \
-p '{"fields": [{"ref": "e10", "value": "john@example.com"}, {"ref": "e12", "value": "password123"}]}'
# Select dropdown
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_select_option \
-p '{"element": "Country dropdown", "ref": "e20", "values": ["US"]}'
Wait for Conditions
# Wait for text to appear
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_wait_for \
-p '{"text": "Success"}'
# Wait for time (ms)
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_wait_for \
-p '{"time": 2000}'
Execute JavaScript
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_evaluate \
-p '{"function": "return document.title"}'
Multi-Step Playwright Code
For complex workflows, use browser_run_code to run multiple actions in one call:
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_run_code \
-p '{"code": "async (page) => { await page.goto(\"https://example.com\"); await page.click(\"text=Learn more\"); return await page.title(); }"}'
Tip: Use browser_run_code for complex multi-step operations that should be atomic (all-or-nothing).
Workflow: Form Submission
- Navigate to page
- Get snapshot to find element refs
- Fill form fields using refs
- Click submit
- Wait for confirmation
- Screenshot result
Workflow: Data Extraction
- Navigate to page
- Get snapshot (contains text content)
- Use browser_evaluate for complex extraction
- Process results
Tool Reference
See references/playwright-tools.md for complete tool documentation.
Troubleshooting
| Issue | Solution |
|---|---|
| Element not found | Run browser_snapshot first to get current refs |
| Click fails | Try browser_hover first, then click |
| Form not submitting | Use "submit": true with browser_type |
| Page not loading | Increase wait time or use browser_wait_for |
More from salmanferozkhan/cloud-and-fast-api
chainlit
Expert guidance for building conversational AI applications with Chainlit framework in Python. Use when (1) creating chat interfaces for LLM applications, (2) building apps with OpenAI, LangChain, LlamaIndex, or Mistral AI, (3) implementing streaming responses, (4) adding UI elements like images, files, charts, (5) handling user file uploads, (6) implementing authentication (OAuth, password), (7) creating multi-step workflows with visible steps, (8) building RAG applications with document upload, or (9) deploying chat apps to web, Slack, Discord, or Teams.
87sqlmodel
Expert guidance for SQLModel - the Python library combining SQLAlchemy and Pydantic for database models. Use when (1) creating database models that work as both SQLAlchemy ORM and Pydantic schemas, (2) building FastAPI apps with database integration, (3) defining model relationships (one-to-many, many-to-many), (4) performing CRUD operations with type safety, (5) setting up async database sessions, (6) integrating with Alembic migrations, (7) handling model inheritance and mixins, or (8) converting between database models and API schemas.
17fastapi
Expert guidance for building REST APIs with FastAPI framework in Python. Use when (1) creating new FastAPI projects from scratch, (2) implementing API endpoints with routing, (3) working with Pydantic models for validation, (4) setting up dependency injection, (5) implementing authentication (OAuth2, JWT, API keys), (6) integrating databases (SQLAlchemy sync/async), (7) writing tests for FastAPI apps, (8) deploying FastAPI to production (Docker, Gunicorn), or (9) implementing advanced features like WebSockets, middleware, background tasks.
5microsoft-agent-framework
Expert guidance for building AI agents and multi-agent workflows using Microsoft Agent Framework for .NET. Use when (1) creating AI agents with OpenAI or Azure OpenAI, (2) implementing function tools and structured outputs, (3) building multi-turn conversations, (4) designing graph-based workflows with streaming/checkpointing, (5) implementing middleware pipelines, (6) orchestrating multi-agent systems with fan-out/fan-in patterns, (7) adding human-in-the-loop interactions, (8) integrating OpenTelemetry observability, or (9) exposing agents as MCP tools.
3docx
Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. When Claude needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks
3xlsx
Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas
3