browser-automation
Browser Automation for AI Web Interfaces
Use your ChatGPT Plus and Gemini Advanced subscriptions through browser automation. There are no per-request API costs, only the monthly subscription you already pay for.
How It Works
┌─────────────────────────────────────────────────────────────────┐
│                     BROWSER AUTOMATION FLOW                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Your Prompt ──► Playwright MCP ──► Browser Instance           │
│                                           │                     │
│                              ┌────────────┴────────────┐        │
│                              ▼                         ▼        │
│                       ┌─────────────┐           ┌─────────────┐ │
│                       │   ChatGPT   │           │   Gemini    │ │
│                       │ chat.openai │           │   gemini.   │ │
│                       │    .com     │           │ google.com  │ │
│                       └──────┬──────┘           └──────┬──────┘ │
│                              │                         │        │
│                              ▼                         ▼        │
│                          Response captured & returned to you    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Prerequisites
1. Playwright MCP Must Be Active
This skill assumes Playwright MCP is already configured. Verify that it is working:
Use browser_snapshot to check whether a browser is available
2. Login Sessions
The browser automation uses saved sessions. You need to log in once:
First-time setup:
- Navigate to ChatGPT/Gemini
- Log in with your credentials
- Session is saved for future use
ChatGPT Automation
Step-by-Step Workflow
Step 1: Navigate to ChatGPT
browser_navigate → https://chat.openai.com
Step 2: Check if logged in
browser_snapshot → Look for chat input or login button
Step 3: If not logged in, authenticate
browser_click → "Log in" button
browser_type → Enter email
browser_click → Continue
browser_type → Enter password
browser_click → Log in
Step 4: Start new chat
browser_click → "New chat" button (or navigate to chat.openai.com)
Step 5: Type your prompt
browser_type → Your prompt text in the message input
Step 6: Submit and wait
browser_click → Send button
browser_wait_for → Wait for response to complete
Step 7: Capture response
browser_snapshot → Get the response text
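The steps above can be sketched as an ordered plan of tool calls. This is only an illustration under stated assumptions: `ToolCall` and `chatgpt_plan` are hypothetical names, the argument keys are guesses at the Playwright MCP tool schemas, the 10-second wait is a placeholder, and a real click or type call would also need the element reference taken from the latest snapshot.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """One tool invocation: a tool name plus its arguments (hypothetical shape)."""
    tool: str
    args: dict[str, Any] = field(default_factory=dict)


def chatgpt_plan(prompt: str, logged_in: bool = True) -> list[ToolCall]:
    """Return the ordered tool calls for one ChatGPT round trip."""
    plan = [
        ToolCall("browser_navigate", {"url": "https://chat.openai.com"}),  # Step 1
        ToolCall("browser_snapshot"),  # Step 2: chat input or login button?
    ]
    if not logged_in:
        # Step 3: only the entry click is planned here; credentials are
        # entered interactively and never stored in the plan.
        plan.append(ToolCall("browser_click", {"element": "Log in button"}))
    plan += [
        ToolCall("browser_click", {"element": "New chat button"}),  # Step 4
        ToolCall("browser_type", {"element": "message input", "text": prompt}),  # Step 5
        ToolCall("browser_click", {"element": "Send button"}),  # Step 6
        ToolCall("browser_wait_for", {"time": 10}),  # placeholder wait, an assumption
        ToolCall("browser_snapshot"),  # Step 7: capture the response
    ]
    return plan


if __name__ == "__main__":
    for number, call in enumerate(chatgpt_plan("Summarize this abstract"), start=1):
        print(number, call.tool, call.args)
```

Keeping the plan as data makes it easy to log or review before anything touches the browser.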
Example: ChatGPT Writing Task
I will now use browser automation to get ChatGPT's response:
1. browser_navigate to https://chat.openai.com
2. browser_snapshot to see current state
3. browser_type to enter prompt in textarea
4. browser_click to send
5. browser_wait_for response
6. browser_snapshot to capture output
Gemini Automation
Step-by-Step Workflow
Step 1: Navigate to Gemini
browser_navigate → https://gemini.google.com
Step 2: Check if logged in
browser_snapshot → Look for chat input
Step 3: Type your prompt
browser_type → Your prompt in the input area
Step 4: Submit
browser_press_key → Enter (or click send button)
Step 5: Wait and capture
browser_wait_for → Response generation
browser_snapshot → Get response
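The Gemini steps differ mainly in how the prompt is submitted. The sketch below shows both options from Step 4 as a flag. `gemini_plan` is a hypothetical helper, and the argument keys and the 10-second wait are assumptions rather than the confirmed Playwright MCP schema.

```python
def gemini_plan(prompt: str, submit_with_enter: bool = True) -> list[tuple[str, dict]]:
    """Return (tool, args) pairs for one Gemini round trip."""
    # Step 4 allows either pressing Enter or clicking the send button.
    if submit_with_enter:
        submit = ("browser_press_key", {"key": "Enter"})
    else:
        submit = ("browser_click", {"element": "Send button"})
    return [
        ("browser_navigate", {"url": "https://gemini.google.com"}),  # Step 1
        ("browser_snapshot", {}),  # Step 2: look for the chat input
        ("browser_type", {"element": "prompt input", "text": prompt}),  # Step 3
        submit,  # Step 4
        ("browser_wait_for", {"time": 10}),  # Step 5: placeholder wait, an assumption
        ("browser_snapshot", {}),  # Step 5: capture the response
    ]


if __name__ == "__main__":
    for tool, args in gemini_plan("Explain how statins work"):
        print(tool, args)
```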
Practical Commands
For Claude Code Session
When you want me to use browser automation, say:
"Use browser automation to ask ChatGPT: [your prompt]"
"Get Gemini's take on: [your prompt]"
"Compare browser outputs for: [your prompt]"
I will then:
- Use Playwright MCP tools
- Navigate to the appropriate site
- Enter your prompt
- Capture and return the response
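One way to picture how such a request maps to a target site is a small parser that splits on the first colon. This is an illustrative sketch only: `parse_request` and `TARGETS` are hypothetical names, and treating a "compare" request as "both sites" is an assumption.

```python
TARGETS = {
    "chatgpt": "https://chat.openai.com",
    "gemini": "https://gemini.google.com",
}


def parse_request(request: str) -> tuple[list[str], str]:
    """Split '<instruction>: <prompt>' into target site names and the prompt."""
    head, sep, prompt = request.partition(":")
    if not sep:
        raise ValueError("expected '<instruction>: <prompt>'")
    head_lower = head.lower()
    targets = [name for name in TARGETS if name in head_lower]
    if not targets and "compare" in head_lower:
        targets = list(TARGETS)  # assumption: compare means every known site
    if not targets:
        raise ValueError("no known target site named in the request")
    return targets, prompt.strip()


if __name__ == "__main__":
    print(parse_request("Use browser automation to ask ChatGPT: Explain statins"))
    print(parse_request("Compare browser outputs for: Explain statins"))
```

Splitting on the first colon only means a prompt that itself contains a colon is passed through intact.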
Handling Authentication
Session Persistence
Browser automation works best with persistent sessions:
# The Playwright MCP maintains browser state
# Once logged in, sessions typically persist
If Session Expires
If you see a login screen:
- ChatGPT: Look for "Log in" button, click it
- Gemini: Look for "Sign in" button, click it
- Complete authentication flow
- Resume automation
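The login-screen check can be sketched as a text search over the snapshot. The helper name `needs_login` is hypothetical, and the sample snapshot lines in the usage example are made up for illustration. A plain substring match like this can misfire if the same words appear elsewhere on a logged-in page, so treat it as a first-pass heuristic.

```python
# Button labels taken from the list above; matching is case-insensitive.
LOGIN_MARKERS = {"chatgpt": "Log in", "gemini": "Sign in"}


def needs_login(site: str, snapshot_text: str) -> bool:
    """Return True if the snapshot text contains the site's login button label."""
    marker = LOGIN_MARKERS[site]
    return marker.lower() in snapshot_text.lower()


if __name__ == "__main__":
    # Made-up snapshot fragments, not captured from a real page.
    print(needs_login("chatgpt", '- button "Log in"'))
    print(needs_login("gemini", '- textbox "Enter a prompt here"'))
```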
Two-Factor Authentication
If 2FA is required:
- Automation pauses at the 2FA screen
- You complete 2FA manually
- Automation then continues
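The pause-and-continue behavior can be sketched as a polling loop with a timeout. `wait_until` is a hypothetical helper, and the timeout and interval defaults are arbitrary assumptions. In practice the condition would be a check on a fresh snapshot, for example that the 2FA prompt is gone.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `condition` until it returns True or the timeout elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated check that succeeds on the third poll.
    answers = iter([False, False, True])
    print(wait_until(lambda: next(answers), timeout_s=5, interval_s=0.1))
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.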
Limitations
Browser Automation Caveats
| Limitation | Workaround |
|---|---|
| Slower than an API | Use for comparisons, not bulk generation |
| Can break when the site UI changes | Report the failure so I can adapt |
| Requires an active session | Keep the browser open |
| Rate limits still apply | Space out requests |
| CAPTCHAs are possible | Manual intervention may be needed |
When NOT to Use Browser Automation
- Bulk content generation (use the GLM-4.7 API instead)
- Time-critical tasks (APIs are faster)
- Fully automated pipelines (APIs are more reliable)
When TO Use Browser Automation
- Comparing writing styles
- Using features only in Plus/Advanced
- Testing latest model versions
- When APIs are down
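The two lists above amount to a small decision rule, sketched below. The trait names and `choose_channel` are hypothetical, and the tie-breaking order (an API outage wins, then the API-favoring traits, then the browser-favoring ones, defaulting to the API) is an assumption the lists do not spell out.

```python
BROWSER_TRAITS = {"comparison", "subscription_feature", "latest_model", "api_down"}
API_TRAITS = {"bulk", "time_critical", "unattended_pipeline"}


def choose_channel(traits: set[str]) -> str:
    """Return 'browser' or 'api' for a task described by a set of traits."""
    unknown = traits - BROWSER_TRAITS - API_TRAITS
    if unknown:
        raise ValueError(f"unknown traits: {sorted(unknown)}")
    if "api_down" in traits:
        return "browser"  # no API to fall back on
    if traits & API_TRAITS:
        return "api"  # speed and reliability outweigh the rest (assumed priority)
    if traits & BROWSER_TRAITS:
        return "browser"
    return "api"  # assumed default


if __name__ == "__main__":
    print(choose_channel({"comparison"}))
    print(choose_channel({"bulk", "comparison"}))
```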
Comparison Workflow
Get Same Prompt from Multiple Sources
Step 1: Write with Claude (default, in this conversation)
Step 2: browser_navigate to ChatGPT, get response
Step 3: browser_navigate to Gemini, get response
Step 4: Compare all three side-by-side
Example Request
"Compare how you, ChatGPT, and Gemini would write a tweet about
the cardiovascular benefits of SGLT2 inhibitors"
I will:
- Write my version (Claude)
- Use browser automation to get ChatGPT's version
- Use browser automation to get Gemini's version
- Present all three for comparison
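Once the three responses are collected, presenting them can be as simple as the formatter sketched here. `side_by_side` is a hypothetical helper, and the layout and the word and character counts are illustrative choices, not a fixed output format.

```python
def side_by_side(prompt: str, responses: dict[str, str]) -> str:
    """Format responses from several sources under one prompt, in insertion order."""
    lines = [f"Prompt: {prompt}", ""]
    for source, text in responses.items():
        body = text.strip()
        words = len(body.split())
        lines.append(f"== {source} ({words} words, {len(body)} chars) ==")
        lines.append(body)
        lines.append("")
    return "\n".join(lines).rstrip()


if __name__ == "__main__":
    # Placeholder texts, not real model output.
    print(side_by_side(
        "Tweet about SGLT2 inhibitors",
        {"Claude": "draft one", "ChatGPT": "draft two", "Gemini": "draft three"},
    ))
```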
Troubleshooting
Browser Not Responding
browser_close → Close current browser
Then start fresh with browser_navigate
Wrong Page Loaded
browser_snapshot → Check current state
browser_navigate → Go to correct URL
Element Not Found
browser_snapshot → Get fresh page state
Look for correct element reference
Retry with updated reference
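The retry pattern above can be sketched as a loop that takes a fresh snapshot before every attempt. All names here are hypothetical: `ElementNotFound` stands in for whatever error a stale reference produces, and `take_snapshot` and `action` are callables supplied by the caller.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ElementNotFound(Exception):
    """Hypothetical error raised when an element reference is stale or missing."""


def with_fresh_refs(
    take_snapshot: Callable[[], dict[str, str]],
    action: Callable[[dict[str, str]], T],
    attempts: int = 3,
) -> T:
    """Run `action` with references from a new snapshot, retrying on ElementNotFound."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: ElementNotFound | None = None
    for _ in range(attempts):
        refs = take_snapshot()  # always re-read the page state before acting
        try:
            return action(refs)
        except ElementNotFound as err:
            last_error = err
    assert last_error is not None
    raise last_error


if __name__ == "__main__":
    snapshots = iter([{"send": "stale"}, {"send": "fresh"}])

    def click_send(refs: dict[str, str]) -> str:
        if refs["send"] == "stale":
            raise ElementNotFound("send button reference is stale")
        return refs["send"]

    print(with_fresh_refs(lambda: next(snapshots), click_send))
```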
Session Logged Out
browser_navigate → Go to login page
Complete login flow
Resume automation
Integration with Multi-Model Writer
This skill works with multi-model-writer:
API Models:
- /write-glm → Z.AI API
- /write-gpt → OpenAI API
- /write-gemini → Google AI Studio API
Browser Models:
- /browser-chatgpt → ChatGPT Plus web
- /browser-gemini → Gemini Advanced web
Use the APIs for speed and reliability. Use the browser for subscription-only features or for comparisons.
Example Session
User: "Use browser to compare how ChatGPT writes about statins"
Claude: I'll get ChatGPT's perspective using browser automation.
[Uses browser_navigate to https://chat.openai.com]
[Uses browser_snapshot to verify page state]
[Uses browser_type to enter: "Write a patient-friendly explanation of how statins work"]
[Uses browser_click to send]
[Uses browser_wait_for to wait for response]
[Uses browser_snapshot to capture response]
Here's what ChatGPT wrote:
[Response text]
Compared to my approach:
[Claude's version]
Key differences:
- ChatGPT emphasized X while I focused on Y
- Tone: ChatGPT more conversational, mine more clinical
- Length: Similar word count
Browser automation gives you access to your paid subscriptions programmatically, complementing the API-based models in your arsenal.