Browser Automation for AI Web Interfaces

Use your ChatGPT Plus and Gemini Advanced subscriptions through browser automation. There are no per-request API charges; you pay only your existing monthly subscriptions.

How It Works

┌─────────────────────────────────────────────────────────────────┐
│                    BROWSER AUTOMATION FLOW                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Your Prompt ──► Playwright MCP ──► Browser Instance            │
│                                           │                      │
│                              ┌────────────┴────────────┐        │
│                              ▼                         ▼        │
│                      ┌─────────────┐           ┌─────────────┐  │
│                      │  ChatGPT    │           │   Gemini    │  │
│                      │  chat.openai│           │   gemini.   │  │
│                      │  .com       │           │   google.com│  │
│                      └──────┬──────┘           └──────┬──────┘  │
│                             │                         │         │
│                             ▼                         ▼         │
│                      Response captured & returned to you        │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Prerequisites

1. Playwright MCP Must Be Active

This skill assumes Playwright MCP is already configured. To verify that it is working:

Use browser_snapshot to check that a browser is available

2. Login Sessions

Browser automation reuses saved sessions, so you only need to log in once.

First-time setup:

1. Navigate to ChatGPT or Gemini
2. Log in with your credentials
3. The session is saved for future use

ChatGPT Automation

Step-by-Step Workflow

Step 1: Navigate to ChatGPT

browser_navigate → https://chat.openai.com

Step 2: Check if logged in

browser_snapshot → Look for chat input or login button

Step 3: If not logged in, authenticate

browser_click → "Log in" button
browser_type → Enter email
browser_click → Continue
browser_type → Enter password
browser_click → Log in

Step 4: Start new chat

browser_click → "New chat" button (or navigate to chat.openai.com)

Step 5: Type your prompt

browser_type → Your prompt text in the message input

Step 6: Submit and wait

browser_click → Send button
browser_wait_for → Wait for response to complete

Step 7: Capture response

browser_snapshot → Get the response text
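
The seven steps above can be written down as an ordered list of tool calls. The sketch below only builds that list; it does not talk to a browser. The tool names come from the steps above, while the `ToolCall` type, the argument names (`url`, `element`, `text`, `time`) and the 15-second wait are assumptions for illustration. A real run would also take each element reference from a fresh browser_snapshot.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolCall:
    """One planned Playwright MCP tool call (hypothetical helper type)."""
    tool: str
    args: dict = field(default_factory=dict)


def chatgpt_plan(prompt: str, logged_in: bool = True) -> list[ToolCall]:
    """Return the ChatGPT workflow as an ordered list of planned tool calls."""
    plan = [
        ToolCall("browser_navigate", {"url": "https://chat.openai.com"}),
        ToolCall("browser_snapshot"),  # look for chat input or login button
    ]
    if not logged_in:
        # Only the first click is planned; credentials are entered by the user.
        plan.append(ToolCall("browser_click", {"element": "Log in button"}))
    plan += [
        ToolCall("browser_click", {"element": "New chat button"}),
        ToolCall("browser_type", {"element": "Message input", "text": prompt}),
        ToolCall("browser_click", {"element": "Send button"}),
        ToolCall("browser_wait_for", {"time": 15}),  # assumed fixed wait, in seconds
        ToolCall("browser_snapshot"),  # capture the response text
    ]
    return plan
```

Printing `[call.tool for call in chatgpt_plan("Explain statins")]` gives a quick way to review the sequence before running it.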

Example: ChatGPT Writing Task

I will now use browser automation to get ChatGPT's response:

1. browser_navigate to https://chat.openai.com
2. browser_snapshot to see current state
3. browser_type to enter prompt in textarea
4. browser_click to send
5. browser_wait_for response
6. browser_snapshot to capture output

Gemini Automation

Step-by-Step Workflow

Step 1: Navigate to Gemini

browser_navigate → https://gemini.google.com

Step 2: Check if logged in

browser_snapshot → Look for chat input

Step 3: Type your prompt

browser_type → Your prompt in the input area

Step 4: Submit

browser_press_key → Enter (or click send button)

Step 5: Wait and capture

browser_wait_for → Response generation
browser_snapshot → Get response
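
Because responses stream in gradually, a snapshot taken too early captures only part of the answer. One possible way to decide that generation has finished is to poll until the text stops changing. This is a swapped-in technique rather than something Playwright MCP provides, and `read_text` is a hypothetical callable that would wrap browser_snapshot and extract the response text.

```python
import time
from typing import Callable


def wait_until_stable(read_text: Callable[[], str],
                      interval_s: float = 1.0,
                      stable_polls: int = 2,
                      max_polls: int = 60,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll read_text until its non-empty result has been unchanged for
    stable_polls consecutive polls, then return that text."""
    last = None
    streak = 0
    for _ in range(max_polls):
        current = read_text()
        if current and current == last:
            streak += 1
            if streak >= stable_polls:
                return current
        else:
            streak = 0  # text is empty or still changing
        last = current
        sleep(interval_s)
    raise TimeoutError("response did not stabilise within max_polls")
```

The `sleep` parameter is injectable so the function can be exercised without real delays. A long pause in the middle of generation can still look like completion, so the interval and poll counts are values to tune, not recommendations.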

Practical Commands

For Claude Code Session

When you want me to use browser automation, say:

"Use browser automation to ask ChatGPT: [your prompt]"
"Get Gemini's take on: [your prompt]"
"Compare browser outputs for: [your prompt]"

I will then:

1. Use Playwright MCP tools
2. Navigate to the appropriate site
3. Enter your prompt
4. Capture and return the response
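
The three trigger phrases map onto target sites roughly as in the sketch below. In practice the request is interpreted from natural language, so the exact-match patterns, the `route_request` name and the lowercase site labels are illustrative assumptions only.

```python
import re
from typing import Optional

# Each pattern pairs a trigger phrase with the sites it should be sent to.
_PATTERNS = [
    (re.compile(r"^use browser automation to ask chatgpt:\s*(.+)$", re.I | re.S),
     ["chatgpt"]),
    (re.compile(r"^get gemini's take on:\s*(.+)$", re.I | re.S),
     ["gemini"]),
    (re.compile(r"^compare browser outputs for:\s*(.+)$", re.I | re.S),
     ["chatgpt", "gemini"]),
]


def route_request(message: str) -> Optional[tuple[list[str], str]]:
    """Return (target sites, prompt) for a recognised phrase, else None."""
    text = message.strip().strip('"')
    for pattern, targets in _PATTERNS:
        match = pattern.match(text)
        if match:
            return list(targets), match.group(1).strip()
    return None
```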

Handling Authentication

Session Persistence

Browser automation works best with persistent sessions:

# The Playwright MCP maintains browser state
# Once logged in, sessions typically persist
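
Whether a login survives a restart depends on how the MCP server launches the browser. The sketch below builds a server entry that pins the browser profile to a fixed directory. It assumes the `@playwright/mcp` package and its `--user-data-dir` option behave as described in the Playwright MCP project's documentation, so check both against your installed version. The `mcpServers` layout and the profile path are likewise assumptions about your setup.

```python
import json
from pathlib import Path


def playwright_mcp_entry(profile_dir: Path) -> dict:
    """Build an MCP server entry that reuses one browser profile directory."""
    return {
        "mcpServers": {
            "playwright": {
                "command": "npx",
                "args": [
                    "@playwright/mcp@latest",
                    "--user-data-dir",  # assumed option name; verify locally
                    str(profile_dir),
                ],
            }
        }
    }


def to_json(entry: dict) -> str:
    """Serialise the entry for pasting into an MCP client configuration."""
    return json.dumps(entry, indent=2)
```

Calling `to_json(playwright_mcp_entry(Path.home() / "ai-web-profile"))` produces text that could be merged into an MCP client configuration file.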

If Session Expires

If you see a login screen:

1. Click the login button ("Log in" on ChatGPT, "Sign in" on Gemini)
2. Complete the authentication flow
3. Resume automation
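
Deciding whether a login screen is showing can be approximated by scanning the snapshot text. This is a minimal sketch, assuming the snapshot is plain text in which the message input appears with the word `textbox`. That marker, the `classify_page` name and the return labels are illustrative assumptions; the "Log in" and "Sign in" labels come from the steps above.

```python
def classify_page(snapshot_text: str) -> str:
    """Return 'logged_out', 'ready', or 'unknown' for a page snapshot."""
    lowered = snapshot_text.lower()
    # Check login markers first: a logged-out page may still show an input box.
    if "log in" in lowered or "sign in" in lowered:
        return "logged_out"
    if "textbox" in lowered:
        return "ready"
    return "unknown"
```

An `unknown` result is a signal to take a fresh snapshot or ask for manual help, not to guess.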

Two-Factor Authentication

If 2FA is required:

1. Automation pauses at the 2FA screen
2. You complete 2FA manually
3. Automation continues

Limitations

Browser Automation Caveats

| Limitation | Workaround |
|---|---|
| Slower than API | Use for comparison, not bulk |
| Can break if UI changes | Report issues, I'll adapt |
| Requires active session | Keep browser open |
| Rate limits still apply | Don't spam requests |
| CAPTCHAs possible | May need manual intervention |
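
One way to avoid spamming requests is to enforce a minimum gap between prompts. The sketch below is a simple throttle; the class name and the idea of a fixed interval are assumptions, and neither service publishes a specific safe interval that this document can cite.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Tracks when the last prompt was sent and how long to wait before the next."""

    def __init__(self, min_interval_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._last: Optional[float] = None

    def delay_needed(self) -> float:
        """Seconds still to wait before sending; 0.0 if a prompt can go now."""
        if self._last is None:
            return 0.0
        elapsed = self._clock() - self._last
        return max(0.0, self.min_interval_s - elapsed)

    def mark_sent(self) -> None:
        """Record that a prompt was just submitted."""
        self._last = self._clock()
```

A caller would sleep for `delay_needed()` seconds, submit the prompt, then call `mark_sent()`.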

When NOT to Use Browser Automation

- Bulk content generation (use GLM-4.7 API instead)
- Time-critical tasks (APIs are faster)
- Fully automated pipelines (APIs are more reliable)

When TO Use Browser Automation

- Comparing writing styles
- Using features only in Plus/Advanced
- Testing latest model versions
- When APIs are down
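
The two lists can be read as a small decision rule. The sketch below is one possible encoding. The parameter names, the order in which conflicting reasons are resolved, and the API default when no reason applies are all assumptions, since the lists do not rank the reasons.

```python
def choose_channel(*, bulk: bool = False, time_critical: bool = False,
                   unattended: bool = False, subscription_feature: bool = False,
                   comparing_styles: bool = False, testing_latest_model: bool = False,
                   api_down: bool = False) -> str:
    """Return 'api' or 'browser' for a task described by the given flags."""
    if api_down:
        return "browser"  # no API to fall back on
    if bulk or time_critical or unattended:
        return "api"  # the "when NOT to use" cases
    if subscription_feature or comparing_styles or testing_latest_model:
        return "browser"  # the "when TO use" cases
    return "api"  # assumed default: speed and reliability
```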

Comparison Workflow

Get Same Prompt from Multiple Sources

Step 1: Write with Claude (default, in this conversation)
Step 2: browser_navigate to ChatGPT, get response
Step 3: browser_navigate to Gemini, get response
Step 4: Compare all three side-by-side

Example Request

"Compare how you, ChatGPT, and Gemini would write a tweet about
the cardiovascular benefits of SGLT2 inhibitors"

I will:

1. Write my version (Claude)
2. Use browser automation to get ChatGPT's version
3. Use browser automation to get Gemini's version
4. Present all three for comparison
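
Once the three texts are collected, presenting them together is plain string formatting. This is a minimal sketch; the `comparison_report` name and the layout are assumptions, and the inputs would be the captured responses.

```python
def comparison_report(responses: dict[str, str]) -> str:
    """Format responses as labelled sections with a word count per source."""
    lines = []
    for source, text in responses.items():
        word_count = len(text.split())
        lines.append(f"=== {source} (words: {word_count}) ===")
        lines.append(text.strip())
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"
```

Sections appear in the order the dictionary was built, so passing Claude first keeps the default version at the top.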

Troubleshooting

Browser Not Responding

browser_close → Close current browser
Then start fresh with browser_navigate

Wrong Page Loaded

browser_snapshot → Check current state
browser_navigate → Go to correct URL

Element Not Found

browser_snapshot → Get fresh page state
Look for correct element reference
Retry with updated reference
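
The snapshot-then-retry loop above can be sketched as a small helper. `find_ref` and `act` are hypothetical callables that would wrap browser_snapshot and a tool such as browser_click, and `ElementNotFound` is an illustrative exception type, not one raised by Playwright MCP.

```python
from typing import Callable


class ElementNotFound(Exception):
    """Raised by the hypothetical `act` callable when a reference is stale."""


def act_with_fresh_ref(find_ref: Callable[[], str],
                       act: Callable[[str], str],
                       attempts: int = 3) -> str:
    """Look up a fresh element reference before each attempt and retry on failure."""
    last_error = None
    for _ in range(attempts):
        ref = find_ref()  # re-read the page state on every attempt
        try:
            return act(ref)
        except ElementNotFound as err:
            last_error = err
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```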

Session Logged Out

browser_navigate → Go to login page
Complete login flow
Resume automation

Integration with Multi-Model Writer

This skill works with multi-model-writer:

API Models:
- /write-glm → Z.AI API
- /write-gpt → OpenAI API
- /write-gemini → Google AI Studio API

Browser Models:
- /browser-chatgpt → ChatGPT Plus web
- /browser-gemini → Gemini Advanced web

Use APIs for speed and reliability. Use browser for subscription-only features or comparison.
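
The command list above amounts to a lookup table. This sketch shows one way to resolve a command to its channel and backend. The commands and backend labels are taken from the list; the `resolve` function and the `api`/`browser` channel labels are assumptions.

```python
COMMANDS = {
    "/write-glm": ("api", "Z.AI API"),
    "/write-gpt": ("api", "OpenAI API"),
    "/write-gemini": ("api", "Google AI Studio API"),
    "/browser-chatgpt": ("browser", "ChatGPT Plus web"),
    "/browser-gemini": ("browser", "Gemini Advanced web"),
}


def resolve(command: str) -> tuple[str, str]:
    """Return (channel, backend) for a known command, else raise ValueError."""
    try:
        return COMMANDS[command.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown command: {command!r}") from None
```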

Example Session

User: "Use browser to compare how ChatGPT writes about statins"

Claude: I'll get ChatGPT's perspective using browser automation.

[Uses browser_navigate to https://chat.openai.com]
[Uses browser_snapshot to verify page state]
[Uses browser_type to enter: "Write a patient-friendly explanation of how statins work"]
[Uses browser_click to send]
[Uses browser_wait_for to wait for response]
[Uses browser_snapshot to capture response]

Here's what ChatGPT wrote:
[Response text]

Compared to my approach:
[Claude's version]

Key differences:
- ChatGPT emphasized X while I focused on Y
- Tone: ChatGPT more conversational, mine more clinical
- Length: Similar word count

Browser automation gives you access to your paid subscriptions programmatically, complementing the API-based models in your arsenal.
