scrapfly-browser
Scrapfly Cloud Browser
Use the Scrapfly Cloud Browser API with Python Playwright to automate remote cloud browsers with built-in proxy rotation, anti-bot fingerprinting, and geo-targeting.
When to use
- Automating browser interactions on remote cloud browsers (login flows, form filling, navigation)
- Scraping JavaScript-heavy sites that require full browser automation
- Bypassing anti-bot protections with managed browser fingerprinting
- Running Playwright scripts through geo-targeted proxies
- Maintaining persistent browser sessions across multiple connections
- Downloading files through browser interactions (see the Download a file example below)
Setup
pip install playwright
The API key must be provided via environment variable SCRAPFLY_API_KEY or passed directly in the connection URL.
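For example, a quick fail-fast check that the variable is set (a minimal sketch; all examples below assume the key is available):

import os

API_KEY = os.environ.get("SCRAPFLY_API_KEY")
if not API_KEY:
    raise RuntimeError("Set the SCRAPFLY_API_KEY environment variable before connecting")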
API Reference
Connection: WebSocket CDP (Chrome DevTools Protocol)
wss://browser.scrapfly.io?api_key=YOUR_API_KEY
Connection Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | required | Scrapfly API key |
| proxy_pool | str | "datacenter" | Proxy type: "datacenter" or "residential" |
| os | str | random | OS fingerprint: "linux", "windows", or "macos" |
| country | str | None | Proxy country (ISO 3166-1 alpha-2, e.g. "us", "de") |
| session | str | None | Session ID for browser state persistence |
| auto_close | str | "true" | Close session on disconnect. Set "false" to keep alive |
| timeout | int | 900 | Max session duration in seconds (900-1800) |
| debug | str | "false" | Enable session video recording |
Building the Connection URL
import os
from urllib.parse import urlencode
params = {
    "api_key": os.environ["SCRAPFLY_API_KEY"],
    "proxy_pool": "datacenter",
    "os": "linux",
}
BROWSER_WS = f"wss://browser.scrapfly.io?{urlencode(params)}"
Examples
Basic page navigation
from playwright.sync_api import sync_playwright
import os
API_KEY = os.environ["SCRAPFLY_API_KEY"]
BROWSER_WS = f"wss://browser.scrapfly.io?api_key={API_KEY}&proxy_pool=datacenter&os=linux"
with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(BROWSER_WS)
    try:
        # Use the pre-configured context and page provided by the Cloud Browser
        context = browser.contexts[0]
        page = context.pages[0] if context.pages else context.new_page()
        page.goto("https://web-scraping.dev")
        print("Title:", page.title())
        print("Content:", page.content()[:500])
    finally:
        browser.close()
Async Playwright
from playwright.async_api import async_playwright
import asyncio
import os
API_KEY = os.environ["SCRAPFLY_API_KEY"]
BROWSER_WS = f"wss://browser.scrapfly.io?api_key={API_KEY}&proxy_pool=datacenter&os=linux"
async def main():
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(BROWSER_WS)
        try:
            context = browser.contexts[0]
            page = context.pages[0] if context.pages else await context.new_page()
            await page.goto("https://web-scraping.dev")
            print("Title:", await page.title())
        finally:
            await browser.close()

asyncio.run(main())
Login flow with form filling
with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(BROWSER_WS)
    try:
        context = browser.contexts[0]
        page = context.pages[0] if context.pages else context.new_page()
        page.goto("https://web-scraping.dev/login")
        page.fill("input[name=username]", "user123")
        page.fill("input[name=password]", "password")
        page.click("button[type=submit]")
        page.wait_for_selector("div#secret-message")
        print("Logged in:", page.url)
    finally:
        browser.close()
Take a screenshot
with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(BROWSER_WS)
    try:
        context = browser.contexts[0]
        page = context.pages[0] if context.pages else context.new_page()
        page.goto("https://web-scraping.dev")
        page.screenshot(path="screenshot.png", full_page=True)
    finally:
        browser.close()
Scrape data with selectors
with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(BROWSER_WS)
    try:
        context = browser.contexts[0]
        page = context.pages[0] if context.pages else context.new_page()
        page.goto("https://web-scraping.dev/products")
        page.wait_for_selector("div.product")
        products = page.query_selector_all("div.product")
        for product in products:
            name = product.query_selector("a").inner_text()
            price = product.query_selector(".price").inner_text()
            print(f"{name}: {price}")
    finally:
        browser.close()
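The same extraction also works with Playwright's locator API, which auto-waits for elements. A minimal variant of the loop above, reusing the same page and selectors:

# Locators re-query the DOM on each use and wait for elements automatically
for product in page.locator("div.product").all():
    name = product.locator("a").first.inner_text()
    price = product.locator(".price").inner_text()
    print(f"{name}: {price}")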
Geo-targeted browsing with residential proxy
BROWSER_WS = (
    f"wss://browser.scrapfly.io?"
    f"api_key={API_KEY}"
    f"&proxy_pool=residential"
    f"&country=de"
    f"&os=windows"
)

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(BROWSER_WS)
    try:
        context = browser.contexts[0]
        page = context.pages[0] if context.pages else context.new_page()
        page.goto("https://web-scraping.dev")
        print("Page title via DE proxy:", page.title())
    finally:
        browser.close()
Persistent sessions (multi-step workflows)
Uses the same login credentials and selectors as the Login flow with form filling example.
LOGIN_URL = "https://web-scraping.dev/login"
USERNAME = "user123"
PASSWORD = "password"
# Step 1: Login and keep session alive
BROWSER_WS = (
    f"wss://browser.scrapfly.io?"
    f"api_key={API_KEY}"
    f"&session=my-session"
    f"&auto_close=false"
    f"&proxy_pool=datacenter"
)

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(BROWSER_WS)
    try:
        context = browser.contexts[0]
        page = context.pages[0] if context.pages else context.new_page()
        page.goto(LOGIN_URL)
        page.fill("input[name=username]", USERNAME)
        page.fill("input[name=password]", PASSWORD)
        page.click("button[type=submit]")
        page.wait_for_selector("div#secret-message")
        print("Login complete, session preserved")
    finally:
        browser.close()  # Disconnects but session stays alive
# Step 2: Reconnect to same session (still authenticated with same credentials)
BROWSER_WS = (
    f"wss://browser.scrapfly.io?"
    f"api_key={API_KEY}"
    f"&session=my-session"
    f"&proxy_pool=datacenter"
)

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(BROWSER_WS)
    try:
        context = browser.contexts[0]
        page = context.pages[0] if context.pages else context.new_page()
        page.goto(LOGIN_URL)
        page.wait_for_selector("div#secret-message")
        print("Accessed protected page (still logged in):", page.title())
    finally:
        browser.close()
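Note that the step 2 URL omits `auto_close`, so the default `auto_close=true` applies and the session is closed for good when this second connection disconnects.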
Infinite scroll scraping
with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(BROWSER_WS)
    try:
        context = browser.contexts[0]
        page = context.pages[0] if context.pages else context.new_page()
        page.goto("https://web-scraping.dev/testimonials")
        page.wait_for_selector("span.rating")
        for _ in range(10):  # Scroll 10 times
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            page.wait_for_timeout(2000)  # Wait for new items to load
        # query_selector_all sees every item loaded so far, so a single
        # query after the final scroll captures the full list
        all_items = page.query_selector_all("span.rating")
        print(f"Collected {len(all_items)} reviews")
    finally:
        browser.close()
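A fixed scroll count can stop too early or waste time on short feeds. An alternative sketch that keeps scrolling until the item count stops growing, reusing the same page and selector as above:

previous = 0
while True:
    page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
    page.wait_for_timeout(2000)  # give newly loaded items time to render
    count = len(page.query_selector_all("span.rating"))
    if count == previous:
        break  # no new items appeared; assume the end of the feed
    previous = count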
Execute JavaScript in page
with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(BROWSER_WS)
    try:
        context = browser.contexts[0]
        page = context.pages[0] if context.pages else context.new_page()
        page.goto("https://web-scraping.dev")
        # Extract data using JavaScript
        data = page.evaluate("""
            () => {
                return {
                    title: document.title,
                    links: Array.from(document.querySelectorAll('a')).map(a => ({
                        text: a.innerText,
                        href: a.href,
                    })),
                };
            }
        """)
        print(f"Found {len(data['links'])} links")
    finally:
        browser.close()
Multi-page navigation
with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(BROWSER_WS)
    try:
        context = browser.contexts[0]
        page = context.pages[0] if context.pages else context.new_page()
        urls = [
            "https://web-scraping.dev/products",
            "https://web-scraping.dev/products?page=2",
            "https://web-scraping.dev/products?page=3",
        ]
        results = []
        for url in urls:
            page.goto(url)
            results.append({"url": url, "content": page.content()})
        print(f"Scraped {len(results)} pages")
    finally:
        browser.close()
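Download a file
Downloading files (listed under When to use) works through Playwright's standard download API; per the notes below, files are limited to 25 MB each. The URL and selector here are placeholders, to be replaced with your target page's actual download link:

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(BROWSER_WS)
    try:
        context = browser.contexts[0]
        page = context.pages[0] if context.pages else context.new_page()
        page.goto("https://example.com/files")  # placeholder URL
        with page.expect_download() as download_info:
            page.click("a.download-link")  # placeholder selector
        download = download_info.value
        download.save_as("downloaded-file.bin")  # persist locally
    finally:
        browser.close()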
Anti-Bot Features
The Cloud Browser automatically manages browser fingerprinting:
- TLS/JA3 fingerprints matching real browsers
- User-Agent, device metrics, timezone, locale (based on country)
- WebGL and Canvas fingerprint emulation
- HTTP/2 fingerprints
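Rather than overriding these values, you can inspect what the Cloud Browser presents. A minimal check, assuming a connected page from any example above:

fingerprint = page.evaluate("""
    () => ({
        userAgent: navigator.userAgent,
        timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
        languages: navigator.languages,
    })
""")
print(fingerprint)  # values come from the managed fingerprint, not your script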
Important Notes
- Always use `connect_over_cdp()` (not `connect()`) to connect to the Cloud Browser
- Always use `browser.contexts[0]` to get the pre-configured context. Do NOT create new contexts: the Cloud Browser provides a context with proper fingerprinting
- Always close the browser in a `try`/`finally` block to stop billing. Forgetting to close means continuous charges
- Do NOT set fingerprint overrides (user agent, viewport, timezone, etc.); they are managed automatically
- Sessions expire after 1 hour of inactivity
- Maximum session duration is 30 minutes (1800 seconds)
- Maximum file download size is 25 MB per file
- Start with `proxy_pool=datacenter`; upgrade to `residential` only when needed for anti-bot bypass
- Use `debug=true` for session video recording when troubleshooting
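These rules are easy to encapsulate. A sketch of a small helper (scrapfly_page is hypothetical, not part of any SDK) that always uses the pre-configured context and guarantees the browser is closed:

from contextlib import contextmanager
from playwright.sync_api import sync_playwright

@contextmanager
def scrapfly_page(ws_url):
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(ws_url)
        try:
            # Use the pre-configured context; never create a new one
            context = browser.contexts[0]
            yield context.pages[0] if context.pages else context.new_page()
        finally:
            browser.close()  # always close so billing stops

# Usage:
# with scrapfly_page(BROWSER_WS) as page:
#     page.goto("https://web-scraping.dev")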