python-sdk-best-practices
Bright Data Python SDK - Best Practices for Coding Agents
You are writing code that uses the brightdata-sdk Python package. Follow these rules precisely.
Installation
pip install brightdata-sdk
Critical Rules
- Always use context managers. The client MUST be used with
async with(orwithfor sync). Forgetting this causesRuntimeError: BrightDataClient not initialized. - Async is the default. The primary client is
BrightDataClient(async). UseSyncBrightDataClientonly when you cannot use async. - Never use SyncBrightDataClient inside async functions. It raises
RuntimeError. UseBrightDataClientinstead. - Token auto-loads from environment. Set
BRIGHTDATA_API_TOKENenv var or passtoken=param. Do not hardcode tokens. - All scraper methods are awaitable. Every call on the async client must be
awaited.
Authentication
# Option 1: Environment variable (preferred)
# export BRIGHTDATA_API_TOKEN="your_token"
async with BrightDataClient() as client:
...
# Option 2: Explicit token
async with BrightDataClient(token="your_token") as client:
...
# Option 3: .env file (requires python-dotenv)
# BRIGHTDATA_API_TOKEN=your_token
async with BrightDataClient() as client:
...
Imports
# Main clients
from brightdata import BrightDataClient # Async (primary)
from brightdata import SyncBrightDataClient # Sync wrapper
# Result models
from brightdata import ScrapeResult, SearchResult, CrawlResult
# Job model (for manual trigger/poll/fetch)
from brightdata import ScrapeJob
# Exceptions
from brightdata import (
BrightDataError, # Base exception
ValidationError, # Invalid input
AuthenticationError, # Bad/missing token
APIError, # API request failed (has .status_code, .response_text)
ZoneError, # Zone operation failed
NetworkError, # Network issue
SSLError, # SSL cert error
)
# Scraper Studio
from brightdata import ScraperStudioJob, JobStatus
# Dataset export utility
from brightdata.datasets import export
Core Patterns
Pattern 1: Web Scraping (Web Unlocker)
Scrapes any URL through Bright Data's proxy network, bypassing bot detection.
import asyncio
from brightdata import BrightDataClient
async def main():
async with BrightDataClient() as client:
# Single URL - returns ScrapeResult
result = await client.scrape_url("https://example.com")
print(result.success) # bool
print(result.data) # HTML string or parsed data
print(result.cost) # float, USD
# With options
result = await client.scrape_url(
url="https://example.com",
country="us", # Proxy country
response_format="raw", # "raw" (HTML) or "json"
method="GET", # HTTP method
timeout=60, # Request timeout seconds
)
asyncio.run(main())
Async mode (non-blocking, for batch/background):
result = await client.scrape_url(
url="https://example.com",
mode="async", # Triggers, polls, returns when ready
poll_interval=5, # Seconds between polls
poll_timeout=180, # Max wait (Web Unlocker async ~2 min)
)
# Batch: pass a list of URLs
results = await client.scrape_url(
url=["https://example.com/1", "https://example.com/2"],
mode="async",
poll_timeout=180,
)
# Returns List[ScrapeResult]
Pattern 2: Platform-Specific Scrapers
Structured data extraction from major platforms. Pattern: client.scrape.<platform>.<method>(url=...).
async with BrightDataClient() as client:
# Amazon
product = await client.scrape.amazon.products(url="https://amazon.com/dp/B0CRMZHDG8")
reviews = await client.scrape.amazon.reviews(url="https://amazon.com/dp/B0CRMZHDG8")
sellers = await client.scrape.amazon.sellers(url="https://amazon.com/dp/B0CRMZHDG8")
# LinkedIn
profile = await client.scrape.linkedin.profiles(url="https://linkedin.com/in/username")
company = await client.scrape.linkedin.companies(url="https://linkedin.com/company/name")
posts = await client.scrape.linkedin.posts(url="https://linkedin.com/posts/...")
# Instagram
ig_profile = await client.scrape.instagram.profiles(url="https://instagram.com/username")
ig_posts = await client.scrape.instagram.posts(url="https://instagram.com/p/...")
ig_comments = await client.scrape.instagram.comments(url="https://instagram.com/p/...")
ig_reels = await client.scrape.instagram.reels(url="https://instagram.com/reel/...")
# Facebook
fb_posts = await client.scrape.facebook.posts_by_profile(url="https://facebook.com/user", num_of_posts=10)
fb_group = await client.scrape.facebook.posts_by_group(url="https://facebook.com/groups/...", num_of_posts=10)
fb_comments = await client.scrape.facebook.comments(url="https://facebook.com/post/...", num_of_comments=20)
fb_reels = await client.scrape.facebook.reels(url="https://facebook.com/reel/...")
# YouTube
yt_profile = await client.scrape.youtube.profiles(url="https://youtube.com/@channel")
yt_video = await client.scrape.youtube.videos(url="https://youtube.com/watch?v=...")
yt_comments = await client.scrape.youtube.comments(url="https://youtube.com/watch?v=...")
# ChatGPT
response = await client.scrape.chatgpt.prompt(prompt="What is Python?")
# Batch prompts
responses = await client.scrape.chatgpt.prompts(prompts=["Q1", "Q2", "Q3"])
# TikTok
tt_profile = await client.scrape.tiktok.profiles(url="https://tiktok.com/@user")
# Reddit
reddit_post = await client.scrape.reddit.posts(url="https://reddit.com/r/...")
All scraper methods return ScrapeResult with .success, .data, .cost, .status.
Pattern 3: Search Discovery (keyword-based)
Find content by keyword instead of URL:
async with BrightDataClient() as client:
# Amazon product search
results = await client.scrape.amazon.products_search(keyword="wireless headphones")
# LinkedIn searches
profiles = await client.scrape.linkedin.profiles_search(keyword="data engineer", location="San Francisco")
jobs = await client.scrape.linkedin.jobs_search(keyword="python developer", location="New York")
companies = await client.scrape.linkedin.companies_search(keyword="AI startup")
# Instagram search
ig_profiles = await client.scrape.instagram.profiles_search(user_name="photography")
ig_posts = await client.scrape.instagram.posts_search(url="https://instagram.com/user", num_of_posts=20)
ig_reels = await client.scrape.instagram.reels_search(url="https://instagram.com/user", num_of_posts=10)
# YouTube search
videos = await client.scrape.youtube.videos_search(keyword="python tutorial", num_of_videos=10)
Pattern 4: SERP (Search Engine Results)
async with BrightDataClient() as client:
# Google
result = await client.search.google(
query="python web scraping",
location="United States", # Optional
language="en", # Default: "en"
device="desktop", # "desktop" or "mobile"
num_results=10, # Number of results
)
for item in result.data:
print(item["title"], item["link"])
# Bing
result = await client.search.bing(query="python tutorial", num_results=10)
# Yandex
result = await client.search.yandex(query="python", num_results=10)
SERP async mode:
result = await client.search.google(
query="python",
mode="async",
poll_interval=2,
poll_timeout=30,
)
SERP returns SearchResult with .data (list of dicts), .query, .search_engine.
Pattern 5: Datasets API
Access 175+ pre-collected, structured datasets.
async with BrightDataClient() as client:
# Filter a dataset - returns snapshot_id (string)
snapshot_id = await client.datasets.imdb_movies(
filter={"name": "title", "operator": "includes", "value": "black"},
records_limit=5,
)
# Download results (polls until ready)
data = await client.datasets.imdb_movies.download(snapshot_id)
print(f"Got {len(data)} records")
# Quick sample (no filter needed)
snapshot_id = await client.datasets.amazon_products.sample(records_limit=10)
data = await client.datasets.amazon_products.download(snapshot_id)
# Get field metadata
metadata = await client.datasets.imdb_movies.get_metadata()
for name, field in metadata.fields.items():
print(f"{name}: {field.type}")
Export to file:
from brightdata.datasets import export
export(data, "results.json") # JSON
export(data, "results.csv") # CSV
export(data, "results.jsonl") # JSONL
Available datasets include: amazon_products, amazon_reviews, linkedin_profiles, linkedin_companies, linkedin_jobs, airbnb_properties, imdb_movies, google_maps_reviews, yelp_businesses, glassdoor_companies, zillow_properties, instagram_profiles, tiktok_profiles, facebook_pages_posts, reddit_posts, goodreads_books, nba_players_stats, and 150+ more.
Pattern 6: Scraper Studio (Custom Scrapers)
Run custom scrapers built in Bright Data's Scraper Studio.
async with BrightDataClient() as client:
# High-level: trigger + poll + return
data = await client.scraper_studio.run(
collector="c_abc123", # Collector ID from dashboard
input={"url": "https://example.com/page"}, # Input for the scraper
timeout=180, # Max wait seconds
poll_interval=10, # Poll frequency
)
# Manual control
job = await client.scraper_studio.trigger(
collector="c_abc123",
input={"url": "https://example.com/page"},
)
print(job.response_id)
status = await job.status() # Returns JobStatus enum
data = await job.wait_and_fetch(timeout=120, poll_interval=10)
Pattern 7: Browser API (CDP)
Connect to Bright Data cloud browsers via Chrome DevTools Protocol.
from brightdata import BrightDataClient
client = BrightDataClient(
browser_username="brd-customer-hl_xxx-zone-scraping_browser1",
browser_password="your_password",
)
# Or use env vars: BRIGHTDATA_BROWSERAPI_USERNAME, BRIGHTDATA_BROWSERAPI_PASSWORD
url = client.browser.get_connect_url(country="us") # Optional country
# With Playwright
from playwright.async_api import async_playwright
async with async_playwright() as pw:
browser = await pw.chromium.connect_over_cdp(url)
page = await browser.new_page()
await page.goto("https://example.com")
content = await page.content()
await browser.close()
Pattern 8: Manual Trigger/Poll/Fetch
For fine-grained control over long-running scrapes:
async with BrightDataClient() as client:
# Step 1: Trigger (non-blocking)
job = await client.scrape.amazon.products_trigger(url="https://amazon.com/dp/B123")
print(f"Snapshot ID: {job.snapshot_id}")
# Step 2: Check status
status = await job.status() # "ready", "running", etc.
# Step 3: Wait for completion
await job.wait(timeout=180, poll_interval=10, verbose=True)
# Step 4: Fetch results
data = await job.fetch()
# Or combine wait + fetch into ScrapeResult:
result = await job.to_result(timeout=180)
print(result.data)
Pattern 9: Concurrent Batch Operations
import asyncio
from brightdata import BrightDataClient
async def main():
async with BrightDataClient() as client:
# Concurrent scraping
urls = [
"https://amazon.com/dp/B001",
"https://amazon.com/dp/B002",
"https://amazon.com/dp/B003",
]
tasks = [client.scrape.amazon.products(url=u) for u in urls]
results = await asyncio.gather(*tasks)
for r in results:
print(f"{r.url}: success={r.success}, cost=${r.cost:.4f}")
# Concurrent SERP queries
queries = ["python", "javascript", "rust"]
search_tasks = [client.search.google(query=q) for q in queries]
search_results = await asyncio.gather(*search_tasks)
asyncio.run(main())
Pattern 10: Sync Client
For scripts, notebooks, or non-async codebases:
from brightdata import SyncBrightDataClient
with SyncBrightDataClient() as client:
# All methods are synchronous - no await needed
result = client.scrape_url("https://example.com")
print(result.data)
result = client.scrape.amazon.products(url="https://amazon.com/dp/B123")
result = client.search.google(query="python")
# Datasets
snapshot_id = client.datasets.imdb_movies(
filter={"name": "title", "operator": "includes", "value": "black"},
records_limit=5,
)
data = client.datasets.imdb_movies.download(snapshot_id)
WARNING: Never use SyncBrightDataClient inside an async def function. It will raise a RuntimeError.
Result Objects Reference
All results inherit from BaseResult:
result.success # bool - operation succeeded
result.cost # Optional[float] - cost in USD
result.error # Optional[str] - error message if failed
result.elapsed_ms() # Optional[float] - total time in ms
result.to_dict() # Dict - serializable dictionary
result.to_json(indent=2) # str - JSON string
result.save_to_file("out.json") # Save to file
ScrapeResult additional fields:
result.url # str - original URL
result.status # "ready" | "error" | "timeout" | "in_progress"
result.data # Any - scraped data (dict, list, or HTML string)
result.snapshot_id # Optional[str] - Bright Data snapshot ID
result.platform # Optional[str] - "amazon", "linkedin", etc.
result.row_count # Optional[int] - number of data rows
SearchResult additional fields:
result.query # Dict - original query params
result.data # List[Dict] - search results
result.search_engine # "google" | "bing" | "yandex"
result.total_found # Optional[int] - total results found
Error Handling
from brightdata import (
BrightDataClient,
BrightDataError,
ValidationError,
AuthenticationError,
APIError,
NetworkError,
)
async with BrightDataClient() as client:
try:
result = await client.scrape_url("https://example.com")
except AuthenticationError:
print("Invalid API token")
except APIError as e:
print(f"API error {e.status_code}: {e.message}")
print(f"Response: {e.response_text}")
except NetworkError:
print("Network connectivity issue")
except ValidationError:
print("Invalid input parameters")
except BrightDataError as e:
print(f"Bright Data error: {e.message}")
Client Configuration
client = BrightDataClient(
token="...", # API token (or use env var)
timeout=30, # Default request timeout (seconds)
web_unlocker_zone="sdk_unlocker", # Web Unlocker zone name
serp_zone="sdk_serp", # SERP zone name
auto_create_zones=True, # Auto-create zones if missing
validate_token=False, # Validate token on init
rate_limit=10.0, # Max requests per rate_period (None to disable)
rate_period=1.0, # Rate limit window (seconds)
)
Zone auto-creation: By default, the SDK creates sdk_unlocker and sdk_serp zones on first use. Set auto_create_zones=False to disable.
Zone Management
async with BrightDataClient() as client:
# List all active zones
zones = await client.list_zones()
for zone in zones:
print(f"{zone['name']}: {zone.get('type', 'unknown')}")
# Delete a zone
await client.delete_zone("test_zone")
# Test connection
is_valid = await client.test_connection()
Common Mistakes to Avoid
-
Forgetting the context manager:
# WRONG - will raise RuntimeError client = BrightDataClient() result = await client.scrape_url("https://example.com") # CORRECT async with BrightDataClient() as client: result = await client.scrape_url("https://example.com") -
Using sync client in async code:
# WRONG - will raise RuntimeError async def main(): with SyncBrightDataClient() as client: result = client.scrape_url("...") # CORRECT async def main(): async with BrightDataClient() as client: result = await client.scrape_url("...") -
Forgetting await:
# WRONG - returns coroutine, not result result = client.scrape_url("https://example.com") # CORRECT result = await client.scrape_url("https://example.com") -
Not checking result.success:
result = await client.scrape_url("https://example.com") # Always check success before using data if result.success: process(result.data) else: print(f"Failed: {result.error}") -
Hardcoding API tokens:
# WRONG client = BrightDataClient(token="abc123secret") # CORRECT - use environment variable # export BRIGHTDATA_API_TOKEN=abc123secret client = BrightDataClient()
Environment Variables
| Variable | Purpose | Default |
|---|---|---|
BRIGHTDATA_API_TOKEN |
API authentication token | Required |
WEB_UNLOCKER_ZONE |
Web Unlocker zone name | sdk_unlocker |
SERP_ZONE |
SERP zone name | sdk_serp |
BRIGHTDATA_BROWSERAPI_USERNAME |
Browser API username | None |
BRIGHTDATA_BROWSERAPI_PASSWORD |
Browser API password | None |
Quick Decision Guide
| Task | Method |
|---|---|
| Scrape any URL (HTML) | client.scrape_url(url) |
| Scrape Amazon/LinkedIn/etc. (structured) | client.scrape.<platform>.<method>(url=...) |
| Search Google/Bing/Yandex | client.search.google(query=...) |
| Find products/profiles by keyword | client.scrape.<platform>.<type>_search(keyword=...) |
| Access pre-collected datasets | client.datasets.<name>(filter=..., records_limit=...) |
| Run custom Scraper Studio scraper | client.scraper_studio.run(collector=..., input=...) |
| Automate browser (Playwright/Puppeteer) | client.browser.get_connect_url() |
| Long-running scrape with manual control | client.scrape.<platform>.<method>_trigger(url=...) then job.wait() + job.fetch() |
For the full API surface and advanced patterns, read references/api-reference.md.