data-feeds
Bright Data — Data Feeds (Pipelines)
Extract structured data from supported platforms via bdata pipelines. One call, clean JSON, no scraping logic. For unsupported URLs, hand off to scrape. To find target URLs first, hand off to search.
Setup gate (run first)
if ! command -v bdata >/dev/null 2>&1; then
echo "bdata CLI not installed — see bright-data-best-practices/references/cli-setup.md"
elif ! bdata zones >/dev/null 2>&1; then
echo "bdata not authenticated — run: bdata login (or: bdata login --device for SSH)"
fi
Halt and route to skills/bright-data-best-practices/references/cli-setup.md if either check fails.
Supported pipeline types (verified 2026-04-19)
Always verify with bdata pipelines list before hardcoding names — they change. Current 43 types:
amazon_product, amazon_product_reviews, amazon_product_search, apple_app_store, bestbuy_products, booking_hotel_listings, crunchbase_company, ebay_product, etsy_products, facebook_company_reviews, facebook_events, facebook_marketplace_listings, facebook_posts, github_repository_file, google_maps_reviews, google_play_store, google_shopping, homedepot_products, instagram_comments, instagram_posts, instagram_profiles, instagram_reels, linkedin_company_profile, linkedin_job_listings, linkedin_people_search, linkedin_person_profile, linkedin_posts, reddit_posts, reuter_news, tiktok_comments, tiktok_posts, tiktok_profiles, tiktok_shop, walmart_product, walmart_seller, x_posts, yahoo_finance_business, youtube_comments, youtube_profiles, youtube_videos, zara_products, zillow_properties_listing, zoominfo_company_profile
Naming note: inconsistent across platforms. amazon_product (singular), tiktok_profiles (plural), linkedin_person_profile (not linkedin_profile). Always copy from bdata pipelines list.
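Because names drift, it is worth confirming a type exists before scripting against it. A minimal sketch; it assumes the type name appears verbatim in the `bdata pipelines list` output:

```bash
# Confirm a pipeline type name before using it in a script.
# Assumption: the name appears as a word in the `bdata pipelines list` output.
type="linkedin_person_profile"
if bdata pipelines list | grep -qw "$type"; then
  echo "ok: $type is available"
else
  echo "unknown pipeline type: $type, re-run bdata pipelines list" >&2
  exit 1
fi
```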
Pick your path
| Situation | Action |
|---|---|
| Know the platform + have URL(s) | bdata pipelines <type> <url> |
| Don't know which pipeline fits | bdata pipelines list first |
| Pipeline takes keyword or multi-arg input | See "Keyword- and multi-arg pipelines" below |
| Multiple URLs on the same pipeline type | shell loop with parallelism cap (see the sketch after this table and references/patterns.md) |
| Long job (reviews, company employees, big post feeds) | raise --timeout 1800 |
| URL is on an unsupported platform | stop — hand off to scrape |
| Need to find URLs first | hand off to search |
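For the batch row above, here is a minimal sketch that keeps at most three `bdata pipelines` calls in flight via `xargs -P`. The pipeline type, `urls.txt`, and the cksum-based output naming are assumptions of this sketch; references/patterns.md covers the full pattern.

```bash
# Batch one pipeline type over many URLs, capped at 3 concurrent calls.
# Assumptions: urls.txt holds one URL per line; output files are named from a
# cksum of the URL. Swap in the pipeline type and naming your job needs.
xargs -P 3 -n 1 sh -c '
  url=$1
  out="company-$(printf "%s" "$url" | cksum | cut -d" " -f1).json"
  bdata pipelines linkedin_company_profile "$url" --format json -o "$out" \
    || echo "FAILED: $url" >&2
' _ < urls.txt
```

`xargs -P` caps concurrency without extra tooling, which matches the 2–3 parallelism guidance in the red flags below.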
Keyword- and multi-arg pipelines (do NOT take a single URL)
A few pipelines take non-URL or multi-positional inputs. Invoke with no args to see the exact usage line from the CLI:
| Pipeline | Args |
|---|---|
| amazon_product_search | <keyword> <domain_url> — e.g., "running shoes" https://www.amazon.com |
| linkedin_people_search | <url> <first_name> <last_name> — search a company/school/URL for a named person |
| facebook_company_reviews | <url> [num_reviews] — optional num_reviews defaults to 10 |
| google_maps_reviews | <url> [days_limit] — optional days_limit defaults to 3 |
| youtube_comments | <url> [num_comments] — optional num_comments defaults to 10 |
The other 38 pipelines take a single URL.
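If you are unsure of the argument order, the no-args invocation mentioned above is the quickest check; the CLI prints its usage line for that type:

```bash
# Invoking a multi-arg pipeline with no arguments prints its "Usage: ..." line,
# which is the authoritative argument order for that type.
bdata pipelines linkedin_people_search
```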
Action
Core commands:
# List available pipeline types (source of truth)
bdata pipelines list
# Amazon product
bdata pipelines amazon_product \
"https://www.amazon.com/dp/B08N5WRWNW" \
--format json --pretty -o product.json
# Amazon product reviews (slower — reviews can be hundreds)
bdata pipelines amazon_product_reviews \
"https://www.amazon.com/dp/B08N5WRWNW" \
--timeout 1200 -o reviews.json
# Amazon product search (keyword + domain URL)
bdata pipelines amazon_product_search \
"noise cancelling headphones" "https://www.amazon.com" \
--format json --pretty -o search.json
# LinkedIn person profile
bdata pipelines linkedin_person_profile \
"https://www.linkedin.com/in/example" -o person.json
# LinkedIn company
bdata pipelines linkedin_company_profile \
"https://www.linkedin.com/company/example" -o company.json
# LinkedIn people search (url + first + last name)
bdata pipelines linkedin_people_search \
"https://www.linkedin.com/company/example" "Jane" "Doe" \
-o people.json
# Instagram posts
bdata pipelines instagram_posts \
"https://www.instagram.com/example/" -o posts.json
# Google Maps reviews (url + days_limit, default 3)
bdata pipelines google_maps_reviews \
"https://maps.google.com/?cid=1234567890" 90 -o reviews.json
# YouTube comments (url + num_comments, default 10)
bdata pipelines youtube_comments \
"https://www.youtube.com/watch?v=abc123" 100 -o yt-comments.json
# NDJSON for big feeds (one record per line)
bdata pipelines linkedin_posts "https://www.linkedin.com/in/example" \
--format ndjson -o posts.ndjson
# Raise polling timeout for long jobs
bdata pipelines amazon_product_reviews "<url>" --timeout 1800 -o out.json
Full flag reference + full type table: references/flags.md.
Verification gate
- JSON parses cleanly: `jq . <output>` returns 0 (for `--format ndjson`, each line parses).
- Record count matches what you expect. One URL usually = one record, but reviews/posts/comments pipelines return arrays sized by what the platform shows. Always check: `jq 'length' out.json` for a top-level array count, or `jq 'if type == "array" then length else 1 end' out.json` to cover both shapes.
- No top-level error: `jq -e 'if type == "object" then has("error") | not else true end' out.json || { echo "pipeline reported error"; exit 1; }`
- No per-record error: for array results, ensure no record has an `error` field: `jq -e 'if type == "array" then map(has("error")) | any | not else true end' out.json || echo "WARN: one or more records have error fields"`. Partial failures are silent — this check is non-optional.
- Core fields present for the pipeline type; spot-check with `jq keys` on the first record to learn the exact schema. Examples:
  - `amazon_product` → `.title` + `.price` (or `.final_price`)
  - `linkedin_person_profile` → `.name` + `.headline` (or `.position`)
  - `instagram_posts` → `.caption` or `.description` + `.url` or `.post_id`
  - `youtube_videos` → `.title` + `.video_id` or `.url`
- On failure: double `--timeout` and retry once. If it still fails, run `bdata pipelines list` to confirm the type name hasn't changed.
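These checks can be rolled into one small helper. A sketch, assuming a plain-JSON output file; the core-field check at the end uses `amazon_product` fields as an example and should be swapped for your pipeline type:

```bash
# Minimal post-run verification for a pipelines output file (plain JSON).
verify_pipeline_output() {
  f="$1"
  # Parses cleanly?
  jq . "$f" > /dev/null || { echo "invalid JSON: $f" >&2; return 1; }
  # Record count (1 for single-object results, array length otherwise).
  count=$(jq 'if type == "array" then length else 1 end' "$f")
  echo "records in $f: $count"
  # Top-level error object.
  jq -e 'if type == "object" then has("error") | not else true end' "$f" > /dev/null \
    || { echo "pipeline reported error: $f" >&2; return 1; }
  # Per-record errors (partial failures are silent).
  jq -e 'if type == "array" then map(has("error")) | any | not else true end' "$f" > /dev/null \
    || echo "WARN: one or more records have error fields: $f" >&2
  # Example core-field check for amazon_product; adapt per pipeline type.
  jq -e 'if type == "array" then .[0] else . end | has("title")' "$f" > /dev/null \
    || echo "WARN: expected field missing, inspect with: jq keys $f" >&2
}

verify_pipeline_output product.json
```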
Red flags
- Using `bdata scrape` on Amazon/LinkedIn/TikTok/etc. when `bdata pipelines <type>` returns structured fields in one call. Loses structure and costs more time.
- Looping `bdata pipelines` for large jobs without rate-limiting — each call can trigger a long-running pipeline on the server. Cap parallelism at 2–3.
- Claiming success without the record-count + per-record error check. Partial failures are silent in pipeline output.
- Hardcoding pipeline type names (`amazon_products` with an `s`, `linkedin_profile` without `_person_`, etc.) — they're inconsistent across platforms. Always copy from `bdata pipelines list`.
- Using a tight `--timeout` on pipelines that legitimately take 5–15 minutes (reviews, company employees, big post feeds). The default 600s is a floor for small inputs; raise it for long ones.
- Calling a keyword- or multi-arg pipeline (`amazon_product_search`, `linkedin_people_search`, `google_maps_reviews`, `facebook_company_reviews`, `youtube_comments`) with URL-only args — it will fail with `"Usage: ..."`. Always check the `bdata pipelines <type>` error output when in doubt.
- Passing a `pages_to_search` third arg to `amazon_product_search` — it's hardcoded to `1` by the CLI and extra args are ignored.
References
- references/flags.md — full `pipelines` flags + complete table of all 43 types with input shapes.
- references/patterns.md — sync timeout tuning, shell-loop batching with parallelism cap, partial-failure detection, keyword-shaped pipeline cheatsheet, legacy `curl` fallback, shared verification checklist.
- references/examples.md — (1) single Amazon product, (2) batch LinkedIn companies, (3) long reviews job with raised timeout, (4) mixed-platform workflow calling `pipelines list` first, (5) keyword-shaped `amazon_product_search`.