openwebninja_universal_scraper
OpenWeb Ninja Universal Scraper
Data extraction from 35+ OpenWeb Ninja APIs. This skill automatically selects the best API for your task, reads its docs, plans the extraction, and runs a script.
Prerequisites
(No need to check upfront)
- .env file with RAPIDAPI_KEY (or OPENWEBNINJA_API_KEY)
- Node.js 20.6+ (for native --env-file support)
Missing API Key — Setup Instructions
Before making any API call, check that the required key exists in .env. If it is missing:
- Read meta.json for the selected API to get openwebninja_url and rapidapi_url
- Open the subscription page in the user's browser (prefer OpenWeb Ninja):
    open "{openwebninja_url}"      # preferred
    # or: open "{rapidapi_url}"    # if user prefers RapidAPI
- Tell the user: "I've opened the subscription page. Subscribe to the API, then paste your API key here."
- When the user pastes the key, append it to .env:
  - If the key starts with ak_: append OPENWEBNINJA_API_KEY={key}
  - Otherwise: append RAPIDAPI_KEY={key}
  - Check if the line already exists in .env first; if it does, replace it rather than duplicating
- Continue with the original request
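The key-saving step above can be sketched as follows. This is a minimal illustration, not part of the skill's code: pickKeyName, upsertEnvLine, and saveApiKey are hypothetical helper names, and the sketch assumes a simple NAME=value .env file with LF line endings.

```javascript
// Sketch only: these helpers are hypothetical and are not exported by lib/utils.js.
const fs = require('node:fs');

// Choose the variable name from the key prefix (ak_ means an OpenWeb Ninja key).
function pickKeyName(key) {
  return key.startsWith('ak_') ? 'OPENWEBNINJA_API_KEY' : 'RAPIDAPI_KEY';
}

// Return new .env content: replace an existing NAME= line, otherwise append one.
function upsertEnvLine(content, name, value) {
  const lines = content === '' ? [] : content.replace(/\n$/, '').split('\n');
  const idx = lines.findIndex((line) => line.startsWith(`${name}=`));
  if (idx >= 0) {
    lines[idx] = `${name}=${value}`;
  } else {
    lines.push(`${name}=${value}`);
  }
  return lines.join('\n') + '\n';
}

// Read .env if it exists, upsert the key, and write the file back.
function saveApiKey(key, envPath = '.env') {
  const current = fs.existsSync(envPath) ? fs.readFileSync(envPath, 'utf8') : '';
  fs.writeFileSync(envPath, upsertEnvLine(current, pickKeyName(key), key));
}
```

Keeping the string transformation (upsertEnvLine) separate from the file I/O makes the replace-instead-of-duplicate rule easy to check without touching a real .env file.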
Workflow
Copy this checklist and track progress:
Task Progress:
- [ ] Step 1: Understand user goal and select API(s)
- [ ] Step 2: Read API docs (README.md, recipes.md, meta.json)
- [ ] Step 3: Estimate and confirm cost (number of API requests required for each relevant API) with user
- [ ] Step 4: Ask user preferences (format, filename, count)
- [ ] Step 5: Run the script
- [ ] Step 6: Summarize results and offer follow-ups
Step 1: Understand User Goal and Select API
First, understand what the user wants to achieve. Then select the best API from the catalog below.
Each API has its own folder at apis/{api_id}/ containing:
- README.md — endpoints, params, pagination, response fields (source of truth)
- meta.json — host, pricing notes, subscription URLs
- scrape.js — per-API CLI script (if available for this API)
- recipes.md — common use cases with exact commands (if available)
API Catalog
| API ID | What It Does | Best For |
|---|---|---|
| local-business-data | Google Maps businesses with emails, phones, social profiles | Lead gen, competitor research, local market analysis |
| realtime-amazon-data | Amazon products, details, reviews by ASIN | Product research, price tracking, review mining |
| realtime-web-search | Google organic search results with rich snippets | General research, competitor analysis, content discovery |
| realtime-news-data | News articles by keyword with source/topic/date filters | Content monitoring, trend research, brand monitoring |
| jsearch | Job listings from Google for Jobs + salary estimates | Job market research, recruitment, salary benchmarking |
| job-salary-data | Salary estimates by job title and location | Salary benchmarking (also available via jsearch /estimated-salary) |
| website-contacts-scraper | Emails, phones, social links from domains (batch up to 20) | Contact enrichment, lead enrichment from domain lists |
| trustpilot-company-and-reviews | Trustpilot company profiles and reviews (~200 max) | Reputation analysis, review mining, brand monitoring |
| realtime-glassdoor-data | Company profiles, employee reviews, salaries | Employer intelligence, comp benchmarking, due diligence |
| yelp-business-data | Yelp businesses and customer reviews | Local business reviews, reputation monitoring |
| realtime-product-search | Google Shopping cross-retailer product search | Price comparison, product discovery, deal tracking |
| realtime-walmart-data | Walmart products, details, reviews | Retail research, price comparison |
| realtime-costco-data | Costco products (US/Canada) | Retail research |
| realtime-zillow-data | Zillow properties for sale, rent, or recently sold | Real estate research, market analysis |
| realtime-forums-search | Reddit, Quora, Stack Overflow discussions | Sentiment analysis, trend research, content ideas |
| realtime-events-search | Google Events by keyword + location | Event discovery, local activity monitoring |
| realtime-finance-data | Stocks, ETFs, forex, crypto quotes + history | Finance research, market monitoring |
| realtime-image-search | Google Images with size/color/license filters | Visual research, content sourcing |
| realtime-shorts-search | YouTube Shorts, TikTok, Instagram Reels | Short-form video discovery, trend tracking |
| realtime-books-data | Google Books search | Book research, content discovery |
| realtime-lens-data | Google Lens visual search | Visual product matching, reverse image lookup |
| play-store-apps | Google Play apps, top charts | App research, market analysis |
| social-links-search | Social media profiles for any person/brand | Social profile discovery, lead enrichment |
| email-search | Email addresses by name + domain | Lead gen, contact discovery |
| local-rank-tracker | Local SEO keyword rankings + grid heatmaps | Local SEO monitoring, competitor rank tracking |
| web-search-autocomplete | Google autocomplete suggestions (bulk supported) | Keyword research, search intent discovery |
| reverse-image-search | Web pages containing a given image | Image provenance, unauthorized usage detection |
| driving-directions | Routes with distance, duration, turn-by-turn steps | Navigation, commute analysis, logistics |
| ev-charge-finder | EV charging stations by location | EV infrastructure research, trip planning |
| waze | Real-time traffic alerts and jams | Traffic monitoring, incident tracking |
| web-unblocker | Fetch any URL with JS rendering + anti-bot bypass | Web scraping, page extraction |
| chatgpt | ChatGPT conversation (POST, stateful) | Data summarization, AI enrichment |
| gemini | Google Gemini conversation (POST, stateful) | Data analysis, AI enrichment |
| copilot | Microsoft Copilot conversation (POST, stateful) | Research, AI enrichment |
| ai-overviews | Google AI Overview with cited sources | Quick research summaries |
| google-ai-mode | Google AI Mode (Gemini 2.5) structured results | AI-augmented research |
API Selection by Use Case
| Use Case | Primary APIs |
|---|---|
| Lead Generation | local-business-data (with extract_emails_and_contacts=true), website-contacts-scraper, email-search, social-links-search |
| Lead Enrichment from Domains | website-contacts-scraper, social-links-search, email-search |
| Job Market Research | jsearch, job-salary-data, realtime-glassdoor-data |
| Employer / Talent Intelligence | jsearch, realtime-glassdoor-data, job-salary-data, realtime-news-data |
| Product / Price Research | realtime-amazon-data, realtime-product-search, realtime-costco-data, realtime-walmart-data, realtime-lens-data |
| Retail Review Mining | realtime-amazon-data, realtime-walmart-data, trustpilot-company-and-reviews, yelp-business-data |
| Brand & Review Monitoring | yelp-business-data, trustpilot-company-and-reviews, realtime-glassdoor-data, realtime-news-data, realtime-forums-search |
| Competitor Analysis | realtime-web-search, social-links-search, realtime-news-data, website-contacts-scraper, realtime-glassdoor-data, trustpilot-company-and-reviews |
| Content & Trend Research | realtime-news-data, realtime-forums-search, realtime-shorts-search, realtime-image-search, realtime-books-data, web-search-autocomplete |
| Search Intent / Keyword Discovery | web-search-autocomplete, realtime-web-search, realtime-news-data, realtime-forums-search |
| Real Estate | realtime-zillow-data |
| Real Estate + Commute / Traffic Overlay | realtime-zillow-data, driving-directions, waze |
| Finance / Markets | realtime-finance-data, realtime-news-data |
| Social Profile Discovery | social-links-search, website-contacts-scraper, email-search, realtime-web-search |
| Events & Local Activity | realtime-events-search, local-business-data, waze, driving-directions |
| App Research | play-store-apps, realtime-news-data, realtime-forums-search |
| Visual / Image Search | realtime-image-search, realtime-lens-data, reverse-image-search |
| Navigation & Mobility | driving-directions, ev-charge-finder, waze |
| Traffic / Incident Monitoring | waze, driving-directions |
| Local SEO & Rank Tracking | local-rank-tracker, local-business-data, realtime-web-search |
| Reputation / Trust Analysis | trustpilot-company-and-reviews, yelp-business-data, realtime-news-data, realtime-forums-search |
| Web Scraping (any website) | web-unblocker |
| AI-Augmented Enrichment | chatgpt, gemini, copilot, google-ai-mode, ai-overviews |
Multi-API Workflows
For complex tasks, chain multiple APIs:
| Workflow | Step 1 | Step 2 |
|---|---|---|
| Domain → contacts pipeline | website-contacts-scraper /scrape-contacts → | email-search /search |
| Contact → LinkedIn discovery | social-links-search /search → | realtime-web-search /search |
| Review deep-dive | yelp-business-data /business-search → | yelp-business-data /business-reviews |
| Trustpilot reputation analysis | trustpilot-company-and-reviews /company-search → | trustpilot-company-and-reviews /company-reviews |
| Product research (multi-store) | realtime-product-search /search → | realtime-amazon-data /product-details |
| Retail price comparison | realtime-product-search /search → | realtime-walmart-data /product-details |
| Product + reviews dataset | realtime-amazon-data /product-details → | realtime-amazon-data /product-reviews |
| Product intelligence report | realtime-amazon-data /product-details → | chatgpt /chat |
| Visual product discovery | realtime-lens-data /search-by-image → | realtime-product-search /search |
| Competitor intelligence | realtime-web-search /search → | local-business-data /search (with extract_emails_and_contacts=true) |
| Brand monitoring pipeline | realtime-news-data /search → | realtime-forums-search /search |
| Content trend discovery | web-search-autocomplete /autocomplete → | realtime-web-search /search |
| News summarization | realtime-news-data /search → | gemini /chat |
| Forum discussion analysis | realtime-forums-search /search → | copilot /chat |
| App market research | play-store-apps /search → | realtime-forums-search /search |
| App reputation analysis | play-store-apps /app-details → | realtime-news-data /search |
| Job market research | jsearch /search → | jsearch /estimated-salary |
| Employer intelligence | jsearch /search → | realtime-glassdoor-data /company-overview |
| Local SEO rank tracking | local-rank-tracker /search → | local-business-data /business-details |
| Local market analysis | local-business-data /search → | yelp-business-data /business-search |
| Real estate dataset | realtime-zillow-data /search → | driving-directions /get-directions |
| Property + traffic insights | realtime-zillow-data /search → | waze /alerts-and-jams |
| EV trip planning | driving-directions /get-directions → | ev-charge-finder /search-by-location |
| Event discovery | realtime-events-search /search → | local-business-data /search |
| Image provenance discovery | reverse-image-search /search → | realtime-web-search /search |
| Web page extraction workflow | realtime-web-search /search → | web-unblocker /fetch |
| Knowledge-augmented research | realtime-web-search /search → | ai-overviews /ai-overviews |
| Dataset summarization | realtime-product-search /search → | chatgpt /chat |
Step 2: Read API Docs
CRITICAL RULE: Always read the API's README.md before making any API call. Never guess endpoints, parameters, or request structure. The README.md is the single source of truth — check it every time, including for quick tests or diagnostic calls.
Read the docs and meta for the selected API:
apis/{api_id}/README.md ← endpoints, params, response schema (source of truth)
apis/{api_id}/meta.json ← host, pricing notes, subscription URLs
apis/{api_id}/recipes.md ← common use cases with exact commands (if available)
From these files, determine:
- Which endpoint(s) to call
- Required and optional parameters
- Pagination style for the specific endpoint (page_number, offset, cursor, none)
- Any pricing multipliers or quirks
- Response field paths for the data you need
Step 3: Estimate and Confirm Cost
Before asking preferences or running anything, tell the user exactly what calls will be made:
- Which API(s) will be called and which endpoint(s)
- How many API calls are required (based on requested result count ÷ page size, plus any multi-step lookups)
- If multiple APIs are chained, break down the call count per API
Example format:
Planned API calls:
• local-business-data /search — 1 call per zip code × 50 zip codes = 50 calls
• local-business-data /business-details (extract_emails_and_contacts=true) — 1 call per business × up to 500 = 500 calls
Total: ~550 calls
Then ask: "Does that look okay? Would you like to proceed?"
Only continue to Step 4 once the user confirms.
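The call-count arithmetic in this step can be sketched as a small helper. The names estimateCalls and formatPlan and the plan shape are hypothetical. The perCall value (items covered by one request, such as a page size) is an assumption that has to come from each API's README.md.

```javascript
// Sketch only: estimateCalls and formatPlan are hypothetical, not part of lib/utils.js.
// Each step says how many items it must cover and how many items one call handles.
function estimateCalls(steps) {
  const lines = steps.map(({ label, items, perCall = 1 }) => ({
    label,
    calls: Math.ceil(items / perCall), // round up: a partial page still costs one call
  }));
  const total = lines.reduce((sum, line) => sum + line.calls, 0);
  return { lines, total };
}

// Render the estimate in a layout similar to the example format above.
function formatPlan({ lines, total }) {
  return [
    'Planned API calls:',
    ...lines.map((line) => `• ${line.label}: ${line.calls} calls`),
    `Total: ~${total} calls`,
  ].join('\n');
}

const plan = estimateCalls([
  { label: 'local-business-data /search', items: 50 },           // 1 call per zip code
  { label: 'local-business-data /business-details', items: 500 }, // 1 call per business
]);
console.log(formatPlan(plan));
```

With the numbers from the example above (50 zip codes, up to 500 businesses), the total comes to 550 calls. For a paginated endpoint, pass the page size as perCall instead of relying on the default of 1.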
Step 4: Ask User Preferences
Before running, ask:
- Output destination — if the user did not specify where to send the data, always present all available options:
  - Chat — display top results inline (no file saved)
  - Local file (JSON or CSV) — saved to ./output/
  - Google Sheets — requires GOOGLE_CLIENT_CREDENTIALS, SPREADSHEET_ID, SHEET_NAME in .env
  - Webhook — HTTP POST to any URL (Zapier, Make, n8n, custom); requires WEBHOOK_URL in .env
  - Airtable — requires AIRTABLE_API_KEY, AIRTABLE_BASE_ID, AIRTABLE_TABLE_NAME in .env
  - Slack — post summary + data to a channel; requires SLACK_WEBHOOK_URL in .env
  - S3 — requires AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION, S3_BUCKET, S3_KEY in .env
  - FTP — requires FTP_HOST, FTP_USER, FTP_PASS, FTP_PATH in .env
- Number of results (default: 100)
- Output filename (default: auto-generated with timestamp) — only if saving to a file
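Before offering a destination, it may help to check that its environment variables are set. The sketch below is one possible way to do that. The variable names are taken from the list above, while the destination keys (sheets, s3, and so on) and the missingEnvFor helper are hypothetical.

```javascript
// Env var names come from the destination list above; the keys and helper are hypothetical.
const DESTINATION_ENV = {
  chat: [],
  file: [],
  sheets: ['GOOGLE_CLIENT_CREDENTIALS', 'SPREADSHEET_ID', 'SHEET_NAME'],
  webhook: ['WEBHOOK_URL'],
  airtable: ['AIRTABLE_API_KEY', 'AIRTABLE_BASE_ID', 'AIRTABLE_TABLE_NAME'],
  slack: ['SLACK_WEBHOOK_URL'],
  s3: ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_REGION', 'S3_BUCKET', 'S3_KEY'],
  ftp: ['FTP_HOST', 'FTP_USER', 'FTP_PASS', 'FTP_PATH'],
};

// Return the required variables that are unset or empty for a destination.
function missingEnvFor(destination, env = process.env) {
  const required = DESTINATION_ENV[destination];
  if (!required) throw new Error(`Unknown destination: ${destination}`);
  return required.filter((name) => !env[name]);
}

// Example: report what is missing before sending anything to Slack.
const missing = missingEnvFor('slack', {});
if (missing.length > 0) {
  console.log(`Add to .env before using Slack: ${missing.join(', ')}`);
}
```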
Step 5: Run the Script
If the API has a scrape.js, use it directly:
# Full export to file
node --env-file=.env apis/{api_id}/scrape.js --query "search terms" --count 100 --format csv --output output/results.csv
# Quick answer (no file, display top results in chat)
node --env-file=.env apis/{api_id}/scrape.js --query "search terms" --dry-run
Quick answer mode: For simple lookups (e.g., "what's Nike's rating on Trustpilot?", "find me 3 coffee shops in LA"), use --dry-run. It fetches one page of results and prints them to the console without saving a file. Use this when the user just needs a quick answer, not a full data export.
Check apis/{api_id}/recipes.md for exact command examples.
Run node apis/{api_id}/scrape.js --help to see all available flags for that API.
For multi-API workflows or APIs without scrape.js, write a custom script importing from lib/utils.js:
const { getApiKey, loadMeta, apiCall, fetchAll, toCSV, writeOutput, displayQuickAnswer, sleep } = require('./lib/utils'); // path is relative to the script's location
lib/utils.js exports:
- getApiKey() — reads RAPIDAPI_KEY / OPENWEBNINJA_API_KEY from env
- loadMeta(apiId) — loads meta.json
- apiCall(host, endpoint, params, apiKey, method, body) — single HTTP call (GET or POST)
- fetchAll({ host, endpoint, params, apiKey, count, pagination, ... }) — paginated fetch, returns { results, totalCallsMade }
- toCSV(records) — converts array of objects to CSV string
- writeOutput(records, outputPath, format, manifest) — writes file + .meta.json
- displayQuickAnswer(records, { limit, fields }) — print top N results to chat (no file)
- pushWebhook(records, { url, batchMode, delay }) — POST to Zapier/Make/n8n/custom webhook
- pushAirtable(records, { apiKey, baseId, tableName }) — push to Airtable table
- postSlack(message) / slackSummary(records, outputPath) — post to Slack channel
- pushS3(content, { bucket, key, region }) — upload JSON/CSV to S3
- pushFTP(localFilePath, { host, user, pass, remotePath }) — upload file via FTP
- pushGoogleSheets(records, { credentialsPath, spreadsheetId, sheetName }) — write to Google Sheets
- sleep(ms) — promise-based delay
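A custom multi-API script might be shaped roughly like the sketch below, which chains the two /search endpoints of the brand monitoring pipeline. It takes the utils object as an argument so it can be run with stubs. The function name newsThenForums, the meta.host field name, and the query parameter are assumptions that should be confirmed against each API's README.md and meta.json. Only the getApiKey, loadMeta, and fetchAll signatures come from the export list above.

```javascript
// Sketch only. `utils` is expected to be the object exported by lib/utils.js.
// Assumptions to verify in README.md / meta.json: meta.host, the `query` param name.
async function newsThenForums(utils, query, count = 20) {
  const { getApiKey, loadMeta, fetchAll } = utils;
  const apiKey = getApiKey();
  const resultsByApi = {};
  let totalCalls = 0;

  // Run the same query against both APIs, one after the other.
  for (const apiId of ['realtime-news-data', 'realtime-forums-search']) {
    const meta = loadMeta(apiId);
    const { results, totalCallsMade } = await fetchAll({
      host: meta.host, // assumed field name
      endpoint: '/search',
      params: { query }, // assumed param name
      apiKey,
      count,
    });
    resultsByApi[apiId] = results;
    totalCalls += totalCallsMade; // keep a running total to report back to the user
  }
  return { resultsByApi, totalCalls };
}

// Usage from the project root (not executed here):
//   newsThenForums(require('./lib/utils'), 'acme').then(console.log);
```

Passing utils in, instead of requiring it at the top, is a design choice made only so the control flow can be exercised without network access.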
Step 6: Summarize Results and Offer Follow-ups
After completion, report:
- Number of results found
- File location and name
- Key fields available in the output
- Suggested follow-up workflows based on results:
| If the User Retrieved | Suggested Next Workflow |
|---|---|
| Product listings | Fetch reviews with realtime-amazon-data / realtime-walmart-data or generate insights with chatgpt |
| Reviews or feedback data | Summarize sentiment and themes with gemini, copilot, or chatgpt |
| Job listings | Enrich compensation data using jsearch /estimated-salary or company insights with realtime-glassdoor-data |
| News / forum discussions | Generate trend analysis using gemini, copilot, or ai-overviews |
| Property listings | Add commute insights using driving-directions or traffic context with waze |
| Search keyword ideas | Expand queries using web-search-autocomplete and validate with realtime-web-search |
| App listings | Analyze reputation using realtime-forums-search or summarize feedback with chatgpt |
General Usage Tips
- Lead generation: Use local-business-data with extract_emails_and_contacts=true. For full coverage of a region, use --grid mode with a bounding box (auto-subdivides dense areas). For city-level, use --zips mode. The scrape.js script loads gmb_categories.json and us_zipcodes.json internally when needed.
- Contact enrichment from domains: website-contacts-scraper → email-search → social-links-search.
- Multi-store price comparison: Chain realtime-amazon-data + realtime-walmart-data + realtime-product-search. Note: price formats differ across APIs (string vs numeric).
- AI enrichment: chatgpt, gemini, copilot use POST endpoints — use their scrape.js or write a custom script importing from lib/utils.js.
- Known limitations:
  - Yelp name matching is unreliable for cross-referencing with other APIs
  - Trustpilot reviews capped at ~200 without authentication
  - realtime-shorts-search may return empty results for some queries
  - Company name searches (Glassdoor, Trustpilot) need exact names for disambiguation — "Disney" ≠ "Walt Disney Company"
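Because price formats differ across APIs (string vs numeric), a comparison script usually needs a normalization step. The sketch below shows one way to do it. parsePrice is a hypothetical helper, and it assumes US-style formatting with a dot as the decimal separator. The actual field names and formats must be checked in each API's README.md.

```javascript
// Sketch only: parsePrice is hypothetical and assumes "." is the decimal separator.
function parsePrice(value) {
  if (typeof value === 'number') return Number.isFinite(value) ? value : null;
  if (typeof value !== 'string') return null;
  // Strip currency symbols, spaces and thousands separators, e.g. "$1,299.00" -> "1299.00".
  const cleaned = value.replace(/[^0-9.-]/g, '');
  if (cleaned === '') return null;
  const parsed = Number.parseFloat(cleaned);
  return Number.isNaN(parsed) ? null : parsed;
}

// Mixed inputs of the kind a multi-store comparison may encounter.
console.log(['$1,299.00', 24.5, 'N/A'].map(parsePrice));
```

Returning null for unparseable values keeps missing prices distinguishable from a real price of 0 when sorting or averaging.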
Error Handling
- RAPIDAPI_KEY not found / OPENWEBNINJA_API_KEY not found — Follow the Missing API Key — Setup Instructions section above
- HTTP 401 — API key invalid or expired; check subscription
- HTTP 403 — Not subscribed to this API; check subscription on RapidAPI or OpenWeb Ninja dashboard
- HTTP 429 — Rate limit hit; increase --delay (try 1000ms)
- No results on page 1 — Check params against the API's README.md; required params may be missing
- Cost cap exceeded — Increase --max-calls or reduce --count
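In a custom script, the HTTP cases above can be turned into a small lookup so failures produce an actionable message. This is a sketch only: hintForStatus is a hypothetical helper, the hint texts are taken from the list above, and the fallback message for other status codes is an assumption.

```javascript
// Hints mirror the error list above; hintForStatus itself is hypothetical.
const STATUS_HINTS = {
  401: 'API key invalid or expired; check subscription',
  403: 'Not subscribed to this API; check subscription on RapidAPI or OpenWeb Ninja dashboard',
  429: 'Rate limit hit; increase --delay (try 1000ms)',
};

// Map an HTTP status code to a suggested next action.
function hintForStatus(status) {
  return (
    STATUS_HINTS[status] ??
    `Unexpected HTTP ${status}; check params against the API's README.md` // assumed fallback
  );
}

console.log(hintForStatus(429));
```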
Output Destinations
All destinations are implemented in lib/utils.js and can be imported in any custom script:
const { pushWebhook, pushAirtable, postSlack, slackSummary, pushS3, pushFTP, pushGoogleSheets } = require('./lib/utils'); // path is relative to the script's location
| Destination | Function | Env Vars Required | npm Package |
|---|---|---|---|
| Local file | writeOutput(records, path, format) | — | — |
| Chat (quick answer) | displayQuickAnswer(records) | — | — |
| Webhook (Zapier/Make/n8n) | pushWebhook(records, { url, batchMode, delay }) | WEBHOOK_URL | — |
| Airtable | pushAirtable(records, { apiKey, baseId, tableName }) | AIRTABLE_API_KEY, AIRTABLE_BASE_ID, AIRTABLE_TABLE_NAME | — |
| Slack | postSlack(message) / slackSummary(records, outputPath) | SLACK_WEBHOOK_URL | — |
| S3 | pushS3(content, { bucket, key, region, contentType }) | S3_BUCKET, S3_KEY, AWS_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY | @aws-sdk/client-s3 |
| FTP | pushFTP(localFilePath, { host, user, pass, remotePath }) | FTP_HOST, FTP_USER, FTP_PASS, FTP_PATH | basic-ftp |
| Google Sheets | pushGoogleSheets(records, { credentialsPath, spreadsheetId, sheetName }) | GOOGLE_CLIENT_CREDENTIALS, SPREADSHEET_ID, SHEET_NAME | googleapis |
Notes:
- Webhook batchMode=true (default) sends all records in one POST as { records: [...] }. Set batchMode=false for Zapier (one POST per record).
- Airtable field names must match existing column names in the table exactly.
- S3/FTP/Google Sheets require their npm package installed: npm install @aws-sdk/client-s3 basic-ftp googleapis
- Google Sheets requires a service account JSON file with the Sheets API enabled.
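To make the batchMode note concrete, the sketch below shows the request bodies each mode would produce. buildWebhookBodies is a hypothetical helper and does not reflect how pushWebhook is implemented. The batch shape { records: [...] } comes from the note above, while sending each record as its own JSON body in non-batch mode is an assumption.

```javascript
// Sketch only: buildWebhookBodies is hypothetical, not the pushWebhook implementation.
function buildWebhookBodies(records, batchMode = true) {
  return batchMode
    ? [JSON.stringify({ records })] // one POST carrying every record
    : records.map((record) => JSON.stringify(record)); // one POST per record (assumed shape)
}

const sample = [{ name: 'A' }, { name: 'B' }];
console.log(buildWebhookBodies(sample).length);        // number of POSTs in batch mode
console.log(buildWebhookBodies(sample, false).length); // number of POSTs in per-record mode
```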