web-scraper
Firecrawl Web Scraper
Use this managed skill when the user wants to scrape web pages, crawl websites, discover URLs, search the web, or extract structured data from websites.
This skill uses Shift's local Skill Router. Do not ask the user to paste credentials into chat.
Invocation
Send a POST request to:
${SHIFT_LOCAL_GATEWAY}/skill-router/invoke
Scrape a single page
Extracts content from a URL as clean markdown.
{
"skillProvider": "firecrawl",
"skill": "web-scraper",
"action": "scrape",
"input": {
"url": "https://example.com"
}
}
Optional input fields:
formats: array of output formats, default["markdown"]. Options:markdown,html,links,screenshotonlyMainContent: boolean, defaulttrue. Exclude nav/footerincludeTags/excludeTags: arrays of HTML tags to include or excludewaitFor: integer, milliseconds to wait before scrapingtimeout: integer, milliseconds, default30000mobile: boolean, defaultfalse. Emulate mobile viewportlocation: object withcountry(ISO 3166-1 alpha-2) andlanguagesarray
Crawl a website (async)
Starts an async crawl job. Returns a job ID to poll with crawl_status.
{
"skillProvider": "firecrawl",
"skill": "web-scraper",
"action": "crawl",
"input": {
"url": "https://example.com",
"limit": 50
}
}
Optional input fields:
limit: max pages to crawl, default10maxDepth: crawl depthincludePaths/excludePaths: regex path filtersallowExternalLinks: boolean, defaultfalseallowSubdomains: boolean, defaultfalsescrapeOptions: object with same options as scrape (e.g.{"formats": ["markdown"]})
Check crawl status
Poll for crawl job results using the ID returned from crawl.
{
"skillProvider": "firecrawl",
"skill": "web-scraper",
"action": "crawl_status",
"input": {
"crawlId": "crawl-job-uuid"
}
}
Status values: scraping, completed, failed.
Map a website's URLs
Discovers all URLs on a site without scraping content. Use this to understand site structure before crawling.
{
"skillProvider": "firecrawl",
"skill": "web-scraper",
"action": "map",
"input": {
"url": "https://example.com"
}
}
Optional input fields:
search: order results by relevance to this querylimit: max URLs, default5000includeSubdomains: boolean, defaulttrueignoreSitemap: boolean, defaultfalse
Search the web
Search the web and optionally scrape the result pages.
{
"skillProvider": "firecrawl",
"skill": "web-scraper",
"action": "search",
"input": {
"query": "latest AI research papers"
}
}
Optional input fields:
limit: max results, 1-100, default5lang: language codecountry: country code, defaultUStbs: time filter (qdr:hhour,qdr:dday,qdr:wweek,qdr:mmonth,qdr:yyear)scrapeOptions: object to control content extraction from results
Extract structured data (async)
Extracts structured data from URLs using a prompt and/or JSON schema. Returns a job ID to poll with extract_status.
{
"skillProvider": "firecrawl",
"skill": "web-scraper",
"action": "extract",
"input": {
"urls": ["https://example.com/pricing"],
"prompt": "Extract all pricing plans with name, price, and features"
}
}
Optional input fields:
schema: JSON Schema defining expected output structureenableWebSearch: boolean, defaultfalse
Check extract status
Poll for extract job results.
{
"skillProvider": "firecrawl",
"skill": "web-scraper",
"action": "extract_status",
"input": {
"extractId": "extract-job-uuid"
}
}
Authentication
This skill requires a Firecrawl API key configured in Shift. Get one at https://www.firecrawl.dev.
Do not ask the user to paste raw credentials into the conversation. Shift handles authentication automatically when the required connection is configured.
Agent behavior
- Prefer
scrapefor single pages. Usecrawlonly when multiple pages are needed. - Use
mapfirst to discover site structure before starting a large crawl. - For async actions (
crawl,extract), poll status every few seconds untilcompletedorfailed. - Default to
markdownformat unless the user specifically needs HTML or screenshots. - When scraping fails, suggest the user check the URL or try with
waitForfor JavaScript-heavy pages.