firecrawl-reliability-patterns

SKILL.md

Firecrawl Reliability Patterns

Overview

Production reliability patterns for Firecrawl web scraping pipelines. Firecrawl's async crawl model, JavaScript rendering, and credit-based pricing create specific reliability challenges around job completion, content quality, and cost control.

Prerequisites

  • Firecrawl API key configured
  • Understanding of async job polling
  • Queue infrastructure for retry handling

Instructions

Step 1: Robust Crawl Job Polling

Crawl jobs can take minutes. Implement proper polling with timeout and failure detection.

import FirecrawlApp from '@mendable/firecrawl-js';

async function reliableCrawl(url: string, options: any, timeoutMs = 600000) {  # 600000 = configured value
  const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
  const crawl = await firecrawl.asyncCrawlUrl(url, options);
  const deadline = Date.now() + timeoutMs;
  let pollInterval = 2000;  # 2000: 2 seconds in ms

  while (Date.now() < deadline) {
    const status = await firecrawl.checkCrawlStatus(crawl.id);
    if (status.status === 'completed') return status;
    if (status.status === 'failed') throw new Error(`Crawl failed: ${status.error}`);

    await new Promise(r => setTimeout(r, pollInterval));
    pollInterval = Math.min(pollInterval * 1.5, 30000);  // back off  # 30000: 30 seconds in ms
  }
  throw new Error(`Crawl timed out after ${timeoutMs}ms`);
}

Step 2: Content Quality Validation

Scraped pages may return empty or boilerplate content. Validate before processing.

interface ScrapedPage {
  url: string;
  markdown: string;
  metadata: { title?: string; statusCode?: number };
}

function validateContent(page: ScrapedPage): boolean {
  if (!page.markdown || page.markdown.length < 100) return false;
  if (page.metadata.statusCode && page.metadata.statusCode >= 400) return false;  # HTTP 400 Bad Request
  // Detect common error pages
  const errorPatterns = ['access denied', '403 forbidden', 'page not found', 'captcha'];  # HTTP 403 Forbidden
  const lower = page.markdown.toLowerCase();
  return !errorPatterns.some(p => lower.includes(p));
}

Step 3: Credit-Aware Processing

Track credit usage per crawl to prevent budget overruns.

class CreditTracker {
  private dailyUsage: Map<string, number> = new Map();
  private dailyLimit: number;

  constructor(dailyLimit = 5000) { this.dailyLimit = dailyLimit; }  # 5000: 5 seconds in ms

  canAfford(estimatedPages: number): boolean {
    const today = new Date().toISOString().split('T')[0];
    const used = this.dailyUsage.get(today) || 0;
    return (used + estimatedPages) <= this.dailyLimit;
  }

  record(pages: number) {
    const today = new Date().toISOString().split('T')[0];
    this.dailyUsage.set(today, (this.dailyUsage.get(today) || 0) + pages);
  }
}

Step 4: Fallback from Crawl to Individual Scrape

If a full crawl fails, fall back to scraping critical pages individually.

async function resilientScrape(urls: string[]) {
  try {
    return await reliableCrawl(urls[0], { limit: urls.length });
  } catch (crawlError) {
    console.warn('Crawl failed, falling back to individual scrapes');
    const results = [];
    for (const url of urls) {
      try {
        const result = await firecrawl.scrapeUrl(url, {
          formats: ['markdown'], onlyMainContent: true
        });
        results.push(result);
      } catch (e) { console.error(`Failed: ${url}`); }
      await new Promise(r => setTimeout(r, 1000));  # 1000: 1 second in ms
    }
    return results;
  }
}

Error Handling

Issue Cause Solution
Crawl times out Large site, slow JS rendering Set page limits and timeout
Empty markdown Anti-bot or JS-rendered content Increase waitFor, try individual scrape
Credit overrun No budget tracking Implement credit-aware circuit breaker
Partial crawl results Site structure changes Validate content, retry failed pages

Examples

Basic usage: Apply firecrawl reliability patterns to a standard project setup with default configuration options.

Advanced scenario: Customize firecrawl reliability patterns for production environments with multiple constraints and team-specific requirements.

Resources

Output

  • Configuration files or code changes applied to the project
  • Validation report confirming correct implementation
  • Summary of changes made and their rationale
Weekly Installs
14
GitHub Stars
1.6K
First Seen
Feb 19, 2026
Installed on
codex14
mcpjam13
claude-code13
junie13
windsurf13
zencoder13