Firecrawl Reliability Patterns
Overview
Production reliability patterns for Firecrawl web scraping pipelines. Firecrawl's async crawl model, JavaScript rendering, and credit-based pricing create specific reliability challenges around job completion, content quality, and cost control.
Prerequisites
- Firecrawl API key configured
- Understanding of async job polling
- Queue infrastructure for retry handling
Instructions
Step 1: Robust Crawl Job Polling
Crawl jobs can take minutes. Poll the job status with exponential backoff, an overall timeout, and explicit handling of the failed state.
import FirecrawlApp from '@mendable/firecrawl-js';

// Starts an async crawl and polls until it completes, fails, or times out.
async function reliableCrawl(url: string, options: any, timeoutMs = 600000) { // default: 10 minutes
  const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
  const crawl = await firecrawl.asyncCrawlUrl(url, options);
  const deadline = Date.now() + timeoutMs;
  let pollInterval = 2000; // start at 2 seconds
  while (Date.now() < deadline) {
    const status = await firecrawl.checkCrawlStatus(crawl.id);
    if (status.status === 'completed') return status;
    if (status.status === 'failed') throw new Error(`Crawl failed: ${status.error}`);
    await new Promise(r => setTimeout(r, pollInterval));
    pollInterval = Math.min(pollInterval * 1.5, 30000); // back off, capped at 30 seconds
  }
  throw new Error(`Crawl timed out after ${timeoutMs}ms`);
}
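To sanity-check the polling parameters, it can help to see how many status checks the backoff above produces before the timeout. The sketch below is a standalone helper (the name `pollSchedule` is hypothetical and not part of the Firecrawl SDK) that replays the same 2-second start, 1.5x growth, and 30-second cap against a timeout. It ignores the latency of each status request, so it gives an upper bound on the number of sleeps.

```typescript
// Hypothetical helper: lists the sleep intervals (ms) the polling loop would
// use before the timeout elapses. Request latency is ignored.
function pollSchedule(
  timeoutMs: number,
  initialMs = 2000,
  factor = 1.5,
  capMs = 30000,
): number[] {
  const intervals: number[] = [];
  let elapsed = 0;
  let next = initialMs;
  while (elapsed < timeoutMs) {
    intervals.push(next);
    elapsed += next;
    next = Math.min(next * factor, capMs); // same growth rule as reliableCrawl
  }
  return intervals;
}

const schedule = pollSchedule(600000);
console.log(schedule.slice(0, 8));
console.log(`sleeps before timeout: ${schedule.length}`);
```

Because the deadline is only checked at the top of the loop, the final sleep can overrun it by up to the cap (30 seconds with these values). If a hard deadline matters, shorten the last sleep to the time remaining.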
Step 2: Content Quality Validation
Scraped pages may return empty or boilerplate content. Validate before processing.
interface ScrapedPage {
  url: string;
  markdown: string;
  metadata: { title?: string; statusCode?: number };
}

function validateContent(page: ScrapedPage): boolean {
  // Reject empty or near-empty pages
  if (!page.markdown || page.markdown.length < 100) return false;
  // Reject any 4xx or 5xx response
  if (page.metadata.statusCode && page.metadata.statusCode >= 400) return false;
  // Detect common error pages that were served with a 200 status
  const errorPatterns = ['access denied', '403 forbidden', 'page not found', 'captcha'];
  const lower = page.markdown.toLowerCase();
  return !errorPatterns.some(p => lower.includes(p));
}
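A boolean result is enough to filter pages, but when debugging a pipeline it is often useful to know why a page was dropped. The following is a minimal sketch of a variant that returns a rejection reason; `rejectionReason` and `partitionPages` are hypothetical names, and the thresholds and patterns simply mirror the checks above.

```typescript
interface ScrapedPage {
  url: string;
  markdown: string;
  metadata: { title?: string; statusCode?: number };
}

// Hypothetical variant of validateContent: returns null when the page passes,
// otherwise a short string describing the first check that failed.
function rejectionReason(page: ScrapedPage, minLength = 100): string | null {
  if (!page.markdown || page.markdown.length < minLength) return 'too-short';
  const status = page.metadata.statusCode;
  if (status !== undefined && status >= 400) return `http-${status}`;
  const errorPatterns = ['access denied', '403 forbidden', 'page not found', 'captcha'];
  const lower = page.markdown.toLowerCase();
  const hit = errorPatterns.find(p => lower.includes(p));
  return hit ? `matched:${hit}` : null;
}

// Split a batch into usable pages and a per-URL rejection log.
function partitionPages(pages: ScrapedPage[]) {
  const accepted: ScrapedPage[] = [];
  const rejected: { url: string; reason: string }[] = [];
  for (const page of pages) {
    const reason = rejectionReason(page);
    if (reason === null) accepted.push(page);
    else rejected.push({ url: page.url, reason });
  }
  return { accepted, rejected };
}
```

Substring matching is deliberately crude: a legitimate article that mentions "captcha" would also be rejected. Logging the reason makes such false positives visible so the pattern list can be tuned.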
Step 3: Credit-Aware Processing
Track credit usage per day and check the remaining budget before each crawl to prevent overruns.
class CreditTracker {
  private dailyUsage: Map<string, number> = new Map();
  private dailyLimit: number;

  constructor(dailyLimit = 5000) { this.dailyLimit = dailyLimit; } // daily budget, counted in pages

  // True if the estimated page count still fits in today's budget
  canAfford(estimatedPages: number): boolean {
    const today = new Date().toISOString().split('T')[0]; // UTC date, e.g. "2024-01-31"
    const used = this.dailyUsage.get(today) || 0;
    return (used + estimatedPages) <= this.dailyLimit;
  }

  // Add the pages actually scraped to today's usage
  record(pages: number) {
    const today = new Date().toISOString().split('T')[0];
    this.dailyUsage.set(today, (this.dailyUsage.get(today) || 0) + pages);
  }
}
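One gap in a check-then-record tracker is that two crawls started close together can both pass the budget check before either records its usage. A possible way around this, sketched below under that assumption, is to reserve the estimate up front and settle to the real page count afterwards. `ReservingCreditTracker`, `tryReserve`, `settle`, and `remaining` are hypothetical names, and pages are counted as credits just as in the tracker above; adjust the estimate if your plan bills some features differently.

```typescript
// Hypothetical extension of the tracker: reserve credits before the crawl,
// then replace the estimate with the actual page count when it finishes.
class ReservingCreditTracker {
  private usage = new Map<string, number>();
  private readonly dailyLimit: number;

  constructor(dailyLimit = 5000) {
    this.dailyLimit = dailyLimit;
  }

  // UTC date key such as "2024-01-31"; the budget resets at UTC midnight.
  private static dayKey(now: Date): string {
    return now.toISOString().slice(0, 10);
  }

  // Check and record in one step; usage is unchanged when the reservation is refused.
  tryReserve(estimatedPages: number, now = new Date()): boolean {
    const key = ReservingCreditTracker.dayKey(now);
    const used = this.usage.get(key) ?? 0;
    if (used + estimatedPages > this.dailyLimit) return false;
    this.usage.set(key, used + estimatedPages);
    return true;
  }

  // Swap the reserved estimate for the real count (e.g. the crawl returned fewer pages).
  settle(estimatedPages: number, actualPages: number, now = new Date()): void {
    const key = ReservingCreditTracker.dayKey(now);
    const used = this.usage.get(key) ?? 0;
    this.usage.set(key, Math.max(0, used - estimatedPages + actualPages));
  }

  remaining(now = new Date()): number {
    const key = ReservingCreditTracker.dayKey(now);
    return this.dailyLimit - (this.usage.get(key) ?? 0);
  }
}
```

Like the original, this keeps state in memory only, so usage is lost on restart and is not shared between processes. A crawl that spans UTC midnight would also settle against the new day; persisting the reservation's day key would be needed to handle that case.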
Step 4: Fallback from Crawl to Individual Scrape
If a full crawl fails, fall back to scraping critical pages individually.
async function resilientScrape(urls: string[]) {
  try {
    // Crawl from the first URL, limited to as many pages as were requested
    return await reliableCrawl(urls[0], { limit: urls.length });
  } catch (crawlError) {
    console.warn('Crawl failed, falling back to individual scrapes');
    // reliableCrawl keeps its client private, so create one for the fallback
    const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
    const results = [];
    for (const url of urls) {
      try {
        const result = await firecrawl.scrapeUrl(url, {
          formats: ['markdown'], onlyMainContent: true
        });
        results.push(result);
      } catch (e) { console.error(`Failed: ${url}`); }
      await new Promise(r => setTimeout(r, 1000)); // pause 1 second between scrapes
    }
    return results;
  }
}
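A crawl that completes does not necessarily cover every critical page, since it starts from the first URL and follows links up to the limit. One way to narrow the fallback is to compare the URLs the crawl returned with the ones that were requested and scrape only the difference. The sketch below assumes the caller has already extracted the page URLs from the crawl result as plain strings; `normalizeUrl` and `missingUrls` are hypothetical helpers.

```typescript
// Normalize for comparison: drop the fragment and any trailing slash on the path.
// The URL constructor also lowercases the host.
function normalizeUrl(raw: string): string {
  const u = new URL(raw);
  u.hash = '';
  const path = u.pathname.replace(/\/+$/, '');
  return `${u.origin}${path}${u.search}`;
}

// Hypothetical helper: which of the requested URLs are absent from the crawl output?
function missingUrls(requested: string[], crawled: string[]): string[] {
  const seen = new Set(crawled.map(normalizeUrl));
  return requested.filter(url => !seen.has(normalizeUrl(url)));
}

const stillNeeded = missingUrls(
  ['https://example.com/pricing', 'https://example.com/docs/'],
  ['https://example.com/pricing/'],
);
console.log(stillNeeded);
```

Scraping only the missing URLs avoids spending credits a second time on pages the crawl already returned.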
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Crawl times out | Large site, slow JS rendering | Set page limits and timeout |
| Empty markdown | Anti-bot or JS-rendered content | Increase waitFor, try individual scrape |
| Credit overrun | No budget tracking | Implement credit-aware circuit breaker |
| Partial crawl results | Site structure changes | Validate content, retry failed pages |
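The table mentions a circuit breaker without showing one. As a rough illustration, the sketch below stops new crawls after a run of consecutive failures and allows a trial request once a cooldown has passed. The class name, thresholds, and method names are illustrative assumptions rather than part of any Firecrawl API; to make it credit-aware, call `allow()` alongside the budget check before starting a crawl.

```typescript
// Minimal circuit breaker sketch; names and default thresholds are illustrative.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly maxFailures: number;
  private readonly cooldownMs: number;

  constructor(maxFailures = 3, cooldownMs = 60000) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
  }

  // True if a request may proceed at time nowMs.
  allow(nowMs = Date.now()): boolean {
    if (this.openedAt === null) return true;
    // Once the cooldown has passed, let trial requests through (half-open).
    return nowMs - this.openedAt >= this.cooldownMs;
  }

  // A success closes the breaker and clears the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  // Consecutive failures at or above the threshold open (or re-open) the breaker.
  recordFailure(nowMs = Date.now()): void {
    this.failures += 1;
    if (this.failures >= this.maxFailures) this.openedAt = nowMs;
  }
}
```

Passing the clock in as a parameter keeps the breaker easy to test; in production code the defaults fall back to `Date.now()`.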
Examples
Basic usage: Call reliableCrawl with a page limit and the default 10-minute timeout, then keep only the pages that pass validateContent.
Advanced scenario: Check CreditTracker.canAfford before each crawl, record the pages actually returned, and use resilientScrape so that critical pages are still fetched when the crawl fails.
Output
- Configuration files or code changes applied to the project
- Validation report confirming correct implementation
- Summary of changes made and their rationale