Flyscrape
Flyscrape is a standalone command-line web scraping tool (a single binary) driven by JavaScript scraping scripts. It supports jQuery-like selectors and can render JavaScript-heavy pages in a headless browser.
Quick Reference
```shell
flyscrape new script.js   # Create a new script from the template
flyscrape dev script.js   # Dev mode: watch & re-run on changes (cached)
flyscrape run script.js   # Run the scraper
flyscrape run script.js --url "http://example.com" --depth 3   # Override config values via CLI flags
```
Script Structure
Every script has two parts: an exported config object (controls crawling behavior) and a default-exported function (extracts data from each page).
```javascript
export const config = {
  url: "https://example.com",
  // See references/config.md for all options
};

export default function ({ doc, url, absoluteURL, scrape, follow }) {
  // doc - parsed HTML document with jQuery-like API
  // url - the current page URL
  // absoluteURL(path) - converts relative URLs to absolute
  // scrape(url, fn) - nested scraping of linked pages
  // follow(url) - manually follow a link (use with follow: [])
  return {
    title: doc.find("h1").text(),
    // The returned object becomes the JSON output
  };
}
```
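When automatic following is switched off with follow: [], the follow function alone decides which links are queued. The sketch below assumes an archive page with an "older posts" link. The URL and the a.older and h2 selectors are placeholders, and it assumes that manually followed links still count toward depth.

```javascript
// Sketch: manual link following. follow: [] disables automatic link
// discovery, so only URLs passed to follow() are queued.
export const config = {
  url: "https://example.com/archive", // placeholder start URL
  depth: 5, // assumption: manually followed links still count toward depth
  follow: [],
};

export default function ({ doc, absoluteURL, follow }) {
  // Hypothetical markup: <a class="older" href="/archive?page=2">Older</a>
  const older = doc.find("a.older");
  if (older.hasAttr("href")) {
    // Resolve the relative href before queueing it
    follow(absoluteURL(older.attr("href")));
  }
  return {
    headlines: doc.find("h2").map(h => h.text()),
  };
}
```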
Essential Config Options
| Option | Default | Description |
|---|---|---|
| url | - | Starting URL |
| urls | [] | Multiple starting URLs |
| depth | 0 | How deep to follow links (0 = no following) |
| follow | ["a[href]"] | CSS selectors for links to follow |
| browser | false | Enable headless Chromium for JS-heavy sites |
| cache | - | Set to "file" to cache requests |
| rate | - | Requests per minute limit |
| concurrency | - | Max concurrent requests |
See references/config.md for complete configuration reference.
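To make the depth semantics concrete, here is a toy breadth-first model in plain JavaScript that runs in Node.js. It is not flyscrape's implementation: crawlOrder and the in-memory link graph are hypothetical, and skipping already-visited URLs is an assumption about how duplicates are handled.

```javascript
// Toy model of the depth option: breadth-first traversal of a link graph.
// Depth 0 visits only the start URL; each extra level allows one more hop.
// Already-visited URLs are skipped (assumed deduplication).
function crawlOrder(graph, start, depth) {
  const visited = new Set([start]);
  const order = [start];
  let frontier = [start];
  for (let level = 0; level < depth; level++) {
    const next = [];
    for (const url of frontier) {
      for (const link of graph[url] || []) {
        if (!visited.has(link)) {
          visited.add(link);
          order.push(link);
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return order;
}

// Pagination-style chain: each page links only to the next one.
const pages = { "/p1": ["/p2"], "/p2": ["/p3"], "/p3": ["/p4"], "/p4": [] };
console.log(crawlOrder(pages, "/p1", 0)); // [ '/p1' ]
console.log(crawlOrder(pages, "/p1", 2)); // [ '/p1', '/p2', '/p3' ]
```

Under this model, a next-page chain crawled with depth: 10 reaches at most 11 pages: the start page plus ten hops.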
Query API (jQuery-like)
```javascript
const el = doc.find(".selector");     // Find element(s)
el.text()                             // Get text content
el.html()                             // Get inner HTML
el.attr("href")                       // Get attribute
el.hasAttr("data-id")                 // Check attribute exists
el.hasClass("active")                 // Check class exists

// Collections
const items = doc.find("li");
items.length()                        // Count
items.first() / items.last()          // First/last element
items.get(0)                          // Element by index
items.map(el => el.text())            // Map to array
items.filter(el => el.hasClass("x"))  // Filter elements

// Traversal
el.parent()                           // Parent element
el.children()                         // Direct children
el.siblings()                         // Sibling elements
el.prev() / el.next()                 // Adjacent siblings
el.prevAll() / el.nextAll()           // All prev/next siblings
el.prevUntil("selector")              // Siblings until selector
```
See references/query-api.md for full API reference.
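These calls combine naturally inside a default function. This sketch assumes the product-list markup shown in the comment. The selectors and output field names are illustrative only.

```javascript
// Hypothetical markup:
// <ul class="products">
//   <li class="item sale"><a href="/p/1">Lamp</a><span class="price">$20</span></li>
//   <li class="item"><a href="/p/2">Desk</a><span class="price">$150</span></li>
// </ul>
export default function ({ doc, absoluteURL }) {
  const items = doc.find("ul.products li");
  return {
    count: items.length(),
    // map() turns each <li> into a plain object in the output array
    products: items.map(li => ({
      name: li.find("a").text(),
      link: absoluteURL(li.find("a").attr("href")),
      price: li.find(".price").text(),
      onSale: li.hasClass("sale"),
    })),
  };
}
```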
Common Patterns
Follow Pagination
```javascript
export const config = {
  url: "https://example.com/posts",
  depth: 10,
  follow: [".pagination a.next"],
};
```
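The pagination config controls only navigation, so a default function is still needed to extract data. Assuming the default function runs once for every visited page (the start page and each followed next link), a minimal extractor could look like this. The article h2 selector is a placeholder.

```javascript
// Produces one result per visited page.
export default function ({ doc, url }) {
  return {
    page: url, // which page of the listing this result came from
    titles: doc.find("article h2").map(h => h.text()),
  };
}
```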
Scrape with Browser Mode (JS-heavy sites)
```javascript
export const config = {
  url: "https://spa-site.com",
  browser: true,
  headless: true,
};
```
Nested Scraping (detail pages)
```javascript
export default function ({ doc, scrape, absoluteURL }) {
  const links = doc.find(".product-link");
  return {
    products: links.map(link => {
      // Resolve the relative href before scraping the detail page
      const detailUrl = absoluteURL(link.attr("href"));
      return scrape(detailUrl, ({ doc }) => ({
        name: doc.find("h1").text(),
        price: doc.find(".price").text(),
      }));
    }),
  };
}
```
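Relative hrefs must be resolved against the page they came from before they are passed to scrape or download, which is what absoluteURL does. The plain JavaScript sketch below assumes absoluteURL follows standard URL resolution and uses the built-in WHATWG URL class to show the rules. resolveHref is a hypothetical helper, not part of flyscrape.

```javascript
// Illustration only: standard relative-URL resolution via the WHATWG URL class,
// assumed to match what flyscrape's absoluteURL(path) does for the current page.
function resolveHref(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

const dirPage = "https://example.com/products/list/";
console.log(resolveHref("item-1", dirPage));   // https://example.com/products/list/item-1
console.log(resolveHref("../sale/", dirPage)); // https://example.com/products/sale/
console.log(resolveHref("/about", dirPage));   // https://example.com/about

// Without the trailing slash, "list" is treated as a file name and replaced.
console.log(resolveHref("item-1", "https://example.com/products/list")); // https://example.com/products/item-1
```

The trailing slash in the page URL changes where item-1 resolves, which is why concatenating a base URL and an href by hand is error-prone.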
Download Files
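```javascript
import { download } from "flyscrape/http";

export default function ({ doc, absoluteURL }) {
  // Collect each <img> src via map(), dropping images without a src
  const srcs = doc.find("img").map(img => img.attr("src")).filter(src => src);
  // Download every image into the images/ directory
  srcs.forEach(src => download(absoluteURL(src), "images/"));
  return { downloaded: srcs.length };
}
```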
Rate Limiting & Caching (be polite)
```javascript
export const config = {
  url: "https://example.com",
  rate: 30,        // 30 requests/minute
  concurrency: 2,  // Max 2 concurrent
  cache: "file",   // Cache to scriptname.cache
};
```
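The rate value also bounds total crawl time. The estimate below assumes rate is a single global limit shared by all concurrent requests, so raising concurrency cannot push throughput above it. Both helper functions are hypothetical.

```javascript
// Back-of-the-envelope estimate under the assumption that `rate` is a
// global cap in requests per minute, independent of `concurrency`.
function secondsBetweenRequests(ratePerMinute) {
  return 60 / ratePerMinute;
}

function minCrawlMinutes(pageCount, ratePerMinute) {
  return pageCount / ratePerMinute;
}

console.log(secondsBetweenRequests(30)); // 2
console.log(minCrawlMinutes(300, 30));   // 10
```

Under this assumption, a 300-page crawl at rate: 30 takes at least 10 minutes, whatever the concurrency setting.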
Workflow
- Create: flyscrape new myscript.js
- Develop: flyscrape dev myscript.js - iterates with cached responses
- Run: flyscrape run myscript.js - full execution
- Output: flyscrape run myscript.js --output.file results.json
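The --output.file flag implies an output object in the config, so the output location can live in the script instead of being passed on every run. This fragment assumes that object takes file and format keys, with format accepting "json" or "ndjson". Check references/config.md for the exact options.

```javascript
export const config = {
  url: "https://example.com",
  output: {
    file: "results.json", // write results to a file instead of stdout
    format: "json",       // assumed alternative: "ndjson" (one JSON object per line)
  },
};
```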
Troubleshooting Quick Tips
| Problem | Solution |
|---|---|
| Getting blocked (403) | Add User-Agent header, reduce rate, use browser: true |
| Empty results | Check if site needs browser mode, verify selectors |
| Links not followed | Set depth > 0, check follow selectors |
| Slow performance | Increase concurrency, enable cache: "file" |
See references/troubleshooting.md for detailed solutions.
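For the 403 case, a custom User-Agent can be sent through request headers. This fragment assumes a headers config option. The User-Agent string and the lower rate and concurrency values are placeholders to tune for each site.

```javascript
export const config = {
  url: "https://example.com",
  headers: {
    // Placeholder User-Agent string
    "User-Agent": "Mozilla/5.0 (compatible; example-scraper/1.0)",
  },
  rate: 20,       // fewer requests per minute
  concurrency: 1, // one request at a time
};
```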
Reference Files
- references/config.md - Complete configuration options
- references/query-api.md - Full Query API documentation
- references/recipes.md - Common patterns and code snippets
- references/troubleshooting.md - Problem solving guide
- examples/ - Ready-to-use example scripts
External Resources
- Documentation: https://flyscrape.com/docs/getting-started/
- GitHub: https://github.com/philippta/flyscrape
- Examples: https://github.com/philippta/flyscrape/tree/master/examples