crawlio-mcp

Crawlio MCP Server

Crawlio MCP exposes 37 tools (full mode) or 6 tools (code mode) over stdio transport. The server connects to Crawlio.app's ControlServer for live operations and reads local state files for offline access.

Modes

Code Mode (default)

6 tools: search_api, execute_api, trigger_capture, extract_text_from_image, analyze_page, compare_pages. Use search_api to discover endpoints, then execute_api to call them. extract_text_from_image runs Vision OCR locally and does not require the app. The lower tool count suits context-constrained clients.

Full Mode (`--full`)

35 individual tools with typed parameters and annotations. Better for clients that can handle many tools.

Full Mode Tools (37)

Status & Monitoring (6)

get_crawl_status — Engine state + progress counters.

since (int, opt): Sequence number for change detection.

get_crawl_logs — Recent log entries with filtering.

category (string, opt): engine | download | parser | localizer | network | ui
level (string, opt): debug | info | default | error | fault
limit (int, opt): Max entries (default 100).

get_errors — Error/fault-level logs only. No params.

get_downloads — All download items with status, HTTP code, bytes, timing. No params.

get_failed_urls — Failed items with URL + error. No params.

get_site_tree — File paths as directory tree. No params.

Control (4)

start_crawl — Start a new crawl.

url (string, opt): Single URL.
urls (string[], opt): Multi-seed URLs.
destinationPath (string, opt): Save directory.

stop_crawl — Stop crawl, cancel downloads, clear queue. No params.

pause_crawl — Pause (in-progress downloads complete). No params.

resume_crawl — Resume paused crawl. No params.

Settings & Configuration (3)

get_settings — Current pending settings + policy. No params.

update_settings — Partially merges the supplied keys into the current settings and policy. Allowed only while the engine is idle.

settings (object, opt): maxConcurrent, crawlDelay, timeout, downloadImages, downloadVideo, downloadFonts, downloadScripts, downloadStyles, userAgent, maxRetries, stripTrackingParams, customCookies, customHeaders, preferHTTP2 (bool), proxyConfiguration ({type: "http"/"https"/"socks5", host, port, username?, password?, noProxyHosts?}).
policy (object, opt): scopeMode, maxDepth, maxPagesPerCrawl, respectRobotsTxt, excludePatterns, includePatterns, includeSupportingFiles, downloadCrossDomainAssets, autoUpgradeHTTP, pinnedPublicKeys ({hostname: [sha256HexStrings]}).
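To make the partial-merge shape concrete, here is a minimal Python sketch that assembles an update_settings argument object. The helper name build_update_args, the validation step, and the example values (such as an integer maxConcurrent) are assumptions for illustration; only the key names and the three proxy types come from the parameter list above.

```python
from typing import Any, Optional

# Proxy types listed in the parameter description above.
ALLOWED_PROXY_TYPES = {"http", "https", "socks5"}


def build_update_args(
    settings: Optional[dict[str, Any]] = None,
    policy: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble update_settings arguments, omitting empty sections."""
    args: dict[str, Any] = {}
    if settings:
        proxy = settings.get("proxyConfiguration")
        if proxy is not None and proxy.get("type") not in ALLOWED_PROXY_TYPES:
            raise ValueError(f"unsupported proxy type: {proxy.get('type')!r}")
        args["settings"] = dict(settings)
    if policy:
        args["policy"] = dict(policy)
    return args


# Example values are illustrative, not recommended defaults.
example = build_update_args(
    settings={
        "maxConcurrent": 4,
        "proxyConfiguration": {"type": "socks5", "host": "127.0.0.1", "port": 1080},
    },
    policy={"maxDepth": 2, "respectRobotsTxt": True},
)
```

Because the merge is partial, only the keys that should change need to appear in the object.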

recrawl_urls — Re-crawl specific URLs.

urls (string[], required).

Projects (5)

list_projects — All saved projects. No params.

save_project — Save current project.

name (string, opt).

load_project — Load project by ID.

id (string, required).

delete_project — Delete project by ID.

id (string, required).

get_project — Full project details.

id (string, required).

Export & Extraction (5)

export_site — Export downloaded site.

format (string, required): folder | zip | singleHTML | warc
destinationPath (string, required).
warcConfiguration (object, opt): compressionEnabled (bool, default true), maxFileSize (int, default 1GB, 0=no split), cdxEnabled (bool, default true), dedupEnabled (bool, default true).

get_export_status — Export state + progress. No params.

extract_site — Run RSC extraction pipeline.

destinationPath (string, opt).

get_extraction_status — Extraction state + progress. No params.

trigger_capture — WebKit runtime capture (framework detection, network, console, DOM).

url (string, required).

OCR (1)

extract_text_from_image — Extract text from a local image using Vision OCR. No Crawlio.app required.

path (string, required): Absolute file path to image.
languages (string[], opt): Recognition languages (e.g. ["en-US"]).
recognitionLevel (string, opt): accurate (default) or fast.

Enrichment (6)

get_enrichment — Browser enrichment data.

url (string, opt): Filter by URL.

submit_enrichment_bundle — Complete enrichment bundle.

url (string, required).
framework (object, opt), networkRequests (array, opt), consoleLogs (array, opt), domSnapshotJSON (string, opt).

submit_enrichment_framework — Framework detection.

url (string, required), framework (object, required).

submit_enrichment_network — Network requests.

url (string, required), networkRequests (array, required).

submit_enrichment_console — Console logs.

url (string, required), consoleLogs (array, required).

submit_enrichment_dom — DOM snapshot.

url (string, required), domSnapshotJSON (string, required).
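Since domSnapshotJSON is typed as a string while the other bundle fields are objects or arrays, a client probably needs to serialize the DOM snapshot before submitting it. The sketch below shows one way to build submit_enrichment_bundle arguments in Python. The helper name and the inner field names of the framework object are hypothetical; the source does not document them.

```python
import json
from typing import Any, Optional


def build_enrichment_bundle(
    url: str,
    framework: Optional[dict[str, Any]] = None,
    network_requests: Optional[list[Any]] = None,
    console_logs: Optional[list[Any]] = None,
    dom_snapshot: Any = None,
) -> dict[str, Any]:
    """Hypothetical helper: build submit_enrichment_bundle arguments."""
    if not url:
        raise ValueError("url is required")
    args: dict[str, Any] = {"url": url}
    if framework is not None:
        args["framework"] = framework
    if network_requests is not None:
        args["networkRequests"] = network_requests
    if console_logs is not None:
        args["consoleLogs"] = console_logs
    if dom_snapshot is not None:
        # domSnapshotJSON is typed as a string, so the snapshot is serialized first.
        args["domSnapshotJSON"] = json.dumps(dom_snapshot)
    return args


bundle = build_enrichment_bundle(
    "https://example.com",
    framework={"name": "example-framework"},  # inner field names are illustrative only
    dom_snapshot={"tag": "html", "children": []},
)
```

Optional fields that are not supplied are left out of the bundle rather than sent as null.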

Observations & Findings (5)

get_observations — Append-only observation timeline.

host (string, opt), op (string, opt), source (string, opt), since (number, opt), limit (int, opt).

get_observation — Look up a single observation or finding by ID.

id (string, required): Observation ID (obs_xxx or fnd_xxx). Use to verify evidence chains.

create_finding — Create curated finding with evidence.

title (string, required), url (string, opt), evidence (string[], opt), synthesis (string, opt), confidence (string, opt: high/medium/low/none), category (string, opt).
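As an illustration of the create_finding parameters, the following Python sketch assembles an argument object and checks the confidence value against the four listed levels. The helper name is hypothetical, and treating evidence entries as observation IDs in the obs_xxx form is an assumption based on the get_observation description.

```python
from typing import Any, Optional

# Confidence values listed in the parameter description above.
CONFIDENCE_LEVELS = ("high", "medium", "low", "none")


def build_finding_args(
    title: str,
    evidence: Optional[list[str]] = None,
    confidence: Optional[str] = None,
    **optional_fields: str,
) -> dict[str, Any]:
    """Hypothetical helper: assemble create_finding arguments."""
    if not title:
        raise ValueError("title is required")
    if confidence is not None and confidence not in CONFIDENCE_LEVELS:
        raise ValueError(f"confidence must be one of {CONFIDENCE_LEVELS}")
    args: dict[str, Any] = {"title": title}
    if evidence:
        args["evidence"] = list(evidence)
    if confidence is not None:
        args["confidence"] = confidence
    # url, synthesis, and category are passed through unchanged when given.
    for key in ("url", "synthesis", "category"):
        if key in optional_fields:
            args[key] = optional_fields[key]
    return args


finding = build_finding_args(
    "Site renders with a client-side framework",
    evidence=["obs_123"],  # placeholder ID in the obs_xxx form
    confidence="medium",
    url="https://example.com",
)
```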

get_findings — List curated findings.

host (string, opt), limit (int, opt).

get_crawled_urls — Downloaded URLs with pagination.

status (string, opt), type (string, opt), limit (int, opt), offset (int, opt).
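The limit and offset parameters suggest a conventional paging loop. The Python sketch below walks all pages of get_crawled_urls through an injected call_tool function, which stands in for whatever MCP client is in use. It assumes each call returns a plain list of items and that a short page marks the end; the real response shape is not documented here and may differ.

```python
from typing import Any, Callable, Iterator

# Hypothetical client signature: (tool name, arguments) -> list of items.
ToolCaller = Callable[[str, dict[str, Any]], list[Any]]


def iter_crawled_urls(
    call_tool: ToolCaller, status: str = "completed", page_size: int = 50
) -> Iterator[Any]:
    """Yield every item from get_crawled_urls, one page at a time."""
    offset = 0
    while True:
        page = call_tool(
            "get_crawled_urls",
            {"status": status, "limit": page_size, "offset": offset},
        )
        yield from page
        if len(page) < page_size:  # a short (or empty) page means no more results
            break
        offset += page_size


# Stand-in for a real MCP client so the sketch runs on its own.
FAKE_ITEMS = [f"https://example.com/page{i}" for i in range(7)]


def fake_call_tool(name: str, args: dict[str, Any]) -> list[Any]:
    start = args["offset"]
    return FAKE_ITEMS[start : start + args["limit"]]


all_urls = list(iter_crawled_urls(fake_call_tool, page_size=3))
```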

Code Mode Tools (6)

search_api — Search available endpoints by keyword.

search_api(query: "enrichment", limit: 10)

execute_api — Execute HTTP request against ControlServer.

execute_api(method: "GET", path: "/status")
execute_api(method: "POST", path: "/start", body: {"url": "https://example.com"})
execute_api(method: "PATCH", path: "/settings", body: {"policy": {"maxDepth": 2}})
execute_api(method: "GET", path: "/crawled-urls?status=completed&limit=50")
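When query parameters come from variables, it may be safer to build the path programmatically than to concatenate strings by hand. This Python sketch uses the standard library's urlencode to produce the same path as the last example above. The helper name build_execute_args is hypothetical; the method, path, and body keys mirror the calls shown.

```python
from typing import Any, Optional
from urllib.parse import urlencode


def build_execute_args(
    method: str,
    path: str,
    query: Optional[dict[str, Any]] = None,
    body: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble execute_api arguments with an encoded query string."""
    if query:
        path = f"{path}?{urlencode(query)}"
    args: dict[str, Any] = {"method": method.upper(), "path": path}
    if body is not None:
        args["body"] = body
    return args


request = build_execute_args(
    "GET", "/crawled-urls", query={"status": "completed", "limit": 50}
)
patch = build_execute_args("patch", "/settings", body={"policy": {"maxDepth": 2}})
```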

trigger_capture — WebKit runtime capture (same as full mode).

trigger_capture(url: "https://example.com")

extract_text_from_image — Vision OCR on local image (same as full mode).

extract_text_from_image(path: "/path/to/image.png")
extract_text_from_image(path: "/path/to/image.jpg", languages: ["en-US"], recognitionLevel: "fast")

analyze_page — Composite analysis of a single page (capture + enrich + crawl status). Returns evidenceId, evidenceQuality, gaps.

analyze_page(url: "https://example.com")

compare_pages — Compare two pages side-by-side (runs analyze_page on each). Returns comparisonReadiness, symmetric, degradationNotes, timingDelta.

compare_pages(urlA: "https://example.com", urlB: "https://competitor.com")

HTTP-Only Endpoints (3)

Accessible via execute_api but not as MCP tools:

GET /health — Server health, version, uptime, PID.
GET /debug/metrics — Engine metrics: connections, queue depth, memory.
POST /debug/dump-state — Full engine state dump.

Resources (4)

| URI | Description |
| --- | --- |
| `crawlio://status` | Engine state and progress |
| `crawlio://settings` | Current crawl settings |
| `crawlio://site-tree` | Downloaded file tree |
| `crawlio://enrichment` | All browser enrichment data |

Template (1)

crawlio://enrichment/{url} — Per-URL enrichment data.

Prompts (4)

| Prompt | Arguments | Description |
| --- | --- | --- |
| `crawl-and-analyze` | url (req), maxDepth (opt) | Crawl + analyze results |
| `export-site` | url (req), format (req), destination (opt) | Crawl + export |
| `compare-sites` | url1 (req), url2 (req) | Compare two sites |
| `fix-failed-urls` | none | Diagnose + retry failures |

Common Workflows

Crawl → Wait → Export

update_settings — Configure depth, scope, asset options.
start_crawl — Begin crawl.
get_crawl_status — Poll until engineState is completed. Use since param for efficient polling.
export_site — Export as zip/folder/singleHTML/warc.
get_export_status — Confirm export finished.
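The five steps above could be sketched in Python roughly as follows. The call_tool parameter is a hypothetical stand-in for an MCP client call, and the code assumes get_crawl_status returns a dict with an engineState key, as the polling step implies. The poll cap and the TimeoutError are additions for safety, not documented server behavior.

```python
import time
from typing import Any, Callable

# Hypothetical client signature: (tool name, arguments) -> result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def crawl_and_export(
    call_tool: ToolCaller,
    url: str,
    destination: str,
    fmt: str = "zip",
    poll_seconds: float = 5.0,
    max_polls: int = 720,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Configure, crawl, wait for completion, export, and return the export status."""
    call_tool("update_settings", {"policy": {"maxDepth": 2}})
    call_tool("start_crawl", {"url": url})
    for _ in range(max_polls):
        # Assumption: the status result is a dict exposing "engineState".
        status = call_tool("get_crawl_status", {})
        if status.get("engineState") == "completed":
            break
        sleep(poll_seconds)
    else:
        raise TimeoutError("crawl did not reach the completed state")
    call_tool("export_site", {"format": fmt, "destinationPath": destination})
    return call_tool("get_export_status", {})
```

Passing sleep as a parameter keeps the function testable without real delays. A production version would likely also poll get_export_status until the export finishes.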

WARC Export with Options

update_settings — Configure proxy/pinning if needed: {settings: {proxyConfiguration: {type: "http", host: "proxy.corp", port: 8080}}}.
start_crawl — Crawl the target site.
get_crawl_status — Poll until completed.
export_site — Export with WARC options: {format: "warc", destinationPath: "/tmp/archive.warc.gz", warcConfiguration: {compressionEnabled: true, cdxEnabled: true, dedupEnabled: true, maxFileSize: 0}}.
Validate the output: a CDX sidecar was created, deduplicated responses are stored as revisit records, and the archive is GZIP-compressed.
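Part of the validation step can be approximated with the Python standard library. The sketch below checks for the GZIP magic bytes and counts WARC records and revisit records by scanning header lines. It is a rough check under stated assumptions, not a WARC parser: payload lines that happen to look like headers would be miscounted. The CDX sidecar is not checked, because its file name is not specified here.

```python
import gzip

# First two bytes of every GZIP stream.
GZIP_MAGIC = b"\x1f\x8b"


def summarize_warc(path: str) -> dict:
    """Report whether the file is gzipped and how many records and revisits it holds."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    records = 0
    revisits = 0
    with opener(path, "rb") as fh:
        for raw_line in fh:
            line = raw_line.strip()
            if line.startswith(b"WARC/1."):
                # Each WARC record begins with a version line such as WARC/1.0 or WARC/1.1.
                records += 1
            elif line.lower().replace(b" ", b"") == b"warc-type:revisit":
                revisits += 1
    return {"gzip": is_gzip, "records": records, "revisits": revisits}
```

With dedupEnabled set, a crawl containing repeated identical responses would be expected to yield a non-zero revisits count.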

Enrichment Pipeline

trigger_capture(url) — Run WebKit capture.
get_enrichment(url) — Read framework detection, network, console, DOM.
create_finding — Record insights with evidence.

Error Recovery

get_failed_urls — List failures.
recrawl_urls — Retry failed URLs.
get_crawl_status — Poll until re-crawl completes.
get_failed_urls — Check remaining failures.
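A retry loop along these lines might look like the Python sketch below. It assumes get_failed_urls returns a list of dicts with a url key, which is inferred from the "URL + error" description and not confirmed. Both call_tool and wait_until_done are hypothetical callables supplied by the client.

```python
from typing import Any, Callable

# Hypothetical client signature: (tool name, arguments) -> result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def retry_failed_urls(
    call_tool: ToolCaller,
    wait_until_done: Callable[[], None],
    max_rounds: int = 3,
) -> list[str]:
    """Re-crawl failed URLs for up to max_rounds rounds and return those still failing."""
    remaining: list[str] = []
    for _ in range(max_rounds):
        remaining = [item["url"] for item in call_tool("get_failed_urls", {})]
        if not remaining:
            break
        call_tool("recrawl_urls", {"urls": remaining})
        wait_until_done()  # e.g. poll get_crawl_status until the re-crawl completes
    else:
        # All rounds were used, so check once more for what is left.
        remaining = [item["url"] for item in call_tool("get_failed_urls", {})]
    return remaining
```

Capping the number of rounds avoids looping forever on URLs that fail permanently, such as 404 pages.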

Status Polling Pattern

1. status = get_crawl_status()
2. seq = status.seq
3. Loop:
   status = get_crawl_status(since: seq)
   if status != "no changes": update seq, check engineState
   sleep 5s
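The pattern above translates to Python roughly as follows. This sketch assumes the status result is a dict with seq and engineState keys and that an unchanged state is reported as the literal string "no changes", as the pseudocode suggests. call_tool is a hypothetical stand-in for the MCP client, and the poll cap is an added safeguard.

```python
import time
from typing import Any, Callable

# Hypothetical client signature: (tool name, arguments) -> result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def poll_until_completed(
    call_tool: ToolCaller,
    interval: float = 5.0,
    max_polls: int = 720,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll get_crawl_status with the since parameter until engineState is completed."""
    last = call_tool("get_crawl_status", {})
    seq = last["seq"]
    polls = 0
    while last.get("engineState") != "completed":
        if polls >= max_polls:
            raise TimeoutError("crawl did not complete in time")
        sleep(interval)
        polls += 1
        status = call_tool("get_crawl_status", {"since": seq})
        if status != "no changes":
            # Only a changed status carries a new sequence number.
            last, seq = status, status["seq"]
    return last
```

Sending the last seen sequence number lets the server answer cheaply when nothing has changed, which is the point of the since parameter.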

Repository

crawlio-app/cra…o-plugin

First Seen

Mar 10, 2026

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Warn
