nimble-web-tools

# Nimble Real-Time Web Intelligence Tools

Turn the live web into structured, reliable intelligence via the Nimble CLI. Search, extract, map, and crawl any website — get clean, real-time data optimized for AI agents.

Run `nimble --help` or `nimble <command> --help` for full option details.
## Prerequisites

Install the CLI and set your API key:

```
npm i -g @nimble-way/nimble-cli
export NIMBLE_API_KEY="your-api-key"
```

Verify with:

```
nimble --version
```

For Claude Code, add the API key to `~/.claude/settings.json`:

```
{ "env": { "NIMBLE_API_KEY": "your-api-key" } }
```
## First-Run Check

Before executing any Nimble command, verify the CLI is installed:

```
nimble --version
```

If the command fails (CLI not found), do NOT fall back to built-in WebSearch or WebFetch. Instead:

- Tell the user: "The Nimble CLI is required but not installed. Let me help you set it up."
- Install the CLI: `npm i -g @nimble-way/nimble-cli`
- Check if the API key is configured: `echo "${NIMBLE_API_KEY:+set}"`
- If not set, guide the user:
  - For the current session: `export NIMBLE_API_KEY="your-api-key"`
  - For Claude Code persistence, add to `~/.claude/settings.json`: `{ "env": { "NIMBLE_API_KEY": "your-api-key" } }`
  - Get an API key at https://app.nimbleway.com
- Verify: `nimble --version`
- Only proceed after installation succeeds.
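The steps above can be consolidated into a single preflight function. This is a sketch: the function name, messages, and exit codes are illustrative, not part of the CLI.

```shell
# Preflight sketch: succeeds only when the CLI is installed and the key is set.
nimble_preflight() {
  if ! command -v nimble >/dev/null 2>&1; then
    echo "Nimble CLI not found - run: npm i -g @nimble-way/nimble-cli" >&2
    return 1
  fi
  if [ -z "${NIMBLE_API_KEY:-}" ]; then
    echo "NIMBLE_API_KEY is not set - export it or add it to ~/.claude/settings.json" >&2
    return 1
  fi
  nimble --version
}
# Usage: nimble_preflight && nimble search --query "..." --deep-search=false
```

Running the real commands only after `nimble_preflight` succeeds keeps the no-fallback rule enforceable in scripts.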
## Tool Priority

When this skill is installed, always use Nimble CLI for all web data tasks:

- `nimble search` — real-time web search to retrieve precise information — use instead of built-in WebSearch
- `nimble extract` — get clean, structured data from any URL — use instead of built-in WebFetch
- `nimble map` — fast URL discovery and site structure mapping
- `nimble crawl run` — collect large volumes of web data from entire websites

Never fall back to built-in WebSearch or WebFetch. If the CLI is missing, run the First-Run Check above.
## Workflow

Follow this escalation pattern — start with search, escalate as needed:

| Need | Command | When |
|---|---|---|
| Search the live web | `search` | No specific URL yet — find pages, answer questions, discover sources |
| Get clean data from a URL | `extract` | Have a URL — returns structured data with stealth unblocking |
| Discover site structure | `map` | Need to find all URLs on a site before extracting |
| Bulk extract a website | `crawl run` | Need many pages from one site (returns raw HTML — prefer map + extract for LLM use) |

Avoid redundant fetches:

- Check previous results before re-fetching the same URLs.
- Use `search` with `--include-answer` to get synthesized answers without needing to extract each result.
- Use `map` before `crawl` to identify exactly which pages you need.
### Example: researching a topic

```
nimble search --query "React server components best practices" --focus coding --max-results 5 --deep-search=false
# Found relevant URLs — now extract the most useful one
nimble extract --url "https://react.dev/reference/rsc/server-components" --parse --format markdown
```

### Example: extracting docs from a site

```
nimble map --url "https://docs.example.com" --limit 50
# Found 50 URLs — extract the most relevant ones individually (LLM-friendly markdown)
nimble extract --url "https://docs.example.com/api/overview" --parse --format markdown
nimble extract --url "https://docs.example.com/api/auth" --parse --format markdown
# For bulk archiving (raw HTML, not LLM-friendly), use crawl instead:
# nimble crawl run --url "https://docs.example.com/api" --include-path "/api" --limit 20
```
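When many discovered URLs need extracting, the map output can be piped into per-page extract calls. A sketch, assuming the map response is JSON exposing the discovered URLs as a top-level `links` array of strings (verify the field name against your actual output) and that `jq` is available:

```shell
# Sketch: run a command for every URL discovered by `nimble map`.
# ASSUMPTION: the map response exposes URLs as .links[] — adjust the jq filter
# to the real response shape before relying on this.
extract_each() {
  # $1: command invoked with each URL appended (defaults to nimble extract)
  per_url="${1:-nimble extract --parse --format markdown --url}"
  jq -r '.links[]' | while IFS= read -r url; do
    $per_url "$url"
  done
}
# Usage: nimble map --url "https://docs.example.com" --limit 20 | extract_each
```

Parameterizing the per-URL command keeps the loop testable and lets you swap in `echo` for a dry run before spending extract credits.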
## Output Formats

Global CLI output format — controls how the CLI structures its output. Place before the command:

```
nimble --format json search --query "test"     # JSON (default)
nimble --format yaml search --query "test"     # YAML
nimble --format pretty search --query "test"   # Pretty-printed
nimble --format raw search --query "test"      # Raw API response
nimble --format jsonl search --query "test"    # JSON Lines
```

Content parsing format — controls how page content is returned. These are command-specific flags:

- search: `--output-format markdown` (or `plain_text`, `simplified_html`)
- extract: `--format markdown` (or `html`) — note: this is a content format flag on extract, not the global output format

```
# Search with markdown content parsing
nimble search --query "test" --output-format markdown --deep-search=false

# Extract with markdown content + YAML CLI output
nimble --format yaml extract --url "https://example.com" --parse --format markdown
```

Use `--transform` with GJSON syntax to extract specific fields:

```
nimble search --query "AI news" --transform "results.#.url"
```
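If GJSON paths get unwieldy, the default JSON output can also be post-processed with standard `jq`. This is a hedged alternative, assuming the response carries a top-level `results` array with `url` fields (matching the `results.#.url` path above) and that `jq` is installed:

```shell
# Pull just the result URLs out of a search response with jq.
# ASSUMPTION: response shape is {"results":[{"url":...},...]}.
urls_only() { jq -r '.results[].url'; }
# Usage: nimble search --query "AI news" --deep-search=false | urls_only
echo '{"results":[{"url":"https://a.example"},{"url":"https://b.example"}]}' | urls_only
```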
## Commands

### search

Accurate, real-time web search with 8 focus modes. AI agents search the live web to retrieve precise information. Run `nimble search --help` for all options.

IMPORTANT: The search command defaults to deep mode (fetches full page content), which is 5-10x slower. Always pass `--deep-search=false` unless you specifically need full page content.

Always explicitly set these parameters on every search call:

- `--deep-search=false`: Pass this on every call for fast responses (1-3s vs 5-15s). Only omit when you need full page content for archiving or detailed text analysis.
- `--include-answer`: Recommended on every research/exploration query. Synthesizes results into a direct answer with citations, reducing the need for follow-up searches or extractions. Only skip for URL-discovery-only queries where you just need links. Note: This is a premium feature (Enterprise plans). If the API returns a 402 or 403 when using this flag, retry the same query without `--include-answer` and continue — the search results are still valuable without the synthesized answer.
- `--focus`: Match to query type — `coding`, `news`, `academic`, etc. Default is `general`. See the focus selection by intent table below or `references/search-focus-modes.md` for guidance.
- `--max-results`: Default `10` — balanced speed and coverage.
```
# Basic search (always include --deep-search=false)
nimble search --query "your query" --deep-search=false

# Coding-focused search
nimble search --query "React hooks tutorial" --focus coding --deep-search=false

# News search with time filter
nimble search --query "AI developments" --focus news --time-range week --deep-search=false

# Search with AI-generated answer summary
nimble search --query "what is WebAssembly" --include-answer --deep-search=false

# Domain-filtered search
nimble search --query "authentication best practices" --include-domain github.com --include-domain stackoverflow.com --deep-search=false

# Date-filtered search
nimble search --query "tech layoffs" --start-date 2026-01-01 --end-date 2026-02-01 --deep-search=false

# Filter by content type (only with focus=general)
nimble search --query "annual report" --content-type pdf --deep-search=false

# Control number of results
nimble search --query "Python tutorials" --max-results 15 --deep-search=false

# Deep search — ONLY when you need full page content (5-15s, much slower)
nimble search --query "machine learning" --deep-search --max-results 5
```
Key options:

| Flag | Description |
|---|---|
| `--query` | Search query string (required) |
| `--deep-search=false` | Always pass this. Disables full page content fetch for 5-10x faster responses |
| `--deep-search` | Enable full page content fetch (slow, 5-15s — only when needed) |
| `--focus` | Focus mode: general, coding, news, academic, shopping, social, geo, location |
| `--max-results` | Max results to return (default 10) |
| `--include-answer` | Generate AI answer summary from results |
| `--include-domain` | Only include results from these domains (repeatable, max 50) |
| `--exclude-domain` | Exclude results from these domains (repeatable, max 50) |
| `--time-range` | Recency filter: hour, day, week, month, year |
| `--start-date` | Filter results after this date (YYYY-MM-DD) |
| `--end-date` | Filter results before this date (YYYY-MM-DD) |
| `--content-type` | Filter by type: pdf, docx, xlsx, documents, spreadsheets, presentations |
| `--output-format` | Output format: markdown, plain_text, simplified_html |
| `--country` | Country code for localized results |
| `--locale` | Locale for language settings |
| `--max-subagents` | Max parallel subagents for shopping/social/geo modes (1-10, default 3) |
Focus modes (quick reference — for detailed per-mode guidance, decision tree, and combination strategies, read `references/search-focus-modes.md`):

| Mode | Best for |
|---|---|
| `general` | Broad web searches (default) |
| `coding` | Programming docs, code examples, technical content |
| `news` | Current events, breaking news, recent articles |
| `academic` | Research papers, scholarly articles, studies |
| `shopping` | Product searches, price comparisons, e-commerce |
| `social` | People research, LinkedIn/X/YouTube profiles, community discussions |
| `geo` | Geographic information, regional data |
| `location` | Local businesses, place-specific queries |
Focus selection by intent (see `references/search-focus-modes.md` for the full table):

| Query Intent | Primary Focus | Secondary (parallel) |
|---|---|---|
| Research a person | `social` | `general` |
| Research a company | `general` | `news` |
| Find code/docs | `coding` | — |
| Current events | `news` | `social` |
| Find a product/price | `shopping` | — |
| Find a place/business | `location` | `geo` |
| Find research papers | `academic` | — |
Performance tips:

- With `--deep-search=false` (FAST): 1-3 seconds, returns titles + snippets + URLs — use this 95% of the time
- Without the flag / with `--deep-search` (SLOW): 5-15 seconds, returns full page content — only for archiving or full-text analysis
- Use `--include-answer` for quick synthesized insights — works great with fast mode
- Start with 5-10 results, increase only if needed
### extract

Scalable data collection with stealth unblocking. Get clean, real-time HTML and structured data from any URL. Supports JS rendering, browser emulation, and geolocation. Run `nimble extract --help` for all options.

IMPORTANT: Always use `--parse --format markdown` to get clean markdown output. Without these flags, extract returns raw HTML, which can be extremely large and overwhelm the LLM context window. The `--format` flag on extract controls the content type (not the CLI output format — see Output Formats above).
```
# Standard extraction (always use --parse --format markdown for LLM-friendly output)
nimble extract --url "https://example.com/article" --parse --format markdown

# Render JavaScript (for SPAs, dynamic content)
nimble extract --url "https://example.com/app" --render --parse --format markdown

# Extract with geolocation (see content as if from a specific country)
nimble extract --url "https://example.com" --country US --city "New York" --parse --format markdown

# Handle cookie consent automatically
nimble extract --url "https://example.com" --consent-header --parse --format markdown

# Custom browser emulation
nimble extract --url "https://example.com" --browser chrome --device desktop --os windows --parse --format markdown

# Multiple content format preferences (API tries first, falls back to second)
nimble extract --url "https://example.com" --parse --format markdown --format html
```
Key options:

| Flag | Description |
|---|---|
| `--url` | Target URL to extract (required) |
| `--parse` | Parse the response content (always use this) |
| `--format` | Content type preference: markdown, html (always use markdown for LLM-friendly output) |
| `--render` | Render JavaScript using a browser |
| `--country` | Country code for geolocation and proxy |
| `--city` | City for geolocation |
| `--state` | US state for geolocation (only when country=US) |
| `--locale` | Locale for language settings |
| `--consent-header` | Auto-handle cookie consent |
| `--browser` | Browser type to emulate |
| `--device` | Device type for emulation |
| `--os` | Operating system to emulate |
| `--driver` | Browser driver to use |
| `--method` | HTTP method (GET, POST, etc.) |
| `--headers` | Custom HTTP headers (key=value) |
| `--cookies` | Browser cookies |
| `--referrer-type` | Referrer policy |
| `--http2` | Use HTTP/2 protocol |
| `--request-timeout` | Timeout in milliseconds |
| `--tag` | User-defined tag for request tracking |
| `--browser-action` | Array of browser automation actions to execute sequentially |
| `--network-capture` | Filters for capturing network traffic |
| `--expected-status-code` | Expected HTTP status codes for successful requests |
| `--session` | Session configuration for stateful browsing |
| `--skill` | Skills or capabilities required for the request |
### map

Fast URL discovery and site structure mapping. Easily plan extraction workflows. Returns URL metadata only (URLs, titles, descriptions) — not page content. Use extract or crawl to get actual content from the discovered URLs. Run `nimble map --help` for all options.

```
# Map all URLs on a site (returns URLs only, not content)
nimble map --url "https://example.com"

# Limit number of URLs returned
nimble map --url "https://docs.example.com" --limit 100

# Include subdomains
nimble map --url "https://example.com" --domain-filter subdomains

# Use sitemap for discovery
nimble map --url "https://example.com" --sitemap auto
```
Key options:

| Flag | Description |
|---|---|
| `--url` | URL to map (required) |
| `--limit` | Max number of links to return |
| `--domain-filter` | Include subdomains in mapping |
| `--sitemap` | Use sitemap for URL discovery |
| `--country` | Country code for geolocation |
| `--locale` | Locale for language settings |
### crawl

Extract contents from entire websites in a single request. Collect large volumes of web data automatically. Crawl is async — you start a job, poll for completion, then retrieve the results. Run `nimble crawl run --help` for all options.

Crawl defaults:

| Setting | Default | Notes |
|---|---|---|
| `--sitemap` | `auto` | Automatically uses sitemap if available |
| `--max-discovery-depth` | `5` | How deep the crawler follows links |
| `--limit` | No limit | Always set a limit to avoid crawling entire sites |
Start a crawl:

```
# Crawl a site section (always set --limit)
nimble crawl run --url "https://docs.example.com" --limit 50

# Crawl with path filtering
nimble crawl run --url "https://example.com" --include-path "/docs" --include-path "/api" --limit 100

# Exclude paths
nimble crawl run --url "https://example.com" --exclude-path "/blog" --exclude-path "/archive" --limit 50

# Control crawl depth
nimble crawl run --url "https://example.com" --max-discovery-depth 3 --limit 50

# Allow subdomains and external links
nimble crawl run --url "https://example.com" --allow-subdomains --allow-external-links --limit 50

# Crawl entire domain (not just child paths)
nimble crawl run --url "https://example.com/docs" --crawl-entire-domain --limit 100

# Named crawl for tracking
nimble crawl run --url "https://example.com" --name "docs-crawl-feb-2026" --limit 200

# Use sitemap for discovery
nimble crawl run --url "https://example.com" --sitemap auto --limit 50
```
Key options for crawl run:

| Flag | Description |
|---|---|
| `--url` | URL to crawl (required) |
| `--limit` | Max pages to crawl (always set this) |
| `--max-discovery-depth` | Max depth based on discovery order (default 5) |
| `--include-path` | Regex patterns for URLs to include (repeatable) |
| `--exclude-path` | Regex patterns for URLs to exclude (repeatable) |
| `--allow-subdomains` | Follow links to subdomains |
| `--allow-external-links` | Follow links to external sites |
| `--crawl-entire-domain` | Follow sibling/parent URLs, not just child paths |
| `--ignore-query-parameters` | Don't re-scrape same path with different query params |
| `--name` | Name for the crawl job |
| `--sitemap` | Use sitemap for URL discovery (default auto) |
| `--callback` | Webhook for receiving results |
Poll crawl status and retrieve results:

Crawl jobs run asynchronously. After starting a crawl, poll for completion, then retrieve content using individual task IDs (not the crawl ID):

```
# 1. Start the crawl → returns a crawl_id
nimble crawl run --url "https://docs.example.com" --limit 5
# Returns: crawl_id "abc-123"

# 2. Poll status until completed → returns individual task_ids per page
nimble crawl status --id "abc-123"
# Returns: tasks: [{ task_id: "task-456" }, { task_id: "task-789" }, ...]
# Status values: running, completed, failed, terminated

# 3. Retrieve content using INDIVIDUAL task_ids (NOT the crawl_id)
nimble tasks results --task-id "task-456"
nimble tasks results --task-id "task-789"
# ⚠️ Using the crawl_id here returns 404 — you must use the per-page task_ids from step 2
```

IMPORTANT: `nimble tasks results` requires the individual task IDs from `crawl status` (each crawled page gets its own task ID), not the crawl job ID. Using the crawl ID will return a 404 error.
Polling guidelines:

- Poll every 15-30 seconds for small crawls (< 50 pages)
- Poll every 30-60 seconds for larger crawls (50+ pages)
- Stop polling after status is `completed`, `failed`, or `terminated`
- Note: `crawl status` may occasionally misreport individual task statuses (showing "failed" for tasks that actually succeeded). If `crawl status` shows failed tasks, try retrieving their results with `nimble tasks results` before assuming failure
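The guidelines above can be wrapped in a small polling helper. A sketch: the status command is passed in as a string, and the `jq -r .status` filter in the usage line assumes the field name of the status value — check it against your actual `crawl status` output.

```shell
# Poll a status command until it reports a terminal state or gives up.
poll_crawl() {
  status_cmd="$1"; interval="${2:-30}"; max_tries="${3:-40}"
  tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    status=$(eval "$status_cmd")
    case "$status" in
      completed|failed|terminated) echo "$status"; return 0 ;;
    esac
    tries=$((tries + 1))
    sleep "$interval"
  done
  echo "timeout"; return 1
}
# Usage sketch (status field name assumed):
# poll_crawl 'nimble crawl status --id "abc-123" | jq -r .status' 30
```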
List crawls:

```
# List all crawls
nimble crawl list

# Filter by status
nimble crawl list --status running

# Paginate results
nimble crawl list --limit 10
```

Cancel a crawl:

```
nimble crawl terminate --id "crawl-task-id"
```
## Best Practices

### Search Strategy

- Always pass `--deep-search=false` — the default is deep mode (slow). Fast mode covers 95% of use cases: URL discovery, research, comparisons, answer generation
- Only use deep mode when you need full page text — archiving articles, extracting complete docs, building datasets
- Start with the right focus mode — match `--focus` to your query type (see `references/search-focus-modes.md`)
- Use `--include-answer` — get AI-synthesized insights without extracting each result. If it returns 402/403, retry without it.
- Filter domains — use `--include-domain` to target authoritative sources
- Add time filters — use `--time-range` for time-sensitive queries
### Multi-Search Strategy

When researching a topic in depth, run 2-3 searches in parallel with:

- Different focus modes — e.g., `social` + `general` for people research
- Different query angles — e.g., "Jane Doe current job" + "Jane Doe career history" + "Jane Doe publications"

This is faster than sequential searches and gives broader coverage. Deduplicate results by URL before extracting.
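Deduplicating by URL before extracting can be a one-line awk filter that keeps first-seen order. A minimal sketch over plain newline-separated URL lists (the file names in the usage comment are illustrative):

```shell
# Keep only the first occurrence of each URL, preserving order.
dedup_urls() { awk '!seen[$0]++'; }
# Usage: concatenate the URL lists from parallel searches, then:
# cat urls_social.txt urls_general.txt | dedup_urls
printf 'https://a.example\nhttps://b.example\nhttps://a.example\n' | dedup_urls
```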
### Disambiguating Common Names

When searching for a person with a common name:

- Include distinguishing context in the query: company name, job title, city
- Use `--focus social` — LinkedIn results include location and current company, making disambiguation easier
- Cross-reference results across searches to confirm you're looking at the right person
### Extraction Strategy

- Always use `--parse --format markdown` — returns clean markdown instead of raw HTML, preventing context window overflow
- Try without `--render` first — it's faster for static pages
- Add `--render` for SPAs — when content is loaded by JavaScript
- Set geolocation — use `--country` to see region-specific content
### Crawl Strategy

- Prefer `map` + `extract` over `crawl` for LLM use — crawl results return raw HTML (60-115KB per page) which overwhelms LLM context. For LLM-friendly output, use `map` to discover URLs, then `extract --parse --format markdown` on individual pages
- Use `crawl` only for bulk archiving or data pipelines — when you need raw content from many pages and will post-process it outside the LLM context
- Always set `--limit` — crawl has no default limit, so always specify one to avoid crawling entire sites
- Use path filters — `--include-path` and `--exclude-path` to target specific sections
- Name your crawls — use `--name` for easy tracking
- Retrieve with individual task IDs — `crawl status` returns per-page task IDs; use those (not the crawl ID) with `nimble tasks results --task-id`
## Common Recipes

### Researching a person

```
# Step 1: Run social + general in parallel for max coverage
nimble search --query "Jane Doe Head of Engineering" --focus social --deep-search=false --max-results 10 --include-answer
nimble search --query "Jane Doe Head of Engineering" --focus general --deep-search=false --max-results 10 --include-answer

# Step 2: Broaden with different query angles in parallel
nimble search --query "Jane Doe career history Acme Corp" --deep-search=false --include-answer
nimble search --query "Jane Doe publications blog articles" --deep-search=false --include-answer

# Step 3: Extract the most promising non-auth-walled URLs (skip LinkedIn — see Known Limitations)
nimble extract --url "https://www.companysite.com/team/jane-doe" --parse --format markdown
```

### Researching a company

```
# Step 1: Overview + recent news in parallel
nimble search --query "Acme Corp" --focus general --deep-search=false --include-answer
nimble search --query "Acme Corp" --focus news --time-range month --deep-search=false --include-answer

# Step 2: Extract company page
nimble extract --url "https://acme.com/about" --parse --format markdown
```

### Technical research

```
# Step 1: Find docs and code examples
nimble search --query "React Server Components migration guide" --focus coding --deep-search=false --include-answer

# Step 2: Extract the most relevant doc
nimble extract --url "https://react.dev/reference/rsc/server-components" --parse --format markdown
```
## Error Handling

| Error | Solution |
|---|---|
| `NIMBLE_API_KEY` not set | Set the environment variable: `export NIMBLE_API_KEY="your-key"` |
| 401 Unauthorized | Verify the API key is active at nimbleway.com |
| 402/403 with `--include-answer` | Premium feature not available on the current plan. Retry the same query without `--include-answer` and continue |
| 429 Too Many Requests | Reduce request frequency or upgrade API tier |
| Timeout | Ensure `--deep-search=false` is set, reduce `--max-results`, or increase `--request-timeout` |
| No results | Try a different `--focus`, broaden the query, remove domain filters |
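For transient failures such as 429s, a simple retry-with-backoff wrapper can help. A sketch: the retry counts and delays are illustrative, and the wrapper only sees the command's exit status, not the underlying HTTP code, so it retries any failure.

```shell
# Retry a command, doubling the delay between attempts.
retry_backoff() {
  cmd="$1"; max="${2:-3}"; delay="${3:-2}"
  n=0
  while [ "$n" -lt "$max" ]; do
    if eval "$cmd"; then return 0; fi
    sleep "$delay"
    delay=$((delay * 2))
    n=$((n + 1))
  done
  return 1
}
# Usage: retry_backoff 'nimble search --query "test" --deep-search=false' 3 5
```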
## Known Limitations

| Site | Issue | Workaround |
|---|---|---|
| LinkedIn profiles | Auth wall blocks extraction (returns redirect/JS, status 999) | Use `--focus social` search instead — it returns LinkedIn data directly via subagents. Do NOT try to extract LinkedIn URLs. |
| Sites behind login | Extract returns login page instead of content | No workaround — use search snippets instead |
| Heavy SPAs | Extract returns empty or minimal HTML | Add `--render` flag to execute JavaScript before extraction |
| Crawl results | Returns raw HTML (60-115KB per page), no markdown option | Use `map` + `extract --parse --format markdown` on individual pages for LLM-friendly output |
| Crawl status | May misreport individual task statuses as "failed" when they actually succeeded | Always try `nimble tasks results --task-id` before assuming failure |