parallel-ai-search
Parallel AI Search (CLI Master)
This is a single “master” skill that replaces the earlier Node-script-based version of parallel-ai-search.
It routes to the right parallel-cli capability for the task:
- Search: quick web lookup with citations (
parallel-cli search) - Extract: turn URLs (including PDFs and JS-heavy pages) into clean, LLM-ready text (
parallel-cli extract) - Deep research: multi-source reports with processor tiers (
parallel-cli research ...) - Enrich: add web-sourced columns to CSV/JSON (
parallel-cli enrich ...) - FindAll: discover entities from the web with optional enrichments (
parallel-cli findall ...) - Monitor: track web changes on a cadence, optionally via webhook (
parallel-cli monitor ...)
Routing rules (pick ONE)
Choose the smallest / cheapest action that solves the user’s request:
- Extract — if the user gives one or more URLs or says “read/summarise this page”, “extract”, “quote”, “pull the content”, “what does this page say”.
- Deep research — ONLY if the user explicitly asks for deep, exhaustive, comprehensive, thorough investigation, or a multi-source “report”.
- Enrich — if the user provides a list/table (CSV/JSON/inline objects) and wants new columns like CEO, revenue, funding, contact info, etc.
- FindAll — if the user wants you to discover many entities (companies/people/venues/etc.) that match criteria.
- Monitor — if the user wants ongoing tracking (“alert me”, “track changes”, “monitor this weekly”) rather than a one-off answer.
- Search — default for everything else that needs current web info or citations.
Optional manual prefixes if the user invoked this skill directly:
search: ...extract: ...research: ...enrich: ...findall: ...monitor: ...
If a prefix is present, honour it.
Setup and authentication (only when needed)
Before running any Parallel command, ensure auth works:
parallel-cli auth
If parallel-cli is missing, install it:
curl -fsSL https://parallel.ai/install.sh | bash
If you cannot use the install script, use pipx:
pipx install "parallel-web-tools[cli]"
pipx ensurepath
Then authenticate (choose one):
# Interactive OAuth (opens browser)
parallel-cli login
# Headless / SSH / CI
parallel-cli login --device
# Or environment variable
export PARALLEL_API_KEY="your_api_key"
Output & citation rules
- Always cite web-sourced facts with inline markdown links:
[Source Title](https://...). - End with a Sources list whenever you used Search/Extract/Research output.
- Prefer official/primary sources when available.
- For long outputs, save to files in
/tmp/and summarise in-chat.
Search (default web lookup)
Use Search for fast, cost-effective answers with citations.
Command template
parallel-cli search "$OBJECTIVE" --mode agentic --max-results 10 --json
Add any of these only when relevant:
--after-date YYYY-MM-DD(freshness constraint)--include-domains a.com b.org(restrict sources)--exclude-domains spam.com(block sources)- one or more
-q "keyword query"flags (extra keyword probes) -o "/tmp/$SLUG.search.json"(save full JSON to a file)
Parse + respond
From the JSON results, extract title, url, and any publish_date / excerpt fields. Answer the user’s question, and cite each claim inline.
Extract (read one or more URLs)
Use Extract when you need the actual contents of specific URLs (webpages, PDFs, JS-heavy sites).
Command template
parallel-cli extract "$URL" --json
Add when relevant:
--objective "Focus area"(e.g., pricing, API usage, constraints)--full-content(only if the user needs the whole page)--no-excerpts(if you only want full content)-o "/tmp/$SLUG.extract.json"(save full JSON to a file)
Respond
- If the user asked for a summary, summarise with citations to the extracted URL.
- If the user asked for the verbatim text, provide the extracted markdown only if it is reasonably sized; otherwise provide the key sections + offer to read more from the saved output.
Deep research (only when explicitly requested)
Deep research is slower and may cost more than Search. Use it only when the user explicitly wants depth.
Step 1 — start (always async)
parallel-cli research run "$QUESTION" --processor pro-fast --no-wait --json
Parse run_id (and any monitoring URL) from JSON and tell the user the run started.
Step 2 — poll (bounded timeout)
Choose a short slug filename (lowercase-hyphen), then:
parallel-cli research poll "$RUN_ID" -o "/tmp/$SLUG" --timeout 540
- Share the executive summary printed by the poll command.
- Mention the output files:
/tmp/$SLUG.md/tmp/$SLUG.json
If polling times out, re-run the same poll command — the run continues server-side.
Enrich (CSV/JSON or inline data)
Use Enrich to add web-sourced columns to structured data.
Step 1 — (optional) suggest columns
parallel-cli enrich suggest "$INTENT" --json
Use this when the user knows the goal but not the exact output schema.
Step 2 — run (always async for large jobs)
For CSV:
parallel-cli enrich run --source-type csv --source "input.csv" --target "/tmp/enriched.csv" --source-columns '[{"name":"company","description":"Company name"}]' --intent "$INTENT" --no-wait --json
For inline JSON rows:
parallel-cli enrich run --data '[{"company":"Google"},{"company":"Apple"}]' --target "/tmp/enriched.csv" --intent "$INTENT" --no-wait --json
Parse taskgroup_id from JSON.
Step 3 — poll
parallel-cli enrich poll "$TASKGROUP_ID" --timeout 540 --json
After completion:
- Tell the user the output file path (the
--targetyou chose). - Preview a few rows (using file read tools if available) and report row counts.
If poll times out, re-run it — the job continues server-side.
FindAll (entity discovery)
Use FindAll when the user wants you to discover a set of entities (e.g., “AI startups in healthcare”, “roofing companies in Charlotte”, “YC devtools companies”).
Step 1 — run
parallel-cli findall run "$OBJECTIVE" --generator core --match-limit 25 --no-wait --json
Useful options:
--dry-run --jsonto preview schema before spending money--exclude '[{"name":"Example Corp","url":"example.com"}]'to avoid known entities--generator preview|base|core|pro(core default; pro for hardest queries)
Parse run_id from JSON.
Step 2 — poll + fetch results
parallel-cli findall poll "$RUN_ID" --json
parallel-cli findall result "$RUN_ID" --json
Respond with:
- total entities found
- a clean list/table of the best matches (name + URL + key attributes)
- any caveats about ambiguous matches
Monitor (web change tracking)
Use Monitor when the user wants ongoing tracking.
Create:
parallel-cli monitor create "$OBJECTIVE" --cadence daily --json
Optional:
--cadence hourly|daily|weekly|every_two_weeks--webhook https://example.com/hook(deliver events externally)--output-schema '<JSON schema string>'(structured events)
Manage:
parallel-cli monitor list --json
parallel-cli monitor get "$MONITOR_ID" --json
parallel-cli monitor update "$MONITOR_ID" --cadence weekly --json
parallel-cli monitor delete "$MONITOR_ID"
parallel-cli monitor events "$MONITOR_ID" --json
parallel-cli monitor simulate "$MONITOR_ID" --json
Respond with the monitor id and how to retrieve events (or confirm webhook delivery).
Reference material
- Copy/paste command templates and patterns:
references/command-templates.md - Troubleshooting common failures:
references/troubleshooting.md