Scry Skill
Scry's canonical substrate is read-only SQL over the ExoPriors public corpus.
Most agents should reach that substrate through the hosted Scry HTTP surface and Scry tools,
not raw database credentials. You write Postgres SQL against a curated scry.* schema
and get JSON rows back. There is no ORM, no GraphQL, no pagination token -- just SQL.
When SQL is unnecessary, the portable typed-search front door is POST /v1/scry/search,
and GET /v1/scry/search/records/{record_ref} hydrates record details.
Use GET /v1/stats or GET /v1/scry/context for live corpus counts instead of relying on static numbers in docs.
Skill generation: 2026041201
A) When to use / not use
Use this skill when:
- Searching, filtering, or aggregating content across the ExoPriors corpus
- Running lexical (BM25) or hybrid searches
- Exploring author networks, public cross-platform identities, or publication patterns
- Navigating the OpenAlex academic graph (authors, citations, institutions, concepts)
- Creating shareable artifacts from query results
- Emitting structured agent judgements about entities or external references
Do NOT use this skill when:
- The user wants semantic/vector search composition or embedding algebra (use the scry-vectors skill)
- The user wants LLM-based reranking (use the scry-rerank skill)
- The user is querying their own local database
B) Golden Rules
- Context handshake first. At session start, call GET /v1/scry/context?skill_generation=2026041201. This endpoint is public; you do not need a key for the handshake itself. Use the returned offerings block for the current product summary, budgets, canonical env var, default skill, and specialized skill catalog. Read offerings.portable_entry as the canonical /scry flow: context -> schema -> route -> query. Read offerings.payments as the payment-role contract: it tells you which protocols are live vs planned, which ones fund the reusable prepaid ledger, which ones are hot-path query payment vs delegated authorization, and which authenticated control-plane endpoints expose reusable instruments and stored mandate artifacts. Read offerings.accelerator_families and offerings.relation_accelerator_policies as the accelerator contract: they tell you which relations are canonical defaults versus optional convenience helpers, which tracked objects gate them, and what fallback surfaces to use. If you need a concise shareable bootstrap prompt for another agent, use offerings.public_agent_prompt.copy_text instead of paraphrasing your own. If you need deeper docs, use offerings.canonical_doc_path, each skill's repo_path, and reference_paths instead of guessing where the maintained docs live. If you cache descriptive bootstrap context across turns or sessions, also track surface_context_generation and refresh when it changes. Use typed search, scry.search_federated(...), or source-native scry.search_* helpers for lexical work. Pivot to source-local scry.* surfaces or semantic retrieval when those helpers do not fit the task. The lexical_search block reports health for the shared BM25 diagnostic path. If you are validating deploy/runtime conformance from this repo rather than just using the surface, run the canonical proof command:
      cd src/api
      SCRY_API_KEY=... cargo run --features cli --bin scry-contract-audit -- --output json
  This proof is strict by default: any non-pass manifest drift or bounded probe failure exits non-zero. The default agent-experience-e2e run now includes the same manifest conformance audit after signed-in continuity. If should_update_skill=true, tell the user to run npx skills update. If the response reports client_skill_generation: null while you're using packaged skills, or if local instructions still mention legacy ExoPriors hostnames or legacy console routes, treat the install as stale and ask the user to run npx skills update before more debugging.
- Schema first. ALWAYS call GET /v1/scry/schema before writing SQL. Never guess column names or types. The schema endpoint returns live column metadata and row-count estimates for every view. For semantic SQL, also read each embedding-capable relation's vector_indexed field and prefer true surfaces first; false means the relation is exposed but not ANN-index-backed on the public path, so similarity ordering may degrade into seq-scan behavior that will not fit the sync envelope. If the task targets publication-first Parquet datasets rather than the live Scry SQL corpus, call GET /v1/scry/datasets, inspect GET /v1/scry/datasets/{id}, and then use POST /v1/scry/datasets/{id}/resolve before writing SQL. The resolve response gives you short-lived DuckDB-ready HTTPS URLs and bootstrap SQL.
- Check operational status when search looks wrong. If lexical search, materialized-view freshness, or corpus behavior seems off, call GET /v1/scry/index-view-status with any Scry key before assuming the query or schema is wrong. If /v1/scry/context marks a relation as fast_path or conditional, do not rely on it until the required tracked objects are healthy.
- Clarify ambiguous intent before broad or likely-expensive queries. If the request is vague ("search Reddit for X", "find things about Y"), ask one short clarification question about the goal/output format before running expensive SQL.
- Start with a cheap probe. Before any query likely to run >5s, read GET /v1/scry/pricing plus /v1/scry/estimate and/or run a tight exploratory query (LIMIT 20 plus scoped source/window filters), then scale only after confirming relevance. Use GET /v1/scry/price when you specifically want the lightweight current epoch oracle (base_fee, utilization, load_pressure, recommended_max_fee, epoch). Use GET /v1/scry/price/history when you need to know whether the current base fee is a spike or the recent norm; it defaults to the trailing 6 hours and rejects windows wider than 24 hours.
- Treat congested queries as budget-bounded. Scry queries are free unless there is congestion. When there is congestion, Scry reserves nanodollar credits up front and derives a runtime timeout from the authorized spend envelope. The runtime enforces that envelope with a live-burn watchdog first and a timeout fallback second. Lead with the simplified surface: use X-Scry-Budget as the primary per-query cost control, eager and patient as the two execution modes, and GET /v1/scry/account as the one-stop status check before or after broad or expensive query work. Use GET /v1/scry/pricing plus /v1/scry/estimate when you need the live cost details behind that surface. When acting under a stored delegated mandate, also send X-Scry-Subject-Agent: <agent-id> so Scry can apply the matching query_access mandate cap. Under congestion, eager mode uses uniform clearing: winners pay the epoch clearing price, not their full submitted maximum. The account's stored max_bid_multiplier from GET/PATCH /v1/scry/preferences still caps the effective eager bid before admission. Agents that prefer to wait can switch pricing_mode to patient, which keeps FIFO ordering and runs at base price when capacity opens. Use the payment_surface block in /v1/scry/pricing to distinguish live direct query payment (x402) and live account-funding rails (stripe_checkout, crypto_topup) from control-plane / future artifacts (stripe_acp, ap2, visa_tap, mastercard_agent_pay). Read payment_surface.card_funding and payment_surface.card_funding_read_order before assuming cards are either fully blocked or fully API-native; the contract is a one-time setup handoff, then API-only saved-method funding. When delegated funding or agent authorization matters, inspect GET /v1/billing/auto-topup, GET /v1/billing/payment-instruments, and GET /v1/billing/payment-mandates rather than assuming those artifacts are hidden inside the query surface. Backward-compat note: older clients may still use X-Scry-Max-Cost, X-Scry-Max-Exposure, X-Scry-Bid, or pricing_mode: "dynamic" | "queue", but new integrations should lead with X-Scry-Budget and eager/patient.
- Choose lexical SQL vs semantic explicitly. Use scry.search_federated(...), source-native scry.search_* helpers, or other lexical SQL when the user wants fast provenance-bearing first results. Use the shared BM25 diagnostic helpers only when a task explicitly needs that path. Widen into broader SQL when you need joins, aggregation, or full control over filters. For conceptual intent ("themes", "things like", "similar to"), route to scry-vectors first, then optionally hybridize.
- LIMIT always. Every query MUST include a LIMIT clause. Max 10,000 rows. Queries without LIMIT are rejected by the SQL validator.
- Prefer canonical surfaces with tight filters. scry.entities is large enough that you should not scan it blindly. Use scry.search_federated(...), typed search, or source-native scry.search_* helpers for lexical retrieval; scry.chunk_embeddings for chunk-level semantic retrieval; scry.entity_embeddings or scry.entities_with_embeddings only when you want one entity-level vector row per entity; scry.embedding_coverage to inspect public vs staged vs ready source/kind coverage; source-local scry.*_embeddings views when you need the exact semantic owner table; and source-native tables or aliases such as scry.hackernews, scry.wikipedia, scry.pubmed, scry.repec, scry.kalshi, scry.nih_reporter, scry.govinfo_crec, scry.offshoreleaks, scry.openalex, scry.bluesky, scry.reddit_posts, scry.forum_posts, scry.huggingface, scry.huggingface_papers, scry.huggingface_collections, scry.huggingface_discussions, scry.huggingface_accounts, scry.huggingface_models, scry.huggingface_datasets, scry.huggingface_spaces, scry.huggingface_account_hardware, scry.huggingface_repo_text_artifacts, scry.huggingface_paper_artifacts, scry.kalshi_markets, scry.nih_reporter_projects, scry.govinfo_crec_granules, scry.hackernews_items, scry.wikipedia_articles, scry.pubmed_papers, scry.repec_records, scry.openalex_works, scry.bluesky_posts, scry.mailing_list_messages, scry.openlibrary_*, scry.stackexchange, scry.caselaw, scry.gutenberg_books, scry.wikidata_items, scry.wikidata_claims, and scry.kl3m when a corpus no longer lives canonically in scry.entities. Reach for a specific mv_* convenience view only when /v1/scry/schema confirms it is healthy and useful for the task, and do not treat it as corpus-complete truth when completeness matters.
  For Hugging Face specifically, prefer scry.search_huggingface() when you need one discovery-first entry point across repos, artifacts, papers, collections, discussions, accounts, and paper-artifact hops.
  For package registry search, use scry.search_packages(query, registries, limit_n) for cross-registry discovery across npm, PyPI, crates.io, RubyGems, Go modules, NuGet, Maven, Hex.pm, Packagist, pub.dev, CocoaPods, conda-forge, JSR, and Homebrew. Individual per-registry functions are also available (see schema guide).
  For cross-platform social search, use scry.social_search(query, mode, limit_n), which searches Twitter, Bluesky, StackExchange, and mailing lists in one call.
- Cross-table composition is normal. If the best records live in multiple source-native tables, combine them in one SQL statement with CTEs, UNION ALL, and joins through scry.source_records. This is the intended contract, not a workaround.
- Filter dangerous content. On scry.entities, scry.entities_with_embeddings, and scry.chunk_embeddings, include WHERE content_risk IS DISTINCT FROM 'dangerous' unless the user explicitly asks for unfiltered results. If a source-native view does not expose content_risk, join it to scry.entities on entity_id and filter there. The dangerous tier contains adversarial prompt-injection content.
- Raw SQL, not JSON. POST /v1/scry/query takes Content-Type: text/plain with raw SQL in the body, not JSON-wrapped SQL.
- File rough edges promptly. If Scry blocks the task, misses an obvious result set, or exposes a rough edge, submit a brief note to POST /v1/feedback?feedback_type=suggestion|bug|other&channel=scry_skill using Content-Type: text/plain by default (text/markdown also works). Do not silently work around it. Logged-in users can review their submissions with GET /v1/feedback.
For full query limits, timeout policies, and degradation strategies, see Shared Guardrails.
B.1 API Key Setup (Canonical)
Recommended default for less-technical users: in the directory where you launch the agent, store SCRY_API_KEY in .env so skills and copied prompts use the same place.
Canonical key naming for this skill:
- Env var: SCRY_API_KEY
- Anonymous bootstrap key format: scry_anon_* from POST /v1/scry/anonymous-key
- Personal key format: personal Scry API key with Scry access
- Recommended anonymous client header: X-Scry-Client-Tag: <short-stable-tag>
Durable machine bootstrap paths:
- Operator-provisioned (default for non-wallet agents): a signed-in human operator calls POST /v1/auth/api-keys, creates a Scry-scoped key, hands the secret to the agent, and the agent stores it in SCRY_API_KEY.
- Wallet-native: POST /v1/auth/agent/signup for agents that already have an EVM wallet; the response returns a session token plus API key.
Both paths end with the same Authorization: Bearer $SCRY_API_KEY contract.
printf '%s\n' 'SCRY_API_KEY=<your key>' >> .env
set -a && source .env && set +a
Verify:
echo "$SCRY_API_KEY"
Anonymous bootstrap flow when the user wants immediate public access without signup:
CLIENT_TAG="${SCRY_CLIENT_TAG:-dev-laptop}"
ANON_KEY="$(curl -s https://api.scry.io/v1/scry/anonymous-key -X POST -H "X-Scry-Client-Tag: $CLIENT_TAG" | python3 -c 'import json,sys; print(json.load(sys.stdin)["api_key"])')"
curl -s https://api.scry.io/v1/scry/schema \
-H "Authorization: Bearer $ANON_KEY" \
-H "X-Scry-Client-Tag: $CLIENT_TAG"
curl -s https://api.scry.io/v1/scry/query \
-H "Authorization: Bearer $ANON_KEY" \
-H "X-Scry-Client-Tag: $CLIENT_TAG" \
-H "Content-Type: text/plain" \
--data "SELECT 1 LIMIT 1"
Use this for fast trial access only. The anonymous bootstrap lane is intentionally generous for the first few queries and then degrades. For sustained usage, prefer a personal Scry API key.
Keep the same X-Scry-Client-Tag value on the same device when staying anonymous so the backend can distinguish a real first-use session from abuse behind shared IPs.
The same anonymous key can also call POST /v1/scry/embed, GET /v1/scry/vectors, and DELETE /v1/scry/vectors/{name}. Those handles stay bound to the current anonymous session rather than a durable account namespace.
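A stable X-Scry-Client-Tag is the one piece of state the anonymous lane asks you to keep. A hypothetical Python helper, this example's own: the skill only requires that the tag stays the same across sessions on one device, and hostname plus user is one cheap way to get that without storing a file:

```python
import getpass
import socket

def stable_client_tag(prefix: str = "dev") -> str:
    """Derive a short, header-safe tag that is stable on this device."""
    raw = f"{prefix}-{getpass.getuser()}-{socket.gethostname()}"
    # Lowercase, keep only alphanumerics and dashes, cap the length.
    return "".join(c if c.isalnum() or c == "-" else "-" for c in raw.lower())[:48]

headers = {"X-Scry-Client-Tag": stable_client_tag()}
```

Reusing the same derived value keeps anonymous sessions distinguishable behind shared IPs, as the paragraph above recommends.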
If using packaged skills, keep them current:
npx skills add exopriors/skills
npx skills update
B.1a Lexical SQL quickstart
Use lexical SQL helpers when the user wants fast first-pass discovery without building a larger query first.
curl -s https://api.scry.io/v1/scry/query \
-H "Authorization: Bearer $SCRY_API_KEY" \
-H "Content-Type: text/plain" \
--data "SELECT * FROM scry.search_federated('mechanistic interpretability', NULL, NULL, 5, 2) LIMIT 5"
Use scry.search_federated(...) or a source-native scry.search_* helper for
quick lexical discovery, then widen into broader SQL when you need joins,
aggregation, or exact control.
B.1b x402 Query-Only Access
POST /v1/scry/query still supports standard x402, but it is now an explicit
paid path rather than the default no-auth bootstrap path. Use x402 when the
user already has an x402-capable client or wallet and only wants direct paid query
execution. For public trial use, use POST /v1/scry/anonymous-key. For
schema/context, shares, judgements, feedback, or repeated multi-endpoint usage,
prefer a personal Scry API key.
The x402 flow is challenge-first. If x402 is enabled and the request has no
Authorization header, the first unsigned POST /v1/scry/query returns
402 Payment Required with machine-readable payment requirements. When the
caller also sends X-Scry-Budget, Scry asks the wallet to fund at least that
budget (subject to the configured x402 base quantum). After settlement,
the paid amount converts into reusable Scry credits on the shared ledger, so
overpayment remains available for later queries instead of being lost.
If the agent already has an EVM wallet and wants wallet-native durable identity
plus a reusable key, use POST /v1/auth/agent/signup first. If it does not
have a wallet, have a signed-in operator create a Scry-scoped key via
POST /v1/auth/api-keys and hand it to the agent. Both paths end with the same
Bearer-key contract.
Minimal client shape:
import { wrapFetchWithPayment } from 'x402-fetch';
const paidFetch = wrapFetchWithPayment(fetch, walletClient);
const resp = await paidFetch('https://api.scry.io/v1/scry/query', {
method: 'POST',
headers: { 'content-type': 'text/plain' },
body: 'SELECT 1 LIMIT 1',
});
B.1c Query Budgeting
When there is congestion, these are the key billing controls:
- X-Scry-Budget: <nanodollars> is the primary per-query cost control. Send it on /v1/scry/estimate and /v1/scry/query when you want one number to bound the estimate check, runtime authorization, and eager-mode bid cap.
- GET /v1/scry/account is the one-stop billing status check. It returns balance, current mode, max budget, today's spend/query count, live base fee, live utilization, and whether auto-topup is enabled.
- GET /v1/scry/preferences returns the caller's persisted pricing_mode (eager or patient) and max_bid_multiplier. PATCH /v1/scry/preferences updates pricing_mode and max_bid_multiplier. Use pricing_mode: "patient" when the user wants FIFO waiting at base price during congestion instead of bidding into the eager auction.
- GET /v1/scry/price returns the live base_fee, utilization, load_pressure, recommended_max_fee, and current epoch metadata. It is the lightweight current epoch oracle; use it right before deciding whether to run now in eager mode or wait in patient.
- GET /v1/scry/price/history returns sampled epoch_id/timestamp/base_fee/utilization history plus sampling metadata for large windows. If no bounds are provided it returns the trailing 6 hours, and requests wider than 24 hours are rejected.
- GET /v1/scry/price/stream returns an SSE feed with price events at epoch cadence and ping keepalives while no new epoch arrives.
- GET /v1/scry/spend returns the authenticated caller's own spend history: total_credits_spent, query_count, avg_cost_per_query, and recent per-query cost breakdowns.
- GET /v1/scry/pricing returns the live billing/market authority: the live query access contract (free_unless_congested), whether congestion pricing is currently active, the live compute rate, bandwidth rate, load multiplier, reservation headroom, bid thresholds, the congestion-admission auction contract, and the budget-enforcement contract. It also exposes the x402 base funding quantum and the fact that x402 funding now scales off X-Scry-Budget when that header is present.
- GET /v1/scry/account returns the authenticated funding summary. Read funding.card_funding first when the question is "can this agent use cards right now?" because it makes the current card state explicit (requires_operator_setup, saved_method_ready, auto_topup_attention_required, auto_topup_active, or disabled) and lists the next endpoints to call.
- Funding-control endpoints such as GET /v1/scry/account, POST /v1/billing/agent-topup, POST /v1/billing/payment-mandates, and PATCH /v1/billing/auto-topup require account or billing scope.
- POST /v1/scry/estimate returns estimated_cost_nanodollars, suggested_reserve_nanodollars, authorized_exposure_nanodollars, exposure_timeout_ms, and a bid-adjusted upper-bound cost_breakdown.
- POST /v1/scry/query?receipt=summary or ?receipt=full returns an optional execution receipt inline with the result. Use summary when you only need the stable ID, SQL fingerprint, and main cost/runtime facts; use full when you want the estimate, billing, execution, and structured security details in one object.
- GET /v1/scry/query-receipts/{id} re-hydrates the durable query receipt for the authenticated caller from scry_query_log. Raw SQL is omitted by default; add ?include_sql=true when the owner explicitly needs the original statement back.
- X-Scry-Subject-Agent: <agent-id> activates delegated query policy. If the authenticated account has a matching active query_access mandate, Scry applies that mandate's max_query_exposure as an additional cap and returns a delegated_authorization object. If not, /v1/scry/query fails with delegated_authorization_required.
- Cards are a two-stage rail, not a zero-setup hot path. POST /v1/billing/setup-payment-method creates a Stripe Checkout setup session that saves a card without charging it and returns setup_url for one operator browser visit. After completion, the card is persisted as a payment instrument and set as the default. This is the entry point for enabling Stripe-backed auto-topup and agent-topup. POST /v1/billing/agent-topup charges the default stored payment instrument for a pricing tier amount, granting credits immediately. It is designed for agent-initiated funding without browser interaction and requires only a saved payment method.
- Recurring Stripe rescue is a separate opt-in that requires an active auto_topup mandate via POST /v1/billing/payment-mandates plus PATCH /v1/billing/auto-topup. GET /v1/billing/auto-topup and PATCH /v1/billing/auto-topup control Stripe-backed replenishment into the same prepaid ledger. If enabled with a verified default Stripe payment method plus an active auto_topup mandate, /v1/scry/query gets one off-session topup attempt after an insufficient_credits reservation failure and then retries reservation once. GET /v1/billing/auto-topup/eligibility explains why recurring saved-method funding is not yet ready when funding.card_funding.state reports auto_topup_attention_required.
- GET /v1/billing/pricing returns available credit pricing tiers with id, usd_cents, credits (nanodollars), and display_label.
- GET /v1/billing/payment-instruments lists saved payment methods.
- Live funding rails are x402, Stripe saved-method funding, and crypto topup. stripe_acp, ap2, visa_tap, and mastercard_agent_pay are control-plane / future artifact layers, not interchangeable live funding rails.
- In eager mode, uniform clearing means the charged priority fee comes from the lowest winning bid in the epoch, not from every winner's submitted max bid.
- max_bid_multiplier clamps the effective eager bid before the request enters the auction.
- Billable bandwidth uses the executor-tracked streamed row payload bytes, not the outer HTTP/JSON envelope. Full response-body size still matters for delivery limits and alerts.
- Backward-compat note: legacy clients may still send X-Scry-Max-Cost, X-Scry-Max-Exposure, X-Scry-Bid, or pricing_mode: "dynamic" | "queue". New integrations should lead with X-Scry-Budget, eager, patient, and GET /v1/scry/account.
Useful response headers from POST /v1/scry/query:
- x-scry-cost: charged nanodollars for the completed query when congestion pricing applies
- x-scry-receipt-id: stable execution-receipt id when receipt mode is enabled
- x-scry-authorized-exposure: the hard exposure authorization applied to this run
- x-scry-reserved: the reserved/pre-authorized nanodollar amount
- x-scry-exposure-timeout-ms: exposure-derived runtime cutoff
- x-scry-bid-accepted / x-scry-bid-charged: submitted max bid vs clearing-price multiplier actually charged
- x-scry-admission / x-scry-admission-wait-ms: whether the request started immediately or through a congestion epoch, plus the admission wait
- X-Scry-Base-Fee: current epoch base-fee multiplier used to price compute
- X-Scry-Priority-Fee: congestion premium implied by the accepted clearing price
- X-Scry-Compute-Units: normalized compute units charged for the query
- X-Scry-Utilization: price-epoch utilization snapshot
- X-Scry-Epoch: current price epoch id
- X-Scry-Budget-Remaining: credits or free-tier budget remaining after a congestion-priced settlement
If a congestion-priced query runs into its spend envelope, the API returns
402 with query_exposure_exhausted. That is enforced by live runtime burn
first, with the exposure timeout as fallback. The fix is to narrow the query
or raise X-Scry-Budget, not to keep retrying the same request unchanged.
C) Quickstart
One end-to-end example: find recent high-scoring LessWrong posts about RLHF.
Step 1: Get dynamic context + update advisory
GET https://api.scry.io/v1/scry/context?skill_generation=2026041201
Authorization: Bearer $SCRY_API_KEY
Step 2: Get schema
GET https://api.scry.io/v1/scry/schema
Authorization: Bearer $SCRY_API_KEY
Step 3: Run query
POST https://api.scry.io/v1/scry/query
Authorization: Bearer $SCRY_API_KEY
Content-Type: text/plain
WITH hits AS (
SELECT entity_id
FROM scry.search_federated('RLHF reinforcement learning human feedback',
ARRAY['lesswrong'], ARRAY['post'], 100, 100)
WHERE entity_id IS NOT NULL
)
SELECT e.uri, e.title, e.original_author, e.original_timestamp, e.score
FROM hits h
JOIN scry.entities e ON e.id = h.entity_id
WHERE e.source = 'lesswrong'
ORDER BY e.score DESC NULLS LAST, e.original_timestamp DESC
LIMIT 20
Response shape:
{
"columns": ["uri", "title", "original_author", "original_timestamp", "score"],
"rows": [["https://...", "My RLHF Post", "author", "2025-01-15T...", 142], ...],
"row_count": 20,
"duration_ms": 312,
"truncated": false
}
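The columns/rows arrays in that response zip naturally into per-row dicts. A small Python helper (the function name is this example's own; the result literal mirrors the documented shape):

```python
def rows_to_dicts(result: dict) -> list[dict]:
    """Pair each row from POST /v1/scry/query with the columns header."""
    return [dict(zip(result["columns"], row)) for row in result["rows"]]

result = {
    "columns": ["uri", "title", "original_author", "score"],
    "rows": [["https://example", "My RLHF Post", "author", 142]],
    "row_count": 1,
    "truncated": False,
}
records = rows_to_dicts(result)
# records[0]["title"] == "My RLHF Post"
```

Check truncated before trusting aggregates computed over the rows: a truncated result means the LIMIT or delivery cap cut the set short.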
Source-native cross-table example:
WITH hn AS (
SELECT 'hackernews'::text AS source, hn_id::text AS external_id, score
FROM scry.search_hackernews_items('interpretability', kinds => ARRAY['post'], limit_n => 20)
),
wiki AS (
SELECT 'wikipedia'::text AS source, page_id::text AS external_id, score
FROM scry.search_wikipedia_articles('interpretability', limit_n => 20)
),
hits AS (
SELECT * FROM hn
UNION ALL
SELECT * FROM wiki
)
SELECT h.source, r.uri, r.title, h.score
FROM hits h
JOIN scry.source_records r
ON r.source = h.source
AND r.external_id = h.external_id
ORDER BY h.score DESC
LIMIT 20;
D) Decision Tree
User wants to search the ExoPriors corpus?
|
+-- Ambiguous / conceptual ask? --> Clarify intent first, then use
| scry-vectors for semantic search (optionally hybridize with lexical)
|
+-- By keywords/phrases? --> typed search, scry.search_federated(...), or source-native search helpers
| +-- Specific source? --> pass `sources` explicitly or use the source-native helper/table
| +-- Reddit? --> START with scry.reddit_subreddit_stats /
| scry.reddit_clusters() / scry.reddit_embeddings
| and trust /v1/scry/schema status before
| using direct retrieval helpers
| +-- Large result? --> use tight source/kind/date filters and scale only after a small probe
|
+-- By structured filters (source, date, author)? --> Direct SQL on MVs
|
+-- By semantic similarity? --> (scry-vectors skill, not this one)
|
+-- Hybrid (keywords + semantic rerank)? --> scry.hybrid_search() or
| lexical CTE + JOIN scry.chunk_embeddings
|
+-- Author/people lookup? --> scry.actors, scry.people, scry.person_accounts
| (conservative public account links)
|
+-- Academic graph (OpenAlex)? --> scry.openalex_find_authors(),
| scry.openalex_find_works(), etc. (see schema-guide.md)
|
+-- Bluesky / Twitter / Open Library text lookup? --> scry.search_bluesky_posts(),
| scry.search_twitter_posts(), scry.search_openlibrary_editions(),
| scry.search_openlibrary_works(), scry.search_openlibrary_authors()
|
+-- Need to share results? --> POST /v1/scry/shares
|
+-- Need to emit a structured observation? --> POST /v1/scry/judgements
|
+-- Scry blocked / missing obvious results? --> POST /v1/feedback
E) Recipes
E0. Context handshake + skill update advisory
curl -s "https://api.scry.io/v1/scry/context?skill_generation=2026041201" \
-H "Authorization: Bearer $SCRY_API_KEY"
If response includes "should_update_skill": true, ask the user to run:
npx skills update.
If the response shows "client_skill_generation": null while the session is
using packaged Scry skills, or if local instructions still point at
legacy ExoPriors hostnames or legacy console routes, stop and ask the user
to run npx skills update before deeper debugging.
If response includes "lexical_search": {...}, read status, status_basis,
and last_known_status as health for the shared BM25 diagnostic path. Prefer
typed search, scry.search_federated(...), source-native scry.search_*
helpers, or scry.entities_with_embeddings depending on the task. Use
/v1/scry/index-view-status for detailed live timing before blaming the query.
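The status / status_basis / last_known_status triage above can be sketched as a predicate. The field names come from the handshake contract; the specific string values ("healthy", "stale") are assumptions for illustration only, so verify them against a live /v1/scry/context response:

```python
def lexical_search_ok(block: dict) -> bool:
    """Interpret the lexical_search health block from /v1/scry/context.

    Assumed status strings: "healthy" for status/last_known_status and
    "stale" for status_basis; the real enum may differ.
    """
    if block.get("status") == "healthy":
        return True
    # Stale observability is not confirmed trouble: consult the last
    # known status before abandoning the BM25 diagnostic path.
    if block.get("status_basis") == "stale":
        return block.get("last_known_status") == "healthy"
    return False
```

When this returns False on a live basis, fall back to typed search or source-native helpers and check /v1/scry/index-view-status, as the recipe says.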
E0b. Submit feedback when Scry blocks the task
curl -s "https://api.scry.io/v1/feedback?feedback_type=bug&channel=scry_skill" \
-H "Authorization: Bearer $SCRY_API_KEY" \
-H "Content-Type: text/plain" \
--data $'## What happened\n- Query: ...\n- Problem: ...\n\n## Why it matters\n- ...\n\n## Suggested fix\n- ...'
Success response includes a receipt id. Logged-in users can review their own
submissions with:
curl -s "https://api.scry.io/v1/feedback?limit=10" \
-H "Authorization: Bearer $SCRY_API_KEY"
E1. Source-aware lexical search
WITH c AS (
SELECT entity_id
FROM scry.search_federated('your query here',
NULL, ARRAY['post'], 100, 10)
WHERE entity_id IS NOT NULL
)
SELECT e.uri, e.title, e.original_author, e.original_timestamp
FROM c JOIN scry.entities e ON e.id = c.entity_id
WHERE e.content_risk IS DISTINCT FROM 'dangerous'
LIMIT 50
Use sources and kinds arrays for strict scope. NULL searches across
available source-aware arms, while per_source_cap keeps one noisy source from
dominating first-pass results.
Healthy source-specific MVs can still be useful for source-native score fields
such as base_score, but they are optional convenience slices rather than the default path.
E2. Reddit-specific discovery
SELECT subreddit, total_count, latest
FROM scry.reddit_subreddit_stats
WHERE subreddit IN ('MachineLearning', 'LocalLLaMA')
ORDER BY total_count DESC
For semantic Reddit retrieval over the embedding-covered subset, use
scry.reddit_embeddings or scry.search_reddit_posts_semantic(...).
Direct retrieval helpers (scry.reddit_posts, scry.reddit_comments,
scry.mv_reddit_*, scry.search_reddit_posts(...),
scry.search_reddit_comments(...)) are currently degraded on the public
instance. Check /v1/scry/schema status before using them.
E3. Source-filtered materialized view query
SELECT entity_id, uri, title, original_author, score, original_timestamp
FROM scry.arxiv_papers
WHERE original_timestamp >= '2025-01-01'
ORDER BY original_timestamp DESC
LIMIT 50
score is not the useful ranking axis for arXiv. Sort by
original_timestamp, primary_category, or downstream citation proxies instead.
E4. Author activity across sources
SELECT e.source::text, COUNT(*) AS docs, MAX(e.original_timestamp) AS latest
FROM scry.entities e
WHERE e.original_author ILIKE '%yudkowsky%'
AND e.content_risk IS DISTINCT FROM 'dangerous'
GROUP BY e.source::text
ORDER BY docs DESC
LIMIT 20
E5. Recent entity kind distribution for a source
SELECT kind::text, COUNT(*)
FROM scry.hackernews_items
WHERE original_timestamp >= '2025-01-01'
GROUP BY kind::text
ORDER BY 2 DESC
LIMIT 20
Source-native corpora follow the same pattern:
SELECT kind::text, COUNT(*)
FROM scry.wikipedia_articles
WHERE original_timestamp >= '2025-01-01'
GROUP BY kind::text
ORDER BY 2 DESC
LIMIT 20
Removing the date bound turns this into a large base-table aggregation. Run
/v1/scry/estimate first or prefer source-specific MVs when they already cover
the question.
E6. Hybrid search (lexical + semantic rerank in SQL)
WITH c AS (
SELECT entity_id
FROM scry.search_federated('deceptive alignment',
NULL, ARRAY['post'], 200, 10)
WHERE entity_id IS NOT NULL
)
SELECT e.uri, e.title, e.original_author,
emb.embedding_voyage4 <=> @p_deadbeef_topic AS distance
FROM c
JOIN scry.entities e ON e.id = c.entity_id
JOIN scry.chunk_embeddings emb ON emb.entity_id = c.entity_id AND emb.chunk_index = 0
WHERE e.content_risk IS DISTINCT FROM 'dangerous'
ORDER BY distance
LIMIT 50
Requires a stored embedding handle (@p_deadbeef_topic). See scry-vectors
skill for creating handles.
E7. Cost estimation before execution
curl -s -X POST https://api.scry.io/v1/scry/estimate \
-H "Authorization: Bearer $SCRY_API_KEY" \
-H "Content-Type: application/json" \
-d '{"sql": "SELECT arxiv_id, title FROM scry.arxiv_papers LIMIT 1000"}'
Returns EXPLAIN (FORMAT JSON) output. Use this for expensive queries before committing.
It does not prove BM25 helper health: if scry.search* fails, check
/v1/scry/index-view-status and /v1/scry/schema status as well.
The /v1/scry/context handshake now also exposes lexical_search.status,
status_basis, and last_known_status so you can distinguish stale
observability from confirmed lexical trouble before issuing fallbacks.
E8. Create a shareable artifact
# 1. Run query and capture results
# 2. POST share
curl -s -X POST https://api.scry.io/v1/scry/shares \
-H "Authorization: Bearer $SCRY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"kind": "query",
"title": "Top RLHF posts on LessWrong",
"summary": "20 highest-scored LW posts mentioning RLHF.",
"payload": {
"sql": "...",
"result": {"columns": [...], "rows": [...]}
}
}'
Kinds: query, rerank, insight, chat, markdown.
Progressive update: create a stub immediately, then PATCH /v1/scry/shares/{slug}.
Rendered at: https://scry.io/scry/share/{slug}.
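The progressive-update step can be sketched as follows. The slug is hypothetical, and the body shape assumes PATCH accepts the same fields as the initial POST (verify against the live API before relying on it); without a key in the environment, the sketch prints the request target instead of sending it.

```shell
#!/bin/sh
# Progressive share update (sketch). Slug and body are illustrative only.
SLUG="top-rlhf-posts"                      # slug returned by the initial POST
URL="https://api.scry.io/v1/scry/shares/$SLUG"
BODY='{"summary":"Final: 20 rows.","payload":{"sql":"SELECT 1","result":{"columns":["n"],"rows":[[1]]}}}'
if [ -n "${SCRY_API_KEY:-}" ]; then
  curl -s -X PATCH "$URL" \
    -H "Authorization: Bearer $SCRY_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$BODY"
else
  # No key set: dry run, print the request target instead of sending.
  echo "PATCH $URL"
fi
```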
E9. Emit a structured agent judgement
curl -s -X POST https://api.scry.io/v1/scry/judgements \
-H "Authorization: Bearer $SCRY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"emitter": "my-agent",
"judgement_kind": "topic_classification",
"target_external_ref": "arxiv:2401.12345",
"summary": "Paper primarily about mechanistic interpretability.",
"payload": {"primary_topic": "mech_interp", "confidence_detail": "title+abstract match"},
"confidence": 0.88,
"tags": ["arxiv", "mech_interp"],
"privacy_level": "self"
}'
Exactly one target required: target_entity_id, target_actor_id,
target_judgement_id, or target_external_ref.
Public judgements must target target_entity_id, target_actor_id, or
target_judgement_id; target_external_ref is limited to self or group
privacy.
Judgement-on-judgement: use target_judgement_id to chain observations.
E10. People / author lookup
-- Per-source author grouping
SELECT a.handle, a.display_name, a.source::text, COUNT(*) AS docs
FROM scry.entities e
JOIN scry.actors a ON a.id = e.author_actor_id
WHERE e.source = 'twitter'
GROUP BY a.handle, a.display_name, a.source::text
ORDER BY docs DESC
LIMIT 50
E11. Thread navigation (replies)
-- Find all replies to a root post
SELECT id, uri, title, original_author, original_timestamp
FROM scry.entities
WHERE anchor_entity_id = 'ROOT_ENTITY_UUID'
ORDER BY original_timestamp
LIMIT 100
anchor_entity_id is the root subject; parent_entity_id is the direct parent.
E12. Count estimation (safe pattern)
Avoid COUNT(*) on large tables. On the public Scry SQL surface, pg_class access
is blocked, so use the row estimates exposed by /v1/scry/schema. On a direct
Postgres connection, the equivalent catalog lookup is:
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'mv_lesswrong_posts'
LIMIT 1
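On the public surface, the same estimate can be read out of the schema endpoint's response. A minimal parsing sketch, under an assumed response shape (the real /v1/scry/schema payload may use different field names; inspect it first):

```shell
# Hypothetical response shape for illustration only; check the real
# /v1/scry/schema payload before relying on these field names.
SCHEMA_SAMPLE='{"tables":[{"name":"mv_lesswrong_posts","estimated_rows":61234}]}'
# Pull the estimate for one table without running COUNT(*) on the base table.
ROWS=$(printf '%s' "$SCHEMA_SAMPLE" \
  | sed -n 's/.*"name":"mv_lesswrong_posts","estimated_rows":\([0-9]*\).*/\1/p')
echo "$ROWS"
```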
F) Error Handling
See references/error-reference.md for the full catalogue. Key patterns:
| HTTP | Code | Meaning | Action |
|---|---|---|---|
| 400 | invalid_request | SQL parse error, missing LIMIT, bad params | Fix query |
| 401 | unauthorized | Missing or invalid API key | Check key |
| 402 | insufficient_credits | Token budget exhausted | Notify user |
| 429 | rate_limited | Too many requests | Respect Retry-After header |
| 503 | service_unavailable | Scry pool down or overloaded | Wait and retry |
Auth + timeout diagnostics for CLI users:
- If curl shows HTTP 000, that is a client-side timeout/network abort, not a server HTTP status. Check `--max-time` and retry with `/v1/scry/estimate` first.
- If you see 401 with "Invalid authorization format", check for whitespace/newlines in the key: `KEY_CLEAN="$(printf '%s' "$SCRY_API_KEY" | tr -d '\r\n')"`, then use `Authorization: Bearer $KEY_CLEAN`.
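The key-cleaning step can be exercised offline with a dummy key:

```shell
# Dummy key with a trailing newline, as often happens when pasting from a file.
SCRY_API_KEY='sk-test-123
'
KEY_CLEAN="$(printf '%s' "$SCRY_API_KEY" | tr -d '\r\n')"
# Lengths differ by exactly the stripped newline: 12 vs 11.
printf '%s:%s\n' "${#SCRY_API_KEY}" "${#KEY_CLEAN}"
```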
Quota fallback strategy:
- If 429: wait `Retry-After` seconds, then retry once.
- If 402: tell the user their token budget is exhausted.
- If 503: retry after 30s with exponential backoff (max 3 attempts).
- If a query times out: simplify (use an MV instead of the full table, reduce LIMIT, add tighter WHERE filters).
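The 503 branch can be sketched as a small retry wrapper. The helper name and the SLEEP_SCALE knob are illustrative, not part of any Scry tooling; the demo stub stands in for the real curl call so the flow runs offline.

```shell
#!/bin/sh
# Retry a command up to 3 times, starting at 30s and doubling the delay.
# SLEEP_SCALE=0 lets the demo below run instantly.
retry_503() {
  delay=30
  for attempt in 1 2 3; do
    if "$@"; then return 0; fi
    sleep $(( delay * ${SLEEP_SCALE:-1} ))
    delay=$(( delay * 2 ))
  done
  return 1
}

# Demo: a stub that fails twice then succeeds, standing in for the curl call.
COUNTER_FILE=$(mktemp)
flaky() {
  n=$(cat "$COUNTER_FILE")
  n=$(( ${n:-0} + 1 ))
  echo "$n" > "$COUNTER_FILE"
  [ "$n" -ge 3 ]
}
SLEEP_SCALE=0
retry_503 flaky && echo "succeeded on attempt $(cat "$COUNTER_FILE")"
rm -f "$COUNTER_FILE"
```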
G) Output Contract
When this skill completes a query task, return a consistent structure:
## Scry Result
**Query**: <natural language description>
**SQL**: ```sql <the SQL that ran> ```
**Rows returned**: <N> (truncated: <yes/no>)
**Duration**: <N>ms
<formatted results table or summary>
**Share**: <share URL if created>
**Caveats**: <any data quality notes, e.g., "score is NULL for arXiv">
Handoff Contract
Produces: JSON with columns, rows, row_count, duration_ms, truncated
Feeds into:
- rerank: ensure SQL returns `id` and `content_text` columns for candidate sets
- scry-vectors: save entity IDs for embedding lookup and semantic reranking

Receives from: none (entry point for SQL-based corpus access)
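As a concrete illustration, the produced JSON might look like this (field names come from the contract above; the values are invented for the example):

```json
{
  "columns": ["uri", "title"],
  "rows": [["https://www.lesswrong.com/posts/example", "Example post"]],
  "row_count": 1,
  "duration_ms": 42,
  "truncated": false
}
```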
Related Skills
- scry-vectors -- embed concepts as @handles, search by cosine distance, debias with vector algebra
- scry-rerank -- LLM-powered multi-attribute reranking of candidate sets via pairwise comparison
For detailed schema documentation, see references/schema-guide.md.
For the full pattern library, see references/query-patterns.md.
For error codes and quota details, see references/error-reference.md.