scry

SKILL.md

Scry Skill

Scry gives you read-only SQL access to the ExoPriors public corpus (229M+ entities) via a single HTTP endpoint. You write Postgres SQL against a curated scry.* schema and get JSON rows back. There is no ORM, no GraphQL, no pagination token -- just SQL.

Skill generation: 20260313

A) When to use / not use

Use this skill when:

  • Searching, filtering, or aggregating content across the ExoPriors corpus
  • Running lexical (BM25) or hybrid searches
  • Exploring author networks, cross-platform identities, or publication patterns
  • Navigating the OpenAlex academic graph (authors, citations, institutions, concepts)
  • Creating shareable artifacts from query results
  • Emitting structured agent judgements about entities or external references

Do NOT use this skill when:

  • The user wants semantic/vector search composition or embedding algebra (use the scry-vectors skill)
  • The user wants LLM-based reranking (use the scry-rerank skill)
  • The user is querying their own local database

B) Golden Rules

  1. Context handshake first. At session start, call GET /v1/scry/context?skill_generation=20260313. Use the returned offerings block for the current product summary budgets, canonical env var, default skill, and specialized skill catalog. If should_update_skill=true, tell the user to run npx skills update.

  2. Schema first. ALWAYS call GET /v1/scry/schema before writing SQL. Never guess column names or types. The schema endpoint returns live column metadata and row-count estimates for every view.

  3. Check operational status when search looks wrong. If lexical search, materialized-view freshness, or corpus behavior seems off, call GET /v1/scry/index-view-status before assuming the query or schema is wrong.

  4. Clarify ambiguous intent before heavy queries. If the request is vague ("search Reddit for X", "find things about Y"), ask one short clarification question about the goal/output format before running expensive SQL.

  5. Start with a cheap probe. Before any query likely to run >5s, run /v1/scry/estimate and/or a tight exploratory query (LIMIT 20 plus scoped source/window filters), then scale only after confirming relevance.

  6. Choose lexical vs semantic explicitly. Use lexical (scry.search*) for exact terms and named entities. For conceptual intent ("themes", "things like", "similar to"), route to scry-vectors first, then optionally hybridize.

  7. LIMIT always. Every query MUST include a LIMIT clause. Max 10,000 rows. Queries without LIMIT are rejected by the SQL validator.

  8. Prefer materialized views. scry.entities has 229M+ rows. Scanning it without filters is slow. Use scry.mv_lesswrong_posts, scry.mv_arxiv_papers, scry.mv_hackernews_posts, etc. for targeted access. They are pre-filtered and often have embeddings pre-joined.

  9. Filter dangerous content. Always include WHERE content_risk IS DISTINCT FROM 'dangerous' unless the user explicitly asks for unfiltered results. Dangerous content contains adversarial prompt-injection content.

  10. Raw SQL, not JSON. POST /v1/scry/query takes Content-Type: text/plain with raw SQL in the body. Not JSON-wrapped SQL.

  11. File rough edges promptly. If Scry blocks the task, misses an obvious result set, or exposes a rough edge, submit a brief note to POST /v1/feedback?feedback_type=suggestion|bug|other&channel=scry_skill using Content-Type: text/plain by default (text/markdown also works). Do not silently work around it. Logged-in users can review their submissions with GET /v1/feedback.

For full tier limits, timeout policies, and degradation strategies, see Shared Guardrails.

B.1 API Key Setup (Canonical)

Recommended default for less-technical users: in the directory where you launch the agent, store EXOPRIORS_API_KEY in .env so skills and copied prompts use the same place. Canonical key naming for this skill:

  • Env var: EXOPRIORS_API_KEY
  • Personal key format: exopriors_* with Scry access
printf '%s\n' 'EXOPRIORS_API_KEY=exopriors_...' >> .env
set -a && source .env && set +a

Verify:

echo "$EXOPRIORS_API_KEY"

If using packaged skills, keep them current:

npx skills add exopriors/skills
npx skills update

C) Quickstart

One end-to-end example: find recent high-scoring LessWrong posts about RLHF.

Step 1: Get dynamic context + update advisory
GET https://api.exopriors.com/v1/scry/context?skill_generation=20260313
Authorization: Bearer $EXOPRIORS_API_KEY

Step 2: Get schema
GET https://api.exopriors.com/v1/scry/schema
Authorization: Bearer $EXOPRIORS_API_KEY

Step 3: Run query
POST https://api.exopriors.com/v1/scry/query
Authorization: Bearer $EXOPRIORS_API_KEY
Content-Type: text/plain

WITH hits AS (
  SELECT id FROM scry.search('RLHF reinforcement learning human feedback',
    kinds=>ARRAY['post'], limit_n=>100)
)
SELECT e.uri, e.title, e.original_author, e.original_timestamp, e.score
FROM hits h
JOIN scry.entities e ON e.id = h.id
WHERE e.source = 'lesswrong'
  AND e.content_risk IS DISTINCT FROM 'dangerous'
ORDER BY e.score DESC NULLS LAST
LIMIT 20

Response shape:

{
  "columns": ["uri", "title", "original_author", "original_timestamp", "score"],
  "rows": [["https://...", "My RLHF Post", "author", "2025-01-15T...", 142], ...],
  "row_count": 20,
  "duration_ms": 312,
  "truncated": false
}

D) Decision Tree

User wants to search the ExoPriors corpus?
  |
  +-- Ambiguous / conceptual ask? --> Clarify intent first, then use
  |     scry-vectors for semantic search (optionally hybridize with lexical)
  |
  +-- By keywords/phrases? --> scry.search() (BM25 lexical over canonical content_text)
  |     +-- Specific forum?  --> pass mode='mv_lesswrong_posts' or kinds filter
  |     +-- Reddit?          --> START with scry.search_reddit_posts() /
  |                              scry.search_reddit_comments()
  |     +-- Large result?    --> scry.search_ids() (id+uri+kind, up to 2000)
  |
  +-- By structured filters (source, date, author)? --> Direct SQL on MVs
  |
  +-- By semantic similarity? --> (scry-vectors skill, not this one)
  |
  +-- Hybrid (keywords + semantic rerank)? --> scry.hybrid_search() or
  |     lexical CTE + JOIN scry.embeddings
  |
  +-- Author/people lookup? --> scry.actors, scry.people, scry.person_accounts
  |
  +-- Academic graph (OpenAlex)? --> scry.openalex_find_authors(),
  |     scry.openalex_find_works(), etc. (see schema-guide.md)
  |
  +-- Need to share results? --> POST /v1/scry/shares
  |
  +-- Need to emit a structured observation? --> POST /v1/scry/judgements
  |
  +-- Scry blocked / missing obvious results? --> POST /v1/feedback

E) Recipes

E0. Context handshake + skill update advisory

curl -s "https://api.exopriors.com/v1/scry/context?skill_generation=20260313" \
  -H "Authorization: Bearer $EXOPRIORS_API_KEY"

If response includes "should_update_skill": true, ask the user to run: npx skills update.

E0b. Submit feedback when Scry blocks the task

curl -s "https://api.exopriors.com/v1/feedback?feedback_type=bug&channel=scry_skill" \
  -H "Authorization: Bearer $EXOPRIORS_API_KEY" \
  -H "Content-Type: text/plain" \
  --data $'## What happened\n- Query: ...\n- Problem: ...\n\n## Why it matters\n- ...\n\n## Suggested fix\n- ...'

Success response includes a receipt id. Logged-in users can review their own submissions with:

curl -s "https://api.exopriors.com/v1/feedback?limit=10" \
  -H "Authorization: Bearer $EXOPRIORS_API_KEY"

E1. Lexical search (BM25)

WITH c AS (
  SELECT id FROM scry.search('your query here',
    kinds=>ARRAY['post'], limit_n=>100)
)
SELECT e.uri, e.title, e.original_author, e.original_timestamp
FROM c JOIN scry.entities e ON e.id = c.id
WHERE e.content_risk IS DISTINCT FROM 'dangerous'
LIMIT 50

Default kinds if omitted: ['post','paper','document','webpage','twitter_thread','grant']. scry.search() broadens once to kinds=>ARRAY['comment'] if that default returns 0 rows. Pass explicit kinds for strict scope (for example comment-only or tweet-only). Pass mode=>'mv_lesswrong_posts' to scope to LessWrong posts.

E2. Reddit-specific search

SELECT id, uri, subreddit, original_author, original_timestamp
FROM scry.search_reddit_posts(
  'transformer architecture',
  subreddits=>ARRAY['MachineLearning','LocalLLaMA'],
  limit_n=>50,
  window_key=>'recent'
)
ORDER BY score DESC

Window keys: recent, 2022_2023, 2020_2021, 2018_2019, 2014_2017, 2010_2013, 2005_2009. Also: scry.search_reddit_comments(...).

For semantic Reddit retrieval over the embedding-covered subset: scry.search_reddit_posts_semantic(query_embedding=>..., subreddits=>..., limit_n=>...).

E3. Source-filtered materialized view query

SELECT entity_id, uri, title, original_author, score, original_timestamp
FROM scry.mv_arxiv_papers
WHERE original_timestamp >= '2025-01-01'
ORDER BY original_timestamp DESC
LIMIT 50

E4. Author activity across sources

SELECT e.source::text, COUNT(*) AS docs, MAX(e.original_timestamp) AS latest
FROM scry.entities e
WHERE e.original_author ILIKE '%yudkowsky%'
  AND e.content_risk IS DISTINCT FROM 'dangerous'
GROUP BY e.source::text
ORDER BY docs DESC
LIMIT 20

E5. Entity kind distribution for a source

SELECT kind::text, COUNT(*)
FROM scry.entities
WHERE source = 'hackernews'
GROUP BY kind::text
ORDER BY 2 DESC
LIMIT 20

E6. Hybrid search (lexical + semantic rerank in SQL)

WITH c AS (
  SELECT id FROM scry.search('deceptive alignment',
    kinds=>ARRAY['post'], limit_n=>200)
)
SELECT e.uri, e.title, e.original_author,
       emb.embedding_voyage4 <=> @p_deadbeef_topic AS distance
FROM c
JOIN scry.entities e ON e.id = c.id
JOIN scry.embeddings emb ON emb.entity_id = c.id AND emb.chunk_index = 0
WHERE e.content_risk IS DISTINCT FROM 'dangerous'
ORDER BY distance
LIMIT 50

Requires a stored embedding handle (@p_deadbeef_topic). See scry-vectors skill for creating handles.

E7. Cost estimation before execution

curl -s -X POST https://api.exopriors.com/v1/scry/estimate \
  -H "Authorization: Bearer $EXOPRIORS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT id, title FROM scry.mv_arxiv_papers LIMIT 1000"}'

Returns EXPLAIN (FORMAT JSON) output. Use this for expensive queries before committing.

E8. Create a shareable artifact

# 1. Run query and capture results
# 2. POST share
curl -s -X POST https://api.exopriors.com/v1/scry/shares \
  -H "Authorization: Bearer $EXOPRIORS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "query",
    "title": "Top RLHF posts on LessWrong",
    "summary": "20 highest-scored LW posts mentioning RLHF.",
    "payload": {
      "sql": "...",
      "result": {"columns": [...], "rows": [...]}
    }
  }'

Kinds: query, rerank, insight, chat, markdown. Progressive update: create stub immediately, then PATCH /v1/scry/shares/{slug}. Rendered at: https://scry.io/scry/share/{slug}.

E9. Emit a structured agent judgement

curl -s -X POST https://api.exopriors.com/v1/scry/judgements \
  -H "Authorization: Bearer $EXOPRIORS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "emitter": "my-agent",
    "judgement_kind": "topic_classification",
    "target_external_ref": "arxiv:2401.12345",
    "summary": "Paper primarily about mechanistic interpretability.",
    "payload": {"primary_topic": "mech_interp", "confidence_detail": "title+abstract match"},
    "confidence": 0.88,
    "tags": ["arxiv", "mech_interp"],
    "privacy_level": "public"
  }'

Exactly one target required: target_entity_id, target_actor_id, target_judgement_id, or target_external_ref. Judgement-on-judgement: use target_judgement_id to chain observations.

E10. People / author lookup

-- Per-source author grouping
SELECT a.handle, a.display_name, a.source::text, COUNT(*) AS docs
FROM scry.entities e
JOIN scry.actors a ON a.id = e.author_actor_id
WHERE e.source = 'twitter'
GROUP BY a.handle, a.display_name, a.source::text
ORDER BY docs DESC
LIMIT 50

E11. Thread navigation (replies)

-- Find all replies to a root post
SELECT id, uri, title, original_author, original_timestamp
FROM scry.entities
WHERE anchor_entity_id = 'ROOT_ENTITY_UUID'
ORDER BY original_timestamp
LIMIT 100

anchor_entity_id is the root subject; parent_entity_id is the direct parent.

E12. Count estimation (safe pattern)

Avoid COUNT(*) on large tables. Instead, use schema endpoint row estimates or:

SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'mv_lesswrong_posts'
LIMIT 1

Note: pg_class access is blocked on the public Scry SQL surface. Use /v1/scry/schema instead.

F) Error Handling

See references/error-reference.md for the full catalogue. Key patterns:

HTTP Code Meaning Action
400 invalid_request SQL parse error, missing LIMIT, bad params Fix query
401 unauthorized Missing or invalid API key Check key
402 insufficient_credits Token budget exhausted Notify user
429 rate_limited Too many requests Respect Retry-After header
503 service_unavailable Scry pool down or overloaded Wait and retry

Auth + timeout diagnostics for CLI users:

  1. If curl shows HTTP 000, that is client-side timeout/network abort, not a server HTTP status. Check --max-time and retry with /v1/scry/estimate first.
  2. If you see 401 with "Invalid authorization format", check for whitespace/newlines in the key: KEY_CLEAN="$(printf '%s' \"$EXOPRIORS_API_KEY\" | tr -d '\\r\\n')" then use Authorization: Bearer $KEY_CLEAN.

Quota fallback strategy:

  1. If 429: wait Retry-After seconds, retry once.
  2. If 402: tell the user their token budget is exhausted.
  3. If 503: retry after 30s with exponential backoff (max 3 attempts).
  4. If query times out: simplify (use MV instead of full table, reduce LIMIT, add tighter WHERE filters).

G) Output Contract

When this skill completes a query task, return a consistent structure:

## Scry Result

**Query**: <natural language description>
**SQL**: ```sql <the SQL that ran> ```
**Rows returned**: <N> (truncated: <yes/no>)
**Duration**: <N>ms

<formatted results table or summary>

**Share**: <share URL if created>
**Caveats**: <any data quality notes, e.g., "score is NULL for arXiv">

Handoff Contract

Produces: JSON with columns, rows, row_count, duration_ms, truncated Feeds into:

  • rerank: ensure SQL returns id and content_text columns for candidate sets
  • scry-vectors: save entity IDs for embedding lookup and semantic reranking Receives from: none (entry point for SQL-based corpus access)

Related Skills

  • scry-vectors -- embed concepts as @handles, search by cosine distance, debias with vector algebra
  • scry-rerank -- LLM-powered multi-attribute reranking of candidate sets via pairwise comparison

For detailed schema documentation, see references/schema-guide.md. For the full pattern library, see references/query-patterns.md. For error codes and quota details, see references/error-reference.md.

Weekly Installs
21
First Seen
Feb 28, 2026
Installed on
codex21
opencode20
gemini-cli20
claude-code20
github-copilot20
amp20