Indexing Strategy

Role framing: You are a data architect. Your goal is to choose an indexing approach that meets freshness and cost needs without overbuilding.

Initial Assessment

  • What data is needed (events, account states, historical candles)?
  • Freshness and latency requirements?
  • Query patterns (by owner, by mint, by time)?
  • Expected scale and retention?

Core Principles

  • Index only when RPC queries become too heavy or slow; start simple.
  • Emit structured events to simplify indexing; include versioning.
  • Backfill first, then stream; ensure idempotency.
  • Match the storage schema to query needs; avoid over-normalizing hot paths.
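
The idempotency principle above can be sketched as a write path where replaying the same event is a no-op. This is a minimal sketch using SQLite from the Python standard library; the `events` table, its columns, and the `event_idx` field (position of an event within its transaction) are assumptions made for illustration, not part of any particular indexer.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create a hypothetical events table whose primary key makes replays harmless."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS events (
            slot      INTEGER NOT NULL,
            signature TEXT    NOT NULL,
            event_idx INTEGER NOT NULL,  -- assumed: index of the event inside the transaction
            name      TEXT    NOT NULL,
            payload   BLOB    NOT NULL,
            PRIMARY KEY (slot, signature, event_idx)
        )
        """
    )
    return conn


def record_event(conn: sqlite3.Connection, slot: int, signature: str,
                 event_idx: int, name: str, payload: bytes) -> bool:
    """Insert the event once; return False if this key was already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO events (slot, signature, event_idx, name, payload) "
        "VALUES (?, ?, ?, ?, ?)",
        (slot, signature, event_idx, name, payload),
    )
    conn.commit()
    return cur.rowcount == 1
```

Because the backfill job and the live stream both go through `record_event`, any overlap between them only produces ignored inserts rather than duplicate rows.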

Workflow

  1. Decide necessity
    • Try getProgramAccounts with filters plus caching first; move to an indexer if responses are too slow or result sets too large.
  2. Event design
    • Add program logs/events with discriminators and key fields; avoid verbose logs.
  3. Choose stack
    • Options: custom listener + DB, Helius/webhooks to queue, GraphQL subgraph equivalents, or hosted indexers.
  4. Backfill
    • Page through getSignaturesForAddress and fetch each transaction with getTransaction, or load from a snapshot; store a cursor so the job can resume; verify counts.
  5. Live ingestion
    • Subscribe to logs or webhooks; dedupe deliveries and order by slot, then by transaction index within the slot.
  6. Query API
    • Expose REST/GraphQL tailored to frontend/bot needs; add caching.
  7. Monitoring
    • Track lag (slots behind the chain tip), error rate, and queue depth; alert on each.
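
The backfill and ordering steps can be sketched as a paging loop with a resumable cursor. This sketch assumes a `fetch_page(before, limit)` callable that wraps getSignaturesForAddress and returns entries newest-first, each with at least `signature` and `slot`; the wrapper, the in-memory stand-in, and the `tx_index` field used for ordering are hypothetical names introduced here for illustration.

```python
from typing import Callable, Iterator, Optional

# Assumed shape: fetch_page(before_signature, limit) -> newest-first list of entries.
FetchPage = Callable[[Optional[str], int], list]


def backfill_signatures(fetch_page: FetchPage, limit: int = 1000,
                        resume_before: Optional[str] = None) -> Iterator[dict]:
    """Walk history newest-to-oldest, one page at a time, until a page comes back empty."""
    before = resume_before
    while True:
        page = fetch_page(before, limit)
        if not page:
            return
        for entry in page:
            yield entry
        # The last signature of the page is the cursor; persist it to resume later.
        before = page[-1]["signature"]


def replay_order(entries: list) -> list:
    """Order entries oldest-first by slot, then by a hypothetical tx_index within the slot."""
    return sorted(entries, key=lambda e: (e["slot"], e.get("tx_index", 0)))


def make_fake_fetcher(history: list) -> FetchPage:
    """In-memory stand-in for the RPC wrapper; history must be newest-first."""
    def fetch_page(before: Optional[str], limit: int) -> list:
        start = 0
        if before is not None:
            start = next(i for i, e in enumerate(history) if e["signature"] == before) + 1
        return history[start:start + limit]
    return fetch_page
```

In a real job, the cursor would be written to the checkpoint store after each page is processed, so a restart passes it back in as `resume_before` instead of starting from the newest signature.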

Templates / Playbooks

  • Event schema: event_name, version, keys..., values... with borsh or base64 payloads.
  • Backfill checkpoint table: slot, signature, processed flag.
  • Storage patterns: wide tables for hot paths; partition by day for history.
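
As one possible reading of the event schema and checkpoint templates, the sketch below packs a versioned event into a fixed little-endian layout and base64-encodes it, then declares a checkpoint table. The `Deposit` event, its `owner` and `amount` fields, the discriminator convention, and the table name are all assumptions chosen for illustration.

```python
import base64
import hashlib
import struct
from dataclasses import dataclass

# Assumed layout: discriminator (8 bytes) | version (u8) | owner (32 bytes) | amount (u64 LE)
_LAYOUT = struct.Struct("<8sB32sQ")

EVENT_NAME = "Deposit"   # hypothetical event name
EVENT_VERSION = 1


def discriminator(event_name: str) -> bytes:
    """Convention assumed for this sketch: first 8 bytes of sha256("event:<name>")."""
    return hashlib.sha256(f"event:{event_name}".encode()).digest()[:8]


@dataclass(frozen=True)
class DepositV1:
    """Hypothetical event with one key field (owner) and one value field (amount)."""
    owner: bytes
    amount: int


def encode_event(event: DepositV1) -> str:
    if len(event.owner) != 32:
        raise ValueError("owner must be 32 bytes")
    raw = _LAYOUT.pack(discriminator(EVENT_NAME), EVENT_VERSION, event.owner, event.amount)
    return base64.b64encode(raw).decode("ascii")


def decode_event(payload: str) -> DepositV1:
    raw = base64.b64decode(payload)
    if len(raw) != _LAYOUT.size:
        raise ValueError("unexpected payload length")
    disc, version, owner, amount = _LAYOUT.unpack(raw)
    if disc != discriminator(EVENT_NAME):
        raise ValueError("unknown event discriminator")
    if version != EVENT_VERSION:
        raise ValueError(f"unsupported event version {version}")
    return DepositV1(owner=owner, amount=amount)


# Checkpoint table with the three columns named in the template above.
CHECKPOINT_DDL = """
CREATE TABLE IF NOT EXISTS backfill_checkpoint (
    slot      INTEGER NOT NULL,
    signature TEXT    PRIMARY KEY,
    processed INTEGER NOT NULL DEFAULT 0
)
"""
```

Rejecting unknown versions explicitly means a later schema change shows up as a decode error rather than as silently misread fields.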

Common Failure Modes + Debugging

  • Missing key fields in events make queries hard; add database indexes or emit a new event version.
  • Backfill gaps caused by rate limits; retry with backoff and resume from a stored cursor.
  • Duplicate processing on reorgs; use slot + signature as the idempotency key.
  • Unbounded storage growth; set a retention policy or move old data to cold storage.
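
For the rate-limit failure mode, a retry wrapper with exponential backoff and jitter is one common approach. This is a minimal sketch: `RateLimited` is a hypothetical exception that an RPC client wrapper would raise on a rate-limit response, and the delay parameters are placeholder values, not recommendations from any provider.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by a client wrapper when the provider throttles a request."""


def with_retries(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 0.5, max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); on RateLimited, wait with capped exponential backoff and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts:
                raise  # give up and let the caller record the gap
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads retries out so parallel workers do not retry in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("max_attempts must be at least 1")
```

When the wrapper finally gives up, the backfill job should keep its last stored cursor unchanged so the missed range is fetched again on the next run.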

Quality Bar / Validation

  • Clear rationale for indexing vs RPC; event design documented.
  • Backfill completed with verification counts; lag monitored.
  • APIs tested against target queries with latency targets met.
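
The count verification and lag checks above could be combined into a single health check. The function name, its parameters, and the default threshold are assumptions for illustration; the default of 5 slots simply mirrors the lag target used in the complex example in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexerHealth:
    missing_events: int
    slots_behind: int
    healthy: bool


def check_indexer(expected_events: int, indexed_events: int,
                  chain_tip_slot: int, last_indexed_slot: int,
                  max_lag_slots: int = 5) -> IndexerHealth:
    """Compare backfill counts and slot lag against targets."""
    missing = max(0, expected_events - indexed_events)
    slots_behind = max(0, chain_tip_slot - last_indexed_slot)
    healthy = missing == 0 and slots_behind < max_lag_slots
    return IndexerHealth(missing, slots_behind, healthy)
```

A scheduler could call this periodically and page an operator whenever `healthy` is false.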

Output Format

Provide indexing decision, event schema, ingestion plan (backfill + live), storage/query design, and monitoring plan.

Examples

  • Simple: Small app uses RPC + caching; no indexer needed; document reasons.
  • Complex: High-volume protocol emits events; uses webhooks to queue -> worker -> Postgres; backfill from slot X; exposes GraphQL; monitors lag < 5 slots.