build-data-pipeline
Preamble (run first)
```shell
_TEL_TIER=$(cat ~/.superstack/config.json 2>/dev/null | grep -o '"telemetryTier": *"[^"]*"' | head -1 | sed 's/.*"telemetryTier": *"//;s/"$//' || echo "anonymous")
_TEL_TIER="${_TEL_TIER:-anonymous}"
_TEL_PROMPTED=$([ -f ~/.superstack/.telemetry-prompted ] && echo "yes" || echo "no")
_TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
mkdir -p ~/.superstack
echo "TELEMETRY: $_TEL_TIER"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
if [ "$_TEL_TIER" != "off" ]; then
  _TEL_EVENT='{"skill":"build-data-pipeline","phase":"build","event":"started","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"}'
  echo "$_TEL_EVENT" >> ~/.superstack/telemetry.jsonl 2>/dev/null || true
  _CONVEX_URL=$(cat ~/.superstack/config.json 2>/dev/null | grep -o '"convexUrl":"[^"]*"' | head -1 | cut -d'"' -f4 || echo "")
  [ -n "$_CONVEX_URL" ] && curl -s -X POST "$_CONVEX_URL/api/mutation" -H "Content-Type: application/json" -d '{"path":"telemetry:track","args":{"skill":"build-data-pipeline","phase":"build","status":"success","version":"0.2.0","platform":"'$(uname -s)-$(uname -m)'","timestamp":'$(date +%s)000'}}' >/dev/null 2>&1 &
  true
fi
```
If TEL_PROMPTED is no: Before starting the skill workflow, ask the user about telemetry.
Use AskUserQuestion:
Help superstack get better! We track which skills get used and how long they take — no code, no file paths, no PII. Change anytime in ~/.superstack/config.json.
Options:
- A) Sure, help superstack improve (anonymous)
- B) No thanks
If A: run this bash:
```shell
echo '{"telemetryTier":"anonymous"}' > ~/.superstack/config.json
_TEL_TIER="anonymous"
touch ~/.superstack/.telemetry-prompted
```
If B: run this bash:
```shell
echo '{"telemetryTier":"off"}' > ~/.superstack/config.json
_TEL_TIER="off"
touch ~/.superstack/.telemetry-prompted
```
This only happens once. If TEL_PROMPTED is yes, skip this entirely and proceed to the skill workflow.
Wrong skill? See SKILL_ROUTER.md for all available skills.
Build Data Pipeline
Overview
Guide the user through building a data pipeline that ingests, transforms, and stores Solana on-chain data. Covers real-time event streaming via webhooks and WebSockets, historical backfilling, account state indexing, and building query-friendly storage. Uses Helius where it is the best fit for production-grade ingestion, while keeping the storage and indexing design provider-agnostic.
Workflow
- Check for .superstack/build-context.md. If found, use its stack decisions. If not, ask: what data do you need (transactions, account state, token transfers, program events)? Real-time, historical, or both? Write .superstack/build-context.md with the gathered context so future skills can use it.
- Read references/indexing-patterns.md to select the right ingestion method.
- Read references/data-storage.md to design the storage schema.
- Implement in milestones:
  a. Set up data ingestion (Helius webhook, WebSocket subscription, or Geyser plugin)
  b. Build the parser/transformer for your specific data format
  c. Design and create the database schema
  d. Implement the write path (ingestion → parse → store)
  e. Build a query API or dashboard on top of the stored data
- Test with live devnet data, then switch to mainnet when ready.
- Monitor ingestion lag and handle missed events with backfill logic.
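The missed-event handling in the last step can be sketched as a slot-gap check over what has already been stored. This is a minimal illustration, not part of the skill itself: the `StoredRecord` shape and the `maxGap` tolerance (Solana legitimately skips some slots) are assumptions.

```typescript
// Minimal sketch: find slot ranges that were never ingested so a
// backfill job can re-fetch them. StoredRecord is a hypothetical
// shape; real records would also carry signature and parsed data.
interface StoredRecord {
  slot: number;
}

// Returns inclusive [start, end] slot ranges with no stored records.
// maxGap tolerates naturally skipped slots before flagging a gap.
function findSlotGaps(records: StoredRecord[], maxGap = 0): Array<[number, number]> {
  const slots = Array.from(new Set(records.map((r) => r.slot))).sort((a, b) => a - b);
  const gaps: Array<[number, number]> = [];
  for (let i = 1; i < slots.length; i++) {
    if (slots[i] - slots[i - 1] > maxGap + 1) {
      gaps.push([slots[i - 1] + 1, slots[i] - 1]);
    }
  }
  return gaps;
}
```

A backfill worker would run this periodically and re-fetch each returned range via historical RPC queries.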
Non-Negotiables
- Always implement idempotent writes — webhooks and WebSockets can deliver duplicates.
- Include a backfill mechanism — you will miss events during deploys, restarts, and outages.
- Never store raw transaction blobs without parsing — they are expensive to query later.
- Use Helius enhanced transactions for parsed data instead of raw RPC when possible.
- Monitor ingestion lag — if your pipeline falls behind, you need alerts, not silent data loss.
- Store the slot number with every record for ordering and deduplication.
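The idempotency and slot-storage rules combine naturally: key every write on the transaction signature so redelivered events become no-ops. A minimal in-memory sketch; in production this would be a database upsert (e.g. SQL `INSERT ... ON CONFLICT (signature) DO NOTHING`), and the record shape here is illustrative.

```typescript
// Idempotent store keyed on transaction signature. Duplicate
// deliveries (common with webhooks and WebSocket reconnects)
// become no-ops instead of double-counted rows.
interface PipelineRecord {
  signature: string; // unique transaction signature
  slot: number;      // stored with every record for ordering/dedup
  payload: unknown;  // parsed event data
}

class IdempotentStore {
  private rows = new Map<string, PipelineRecord>();

  // Returns true if inserted, false if the signature was already seen.
  upsert(record: PipelineRecord): boolean {
    if (this.rows.has(record.signature)) return false;
    this.rows.set(record.signature, record);
    return true;
  }

  get size(): number {
    return this.rows.size;
  }
}
```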
Phase Handoff
This skill is Phase 2 (Build) in the Idea → Build → Launch journey.
Reads: .superstack/build-context.md
Writes/Updates: .superstack/build-context.md (creates if missing) with:
- pipeline.ingestion_method: "webhook" | "websocket" | "geyser" | "rpc-polling"
- pipeline.data_types: string[] (e.g., ["transactions", "account-state", "token-transfers"])
- pipeline.storage: "postgresql" | "redis" | "custom"
- pipeline.backfill_implemented: boolean
When updating, deep-merge — don't overwrite existing fields.
See ../../data/specs/phase-handoff.md for the full JSON contract.
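The deep-merge requirement can be sketched as follows — a minimal illustration only (a real implementation may also need array-merge rules, which are omitted here):

```typescript
// Deep-merge `update` into `base` without dropping existing fields.
// Nested objects are merged recursively; scalars and arrays in
// `update` replace the previous value.
type Json = { [key: string]: unknown };

function isPlainObject(v: unknown): v is Json {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function deepMerge(base: Json, update: Json): Json {
  const out: Json = { ...base };
  for (const [key, value] of Object.entries(update)) {
    if (isPlainObject(out[key]) && isPlainObject(value)) {
      out[key] = deepMerge(out[key] as Json, value);
    } else {
      out[key] = value;
    }
  }
  return out;
}
```

For example, merging `{ pipeline: { backfill_implemented: true } }` into an existing context keeps sibling keys like `pipeline.storage` intact.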
Quick Start
```shell
# Fastest: Helius webhooks (no infrastructure needed)
# 1. Get a Helius API key from dashboard.helius.dev
# 2. Create a webhook (quote the URL so the shell doesn't glob the '?'):
curl -X POST "https://api-mainnet.helius-rpc.com/v0/webhooks?api-key=YOUR_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "webhookURL": "https://your-app.com/webhook",
    "transactionTypes": ["TRANSFER"],
    "accountAddresses": ["YOUR_PROGRAM_ID"]
  }'

# For WebSocket (real-time, more control):
# Use: solana logs <PROGRAM_ID> --url mainnet-beta
# Or: @solana/web3.js Connection.onLogs()
```
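On the receiving end, the webhook handler's parse step can be sketched like this. The field names (`signature`, `slot`, `type`) are assumptions about the enhanced-webhook payload — verify them against the JSON your webhook actually receives before relying on them.

```typescript
// Sketch of a webhook receiver's parse step. Helius webhooks POST a
// JSON array of transaction events; skip malformed entries instead of
// crashing the receiver (a crashed receiver means missed events and
// more backfill work).
interface WebhookEvent {
  signature: string;
  slot: number;
  type: string;
}

function parseWebhookBody(body: unknown): WebhookEvent[] {
  if (!Array.isArray(body)) return [];
  const events: WebhookEvent[] = [];
  for (const item of body) {
    if (
      item !== null &&
      typeof item === "object" &&
      typeof (item as any).signature === "string" &&
      typeof (item as any).slot === "number"
    ) {
      events.push({
        signature: (item as any).signature,
        slot: (item as any).slot,
        type: typeof (item as any).type === "string" ? (item as any).type : "UNKNOWN",
      });
    }
  }
  return events;
}
```

Each parsed event then flows into the idempotent write path keyed on its signature.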
Decision Points
- Which ingestion method? Webhooks (simplest, Helius) → WebSocket (real-time) → Geyser (highest throughput) → Polling (last resort).
- Which RPC? Helius is required if you want Helius webhooks or Helius-only enhanced APIs. DAS is available from multiple providers (for example Helius, MetaMask/Infura, and QuickNode), so choose based on pricing, limits, and where the rest of your stack already lives.
- Database? PostgreSQL for relational data + transactions. Redis for caching + real-time state. SQLite for small/local indexers.
- Hosting? Railway or Fly.io for webhook receivers. AWS/GCP for Geyser plugins.
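The ingestion-method ordering above can be expressed as a decision table — prefer the simplest method that meets the requirements. The requirement flags and thresholds here are illustrative, not prescriptive.

```typescript
// Decision table mirroring the ordering above: Geyser only for
// firehose-level volume, WebSocket when you need real-time control,
// webhooks when you can expose an HTTP endpoint, polling last.
type IngestionMethod = "webhook" | "websocket" | "geyser" | "rpc-polling";

interface Requirements {
  realTime: boolean;        // need sub-second delivery / fine-grained control?
  highThroughput: boolean;  // full account/transaction firehose volume?
  canReceiveHttp: boolean;  // can you expose a public webhook endpoint?
}

function chooseIngestion(req: Requirements): IngestionMethod {
  if (req.highThroughput) return "geyser";
  if (req.realTime) return "websocket";
  if (req.canReceiveHttp) return "webhook";
  return "rpc-polling";
}
```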
Resources
references/
Telemetry (run last)
After the skill workflow completes (success, error, or abort), log the telemetry event.
Determine the outcome from the workflow result: success if completed normally, error
if it failed, abort if the user interrupted.
Run this bash:
```shell
_TEL_END=$(date +%s)
_TEL_DUR=$(( _TEL_END - ${_TEL_START:-$_TEL_END} ))
_TEL_TIER=$(cat ~/.superstack/config.json 2>/dev/null | grep -o '"telemetryTier": *"[^"]*"' | head -1 | sed 's/.*"telemetryTier": *"//;s/"$//' || echo "anonymous")
if [ "$_TEL_TIER" != "off" ]; then
  echo '{"skill":"build-data-pipeline","phase":"build","event":"completed","outcome":"OUTCOME","duration_s":"'"$_TEL_DUR"'","session":"'"$_SESSION_ID"'","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","platform":"'$(uname -s)-$(uname -m)'"}' >> ~/.superstack/telemetry.jsonl 2>/dev/null || true
  true
fi
```
Replace OUTCOME with success/error/abort based on the workflow result.