# turbo-builder

Pipeline Builder
## Boundaries

- Build NEW pipelines. Do not diagnose broken pipelines; that belongs to /turbo-doctor.
- Do not serve as a YAML reference. If the user only needs to look up a field or syntax, use the /turbo-pipelines skill instead.
- For dataset lookups, use /datasets.
Walk the user through building a complete pipeline from scratch, step by step. Generate a valid YAML configuration, validate it, and deploy it.
## Builder Workflow

### Step 1: Verify Authentication
Run `goldsky project list 2>&1` to check login status (a shell sketch follows the list below).
- If logged in: Note the current project and continue.
- If not logged in: Use the /auth-setup skill for guidance.
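A minimal shell sketch of this check, assuming the CLI mentions something like "not logged in" or "unauthorized" when unauthenticated (the exact wording may differ):

```bash
# Illustrative only; the grep patterns are assumptions about the CLI's failure output.
auth_output="$(goldsky project list 2>&1)"
if echo "$auth_output" | grep -qiE "not logged in|unauthorized"; then
  echo "Not authenticated; follow the /auth-setup skill before continuing."
else
  echo "$auth_output"   # note the current project and continue
fi
```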
### Step 2: Understand the Goal
Ask the user what they want to index. Good questions:
- What blockchain/chain? (Ethereum, Base, Polygon, Solana, etc.)
- What data? (transfers, swaps, events from a specific contract, all transactions, etc.)
- Where should the data go? (PostgreSQL, ClickHouse, Kafka, S3, etc.)
- Do they need transforms? (filtering, aggregation, enrichment)
- One-time backfill or continuous streaming?
If the user already described their goal, extract answers from their description.
### Step 3: Choose the Dataset
Use the /datasets skill to find the right dataset.
Key points:
- Common datasets: `<chain>.decoded_logs`, `<chain>.raw_transactions`, `<chain>.erc20_transfers`, `<chain>.raw_traces`
- For decoded contract events, use `<chain>.decoded_logs` with a filter on `address` and `topic0`
- For Solana: use `solana.transactions`, `solana.token_transfers`, etc.
Present the dataset choice to the user for confirmation.
### Step 4: Configure the Source
Build the source section of the YAML:
```yaml
sources:
  my_source:
    type: dataset
    dataset_name: <chain>.<dataset>
    version: 1.0.0
    start_at: earliest # or a specific block number
```
Ask about:
- Start block: `earliest` (from genesis), `latest` (from now), or a specific block number (see the filled-in example below)
- End block: Only for job-mode/backfill pipelines. Omit for streaming.
- Source-level filter: Optional filter to reduce data at the source (e.g., a specific contract address)
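For example, a minimal sketch for indexing Base ERC-20 transfers from a specific block. The source name and block number are illustrative, and the `base` chain prefix should be confirmed with /datasets:

```yaml
sources:
  base_erc20_source:
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.0.0
    start_at: 10000000   # illustrative block number; use earliest for a full history
```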
### Step 5: Configure Transforms (if needed)
If the user needs transforms, use the /turbo-transforms skill to help:
- SQL transforms — filter, aggregate, join, or reshape data using DataFusion SQL
- TypeScript transforms — custom logic, external API calls, complex processing
- Dynamic tables — join with a PostgreSQL table or in-memory allowlist
Build the transforms section:
```yaml
transforms:
  my_transform:
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM my_source
      WHERE <conditions>
```
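For example, a sketch of a transform that keeps only decoded logs for one contract and one event signature, following the `address`/`topic0` filter pattern from Step 3. The contract address is a placeholder and the column names are assumptions about the `decoded_logs` schema:

```yaml
transforms:
  filtered_transfers:
    type: sql
    primary_key: id
    sql: |
      SELECT *
      FROM my_source
      WHERE address = '0x0000000000000000000000000000000000000000'  -- placeholder contract address
        AND topic0 = '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'  -- keccak256("Transfer(address,address,uint256)")
```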
### Step 6: Configure the Sink(s)
Ask where the data should go. Use the /turbo-pipelines skill for sink configuration:
| Sink | Key config |
|---|---|
| PostgreSQL | secret_name, schema, table, primary_key |
| ClickHouse | secret_name, table, order_by |
| Kafka | secret_name, topic |
| S3 | bucket, region, prefix, format |
| Webhook | url, format |
If the user names more than one destination, generate ONE pipeline with multiple sinks — do not generate a separate pipeline per destination. Each sink has a from: field that references the source (or a transform) by name, and sinks run independently. Use a fan-out pattern when different sinks want different views of the same source — add an SQL transform per view, then point each sink's from: at the appropriate transform. See references/architecture-patterns.md in /turbo-pipelines and templates/multi-sink-pipeline.yaml for examples.
Only split into separate pipelines when sources are fundamentally different (e.g., different chains with independent lifecycles) or the user explicitly asks for separate pipelines.
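As an illustration, a structural sketch of the fan-out pattern with two sinks fed by separate transforms. The `postgres` and `clickhouse` type values and the secret names are assumptions; confirm the exact sink fields with /turbo-pipelines:

```yaml
transforms:
  transfers_for_postgres:
    type: sql
    primary_key: id
    sql: SELECT * FROM my_source        # shape this view for the Postgres table
  transfers_for_clickhouse:
    type: sql
    primary_key: id
    sql: SELECT * FROM my_source        # shape this view for ClickHouse
sinks:
  pg_sink:
    type: postgres                      # assumed type value
    from: transfers_for_postgres
    secret_name: MY_PG_SECRET           # illustrative secret name
    schema: public
    table: erc20_transfers
    primary_key: id
  ch_sink:
    type: clickhouse                    # assumed type value
    from: transfers_for_clickhouse
    secret_name: MY_CH_SECRET           # illustrative secret name
    table: erc20_transfers
    order_by: id
```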
For sinks requiring `secret_name`, check whether the secret exists:

```bash
goldsky secret list
```

If it doesn't exist, help create it using the /secrets skill.
### Step 7: Choose Mode
Use the /turbo-pipelines skill for guidance:
- Streaming (default): continuous processing, no `end_block`, runs indefinitely
- Job mode: one-time backfill, set `job: true` and `end_block`
### Step 8: Generate, Validate, and Present

Assemble the complete pipeline YAML. Use a descriptive name following the convention `<chain>-<data>-<sink>` (e.g., `base-erc20-transfers-postgres`).
- Write the YAML file to disk (e.g., `<pipeline-name>.yaml`).
- Run validation BEFORE showing the YAML to the user:

  ```bash
  goldsky turbo validate -f <pipeline-name>.yaml
  ```

- If validation fails, fix the issues and re-validate. Do NOT present the YAML until validation passes. Common fixes:
  - Missing `version` field on dataset source
  - Invalid dataset name (check chain prefix)
  - Missing `secret_name` for database sinks
  - SQL syntax errors in transforms
- Once validation passes, present the full YAML to the user for review.
### Step 9: Deploy

After the user confirms the YAML looks good:

```bash
goldsky turbo apply <pipeline-name>.yaml
```
### Step 10: Verify

After deployment:

```bash
goldsky turbo list
```

Suggest running inspect to verify data flow:

```bash
goldsky turbo inspect <pipeline-name> -p
```

To filter to a specific node: `goldsky turbo inspect <pipeline-name> -n <node-name> -p`.
Present a summary:

```markdown
## Pipeline Deployed

**Name:** [name]
**Chain:** [chain]
**Dataset:** [dataset]
**Sink:** [sink type]
**Mode:** [streaming/job]

**Next steps:**
- Verify data flow with `goldsky turbo inspect <name> -p`
- Check logs with `goldsky turbo logs <name>`
- Use /turbo-doctor if you run into issues
```
## Important Rules
- Always validate before presenting complete YAML to the user. Never show unvalidated complete pipeline YAML.
- Always validate before deploying.
- Always show the user the complete YAML before deploying.
- For job-mode pipelines, remind the user that they are automatically cleaned up about an hour after completion.
- Use the blackhole sink for testing pipelines without writing to a real destination (see the sketch after this list).
- If the user wants to modify an existing pipeline, check whether it's streaming (update in place) or job-mode (must be deleted first).
- Default to `start_at: earliest` unless the user specifies otherwise.
- Always include `version: 1.0.0` on dataset sources.
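A minimal sketch of such a test setup, assuming the blackhole sink takes only a `type` and a `from` reference; confirm the exact field names with /turbo-pipelines:

```yaml
sinks:
  test_sink:
    type: blackhole      # assumed type value for the blackhole sink
    from: my_transform   # point at the transform (or source) you want to exercise
```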
## Related

- /turbo-pipelines: YAML configuration and architecture reference
- /turbo-doctor: Diagnose and fix pipeline issues
- /turbo-operations: Lifecycle commands and monitoring reference
- /turbo-transforms: SQL and TypeScript transform reference
- /datasets: Dataset names and chain prefixes
- /secrets: Sink credential management