turbo-builder

Installation

SKILL.md

Pipeline Builder

Boundaries

Build NEW pipelines. Do not diagnose broken pipelines — that belongs to /turbo-doctor.
Do not serve as a YAML reference. If the user only needs to look up a field or syntax, use the /turbo-pipelines skill instead.
For dataset lookups, use /datasets.

Walk the user through building a complete pipeline from scratch, step by step. Generate a valid YAML configuration, validate it, and deploy it.

Builder Workflow

Step 1: Verify Authentication

Run goldsky project list 2>&1 to check login status.

If logged in: Note the current project and continue.
If not logged in: Use the /auth-setup skill for guidance.

Step 2: Understand the Goal

Ask the user what they want to index. Good questions:

What blockchain/chain? (Ethereum, Base, Polygon, Solana, etc.)
What data? (transfers, swaps, events from a specific contract, all transactions, etc.)
Where should the data go? (PostgreSQL, ClickHouse, Kafka, S3, etc.)
Do they need transforms? (filtering, aggregation, enrichment)
One-time backfill or continuous streaming?

If the user already described their goal, extract answers from their description.

Step 3: Choose the Dataset

Use the /datasets skill to find the right dataset.

Key points:

Common datasets: <chain>.decoded_logs, <chain>.raw_transactions, <chain>.erc20_transfers, <chain>.raw_traces
For decoded contract events, use <chain>.decoded_logs with a filter on address and topic0
For Solana: use solana.transactions, solana.token_transfers, etc.

Present the dataset choice to the user for confirmation.

Step 4: Configure the Source

Build the source section of the YAML:

sources:
  my_source:
    type: dataset
    dataset_name: <chain>.<dataset>
    version: 1.0.0
    start_at: earliest  # or a specific block number

Ask about:

Start block: earliest (from genesis), latest (from now), or a specific block number
End block: Only for job-mode/backfill pipelines. Omit for streaming.
Source-level filter: Optional filter to reduce data at the source (e.g., specific contract address)

Step 5: Configure Transforms (if needed)

If the user needs transforms, use the /turbo-transforms skill to help:

SQL transforms — filter, aggregate, join, or reshape data using DataFusion SQL
TypeScript transforms — custom logic, external API calls, complex processing
Dynamic tables — join with a PostgreSQL table or in-memory allowlist

Build the transforms section:

transforms:
  my_transform:
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM my_source
      WHERE <conditions>

Step 6: Configure the Sink(s)

Ask where the data should go. Use the /turbo-pipelines skill for sink configuration:

Sink	Key config
PostgreSQL	`secret_name`, `schema`, `table`, `primary_key`
ClickHouse	`secret_name`, `table`, `order_by`
Kafka	`secret_name`, `topic`
S3	`bucket`, `region`, `prefix`, `format`
Webhook	`url`, `format`

If the user names more than one destination, generate ONE pipeline with multiple sinks — do not generate a separate pipeline per destination. Each sink has a from: field that references the source (or a transform) by name, and sinks run independently. Use a fan-out pattern when different sinks want different views of the same source — add an SQL transform per view, then point each sink's from: at the appropriate transform. See references/architecture-patterns.md in /turbo-pipelines and templates/multi-sink-pipeline.yaml for examples.

Only split into separate pipelines when sources are fundamentally different (e.g., different chains with independent lifecycles) or the user explicitly asks for separate pipelines.

For sinks requiring secret_name, check if the secret exists:

goldsky secret list

If it doesn't exist, help create it using the /secrets skill.

Step 7: Choose Mode

Use the /turbo-pipelines skill for guidance:

Streaming (default) — continuous processing, no end_block, runs indefinitely
Job mode — one-time backfill, set job: true and end_block

Step 8: Generate, Validate, and Present

Assemble the complete pipeline YAML. Use a descriptive name following the convention: <chain>-<data>-<sink> (e.g., base-erc20-transfers-postgres).

Write the YAML file to disk (e.g., <pipeline-name>.yaml).
Run validation BEFORE showing the YAML to the user:

goldsky turbo validate -f <pipeline-name>.yaml

If validation fails, fix the issues and re-validate. Do NOT present the YAML until validation passes. Common fixes:
- Missing version field on dataset source
- Invalid dataset name (check chain prefix)
- Missing secret_name for database sinks
- SQL syntax errors in transforms
Once validation passes, present the full YAML to the user for review.

Step 9: Deploy

After user confirms the YAML looks good:

goldsky turbo apply <pipeline-name>.yaml

Step 10: Verify

After deployment:

goldsky turbo list

Suggest running inspect to verify data flow:

goldsky turbo inspect <pipeline-name> -p

To filter to a specific node: goldsky turbo inspect <pipeline-name> -n <node-name> -p.

Present a summary:

## Pipeline Deployed

**Name:** [name]
**Chain:** [chain]
**Dataset:** [dataset]
**Sink:** [sink type]
**Mode:** [streaming/job]

**Next steps:**
- Verify data flow with `goldsky turbo inspect <name> -p`
- Check logs with `goldsky turbo logs <name>`
- Use /turbo-doctor if you run into issues

Important Rules

Always validate before presenting complete YAML to the user. Never show unvalidated complete pipeline YAML.
Always validate before deploying.
Always show the user the complete YAML before deploying.
For job-mode pipelines, remind the user they auto-cleanup ~1hr after completion.
Use blackhole sink for testing pipelines without writing to a real destination.
If the user wants to modify an existing pipeline, check if it's streaming (update in place) or job-mode (must delete first).
Default to start_at: earliest unless the user specifies otherwise.
Always include version: 1.0.0 on dataset sources.

/turbo-pipelines — YAML configuration and architecture reference
/turbo-doctor — Diagnose and fix pipeline issues
/turbo-operations — Lifecycle commands and monitoring reference
/turbo-transforms — SQL and TypeScript transform reference
/datasets — Dataset names and chain prefixes
/secrets — Sink credential management

Related skills

More from goldsky-io/goldsky-agent

Installs

Repository

goldsky-io/goldsky-agent

GitHub Stars

First Seen

Mar 9, 2026

Security Audits

Gen Agent Trust HubPass

SocketPass

SnykPass

turbo-builder

Pipeline Builder

Boundaries

Builder Workflow

Step 1: Verify Authentication

Step 2: Understand the Goal

Step 3: Choose the Dataset

Step 4: Configure the Source

Step 5: Configure Transforms (if needed)

Step 6: Configure the Sink(s)

Step 7: Choose Mode

Step 8: Generate, Validate, and Present

Step 9: Deploy

Step 10: Verify

Important Rules

Related

More from goldsky-io/goldsky-agent

turbo-pipelines

secrets

datasets

turbo-doctor

turbo-transforms

auth-setup