Goldsky Mirror Pipelines
Mirror is Goldsky's original streaming pipeline product. It reads onchain data from a source (a subgraph entity or a direct-indexing dataset), optionally applies transforms, and writes the result to a sink (your database or message queue).
Mirror vs Turbo — which should you use?
| | Mirror | Turbo |
|---|---|---|
| Subgraph sources | Yes | No |
| Speed & reliability | Good | Faster, more reliable |
| Sink variety | 11 sink types | Growing — new sinks added regularly |
| Config complexity | Moderate | Simpler YAML |
| Dataset coverage | 130+ chains | 130+ chains, richer catalog |
Use Turbo unless you need a subgraph source. Turbo is faster, more reliable, and actively gaining feature parity with Mirror — especially sink support. If you don't have a subgraph requirement, say "help me build a Turbo pipeline" and the /turbo-builder skill will guide you through a faster setup.
How Mirror Pipelines Work
Source (subgraph entity or direct-indexing dataset)
↓
Transforms (optional SQL or external handlers)
↓
Sink (PostgreSQL, ClickHouse, Kafka, S3, etc.)
A pipeline is defined in a YAML file (apiVersion: 3) and deployed with goldsky pipeline apply.
Pipeline YAML Structure
Top-level fields:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Lowercase letters, numbers, and hyphens only. Under 50 characters. |
| apiVersion | number | yes | Always 3 |
| resource_size | string | no | s (default), m, l, xl, xxl |
| description | string | no | Pipeline description |
| sources | object | yes | At least one source |
| transforms | object | no | Use {} if none needed |
| sinks | object | yes | At least one sink |
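As a rough sketch of how these top-level fields fit together, the skeleton below fills in every field from the table. The pipeline name, description, source and sink keys, table name, and secret name are placeholders rather than values from a real project.

```yaml
# Skeleton only: every name below is a placeholder.
name: example-pipeline          # lowercase letters, numbers, hyphens; under 50 characters
apiVersion: 3                   # always 3
resource_size: s                # optional; s is the default
description: Example pipeline   # optional
sources:
  example_source:               # at least one source is required
    type: dataset
    dataset_name: base.logs
    version: 1.0.0
transforms: {}                  # use {} when no transforms are needed
sinks:
  example_sink:                 # at least one sink is required
    type: postgres
    table: example_table
    schema: public
    secret_name: EXAMPLE_SECRET
    from: example_source
```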
Sources
| Source type | YAML type value | Description |
|---|---|---|
| Subgraph entity | subgraph_entity | Mirror data from Goldsky-hosted subgraphs |
| Dataset (direct indexing) | dataset | Raw onchain datasets (blocks, logs, transactions, traces, transfers) |
Subgraph entity source
sources:
  subgraph_account:
    type: subgraph_entity
    name: account               # Entity name in your subgraph
    start_at: latest            # "earliest" or "latest" (default: latest)
    filter: ""                  # Optional SQL WHERE clause for fast scan
    subgraphs:
      - name: my-subgraph       # Deployed subgraph name
        version: 1.0.0
      - name: my-subgraph-arb   # Cross-chain: add more subgraphs
        version: 1.0.0
Fields: type (required: subgraph_entity), name (required: entity name), subgraphs (required: list of {name, version}), start_at (optional), filter (optional), description (optional).
Dataset source
sources:
  base_logs:
    type: dataset
    dataset_name: base.logs         # Use `goldsky dataset list` to discover names
    version: 1.0.0                  # Use `goldsky dataset get <name>` for versions
    start_at: latest                # "earliest" or "latest" (default: latest)
    filter: "address = '0x...'"     # Optional; enables Fast Scan for backfills
Fields: type (required: dataset), dataset_name (required), version (required), start_at (optional), filter (optional), description (optional).
Fast Scan: When filter is defined on a dataset source with start_at: earliest, the filter is pre-applied at the source level, making historical backfill much faster. Use attributes that exist in the dataset schema (goldsky dataset get <dataset_name> to check).
See docs.goldsky.com/mirror/sources/supported-sources.
Sinks
| Sink | YAML type value | Notes |
|---|---|---|
| PostgreSQL | postgres | Most common. OLTP, auto-creates tables, upsert via INSERT ON CONFLICT. Hosted option via NeonDB. |
| ClickHouse | clickhouse | OLAP. Uses ReplacingMergeTree by default; append_only_mode for best performance |
| MySQL | mysql | OLTP workloads |
| Elasticsearch | elasticsearch | Real-time search and analytics |
| Kafka | kafka | High-throughput streaming to a topic, configurable topic_partitions |
| Object Storage | file | S3, GCS, or R2. Parquet format, append-only, supports partition_columns |
| AWS SQS | sqs | Message queuing |
| Webhook | webhook | HTTP POST to an external endpoint |
All sinks writing to user-managed destinations require a Goldsky Secret (secret_name). Create one with goldsky secret create.
Sinks support schema_override for casting column types at the sink level (e.g., string to jsonb).
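The exact shape of schema_override is not spelled out here. The fragment below is only a sketch that assumes it takes a map from column name to target type, using the string-to-jsonb case mentioned above. The column name decoded_args is hypothetical, so check the sink reference before relying on this form.

```yaml
sinks:
  my_pg:
    type: postgres
    table: transfers
    schema: public
    secret_name: MY_PG_SECRET
    from: my_transform
    # Assumption: schema_override maps a column name to the type it is cast to.
    schema_override:
      decoded_args: jsonb   # hypothetical string column written as jsonb
```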
Sink examples
# PostgreSQL
sinks:
  my_pg:
    type: postgres
    table: transfers
    schema: public
    secret_name: MY_PG_SECRET
    from: my_transform

# ClickHouse
sinks:
  my_ch:
    type: clickhouse
    table: transfers
    database: my_db
    secret_name: MY_CH_SECRET
    from: my_source

# Kafka
sinks:
  my_kafka:
    type: kafka
    topic: accounts
    topic_partitions: 2
    secret_name: MY_KAFKA_SECRET
    from: my_source

# Object Storage (S3/GCS/R2)
sinks:
  my_s3:
    type: file
    path: s3://bucket/path/
    format: parquet
    secret_name: MY_S3_SECRET
    from: my_source

# SQS
sinks:
  my_sqs:
    type: sqs
    url: https://sqs.us-east-1.amazonaws.com/123456/my-queue
    secret_name: MY_SQS_SECRET
    from: my_source
See docs.goldsky.com/mirror/sinks/supported-sinks.
Transforms
| Type | YAML type value | Description |
|---|---|---|
| SQL | (none; default) | Filter, join, or reshape records with SQL |
| External handler | handler | POST records to an HTTP endpoint for custom logic |
SQL transform
transforms:
  filtered_logs:
    sql: SELECT id, block_number, address FROM base_logs WHERE block_number > 1000
    primary_key: id
SQL transforms reference source or transform names as table names. Supports chaining (one transform reads from another).
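To illustrate chaining, here is a minimal sketch in which a second transform uses the first transform's name as its table, and the sink reads from the last transform in the chain. The transform, sink, and table names are placeholders; base_logs is assumed to be a source defined elsewhere in the same file.

```yaml
transforms:
  # First transform reads directly from the source named base_logs.
  recent_logs:
    sql: SELECT id, block_number, address, data FROM base_logs WHERE block_number > 1000
    primary_key: id
  # Second transform reads from the first by using its name as the table name.
  recent_logs_slim:
    sql: SELECT id, block_number, address FROM recent_logs
    primary_key: id
sinks:
  my_pg:
    type: postgres
    table: recent_logs
    schema: public
    secret_name: MY_PG_SECRET
    from: recent_logs_slim   # the sink consumes the end of the chain
```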
Built-in decode functions:
- _gs_log_decode(abi, topics, data): decode raw log events
- _gs_tx_decode(abi, input, output): decode raw trace/transaction data
- _gs_fetch_abi(url, type): fetch an ABI from a URL (etherscan-compatible or raw JSON); fetched once at pipeline start
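A sketch of how the decode functions listed above might be combined in a SQL transform follows. Several details are assumptions: the ABI URL is a placeholder, the 'raw' type string is a guess based on the "raw JSON" option, and the topics column is assumed to exist in the logs dataset (verify with goldsky dataset get <dataset_name>).

```yaml
transforms:
  decoded_logs:
    primary_key: id
    sql: |
      SELECT
        id,
        block_number,
        transaction_hash,
        -- Assumption: 'raw' selects the raw-JSON ABI format; the URL is a placeholder.
        -- Backticks guard against column names clashing with SQL reserved words.
        _gs_log_decode(
          _gs_fetch_abi('https://example.com/abi.json', 'raw'),
          `topics`,
          `data`
        ) AS decoded
      FROM base_logs
```

Because the ABI is fetched once at pipeline start, a change to the ABI at that URL would presumably only be picked up after a restart.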
External handler transform
transforms:
  my_handler:
    type: handler
    primary_key: id
    url: http://example.com/transform
    from: my_source
    batch_size: 100                 # Records per batch (default: 100)
    batch_flush_interval: 1s        # Flush interval (default: 1s)
    payload_columns: [col1,col2]    # Optional: send subset of columns
    headers:                        # Optional custom headers
      X-Api-Key: my-key
- At-least-once delivery with exponential backoff on failure
- Max response time: 5 minutes; max connection time: 1 minute
- Supports schema_override for return type casting
See docs.goldsky.com/mirror/transforms/sql-transforms.
Full YAML Examples
Subgraph entity to PostgreSQL
name: my-subgraph-sync
apiVersion: 3
resource_size: s
sources:
  subgraph_transfer:
    type: subgraph_entity
    name: Transfer
    subgraphs:
      - name: uniswap-v3
        version: 1.0.0
transforms: {}
sinks:
  my_postgres:
    type: postgres
    table: transfers
    schema: public
    secret_name: MY_PG_SECRET
    from: subgraph_transfer
Dataset (direct indexing) to PostgreSQL with SQL transform
name: base-logs-filtered
apiVersion: 3
resource_size: s
sources:
  base_logs:
    type: dataset
    dataset_name: base.logs
    version: 1.0.0
    start_at: earliest
    filter: "address = '0x833589fcd6edb6e08f4c7c32d4f71b54bda02913'"
transforms:
  select_fields:
    sql: SELECT id, block_number, transaction_hash, data FROM base_logs
    primary_key: id
sinks:
  pg_logs:
    type: postgres
    table: base_logs
    schema: public
    secret_name: MY_PG_SECRET
    from: select_fields
No subgraph source? You should almost certainly use Turbo instead: it's faster, more reliable, and has a richer dataset catalog with simpler syntax. Use /turbo-builder to get started.
CLI Reference — All Pipeline Commands
Global options available on every command: --token <string> (CLI auth token), --color (colorize output, default true), -h, --help.
goldsky pipeline apply <config-path>
Create or update a pipeline from a YAML config file. Idempotent.
| Flag | Type | Description |
|---|---|---|
| --status | ACTIVE, INACTIVE, PAUSED | Desired pipeline status |
| --from-snapshot | string | Snapshot to start from: last, new, none, or a snapshot ID. last = latest available. new = create a fresh snapshot first. none = start from scratch. Default: new |
| --force | boolean | Skip confirmation prompts (useful for CI) |
| --skip-transform-validation | boolean | Skip transform validation on update |
| --save-progress | boolean | (deprecated, use --from-snapshot) Attempt snapshot before applying |
| --use-latest-snapshot | boolean | (deprecated, use --from-snapshot) Start from latest snapshot |
| --skip-validation | boolean | (deprecated) Same as --skip-transform-validation |
goldsky pipeline apply my-pipeline.yaml --status ACTIVE
goldsky pipeline apply my-pipeline.yaml --status ACTIVE --from-snapshot last
goldsky pipeline apply my-pipeline.yaml --force # CI/CD usage
goldsky pipeline start <nameOrConfigPath>
Start a pipeline (equivalent to apply with --status ACTIVE).
| Flag | Type | Description |
|---|---|---|
| --from-snapshot | string | last, new, none, or snapshot ID |
| --use-latest-snapshot | boolean | (deprecated, use --from-snapshot) |
goldsky pipeline stop <nameOrConfigPath>
Stop a pipeline without taking a snapshot. Sets status to INACTIVE, runtime to TERMINATED.
No additional flags beyond global options.
goldsky pipeline pause <nameOrConfigPath>
Pause a pipeline with a snapshot so it can resume from where it left off. Sets status to PAUSED, runtime to TERMINATED.
No additional flags beyond global options.
goldsky pipeline restart <nameOrConfigPath>
Restart a pipeline without configuration changes. Useful when the sink database was restarted, connection is stuck, etc.
| Flag | Type | Description |
|---|---|---|
| --from-snapshot | string | Required. last, new, none, or snapshot ID |
| --disable-monitoring | boolean | Skip monitoring after restart (default: false) |
goldsky pipeline restart my-pipeline --from-snapshot last
goldsky pipeline restart my-pipeline --from-snapshot none # restart from scratch
goldsky pipeline get <nameOrConfigPath>
Get pipeline configuration and status.
| Flag | Type | Description |
|---|---|---|
| --outputFormat, --output | json, table, yaml | Output format (default: yaml) |
| --definition | boolean | Print only the pipeline definition (sources, transforms, sinks) |
| -v, --version | string | Pipeline version (default: latest) |
goldsky pipeline list
List all pipelines in the project.
| Flag | Type | Description |
|---|---|---|
| --output, --outputFormat | json, table, yaml | Output format (default: table) |
| --outputVerbosity | summary, usablewithapplycmd, all | Detail level (default: summary) |
| --include-runtime-details | boolean | Include runtime status and errors (default: false) |
goldsky pipeline list --output json
goldsky pipeline list --include-runtime-details
goldsky pipeline info <nameOrConfigPath>
Display pipeline information (status, config, runtime details).
| Flag | Type | Description |
|---|---|---|
| -v, --version | string | Pipeline version (default: latest) |
goldsky pipeline monitor <nameOrConfigPath>
Monitor pipeline runtime — status, metrics (records received/written), errors. Refreshes every 10 seconds.
| Flag | Type | Description |
|---|---|---|
| --update-request | boolean | Monitor an in-flight update request |
| --max-refreshes, --maxRefreshes | number | Max number of data refreshes |
| -v, --version | string | Pipeline version (default: latest) |
goldsky pipeline delete <nameOrConfigPath>
Delete a pipeline permanently.
| Flag | Type | Description |
|---|---|---|
| -f, --force | boolean | Force deletion without confirmation prompt (default: false) |
goldsky pipeline resize <nameOrConfigPath> <resourceSize>
Change the compute resources for a pipeline.
| Positional | Description |
|---|---|
| resourceSize | One of: s, m, l, xl, xxl (default: s) |
goldsky pipeline resize my-pipeline l
goldsky pipeline validate [config-path]
Validate a pipeline YAML config without deploying.
| Flag | Type | Description |
|---|---|---|
| --definition | string | (deprecated) Inline JSON definition |
| --definition-path | string | (deprecated) Path to JSON/YAML definition |
goldsky pipeline validate my-pipeline.yaml
goldsky pipeline export [name]
Export pipeline configuration.
| Flag | Type | Description |
|---|---|---|
| --all | boolean | Export configs for all pipelines |
goldsky pipeline cancel-update <nameOrConfigPath>
Cancel an in-flight update or snapshot request. Useful when a long-running snapshot blocks a needed update.
No additional flags beyond global options.
goldsky pipeline create <name> (interactive/guided)
Guided CLI experience for creating a pipeline interactively.
| Flag | Type | Description |
|---|---|---|
| --resource-size, --resourceSize | s, m, l, xl, xxl | Resource size (default: s) |
| --use-dedicated-ip | boolean | Use dedicated egress IPs (default: false) |
| --skip-transform-validation | boolean | Skip transform validation |
| --status | ACTIVE, INACTIVE | (deprecated, use pipeline start/stop/pause) |
| --description | string | (deprecated, use pipeline apply) |
| --definition | string | (deprecated, use pipeline apply) |
| --definition-path | string | (deprecated, use pipeline apply) |
| --output, --outputFormat | json, table, yaml | Output format (default: yaml) |
goldsky pipeline get-definition <name> (deprecated)
Get a shareable pipeline definition. Use goldsky pipeline get <name> --definition instead.
Snapshot Commands
# List snapshots for a pipeline
goldsky pipeline snapshots list <nameOrConfigPath> [-v <version>]
# Create a snapshot manually
goldsky pipeline snapshots create <nameOrConfigPath>
snapshots list supports -v, --version to filter by pipeline version (default: all versions).
Lifecycle Quick Reference
| Action | Command |
|---|---|
| Deploy / start | goldsky pipeline apply <file.yaml> --status ACTIVE |
| Start (existing) | goldsky pipeline start <name> |
| Pause (with snapshot) | goldsky pipeline pause <name> |
| Stop (no snapshot) | goldsky pipeline stop <name> |
| Restart (no config change) | goldsky pipeline restart <name> --from-snapshot last |
| Update config | goldsky pipeline apply <file.yaml> (edit YAML first) |
| Resize | goldsky pipeline resize <name> <size> |
| Validate YAML | goldsky pipeline validate <file.yaml> |
| Monitor | goldsky pipeline monitor <name> |
| Get config | goldsky pipeline get <name> --definition |
| Export config | goldsky pipeline export <name> |
| Delete | goldsky pipeline delete <name> -f |
| Cancel in-flight op | goldsky pipeline cancel-update <name> |
| List snapshots | goldsky pipeline snapshots list <name> |
| Create snapshot | goldsky pipeline snapshots create <name> |
| List all pipelines | goldsky pipeline list |
Pause vs. Stop:
- pause: takes a snapshot and suspends the pipeline (status: PAUSED + TERMINATED). Can resume from where it left off.
- stop: stops without taking a snapshot (status: INACTIVE + TERMINATED). Resuming may reprocess data.

Desired statuses: ACTIVE, INACTIVE, PAUSED
Runtime statuses: STARTING, RUNNING, FAILING, TERMINATED
Snapshots
Snapshots capture a point-in-time state of a RUNNING pipeline for resumption. They contain progress on reading sources and SQL transform state — not sink state.
- Automatic snapshots are taken every 4 hours for healthy RUNNING pipelines.
- Before updates: a snapshot is created automatically before applying config changes to a RUNNING pipeline.
- On pause: a snapshot is created when pausing.
- Manual: goldsky pipeline snapshots create <name>.
- Resume: only the latest snapshot can be used. For older snapshots, contact support.
The --from-snapshot flag (on apply, start, restart) controls snapshot behavior:
- new: create a fresh snapshot, then start from it (default)
- last: use the latest existing snapshot (no new snapshot)
- none: start from scratch, no snapshot
- <snapshot-id>: use a specific snapshot
Resource Sizing
Set via resource_size in YAML or goldsky pipeline resize <name> <size>.
| Size | Description |
|---|---|
| s | Default. Handles most use cases, backfill of small chains, up to 300K records/sec, up to ~8 subgraph sources |
| m, l, xl, xxl | Larger compute, for backfilling large chains or large JOINs |
Start small and scale up if needed. Resource size affects pricing.
Networking
- Mirror pipelines write data from AWS us-west-2. Ensure your sink allows inbound connections from this region.
- IP addresses are dynamic by default.
- Dedicated egress IPs available on request: use --use-dedicated-ip on pipeline create, or contact support@goldsky.com.
- VPC peering available on request.
- For external handler transforms, deploy close to us-west-2 for best performance (aim for p95 < 100ms).
Dataset Discovery
# List available datasets
goldsky dataset list
# Get schema for a specific dataset
goldsky dataset get <dataset_name>
Common Questions
Can Mirror pipelines use subgraphs as a source?
Yes — this is Mirror's primary advantage over Turbo. Set type: subgraph_entity in your source and reference your deployed subgraph.
Can Mirror handle multiple sources or cross-chain data?
Yes — define multiple sources in the YAML and use SQL transforms to join or merge them. For subgraphs, you can list multiple subgraphs (different chains) in a single source's subgraphs array.
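As an illustration of the multi-source case, the sketch below merges two dataset sources with a single SQL transform. It assumes the SQL dialect accepts UNION ALL and that both datasets share the selected columns. The dataset name other_chain.logs, its version, and the added chain column are hypothetical.

```yaml
sources:
  base_logs:
    type: dataset
    dataset_name: base.logs
    version: 1.0.0
  other_chain_logs:
    type: dataset
    dataset_name: other_chain.logs   # hypothetical; find real names with `goldsky dataset list`
    version: 1.0.0                   # placeholder; check `goldsky dataset get <name>`
transforms:
  merged_logs:
    # Merge both sources into one stream; the chain column is added for illustration.
    sql: |
      SELECT id, block_number, address, 'base' AS chain FROM base_logs
      UNION ALL
      SELECT id, block_number, address, 'other' AS chain FROM other_chain_logs
    primary_key: id
sinks:
  pg_merged:
    type: postgres
    table: merged_logs
    schema: public
    secret_name: MY_PG_SECRET
    from: merged_logs
```

This sketch also assumes id values do not collide across the two datasets; if they can, the primary key would need to incorporate the chain.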
My pipeline needs more resources / is too slow?
Run goldsky pipeline resize <name> l (or xl, xxl). Start small and scale up.
My pipeline is ACTIVE but TERMINATED — what happened?
The desired status is ACTIVE but the runtime failed (e.g., bad secret, sink unavailable, resource issues). Check errors with goldsky pipeline monitor <name> or goldsky pipeline list --include-runtime-details, or view the dashboard. Fix the issue and restart.
How do I update a pipeline without losing progress?
Edit your YAML and run goldsky pipeline apply <file.yaml>. By default, a snapshot is taken before the update is applied. Use --from-snapshot last to skip creating a new snapshot and use the latest existing one.
A long snapshot is blocking my update — what do I do?
Run goldsky pipeline cancel-update <name> to cancel the in-flight operation, then reapply with --from-snapshot last or --from-snapshot none.
Related
- /turbo-builder: Build a new Turbo pipeline (recommended for new projects not using subgraph sources)
- /subgraphs: Deploy and manage the subgraph you want to sync via Mirror
- /secrets: Create secrets for sink credentials
- /datasets: Browse available dataset names and chain prefixes
- Goldsky docs: docs.goldsky.com/mirror/introduction
- Pipeline config reference: docs.goldsky.com/mirror/reference/config-file/pipeline