Connect to Data Sources

Spice federates SQL queries across 30+ data sources without ETL. Connect databases, data lakes, warehouses, and APIs, then query across them with standard SQL.

How Federation Works

Configure datasets pointing to different sources. Spice's query planner (built on Apache DataFusion) optimizes and routes queries with filter pushdown and column projection:

datasets:
  - from: postgres:customers
    name: customers
    params:
      pg_host: db.example.com
      pg_user: ${ secrets:PG_USER }
  - from: s3://bucket/orders/
    name: orders
    params:
      file_format: parquet
  - from: snowflake:analytics.sales
    name: sales

-- Query across all three sources in one statement
SELECT c.name, o.order_total, s.region
FROM customers c
  JOIN orders o ON c.id = o.customer_id
  JOIN sales s ON o.id = s.order_id
WHERE s.region = 'EMEA';
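
To try this locally (assuming the Spice CLI is installed and the datasets above are saved in spicepod.yaml), start the runtime in one terminal and open a SQL REPL in another:

spice run
spice sql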

Dataset Configuration

datasets:
  - from: <connector>:<identifier>
    name: <dataset_name>
    params:
      # connector-specific parameters
    acceleration:
      enabled: true # optional: materialize locally (see spice-acceleration)

Supported Connectors

Databases

| Connector | From Format | Status |
| --- | --- | --- |
| PostgreSQL (also Amazon Redshift) | postgres:schema.table | Stable |
| MySQL | mysql:schema.table | Stable |
| DuckDB | duckdb:database.table | Stable |
| MS SQL Server | mssql:db.table | Beta |
| DynamoDB | dynamodb:table | Release Candidate |
| MongoDB | mongodb:collection | Alpha |
| ClickHouse | clickhouse:db.table | Alpha |

Data Warehouses

| Connector | From Format | Status |
| --- | --- | --- |
| Snowflake | snowflake:db.schema.table | Beta |
| Databricks (Delta Lake) | databricks:catalog.schema.table | Stable |
| Spark | spark:db.table | Beta |

Data Lakes & Object Storage

| Connector | From Format | Status |
| --- | --- | --- |
| S3 | s3://bucket/path/ | Stable |
| Delta Lake | delta_lake:/path/to/delta/ | Stable |
| Iceberg | iceberg:table | Beta |
| Azure BlobFS | abfs://container/path/ | Alpha |
| File (local) | file:./path/to/data | Stable |

Other Sources

| Connector | From Format | Status |
| --- | --- | --- |
| Spice.ai | spice.ai:path/to/dataset | Stable |
| Dremio | dremio:source.table | Stable |
| GitHub | github:github.com/owner/repo/issues | Stable |
| GraphQL | graphql:endpoint | Release Candidate |
| FlightSQL | flightsql:query | Beta |
| ODBC | odbc:connection | Beta |
| FTP/SFTP | sftp://host/path/ | Alpha |
| HTTP/HTTPS | https://url/path/data.csv | Alpha |
| Kafka | kafka:topic | Alpha |
| Debezium CDC | debezium:topic | Alpha |
| SharePoint | sharepoint:site/path | Alpha |
| IMAP | imap:mailbox | Alpha |

Common Examples

PostgreSQL

datasets:
  - from: postgres:public.users
    name: users
    params:
      pg_host: localhost
      pg_port: 5432
      pg_user: ${ env:PG_USER }
      pg_pass: ${ env:PG_PASS }
    acceleration:
      enabled: true

S3 with Parquet

datasets:
  - from: s3://my-bucket/data/sales/
    name: sales
    params:
      file_format: parquet
      s3_region: us-east-1
    acceleration:
      enabled: true
      engine: duckdb

GitHub Issues

datasets:
  - from: github:github.com/spiceai/spiceai/issues
    name: spiceai.issues
    params:
      github_token: ${ secrets:GITHUB_TOKEN }
    acceleration:
      enabled: true
      refresh_mode: append
      refresh_check_interval: 24h
      refresh_data_window: 14d
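
A follow-up query sketch (column names such as title and state are assumptions; check the connector's schema):

SELECT title, state
FROM spiceai.issues
WHERE state = 'open'
LIMIT 20;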

Local File

datasets:
  - from: file:./data/sales.parquet
    name: sales

File Formats

Connectors that read from object stores (S3, ABFS) or network storage (FTP, SFTP) support the following formats:

| Format | file_format | Type |
| --- | --- | --- |
| Apache Parquet | parquet | Structured |
| CSV | csv | Structured |
| Markdown | md | Document |
| Text | txt | Document |
| PDF | pdf | Document |
| Microsoft Word | docx | Document |

Document files produce a table with location and content columns:

datasets:
  - from: file:docs/decisions/
    name: my_documents
    params:
      file_format: md
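
Querying the resulting table (the LIKE filter is illustrative):

SELECT location, content
FROM my_documents
WHERE content LIKE '%decision%';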

Hive Partitioning

With hive_partitioning_enabled: true, Spice infers partition columns from Hive-style key=value segments in the object path and exposes them as queryable columns:

datasets:
  - from: s3://bucket/data/
    name: partitioned_data
    params:
      file_format: parquet
      hive_partitioning_enabled: true
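
A bucket layout like the following (paths are illustrative) yields year and month partition columns:

s3://bucket/data/year=2024/month=01/part-0.parquet
s3://bucket/data/year=2024/month=02/part-0.parquet

Filters on partition columns prune which files are scanned:
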
SELECT * FROM partitioned_data WHERE year = '2024' AND month = '01';

Dataset Naming

  • name: foo → queried as spice.public.foo
  • name: myschema.foo → queried as spice.myschema.foo
  • Use . to organize datasets into schemas
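
For example, a dataset named with a schema prefix (the crm schema here is illustrative) is queried under that schema:

datasets:
  - from: postgres:public.customers
    name: crm.customers

SELECT * FROM crm.customers LIMIT 10;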

Catalogs

Catalog connectors expose external data catalogs, preserving the source schema hierarchy. Tables are accessed as <catalog>.<schema>.<table>.

Note: Acceleration is not supported for catalog tables. Use datasets for accelerated access.

catalogs:
  - from: <connector>
    name: <catalog_name>
    params:
      # connector-specific parameters
    include:
      - 'schema.*' # optional: filter with glob patterns

Supported Catalogs

| Connector | From Value | Status |
| --- | --- | --- |
| Unity Catalog | unity_catalog | Stable |
| Databricks | databricks | Beta |
| Iceberg | iceberg | Beta |
| Spice.ai | spice.ai | Beta |
| AWS Glue | glue | Alpha |

Catalog Example

catalogs:
  - from: unity_catalog
    name: unity
    params:
      unity_catalog_endpoint: https://my-workspace.cloud.databricks.com
      databricks_token: ${ secrets:DATABRICKS_TOKEN }
    include:
      - 'my_schema.*'

SELECT * FROM unity.my_schema.customers LIMIT 10;

Views

Views are virtual tables defined by SQL queries — useful for pre-aggregations, transformations, and simplified access:

views:
  - name: daily_sales
    sql: |
      SELECT DATE(created_at) as date, SUM(amount) as total, COUNT(*) as orders
      FROM orders
      GROUP BY DATE(created_at)

  - name: order_details
    sql: |
      SELECT o.id, c.name as customer, p.name as product, o.quantity
      FROM orders o
      JOIN customers c ON o.customer_id = c.id
      JOIN products p ON o.product_id = p.id

Views can be accelerated:

views:
  - name: rankings
    sql: |
      SELECT product_id, SUM(quantity) as total_sold
      FROM orders GROUP BY product_id ORDER BY total_sold DESC LIMIT 100
    acceleration:
      enabled: true
      refresh_check_interval: 1h

Views are read-only and queried like regular tables: SELECT * FROM daily_sales.

Writing Data

Spice supports writing to Apache Iceberg tables and Amazon S3 Tables via INSERT INTO:

datasets:
  - from: iceberg:https://catalog.example.com/v1/namespaces/sales/tables/transactions
    name: transactions
    access: read_write # required for writes

INSERT INTO transactions SELECT * FROM staging_transactions;
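
A quick sanity check after the write (plain SQL, nothing connector-specific):

SELECT COUNT(*) AS inserted_rows FROM transactions;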

Referencing Secrets

Use ${ store_name:KEY } syntax in params. See spice-secrets for full configuration:

params:
  pg_user: ${ env:PG_USER }
  pg_pass: ${ secrets:PG_PASSWORD }
