Spice Data Connectors

Data Connectors enable federated SQL queries across databases, data warehouses, data lakes, and files. Spice connects directly to your existing data sources and provides a unified SQL interface — no ETL pipelines required. The query planner (built on Apache DataFusion) optimizes and routes queries, including filter pushdown and column projection.

Cross-Source Federation

Query across multiple heterogeneous sources in one SQL statement:

```yaml
datasets:
  - from: postgres:customers
    name: customers
    params:
      pg_host: db.example.com
      pg_user: ${secrets:PG_USER}
  - from: s3://bucket/orders/
    name: orders
    params:
      file_format: parquet
  - from: snowflake:analytics.sales
    name: sales
```

```sql
-- Query across all three sources in one statement
SELECT c.name, o.order_total, s.region
FROM customers c
  JOIN orders o ON c.id = o.customer_id
  JOIN sales s ON o.id = s.order_id
WHERE s.region = 'EMEA';
```

Without acceleration, each query reads directly from the underlying sources, with filters pushed down to each source where the connector supports it.
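The three-way join above can be sketched in plain Python to show what the federated engine computes: rows from three independent sources matched on their keys, with the `WHERE` filter applied at the source. This is an in-memory illustration with invented sample rows, not Spice's implementation:

```python
# Stand-ins for rows fetched from Postgres, S3, and Snowflake respectively.
customers = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
orders = [{"id": 10, "customer_id": 1, "order_total": 250.0},
          {"id": 11, "customer_id": 2, "order_total": 90.0}]
sales = [{"order_id": 10, "region": "EMEA"}, {"order_id": 11, "region": "APAC"}]

def federated_join(region):
    # The WHERE s.region = ... predicate is applied to the sales source
    # first, mirroring how a federated planner pushes filters down
    # before joining across sources.
    rows = []
    for s in (s for s in sales if s["region"] == region):
        for o in (o for o in orders if o["id"] == s["order_id"]):
            for c in (c for c in customers if c["id"] == o["customer_id"]):
                rows.append({"name": c["name"],
                             "order_total": o["order_total"],
                             "region": s["region"]})
    return rows
```

With the sample rows, `federated_join("EMEA")` returns the single Acme row, matching what the SQL statement would produce.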

Basic Dataset Configuration

```yaml
datasets:
  - from: <connector>:<identifier>
    name: <dataset_name>
    params:
      # connector-specific parameters
    acceleration:
      enabled: true # optional: enable local materialization
```

Supported Connectors

Databases

| Connector | `from` Format | Status |
|-----------|---------------|--------|
| PostgreSQL (also Amazon Redshift) | `postgres:schema.table` | Stable |
| MySQL | `mysql:schema.table` | Stable |
| DuckDB | `duckdb:database.table` | Stable |
| MS SQL Server | `mssql:db.table` | Beta |
| MongoDB | `mongodb:collection` | Alpha |
| ClickHouse | `clickhouse:db.table` | Alpha |
| DynamoDB | `dynamodb:table` | Release Candidate |

Data Warehouses

| Connector | `from` Format | Status |
|-----------|---------------|--------|
| Snowflake | `snowflake:db.schema.table` | Beta |
| Databricks (Delta Lake) | `databricks:catalog.schema.table` | Stable |
| Spark | `spark:db.table` | Beta |

Data Lakes & Object Storage

| Connector | `from` Format | Status |
|-----------|---------------|--------|
| S3 | `s3://bucket/path/` | Stable |
| Delta Lake | `delta_lake:/path/to/delta/` | Stable |
| Iceberg | `iceberg:table` | Beta |
| Azure BlobFS | `abfs://container/path/` | Alpha |
| File (local) | `file:./path/to/data` | Stable |

Other Sources

| Connector | `from` Format | Status |
|-----------|---------------|--------|
| Spice.ai | `spice.ai:path/to/dataset` | Stable |
| Dremio | `dremio:source.table` | Stable |
| GitHub | `github:github.com/owner/repo/issues` | Stable |
| GraphQL | `graphql:endpoint` | Release Candidate |
| FlightSQL | `flightsql:query` | Beta |
| ODBC | `odbc:connection` | Beta |
| FTP/SFTP | `sftp://host/path/` | Alpha |
| HTTP/HTTPS | `https://url/path/data.csv` | Alpha |
| Kafka | `kafka:topic` | Alpha |
| Debezium CDC | `debezium:topic` | Alpha |
| SharePoint | `sharepoint:site/path` | Alpha |
| IMAP | `imap:mailbox` | Alpha |

Common Examples

PostgreSQL

```yaml
datasets:
  - from: postgres:public.users
    name: users
    params:
      pg_host: localhost
      pg_port: 5432
      pg_user: ${env:PG_USER}
      pg_pass: ${env:PG_PASS}
    acceleration:
      enabled: true
```

S3 with Parquet

```yaml
datasets:
  - from: s3://my-bucket/data/sales/
    name: sales
    params:
      file_format: parquet
      s3_region: us-east-1
    acceleration:
      enabled: true
      engine: duckdb
```

GitHub Issues

```yaml
datasets:
  - from: github:github.com/spiceai/spiceai/issues
    name: spiceai.issues
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    acceleration:
      enabled: true
      refresh_mode: append
      refresh_check_interval: 24h
      refresh_data_window: 14d
```
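The append-refresh settings read as: every `refresh_check_interval`, fetch rows newer than now minus `refresh_data_window` and append them locally. A sketch of that window arithmetic (illustrative only, not runtime code):

```python
from datetime import datetime, timedelta, timezone

def refresh_window_start(now, data_window=timedelta(days=14)):
    """Lower bound of the time range an append refresh would fetch,
    mirroring refresh_data_window: 14d."""
    return now - data_window

# A check running at 2026-01-20 with a 14d window fetches rows
# newer than 2026-01-06.
now = datetime(2026, 1, 20, tzinfo=timezone.utc)
start = refresh_window_start(now)
```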

Local File

```yaml
datasets:
  - from: file:./data/sales.parquet
    name: sales
```

File Formats

Connectors reading from object stores (S3, ABFS, GCS) or network storage (FTP, SFTP) support:

| Format | `file_format` | Status | Type |
|--------|---------------|--------|------|
| Apache Parquet | `parquet` | Stable | Structured |
| CSV | `csv` | Stable | Structured |
| Markdown | `md` | Stable | Document |
| Text | `txt` | Stable | Document |
| PDF | `pdf` | Alpha | Document |
| Microsoft Word | `docx` | Alpha | Document |

Document Formats

Document files (`md`, `txt`, `pdf`, `docx`) produce a table with `location` and `content` columns:

```yaml
datasets:
  - from: file:docs/decisions/
    name: my_documents
    params:
      file_format: md
```

```sql
SELECT location, content FROM my_documents LIMIT 5;
```
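A rough Python sketch of how a directory of document files maps to that two-column shape (a hypothetical helper for illustration; the real connector also handles binary formats like `pdf` and `docx`, which need text extraction):

```python
import os

def documents_table(root):
    """Map a directory tree of text files to (location, content) rows,
    mimicking the shape of a document-format dataset."""
    rows = []
    for dirpath, _dirs, files in os.walk(root):
        for fname in sorted(files):
            path = os.path.join(dirpath, fname)
            with open(path, encoding="utf-8") as f:
                rows.append({"location": path, "content": f.read()})
    return rows
```

Each file becomes one row, so `SELECT location, content FROM my_documents` scans one row per document.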

Hive Partitioning

```yaml
datasets:
  - from: s3://bucket/data/
    name: partitioned_data
    params:
      file_format: parquet
      hive_partitioning_enabled: true
```

```sql
SELECT * FROM partitioned_data WHERE year = '2024' AND month = '01';
```
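Hive partitioning derives virtual columns from the directory layout (e.g. `year=2024/month=01/`), which lets the engine prune files that cannot match the `WHERE` clause. A minimal parser for that naming scheme (an illustration, not Spice code):

```python
def hive_partitions(path):
    """Extract key=value partition columns from a Hive-style object path."""
    parts = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts
```

A path like `s3://bucket/data/year=2024/month=01/part-0.parquet` yields `{"year": "2024", "month": "01"}`, exposed as queryable columns.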

Dataset Naming

- `name: foo` creates `spice.public.foo`
- `name: myschema.foo` creates `spice.myschema.foo`
- Use `.` to organize datasets into schemas
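The naming rule above can be expressed as a small helper (illustrative only; `spice` is the catalog and `public` the default schema, per the rule stated):

```python
def qualified_name(name, catalog="spice", default_schema="public"):
    """Resolve a dataset `name` to its fully qualified table name."""
    schema, _, table = name.rpartition(".")
    return f"{catalog}.{schema or default_schema}.{table}"
```

So `qualified_name("foo")` gives `spice.public.foo` and `qualified_name("myschema.foo")` gives `spice.myschema.foo`.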
