wren-dlt-connector
wren-dlt-connector
Connect SaaS data to Wren Engine for SQL analysis — from zero to a verified, queryable project in one conversation.
Who this is for
Data analysts who know SQL and some Python, but may not have used dlt or Wren before. Explain concepts briefly when they first appear, but don't over-explain things a SQL-literate person would already know.
Overview
This skill walks through a four-phase workflow:
- Extract — Use dlt (data load tool) to pull data from a SaaS API into a local DuckDB file
- Model — Introspect the DuckDB schema and auto-generate a Wren semantic project (YAML models, relationships, profile)
- Build & Verify — Build the project and run actual SQL queries to confirm everything works end-to-end
- Handoff — Show the user their data and next steps
The user might enter at any phase. Ask which phase they're starting from — they may already have a .duckdb file and just need phases 2–4.
The goal is a project that actually queries successfully, not just files that look correct. Always run the verification step before declaring success.
Critical: DuckDB catalog naming
When wren engine connects to a DuckDB file, it ATTACHes it using the filename (without .duckdb extension) as the catalog alias:
ATTACH DATABASE 'stripe_data.duckdb' AS "stripe_data" (READ_ONLY)
This means every model's table_reference.catalog must equal the DuckDB filename stem. If the file is hubspot.duckdb, the catalog is hubspot. If it's my_pipeline.duckdb, the catalog is my_pipeline.
Getting this wrong causes "table not found" errors at query time. The introspect_dlt.py script handles this automatically.
Critical: Type normalization
Column types must be normalized using wren SDK's type_mapping.parse_type() function, which uses sqlglot to convert database-specific types (like DuckDB's HUGEINT, TIMESTAMP WITH TIME ZONE) into canonical SQL types that wren-core understands. Do not hardcode type mappings — always delegate to parse_type(raw_type, "duckdb").
The introspect_dlt.py script does this automatically when wren SDK is installed.
Phase 1: Extract — dlt Pipeline Setup
Step 1: Pick the SaaS source
Ask the user which SaaS service they want to connect. Read references/dlt_sources.md for a list of popular verified sources and their auth requirements. If the source isn't listed, check whether dlt has a verified source for it by searching dlthub.com/docs/dlt-ecosystem/verified-sources.
Step 2: Install dlt
pip install "dlt[duckdb]" --break-system-packages
Step 3: Write the pipeline script
Create a Python script that:
- Imports the dlt source function for the chosen SaaS
- Configures the pipeline with
destination='duckdb'and a local file path - Runs the pipeline with
pipeline.run(source)
Here's the general pattern — adapt it per source (check references/dlt_sources.md for source-specific templates):
import dlt
pipeline = dlt.pipeline(
pipeline_name="<source>_pipeline",
destination="duckdb",
dataset_name="<source>_data",
)
# Source-specific: check references/dlt_sources.md for auth patterns
source = <source_function>(api_key=dlt.secrets.value)
info = pipeline.run(source)
print(info)
Step 4: Set up credentials
dlt reads credentials from environment variables or .dlt/secrets.toml. The simplest approach for a one-time run:
# Set the credential as an environment variable
# The exact variable name depends on the source — check references/dlt_sources.md
export SOURCES__<SOURCE>__API_KEY="the-actual-key"
Ask the user for their API key or token. Remind them:
- Never commit credentials to git
- Environment variables are the simplest way for a one-time run
- For repeated use, they can create
.dlt/secrets.toml
Step 5: Run the pipeline
python <pipeline_script>.py
After the run, confirm:
- The pipeline completed without errors
- A
.duckdbfile was created (usually at<pipeline_name>.duckdb) - Print discovered tables and their column counts
import duckdb
con = duckdb.connect("<pipeline_name>.duckdb", read_only=True)
for row in con.execute("""
SELECT table_schema, table_name,
(SELECT COUNT(*) FROM information_schema.columns c
WHERE c.table_schema = t.table_schema AND c.table_name = t.table_name) as col_count
FROM information_schema.tables t
WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
AND table_name NOT LIKE '_dlt_%'
ORDER BY table_schema, table_name
""").fetchall():
print(f" {row[0]}.{row[1]} ({row[2]} columns)")
con.close()
Phase 2: Model — Generate Wren Project
Run the introspection script to auto-generate a complete Wren project from the DuckDB file:
python <path-to-this-skill>/scripts/introspect_dlt.py \
--duckdb-path <path-to-duckdb-file> \
--output-dir <project-directory> \
--project-name <name>
This script:
- Connects to the DuckDB file (read-only)
- Sets
table_reference.catalogto the DuckDB filename stem (matching wren engine's ATTACH behavior) - Discovers all tables and columns via
information_schema - Filters out dlt internal tables (
_dlt_loads,_dlt_pipeline_state, etc.) - Filters out dlt metadata columns (
_dlt_id,_dlt_load_id,_dlt_list_idx) from model definitions - Detects parent-child relationships from
_dlt_parent_idcolumns and table naming conventions - Normalizes column types using
wren.type_mapping.parse_type()(sqlglot-based) - Generates a complete v2 YAML project (wren_project.yml, models/, relationships.yml, instructions.md)
After running, show the user what was generated:
# Show project summary
cat <project-directory>/wren_project.yml
echo "---"
ls <project-directory>/models/
echo "---"
cat <project-directory>/relationships.yml
Verify model correctness
Spot-check one generated model to confirm:
table_reference.catalogmatches the DuckDB filename (e.g.,stripe_dataforstripe_data.duckdb)table_reference.schemamatches the DuckDB schema (usuallymain)- No
_dlt_*columns appear in the columns list - Column types look reasonable (VARCHAR, BIGINT, BOOLEAN, TIMESTAMP, etc.)
Set up the connection profile
Create a Wren profile so the user can query without specifying connection details every time. The url must point to the directory containing the .duckdb file (not the file itself):
import yaml
from pathlib import Path
wren_home = Path.home() / ".wren"
wren_home.mkdir(exist_ok=True)
profiles_file = wren_home / "profiles.yml"
existing = (yaml.safe_load(profiles_file.read_text()) or {}) if profiles_file.exists() else {}
existing.setdefault("profiles", {})
profile_name = "<source>_dlt"
existing["profiles"][profile_name] = {
"datasource": "duckdb",
"url": str(Path("<duckdb-path>").resolve().parent),
"format": "duckdb",
}
existing["active"] = profile_name
profiles_file.write_text(yaml.dump(existing, default_flow_style=False, sort_keys=False))
Phase 3: Build & Verify — The Project Must Actually Work
This phase is not optional. A project that generates YAML but fails at query time is not a success.
Step 1: Build the MDL
cd <project-directory>
wren context build
This compiles the YAML models into target/mdl.json. If this fails, fix the issues before proceeding (see Troubleshooting below).
Step 2: Validate with a real query
Run at least one query per generated model to confirm the project is functional:
# For each model, verify it resolves correctly
wren --sql 'SELECT COUNT(*) as total FROM "<table_name>"'
If any query fails, debug and fix the model before moving on. Common issues:
- Wrong catalog in table_reference → "table not found"
- Type mismatch → fix the column type in metadata.yml
- Missing profile → check
wren profile list
Step 3: Run interesting queries
Once basic queries pass, run 2–3 more interesting queries to show the user what their data looks like:
# Preview data
wren --sql 'SELECT * FROM "<table_name>" LIMIT 5'
# If there's a relationship, verify both models are queryable
wren --sql 'SELECT * FROM "<parent>" LIMIT 5'
wren --sql 'SELECT * FROM "<child>" LIMIT 5'
Show the results to the user and explain what they're seeing. This is their first look at the data through Wren — make it count.
Step 4: Confirm success
Only after queries return real data, tell the user the setup is complete. Summarize:
- How many models were created
- What relationships were detected
- Which profile is active
- Example queries they can try next
Troubleshooting
If wren context build fails:
- Check that
data_source: duckdbis set inwren_project.yml - Verify the DuckDB file path in the profile is correct
- Run
wren context validatefor detailed error messages
If queries fail with "table not found":
- Most likely cause:
table_reference.catalogdoesn't match the DuckDB filename. If the file ispipeline.duckdb, the catalog must bepipeline, not empty string. - Check the profile's
urlpoints to the directory containing the.duckdbfile - Table names with double underscores need quoting:
"hubspot__contacts"
If queries fail with type errors:
- Check column types in the model YAML — they should be canonical SQL types (VARCHAR, BIGINT, etc.)
- Re-run
introspect_dlt.pywith wren SDK installed to get proper type normalization
General:
- Check that the profile is active:
wren profile list - The DuckDB file might be locked if a dlt pipeline is running — wait for it to finish
Important notes
- dlt's
_dlt_parent_id/_dlt_idcolumns are kept in the actual DuckDB tables but hidden from Wren model definitions. They're only used in relationship conditions. - DuckDB has a single-writer limitation. Don't run a dlt sync while querying. For concurrent access, dlt should write to a separate file and swap atomically.
- The generated models use
table_reference(notref_sql) since they map directly to DuckDB tables created by dlt. - Column types are normalized using wren SDK's
parse_type()with sqlglot's DuckDB dialect. If a type looks wrong, the user can edit the model'smetadata.ymldirectly.
More from canner/wrenai
wren-generate-mdl
Generate a Wren MDL project by exploring a database with available tools (SQLAlchemy, database drivers, MCP connectors, or raw SQL). Guides agents through schema discovery, type normalization, and MDL YAML generation using the wren CLI. Use when: user wants to create or set up a new MDL, onboard a new data source, or scaffold a project from an existing database.
16wren-onboarding
Onboard a user to Wren Engine end-to-end. Walks through environment checks, project scaffolding, connection configuration via .env, and first query. Use when: user wants to install Wren Engine, set up a new data source connection, or bootstrap a new project from scratch. Triggers: '/wren-onboarding', 'install wren', 'set up wren engine', 'wren onboarding', 'connect new database to wren'.
15wren-usage
Wren Engine CLI workflow guide for AI agents. Answer data questions end-to-end using the wren CLI: gather schema context, recall past queries, write SQL through the MDL semantic layer, execute, and learn from confirmed results. Use when: user asks a data question, requests a report or analysis, asks about metrics, revenue, customers, orders, trends, or any business data; user says 'how many', 'show me', 'what is the', 'top N', 'compare', 'trend', 'growth', 'breakdown'; user wants to explore, analyze, filter, aggregate, or summarize data from a database; agent needs to query data, connect a data source, handle errors, or manage MDL changes via the wren CLI.
15