# dbt-transforms

A dbt skill for Claude: comprehensive dbt guidance covering project structure, modeling, testing, CI/CD, and production patterns. Targets Snowflake, BigQuery, and DuckDB. Beginner-friendly with progressive scaling.
## When to Use This Skill

**Activate when:** creating/modifying dbt models, choosing materializations, structuring layers, setting up tests, CI/CD, sources, Jinja macros, or making analytics engineering decisions.

**Don't use for:** basic SQL syntax, warehouse administration, raw pipeline config (Fivetran/Airbyte), BI tool setup.
## Scope Constraints
- Targets Snowflake, BigQuery, and DuckDB; other warehouses may need adaptation.
- Assumes dbt Core 1.6+ or dbt Cloud; earlier versions lack governance features.
- Does not cover raw data ingestion, BI tool configuration, or warehouse admin.
- Reference files are loaded on demand -- do not pre-load multiple references.
## Model Routing
| reasoning_demand | preferred | acceptable | minimum |
|---|---|---|---|
| medium | Sonnet | Sonnet, Opus | Sonnet |
## Core Principles
- DRY via `ref()`/`source()` -- Never hardcode table names; use `ref()` or `source()` exclusively.
- Single Source of Truth -- Staging = entry point for raw data; marts = consumer interface.
- Idempotent Transformations -- Ensure `dbt run` is safe to re-execute.
- Test Everything -- Every model has at minimum PK uniqueness + not_null tests.
- Progressive Complexity -- Start with views/tables; add incremental only when volume demands it.
## Project Structure
```
dbt_project/
├── dbt_project.yml
├── packages.yml
├── profiles.yml              # Local only, never committed
├── models/
│   ├── staging/              # 1:1 with source tables
│   │   └── <source>/
│   │       ├── _<source>__models.yml
│   │       ├── _<source>__sources.yml
│   │       └── stg_<source>__<entity>.sql
│   ├── intermediate/         # Business logic, joins, pivots
│   └── marts/                # Business-facing facts and dimensions
├── macros/
├── tests/
├── seeds/
├── snapshots/
└── analyses/
```
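The layer folders map onto per-layer materialization defaults in `dbt_project.yml`. A minimal sketch (the project key `my_project` is a placeholder for your own project name):

```yaml
# dbt_project.yml (excerpt) -- default materializations by layer folder
models:
  my_project:
    staging:
      +materialized: view
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table
```

Individual models can still override these defaults with an inline `config()` block.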
## Modeling Methodology -- Medallion + Kimball

### Layer Decision Matrix
| Layer | Materialization | Purpose | Naming | Tests |
|---|---|---|---|---|
| Staging | `view` | Clean/rename raw data, 1:1 with source | `stg_<source>__<entity>` | `not_null`, `unique` on PK |
| Intermediate | `ephemeral` | Business logic, joins, pivots | `int_<entity>_<verb>ed` | Tested via downstream |
| Marts | `table`/`incremental` | Business-facing facts and dimensions | `fct_<entity>`, `dim_<entity>` | Full coverage |
| Reports | `table` | Pre-aggregated for dashboards | `rpt_<entity>` | Acceptance tests |
### Staging Model Pattern
```sql
-- stg_stripe__payments.sql
with source as (
    select * from {{ source('stripe', 'payments') }}
),

renamed as (
    select
        id as payment_id
        , order_id
        , customer_id
        , lower(payment_method) as payment_method
        , status as payment_status
        , amount / 100.0 as amount_dollars
        , created_at
        , updated_at
    from source
)

select * from renamed
```
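The matching YAML sketch for this model's minimum tests, following the file-naming conventions below:

```yaml
# _stripe__models.yml -- PK integrity tests for the staging model above
version: 2

models:
  - name: stg_stripe__payments
    columns:
      - name: payment_id
        data_tests:
          - unique
          - not_null
```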
### Intermediate / Marts (Compact)
```sql
-- int_payments_pivoted.sql: pivot payments by method per order
-- fct_orders.sql: join orders + pivoted payments into final fact table
-- Pattern: with <cte> as (select * from {{ ref('...') }}), final as (...) select * from final
```
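Expanded, the intermediate model might look like the sketch below; the payment-method values and the `payment_status = 'success'` filter are illustrative, not Stripe's actual enum:

```sql
-- int_payments_pivoted.sql (sketch): one row per order, one column per method
with payments as (
    select * from {{ ref('stg_stripe__payments') }}
),

pivoted as (
    select
        order_id
        , sum(case when payment_method = 'card' then amount_dollars end) as card_amount
        , sum(case when payment_method = 'bank_transfer' then amount_dollars end) as bank_transfer_amount
        , sum(amount_dollars) as total_amount
    from payments
    where payment_status = 'success'  -- illustrative status value
    group by 1
),

final as (
    select * from pivoted
)

select * from final
```

`fct_orders` follows the same shape: import CTEs selecting from `{{ ref('int_payments_pivoted') }}` and other intermediate models, joins in the middle, and a `final` CTE at the end.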
## Model Naming Conventions
| Layer | Pattern | Example |
|---|---|---|
| Staging | `stg_<source>__<entity>` | `stg_stripe__payments` |
| Intermediate | `int_<entity>_<verb>ed` | `int_payments_pivoted` |
| Facts | `fct_<entity>` | `fct_orders` |
| Dimensions | `dim_<entity>` | `dim_customers` |
| Reports | `rpt_<entity>` | `rpt_monthly_revenue` |
| YAML model config | `_<source>__models.yml` | `_stripe__models.yml` |
| YAML sources | `_<source>__sources.yml` | `_stripe__sources.yml` |
Leading underscore sorts config files above models in directory listings.
## SQL Style Guide
- Leading commas -- cleaner diffs
- Lowercase keywords -- `select`, not `SELECT`
- CTEs over subqueries -- always use `with` blocks
- Explicit columns -- no `select *` in marts (OK in the staging `with source` CTE)
- Final CTE -- name the last CTE `final`
- 4-space indentation
- One column per line in select statements
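Put together, a model body following these rules looks like this (table and column names are placeholders):

```sql
-- generic example demonstrating the style rules above
with orders as (
    select * from {{ ref('stg_shopify__orders') }}
),

final as (
    select
        order_id
        , customer_id
        , order_total
    from orders
)

select * from final
```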
## Materialization Decision Matrix
| Situation | Materialization | Why |
|---|---|---|
| Staging models | `view` | Always fresh, minimal storage |
| Intermediate logic | `ephemeral` | Zero cost, inlined as CTE |
| Marts < 100M rows | `table` | Simple, fast reads |
| Marts > 100M rows | `incremental` | Only process new/changed data |
| SCD Type 2 tracking | `snapshot` | Track historical changes |
For detailed incremental strategies, see: Incremental Models & Performance
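Before reaching for the reference, a minimal incremental sketch (the model name, upstream ref, and columns are illustrative; assumes a reliable `updated_at` timestamp on the source):

```sql
-- fct_orders.sql (sketch): merge-strategy incremental on Snowflake/BigQuery
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        incremental_strategy='merge'
    )
}}

select
    order_id
    , customer_id
    , order_total
    , updated_at
from {{ ref('int_orders_joined') }}

{% if is_incremental() %}
-- on incremental runs, only process rows newer than the current max in the target
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```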
## Source Configuration
```yaml
sources:
  - name: stripe
    database: raw
    schema: stripe
    loader: fivetran
    loaded_at_field: _fivetran_synced
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: payments
        columns:
          - name: id
            data_tests: [unique, not_null]
          - name: order_id
            data_tests:
              - not_null
              - relationships:
                  to: source('shopify', 'orders')
                  field: id
```
| Concept | Snowflake | BigQuery |
|---|---|---|
| Top-level container | Database | Project |
| Schema grouping | Schema | Dataset |
| Freshness field | `_fivetran_synced` | `_fivetran_synced` or `_PARTITIONTIME` |
## ref() and source() Rules

- `source()` only in staging -- staging models are the only gateway to raw data.
- `ref()` everywhere else -- all other models reference through `ref()`.
- Never skip layers -- marts must not `ref()` staging directly; go through intermediate.
- Never hardcode schema names -- use `source()` and `ref()` exclusively.
```sql
-- WRONG:   select * from raw.stripe.payments
-- WRONG:   {{ source('stripe', 'payments') }} in a marts model
-- WRONG:   {{ ref('stg_stripe__payments') }} in a marts model
-- CORRECT: {{ ref('int_payments_pivoted') }} in a marts model
```
## Basic Testing Overview
| Layer | Test Type | Examples |
|---|---|---|
| Sources | Freshness, existence | `loaded_at_field`, `not_null` on keys |
| Staging | PK integrity | `unique`, `not_null` on PK |
| Intermediate | Tested via downstream | -- |
| Marts | Full coverage | All keys, accepted values, relationships, row counts |
For deep testing strategies, see: Testing & Quality Strategy
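As a taste of what that guide covers, a singular test is just a SQL file under `tests/` that passes when it returns zero rows (the file name and referenced model are illustrative):

```sql
-- tests/assert_no_negative_order_totals.sql
-- Fails if any order has a negative total.
select
    order_id
    , order_total
from {{ ref('fct_orders') }}
where order_total < 0
```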
## Security Posture
See Security & Compliance Patterns for the full framework.
- Credentials: `profiles.yml` (never committed), using `env_var()` for all secrets.
- Auth: SSO/OAuth for local dev; key-pair (Snowflake) or workload identity (BigQuery) for CI/prod.
- Data classification: tag sensitive columns with `meta.data_classification` in model YAML.
- Never put real data in seed files or hardcode credentials.
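A minimal `profiles.yml` sketch for local Snowflake dev, assuming browser SSO and secrets supplied via environment variables (the profile, role, database, and warehouse names are placeholders):

```yaml
# profiles.yml -- local only, never committed
my_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      authenticator: externalbrowser  # browser-based SSO for local dev
      role: transformer
      database: analytics
      warehouse: transforming
      schema: dbt_dev
      threads: 4
```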
## Detailed Guides
Reference files loaded on demand:
- Testing & Quality Strategy -- Schema, generic, singular, unit tests, dbt-expectations, layer strategy
- CI/CD & Deployment -- Slim CI, GitHub Actions, dbt Cloud jobs, environment strategy, blue/green
- Jinja, Macros & Packages -- Jinja fundamentals, custom macros, packages, debugging
- Incremental Models & Performance -- Microbatch (1.9+), merge, delete+insert, warehouse tuning, cost monitoring
- Data Quality & Observability -- Source freshness, Elementary, anomaly detection, alerting, incident response
- Semantic Layer & Governance -- MetricFlow, contracts, versions, access controls, dbt Mesh
- Data Cleaning Patterns -- Deduplication, standardization, entity resolution, validation, NULL handling, cleaning layer architecture
- DuckDB Adapter -- dbt-duckdb setup, external file reading, DuckDB-specific SQL, export patterns, dev-to-prod portability
- Consulting Workflow -- dbt artifacts as deliverables, security tier awareness, project portability, client handoff
## Handoffs
- Comprehensive testing beyond dbt-native tests → data-testing (testing strategy, SQL assertions, pipeline validation, test-as-deliverable)
- Data governance features → data-governance (cataloging, lineage, classification, access control via dbt meta tags)