# dbt-transforms

**dbt skill for Claude.** Comprehensive dbt guidance covering project structure, modeling, testing, CI/CD, and production patterns. Targets Snowflake, BigQuery, and DuckDB. Beginner-friendly with progressive scaling.
## When to Use This Skill

- **Activate when:** creating/modifying dbt models, choosing materializations, structuring layers, setting up tests, CI/CD, sources, Jinja macros, or making analytics engineering decisions.
- **Don't use for:** basic SQL syntax, warehouse administration, raw pipeline config (Fivetran/Airbyte), BI tool setup.
## Scope Constraints
- Targets Snowflake, BigQuery, and DuckDB; other warehouses may need adaptation.
- Assumes dbt Core 1.6+ or dbt Cloud; earlier versions lack governance features.
- Does not cover raw data ingestion, BI tool configuration, or warehouse admin.
- Reference files are loaded on demand -- do not pre-load multiple references.
## Model Routing
| reasoning_demand | preferred | acceptable | minimum |
|---|---|---|---|
| medium | Sonnet | Sonnet, Opus | Sonnet |
## Core Principles

- **DRY via `ref()`/`source()`** -- Never hardcode table names; use `ref()` or `source()` exclusively.
- **Single Source of Truth** -- Staging = entry point for raw data; marts = consumer interface.
- **Idempotent Transformations** -- Ensure `dbt run` is safe to re-execute.
- **Test Everything** -- Every model has at minimum PK uniqueness + not_null tests.
- **Progressive Complexity** -- Start with views/tables; add incremental only when volume demands it.
Project Structure
dbt_project/
├── dbt_project.yml
├── packages.yml
├── profiles.yml # Local only, never committed
├── models/
│ ├── staging/ # 1:1 with source tables
│ │ └── <source>/
│ │ ├── _<source>__models.yml
│ │ ├── _<source>__sources.yml
│ │ └── stg_<source>__<entity>.sql
│ ├── intermediate/ # Business logic, joins, pivots
│ └── marts/ # Business-facing facts and dimensions
├── macros/
├── tests/
├── seeds/
├── snapshots/
└── analyses/
## Modeling Methodology -- Medallion + Kimball

### Layer Decision Matrix

| Layer | Materialization | Purpose | Naming | Tests |
|---|---|---|---|---|
| Staging | view | Clean/rename raw data, 1:1 with source | `stg_<source>__<entity>` | not_null, unique on PK |
| Intermediate | ephemeral | Business logic, joins, pivots | `int_<entity>_<verb>ed` | Tested via downstream |
| Marts | table/incremental | Business-facing facts and dimensions | `fct_<entity>`, `dim_<entity>` | Full coverage |
| Reports | table | Pre-aggregated for dashboards | `rpt_<entity>` | Acceptance tests |
### Staging Model Pattern

```sql
-- stg_stripe__payments.sql
with source as (
    select * from {{ source('stripe', 'payments') }}
),

renamed as (
    select
        id as payment_id,
        order_id,
        customer_id,
        lower(payment_method) as payment_method,
        status as payment_status,
        amount / 100.0 as amount_dollars,
        created_at,
        updated_at
    from source
)

select * from renamed
```
### Intermediate / Marts (Compact)

```sql
-- int_payments_pivoted.sql: pivot payments by method per order
-- fct_orders.sql: join orders + pivoted payments into final fact table
-- Pattern: with <cte> as (select * from {{ ref('...') }}), final as (...) select * from final
```
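The compact pattern above can be expanded into a full mart. This is an illustrative sketch, not a prescribed model: the upstream model name `stg_shopify__orders` and the pivoted payment columns are assumptions.

```sql
-- fct_orders.sql (illustrative sketch; model and column names are assumptions)
with orders as (
    select * from {{ ref('stg_shopify__orders') }}
),

payments as (
    select * from {{ ref('int_payments_pivoted') }}
),

final as (
    select
        orders.order_id,
        orders.customer_id,
        orders.ordered_at,
        payments.credit_card_amount,
        payments.gift_card_amount,
        payments.total_amount
    from orders
    left join payments
        on orders.order_id = payments.order_id
)

select * from final
```

Note the mart references only `ref()`'d intermediate and staging models, never `source()` directly.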
## Model Naming Conventions

| Layer | Pattern | Example |
|---|---|---|
| Staging | `stg_<source>__<entity>` | `stg_stripe__payments` |
| Intermediate | `int_<entity>_<verb>ed` | `int_payments_pivoted` |
| Facts | `fct_<entity>` | `fct_orders` |
| Dimensions | `dim_<entity>` | `dim_customers` |
| Reports | `rpt_<entity>` | `rpt_monthly_revenue` |
| YAML model config | `_<source>__models.yml` | `_stripe__models.yml` |
| YAML sources | `_<source>__sources.yml` | `_stripe__sources.yml` |

The leading underscore sorts config files above models in directory listings.
## SQL Style Guide

- **Leading commas** -- cleaner diffs
- **Lowercase keywords** -- `select`, not `SELECT`
- **CTEs over subqueries** -- always use `with` blocks
- **Explicit columns** -- no `select *` in marts (OK in the staging `with source` CTE)
- **Final CTE** -- name the last CTE `final`
- **4-space indentation**
- **One column per line** in select statements
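The style rules above combined in one short sketch (the model and column names are illustrative, not part of the guide):

```sql
-- style example: leading commas, lowercase keywords, one column per line,
-- CTEs with a `final` CTE, 4-space indentation
with customers as (
    select * from {{ ref('stg_shopify__customers') }}
),

final as (
    select
        customer_id
        , first_name
        , last_name
        , created_at
    from customers
)

select * from final
```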
## Materialization Decision Matrix

| Situation | Materialization | Why |
|---|---|---|
| Staging models | view | Always fresh, minimal storage |
| Intermediate logic | ephemeral | Zero cost, inlined as CTE |
| Marts < 100M rows | table | Simple, fast reads |
| Marts > 100M rows | incremental | Only process new/changed data |
| SCD Type 2 tracking | snapshot | Track historical changes |
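A minimal incremental model sketch, assuming an events-style mart (the model name `stg_app__events` and its columns are hypothetical; strategy support varies by warehouse):

```sql
-- fct_events.sql (sketch; model and column names are assumptions)
{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='merge'
) }}

select
    event_id,
    user_id,
    event_type,
    occurred_at
from {{ ref('stg_app__events') }}

{% if is_incremental() %}
-- on incremental runs, only process rows newer than the latest loaded row
where occurred_at > (select max(occurred_at) from {{ this }})
{% endif %}
```

The `is_incremental()` guard keeps the model idempotent: a full refresh rebuilds everything, while routine runs touch only new data.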
For detailed incremental strategies, see: Incremental Models & Performance
## Source Configuration

```yaml
sources:
  - name: stripe
    database: raw
    schema: stripe
    loader: fivetran
    loaded_at_field: _fivetran_synced
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: payments
        columns:
          - name: id
            data_tests: [unique, not_null]
          - name: order_id
            data_tests:
              - not_null
              - relationships:
                  to: source('shopify', 'orders')
                  field: id
```
| Concept | Snowflake | BigQuery |
|---|---|---|
| Top-level container | Database | Project |
| Schema grouping | Schema | Dataset |
| Freshness field | `_fivetran_synced` | `_fivetran_synced` or `_PARTITIONTIME` |
## ref() and source() Rules

- **`source()` only in staging** -- staging models are the only gateway to raw data.
- **`ref()` everywhere else** -- all other models reference through `ref()`.
- **Never skip layers** -- marts must not `ref()` staging directly; go through intermediate.
- **Never hardcode schema names** -- use `source()` and `ref()` exclusively.
```sql
-- WRONG:   select * from raw.stripe.payments
-- WRONG:   {{ source('stripe', 'payments') }} in a marts model
-- WRONG:   {{ ref('stg_stripe__payments') }} in a marts model
-- CORRECT: {{ ref('int_payments_pivoted') }} in a marts model
```
## Basic Testing Overview
| Layer | Test Type | Examples |
|---|---|---|
| Sources | Freshness, existence | loaded_at_field, not_null on keys |
| Staging | PK integrity | unique, not_null on PK |
| Intermediate | Tested via downstream | -- |
| Marts | Full coverage | All keys, accepted values, relationships, row counts |
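A sketch of the marts-level coverage described above, in model YAML (the model, column names, and accepted values are illustrative):

```yaml
# _marts__models.yml (illustrative)
models:
  - name: fct_orders
    columns:
      - name: order_id
        data_tests: [unique, not_null]
      - name: customer_id
        data_tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: payment_status
        data_tests:
          - accepted_values:
              values: ['paid', 'pending', 'refunded']
```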
For deep testing strategies, see: Testing & Quality Strategy
## Security Posture

See Security & Compliance Patterns for the full framework.

- **Credentials**: `profiles.yml` (never committed) using `env_var()` for all secrets.
- **Auth**: SSO/OAuth for local dev; key-pair (Snowflake) or workload identity (BigQuery) for CI/prod.
- **Data classification**: tag sensitive columns with `meta.data_classification` in model YAML.
- **Never** put real data in seed files or hardcode credentials.
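A minimal local-dev `profiles.yml` sketch following these rules, assuming Snowflake with browser-based SSO (the profile name, role, and database are placeholders):

```yaml
# profiles.yml (local only, never committed; names are placeholders)
my_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      authenticator: externalbrowser   # SSO for local dev; no password in the file
      role: transformer
      database: analytics
      schema: dbt_dev
      threads: 4
```

Every secret comes from the environment via `env_var()`; CI/prod targets would swap in key-pair or workload-identity auth instead.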
## Detailed Guides
Reference files loaded on demand:
- Testing & Quality Strategy -- Schema, generic, singular, unit tests, dbt-expectations, layer strategy
- CI/CD & Deployment -- Slim CI, GitHub Actions, dbt Cloud jobs, environment strategy, blue/green
- Jinja, Macros & Packages -- Jinja fundamentals, custom macros, packages, debugging
- Incremental Models & Performance -- Microbatch (1.9+), merge, delete+insert, warehouse tuning, cost monitoring
- Data Quality & Observability -- Source freshness, Elementary, anomaly detection, alerting, incident response
- Semantic Layer & Governance -- MetricFlow, contracts, versions, access controls, dbt Mesh
- Data Cleaning Patterns -- Deduplication, standardization, entity resolution, validation, NULL handling, cleaning layer architecture
- DuckDB Adapter -- dbt-duckdb setup, external file reading, DuckDB-specific SQL, export patterns, dev-to-prod portability
- Consulting Workflow -- dbt artifacts as deliverables, security tier awareness, project portability, client handoff
## Handoffs
- Comprehensive testing beyond dbt-native tests → data-testing (testing strategy, SQL assertions, pipeline validation, test-as-deliverable)
- Data governance features → data-governance (cataloging, lineage, classification, access control via dbt meta tags)