unity-catalog-constraints
Unity Catalog Primary Key and Foreign Key Constraints
Overview
Unity Catalog supports informational constraints (NOT ENFORCED) that enrich metadata, improve query optimization, and enable BI tool relationship discovery. This skill standardizes PK/FK implementation for star schema dimensional models following Kimball dimensional modeling best practices.
Core Principle: Surrogate keys are PRIMARY KEYS. Facts reference surrogate PKs via FOREIGN KEY constraints. Business keys have UNIQUE constraints for alternate lookups.
When to Use This Skill
Use this skill when:
- Creating Gold layer tables with star schema dimensional models
- Implementing PRIMARY KEY and FOREIGN KEY constraints in Unity Catalog
- Setting up SCD Type 1 or Type 2 dimension tables
- Creating fact tables that reference dimension tables
- Troubleshooting constraint application errors
- Preventing common production deployment failures
Critical Rules
1. Surrogate Keys as Primary Keys
❌ WRONG: Business keys as PKs, facts reference business keys
✅ CORRECT: Surrogate keys as PKs, facts reference surrogate keys
CREATE TABLE dim_store (
store_key STRING NOT NULL, -- ✅ Surrogate PK
store_number STRING NOT NULL, -- Business key (UNIQUE)
CONSTRAINT pk_dim_store PRIMARY KEY (store_key) NOT ENFORCED,
CONSTRAINT uk_store_number UNIQUE (store_number) NOT ENFORCED
)
CREATE TABLE fact_sales (
store_key STRING NOT NULL, -- ✅ Surrogate FK
CONSTRAINT fk_sales_store FOREIGN KEY (store_key)
REFERENCES dim_store(store_key) NOT ENFORCED
)
2. NOT NULL Requirement for Primary Keys
All surrogate key columns used in PKs MUST be NOT NULL.
Error: Cannot create the primary key because its child column(s) is nullable.
3. Foreign Keys Must Reference Primary Keys
Foreign keys MUST reference PRIMARY KEY columns, even when NOT ENFORCED.
- FKs can only reference PK columns (not any column)
- FKs must reference exact column(s) in PK definition
4. Never Define FK Constraints Inline in CREATE TABLE
❌ WRONG: FK constraints inline during table creation
✅ CORRECT: Define PKs inline, apply FKs later via ALTER TABLE
-- Step 1: CREATE TABLE with PK only
CREATE TABLE fact_sales (
CONSTRAINT pk_fact_sales PRIMARY KEY (...) NOT ENFORCED -- ✅ OK
)
-- Step 2: Apply FKs after all tables exist (separate script)
ALTER TABLE fact_sales
ADD CONSTRAINT fk_sales_store
FOREIGN KEY (store_key)
REFERENCES dim_store(store_key) NOT ENFORCED -- ✅ Safe
Key Principles:
- ✅ Inline PKs are safe - Define PRIMARY KEY constraints directly in CREATE TABLE
- ❌ Inline FKs fail - Never define FOREIGN KEY constraints in CREATE TABLE
- ✅ Separate constraint script - Create dedicated
constraints.pyto apply all FKs after tables exist - ✅ Dependency order matters - Reference tables (dims) before fact tables, constraints last
Quick Reference
Dimension Table (SCD Type 1)
- Surrogate key (
*_key) is PRIMARY KEY - Business key has UNIQUE constraint
- All PK columns are NOT NULL
- PK defined inline in CREATE TABLE
Dimension Table (SCD Type 2)
- Surrogate key (
*_key) is PRIMARY KEY - Business key does NOT have UNIQUE (multiple versions allowed)
is_currentflag indicates current version- Facts join to surrogate key, filter
WHERE is_current = TRUE
Fact Table
- Composite PK on surrogate FKs (defines grain)
- Surrogate FKs reference dimension PKs
- Business keys included for readability
- NO inline FK constraints (apply separately via ALTER TABLE)
Date Type Casting
Rule: Always CAST(DATE_TRUNC(...) AS DATE) when populating DATE columns.
Constraint Naming Conventions
| Constraint Type | Pattern | Example |
|---|---|---|
| Primary Key | pk_<table_name> |
pk_dim_store |
| Foreign Key | fk_<fact_table>_<dimension> |
fk_sales_store |
| Unique (Business Key) | uk_<table>_<column> |
uk_store_number |
Quick Reference Checklist
Table Creation Phase
- All dimension tables define PRIMARY KEY inline in CREATE TABLE
- NO fact tables define FOREIGN KEY inline in CREATE TABLE
- Date columns use explicit
CAST(DATE_TRUNC(...) AS DATE) - All surrogate key columns are
NOT NULL - Reference dimension (dim_date) created before facts that reference it
Constraint Application Phase
- Separate
constraints.pyscript applies ALL FKs via ALTER TABLE - Constraint script runs AFTER all table creation tasks
- Primary keys applied to dimensions before foreign keys applied to facts
- Each FK references an existing PK column
- Constraint names are descriptive (e.g.,
pk_dim_date,fk_sales_date)
Validation Phase
- All surrogate key columns are
NOT NULL - PKs defined on surrogate keys (proper dimensional modeling)
- FKs reference surrogate PK columns exactly
- Facts have surrogate FK columns (store_key, product_key, date_key)
- Business keys have UNIQUE constraints (where applicable)
- All constraints include
NOT ENFORCED
Common Mistakes to Avoid
- PK on business key - Use surrogate keys as PKs
- Nullable surrogate key - All PK columns must be NOT NULL
- FK references non-PK column - FKs must reference PK columns
- Inline FK constraints - Never define FKs in CREATE TABLE
- DATE_TRUNC without casting - Always CAST to DATE
- Module imports in notebooks - Inline helper functions or use pure Python
- Widget parameter name mismatch - YAML parameters must match
dbutils.widgets.get()
Reference Files
constraint-patterns.md
Detailed SQL patterns including:
- Standard dimensional model patterns (SCD Type 1/2, date dimensions, fact tables)
- Correct vs incorrect pattern examples
- Common mistakes and solutions
- Production error prevention patterns
- Benefits of NOT ENFORCED constraints
- Design philosophy
validation-guide.md
Validation patterns and troubleshooting:
- NOT NULL requirement for primary keys
- Databricks FK constraint requirements
- Validation checklists (pre-deployment, post-deployment)
- Common error patterns and solutions
- Production deployment checklist
- Benefits of NOT ENFORCED constraints
Scripts
apply_constraints.py
Python utility for applying constraints:
add_primary_key()- Add PRIMARY KEY constraintadd_foreign_key()- Add FOREIGN KEY constraintadd_unique_constraint()- Add UNIQUE constraint (for business keys)drop_constraint()- Drop constraint (for idempotency)apply_all_constraints()- Example function applying all constraints
Usage:
from scripts.apply_constraints import add_primary_key, add_foreign_key
add_primary_key(spark, catalog, schema, "dim_store", ["store_key"])
add_foreign_key(spark, catalog, schema, "fact_sales", ["store_key"],
"dim_store", ["store_key"])
Assets
constraints-template.sql
SQL template for creating Gold layer tables with constraints:
- Dimension table templates (SCD Type 1 and Type 2)
- Date dimension template
- Fact table template (PK only, no inline FKs)
- Foreign key application examples (ALTER TABLE statements)
Usage: Copy template, replace placeholders (<dimension_name>, <business_key>, etc.), customize for your schema.