databricks-table-properties
Databricks Table Properties Standards
Pattern Recognition
Every table creation (Bronze, Silver, Gold) uses a consistent set of TBLPROPERTIES and metadata. This rule standardizes these patterns to ensure governance compliance.
Required Table Properties by Layer
See assets/templates/table-properties.sql for complete SQL templates.
Bronze Layer Tables
TBLPROPERTIES (
'delta.enableChangeDataFeed' = 'true',
'delta.autoOptimize.optimizeWrite' = 'true',
'delta.autoOptimize.autoCompact' = 'true',
'layer' = 'bronze',
'source_system' = 'RetailChain', # Update based on source
'domain' = '<domain>', # e.g., 'retail', 'sales', 'inventory', 'product'
'entity_type' = '<dimension|fact>',
'contains_pii' = '<true|false>',
'data_classification' = '<confidential|internal>',
'business_owner' = '<Team Name>',
'technical_owner' = 'Data Engineering',
# Optional: Add retention for compliance
'retention_period' = '7_years' # Only if required
)
Silver Layer DLT Tables
table_properties={
"quality": "silver",
"delta.enableChangeDataFeed": "true",
"delta.enableRowTracking": "true",
"delta.enableDeletionVectors": "true",
"delta.autoOptimize.autoCompact": "true",
"delta.autoOptimize.optimizeWrite": "true",
"delta.tuneFileSizesForRewrites": "true",
"layer": "silver",
"source_table": "<bronze_table_name>",
"domain": "<domain>",
"entity_type": "<dimension|fact|quarantine>",
"contains_pii": "<true|false>",
"data_classification": "<confidential|internal>",
"business_owner": "<Team Name>",
"technical_owner": "Data Engineering"
}
Gold Layer Tables
TBLPROPERTIES (
'delta.enableChangeDataFeed' = 'true',
'delta.enableRowTracking' = 'true',
'delta.enableDeletionVectors' = 'true',
'delta.autoOptimize.autoCompact' = 'true',
'delta.autoOptimize.optimizeWrite' = 'true',
'layer' = 'gold',
'source_layer' = 'silver',
'domain' = '<domain>',
'entity_type' = '<dimension|fact>',
'contains_pii' = '<true|false>',
'data_classification' = '<confidential|internal>',
'business_owner' = '<Team Name>',
'technical_owner' = 'Data Engineering',
'gold_type' = '<scd2|snapshot|aggregated>'
)
Clustering Configuration
⚠️ MANDATORY: ALWAYS use automatic liquid clustering
NEVER specify clustering columns manually. Always use AUTO.
# For SQL DDL (Bronze, Gold)
CLUSTER BY AUTO
# For DLT Python (Silver)
cluster_by_auto=True
Benefits of AUTO clustering:
- ✅ Delta automatically selects optimal clustering columns
- ✅ Self-tuning based on query patterns
- ✅ No manual column specification needed
- ✅ Works with all data types (including BOOLEAN)
- ✅ Adapts as data and queries evolve
❌ DO NOT DO THIS:
CLUSTER BY (column1, column2) -- ❌ WRONG: Never specify columns
CLUSTER BY (is_current) -- ❌ WRONG: BOOLEAN columns don't support clustering
✅ ALWAYS DO THIS:
CLUSTER BY AUTO -- ✅ CORRECT: Let Delta choose optimal clustering
Table Comments
Modern Pattern (RECOMMENDED for Gold Layer)
For Gold layer tables, use dual-purpose documentation without "LLM:" prefix.
See data_product_accelerator/skills/gold/design-workers/06-table-documentation/SKILL.md for comprehensive Gold layer standards.
Pattern:
[Natural description]. Business: [business context and use cases]. Technical: [implementation details].
Example:
COMMENT 'Gold layer daily sales fact table with pre-aggregated metrics at store-product-day grain. Business: Primary source for sales performance reporting including revenue, units, discounts, returns, and customer loyalty metrics. Aggregated from transaction-level Silver data for fast query performance. Used for dashboards, executive reporting, and sales analysis. Technical: Grain is one row per store-product-date combination. Pre-aggregated measures eliminate need for transaction-level scans, surrogate keys enable fast dimension joins.'
Legacy Pattern (Bronze/Silver)
For Bronze and Silver layers, "LLM:" prefix is acceptable for brevity.
# DLT Example (Silver)
@dlt.table(
name="silver_transactions",
comment="""LLM: Silver layer streaming fact table for point-of-sale transactions with comprehensive
data quality rules, price validation, discount logic verification, and referential integrity checks""",
table_properties={...},
cluster_by_auto=True
)
# SQL DDL Example (Bronze)
COMMENT 'LLM: Bronze layer dimension table containing retail store location details with full UC compliance. Store details to link across other views and ensure accuracy of data linkage.'
Column Comments
Gold Layer (Dual-Purpose Format)
Every column in Gold layer must have comprehensive dual-purpose comments:
Pattern:
[Definition]. Business: [purpose, use cases, business rules]. Technical: [data type, format, calculation, source, constraints].
Examples:
# Surrogate key
store_key STRING NOT NULL
COMMENT 'Surrogate key uniquely identifying each version of a store record. Business: Used for joining fact tables to dimension. Technical: MD5 hash generated from store_id and processed_timestamp to ensure uniqueness across SCD Type 2 versions.'
# Business key
store_number STRING NOT NULL
COMMENT 'Business key identifying the physical store location. Business: The primary identifier used by store operations and field teams. Technical: Natural key from source system, same across all historical versions of this store.'
# Measure
net_revenue DECIMAL(18,2)
COMMENT 'Net revenue after subtracting returns from gross revenue. Business: The actual revenue realized from sales, primary KPI for financial reporting. Technical: gross_revenue - return_amount, represents true daily sales value.'
Bronze/Silver Layers (Simpler Format)
Column comments can be more concise but should still include key context:
store_number STRING NOT NULL
COMMENT 'Store number where the transaction occurred. Links to store dimension.'
transaction_date DATE NOT NULL
COMMENT 'Transaction date from POS system. Used for daily aggregations and trending.'
Domain Values
Standard domains used in this project:
retail- Store and location datasales- Transaction and revenue datainventory- Stock and replenishment dataproduct- Product master datalogistics- Delivery and supply chainrevenue- Financial metrics
Data Classification Values
confidential- Contains PII or sensitive business datainternal- Business data without PIIpublic- Safe for external sharing (rare)
Validation Checklist
When creating any table, ensure:
-
layerproperty matches the actual layer -
domainis from the standard list -
entity_typeis dimension, fact, or quarantine -
contains_piiaccurately reflects PII presence -
data_classificationaligns with contains_pii -
business_owneris a real team name -
CLUSTER BY AUTOorcluster_by_auto=Trueis set - Table comment starts with "LLM:"
- All columns have detailed comments
Common Mistakes to Avoid
❌ Don't do this:
# Missing critical properties
TBLPROPERTIES (
'layer' = 'bronze'
)
# No clustering
CREATE TABLE my_table (...)
USING DELTA
# Minimal comment
COMMENT 'Store data'
✅ Do this:
# Complete properties
TBLPROPERTIES (
'delta.enableChangeDataFeed' = 'true',
'delta.autoOptimize.optimizeWrite' = 'true',
'delta.autoOptimize.autoCompact' = 'true',
'layer' = 'bronze',
'source_system' = 'RetailChain',
'domain' = 'retail',
'entity_type' = 'dimension',
'contains_pii' = 'true',
'data_classification' = 'confidential',
'business_owner' = 'Retail Operations',
'technical_owner' = 'Data Engineering'
)
CLUSTER BY AUTO
COMMENT 'LLM: Bronze layer dimension table containing retail store location details with full UC compliance. Store details to link across other views and ensure accuracy of data linkage.'