databricks-table-properties

SKILL.md

Databricks Table Properties Standards

Pattern Recognition

Every table creation (Bronze, Silver, Gold) uses a consistent set of TBLPROPERTIES and metadata. This rule standardizes these patterns to ensure governance compliance.

Required Table Properties by Layer

See assets/templates/table-properties.sql for complete SQL templates.

Bronze Layer Tables

TBLPROPERTIES (
    'delta.enableChangeDataFeed' = 'true',
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.autoOptimize.autoCompact' = 'true',
    'layer' = 'bronze',
    'source_system' = 'RetailChain',  # Update based on source
    'domain' = '<domain>',  # e.g., 'retail', 'sales', 'inventory', 'product'
    'entity_type' = '<dimension|fact>',
    'contains_pii' = '<true|false>',
    'data_classification' = '<confidential|internal>',
    'business_owner' = '<Team Name>',
    'technical_owner' = 'Data Engineering',
    # Optional: Add retention for compliance
    'retention_period' = '7_years'  # Only if required
)

Silver Layer DLT Tables

table_properties={
    "quality": "silver",
    "delta.enableChangeDataFeed": "true",
    "delta.enableRowTracking": "true",
    "delta.enableDeletionVectors": "true",
    "delta.autoOptimize.autoCompact": "true",
    "delta.autoOptimize.optimizeWrite": "true",
    "delta.tuneFileSizesForRewrites": "true",
    "layer": "silver",
    "source_table": "<bronze_table_name>",
    "domain": "<domain>",
    "entity_type": "<dimension|fact|quarantine>",
    "contains_pii": "<true|false>",
    "data_classification": "<confidential|internal>",
    "business_owner": "<Team Name>",
    "technical_owner": "Data Engineering"
}

Gold Layer Tables

TBLPROPERTIES (
    'delta.enableChangeDataFeed' = 'true',
    'delta.enableRowTracking' = 'true',
    'delta.enableDeletionVectors' = 'true',
    'delta.autoOptimize.autoCompact' = 'true',
    'delta.autoOptimize.optimizeWrite' = 'true',
    'layer' = 'gold',
    'source_layer' = 'silver',
    'domain' = '<domain>',
    'entity_type' = '<dimension|fact>',
    'contains_pii' = '<true|false>',
    'data_classification' = '<confidential|internal>',
    'business_owner' = '<Team Name>',
    'technical_owner' = 'Data Engineering',
    'gold_type' = '<scd2|snapshot|aggregated>'
)

Clustering Configuration

⚠️ MANDATORY: ALWAYS use automatic liquid clustering

NEVER specify clustering columns manually. Always use AUTO.

# For SQL DDL (Bronze, Gold)
CLUSTER BY AUTO

# For DLT Python (Silver)
cluster_by_auto=True

Benefits of AUTO clustering:

  • ✅ Delta automatically selects optimal clustering columns
  • ✅ Self-tuning based on query patterns
  • ✅ No manual column specification needed
  • ✅ Works with all data types (including BOOLEAN)
  • ✅ Adapts as data and queries evolve

❌ DO NOT DO THIS:

CLUSTER BY (column1, column2)  -- ❌ WRONG: Never specify columns
CLUSTER BY (is_current)         -- ❌ WRONG: BOOLEAN columns don't support clustering

✅ ALWAYS DO THIS:

CLUSTER BY AUTO  -- ✅ CORRECT: Let Delta choose optimal clustering

Table Comments

Modern Pattern (RECOMMENDED for Gold Layer)

For Gold layer tables, use dual-purpose documentation without "LLM:" prefix.

See data_product_accelerator/skills/gold/design-workers/06-table-documentation/SKILL.md for comprehensive Gold layer standards.

Pattern:

[Natural description]. Business: [business context and use cases]. Technical: [implementation details].

Example:

COMMENT 'Gold layer daily sales fact table with pre-aggregated metrics at store-product-day grain. Business: Primary source for sales performance reporting including revenue, units, discounts, returns, and customer loyalty metrics. Aggregated from transaction-level Silver data for fast query performance. Used for dashboards, executive reporting, and sales analysis. Technical: Grain is one row per store-product-date combination. Pre-aggregated measures eliminate need for transaction-level scans, surrogate keys enable fast dimension joins.'

Legacy Pattern (Bronze/Silver)

For Bronze and Silver layers, "LLM:" prefix is acceptable for brevity.

# DLT Example (Silver)
@dlt.table(
    name="silver_transactions",
    comment="""LLM: Silver layer streaming fact table for point-of-sale transactions with comprehensive 
    data quality rules, price validation, discount logic verification, and referential integrity checks""",
    table_properties={...},
    cluster_by_auto=True
)

# SQL DDL Example (Bronze)
COMMENT 'LLM: Bronze layer dimension table containing retail store location details with full UC compliance. Store details to link across other views and ensure accuracy of data linkage.'

Column Comments

Gold Layer (Dual-Purpose Format)

Every column in Gold layer must have comprehensive dual-purpose comments:

Pattern:

[Definition]. Business: [purpose, use cases, business rules]. Technical: [data type, format, calculation, source, constraints].

Examples:

# Surrogate key
store_key STRING NOT NULL 
    COMMENT 'Surrogate key uniquely identifying each version of a store record. Business: Used for joining fact tables to dimension. Technical: MD5 hash generated from store_id and processed_timestamp to ensure uniqueness across SCD Type 2 versions.'

# Business key
store_number STRING NOT NULL 
    COMMENT 'Business key identifying the physical store location. Business: The primary identifier used by store operations and field teams. Technical: Natural key from source system, same across all historical versions of this store.'

# Measure
net_revenue DECIMAL(18,2) 
    COMMENT 'Net revenue after subtracting returns from gross revenue. Business: The actual revenue realized from sales, primary KPI for financial reporting. Technical: gross_revenue - return_amount, represents true daily sales value.'

Bronze/Silver Layers (Simpler Format)

Column comments can be more concise but should still include key context:

store_number STRING NOT NULL 
    COMMENT 'Store number where the transaction occurred. Links to store dimension.'
    
transaction_date DATE NOT NULL
    COMMENT 'Transaction date from POS system. Used for daily aggregations and trending.'

Domain Values

Standard domains used in this project:

  • retail - Store and location data
  • sales - Transaction and revenue data
  • inventory - Stock and replenishment data
  • product - Product master data
  • logistics - Delivery and supply chain
  • revenue - Financial metrics

Data Classification Values

  • confidential - Contains PII or sensitive business data
  • internal - Business data without PII
  • public - Safe for external sharing (rare)

Validation Checklist

When creating any table, ensure:

  • layer property matches the actual layer
  • domain is from the standard list
  • entity_type is dimension, fact, or quarantine
  • contains_pii accurately reflects PII presence
  • data_classification aligns with contains_pii
  • business_owner is a real team name
  • CLUSTER BY AUTO or cluster_by_auto=True is set
  • Table comment starts with "LLM:"
  • All columns have detailed comments

Common Mistakes to Avoid

Don't do this:

# Missing critical properties
TBLPROPERTIES (
    'layer' = 'bronze'
)

# No clustering
CREATE TABLE my_table (...)
USING DELTA

# Minimal comment
COMMENT 'Store data'

Do this:

# Complete properties
TBLPROPERTIES (
    'delta.enableChangeDataFeed' = 'true',
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.autoOptimize.autoCompact' = 'true',
    'layer' = 'bronze',
    'source_system' = 'RetailChain',
    'domain' = 'retail',
    'entity_type' = 'dimension',
    'contains_pii' = 'true',
    'data_classification' = 'confidential',
    'business_owner' = 'Retail Operations',
    'technical_owner' = 'Data Engineering'
)
CLUSTER BY AUTO
COMMENT 'LLM: Bronze layer dimension table containing retail store location details with full UC compliance. Store details to link across other views and ensure accuracy of data linkage.'

References

Weekly Installs
1
GitHub Stars
2
First Seen
8 days ago
Installed on
amp1
cline1
opencode1
cursor1
kimi-cli1
codex1