silver-layer-setup
# Silver Layer Setup - Orchestrator Skill

End-to-end workflow for creating production-grade Silver layer pipelines using Spark Declarative Pipelines (SDP, formerly Delta Live Tables/DLT) with Delta table-based data quality rules, quarantine patterns, streaming ingestion, and monitoring views.
**Naming:** Databricks rebranded DLT to Spark Declarative Pipelines (SDP) / Lakeflow Declarative Pipelines (LDP). The modern Python API is `from pyspark import pipelines as dp`. However, our DQ rules framework still uses `import dlt` (legacy API) because `@dlt.expect_all_or_drop()` decorators are not yet available in the `dp` API. When Databricks migrates expectations to `dp`, both this skill and `dlt-expectations-patterns` will be updated. New projects may use `databricks pipelines init` to scaffold an SDP Asset Bundle project.
**Time Estimate:** 3-4 hours for initial setup, 1 hour per additional table

**What You'll Create:**

- `dq_rules` Delta table - Centralized rules repository in Unity Catalog
- `dq_rules_loader.py` - Pure Python module to load rules at runtime
- `silver_*.py` - SDP/DLT notebooks with expectations loaded from Delta table
- `silver_pipeline.yml` - Serverless SDP pipeline configuration
- DQ monitoring views - Per-table metrics and referential integrity checks
## Decision Tree
| Question | Action |
|---|---|
| Creating a Silver layer from scratch? | Use this skill - it orchestrates everything |
| Only need DLT expectations patterns? | Read silver/01-dlt-expectations-patterns/SKILL.md directly |
| Need advanced DQX validation? | Read silver/02-dqx-patterns/SKILL.md directly |
| Need Asset Bundle configuration? | Read common/databricks-asset-bundles/SKILL.md directly |
| Need table properties reference? | Read common/databricks-table-properties/SKILL.md directly |
| Need pure Python import patterns? | Read common/databricks-python-imports/SKILL.md directly |
## Mandatory Skill Dependencies

**CRITICAL:** Before generating ANY code for the Silver layer, you MUST read and follow the patterns in these common skills. Do NOT generate these patterns from memory.
| Phase | MUST Read Skill (use Read tool on SKILL.md) | What It Provides |
|---|---|---|
| All phases | `common/databricks-expert-agent` | Core extraction principle: extract names from source, never hardcode |
| Schema setup | `common/schema-management-patterns` | CREATE SCHEMA DDL with governance metadata |
| DQ rules table | `common/databricks-table-properties` | TBLPROPERTIES for the `dq_rules` metadata table |
| DQ rules table | `common/unity-catalog-constraints` | PRIMARY KEY constraint syntax |
| Rules loader | `common/databricks-python-imports` | Pure Python module patterns (NO notebook header) |
| DLT notebooks | `common/databricks-table-properties` | Silver-layer TBLPROPERTIES (CDF, row tracking, auto-optimize) |
| Pipeline config | `common/databricks-asset-bundles` | DLT pipeline YAML, job YAML, serverless config |
| Deployment (if user-triggered) | `common/databricks-autonomous-operations` | Deploy → Poll → Diagnose → Fix → Redeploy loop when jobs/pipelines fail |
**NEVER do these without FIRST reading the corresponding skill:**

- NEVER write `table_properties={...}` without reading `databricks-table-properties`
- NEVER write Python import patterns without reading `databricks-python-imports`
- NEVER write Asset Bundle YAML without reading `databricks-asset-bundles`
- NEVER write `CREATE SCHEMA` without reading `schema-management-patterns`
- NEVER define PK/FK constraints without reading `unity-catalog-constraints`
## 🔴 Non-Negotiable Defaults (Applied to EVERY Silver Table and Pipeline)

These defaults are ALWAYS applied. There are NO exceptions, NO overrides, NO alternative options.
| Default | Value | Applied Where | NEVER Do This Instead |
|---|---|---|---|
| Serverless | `serverless: true` | Pipeline YAML | ❌ NEVER set `serverless: false` or define `clusters:` |
| Auto Liquid Clustering | `cluster_by_auto=True` | Every `@dlt.table()` | ❌ NEVER use `cluster_by=["col1", "col2"]` or `partition_cols=` |
| Edition | `edition: ADVANCED` | Pipeline YAML | ❌ NEVER use CORE or PRO (expectations require ADVANCED) |
| Photon | `photon: true` | Pipeline YAML | ❌ NEVER set `photon: false` |
| Row Tracking | `"delta.enableRowTracking": "true"` | Every table's `table_properties` | ❌ NEVER omit (breaks downstream MV refresh) |
| Change Data Feed | `"delta.enableChangeDataFeed": "true"` | Every table's `table_properties` | ❌ NEVER omit (required for incremental propagation) |
```python
# ✅ CORRECT: Every @dlt.table() MUST include these
@dlt.table(
    name="silver_anything",
    table_properties={
        "delta.enableChangeDataFeed": "true",
        "delta.enableRowTracking": "true",
        # ... other properties from databricks-table-properties
    },
    cluster_by_auto=True  # 🔴 MANDATORY on every table, including monitoring views
)
```

```yaml
# ✅ CORRECT: Pipeline YAML MUST include these
serverless: true   # 🔴 MANDATORY - no classic clusters
photon: true       # 🔴 MANDATORY - vectorized execution
edition: ADVANCED  # 🔴 MANDATORY - required for expectations
```
## Silver-Domain Dependencies

| Skill | Requirement | What It Provides |
|---|---|---|
| `silver/01-dlt-expectations-patterns` | MUST read | DQ rules table DDL, rules loader cache pattern, DLT decorators, quarantine generation, runtime rule updates |
| `silver/02-dqx-patterns` | Optional | Read only when user needs richer diagnostics than DLT expectations, or pre-merge validation |
## Core Philosophy: Schema Cloning

The Silver layer should essentially clone the source Bronze schema with minimal transformations:

- Same column names as Bronze (no complex renaming)
- Same data types (minimal type conversions)
- Same grain (no aggregation - that's for Gold)
- Add data quality rules (the main value-add)
- Add derived flags (business indicators like `is_return`, `is_out_of_stock`)
- Add business keys (SHA256 hashes for tracking)
- Add timestamps (`processed_timestamp`)
**What NOT to do in Silver:**

- No major schema restructuring
- No aggregations (save for Gold)
- No complex business logic (simple flags only)
- No joining across tables (dimension lookups in Gold)
Why: Silver is the validated copy of source data. Gold handles complex transformations. This keeps Silver focused on data quality and makes troubleshooting easier (column names match source).
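To illustrate the business-key idea above: in a DLT notebook the key would typically be built with Spark's `sha2(concat_ws("|", ...), 256)` over the natural key columns. A minimal pure-Python sketch of the same computation (the column values below are made up for illustration):

```python
import hashlib

def business_key(*key_parts: str) -> str:
    """Deterministic SHA256 business key from natural key columns.

    Pure-Python equivalent of Spark's sha2(concat_ws("|", ...), 256),
    which is what the actual DLT notebook would apply to DataFrame columns.
    """
    return hashlib.sha256("|".join(key_parts).encode("utf-8")).hexdigest()

# Same natural keys always produce the same 64-hex-char key,
# so the business key is stable across pipeline reruns.
bk = business_key("store_042", "2024-06-01", "txn_9001")
```

Because the digest is deterministic, the same source row always maps to the same business key, which is what makes it usable for tracking records across runs.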
## Python API: ALWAYS use `import dlt` (Legacy API)

Our DQ rules framework (`dlt-expectations-patterns`) is built on the legacy `import dlt` API. ALWAYS use this API for Silver layer creation.
```python
# ✅ CORRECT: Legacy API (our standard)
import dlt
from dq_rules_loader import get_critical_rules_for_table

@dlt.table(name="silver_transactions", cluster_by_auto=True)
@dlt.expect_all_or_drop(get_critical_rules_for_table("silver_transactions"))
def silver_transactions():
    return dlt.read_stream(get_bronze_table("bronze_transactions"))

# ❌ WRONG: Modern SDP API (not compatible with our DQ rules framework)
from pyspark import pipelines as dp

@dp.table(name="silver_transactions")
def silver_transactions():
    return spark.readStream.table("bronze_transactions")
```
**Why not modern `dp` API?** The `@dlt.expect_all_or_drop()` and `@dlt.expect_all()` decorators from our `dlt-expectations-patterns` skill require the `dlt` module. When Databricks fully migrates expectations to `dp`, update both this skill and `dlt-expectations-patterns`.
## File Structure

```
src/{project}_silver/
├── setup_dq_rules_table.py     # Databricks notebook: Create and populate DQ rules Delta table
├── dq_rules_loader.py          # Pure Python (NO notebook header): Load rules from Delta table
├── silver_dimensions.py        # DLT notebook: Dimension tables (stores, products, etc.)
├── silver_transactions.py      # DLT notebook: Fact table with quarantine
├── silver_inventory.py         # DLT notebook: Additional fact tables (if applicable)
└── data_quality_monitoring.py  # DLT notebook: DQ monitoring views
```
**Critical Files:**

- `dq_rules_loader.py` must be pure Python (NO `# Databricks notebook source` header)
- Run `silver_dq_setup_job` BEFORE deploying DLT pipeline
## Working Memory Management

This orchestrator spans 6 phases (deployment and Phase 7 are user-triggered). To maintain coherence without context pollution:

After each phase, persist a brief summary note capturing:

- **Phase 1 output:** Schema names (catalog, silver_schema), table list, DQ rules strategy decision
- **Phase 2 output:** DQ rules table path, count of rules defined, rule severity distribution
- **Phase 3 output:** `dq_rules_loader.py` path, confirmation it is pure Python (no notebook header)
- **Phase 4 output:** DLT notebook paths per table, expectation counts, SCD handling decisions
- **Phase 5 output:** Monitoring view paths, metric definitions
- **Phase 6 output:** Pipeline YAML path, job YAML path, `databricks.yml` sync status
- **Phase 7 output (if user-triggered):** Anomaly detection config, schema monitoring status

**What to keep in working memory:** Only the current phase's context, the table list from Phase 1, and the previous phase's summary note. Discard intermediate outputs (full DDL strings, DQ rule DataFrames, raw DLT notebook contents); they are on disk and reproducible.

**Critical file note:** `dq_rules_loader.py` must be pure Python (NO `# Databricks notebook source` header). Carry this constraint through all phases.
## Phased Implementation Workflow

### Phase 1: Requirements & Schema Setup (30 min)

**Pre-Condition - MUST read these skills first:**

- Read `common/databricks-expert-agent/SKILL.md` - Apply extraction principle throughout
- Read `common/schema-management-patterns/SKILL.md` - Use for Silver schema DDL
**Steps:**

1. Fill in requirements template: `assets/templates/requirements-template.md`
   - Map Bronze tables to Silver tables
   - Define DQ rules per entity (critical vs warning)
   - Identify quarantine candidates
2. Create Silver schema using pattern from `schema-management-patterns`
3. Verify Bronze tables exist and extract their schemas (don't hardcode column names)
### Phase 2: DQ Rules Table Setup (30 min)

**Pre-Condition - MUST read these skills first:**

- Read `silver/01-dlt-expectations-patterns/SKILL.md` - Use for DQ rules table DDL and population
- Read `common/databricks-table-properties/SKILL.md` - Apply metadata table TBLPROPERTIES
- Read `common/unity-catalog-constraints/SKILL.md` - Apply PK constraint on (table_name, rule_name)

**Steps:**

1. Create `setup_dq_rules_table.py` notebook
2. Define DQ rules table DDL (schema from `dlt-expectations-patterns`)
3. Apply TBLPROPERTIES from `databricks-table-properties`
4. Apply PK constraint: `CONSTRAINT pk_dq_rules PRIMARY KEY (table_name, rule_name) NOT ENFORCED`
5. Populate rules using the requirements from Phase 1
6. Verify file created: `setup_dq_rules_table.py` is ready for deployment (deployment is user-triggered)
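As a rough sketch of the DDL that `setup_dq_rules_table.py` might emit, assuming a minimal column set (the authoritative schema and TBLPROPERTIES come from `dlt-expectations-patterns` and `databricks-table-properties`; the columns and property values here are illustrative only):

```python
def build_dq_rules_ddl(catalog: str, schema: str) -> str:
    """Illustrative DDL for the dq_rules metadata table.

    The real column list and TBLPROPERTIES must be taken from the
    dlt-expectations-patterns and databricks-table-properties skills,
    not generated from memory.
    """
    return f"""
CREATE TABLE IF NOT EXISTS {catalog}.{schema}.dq_rules (
    table_name   STRING NOT NULL,
    rule_name    STRING NOT NULL,
    expression   STRING NOT NULL,
    criticality  STRING NOT NULL,  -- 'critical' (drop) or 'warning' (track)
    CONSTRAINT pk_dq_rules PRIMARY KEY (table_name, rule_name) NOT ENFORCED
)
TBLPROPERTIES (
    'delta.enableChangeDataFeed' = 'true'
)
""".strip()

ddl = build_dq_rules_ddl("main", "silver")
# In the notebook this string would be executed with spark.sql(ddl)
```

Note the `NOT ENFORCED` keyword on the primary key: Unity Catalog records the constraint as metadata but does not validate it on write.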
### Phase 3: Rules Loader Module (15 min)

**Pre-Condition - MUST read these skills first:**

- Read `common/databricks-python-imports/SKILL.md` - CRITICAL: loader must be pure Python
- Read `silver/01-dlt-expectations-patterns/SKILL.md` - Use cache pattern with `toPandas()`

**Steps:**

1. Create `dq_rules_loader.py` as pure Python file (NO notebook header!)
2. Implement functions: `get_critical_rules_for_table()`, `get_warning_rules_for_table()`, `get_quarantine_condition()`
3. Use module-level cache pattern with `toPandas()` (NOT `.collect()`) from `dlt-expectations-patterns`
4. Test import: `from dq_rules_loader import get_critical_rules_for_table`
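The module-level cache pattern from step 3 can be sketched as follows. This is an assumption about the loader's shape, not the authoritative implementation from `dlt-expectations-patterns`: the Delta read is injected as a callable so the caching and filtering logic is visible in isolation, where the real module would call `spark.table(...).toPandas()` inside `_load_rules()`:

```python
# dq_rules_loader.py (sketch) - pure Python, NO "# Databricks notebook source" header.

_RULES_CACHE = None  # module-level cache: the Delta table is read at most once per update

def _load_rules(fetch_rows):
    """fetch_rows() returns a list of dicts with keys
    table_name, rule_name, expression, criticality.
    In production this wraps spark.table(f"{catalog}.{schema}.dq_rules").toPandas()."""
    global _RULES_CACHE
    if _RULES_CACHE is None:
        _RULES_CACHE = fetch_rows()
    return _RULES_CACHE

def get_rules_for_table(table_name, criticality, fetch_rows):
    """Return {rule_name: expression}, the dict shape expected by
    @dlt.expect_all_or_drop() and @dlt.expect_all()."""
    return {
        r["rule_name"]: r["expression"]
        for r in _load_rules(fetch_rows)
        if r["table_name"] == table_name and r["criticality"] == criticality
    }
```

The cache matters because every `@dlt.table()` definition calls the loader at pipeline graph-build time; without it, each table definition would trigger a separate read of the `dq_rules` Delta table.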
### Phase 4: DLT Notebooks - Silver Tables (1-2 hours)

**Pre-Condition - MUST read these skills first:**

- Read `common/databricks-table-properties/SKILL.md` - Extract Silver TBLPROPERTIES
- Read `silver/01-dlt-expectations-patterns/SKILL.md` - Use decorator patterns
- (Optional) Read `silver/02-dqx-patterns/SKILL.md` - Only if user needs hybrid DQX+DLT

**Steps:**

1. Create DLT notebooks using patterns from `references/silver-table-patterns.md`
2. Include `get_bronze_table()` helper in every notebook (see references/)
3. For each table:
   - Apply Silver TBLPROPERTIES from `databricks-table-properties`
   - Apply `@dlt.expect_all_or_drop(get_critical_rules_for_table(...))` decorator
   - Apply `@dlt.expect_all(get_warning_rules_for_table(...))` decorator
   - Clone Bronze schema with minimal transformations
   - Add derived flags, business keys, `processed_timestamp`
   - Set `cluster_by_auto=True` (NEVER specify columns manually)
4. For high-volume fact tables: create quarantine table using `get_quarantine_condition()`

**See:** `references/silver-table-patterns.md` for complete templates
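The quarantine table in step 4 keeps exactly the rows that `@dlt.expect_all_or_drop()` discards. A minimal sketch of how `get_quarantine_condition()` might invert the critical rules into a single SQL predicate (the real implementation lives in `dlt-expectations-patterns`; this is an assumption about its shape):

```python
def get_quarantine_condition(critical_rules):
    """Build a SQL predicate matching records that fail ANY critical rule.

    critical_rules is the same {rule_name: expression} dict passed to
    @dlt.expect_all_or_drop(); the quarantine table keeps what that
    decorator drops. Rules are sorted by name for deterministic output.
    """
    if not critical_rules:
        return "false"  # no critical rules: nothing to quarantine
    passing = " AND ".join(f"({expr})" for _, expr in sorted(critical_rules.items()))
    return f"NOT ({passing})"

cond = get_quarantine_condition({
    "valid_id": "transaction_id IS NOT NULL",
    "valid_amount": "amount >= 0",
})
# The quarantine table definition would then apply:
#   dlt.read_stream(get_bronze_table("bronze_transactions")).where(cond)
```

Wrapping each expression in parentheses before joining with `AND` keeps operator precedence correct even when individual rules contain `OR`.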
### Phase 5: Monitoring Views (30 min)

**No external skill dependencies** - use `references/monitoring-patterns.md`

**Steps:**

1. Create `data_quality_monitoring.py` DLT notebook
2. Add per-table DQ metrics views (record counts, pass/fail rates)
3. Add referential integrity checks (orphaned records between fact and dimension)
4. Add data freshness monitoring

**See:** `references/monitoring-patterns.md` for complete patterns
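The freshness check in step 4 can be sketched as a small SQL builder. It assumes every Silver table carries a `processed_timestamp` column (per the schema-cloning philosophy above); the threshold and query shape are illustrative, and the authoritative patterns are in `references/monitoring-patterns.md`:

```python
def freshness_check_sql(catalog: str, schema: str, table: str,
                        max_age_hours: int = 24) -> str:
    """Illustrative per-table freshness query for a DQ monitoring view.

    Flags the table as stale when its newest processed_timestamp is older
    than max_age_hours. Assumes the Silver convention that every table
    has a processed_timestamp column.
    """
    return (
        f"SELECT '{table}' AS table_name, "
        f"MAX(processed_timestamp) AS last_processed, "
        f"MAX(processed_timestamp) < current_timestamp() "
        f"- INTERVAL {max_age_hours} HOURS AS is_stale "
        f"FROM {catalog}.{schema}.{table}"
    )

sql = freshness_check_sql("main", "silver", "silver_transactions")
# A monitoring view could UNION ALL one such query per Silver table.
```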
### Phase 6: Pipeline & Job Configuration (15 min)

**Pre-Condition - MUST read these skills first:**

- Read `common/databricks-asset-bundles/SKILL.md` - DLT pipeline YAML, job YAML patterns

**Steps:**

1. Create `resources/silver_dlt_pipeline.yml` using patterns from `databricks-asset-bundles`
2. Create `resources/silver_dq_setup_job.yml` for the DQ rules setup job
3. Set DLT Direct Publishing Mode: `catalog` + `schema` fields (NOT `target`)
4. Pass configuration: `catalog`, `bronze_schema`, `silver_schema`
5. Set: `serverless: true`, `edition: ADVANCED`, `photon: true`

**See:** `references/pipeline-configuration.md` for Silver-specific examples
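A hedged sketch of how the steps above might combine in `resources/silver_dlt_pipeline.yml` (resource key, variable names, and notebook path are illustrative assumptions; the authoritative YAML patterns come from `databricks-asset-bundles`):

```yaml
resources:
  pipelines:
    silver_pipeline:                # illustrative resource key
      name: "Silver Layer Pipeline"
      serverless: true              # non-negotiable defaults from this skill
      photon: true
      edition: ADVANCED
      catalog: ${var.catalog}       # Direct Publishing Mode: catalog + schema, NOT target
      schema: ${var.silver_schema}
      configuration:                # read in notebooks via spark.conf / get_bronze_table()
        catalog: ${var.catalog}
        bronze_schema: ${var.bronze_schema}
        silver_schema: ${var.silver_schema}
      libraries:
        - notebook:
            path: ../src/project_silver/silver_transactions.py  # illustrative path
```

Note there is no `clusters:` block: serverless manages compute, and adding one would violate the non-negotiable defaults.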
## 🛑 STOP - Artifact Creation Complete

Phases 1-6 are complete. All files (DQ rules table script, rules loader, DLT notebooks, monitoring views, pipeline/job YAMLs) have been created. Do NOT proceed to deployment or Phase 7 unless the user explicitly requests it.

Report what was created and ask the user if they want to deploy and run.
**Deployment Order (USER-TRIGGERED ONLY - do not auto-execute):**

```bash
# 1. Deploy everything
databricks bundle deploy -t dev

# 2. Run DQ rules setup FIRST (creates dq_rules table)
databricks bundle run silver_dq_setup_job -t dev

# 3. Verify rules table exists
# SELECT * FROM {catalog}.{silver_schema}.dq_rules

# 4. THEN run DLT pipeline (loads rules from table)
databricks pipelines start-update --pipeline-name "[dev] Silver Layer Pipeline"
```
### Phase 7: Enable Anomaly Detection on Silver Schema (5 min) - USER-TRIGGERED ONLY

This phase is executed ONLY when the user explicitly requests it. Do not auto-execute.

**Pre-Condition - MUST read this skill first:**

- Read `monitoring/04-anomaly-detection/SKILL.md` - Schema-level freshness/completeness monitoring

**Why:** Every Silver schema should have anomaly detection enabled from day one. It builds freshness and completeness ML baselines immediately, catching stale/incomplete tables before downstream consumers notice.

**Steps:**

1. Enable anomaly detection on the Silver schema (uses `enable_anomaly_detection_on_schema()` from the anomaly-detection skill)
2. Exclude metadata tables that are not data pipeline outputs (e.g., `dq_rules`)
3. Verify enablement via Catalog Explorer or SDK
```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.dataquality import Monitor, AnomalyDetectionConfig

w = WorkspaceClient()

# Get Silver schema UUID
schema_info = w.schemas.get(full_name=f"{catalog}.{silver_schema}")
schema_id = schema_info.schema_id

# Enable anomaly detection (exclude metadata tables)
try:
    w.data_quality.create_monitor(
        monitor=Monitor(
            object_type="schema",
            object_id=schema_id,
            anomaly_detection_config=AnomalyDetectionConfig(
                excluded_table_full_names=[
                    f"{catalog}.{silver_schema}.dq_rules",  # Metadata, not pipeline output
                ]
            )
        )
    )
    print(f"✅ Anomaly detection enabled on {catalog}.{silver_schema}")
except Exception as e:
    if "already exists" in str(e).lower():
        print("✅ Already enabled (skipping)")
    else:
        print(f"⚠️ Non-blocking: {e}")
```
**Note:** This is non-blocking; if anomaly detection fails to enable (permissions, preview limitations), the Silver layer setup continues. Retry later via `monitoring/04-anomaly-detection/scripts/enable_anomaly_detection.py`.
## Common Mistakes (Silver-Specific)

### Mistake 1: Deploying DLT Before DQ Setup Job

```
Pipeline Error: Table or view not found: dq_rules
```

**Fix:** Run `silver_dq_setup_job` BEFORE deploying/running DLT pipeline.

### Mistake 2: Notebook Header in Loader File

```python
# dq_rules_loader.py
# Databricks notebook source   # <-- Makes it a notebook, breaks imports!
```

**Fix:** Remove the `# Databricks notebook source` line. Read `databricks-python-imports` for the correct pattern.
### Mistake 3: Aggregation in Silver

```python
# WRONG: Aggregation belongs in Gold
def silver_sales_daily():
    return dlt.read_stream(...).groupBy("store", "date").agg(sum("revenue"))
```

**Fix:** Keep Silver at transaction grain. Aggregate in Gold.
### Mistake 4: Manual Clustering Columns

```python
# WRONG
@dlt.table(cluster_by=["store_number", "transaction_date"])
```

**Fix:** Always use `cluster_by_auto=True`. Never specify columns.
### Mistake 5: Using expect_or_fail

```python
# WRONG: Pipeline fails on bad data
@dlt.expect_or_fail("valid_id", "id IS NOT NULL")
```

**Fix:** Use `@dlt.expect_all_or_drop()` for critical rules. Pipeline continues, bad records quarantined.
### Mistake 6: Hardcoding Table Names

```python
# WRONG: Hardcoded table reference
dlt.read_stream("my_catalog.bronze.transactions")
```

**Fix:** Use `get_bronze_table()` helper with DLT configuration. See `references/silver-table-patterns.md`.
### Mistake 7: Schema Evolution Without Full Refresh

```
Error: Incompatible schema change detected on streaming table
```

**Fix:** Streaming tables require a full refresh for incompatible schema changes (adding NOT NULL columns, changing types). Trigger with: `databricks pipelines start-update --pipeline-name "..." --full-refresh`
### Mistake 8: Missing Row Tracking (Breaks Downstream MVs)

```python
# ❌ WRONG: Missing delta.enableRowTracking
table_properties={
    "delta.enableChangeDataFeed": "true",
    # Row tracking missing!
}
```

**Fix:** ALWAYS include `"delta.enableRowTracking": "true"` in Silver table properties. Without it, downstream Gold materialized views cannot use incremental refresh and will do expensive full recomputation.
### Mistake 9: Using Modern dp API with DQ Rules Framework

```python
# ❌ WRONG: dp API doesn't work with our dlt-expectations decorators
from pyspark import pipelines as dp

@dp.table(name="silver_transactions")
@dlt.expect_all_or_drop(get_critical_rules_for_table(...))  # Mixes APIs!
```

**Fix:** ALWAYS use `import dlt` for Silver notebooks that use our DQ rules framework. See the "Python API" section above.
## Post-Creation Validation

Before considering the Silver layer complete, verify each item and confirm its source:

### Common Skill Compliance

- Table properties match `databricks-table-properties` Silver layer spec (not generated from memory)
- Python imports follow `databricks-python-imports` patterns (loader has NO notebook header)
- Asset Bundle YAML follows `databricks-asset-bundles` patterns (notebook_task, base_parameters)
- Schema DDL follows `schema-management-patterns` (IF NOT EXISTS, governance metadata)
- PK constraint follows `unity-catalog-constraints` syntax (NOT ENFORCED keyword)
- Names extracted from source files per `databricks-expert-agent` (not hardcoded)
### Silver-Domain Skill Compliance

- DQ rules table follows `dlt-expectations-patterns` DDL
- Rules loader uses cache pattern from `dlt-expectations-patterns` (toPandas, not collect)
- DLT decorators follow `dlt-expectations-patterns` (expect_all_or_drop, expect_all)
- Quarantine condition uses `dlt-expectations-patterns` generation pattern
### Silver Layer Specifics

- Schema cloning: Silver columns match Bronze (no aggregation, no joins)
- `get_bronze_table()` helper used for all source table references
- `cluster_by_auto=True` on every table (NEVER manual cluster keys)
- `delta.enableRowTracking=true` on every Silver table (required for downstream MV incremental refresh)
- Quarantine tables created for high-volume fact tables
- DQ monitoring views created (including data freshness)
- (User-triggered) Deployment order documented: DQ setup job runs BEFORE DLT pipeline
- `import dlt` used (NOT `from pyspark import pipelines as dp`)
- `serverless: true` in pipeline YAML (NEVER classic clusters)
- `photon: true` in pipeline YAML
- `edition: ADVANCED` set in pipeline YAML (required for expectations/CDC)
- No `clusters:` block in pipeline YAML (serverless manages compute)
- Deduplication applied where Bronze may have duplicate records
- `processed_timestamp` added to every Silver table
- Event timestamps preserved from Bronze (not replaced by processing time)
- (User-triggered) Anomaly detection enabled on Silver schema (Phase 7)
- (User-triggered) Metadata tables (e.g., `dq_rules`) excluded from anomaly detection
## Reference Files

### Silver Table Patterns

`references/silver-table-patterns.md` - Complete DLT table templates: standard pattern, dimension example, fact example with quarantine, `get_bronze_table()` helper, derived field patterns

### Monitoring Patterns

`references/monitoring-patterns.md` - DQ monitoring views: per-table metrics, referential integrity checks, data freshness monitoring

### Pipeline Configuration

`references/pipeline-configuration.md` - Silver-specific DLT pipeline YAML and DQ setup job YAML examples (supplements `databricks-asset-bundles`)

## Templates

### Requirements Template

`assets/templates/requirements-template.md` - Fill-in-first requirements gathering: project context, Bronze-to-Silver table mapping, DQ rules per entity, quarantine strategy
## Related Skills

| Skill | Relationship | Path |
|---|---|---|
| `dlt-expectations-patterns` | Mandatory - DQ rules, loader, decorators | `silver/01-dlt-expectations-patterns/SKILL.md` |
| `dqx-patterns` | Optional - Advanced validation | `silver/02-dqx-patterns/SKILL.md` |
| `anomaly-detection` | Mandatory - Schema freshness/completeness monitoring | `monitoring/04-anomaly-detection/SKILL.md` |
| `databricks-expert-agent` | Mandatory - Extraction principle | `common/databricks-expert-agent/SKILL.md` |
| `databricks-table-properties` | Mandatory - Silver TBLPROPERTIES | `common/databricks-table-properties/SKILL.md` |
| `databricks-python-imports` | Mandatory - Pure Python loader | `common/databricks-python-imports/SKILL.md` |
| `databricks-asset-bundles` | Mandatory - Pipeline/job YAML | `common/databricks-asset-bundles/SKILL.md` |
| `schema-management-patterns` | Mandatory - Schema DDL | `common/schema-management-patterns/SKILL.md` |
| `unity-catalog-constraints` | Mandatory - PK constraint | `common/unity-catalog-constraints/SKILL.md` |
## Pipeline Progression

**Previous stage:** `bronze/00-bronze-layer-setup` → Bronze tables must exist before creating Silver

**Next stage:** After completing the Silver layer, proceed to:

- `gold/01-gold-layer-setup` → Implement Gold layer tables, merge scripts, and FK constraints from the YAML designs created in stage 1
## Post-Completion: Skill Usage Summary (MANDATORY)

After completing all phases of this orchestrator, output a Skill Usage Summary reflecting what you ACTUALLY did, not a pre-written summary.

### What to Include

- Every skill `SKILL.md` or `references/` file you read (via the Read tool), in the order you read them
- Which phase you were in when you read it
- Whether it was a Worker, Common, Cross-domain, or Reference file
- A one-line description of what you specifically used it for in this session

### Format

| # | Phase | Skill / Reference Read | Type | What It Was Used For |
|---|---|---|---|---|
| 1 | Phase N | `path/to/SKILL.md` | Worker / Common / Cross-domain / Reference | One-line description |

### Summary Footer

End with:

- **Totals:** X worker skills, Y common skills, Z reference files read across N phases
- **Skipped:** List any skills from the dependency table above that you did NOT need to read, and why (e.g., "phase not applicable", "user skipped", "no issues encountered")
- **Unplanned:** List any skills you read that were NOT listed in the dependency table (e.g., for troubleshooting, edge cases, or user-requested detours)
## References

**Official Databricks Documentation:**

- DLT Expectations
- Portable and Reusable Expectations
- Share Code Between Notebooks
- Automatic Clustering
- Lakeflow Declarative Pipelines (SDP) - Modern pipeline framework overview
- Python API: `pyspark.pipelines` - Modern Python API reference (future migration target)
- Row Tracking - Required for incremental MV refresh
- Schema Evolution in Streaming Tables - Full refresh requirements
- Pipeline Edition Comparison - ADVANCED required for expectations/CDC