Silver Layer Setup - Orchestrator Skill

End-to-end workflow for creating production-grade Silver layer pipelines using Spark Declarative Pipelines (SDP, formerly Delta Live Tables/DLT) with Delta table-based data quality rules, quarantine patterns, streaming ingestion, and monitoring views.

Naming: Databricks rebranded DLT to Spark Declarative Pipelines (SDP) / Lakeflow Declarative Pipelines (LDP). The modern Python API is from pyspark import pipelines as dp. However, our DQ rules framework still uses import dlt (legacy API) because @dlt.expect_all_or_drop() decorators are not yet available in the dp API. When Databricks migrates expectations to dp, both this skill and dlt-expectations-patterns will be updated. New projects may use databricks pipelines init to scaffold an SDP Asset Bundle project.

Time Estimate: 3-4 hours for initial setup, 1 hour per additional table

What You'll Create:

  1. dq_rules Delta table - Centralized rules repository in Unity Catalog
  2. dq_rules_loader.py - Pure Python module to load rules at runtime
  3. silver_*.py - SDP/DLT notebooks with expectations loaded from Delta table
  4. silver_pipeline.yml - Serverless SDP pipeline configuration
  5. DQ monitoring views - Per-table metrics and referential integrity checks

Decision Tree

| Question | Action |
| --- | --- |
| Creating a Silver layer from scratch? | Use this skill - it orchestrates everything |
| Only need DLT expectations patterns? | Read silver/01-dlt-expectations-patterns/SKILL.md directly |
| Need advanced DQX validation? | Read silver/02-dqx-patterns/SKILL.md directly |
| Need Asset Bundle configuration? | Read common/databricks-asset-bundles/SKILL.md directly |
| Need table properties reference? | Read common/databricks-table-properties/SKILL.md directly |
| Need pure Python import patterns? | Read common/databricks-python-imports/SKILL.md directly |

Mandatory Skill Dependencies

CRITICAL: Before generating ANY code for the Silver layer, you MUST read and follow the patterns in these common skills. Do NOT generate these patterns from memory.

| Phase | MUST Read Skill (use Read tool on SKILL.md) | What It Provides |
| --- | --- | --- |
| All phases | common/databricks-expert-agent | Core extraction principle: extract names from source, never hardcode |
| Schema setup | common/schema-management-patterns | CREATE SCHEMA DDL with governance metadata |
| DQ rules table | common/databricks-table-properties | TBLPROPERTIES for the dq_rules metadata table |
| DQ rules table | common/unity-catalog-constraints | PRIMARY KEY constraint syntax |
| Rules loader | common/databricks-python-imports | Pure Python module patterns (NO notebook header) |
| DLT notebooks | common/databricks-table-properties | Silver-layer TBLPROPERTIES (CDF, row tracking, auto-optimize) |
| Pipeline config | common/databricks-asset-bundles | DLT pipeline YAML, job YAML, serverless config |
| Deployment (if user-triggered) | common/databricks-autonomous-operations | Deploy β†’ Poll β†’ Diagnose β†’ Fix β†’ Redeploy loop when jobs/pipelines fail |

NEVER do these without FIRST reading the corresponding skill:

  • NEVER write table_properties={...} without reading databricks-table-properties
  • NEVER write Python import patterns without reading databricks-python-imports
  • NEVER write Asset Bundle YAML without reading databricks-asset-bundles
  • NEVER write CREATE SCHEMA without reading schema-management-patterns
  • NEVER define PK/FK constraints without reading unity-catalog-constraints

πŸ”΄ Non-Negotiable Defaults (Applied to EVERY Silver Table and Pipeline)

These defaults are ALWAYS applied. There are NO exceptions, NO overrides, NO alternative options.

| Default | Value | Applied Where | NEVER Do This Instead |
| --- | --- | --- | --- |
| Serverless | serverless: true | Pipeline YAML | ❌ NEVER set serverless: false or define clusters: |
| Auto Liquid Clustering | cluster_by_auto=True | Every @dlt.table() | ❌ NEVER use cluster_by=["col1", "col2"] or partition_cols= |
| Edition | edition: ADVANCED | Pipeline YAML | ❌ NEVER use CORE or PRO (expectations require ADVANCED) |
| Photon | photon: true | Pipeline YAML | ❌ NEVER set photon: false |
| Row Tracking | "delta.enableRowTracking": "true" | Every table's table_properties | ❌ NEVER omit (breaks downstream MV refresh) |
| Change Data Feed | "delta.enableChangeDataFeed": "true" | Every table's table_properties | ❌ NEVER omit (required for incremental propagation) |

# βœ… CORRECT: Every @dlt.table() MUST include these
@dlt.table(
    name="silver_anything",
    table_properties={
        "delta.enableChangeDataFeed": "true",
        "delta.enableRowTracking": "true",
        # ... other properties from databricks-table-properties
    },
    cluster_by_auto=True  # πŸ”΄ MANDATORY on every table, including monitoring views
)

# βœ… CORRECT: Pipeline YAML MUST include these
serverless: true      # πŸ”΄ MANDATORY - no classic clusters
photon: true          # πŸ”΄ MANDATORY - vectorized execution
edition: ADVANCED     # πŸ”΄ MANDATORY - required for expectations

Silver-Domain Dependencies

| Skill | Requirement | What It Provides |
| --- | --- | --- |
| silver/01-dlt-expectations-patterns | MUST read | DQ rules table DDL, rules loader cache pattern, DLT decorators, quarantine generation, runtime rule updates |
| silver/02-dqx-patterns | Optional | Read only when user needs richer diagnostics than DLT expectations, or pre-merge validation |

Core Philosophy: Schema Cloning

The Silver layer should essentially clone the source Bronze schema with minimal transformations:

  • Same column names as Bronze (no complex renaming)
  • Same data types (minimal type conversions)
  • Same grain (no aggregation - that's for Gold)
  • Add data quality rules (the main value-add)
  • Add derived flags (business indicators like is_return, is_out_of_stock)
  • Add business keys (SHA256 hashes for tracking)
  • Add timestamps (processed_timestamp)

What NOT to do in Silver:

  • No major schema restructuring
  • No aggregations (save for Gold)
  • No complex business logic (simple flags only)
  • No joining across tables (dimension lookups in Gold)

Why: Silver is the validated copy of source data. Gold handles complex transformations. This keeps Silver focused on data quality and makes troubleshooting easier (column names match source).
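A minimal sketch of this philosophy in code. Column names (transaction_id, store_number, quantity) are illustrative and must be extracted from the actual Bronze schema; get_bronze_table() is the helper introduced in Phase 4, and the DQ expectation decorators from Phase 4 are omitted here for brevity:

# Illustrative sketch only: extract real column names from Bronze, never hardcode
import dlt
from pyspark.sql import functions as F

@dlt.table(
    name="silver_transactions",
    table_properties={
        "delta.enableChangeDataFeed": "true",
        "delta.enableRowTracking": "true",
    },
    cluster_by_auto=True,
)
def silver_transactions():
    return (
        dlt.read_stream(get_bronze_table("bronze_transactions"))  # same columns, same grain
        # Derived flag: simple business indicator, not complex logic
        .withColumn("is_return", F.col("quantity") < 0)
        # Business key: stable SHA-256 hash for downstream tracking
        .withColumn("business_key", F.sha2(F.concat_ws("||", "transaction_id", "store_number"), 256))
        # Processing timestamp: added alongside (not replacing) Bronze event timestamps
        .withColumn("processed_timestamp", F.current_timestamp())
    )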

Python API: ALWAYS use import dlt (Legacy API)

Our DQ rules framework (dlt-expectations-patterns) is built on the legacy import dlt API. ALWAYS use this API for Silver layer creation.

# βœ… CORRECT: Legacy API (our standard)
import dlt
from dq_rules_loader import get_critical_rules_for_table

@dlt.table(name="silver_transactions", cluster_by_auto=True)
@dlt.expect_all_or_drop(get_critical_rules_for_table("silver_transactions"))
def silver_transactions():
    return dlt.read_stream(get_bronze_table("bronze_transactions"))

# ❌ WRONG: Modern SDP API (not compatible with our DQ rules framework)
from pyspark import pipelines as dp

@dp.table(name="silver_transactions")
def silver_transactions():
    return spark.readStream.table("bronze_transactions")

Why not modern dp API? The @dlt.expect_all_or_drop() and @dlt.expect_all() decorators from our dlt-expectations-patterns skill require the dlt module. When Databricks fully migrates expectations to dp, update both this skill and dlt-expectations-patterns.


File Structure

src/{project}_silver/
β”œβ”€β”€ setup_dq_rules_table.py        # Databricks notebook: Create and populate DQ rules Delta table
β”œβ”€β”€ dq_rules_loader.py             # Pure Python (NO notebook header): Load rules from Delta table
β”œβ”€β”€ silver_dimensions.py           # DLT notebook: Dimension tables (stores, products, etc.)
β”œβ”€β”€ silver_transactions.py         # DLT notebook: Fact table with quarantine
β”œβ”€β”€ silver_inventory.py            # DLT notebook: Additional fact tables (if applicable)
└── data_quality_monitoring.py     # DLT notebook: DQ monitoring views

Critical Files:

  • dq_rules_loader.py must be pure Python (NO # Databricks notebook source header)
  • Run silver_dq_setup_job BEFORE deploying DLT pipeline

Working Memory Management

This orchestrator spans seven phases: Phases 1-6 create the artifacts, while deployment and Phase 7 run only when the user explicitly triggers them. To maintain coherence without context pollution:

After each phase, persist a brief summary note capturing:

  • Phase 1 output: Schema names (catalog, silver_schema), table list, DQ rules strategy decision
  • Phase 2 output: DQ rules table path, count of rules defined, rule severity distribution
  • Phase 3 output: dq_rules_loader.py path, confirmation it is pure Python (no notebook header)
  • Phase 4 output: DLT notebook paths per table, expectation counts, SCD handling decisions
  • Phase 5 output: Monitoring view paths, metric definitions
  • Phase 6 output: Pipeline YAML path, job YAML path, databricks.yml sync status
  • Phase 7 output (if user-triggered): Anomaly detection config, schema monitoring status

What to keep in working memory: Only the current phase's context, the table list from Phase 1, and the previous phase's summary note. Discard intermediate outputs (full DDL strings, DQ rule DataFrames, raw DLT notebook contents) β€” they are on disk and reproducible.

Critical file note: dq_rules_loader.py must be pure Python (NO # Databricks notebook source header). Carry this constraint through all phases.


Phased Implementation Workflow

Phase 1: Requirements & Schema Setup (30 min)

Pre-Condition - MUST read these skills first:

  1. Read common/databricks-expert-agent/SKILL.md - Apply extraction principle throughout
  2. Read common/schema-management-patterns/SKILL.md - Use for Silver schema DDL

Steps:

  1. Fill in requirements template: assets/templates/requirements-template.md
    • Map Bronze tables to Silver tables
    • Define DQ rules per entity (critical vs warning)
    • Identify quarantine candidates
  2. Create Silver schema using pattern from schema-management-patterns
  3. Verify Bronze tables exist and extract their schemas (don't hardcode column names)
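
A minimal sketch of steps 2-3, assuming illustrative names (my_catalog, bronze, my_silver); the authoritative CREATE SCHEMA pattern with governance metadata comes from schema-management-patterns:

# Sketch only: the authoritative DDL pattern lives in common/schema-management-patterns
catalog, bronze_schema, silver_schema = "my_catalog", "bronze", "my_silver"  # illustrative

spark.sql(f"""
    CREATE SCHEMA IF NOT EXISTS {catalog}.{silver_schema}
    COMMENT 'Silver layer: validated clones of Bronze tables'
""")

# Extract Bronze schemas at runtime instead of hardcoding column names
for tbl in spark.catalog.listTables(f"{catalog}.{bronze_schema}"):
    cols = [f.name for f in spark.table(f"{catalog}.{bronze_schema}.{tbl.name}").schema.fields]
    print(tbl.name, cols)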

Phase 2: DQ Rules Table Setup (30 min)

Pre-Condition - MUST read these skills first:

  1. Read silver/01-dlt-expectations-patterns/SKILL.md - Use for DQ rules table DDL and population
  2. Read common/databricks-table-properties/SKILL.md - Apply metadata table TBLPROPERTIES
  3. Read common/unity-catalog-constraints/SKILL.md - Apply PK constraint on (table_name, rule_name)

Steps:

  1. Create setup_dq_rules_table.py notebook
  2. Define DQ rules table DDL (schema from dlt-expectations-patterns)
  3. Apply TBLPROPERTIES from databricks-table-properties
  4. Apply PK constraint: CONSTRAINT pk_dq_rules PRIMARY KEY (table_name, rule_name) NOT ENFORCED
  5. Populate rules using the requirements from Phase 1
  6. Verify file created: setup_dq_rules_table.py is ready for deployment (deployment is user-triggered)
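
A minimal sketch of what setup_dq_rules_table.py produces, reusing the catalog/silver_schema variables from the Phase 1 sketch. The authoritative DDL and TBLPROPERTIES come from dlt-expectations-patterns and databricks-table-properties; every column except the PK pair (table_name, rule_name) is illustrative:

# Sketch only: authoritative DDL lives in silver/01-dlt-expectations-patterns
spark.sql(f"""
    CREATE TABLE IF NOT EXISTS {catalog}.{silver_schema}.dq_rules (
        table_name            STRING NOT NULL,
        rule_name             STRING NOT NULL,
        constraint_expression STRING NOT NULL,  -- e.g. 'transaction_id IS NOT NULL'
        severity              STRING NOT NULL,  -- 'critical' (drop) or 'warning' (track)
        is_active             BOOLEAN NOT NULL,
        CONSTRAINT pk_dq_rules PRIMARY KEY (table_name, rule_name) NOT ENFORCED
    )
""")

# Populate from the Phase 1 requirements (rows are illustrative)
spark.sql(f"""
    INSERT INTO {catalog}.{silver_schema}.dq_rules VALUES
        ('silver_transactions', 'valid_id',     'transaction_id IS NOT NULL', 'critical', true),
        ('silver_transactions', 'positive_qty', 'quantity >= 0',              'warning',  true)
""")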

Phase 3: Rules Loader Module (15 min)

Pre-Condition - MUST read these skills first:

  1. Read common/databricks-python-imports/SKILL.md - CRITICAL: loader must be pure Python
  2. Read silver/01-dlt-expectations-patterns/SKILL.md - Use cache pattern with toPandas()

Steps:

  1. Create dq_rules_loader.py as pure Python file (NO notebook header!)
  2. Implement functions: get_critical_rules_for_table(), get_warning_rules_for_table(), get_quarantine_condition()
  3. Use module-level cache pattern with toPandas() (NOT .collect()) from dlt-expectations-patterns
  4. Test import: from dq_rules_loader import get_critical_rules_for_table
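
A minimal sketch of dq_rules_loader.py, assuming the illustrative dq_rules columns from the Phase 2 sketch and the catalog/silver_schema keys passed as pipeline configuration in Phase 6; the authoritative cache pattern lives in dlt-expectations-patterns:

# dq_rules_loader.py -- pure Python module: NO "# Databricks notebook source" header.
# Sketch only: authoritative pattern lives in silver/01-dlt-expectations-patterns.
from pyspark.sql import SparkSession

_rules_cache = None  # module-level cache: one Delta read per pipeline update

def _load_rules():
    global _rules_cache
    if _rules_cache is None:
        spark = SparkSession.getActiveSession()
        catalog = spark.conf.get("catalog")              # from pipeline configuration
        silver_schema = spark.conf.get("silver_schema")  # (see Phase 6)
        # toPandas() (NOT .collect()) per dlt-expectations-patterns
        _rules_cache = (
            spark.table(f"{catalog}.{silver_schema}.dq_rules")
            .filter("is_active = true")
            .toPandas()
        )
    return _rules_cache

def _rules_for(table_name: str, severity: str) -> dict:
    df = _load_rules()
    subset = df[(df.table_name == table_name) & (df.severity == severity)]
    return dict(zip(subset.rule_name, subset.constraint_expression))

def get_critical_rules_for_table(table_name: str) -> dict:
    return _rules_for(table_name, "critical")

def get_warning_rules_for_table(table_name: str) -> dict:
    return _rules_for(table_name, "warning")

def get_quarantine_condition(table_name: str) -> str:
    # Records violating ANY critical rule land in quarantine
    rules = get_critical_rules_for_table(table_name).values()
    if not rules:
        return "false"
    return "NOT (" + " AND ".join(f"({r})" for r in rules) + ")"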

Phase 4: DLT Notebooks - Silver Tables (1-2 hours)

Pre-Condition - MUST read these skills first:

  1. Read common/databricks-table-properties/SKILL.md - Extract Silver TBLPROPERTIES
  2. Read silver/01-dlt-expectations-patterns/SKILL.md - Use decorator patterns
  3. (Optional) Read silver/02-dqx-patterns/SKILL.md - Only if user needs hybrid DQX+DLT

Steps:

  1. Create DLT notebooks using patterns from references/silver-table-patterns.md
  2. Include get_bronze_table() helper in every notebook (see references/)
  3. For each table:
    • Apply Silver TBLPROPERTIES from databricks-table-properties
    • Apply @dlt.expect_all_or_drop(get_critical_rules_for_table(...)) decorator
    • Apply @dlt.expect_all(get_warning_rules_for_table(...)) decorator
    • Clone Bronze schema with minimal transformations
    • Add derived flags, business keys, processed_timestamp
    • Set cluster_by_auto=True (NEVER specify columns manually)
  4. For high-volume fact tables: create quarantine table using get_quarantine_condition()

See: references/silver-table-patterns.md for complete templates
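
A condensed sketch of the steps above (complete templates, including the dimension variant and full TBLPROPERTIES, live in references/silver-table-patterns.md; table names are illustrative):

# Sketch only: complete templates live in references/silver-table-patterns.md
import dlt
from pyspark.sql import functions as F
from dq_rules_loader import (
    get_critical_rules_for_table,
    get_warning_rules_for_table,
    get_quarantine_condition,
)

def get_bronze_table(table_name: str) -> str:
    # Resolve the source from pipeline configuration -- never hardcode
    catalog = spark.conf.get("catalog")
    bronze_schema = spark.conf.get("bronze_schema")
    return f"{catalog}.{bronze_schema}.{table_name}"

@dlt.table(
    name="silver_transactions",
    table_properties={
        "delta.enableChangeDataFeed": "true",
        "delta.enableRowTracking": "true",
    },
    cluster_by_auto=True,
)
@dlt.expect_all_or_drop(get_critical_rules_for_table("silver_transactions"))
@dlt.expect_all(get_warning_rules_for_table("silver_transactions"))
def silver_transactions():
    return (
        dlt.read_stream(get_bronze_table("bronze_transactions"))
        .withColumn("processed_timestamp", F.current_timestamp())
    )

# Quarantine companion table for the high-volume fact
@dlt.table(name="silver_transactions_quarantine", cluster_by_auto=True)
def silver_transactions_quarantine():
    return (
        dlt.read_stream(get_bronze_table("bronze_transactions"))
        .filter(get_quarantine_condition("silver_transactions"))
    )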


Phase 5: Monitoring Views (30 min)

No external skill dependencies - use references/monitoring-patterns.md

Steps:

  1. Create data_quality_monitoring.py DLT notebook
  2. Add per-table DQ metrics views (record counts, pass/fail rates)
  3. Add referential integrity checks (orphaned records between fact and dimension)
  4. Add data freshness monitoring

See: references/monitoring-patterns.md for complete patterns
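
A minimal sketch of one referential integrity check, assuming illustrative table and column names (silver_stores, store_number); complete patterns live in references/monitoring-patterns.md:

# Sketch only: full monitoring patterns live in references/monitoring-patterns.md
import dlt
from pyspark.sql import functions as F

@dlt.table(name="dq_orphaned_transactions", cluster_by_auto=True)
def dq_orphaned_transactions():
    # Fact rows whose store_number has no matching dimension row
    facts = dlt.read("silver_transactions")
    dims = dlt.read("silver_stores")
    return (
        facts.join(dims, "store_number", "left_anti")
        .groupBy("store_number")
        .agg(F.count("*").alias("orphaned_records"))
        .withColumn("checked_at", F.current_timestamp())
    )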


Phase 6: Pipeline & Job Configuration (15 min)

Pre-Condition - MUST read these skills first:

  1. Read common/databricks-asset-bundles/SKILL.md - DLT pipeline YAML, job YAML patterns

Steps:

  1. Create resources/silver_dlt_pipeline.yml using patterns from databricks-asset-bundles
  2. Create resources/silver_dq_setup_job.yml for the DQ rules setup job
  3. Set DLT Direct Publishing Mode: catalog + schema fields (NOT target)
  4. Pass configuration: catalog, bronze_schema, silver_schema
  5. Set: serverless: true, edition: ADVANCED, photon: true

See: references/pipeline-configuration.md for Silver-specific examples
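
A minimal sketch of the pipeline resource, assuming illustrative variable names and notebook paths; the authoritative YAML patterns live in databricks-asset-bundles:

# resources/silver_dlt_pipeline.yml -- sketch only; resource key, variables,
# and paths are illustrative
resources:
  pipelines:
    silver_pipeline:
      name: "[${bundle.target}] Silver Layer Pipeline"
      serverless: true        # MANDATORY - no classic clusters
      photon: true            # MANDATORY - vectorized execution
      edition: ADVANCED       # MANDATORY - required for expectations
      catalog: ${var.catalog}          # Direct Publishing Mode: catalog + schema
      schema: ${var.silver_schema}     # (NOT target)
      configuration:
        catalog: ${var.catalog}
        bronze_schema: ${var.bronze_schema}
        silver_schema: ${var.silver_schema}
      libraries:
        - notebook:
            path: ../src/my_project_silver/silver_transactions.py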


πŸ›‘ STOP β€” Artifact Creation Complete

Phases 1–6 are complete. All files (DQ rules table script, rules loader, DLT notebooks, monitoring views, pipeline/job YAMLs) have been created. Do NOT proceed to deployment or Phase 7 unless the user explicitly requests it.

Report what was created and ask the user if they want to deploy and run.


Deployment Order (USER-TRIGGERED ONLY β€” do not auto-execute):

# 1. Deploy everything
databricks bundle deploy -t dev

# 2. Run DQ rules setup FIRST (creates dq_rules table)
databricks bundle run silver_dq_setup_job -t dev

# 3. Verify rules table exists
# SELECT * FROM {catalog}.{silver_schema}.dq_rules

# 4. THEN run DLT pipeline (loads rules from table)
databricks pipelines start-update --pipeline-name "[dev] Silver Layer Pipeline"

Phase 7: Enable Anomaly Detection on Silver Schema (5 min) β€” USER-TRIGGERED ONLY

This phase is executed ONLY when the user explicitly requests it. Do not auto-execute.

Pre-Condition - MUST read this skill first:

  1. Read monitoring/04-anomaly-detection/SKILL.md β€” Schema-level freshness/completeness monitoring

Why: Every Silver schema should have anomaly detection enabled from day one. It builds freshness and completeness ML baselines immediately, catching stale/incomplete tables before downstream consumers notice.

Steps:

  1. Enable anomaly detection on the Silver schema (uses enable_anomaly_detection_on_schema() from the anomaly-detection skill)
  2. Exclude metadata tables that are not data pipeline outputs (e.g., dq_rules)
  3. Verify enablement via Catalog Explorer or SDK

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.dataquality import Monitor, AnomalyDetectionConfig

w = WorkspaceClient()

# Get Silver schema UUID
schema_info = w.schemas.get(full_name=f"{catalog}.{silver_schema}")
schema_id = schema_info.schema_id

# Enable anomaly detection (exclude metadata tables)
try:
    w.data_quality.create_monitor(
        monitor=Monitor(
            object_type="schema",
            object_id=schema_id,
            anomaly_detection_config=AnomalyDetectionConfig(
                excluded_table_full_names=[
                    f"{catalog}.{silver_schema}.dq_rules",  # Metadata, not pipeline output
                ]
            )
        )
    )
    print(f"βœ“ Anomaly detection enabled on {catalog}.{silver_schema}")
except Exception as e:
    if "already exists" in str(e).lower():
        print(f"βœ“ Already enabled (skipping)")
    else:
        print(f"⚠️ Non-blocking: {e}")

Note: This is non-blocking β€” if anomaly detection fails to enable (permissions, preview limitations), the Silver layer setup continues. Retry later via monitoring/04-anomaly-detection/scripts/enable_anomaly_detection.py.


Common Mistakes (Silver-Specific)

Mistake 1: Deploying DLT Before DQ Setup Job

Pipeline Error: Table or view not found: dq_rules

Fix: Run silver_dq_setup_job BEFORE deploying/running DLT pipeline.

Mistake 2: Notebook Header in Loader File

# dq_rules_loader.py
# Databricks notebook source  # <-- Makes it a notebook, breaks imports!

Fix: Remove # Databricks notebook source line. Read databricks-python-imports for correct pattern.

Mistake 3: Aggregation in Silver

# WRONG: Aggregation belongs in Gold
def silver_sales_daily():
    return dlt.read_stream(...).groupBy("store", "date").agg(sum("revenue"))

Fix: Keep Silver at transaction grain. Aggregate in Gold.

Mistake 4: Manual Clustering Columns

# WRONG
@dlt.table(cluster_by=["store_number", "transaction_date"])

Fix: Always use cluster_by_auto=True. Never specify columns.

Mistake 5: Using expect_or_fail

# WRONG: Pipeline fails on bad data
@dlt.expect_or_fail("valid_id", "id IS NOT NULL")

Fix: Use @dlt.expect_all_or_drop() for critical rules. Pipeline continues, bad records quarantined.

Mistake 6: Hardcoding Table Names

# WRONG: Hardcoded table reference
dlt.read_stream("my_catalog.bronze.transactions")

Fix: Use get_bronze_table() helper with DLT configuration. See references/silver-table-patterns.md.

Mistake 7: Schema Evolution Without Full Refresh

Error: Incompatible schema change detected on streaming table

Fix: Streaming tables require a full refresh for incompatible schema changes (adding NOT NULL columns, changing types). Trigger with: databricks pipelines start-update --pipeline-name "..." --full-refresh

Mistake 8: Missing Row Tracking (Breaks Downstream MVs)

# ❌ WRONG: Missing delta.enableRowTracking
table_properties={
    "delta.enableChangeDataFeed": "true",
    # Row tracking missing!
}

Fix: ALWAYS include "delta.enableRowTracking": "true" in Silver table properties. Without it, downstream Gold materialized views cannot use incremental refresh and will do expensive full recomputation.

Mistake 9: Using Modern dp API with DQ Rules Framework

# ❌ WRONG: dp API doesn't work with our dlt-expectations decorators
from pyspark import pipelines as dp
@dp.table(name="silver_transactions")
@dlt.expect_all_or_drop(get_critical_rules_for_table(...))  # Mixes APIs!

Fix: ALWAYS use import dlt for Silver notebooks that use our DQ rules framework. See "Python API" section above.


Post-Creation Validation

Before considering the Silver layer complete, verify each item and confirm its source:

Common Skill Compliance

  • Table properties match databricks-table-properties Silver layer spec (not generated from memory)
  • Python imports follow databricks-python-imports patterns (loader has NO notebook header)
  • Asset Bundle YAML follows databricks-asset-bundles patterns (notebook_task, base_parameters)
  • Schema DDL follows schema-management-patterns (IF NOT EXISTS, governance metadata)
  • PK constraint follows unity-catalog-constraints syntax (NOT ENFORCED keyword)
  • Names extracted from source files per databricks-expert-agent (not hardcoded)

Silver-Domain Skill Compliance

  • DQ rules table follows dlt-expectations-patterns DDL
  • Rules loader uses cache pattern from dlt-expectations-patterns (toPandas, not collect)
  • DLT decorators follow dlt-expectations-patterns (expect_all_or_drop, expect_all)
  • Quarantine condition uses dlt-expectations-patterns generation pattern

Silver Layer Specifics

  • Schema cloning: Silver columns match Bronze (no aggregation, no joins)
  • get_bronze_table() helper used for all source table references
  • cluster_by_auto=True on every table (NEVER manual cluster keys)
  • delta.enableRowTracking = true on every Silver table (required for downstream MV incremental refresh)
  • Quarantine tables created for high-volume fact tables
  • DQ monitoring views created (including data freshness)
  • (User-triggered) Deployment order documented: DQ setup job runs BEFORE DLT pipeline
  • import dlt used (NOT from pyspark import pipelines as dp)
  • serverless: true in pipeline YAML (NEVER classic clusters)
  • photon: true in pipeline YAML
  • edition: ADVANCED set in pipeline YAML (required for expectations/CDC)
  • No clusters: block in pipeline YAML (serverless manages compute)
  • Deduplication applied where Bronze may have duplicate records
  • processed_timestamp added to every Silver table
  • Event timestamps preserved from Bronze (not replaced by processing time)
  • (User-triggered) Anomaly detection enabled on Silver schema (Phase 7)
  • (User-triggered) Metadata tables (e.g., dq_rules) excluded from anomaly detection

Reference Files

Silver Table Patterns

  • references/silver-table-patterns.md - Complete DLT table templates: standard pattern, dimension example, fact example with quarantine, get_bronze_table() helper, derived field patterns

Monitoring Patterns

  • references/monitoring-patterns.md - DQ monitoring views: per-table metrics, referential integrity checks, data freshness monitoring

Pipeline Configuration

  • references/pipeline-configuration.md - Silver-specific DLT pipeline YAML and DQ setup job YAML examples (supplements databricks-asset-bundles)

Templates

Requirements Template

  • assets/templates/requirements-template.md - Fill-in-first requirements gathering: project context, Bronze-to-Silver table mapping, DQ rules per entity, quarantine strategy

Related Skills

| Skill | Relationship | Path |
| --- | --- | --- |
| dlt-expectations-patterns | Mandatory - DQ rules, loader, decorators | silver/01-dlt-expectations-patterns/SKILL.md |
| dqx-patterns | Optional - Advanced validation | silver/02-dqx-patterns/SKILL.md |
| anomaly-detection | Mandatory - Schema freshness/completeness monitoring | monitoring/04-anomaly-detection/SKILL.md |
| databricks-expert-agent | Mandatory - Extraction principle | common/databricks-expert-agent/SKILL.md |
| databricks-table-properties | Mandatory - Silver TBLPROPERTIES | common/databricks-table-properties/SKILL.md |
| databricks-python-imports | Mandatory - Pure Python loader | common/databricks-python-imports/SKILL.md |
| databricks-asset-bundles | Mandatory - Pipeline/job YAML | common/databricks-asset-bundles/SKILL.md |
| schema-management-patterns | Mandatory - Schema DDL | common/schema-management-patterns/SKILL.md |
| unity-catalog-constraints | Mandatory - PK constraint | common/unity-catalog-constraints/SKILL.md |

Pipeline Progression

Previous stage: bronze/00-bronze-layer-setup β†’ Bronze tables must exist before creating Silver

Next stage: After completing the Silver layer, proceed to:

  • gold/01-gold-layer-setup β€” Implement Gold layer tables, merge scripts, and FK constraints from the YAML designs created in stage 1

Post-Completion: Skill Usage Summary (MANDATORY)

After completing all phases of this orchestrator, output a Skill Usage Summary reflecting what you ACTUALLY did β€” not a pre-written summary.

What to Include

  1. Every skill SKILL.md or references/ file you read (via the Read tool), in the order you read them
  2. Which phase you were in when you read it
  3. Whether it was a Worker, Common, Cross-domain, or Reference file
  4. A one-line description of what you specifically used it for in this session

Format

| # | Phase | Skill / Reference Read | Type | What It Was Used For |
| --- | --- | --- | --- | --- |
| 1 | Phase N | path/to/SKILL.md | Worker / Common / Cross-domain / Reference | One-line description |

Summary Footer

End with:

  • Totals: X worker skills, Y common skills, Z reference files read across N phases
  • Skipped: List any skills from the dependency table above that you did NOT need to read, and why (e.g., "phase not applicable", "user skipped", "no issues encountered")
  • Unplanned: List any skills you read that were NOT listed in the dependency table (e.g., for troubleshooting, edge cases, or user-requested detours)
