databricks-bundle-deploy

Databricks Asset Bundle Deployment

Package tested code into Databricks Asset Bundles (DABs) and deploy to multiple environments (dev/staging/prod) with proper parameterization and governance.

When to Use This Skill

  • Packaging tested code for deployment (after databricks-testing)
  • Creating production-ready pipeline projects
  • Deploying to dev/staging/prod environments
  • Setting up multi-environment CI/CD
  • Managing notebook deployments
  • Scheduling jobs in Databricks

Core Concepts

Databricks Asset Bundles (DABs)

DABs are the standard way to package and deploy Databricks workflows:

  • Infrastructure as code for Databricks
  • Version control friendly (Git)
  • Multi-environment support (dev/staging/prod)
  • Automated validation and deployment
  • Consistent project structure

Two-Phase Workflow

Phase 1: Test & Iterate (using databricks-testing skill)

  • Test code on cluster via MCP
  • Debug and fix errors
  • Iterate until working

Phase 2: Package & Deploy (this skill)

  • Create DAB project structure
  • Generate databricks.yml and job definitions
  • Validate bundle
  • Deploy to environment
  • (Optional) Run deployed job

Standard Project Structure

project_name/
├── databricks.yml              # Bundle configuration (REQUIRED)
├── resources/                  # Job/pipeline definitions (REQUIRED)
│   └── job.yml                # Job definition
├── src/                       # Source code (RECOMMENDED)
│   └── project_name/
│       └── notebooks/
│           ├── 01_data_prep.py
│           ├── 02_transform.py
│           └── 03_output.py
└── tests/                     # Unit tests (OPTIONAL)
    └── test_transformations.py

Key Files

databricks.yml - Bundle configuration:

  • Bundle name and variables
  • Environment targets (dev/staging/prod)
  • References to resources

resources/*.yml - Job/pipeline definitions:

  • Task configurations
  • Cluster settings (use serverless)
  • Schedules and triggers
  • Notebook paths and parameters

Deployment Workflows

Workflow 1: Create Bundle from Scratch

Package working code into new DAB project.

Pattern:

  1. Create project directory structure (see the scaffold sketch after this list)
  2. Generate databricks.yml with:
    • Bundle name
    • Variables (catalog, schema, etc.)
    • Targets (dev, staging, prod)
  3. Create job definition in resources/job.yml
  4. Move tested notebooks to src/<project>/notebooks/
  5. Add parameterization (widgets) to notebooks
  6. Validate (automatic, no confirmation)
  7. Deploy (automatic, no confirmation)
  8. Ask before running (requires user confirmation)
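
A minimal sketch of step 1, assuming Python is available locally; the project and notebook names are illustrative and mirror the standard structure above:

# scaffold.py - illustrative helper; project and notebook names are placeholders
from pathlib import Path

project = Path("my_pipeline")

# Directories from the standard project structure
for d in ["resources", f"src/{project.name}/notebooks", "tests"]:
    (project / d).mkdir(parents=True, exist_ok=True)

# Empty files to fill in during steps 2-5
(project / "databricks.yml").touch()
(project / "resources" / "job.yml").touch()
for nb in ["01_data_prep.py", "02_transform.py", "03_output.py"]:
    (project / "src" / project.name / "notebooks" / nb).touch()

print(f"Scaffolded {project}/")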

Workflow 2: Validate and Deploy (AUTOMATIC)

After bundle creation, automatically validate and deploy.

Pattern:

# Step 1: Validate (AUTOMATIC - no confirmation needed)
databricks bundle validate -t dev

# If validation fails:
# - Show errors
# - Fix databricks.yml or resource files
# - Re-run validate

# Step 2: Deploy (AUTOMATIC - no confirmation needed)
databricks bundle deploy -t dev

# Reports:
# - Deployment success
# - Job name and ID
# - Workspace URL

IMPORTANT: These commands run automatically per CLAUDE.md rules.

Workflow 3: Run Deployed Job (REQUIRES CONFIRMATION)

Execute the deployed job.

Pattern:

# IMPORTANT: ALWAYS ask user first
# "Do you want to run the deployed job '<job_name>' now?"

# Only if user confirms:
databricks bundle run <job_name> -t dev

# Monitor and report:
# - Run URL
# - Run status (RUNNING, SUCCESS, FAILED)
# - Result state
# - Error messages if failed

IMPORTANT: Never run jobs without explicit user confirmation per CLAUDE.md rules.

Parameterization

Required Parameterization Patterns

Never hard-code values. Always use variables.
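
In notebook code, this means building table names from parameters rather than literals. A small illustration (table and parameter names are placeholders):

# Avoid: hard-coded, environment-specific table name
# df = spark.table("prod_catalog.pipeline_data.source_data")

# Prefer: resolve catalog/schema from job parameters (widget pattern below)
catalog = dbutils.widgets.get("catalog")
schema = dbutils.widgets.get("schema")
df = spark.table(f"{catalog}.{schema}.source_data")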

Bundle Variables (databricks.yml):

variables:
  catalog:
    description: "Unity Catalog name"
    default: "dev_catalog"

  schema:
    description: "Schema name"
    default: "default"

  project_name:
    description: "Project identifier"

Environment-Specific Values (targets):

targets:
  dev:
    mode: development
    variables:
      catalog: "dev_catalog"
      schema: "dev_schema"

  prod:
    mode: production
    variables:
      catalog: "prod_catalog"
      schema: "prod_schema"

Variable References:

  • ${var.catalog} - User-defined variable
  • ${bundle.target} - Current environment (dev/staging/prod)
  • ${workspace.current_user.userName} - Current user email
  • ${workspace.file_path} - Workspace file path

Notebook Widget Parameterization

All notebooks must use widgets with defaults:

# REQUIRED pattern for all notebook parameters
try:
    catalog = dbutils.widgets.get("catalog")
except:
    catalog = "dev_catalog"

try:
    schema = dbutils.widgets.get("schema")
except:
    schema = "default"

try:
    batch_date = dbutils.widgets.get("batch_date")
except:
    from datetime import date
    batch_date = str(date.today())

Why try/except:

  • Allows local testing without widgets
  • Provides sensible defaults
  • Prevents errors in interactive mode
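
When a notebook takes several parameters, the same pattern can be wrapped in a small helper; this is an optional refactor, not a requirement of the skill:

from datetime import date

def get_param(name: str, default: str) -> str:
    """Read a widget value, falling back to a default when the widget is absent."""
    try:
        return dbutils.widgets.get(name)
    except Exception:
        return default

catalog = get_param("catalog", "dev_catalog")
schema = get_param("schema", "default")
batch_date = get_param("batch_date", str(date.today()))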

Serverless Compute Guidelines

DO:

  • Rely on serverless compute (no new_cluster in tasks)
  • Use %pip install for Python dependencies (see the notebook sketch below)
  • Keep tasks small and focused
  • Use Delta Lake for data persistence

DON'T:

  • Define new_cluster in task configuration
  • Install libraries via cluster init scripts
  • Run long operations without checkpoints
  • Use non-Delta formats for production data

Example Task Configuration:

tasks:
  - task_key: data_prep
    notebook_task:
      notebook_path: ../src/project/notebooks/01_prep.py
      base_parameters:
        catalog: ${var.catalog}
    # NO new_cluster here - uses serverless by default
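
Python dependencies for a serverless task are installed at the top of the task notebook itself rather than on a cluster; a minimal sketch (the package and version are illustrative):

# Databricks notebook source
# MAGIC %pip install scikit-learn==1.4.2

# COMMAND ----------
# Restart Python so the freshly installed package is importable
dbutils.library.restartPython()

# COMMAND ----------
import sklearn
print(sklearn.__version__)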

Path Resolution Rules

CRITICAL: Paths in resources/*.yml resolve relative to the resource file.

project/
├── databricks.yml
├── resources/
│   └── job.yml          # Paths resolve from HERE
└── src/
    └── notebooks/
        └── notebook.py

In resources/job.yml:

notebook_path: ../src/notebooks/notebook.py  # Relative to resources/

Not:

notebook_path: src/notebooks/notebook.py  # Wrong - from project root

Complete Bundle Examples

Example 1: Simple Data Pipeline Bundle

databricks.yml:

bundle:
  name: ${var.project_name}

variables:
  project_name:
    description: "Project identifier"
    default: "my_pipeline"

  catalog:
    description: "Unity Catalog name"
    default: "dev_catalog"

  schema:
    description: "Schema name"
    default: "pipeline_data"

targets:
  dev:
    mode: development
    workspace:
      host: https://<your-dev-workspace-url>  # or omit to use the host from your CLI profile
    variables:
      catalog: "dev_catalog"

  prod:
    mode: production
    workspace:
      host: https://<your-prod-workspace-url>
    variables:
      catalog: "prod_catalog"

resources:
  jobs:
    my_pipeline_job:
      name: ${var.project_name}_job_${bundle.target}

      tasks:
        - task_key: data_ingestion
          notebook_task:
            notebook_path: ../src/${var.project_name}/notebooks/01_ingest.py
            base_parameters:
              catalog: ${var.catalog}
              schema: ${var.schema}

        - task_key: data_transformation
          depends_on:
            - task_key: data_ingestion
          notebook_task:
            notebook_path: ../src/${var.project_name}/notebooks/02_transform.py
            base_parameters:
              catalog: ${var.catalog}
              schema: ${var.schema}

        - task_key: data_output
          depends_on:
            - task_key: data_transformation
          notebook_task:
            notebook_path: ../src/${var.project_name}/notebooks/03_output.py
            base_parameters:
              catalog: ${var.catalog}
              schema: ${var.schema}

      # Optional: Schedule
      schedule:
        quartz_cron_expression: "0 0 2 * * ?"  # Daily at 2am
        timezone_id: "UTC"

      # Optional: Email notifications
      email_notifications:
        on_failure:
          - ${workspace.current_user.userName}

resources/job.yml:

# Alternative: define the job in a separate file and pull it into
# databricks.yml with an include block:
#
#   include:
#     - resources/*.yml

resources:
  jobs:
    my_pipeline_job:
      # Same content as above

Example 2: ML Training Pipeline Bundle

databricks.yml:

bundle:
  name: ml_training_pipeline

variables:
  catalog:
    description: "Unity Catalog for ML assets"
    default: "ml_dev"

  schema:
    description: "Schema for models and features"
    default: "churn_model"

  experiment_name:
    description: "MLflow experiment path"

targets:
  dev:
    mode: development
    variables:
      catalog: "ml_dev"
      experiment_name: "/Users/${workspace.current_user.userName}/experiments/churn_dev"

  prod:
    mode: production
    variables:
      catalog: "ml_prod"
      experiment_name: "/Shared/experiments/churn_prod"

resources:
  jobs:
    ml_training_job:
      name: ml_training_${bundle.target}

      tasks:
        - task_key: data_preparation
          notebook_task:
            notebook_path: ../src/ml_training/notebooks/01_data_prep.py
            base_parameters:
              catalog: ${var.catalog}
              schema: ${var.schema}

        - task_key: feature_engineering
          depends_on:
            - task_key: data_preparation
          notebook_task:
            notebook_path: ../src/ml_training/notebooks/02_features.py
            base_parameters:
              catalog: ${var.catalog}
              schema: ${var.schema}

        - task_key: model_training
          depends_on:
            - task_key: feature_engineering
          notebook_task:
            notebook_path: ../src/ml_training/notebooks/03_training.py
            base_parameters:
              catalog: ${var.catalog}
              schema: ${var.schema}
              experiment_name: ${var.experiment_name}

        - task_key: model_registration
          depends_on:
            - task_key: model_training
          notebook_task:
            notebook_path: ../src/ml_training/notebooks/04_register.py
            base_parameters:
              catalog: ${var.catalog}
              schema: ${var.schema}

Example 3: Medallion Architecture Bundle

databricks.yml:

bundle:
  name: medallion_pipeline

variables:
  catalog:
    description: "Unity Catalog name"
    default: "de_dev"

targets:
  dev:
    mode: development
    variables:
      catalog: "de_dev"

  prod:
    mode: production
    variables:
      catalog: "de_prod"

resources:
  jobs:
    medallion_job:
      name: medallion_pipeline_${bundle.target}

      tasks:
        # Bronze layer - raw ingestion
        - task_key: bronze_ingestion
          notebook_task:
            notebook_path: ../src/medallion/notebooks/bronze_ingest.py
            base_parameters:
              catalog: ${var.catalog}
              bronze_schema: "bronze"

        # Silver layer - cleaned/validated
        - task_key: silver_transformation
          depends_on:
            - task_key: bronze_ingestion
          notebook_task:
            notebook_path: ../src/medallion/notebooks/silver_transform.py
            base_parameters:
              catalog: ${var.catalog}
              bronze_schema: "bronze"
              silver_schema: "silver"

        # Gold layer - business aggregates
        - task_key: gold_aggregation
          depends_on:
            - task_key: silver_transformation
          notebook_task:
            notebook_path: ../src/medallion/notebooks/gold_aggregate.py
            base_parameters:
              catalog: ${var.catalog}
              silver_schema: "silver"
              gold_schema: "gold"

      schedule:
        quartz_cron_expression: "0 0 * * * ?"  # Hourly
        timezone_id: "UTC"

Notebook Parameter Example

Parameterized notebook (01_data_prep.py):

# Databricks notebook source
# MAGIC %md
# MAGIC # Data Preparation
# MAGIC
# MAGIC Loads and prepares data for transformation

# COMMAND ----------
# Widget parameterization with defaults
try:
    catalog = dbutils.widgets.get("catalog")
except:
    catalog = "dev_catalog"

try:
    schema = dbutils.widgets.get("schema")
except:
    schema = "pipeline_data"

try:
    batch_date = dbutils.widgets.get("batch_date")
except:
    from datetime import date
    batch_date = str(date.today())

print(f"Running with parameters:")
print(f"  Catalog: {catalog}")
print(f"  Schema: {schema}")
print(f"  Batch Date: {batch_date}")

# COMMAND ----------
from pyspark.sql import functions as F

# Load data using parameterized catalog/schema
df = spark.table(f"{catalog}.{schema}.source_data")

# Filter by batch date
df_filtered = df.filter(F.col("date") == batch_date)

print(f"Loaded {df_filtered.count()} records for {batch_date}")

# COMMAND ----------
# Save prepared data
output_table = f"{catalog}.{schema}.prepared_data"

df_filtered.write \
    .format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable(output_table)

print(f"Saved to {output_table}")

Deployment Commands

Validate Bundle

# Validate bundle structure and configuration
databricks bundle validate -t dev

# Common validation errors:
# - Invalid YAML syntax
# - Missing required fields
# - Invalid notebook paths
# - Undefined variables

Deploy Bundle

# Deploy to development environment
databricks bundle deploy -t dev

# Deploy to production environment
databricks bundle deploy -t prod

# What happens:
# - Bundle uploaded to workspace
# - Jobs created/updated
# - Notebooks synced
# - Resources configured

Run Deployed Job

# IMPORTANT: Ask user first!
# "Do you want to run the job now?"

# If confirmed:
databricks bundle run my_job -t dev

# Monitor output for:
# - Run URL (for tracking)
# - Run status
# - Error messages
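
If programmatic monitoring is preferred over the CLI output, the Databricks SDK for Python can fetch run status; a minimal sketch, assuming the SDK is installed and the run ID is taken from the bundle run output:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # reuses the same authentication as the databricks CLI

run_id = 123456789  # illustrative; copy from the bundle run output / run URL
run = w.jobs.get_run(run_id)

print(run.run_page_url)                 # link to the run in the workspace
print(run.state.life_cycle_state)       # e.g. PENDING, RUNNING, TERMINATED
print(run.state.result_state)           # e.g. SUCCESS, FAILED (once terminated)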

Error Handling

Validation Errors

Error: Invalid notebook path: src/notebooks/01_prep.py

Cause: Path doesn't account for relative resolution

Fix: Use ../src/notebooks/01_prep.py (relative to resources/)


Error: Variable 'catalog' is not defined

Cause: Used ${var.catalog} without defining in variables section

Fix: Add to databricks.yml:

variables:
  catalog:
    description: "Unity Catalog name"

Error: YAML syntax error at line 15

Cause: Invalid YAML (indentation, missing quotes, etc.)

Fix: Check YAML syntax, ensure consistent indentation (2 spaces)

Deployment Errors

Error: Permission denied: cannot create job

Cause: Insufficient workspace permissions

Fix: Check user has job creation permissions in workspace


Error: Notebook not found: /Workspace/...

Cause: Notebook doesn't exist at specified path

Fix: Verify notebook was created in src/ directory, check path in job definition

Integration with Other Skills

Receives From

  • databricks-testing - Tested, working code
  • databricks-unity-catalog - Schema and table names to use

Used By

  • databricks-ml-pipeline - Packages ML training pipelines
  • databricks-data-engineering - Packages data pipelines

Best Practices

1. Always Parameterize

  • Never hard-code catalog/schema names
  • Use variables for environment-specific values
  • Use widgets in notebooks with try/except defaults

2. Use Serverless Compute

  • Don't define new_cluster
  • Rely on Databricks serverless
  • Faster startup, better cost optimization

3. Validate Before Deploy

  • Always run databricks bundle validate first
  • Fix all validation errors
  • Then deploy

4. Use Meaningful Names

  • Job names: project_name_job_${bundle.target}
  • Task keys: Descriptive (data_prep, model_training)
  • Clear variable names

5. Document with Comments

  • Add descriptions to all variables
  • Comment complex job configurations
  • Include README in project

6. Multi-Environment from Day 1

  • Define dev, staging, prod targets upfront
  • Use same bundle for all environments
  • Only variables differ per environment

Security Reminders

  • Never embed tokens or secrets in databricks.yml (see the secret-scope sketch below)
  • Use environment variables for credentials
  • Set proper job permissions
  • Use service principals for production
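
For credentials a notebook genuinely needs, reading them from a Databricks secret scope at runtime keeps them out of the bundle and source control; a minimal sketch (scope and key names are illustrative):

# Resolve the credential at runtime instead of embedding it anywhere in the bundle
api_token = dbutils.secrets.get(scope="pipeline-secrets", key="api-token")

# Secret values are redacted if accidentally printed in notebook output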

Summary

This skill packages and deploys Databricks Asset Bundles:

  • Create: Generate project structure, databricks.yml, job definitions
  • Parameterize: Variables for catalogs, schemas, environments
  • Validate: Automatic validation (no confirmation)
  • Deploy: Automatic deployment (no confirmation)
  • Run: Manual job execution (requires user confirmation)
  • Multi-environment: Support dev/staging/prod with same bundle

Use this skill after testing code with databricks-testing to deploy production-ready pipelines.
