databricks-bundle-deploy
Databricks Asset Bundle Deployment
Package tested code into Databricks Asset Bundles (DABs) and deploy to multiple environments (dev/staging/prod) with proper parameterization and governance.
When to Use This Skill
- Packaging tested code for deployment (after databricks-testing)
- Creating production-ready pipeline projects
- Deploying to dev/staging/prod environments
- Setting up multi-environment CI/CD
- Managing notebook deployments
- Scheduling jobs in Databricks
Core Concepts
Databricks Asset Bundles (DABs)
DABs are the standard way to package and deploy Databricks workflows:
- Infrastructure as code for Databricks
- Version control friendly (Git)
- Multi-environment support (dev/staging/prod)
- Automated validation and deployment
- Consistent project structure
Two-Phase Workflow
Phase 1: Test & Iterate (using databricks-testing skill)
- Test code on cluster via MCP
- Debug and fix errors
- Iterate until working
Phase 2: Package & Deploy (this skill)
- Create DAB project structure
- Generate databricks.yml and job definitions
- Validate bundle
- Deploy to environment
- (Optional) Run deployed job
Standard Project Structure
project_name/
├── databricks.yml # Bundle configuration (REQUIRED)
├── resources/ # Job/pipeline definitions (REQUIRED)
│ └── job.yml # Job definition
├── src/ # Source code (RECOMMENDED)
│ └── project_name/
│ └── notebooks/
│ ├── 01_data_prep.py
│ ├── 02_transform.py
│ └── 03_output.py
└── tests/ # Unit tests (OPTIONAL)
└── test_transformations.py
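A minimal shell sketch for scaffolding this layout (project and file names are placeholders):

mkdir -p project_name/resources
mkdir -p project_name/src/project_name/notebooks
mkdir -p project_name/tests
touch project_name/databricks.yml
touch project_name/resources/job.yml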
Key Files
databricks.yml - Bundle configuration:
- Bundle name and variables
- Environment targets (dev/staging/prod)
- References to resources
resources/*.yml - Job/pipeline definitions:
- Task configurations
- Cluster settings (use serverless)
- Schedules and triggers
- Notebook paths and parameters
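A minimal databricks.yml skeleton showing how these pieces fit together (names are placeholders; full examples appear later in this document):

bundle:
  name: my_project

include:
  - resources/*.yml

variables:
  catalog:
    description: "Unity Catalog name"
    default: "dev_catalog"

targets:
  dev:
    mode: development
  prod:
    mode: production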
Deployment Workflows
Workflow 1: Create Bundle from Scratch
Package working code into a new DAB project.
Pattern:
- Create project directory structure
- Generate databricks.yml with:
  - Bundle name
  - Variables (catalog, schema, etc.)
  - Targets (dev, staging, prod)
- Create job definition in resources/job.yml
- Move tested notebooks to src/<project>/notebooks/
- Add parameterization (widgets) to notebooks
- Validate (automatic, no confirmation)
- Deploy (automatic, no confirmation)
- Ask before running (requires user confirmation)
Workflow 2: Validate and Deploy (AUTOMATIC)
After bundle creation, automatically validate and deploy.
Pattern:
# Step 1: Validate (AUTOMATIC - no confirmation needed)
databricks bundle validate -t dev
# If validation fails:
# - Show errors
# - Fix databricks.yml or resource files
# - Re-run validate
# Step 2: Deploy (AUTOMATIC - no confirmation needed)
databricks bundle deploy -t dev
# Reports:
# - Deployment success
# - Job name and ID
# - Workspace URL
IMPORTANT: These commands run automatically per CLAUDE.md rules.
Workflow 3: Run Deployed Job (REQUIRES CONFIRMATION)
Execute the deployed job.
Pattern:
# IMPORTANT: ALWAYS ask user first
# "Do you want to run the deployed job '<job_name>' now?"
# Only if user confirms:
databricks bundle run <job_name> -t dev
# Monitor and report:
# - Run URL
# - Run status (RUNNING, SUCCESS, FAILED)
# - Result state
# - Error messages if failed
IMPORTANT: Never run jobs without explicit user confirmation per CLAUDE.md rules.
Parameterization
Required Parameterization Patterns
Never hard-code values. Always use variables.
Bundle Variables (databricks.yml):
variables:
catalog:
description: "Unity Catalog name"
default: "dev_catalog"
schema:
description: "Schema name"
default: "default"
project_name:
description: "Project identifier"
Environment-Specific Values (targets):
targets:
dev:
mode: development
variables:
catalog: "dev_catalog"
schema: "dev_schema"
prod:
mode: production
variables:
catalog: "prod_catalog"
schema: "prod_schema"
Built-in Variables:
- ${var.catalog} - User-defined variable
- ${bundle.target} - Current environment (dev/staging/prod)
- ${workspace.current_user.userName} - Current user email
- ${workspace.file_path} - Workspace file path
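These substitutions can be combined freely; for example, a job name and notification address built from them (a sketch mirroring the full examples later in this document):

resources:
  jobs:
    example_job:
      name: ${var.project_name}_job_${bundle.target}
      email_notifications:
        on_failure:
          - ${workspace.current_user.userName}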
Notebook Widget Parameterization
All notebooks must use widgets with defaults:
# REQUIRED pattern for all notebook parameters
try:
catalog = dbutils.widgets.get("catalog")
except:
catalog = "dev_catalog"
try:
schema = dbutils.widgets.get("schema")
except:
schema = "default"
try:
batch_date = dbutils.widgets.get("batch_date")
except:
from datetime import date
batch_date = str(date.today())
Why try/except:
- Allows local testing without widgets
- Provides sensible defaults
- Prevents errors in interactive mode
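To avoid repeating the try/except block for every parameter, the same pattern can be wrapped in a small helper (a sketch; get_param is a hypothetical name, not a Databricks API):

def get_param(name: str, default: str) -> str:
    """Return a widget value, falling back to a default when the widget is absent."""
    try:
        return dbutils.widgets.get(name)
    except Exception:
        return default

catalog = get_param("catalog", "dev_catalog")
schema = get_param("schema", "default")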
Serverless Compute Guidelines
DO:
- Rely on serverless compute (no new_cluster in tasks)
- Use %pip install for Python dependencies
- Keep tasks small and focused
- Use Delta Lake for data persistence
DON'T:
- Define new_cluster in task configuration
- Install libraries via cluster init scripts
- Run long operations without checkpoints
- Use non-Delta formats for production data
Example Task Configuration:
tasks:
- task_key: data_prep
notebook_task:
notebook_path: ../src/project/notebooks/01_prep.py
base_parameters:
catalog: ${var.catalog}
# NO new_cluster here - uses serverless by default
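Per the DO list above, Python dependencies for serverless tasks can be installed at the top of the notebook itself rather than via cluster configuration (the library and version below are illustrative):

# COMMAND ----------
# MAGIC %pip install requests==2.32.3

# COMMAND ----------
# Restart Python so the newly installed package is importable
dbutils.library.restartPython()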
Path Resolution Rules
CRITICAL: Paths in resources/*.yml resolve relative to the resource file.
project/
├── databricks.yml
├── resources/
│ └── job.yml # Paths resolve from HERE
└── src/
└── notebooks/
└── notebook.py
In resources/job.yml:
notebook_path: ../src/notebooks/notebook.py # Relative to resources/
Not:
notebook_path: src/notebooks/notebook.py # Wrong - from project root
Complete Bundle Examples
Example 1: Simple Data Pipeline Bundle
databricks.yml:
bundle:
name: ${var.project_name}
variables:
project_name:
description: "Project identifier"
default: "my_pipeline"
catalog:
description: "Unity Catalog name"
default: "dev_catalog"
schema:
description: "Schema name"
default: "pipeline_data"
targets:
dev:
mode: development
workspace:
host: ${DATABRICKS_HOST}
variables:
catalog: "dev_catalog"
prod:
mode: production
workspace:
host: ${DATABRICKS_HOST}
variables:
catalog: "prod_catalog"
resources:
jobs:
my_pipeline_job:
name: ${var.project_name}_job_${bundle.target}
tasks:
- task_key: data_ingestion
notebook_task:
notebook_path: ../src/${var.project_name}/notebooks/01_ingest.py
base_parameters:
catalog: ${var.catalog}
schema: ${var.schema}
- task_key: data_transformation
depends_on:
- task_key: data_ingestion
notebook_task:
notebook_path: ../src/${var.project_name}/notebooks/02_transform.py
base_parameters:
catalog: ${var.catalog}
schema: ${var.schema}
- task_key: data_output
depends_on:
- task_key: data_transformation
notebook_task:
notebook_path: ../src/${var.project_name}/notebooks/03_output.py
base_parameters:
catalog: ${var.catalog}
schema: ${var.schema}
# Optional: Schedule
schedule:
quartz_cron_expression: "0 0 2 * * ?" # Daily at 2am
timezone_id: "UTC"
# Optional: Email notifications
email_notifications:
on_failure:
- ${workspace.current_user.userName}
resources/job.yml:
# Alternative: define the job in a separate file and reference it
# from databricks.yml with an include section, e.g.:
#   include:
#     - resources/*.yml
resources:
jobs:
my_pipeline_job:
# Same content as above
Example 2: ML Training Pipeline Bundle
databricks.yml:
bundle:
name: ml_training_pipeline
variables:
catalog:
description: "Unity Catalog for ML assets"
default: "ml_dev"
schema:
description: "Schema for models and features"
default: "churn_model"
experiment_name:
description: "MLflow experiment path"
targets:
dev:
mode: development
variables:
catalog: "ml_dev"
experiment_name: "/Users/${workspace.current_user.userName}/experiments/churn_dev"
prod:
mode: production
variables:
catalog: "ml_prod"
experiment_name: "/Shared/experiments/churn_prod"
resources:
jobs:
ml_training_job:
name: ml_training_${bundle.target}
tasks:
- task_key: data_preparation
notebook_task:
notebook_path: ../src/ml_training/notebooks/01_data_prep.py
base_parameters:
catalog: ${var.catalog}
schema: ${var.schema}
- task_key: feature_engineering
depends_on:
- task_key: data_preparation
notebook_task:
notebook_path: ../src/ml_training/notebooks/02_features.py
base_parameters:
catalog: ${var.catalog}
schema: ${var.schema}
- task_key: model_training
depends_on:
- task_key: feature_engineering
notebook_task:
notebook_path: ../src/ml_training/notebooks/03_training.py
base_parameters:
catalog: ${var.catalog}
schema: ${var.schema}
experiment_name: ${var.experiment_name}
- task_key: model_registration
depends_on:
- task_key: model_training
notebook_task:
notebook_path: ../src/ml_training/notebooks/04_register.py
base_parameters:
catalog: ${var.catalog}
schema: ${var.schema}
Example 3: Medallion Architecture Bundle
databricks.yml:
bundle:
name: medallion_pipeline
variables:
catalog:
description: "Unity Catalog name"
default: "de_dev"
targets:
dev:
mode: development
variables:
catalog: "de_dev"
prod:
mode: production
variables:
catalog: "de_prod"
resources:
jobs:
medallion_job:
name: medallion_pipeline_${bundle.target}
tasks:
# Bronze layer - raw ingestion
- task_key: bronze_ingestion
notebook_task:
notebook_path: ../src/medallion/notebooks/bronze_ingest.py
base_parameters:
catalog: ${var.catalog}
bronze_schema: "bronze"
# Silver layer - cleaned/validated
- task_key: silver_transformation
depends_on:
- task_key: bronze_ingestion
notebook_task:
notebook_path: ../src/medallion/notebooks/silver_transform.py
base_parameters:
catalog: ${var.catalog}
bronze_schema: "bronze"
silver_schema: "silver"
# Gold layer - business aggregates
- task_key: gold_aggregation
depends_on:
- task_key: silver_transformation
notebook_task:
notebook_path: ../src/medallion/notebooks/gold_aggregate.py
base_parameters:
catalog: ${var.catalog}
silver_schema: "silver"
gold_schema: "gold"
schedule:
quartz_cron_expression: "0 0 * * * ?" # Hourly
timezone_id: "UTC"
Notebook Parameter Example
Parameterized notebook (01_data_prep.py):
# Databricks notebook source
# MAGIC %md
# MAGIC # Data Preparation
# MAGIC
# MAGIC Loads and prepares data for transformation
# COMMAND ----------
# Widget parameterization with defaults
try:
catalog = dbutils.widgets.get("catalog")
except:
catalog = "dev_catalog"
try:
schema = dbutils.widgets.get("schema")
except:
schema = "pipeline_data"
try:
batch_date = dbutils.widgets.get("batch_date")
except:
from datetime import date
batch_date = str(date.today())
print(f"Running with parameters:")
print(f" Catalog: {catalog}")
print(f" Schema: {schema}")
print(f" Batch Date: {batch_date}")
# COMMAND ----------
from pyspark.sql import functions as F
# Load data using parameterized catalog/schema
df = spark.table(f"{catalog}.{schema}.source_data")
# Filter by batch date
df_filtered = df.filter(F.col("date") == batch_date)
print(f"Loaded {df_filtered.count()} records for {batch_date}")
# COMMAND ----------
# Save prepared data
output_table = f"{catalog}.{schema}.prepared_data"
df_filtered.write \
.format("delta") \
.mode("overwrite") \
.option("overwriteSchema", "true") \
.saveAsTable(output_table)
print(f"Saved to {output_table}")
Deployment Commands
Validate Bundle
# Validate bundle structure and configuration
databricks bundle validate -t dev
# Common validation errors:
# - Invalid YAML syntax
# - Missing required fields
# - Invalid notebook paths
# - Undefined variables
Deploy Bundle
# Deploy to development environment
databricks bundle deploy -t dev
# Deploy to production environment
databricks bundle deploy -t prod
# What happens:
# - Bundle uploaded to workspace
# - Jobs created/updated
# - Notebooks synced
# - Resources configured
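Variable values can also be overridden at deploy time with the --var flag (the override below is illustrative):

# Override a bundle variable for this deployment only
databricks bundle deploy -t dev --var="catalog=my_test_catalog"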
Run Deployed Job
# IMPORTANT: Ask user first!
# "Do you want to run the job now?"
# If confirmed:
databricks bundle run my_job -t dev
# Monitor output for:
# - Run URL (for tracking)
# - Run status
# - Error messages
Error Handling
Validation Errors
Error: Invalid notebook path: src/notebooks/01_prep.py
Cause: Path doesn't account for relative resolution
Fix: Use ../src/notebooks/01_prep.py (relative to resources/)
Error: Variable 'catalog' is not defined
Cause: Used ${var.catalog} without defining in variables section
Fix: Add to databricks.yml:
variables:
catalog:
description: "Unity Catalog name"
Error: YAML syntax error at line 15
Cause: Invalid YAML (indentation, missing quotes, etc.)
Fix: Check YAML syntax, ensure consistent indentation (2 spaces)
Deployment Errors
Error: Permission denied: cannot create job
Cause: Insufficient workspace permissions
Fix: Check user has job creation permissions in workspace
Error: Notebook not found: /Workspace/...
Cause: Notebook doesn't exist at specified path
Fix: Verify notebook was created in src/ directory, check path in job definition
Integration with Other Skills
Receives From
- databricks-testing - Tested, working code
- databricks-unity-catalog - Schema and table names to use
Used By
- databricks-ml-pipeline - Packages ML training pipelines
- databricks-data-engineering - Packages data pipelines
Best Practices
1. Always Parameterize
- Never hard-code catalog/schema names
- Use variables for environment-specific values
- Use widgets in notebooks with try/except defaults
2. Use Serverless Compute
- Don't define new_cluster
- Rely on Databricks serverless
- Faster startup, better cost optimization
3. Validate Before Deploy
- Always run databricks bundle validate first
- Fix all validation errors
- Then deploy
4. Use Meaningful Names
- Job names: project_name_job_${bundle.target}
- Task keys: descriptive (data_prep, model_training)
- Clear variable names
5. Document with Comments
- Add descriptions to all variables
- Comment complex job configurations
- Include README in project
6. Multi-Environment from Day 1
- Define dev, staging, prod targets upfront
- Use same bundle for all environments
- Only variables differ per environment
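A targets section that defines all three environments up front might look like this (catalog names are placeholders; staging reuses production mode):

targets:
  dev:
    mode: development
    variables:
      catalog: "dev_catalog"
  staging:
    mode: production
    variables:
      catalog: "staging_catalog"
  prod:
    mode: production
    variables:
      catalog: "prod_catalog"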
Security Reminders
- Never embed tokens or secrets in databricks.yml
- Use environment variables for credentials
- Set proper job permissions
- Use service principals for production
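For production targets, a run_as block can point the deployed job at a service principal instead of a personal identity (a sketch; the application ID is a placeholder and the service principal must already exist in the workspace):

targets:
  prod:
    mode: production
    run_as:
      service_principal_name: "00000000-0000-0000-0000-000000000000"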
Summary
This skill packages and deploys Databricks Asset Bundles:
- Create: Generate project structure, databricks.yml, job definitions
- Parameterize: Variables for catalogs, schemas, environments
- Validate: Automatic validation (no confirmation)
- Deploy: Automatic deployment (no confirmation)
- Run: Manual job execution (requires user confirmation)
- Multi-environment: Support dev/staging/prod with same bundle
Use this skill after testing code with databricks-testing to deploy production-ready pipelines.