databricks-python-imports

# Databricks Python Imports and Code Sharing

## Core Principle: Pure Python Files for Importable Code

**Key Rule:** To share code between Databricks notebooks using standard Python imports, the shared code must be a pure Python file (`.py`), not a Databricks notebook.

Reference: [Share code between Databricks notebooks](https://docs.databricks.com/aws/en/notebooks/share-code)
## ⚠️ CRITICAL: Asset Bundle Path Setup

When deploying notebooks via Databricks Asset Bundles, you MUST add a `sys.path` setup block to enable imports from other folders. Without it, you'll get `ModuleNotFoundError: No module named 'src'`.

### Required Path Setup Pattern

Add this block immediately after `# Databricks notebook source`:
```python
# Databricks notebook source
# ===========================================================================
# PATH SETUP FOR ASSET BUNDLE IMPORTS
# ===========================================================================
# This enables imports from src.ml.config and src.ml.utils when deployed
# via Databricks Asset Bundles. The bundle root is computed dynamically.
# Reference: https://docs.databricks.com/aws/en/notebooks/share-code
import sys
import os

try:
    # Get current notebook path and compute bundle root
    _notebook_path = dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
    _bundle_root = "/Workspace" + str(_notebook_path).rsplit('/src/', 1)[0]
    if _bundle_root not in sys.path:
        sys.path.insert(0, _bundle_root)
    print(f"✓ Added bundle root to sys.path: {_bundle_root}")
except Exception as e:
    print(f"⚠ Path setup skipped (local execution): {e}")
# ===========================================================================

"""
Your notebook docstring here...
"""

# COMMAND ----------

# Now imports work!
from src.ml.config.feature_registry import FeatureRegistry
from src.ml.utils.training_base import setup_training_environment
```
### Why This Is Needed

- Asset Bundles deploy to `/Workspace/.bundle/<target>/files/`
- The Python path doesn't include the bundle root by default
- This setup dynamically computes the bundle root from the notebook path
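The bundle-root computation from the setup block can be verified outside Databricks. The path below is a hypothetical example of where an Asset Bundle might deploy a notebook, not a value from a real workspace:

```python
# Illustration of how the setup block derives the bundle root from a
# notebook path. The path below is a hypothetical deployment location.
notebook_path = "/Users/someone/.bundle/my_project/dev/files/src/ml/train"

# rsplit('/src/', 1)[0] drops everything from the last '/src/' onward,
# leaving the bundle's files root; "/Workspace" is prepended because
# notebookPath() returns a workspace-relative path.
bundle_root = "/Workspace" + notebook_path.rsplit("/src/", 1)[0]
print(bundle_root)  # /Workspace/Users/someone/.bundle/my_project/dev/files
```

Splitting on the last `/src/` (rather than the first) keeps the computation correct even if a parent folder also happens to be named `src`.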
### Script to Add Path Setup

Use `scripts/add_path_setup_to_notebooks.py` to batch-add this setup to all notebooks:

```bash
python3 scripts/add_path_setup_to_notebooks.py
```
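The contents of that script aren't shown here; a minimal sketch of the approach might look like the following. The function name, marker string, and abbreviated setup block are illustrative, and the real script may differ:

```python
"""Sketch of a batch script that inserts the path-setup block into every
notebook, assuming notebooks start with the standard header line."""
from pathlib import Path

HEADER = "# Databricks notebook source"
SETUP_MARKER = "PATH SETUP FOR ASSET BUNDLE IMPORTS"
SETUP_BLOCK = """\
# PATH SETUP FOR ASSET BUNDLE IMPORTS
import sys
# ... (full setup block from above goes here) ...
"""

def add_path_setup(path: Path) -> bool:
    """Insert the setup block right after the notebook header, once."""
    text = path.read_text(encoding="utf-8")
    if not text.startswith(HEADER) or SETUP_MARKER in text:
        return False  # not a notebook, or already patched
    patched = text.replace(HEADER, HEADER + "\n" + SETUP_BLOCK, 1)
    path.write_text(patched, encoding="utf-8")
    return True

# Example: patch every notebook under src/
# for nb in Path("src").rglob("*.py"):
#     add_path_setup(nb)
```

Checking for the marker before patching makes the script safe to re-run.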
## File Type Identification

### Pure Python File (✅ Importable)

```python
"""
Module documentation

This file can be imported using standard Python imports.
"""
from databricks.sdk import WorkspaceClient
import pyspark.sql.types as T

def get_configuration():
    """Shared function"""
    return {...}
```
Characteristics:

- ✅ No special Databricks headers
- ✅ Standard Python module structure
- ✅ Can be imported with `from module import function`
- ✅ Works after `dbutils.library.restartPython()`
### Databricks Notebook (❌ Not Importable)

```python
# Databricks notebook source
"""
Module documentation

This file CANNOT be imported using standard Python imports.
"""
from databricks.sdk import WorkspaceClient
import pyspark.sql.types as T

def get_configuration():
    """Shared function"""
    return {...}
```
Characteristics:

- ❌ Has a `# Databricks notebook source` header
- ❌ Cannot be imported after `restartPython()`
- ❌ Must use the `%run` magic command (doesn't persist after restart)
- ✅ Can be executed as a job/task
## Pattern Recognition

### When You See Import Errors After restartPython()

```python
# Databricks notebook source
# Notebook that upgrades the SDK and restarts Python
%pip install --upgrade "databricks-sdk>=0.28.0" --quiet
dbutils.library.restartPython()

# COMMAND ----------

from monitor_configs import get_all_monitor_configs  # ❌ ModuleNotFoundError

# This fails if monitor_configs.py is a Databricks notebook!
```
Checklist:

- ✅ Check if the module file has a `# Databricks notebook source` header
- ✅ If present, remove it to convert the file to a pure Python module
- ✅ Test the import - it should work with a standard Python import
- ❌ Don't create complex workarounds (code duplication, sys.path manipulation)
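A quick way to run the first check across a whole repo is to scan for the header line. This is an illustrative one-liner, not a project script; adjust the search path to your layout:

```shell
# List .py files whose first line is the Databricks notebook header.
for f in $(find . -name '*.py'); do
  if head -n 1 "$f" | grep -q '^# Databricks notebook source'; then
    echo "$f"
  fi
done
```

Any file this prints is a notebook and cannot be imported until the header is removed.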
## Conversion Pattern

### Converting Databricks Notebook to Pure Python File

BEFORE (Notebook - Not Importable):

```python
# Databricks notebook source
"""
Centralized Monitor Configuration
"""
from databricks.sdk.service.catalog import MonitorTimeSeries

def get_all_configs():
    return [...]
```

AFTER (Pure Python - Importable):

```python
"""
Centralized Monitor Configuration
"""
from databricks.sdk.service.catalog import MonitorTimeSeries

def get_all_configs():
    return [...]
```

Change Required: Remove line 1: `# Databricks notebook source`
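If you have many files to convert, the one-line removal can be scripted. This is a hedged sketch; the function name is illustrative and the example filename is a placeholder:

```python
"""Sketch of a one-off converter that strips the notebook header so a
file becomes an importable pure Python module."""
from pathlib import Path

HEADER = "# Databricks notebook source\n"

def to_pure_python(path: Path) -> bool:
    """Remove a leading notebook header; return True if one was removed."""
    text = path.read_text(encoding="utf-8")
    if not text.startswith(HEADER):
        return False  # already a pure Python file
    path.write_text(text[len(HEADER):], encoding="utf-8")
    return True

# Example: to_pure_python(Path("monitor_configs.py"))
```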
## Import Patterns

### ✅ CORRECT: Standard Python Import

```python
# Databricks notebook source
# notebook.py (Databricks notebook)
%pip install --upgrade "databricks-sdk>=0.28.0" --quiet
dbutils.library.restartPython()

# COMMAND ----------

# ✅ Works if config_module.py is a pure Python file
from config_module import get_configuration
from databricks.sdk import WorkspaceClient
...

def main():
    config = get_configuration()  # ✅ Available
    ...
```

Requirements:

- `config_module.py` must be a pure Python file (no notebook header)
- Place the import after the `restartPython()` block
- Use standard Python import syntax
### ❌ WRONG: Complex Workarounds

```python
# ❌ DON'T: Use %run (doesn't work after restartPython() in Asset Bundles)
%run ./config_module

# ❌ DON'T: Manipulate sys.path
import sys
sys.path.insert(0, "/some/path")

# ❌ DON'T: Duplicate code
def get_configuration():  # Duplicated from another file
    return {...}

# ❌ DON'T: Use exec() or eval()
exec(open("config_module.py").read())
```

Why These Fail:

- `%run` doesn't persist after `restartPython()` in deployed .py files
- `sys.path` manipulation doesn't help if the file is a notebook
- Code duplication creates maintenance burden
- `exec()` is a security risk and hard to debug
## Use Cases

### Shared Configuration Modules

Pattern: Configuration loaded in multiple notebooks/jobs

```python
# monitor_configs.py (pure Python file)
"""
Centralized monitor configurations for all monitoring jobs.
"""
from databricks.sdk.service.catalog import MonitorTimeSeries

def get_all_monitor_configs(catalog: str, schema: str):
    """Returns list of monitor configurations with custom metrics."""
    return [
        {
            "table_name": f"{catalog}.{schema}.fact_sales",
            "custom_metrics": _get_sales_metrics(),
            ...
        }
    ]

def _get_sales_metrics():
    """99 custom metrics for sales monitoring."""
    return [...]
```

Usage in Multiple Notebooks:

```python
# setup_monitors.py
from monitor_configs import get_all_monitor_configs

configs = get_all_monitor_configs(catalog, schema)
workspace_client.quality_monitors.create(**configs[0])
```

```python
# update_monitors.py
from monitor_configs import get_all_monitor_configs

configs = get_all_monitor_configs(catalog, schema)
workspace_client.quality_monitors.update(**configs[0])
```
### Shared Utility Functions

Pattern: Utility functions used across layers

```python
# data_quality_rules.py (pure Python file)
"""
Centralized data quality rules for all DLT tables.
"""

def get_critical_rules_for_table(table_name: str):
    """Returns critical DQ rules that will drop records."""
    return {...}

def get_warning_rules_for_table(table_name: str):
    """Returns warning DQ rules that will log but pass."""
    return {...}
```

Usage in DLT Notebooks:

```python
# silver_transactions.py
import dlt
from data_quality_rules import get_critical_rules_for_table

@dlt.table(...)
@dlt.expect_all_or_fail(get_critical_rules_for_table("silver_transactions"))
def silver_transactions():
    return dlt.read_stream("bronze_transactions")
```
### Shared Helper Functions

```python
# helpers.py (pure Python file)
"""
Common helper functions for data transformations.
"""
from pyspark.sql import DataFrame
from pyspark.sql.functions import col, sha2, concat_ws

def generate_surrogate_key(df: DataFrame, key_columns: list) -> DataFrame:
    """Generates a SHA-256 surrogate key from the specified columns."""
    return df.withColumn(
        "surrogate_key",
        sha2(concat_ws("||", *[col(c) for c in key_columns]), 256),
    )
```
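What `generate_surrogate_key` computes per row can be shown with the standard library alone, no Spark session required. The column values below are made-up examples:

```python
import hashlib

# Per-row equivalent of the Spark expression: concatenate the key
# columns with "||" and hash with SHA-256. Spark's sha2(..., 256)
# returns the hex digest, matching hexdigest() here.
def surrogate_key_for_row(values: list) -> str:
    joined = "||".join(values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

key = surrogate_key_for_row(["2024-01-15", "store_42", "sku_9"])
print(len(key))  # 64 hex characters
```

One caveat: Spark's `concat_ws` skips NULL values rather than rendering them as empty strings, so rows that differ only in which column is NULL can collide; keep that in mind when choosing key columns.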
## When Each Approach Is Appropriate

### Use Pure Python File When:

- ✅ Code needs to be imported in multiple notebooks
- ✅ Configuration is shared across create/update operations
- ✅ Utility functions are used across layers (Bronze/Silver/Gold)
- ✅ Code is needed after `restartPython()` (SDK upgrades)
- ✅ You want standard Python import semantics

### Use Databricks Notebook When:

- ✅ It is an executable job/task (not shared code)
- ✅ Interactive development and testing
- ✅ Running as a workflow step
- ✅ Not imported by other notebooks
- ✅ You need Databricks magic commands (`%run`, `%sql`, etc.)

### Use %run When:

- ✅ Before `restartPython()` only
- ✅ One-time code execution in interactive notebooks
- ❌ Not after `restartPython()` in Asset Bundles
- ❌ Not for shared code that needs to persist
## Common Mistakes

### ❌ Mistake 1: Notebook Header in Shared Code

```python
# config.py
# Databricks notebook source  # ❌ Makes it a notebook!

def get_config():
    return {...}
```

Fix: Remove the notebook header

```python
# config.py

def get_config():
    return {...}
```

### ❌ Mistake 2: Trying to Import a Notebook

```python
# job.py
%pip install --upgrade "databricks-sdk>=0.28.0" --quiet
dbutils.library.restartPython()

from config import get_config  # ❌ Fails if config.py is a notebook
```

Error: `ModuleNotFoundError: No module named 'config'`

Fix: Convert config.py to a pure Python file (remove the notebook header)

### ❌ Mistake 3: Using %run After restartPython()

```python
# job.py
%pip install --upgrade "databricks-sdk>=0.28.0" --quiet
dbutils.library.restartPython()

%run ./config  # ❌ Doesn't work in deployed Asset Bundles
get_config()   # ❌ NameError: name 'get_config' is not defined
```

Fix: Convert to a pure Python file and use a standard import

```python
%pip install --upgrade "databricks-sdk>=0.28.0" --quiet
dbutils.library.restartPython()

from config import get_config  # ✅ Works with pure Python file
get_config()  # ✅ Available
```
## Validation Checklist

When creating shared code:

- File is pure Python (no `# Databricks notebook source` header)
- Has a proper docstring explaining its purpose
- Functions are well-documented
- Can be imported with standard `import` or `from ... import ...`
- Works after `restartPython()` if needed
- Used in at least 2 notebooks (if not, consider inlining)

When importing shared code:

- Import statement is after the `restartPython()` block
- Using a standard Python import (not `%run`)
- Source file is a pure Python file
- No sys.path manipulation needed
- No code duplication
## Troubleshooting

### Problem: ModuleNotFoundError after restartPython()

Symptoms:

```python
dbutils.library.restartPython()

from config import get_config
# ModuleNotFoundError: No module named 'config'
```

Diagnosis Steps:

- Check if `config.py` has a `# Databricks notebook source` header
- Verify the file is in the same directory as the importing notebook
- Check that the file has a `.py` extension
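The first check can be done programmatically. A minimal sketch, with an illustrative function name and an example path:

```python
"""Quick diagnostic: does a module file carry the notebook header?"""
from pathlib import Path

def is_databricks_notebook(path: str) -> bool:
    """True if the file's first line is the Databricks notebook header."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return bool(lines) and lines[0].strip() == "# Databricks notebook source"

# Example: is_databricks_notebook("config.py") returning True means the
# file must be converted before it can be imported.
```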
Solution:

```python
# In config.py, remove this line if present:
# Databricks notebook source  # ❌ Remove this!

# The file should start with the module docstring:
"""
Configuration module
"""
```
### Problem: NameError after %run and restartPython()

Symptoms:

```python
%run ./config
dbutils.library.restartPython()

get_config()  # NameError: name 'get_config' is not defined
```

Root Cause: `restartPython()` clears all function definitions, including those brought in by `%run`.

Solution: Use a standard import instead of `%run`:

```python
dbutils.library.restartPython()

from config import get_config  # ✅ Persistent import
get_config()  # ✅ Works
```
## References

- [Share code between Databricks notebooks](https://docs.databricks.com/aws/en/notebooks/share-code) - Official documentation
- Work with Python and R modules
- dbutils.library.restartPython()

## Related Patterns

- Databricks Asset Bundles Configuration - Deployment patterns
- Lakehouse Monitoring Patterns - Monitor configuration sharing
- DLT Expectations Patterns - DQ rules sharing

**Last Updated:** October 24, 2025
**Pattern Origin:** Production issue resolution - update_monitors job
**Key Lesson:** Always check whether shared code is a pure Python file or a Databricks notebook