adhoc-exploration-notebooks
Ad-Hoc Exploration Notebook Patterns
Overview
Every data product should include exploration notebooks in two formats to support different development workflows: Databricks workspace format (.py) for interactive use in the Databricks UI, and Jupyter format (.ipynb) for local development with Databricks Connect. Magic commands (%pip, %sql, dbutils.library.restartPython()) only work in the Databricks workspace, not with Databricks Connect or local execution.
Quick Start (30 minutes)
Goal: Create interactive notebooks for data exploration that work both in Databricks and locally.
What You'll Create:
- adhoc_exploration.py — Databricks workspace version (with magic commands)
- adhoc_exploration.ipynb — Local Jupyter version (Databricks Connect)
- Standard helper functions (list tables, explore, check quality, compare, show properties)
Before You Begin: Fill in the Requirements Template with your catalog, schemas, and key tables.
Fast Track: Databricks Workspace
# 1. Upload adhoc_exploration.py to Databricks workspace
# 2. Run setup cells (installs databricks-sdk)
# 3. Use helper functions:
list_tables(catalog, "my_schema")
explore_table(catalog, "my_schema", "fact_sales_daily")
check_data_quality(catalog, "my_schema", "fact_sales_daily")
compare_tables(catalog, "silver_schema", "silver_transactions",
               catalog, "gold_schema", "fact_sales_daily")
Fast Track: Local Jupyter
# 1. Install requirements
pip install -r requirements.txt
# 2. Configure Databricks Connect
databricks configure --profile my_profile
# 3. Launch Jupyter and open adhoc_exploration.ipynb
jupyter lab
Key Differences:
- Databricks (.py): Uses magic commands (%pip, %sql) and dbutils.widgets.text() for params
- Local (.ipynb): Pure Python, direct variable assignment, requires Databricks Connect
When to Use This Skill
- Building data exploration tools for Bronze/Silver/Gold layers
- Debugging data quality issues
- Creating interactive analysis notebooks
- Validating schema changes
- Testing query patterns before production
- Testing Metric Views and monitoring tables
- Cross-layer data flow validation (Bronze → Silver → Gold)
Critical Rules
1. Dual-Format Requirement
CRITICAL: Create both .py (Databricks workspace) and .ipynb (local Jupyter) formats.
Why: Magic commands (%pip, %sql, dbutils.library.restartPython()) only work in Databricks workspace, not with Databricks Connect.
2. Spark Session Initialization
CRITICAL: Must specify either .serverless() or .clusterId() — default will fail:
# ✅ CORRECT
spark = DatabricksSession.builder.serverless().profile("profile").getOrCreate()
# OR
spark = DatabricksSession.builder.clusterId("cluster-id").getOrCreate()
# ❌ WRONG
spark = DatabricksSession.builder.getOrCreate() # Exception: Cluster id or serverless required
3. Widget Fallback Pattern
Always use the widget fallback pattern for cross-environment compatibility:
# Default values
catalog = "default_catalog"
# Try widgets (Databricks workspace), fall back to defaults
try:
    dbutils.widgets.text("catalog", catalog, "Catalog")
    catalog = dbutils.widgets.get("catalog")
except Exception:
    # Widgets not available (Databricks Connect)
    pass
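The same try/except fallback generalizes to any number of parameters. A minimal sketch using a hypothetical get_param helper (the name and signature are illustrative, not part of the standard template):

```python
def get_param(name: str, default: str) -> str:
    """Return the widget value in a Databricks workspace, or the default elsewhere."""
    try:
        dbutils.widgets.text(name, default, name)  # raises under Databricks Connect
        return dbutils.widgets.get(name)
    except Exception:
        # dbutils (or widget support) is unavailable outside the workspace
        return default

catalog = get_param("catalog", "default_catalog")
gold_schema = get_param("gold_schema", "gold_schema")
```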
Quick Reference
File Structure
src/exploration/
├── adhoc_exploration.py # Databricks workspace format
├── adhoc_exploration.ipynb # Local Jupyter format
├── requirements.txt # Python dependencies
├── README.md # Full documentation
└── QUICKSTART.md # 5-minute setup guide
Standard Helper Functions
Every exploration notebook should include:
- Table Discovery: list_tables(catalog_name, schema_name) — List all tables with metadata
- Table Exploration: explore_table(catalog_name, schema_name, table_name, limit) — Schema + sample data + row count
- Data Quality: check_data_quality(catalog_name, schema_name, table_name) — Null counts, duplicates
- Comparison: compare_tables(catalog1, schema1, table1, catalog2, schema2, table2) — Compare across layers
- Governance: show_table_properties(catalog_name, schema_name, table_name) — View metadata, tags, TBLPROPERTIES
See Notebook Patterns for complete implementations.
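For orientation, here is a minimal sketch of the first two helpers, assuming an active spark session and Unity Catalog's information_schema; the reference file contains the full implementations:

```python
from pyspark.sql import DataFrame

def list_tables(catalog_name: str, schema_name: str) -> DataFrame:
    """List all tables in a schema with basic metadata from information_schema."""
    return spark.sql(f"""
        SELECT table_name, table_type, comment
        FROM {catalog_name}.information_schema.tables
        WHERE table_schema = '{schema_name}'
        ORDER BY table_name
    """)

def explore_table(catalog_name: str, schema_name: str,
                  table_name: str, limit: int = 10) -> DataFrame:
    """Print the schema and row count, then return a small sample for display."""
    fqn = f"{catalog_name}.{schema_name}.{table_name}"
    df = spark.table(fqn)
    df.printSchema()
    print(f"Rows in {fqn}: {df.count():,}")
    return df.limit(limit)
```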
Core Patterns
Databricks Workspace Format (.py)
Key Features:
- # Databricks notebook source header
- # COMMAND ---------- cell separators
- # MAGIC %md for markdown cells
- Widget support with fallback pattern
- Magic commands work (%pip, %sql)
See Notebook Patterns for the complete template.
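For orientation, a minimal skeleton showing how these markers fit together (the complete template is in the reference file):

```python
# Databricks notebook source
# MAGIC %md
# MAGIC # Ad-Hoc Exploration

# COMMAND ----------

# MAGIC %pip install 'databricks-sdk[notebook]'

# COMMAND ----------

dbutils.library.restartPython()

# COMMAND ----------

# Widget with fallback default (see the widget fallback pattern above)
dbutils.widgets.text("catalog", "default_catalog", "Catalog")
catalog = dbutils.widgets.get("catalog")
```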
Local Jupyter Format (.ipynb)
Key Features:
- Standard Jupyter notebook format
- No Databricks-specific headers
- No magic commands
- Direct variable assignment (no widgets)
- Setup instructions in markdown cell
Requirements:
databricks-sdk[notebook]>=0.28.0
pyspark>=3.5.0
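A sketch of the opening setup cell in the .ipynb variant, assuming Databricks Connect is installed and a CLI profile named my_profile exists (the profile name and schema values are placeholders):

```python
from databricks.connect import DatabricksSession

# Serverless compute avoids hard-coding a cluster id; the profile must
# already be configured (databricks configure --profile my_profile).
spark = DatabricksSession.builder.serverless().profile("my_profile").getOrCreate()

# Direct variable assignment replaces dbutils.widgets in the local format
catalog = "default_catalog"
bronze_schema = "bronze_schema"
gold_schema = "gold_schema"
```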
Multiple Execution Options
Option 1: Run via Databricks UI
- Open the notebook in Databricks workspace
- Attach to a cluster
- Adjust widget parameters if needed
- Run cells interactively
Option 2: Run via Asset Bundle
Critical: Include the .py extension in the notebook path:
notebook_task:
  notebook_path: ../src/exploration/adhoc_exploration.py  # ✅ Include .py extension
  base_parameters:
    catalog: ${var.catalog}
# Deploy and run
databricks bundle deploy -t dev
databricks bundle run -t dev adhoc_exploration_job
See Notebook Patterns for complete YAML example.
Option 3: Direct Execution via %run
# In any Databricks notebook, import and use the functions:
%run ./exploration/adhoc_exploration
# Then use the helper functions
list_tables(catalog, bronze_schema)
df = explore_table(catalog, gold_schema, "fact_sales_daily")
Common Mistakes to Avoid
❌ Mistake 1: Missing .serverless() or .clusterId()
# ❌ DON'T: Use default builder
spark = DatabricksSession.builder.getOrCreate()
# Exception: Cluster id or serverless are required
✅ Correct: Specify execution environment
spark = DatabricksSession.builder.serverless().profile("profile").getOrCreate()
❌ Mistake 2: Forgetting File Extension in Bundle
# ❌ DON'T: Omit .py extension
notebook_path: ../src/exploration/adhoc_exploration
# Error: notebook not found
✅ Correct: Include .py extension
notebook_path: ../src/exploration/adhoc_exploration.py
❌ Mistake 3: Using Magic Commands Locally
# ❌ DON'T: Use magic commands
%pip install 'databricks-sdk[notebook]'
# SyntaxError: invalid syntax
✅ Correct: Install packages before running
pip install -r requirements.txt
Validation Checklist
File Structure
- Both .py and .ipynb formats created
- requirements.txt with dependencies
- README.md with full documentation
- QUICKSTART.md with 5-minute setup
.py File (Databricks Workspace)
- Has # Databricks notebook source header
- Uses # COMMAND ---------- cell separators
- Markdown cells use # MAGIC %md
- Initializes Spark with .serverless() or .clusterId()
- Uses widget fallback pattern
- All helper functions defined
.ipynb File (Local Jupyter)
- Standard Jupyter notebook format
- No Databricks-specific headers
- No magic commands
- Direct variable assignment (no widgets)
- Setup instructions in markdown cell
- Same helper functions as .py
Helper Functions
- list_tables() — Table discovery
- explore_table() — Schema + samples
- check_data_quality() — DQ checks
- compare_tables() — Layer comparison
- show_table_properties() — Governance metadata
Reference Files
Notebook Patterns
references/notebook-patterns.md — Complete code patterns for both .py and .ipynb formats, helper function implementations, widget fallback patterns, Spark session initialization, dev/prod configuration, %run import patterns, and Asset Bundle integration.
Analysis Workflows
references/analysis-workflows.md — Step-by-step analysis workflows for data discovery, data quality investigation, schema validation, Metric View testing, monitoring metrics review, and cross-layer comparison.
Tips & Troubleshooting
references/tips-and-troubleshooting.md — Visualization tips, summary statistics, performance analysis, extending notebooks with custom functions and matplotlib/plotly, troubleshooting common errors (table not found, permission denied, slow queries, connection issues), and best practices.
Assets
Exploration Notebook Template
assets/templates/exploration-notebook.py — Starter notebook template with all standard helper functions and widget fallback patterns.
Requirements Template
assets/templates/requirements-template.md — Fill-in-the-blank planning template for catalog configuration, environment settings, tables to explore, DQ focus areas, and cross-layer comparisons.
References
Official Databricks Documentation
Related Patterns
- Databricks Asset Bundles — Deployment patterns
- Python File Imports — Sharing code between notebooks
- DLT Expectations — DQ rules and validation
- Metric Views — Metric View testing patterns
- Observability Setup — Monitoring table patterns