databricks-app-python
Databricks Python Application
Build Python-based Databricks applications. For full examples and recipes, see the Databricks Apps Cookbook.
Critical Rules (always follow)
- MUST confirm framework choice or use Framework Selection below
- MUST use SDK Config() for authentication (never hardcode tokens)
- MUST use app.yaml valueFrom for resources (never hardcode resource IDs)
- MUST use dash-bootstrap-components for Dash app layout and styling
- MUST use @st.cache_resource for Streamlit database connections
- MUST deploy Flask with Gunicorn, FastAPI with uvicorn (not dev servers)
Required Steps
Copy this checklist and verify each item:
- [ ] Framework selected
- [ ] Auth strategy decided: app auth, user auth, or both
- [ ] App resources identified (SQL warehouse, Lakebase, serving endpoint, etc.)
- [ ] Backend data strategy decided (SQL warehouse, Lakebase, or SDK)
- [ ] Deployment method: CLI or DABs
Framework Selection
| Framework | Best For | app.yaml Command |
|---|---|---|
| Dash | Production dashboards, BI tools, complex interactivity | ["python", "app.py"] |
| Streamlit | Rapid prototyping, data science apps, internal tools | ["streamlit", "run", "app.py"] |
| Gradio | ML demos, model interfaces, chat UIs | ["python", "app.py"] |
| Flask | Custom REST APIs, lightweight apps, webhooks | ["gunicorn", "app:app", "-w", "4", "-b", "0.0.0.0:8000"] |
| FastAPI | Async APIs, auto-generated OpenAPI docs | ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"] |
| Reflex | Full-stack Python apps without JavaScript | ["reflex", "run", "--env", "prod"] |
Default: Recommend Streamlit for prototypes, Dash for production dashboards, FastAPI for APIs, Gradio for ML demos.
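As a rough illustration of the table above, the sketch below maps a framework name to the `command` line of app.yaml. The helper name `app_yaml_command` and the `COMMANDS` dictionary are hypothetical; the command values are copied from the table, and the output assumes app.yaml accepts the command as an inline list.

```python
import json

# Commands copied from the Framework Selection table.
COMMANDS = {
    "dash": ["python", "app.py"],
    "streamlit": ["streamlit", "run", "app.py"],
    "gradio": ["python", "app.py"],
    "flask": ["gunicorn", "app:app", "-w", "4", "-b", "0.0.0.0:8000"],
    "fastapi": ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"],
    "reflex": ["reflex", "run", "--env", "prod"],
}


def app_yaml_command(framework: str) -> str:
    """Return a `command:` line for app.yaml (hypothetical helper)."""
    key = framework.strip().lower()
    if key not in COMMANDS:
        raise ValueError(f"Unknown framework: {framework!r}")
    # A JSON array is also a valid YAML flow sequence.
    return "command: " + json.dumps(COMMANDS[key])


print(app_yaml_command("Streamlit"))
```

Keeping the commands in one place like this is only a convenience for scaffolding scripts; writing the line by hand in app.yaml works just as well.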
Quick Reference
| Concept | Details |
|---|---|
| Runtime | Python 3.11, Ubuntu 22.04, 2 vCPU, 6 GB RAM |
| Pre-installed | Dash 2.18.1, Streamlit 1.38.0, Gradio 4.44.0, Flask 3.0.3, FastAPI 0.115.0 |
| Auth (app) | Service principal via Config() — auto-injected DATABRICKS_CLIENT_ID/DATABRICKS_CLIENT_SECRET |
| Auth (user) | x-forwarded-access-token header — see 1-authorization.md |
| Resources | valueFrom in app.yaml — see 2-app-resources.md |
| Cookbook | https://apps-cookbook.dev/ |
| Docs | https://docs.databricks.com/aws/en/dev-tools/databricks-apps/ |
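For user auth, the table above names the x-forwarded-access-token header. A minimal, framework-agnostic sketch of reading it might look like the following; `get_user_token` is a hypothetical helper, and it assumes the framework exposes request headers as a string-to-string mapping. See 1-authorization.md for the per-framework patterns.

```python
from typing import Mapping, Optional

USER_TOKEN_HEADER = "x-forwarded-access-token"


def get_user_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the forwarded user token, or None if the header is absent.

    Header names are compared case-insensitively because frameworks
    differ in how they normalize them.
    """
    for name, value in headers.items():
        if name.lower() == USER_TOKEN_HEADER:
            return value or None
    return None


# Locally the header is not present, so the helper returns None.
print(get_user_token({"Accept": "text/html"}))
```

Returning None instead of raising lets the caller fall back to app auth during local development.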
Detailed Guides
Authorization: Use 1-authorization.md when configuring app or user authorization — covers service principal auth, on-behalf-of user tokens, OAuth scopes, and per-framework code examples. (Keywords: OAuth, service principal, user auth, on-behalf-of, access token, scopes)
App resources: Use 2-app-resources.md when connecting your app to Databricks resources — covers SQL warehouses, Lakebase, model serving, secrets, volumes, and the valueFrom pattern. (Keywords: resources, valueFrom, SQL warehouse, model serving, secrets, volumes, connections)
Frameworks: See 3-frameworks.md for Databricks-specific patterns per framework — covers Dash, Streamlit, Gradio, Flask, FastAPI, and Reflex with auth integration, deployment commands, and Cookbook links. (Keywords: Dash, Streamlit, Gradio, Flask, FastAPI, Reflex, framework selection)
Deployment: Use 4-deployment.md when deploying your app — covers Databricks CLI, Asset Bundles (DABs), app.yaml configuration, and post-deployment verification. (Keywords: deploy, CLI, DABs, asset bundles, app.yaml, logs)
Lakebase: Use 5-lakebase.md when using Lakebase (PostgreSQL) as your app's data layer — covers auto-injected env vars, psycopg2/asyncpg patterns, and when to choose Lakebase vs SQL warehouse. (Keywords: Lakebase, PostgreSQL, psycopg2, asyncpg, transactional, PGHOST)
MCP tools: Use 6-mcp-approach.md for managing app lifecycle via MCP tools — covers creating, deploying, monitoring, and deleting apps programmatically. (Keywords: MCP, create app, deploy app, app logs)
Workflow
- Determine the task type:
  - New app from scratch? → Use Framework Selection, then read 3-frameworks.md
  - Setting up authorization? → Read 1-authorization.md
  - Connecting to data/resources? → Read 2-app-resources.md
  - Using Lakebase (PostgreSQL)? → Read 5-lakebase.md
  - Deploying to Databricks? → Read 4-deployment.md
  - Using MCP tools? → Read 6-mcp-approach.md
- Follow the instructions in the relevant guide
- For full code examples, browse https://apps-cookbook.dev/
Core Architecture
All Python Databricks apps follow this pattern:
app-directory/
├── app.py # Main application (or framework-specific name)
├── models.py # Pydantic data models
├── backend.py # Data access layer
├── requirements.txt # Additional Python dependencies
├── app.yaml # Databricks Apps configuration
└── README.md
Backend Toggle Pattern
import os
from databricks.sdk.core import Config

# Mock backend is the default; set USE_MOCK_BACKEND=false to use real data
USE_MOCK = os.getenv("USE_MOCK_BACKEND", "true").lower() == "true"

if USE_MOCK:
    from backend_mock import MockBackend as Backend
else:
    from backend_real import RealBackend as Backend

backend = Backend()
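The toggle works best when both backends expose the same methods. The sketch below shows one way that could look in a single file; the `Backend` protocol, the `list_entities` method, and the `make_backend` factory are hypothetical names chosen for illustration, not part of the original pattern.

```python
import os
from typing import Protocol


class Backend(Protocol):
    """Interface both backends are assumed to share (hypothetical)."""

    def list_entities(self) -> list[dict]: ...


class MockBackend:
    """In-memory stand-in for local development."""

    def list_entities(self) -> list[dict]:
        return [{"id": "1", "name": "demo", "status": "active"}]


class RealBackend:
    """Placeholder: a real version would query a warehouse or Lakebase."""

    def list_entities(self) -> list[dict]:
        raise NotImplementedError("wire this to your data source")


def make_backend(env=None) -> Backend:
    """Pick a backend from USE_MOCK_BACKEND, defaulting to the mock."""
    env = os.environ if env is None else env
    use_mock = env.get("USE_MOCK_BACKEND", "true").lower() == "true"
    return MockBackend() if use_mock else RealBackend()


backend = make_backend({"USE_MOCK_BACKEND": "true"})
print(backend.list_entities())
```

Passing the environment as an argument keeps the selection logic testable without touching real environment variables.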
SQL Warehouse Connection (shared across all frameworks)
import os

from databricks import sql
from databricks.sdk.core import Config

cfg = Config()  # Auto-detects credentials from environment

# DATABRICKS_WAREHOUSE_ID must be set in the environment (e.g. via valueFrom in app.yaml)
conn = sql.connect(
    server_hostname=cfg.host,
    http_path=f"/sql/1.0/warehouses/{os.getenv('DATABRICKS_WAREHOUSE_ID')}",
    credentials_provider=lambda: cfg.authenticate,
)
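Once a connection exists, queries can go through a small helper. The sketch below assumes the connection follows the Python DB-API 2.0 interface (`cursor()`, `execute()`, `description`, `fetchall()`); `run_query` is a hypothetical name, and sqlite3 stands in for the warehouse connection so the example runs locally.

```python
import sqlite3
from contextlib import closing


def run_query(conn, statement, params=None):
    """Run a statement and return the rows as a list of dicts.

    Works with any DB-API 2.0 connection; the cursor is always closed.
    """
    with closing(conn.cursor()) as cursor:
        if params is None:
            cursor.execute(statement)
        else:
            cursor.execute(statement, params)
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Local smoke test: sqlite3 stands in for the SQL warehouse connection.
demo_conn = sqlite3.connect(":memory:")
rows = run_query(demo_conn, "SELECT 1 AS one")
print(rows)
demo_conn.close()
```

In a deployed app, the `conn` created with `sql.connect(...)` above would be passed in place of `demo_conn`.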
Pydantic Models
from datetime import datetime
from enum import Enum

from pydantic import BaseModel, Field


class Status(str, Enum):
    ACTIVE = "active"
    PENDING = "pending"


class EntityOut(BaseModel):
    id: str
    name: str
    status: Status
    created_at: datetime


class EntityIn(BaseModel):
    name: str = Field(..., min_length=1)
    status: Status = Status.PENDING
Common Issues
| Issue | Solution |
|---|---|
| Connection exhausted | Use @st.cache_resource (Streamlit) or connection pooling |
| Auth token not found | Check x-forwarded-access-token header — only available when deployed, not locally |
| App won't start | Check app.yaml command matches framework; check databricks apps logs <name> |
| Resource not accessible | Add resource via UI, verify SP has permissions, use valueFrom in app.yaml |
| Import error on deploy | Add missing packages to requirements.txt (pre-installed packages don't need listing) |
| Lakebase app crashes on start | psycopg2/asyncpg are NOT pre-installed — MUST add to requirements.txt |
| Port conflict | Apps must bind to DATABRICKS_APP_PORT env var (defaults to 8000). Never use 8080. Streamlit is auto-configured; for others, read the env var in code or use 8000 in app.yaml command |
| Streamlit: set_page_config error | st.set_page_config() must be the first Streamlit command |
| Dash: unstyled layout | Add dash-bootstrap-components; use dbc.themes.BOOTSTRAP |
| Slow queries | Use Lakebase for transactional/low-latency; SQL warehouse for analytical queries |
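For the port conflict row above, one way to read DATABRICKS_APP_PORT in code is sketched here. `resolve_port` is a hypothetical helper; the default of 8000 comes from the table.

```python
import os


def resolve_port(env=None, default: int = 8000) -> int:
    """Return the port from DATABRICKS_APP_PORT, or the default if unset."""
    env = os.environ if env is None else env
    raw = env.get("DATABRICKS_APP_PORT", "")
    return int(raw) if raw.strip() else default


port = resolve_port()
print(f"Binding to port {port}")
```

The resulting value can then be passed to the framework's own run or launch call instead of a hardcoded port number.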
Platform Constraints
| Constraint | Details |
|---|---|
| Runtime | Python 3.11, Ubuntu 22.04 LTS |
| Compute | 2 vCPUs, 6 GB memory (default) |
| Pre-installed frameworks | Dash, Streamlit, Gradio, Flask, FastAPI, Shiny |
| Custom packages | Add to requirements.txt in app root |
| Network | Apps can reach Databricks APIs; external access depends on workspace config |
| User auth | Public Preview — workspace admin must enable before adding scopes |
Official Documentation
- Databricks Apps Overview — main docs hub
- Apps Cookbook — ready-to-use code snippets (Streamlit, Dash, Reflex, FastAPI)
- Authorization — app auth and user auth
- Resources — SQL warehouse, Lakebase, serving, secrets
- app.yaml Reference — command and env config
- System Environment — pre-installed packages, runtime details
Related Skills
- databricks-app-apx - full-stack apps with FastAPI + React
- databricks-asset-bundles - deploying apps via DABs
- databricks-python-sdk - backend SDK integration
- databricks-lakebase-provisioned - adding persistent PostgreSQL state
- databricks-model-serving - serving ML models for app integration