# Databricks Lakeflow Jobs

## Overview
Databricks Jobs orchestrate data workflows with multi-task DAGs, flexible triggers, and comprehensive monitoring. Jobs support diverse task types and can be managed via Python SDK, CLI, or Asset Bundles.
## Reference Files
| Use Case | Reference File |
|---|---|
| Configure task types (notebook, Python, SQL, dbt, etc.) | task-types.md |
| Set up triggers and schedules | triggers-schedules.md |
| Configure notifications and health monitoring | notifications-monitoring.md |
| Complete working examples | examples.md |
## Quick Start

### Python SDK
```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import Task, NotebookTask, Source

w = WorkspaceClient()

job = w.jobs.create(
    name="my-etl-job",
    tasks=[
        Task(
            task_key="extract",
            notebook_task=NotebookTask(
                notebook_path="/Workspace/Users/user@example.com/extract",
                source=Source.WORKSPACE
            )
        )
    ]
)
print(f"Created job: {job.job_id}")
```
### CLI
```bash
databricks jobs create --json '{
  "name": "my-etl-job",
  "tasks": [{
    "task_key": "extract",
    "notebook_task": {
      "notebook_path": "/Workspace/Users/user@example.com/extract",
      "source": "WORKSPACE"
    }
  }]
}'
```
### Asset Bundles (DABs)
```yaml
# resources/jobs.yml
resources:
  jobs:
    my_etl_job:
      name: "[${bundle.target}] My ETL Job"
      tasks:
        - task_key: extract
          notebook_task:
            notebook_path: ../src/notebooks/extract.py
```
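The `${bundle.target}` substitution above resolves to whichever target you deploy to. A minimal `databricks.yml` defining those targets might look like this (a sketch; the bundle name is illustrative):

```yaml
# databricks.yml (bundle root)
bundle:
  name: my-etl-bundle   # illustrative name

include:
  - resources/*.yml     # pulls in resources/jobs.yml

targets:
  dev:
    mode: development   # prefixes resource names, pauses schedules
    default: true
  prod:
    mode: production
```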
## Core Concepts

### Multi-Task Workflows
Jobs support DAG-based task dependencies:
```yaml
tasks:
  - task_key: extract
    notebook_task:
      notebook_path: ../src/extract.py
  - task_key: transform
    depends_on:
      - task_key: extract
    notebook_task:
      notebook_path: ../src/transform.py
  - task_key: load
    depends_on:
      - task_key: transform
    run_if: ALL_SUCCESS  # Only run if all dependencies succeed
    notebook_task:
      notebook_path: ../src/load.py
```
`run_if` conditions (an example follows the list):

- `ALL_SUCCESS` (default) - Run when all dependencies succeed
- `ALL_DONE` - Run when all dependencies complete (success or failure)
- `AT_LEAST_ONE_SUCCESS` - Run when at least one dependency succeeds
- `NONE_FAILED` - Run when no dependencies failed
- `ALL_FAILED` - Run when all dependencies failed
- `AT_LEAST_ONE_FAILED` - Run when at least one dependency failed
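A common pattern is a cleanup task that runs regardless of upstream outcome; a minimal sketch (the `cleanup` task and its path are illustrative):

```yaml
# Runs whether the load task succeeds or fails
- task_key: cleanup
  depends_on:
    - task_key: load
  run_if: ALL_DONE
  notebook_task:
    notebook_path: ../src/cleanup.py
```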
### Task Types Summary
| Task Type | Use Case | Reference |
|---|---|---|
| `notebook_task` | Run notebooks | task-types.md#notebook-task |
| `spark_python_task` | Run Python scripts | task-types.md#spark-python-task |
| `python_wheel_task` | Run Python wheels | task-types.md#python-wheel-task |
| `sql_task` | Run SQL queries/files | task-types.md#sql-task |
| `dbt_task` | Run dbt projects | task-types.md#dbt-task |
| `pipeline_task` | Trigger DLT/SDP pipelines | task-types.md#pipeline-task |
| `spark_jar_task` | Run Spark JARs | task-types.md#spark-jar-task |
| `run_job_task` | Trigger other jobs | task-types.md#run-job-task |
| `for_each_task` | Loop over inputs | task-types.md#for-each-task |
### Trigger Types Summary
| Trigger Type | Use Case | Reference |
|---|---|---|
| `schedule` | Cron-based scheduling | triggers-schedules.md#cron-schedule |
| `trigger.periodic` | Interval-based | triggers-schedules.md#periodic-trigger |
| `trigger.file_arrival` | File arrival events | triggers-schedules.md#file-arrival-trigger |
| `trigger.table_update` | Table change events | triggers-schedules.md#table-update-trigger |
| `continuous` | Always-running jobs | triggers-schedules.md#continuous-jobs |
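For example, a minimal cron schedule sketch (the expression and timezone are placeholders; see triggers-schedules.md for the full options):

```yaml
schedule:
  quartz_cron_expression: "0 0 6 * * ?"  # daily at 06:00, Quartz syntax
  timezone_id: "America/Los_Angeles"
  pause_status: UNPAUSED                 # schedules only fire when unpaused
```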
## Compute Configuration

### Job Clusters (Recommended)
Define reusable cluster configurations:
```yaml
job_clusters:
  - job_cluster_key: shared_cluster
    new_cluster:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "i3.xlarge"
      num_workers: 2
      spark_conf:
        spark.speculation: "true"

tasks:
  - task_key: my_task
    job_cluster_key: shared_cluster
    notebook_task:
      notebook_path: ../src/notebook.py
```
### Autoscaling Clusters
```yaml
new_cluster:
  spark_version: "15.4.x-scala2.12"
  node_type_id: "i3.xlarge"
  autoscale:
    min_workers: 2
    max_workers: 8
```
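On AWS, autoscaling pairs well with spot capacity to reduce cost; a sketch, assuming AWS (`aws_attributes` is cloud-specific and these values are illustrative):

```yaml
new_cluster:
  spark_version: "15.4.x-scala2.12"
  node_type_id: "i3.xlarge"
  autoscale:
    min_workers: 2
    max_workers: 8
  aws_attributes:
    availability: SPOT_WITH_FALLBACK  # fall back to on-demand if spot is unavailable
    first_on_demand: 1                # keep the driver on an on-demand node
```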
### Existing Cluster
```yaml
tasks:
  - task_key: my_task
    existing_cluster_id: "0123-456789-abcdef12"
    notebook_task:
      notebook_path: ../src/notebook.py
```
### Serverless Compute
For notebook and Python tasks, omit cluster configuration to use serverless:
```yaml
tasks:
  - task_key: serverless_task
    notebook_task:
      notebook_path: ../src/notebook.py
    # No cluster config = serverless
```
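Serverless Python tasks that need extra packages can declare them in a job-level environment; a minimal sketch, assuming a serverless `spark_python_task` (the `requests` dependency and paths are placeholders):

```yaml
environments:
  - environment_key: default
    spec:
      client: "1"        # serverless environment version
      dependencies:
        - requests        # placeholder PyPI dependency

tasks:
  - task_key: serverless_script
    environment_key: default
    spark_python_task:
      python_file: ../src/script.py
```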
## Job Parameters

### Define Parameters
```yaml
parameters:
  - name: env
    default: "dev"
  - name: date
    default: "{{start_date}}"  # Dynamic value reference
```
### Access in Notebook
```python
# In notebook
env = dbutils.widgets.get("env")
date = dbutils.widgets.get("date")
```
### Pass to Tasks
```yaml
tasks:
  - task_key: my_task
    notebook_task:
      notebook_path: ../src/notebook.py
      base_parameters:
        env: "{{job.parameters.env}}"
        custom_param: "value"
```
## Common Operations

### Python SDK Operations
```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# List jobs
jobs = w.jobs.list()

# Get job details
job = w.jobs.get(job_id=12345)

# Run job now
run = w.jobs.run_now(job_id=12345)

# Run with parameters
run = w.jobs.run_now(
    job_id=12345,
    job_parameters={"env": "prod", "date": "2024-01-15"}
)

# Cancel run
w.jobs.cancel_run(run_id=run.run_id)

# Delete job
w.jobs.delete(job_id=12345)
```
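`run_now` returns a waiter, so you can also block until the run reaches a terminal state; a short sketch (the 20-minute timeout is an arbitrary choice):

```python
import datetime

# Trigger the run and poll until it finishes
finished = w.jobs.run_now(job_id=12345).result(timeout=datetime.timedelta(minutes=20))
print(finished.state.result_state)  # e.g. SUCCESS or FAILED
```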
### CLI Operations
```bash
# List jobs
databricks jobs list

# Get job details
databricks jobs get 12345

# Run job
databricks jobs run-now 12345

# Run with parameters
databricks jobs run-now 12345 --job-params '{"env": "prod"}'

# Cancel run
databricks jobs cancel-run 67890

# Delete job
databricks jobs delete 12345
```
### Asset Bundle Operations
```bash
# Validate configuration
databricks bundle validate

# Deploy job
databricks bundle deploy

# Run job
databricks bundle run my_job_resource_key

# Deploy to specific target
databricks bundle deploy -t prod

# Destroy resources
databricks bundle destroy
```
## Permissions (DABs)
```yaml
resources:
  jobs:
    my_job:
      name: "My Job"
      permissions:
        - level: CAN_VIEW
          group_name: "data-analysts"
        - level: CAN_MANAGE_RUN
          group_name: "data-engineers"
        - level: CAN_MANAGE
          user_name: "admin@example.com"
```
Permission levels:

- `CAN_VIEW` - View job and run history
- `CAN_MANAGE_RUN` - View, trigger, and cancel runs
- `CAN_MANAGE` - Full control including edit and delete
## Common Issues
| Issue | Solution |
|---|---|
| Job cluster startup slow | Use job clusters with `job_cluster_key` for reuse across tasks |
| Task dependencies not working | Verify `task_key` references match exactly in `depends_on` |
| Schedule not triggering | Check `pause_status: UNPAUSED` and a valid `timezone_id` |
| File arrival not detecting | Ensure the path has proper permissions and uses a cloud storage URL |
| Table update trigger missing events | Verify the table is in Unity Catalog with proper grants |
| Parameter not accessible | Use `dbutils.widgets.get()` in notebooks |
| "admins" group error | Permissions for the `admins` group cannot be modified on jobs |
| Serverless task fails | Ensure the task type supports serverless (notebook, Python) |
## Related Skills
- `databricks-bundles` - Deploy jobs via Databricks Asset Bundles
- `databricks-spark-declarative-pipelines` - Configure pipelines triggered by jobs