Databricks Hello World
Overview
Create your first Databricks cluster and notebook to verify setup.
Prerequisites
- Completed databricks-install-auth setup
- Valid API credentials configured
- Workspace access with cluster creation permissions
Instructions
Step 1: Create a Cluster
# Create a small development cluster via CLI
databricks clusters create --json '{
"cluster_name": "hello-world-cluster",
"spark_version": "14.3.x-scala2.12",
"node_type_id": "Standard_DS3_v2",
"autotermination_minutes": 30,
"num_workers": 0,
"spark_conf": {
"spark.databricks.cluster.profile": "singleNode",
"spark.master": "local[*]"
},
"custom_tags": {
"ResourceClass": "SingleNode"
}
}'
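The single-node settings above have to agree with one another: zero workers, the singleNode profile, a local Spark master, and the ResourceClass tag. As a minimal sketch, the plain-Python check below rebuilds the same spec and flags any mismatch before the JSON is handed to the CLI. The helper name `single_node_problems` is hypothetical and is not part of the Databricks CLI or SDK.

```python
import json

# Same spec as in Step 1; the values are copied from the CLI example above.
cluster_spec = {
    "cluster_name": "hello-world-cluster",
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "autotermination_minutes": 30,
    "num_workers": 0,
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}


def single_node_problems(spec):
    """Return a list of human-readable problems (hypothetical helper)."""
    problems = []
    conf = spec.get("spark_conf", {})
    tags = spec.get("custom_tags", {})
    if spec.get("num_workers") != 0:
        problems.append("num_workers must be 0")
    if conf.get("spark.databricks.cluster.profile") != "singleNode":
        problems.append("spark.databricks.cluster.profile must be singleNode")
    if not str(conf.get("spark.master", "")).startswith("local"):
        problems.append("spark.master must start with local")
    if tags.get("ResourceClass") != "SingleNode":
        problems.append("custom_tags.ResourceClass must be SingleNode")
    return problems


problems = single_node_problems(cluster_spec)
payload = json.dumps(cluster_spec)  # string suitable for the --json argument
print(problems or "spec looks consistent")
```

Building the payload with `json.dumps` also avoids quoting mistakes that are easy to make when the JSON is typed inline in a shell.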
Step 2: Create a Notebook
# hello_world.py - upload as notebook
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
# Create notebook content
notebook_content = """
# Databricks Hello World
# COMMAND ----------
# Simple DataFrame operations
data = [("Alice", 28), ("Bob", 35), ("Charlie", 42)]
df = spark.createDataFrame(data, ["name", "age"])
display(df)
# COMMAND ----------
# Delta Lake example
df.write.format("delta").mode("overwrite").save("/tmp/hello_world_delta")
# COMMAND ----------
# Read it back
df_read = spark.read.format("delta").load("/tmp/hello_world_delta")
display(df_read)
# COMMAND ----------
print("Hello from Databricks!")
"""
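import base64
from databricks.sdk.service.workspace import ImportFormat, Language
# The SDK expects enum members for format and language, not plain strings
w.workspace.import_(
    path="/Users/your-email/hello_world",
    format=ImportFormat.SOURCE,
    language=Language.PYTHON,
    content=base64.b64encode(notebook_content.encode()).decode(),
    overwrite=True
)
print("Notebook created!")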
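The import call takes the notebook source as a base64 string, and the `# COMMAND ----------` lines separate cells. The sketch below uses only the standard library to show the encoding round trip and how the source divides into cells. It assumes the separator lines mark cell boundaries the way they do in Databricks source exports; `encode_notebook` and `split_cells` are hypothetical helpers, and the sample source is shortened for illustration.

```python
import base64

CELL_SEPARATOR = "# COMMAND ----------"

# Shortened sample source; the cell contents are illustrative only.
source = "\n".join([
    "# Databricks Hello World",
    CELL_SEPARATOR,
    'print("cell one")',
    CELL_SEPARATOR,
    'print("Hello from Databricks!")',
])


def encode_notebook(text):
    """Base64-encode notebook source as an ASCII string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def split_cells(text):
    """Split source on the separator line and drop empty cells."""
    cells = [cell.strip() for cell in text.split(CELL_SEPARATOR)]
    return [cell for cell in cells if cell]


encoded = encode_notebook(source)
decoded = base64.b64decode(encoded).decode("utf-8")
cells = split_cells(decoded)
print(len(cells), "cells; round trip ok:", decoded == source)
```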
Step 3: Run the Notebook
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import NotebookTask, SubmitTask
w = WorkspaceClient()
# Submit a one-time run on the existing cluster
run = w.jobs.submit(
    run_name="hello-world-run",
    tasks=[
        SubmitTask(
            task_key="hello",
            existing_cluster_id="your-cluster-id",
            notebook_task=NotebookTask(
                notebook_path="/Users/your-email/hello_world"
            )
        )
    ]
)
# submit() returns a waiter; result() blocks until the run reaches a terminal state
result = run.result()
print(f"Run completed with state: {result.state.result_state}")
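Waiting for completion comes down to polling the run until it reaches a terminal life-cycle state. The following is a minimal sketch of such a loop in plain Python, with a timeout so it cannot hang forever. `wait_for_terminal_state` is a hypothetical helper, and the status call is replaced by a stand-in that replays a fixed sequence of states; a real version would query the run instead.

```python
import time

# Run life-cycle states after which the run will not change any further.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_terminal_state(get_state, timeout_s=60.0, poll_s=1.0, sleep=time.sleep):
    """Poll get_state() until a terminal state appears or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still in state {state!r} after {timeout_s}s")
        sleep(poll_s)


# Stand-in for a real status call: replays a fixed sequence of states.
fake_states = iter(["PENDING", "RUNNING", "RUNNING", "TERMINATED"])
final_state = wait_for_terminal_state(lambda: next(fake_states), poll_s=0.0)
print(final_state)
```

Passing the status function and the sleep function as arguments keeps the loop testable without a workspace.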
Step 4: Verify with CLI
# List clusters
databricks clusters list
# Get cluster status
databricks clusters get --cluster-id your-cluster-id
# List workspace contents
databricks workspace list /Users/your-email/
# Get run output
databricks runs get-output --run-id your-run-id
Output
- Development cluster created and running
- Hello world notebook created in workspace
- Successful notebook execution
- Delta table created at /tmp/hello_world_delta
Error Handling
| Error | Cause | Solution |
|---|---|---|
| Cluster quota exceeded | Workspace limits | Terminate unused clusters |
| Invalid node type | Wrong instance type | Check available node types |
| Notebook path exists | Duplicate path | Use overwrite=True |
| Cluster pending | Startup in progress | Wait for RUNNING state |
| Permission denied | Insufficient privileges | Request workspace admin access |
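The table can also be kept as data so that a script prints the suggested fix when a step fails. This is a rough sketch only: `suggest_solution` is a hypothetical helper, and matching on the error text is an assumption, since the messages actually returned by the API or CLI may be worded differently.

```python
# Rows of the table above as (error, cause, solution).
ERROR_TABLE = [
    ("Cluster quota exceeded", "Workspace limits", "Terminate unused clusters"),
    ("Invalid node type", "Wrong instance type", "Check available node types"),
    ("Notebook path exists", "Duplicate path", "Use overwrite=True"),
    ("Cluster pending", "Startup in progress", "Wait for RUNNING state"),
    ("Permission denied", "Insufficient privileges", "Request workspace admin access"),
]


def suggest_solution(message):
    """Return the solution whose error text appears in message, else None."""
    lowered = message.lower()
    for error, _cause, solution in ERROR_TABLE:
        if error.lower() in lowered:
            return solution
    return None


print(suggest_solution("Error: permission denied for this workspace"))
```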
Examples
Interactive Cluster (Cost-Effective Dev)
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
# Create single-node cluster for development
cluster = w.clusters.create_and_wait(
    cluster_name="dev-cluster",
    spark_version="14.3.x-scala2.12",
    node_type_id="Standard_DS3_v2",
    num_workers=0,
    autotermination_minutes=30,
    spark_conf={
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]"
    },
    # Same single-node tag as the CLI example in Step 1
    custom_tags={"ResourceClass": "SingleNode"}
)
print(f"Cluster created: {cluster.cluster_id}")
SQL Warehouse (Serverless)
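from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import CreateWarehouseRequestWarehouseType
w = WorkspaceClient()
# Create SQL warehouse for queries
warehouse = w.warehouses.create_and_wait(
    name="hello-warehouse",
    cluster_size="2X-Small",
    auto_stop_mins=15,
    # The SDK expects the enum member here, not the string "PRO"
    warehouse_type=CreateWarehouseRequestWarehouseType.PRO,
    enable_serverless_compute=True
)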
print(f"Warehouse created: {warehouse.id}")
Quick DataFrame Test
# Run in notebook or Databricks Connect
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# Create sample data
df = spark.range(1000).toDF("id")  # 1,000 rows with ids 0 through 999
df = df.withColumn("value", df.id * 2)
# Show results
df.show(5)
print(f"Row count: {df.count()}")
Next Steps
Proceed to databricks-local-dev-loop for local development setup.