Databricks Hello World

Overview

Create your first Databricks cluster and notebook to verify setup.

Prerequisites

  • Completed databricks-install-auth setup
  • Valid API credentials configured (see the check below)
  • Workspace access with cluster creation permissions
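A quick way to confirm the credentials work is to call the current-user endpoint. This is a minimal sketch assuming the databricks-sdk Python package from the install step:

from databricks.sdk import WorkspaceClient

# Raises an authentication error if the host or token is misconfigured
me = WorkspaceClient().current_user.me()
print(f"Authenticated as {me.user_name}")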

Instructions

Step 1: Create a Cluster

# Create a small development cluster via CLI
databricks clusters create --json '{
  "cluster_name": "hello-world-cluster",
  "spark_version": "14.3.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "autotermination_minutes": 30,
  "num_workers": 0,
  "spark_conf": {
    "spark.databricks.cluster.profile": "singleNode",
    "spark.master": "local[*]"
  },
  "custom_tags": {
    "ResourceClass": "SingleNode"
  }
}'
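The create call returns before the cluster is usable. To block until it is up, a minimal sketch using the SDK's cluster waiter (the cluster ID comes from the create response above):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Blocks until the cluster reaches RUNNING (raises on failure or timeout)
w.clusters.wait_get_cluster_running("your-cluster-id")
print("Cluster is RUNNING")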

Step 2: Create a Notebook

# hello_world.py - upload as notebook
import base64

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ImportFormat, Language

w = WorkspaceClient()

# Create notebook content
notebook_content = """
# Databricks Hello World

# COMMAND ----------

# Simple DataFrame operations
data = [("Alice", 28), ("Bob", 35), ("Charlie", 42)]
df = spark.createDataFrame(data, ["name", "age"])
display(df)

# COMMAND ----------

# Delta Lake example
df.write.format("delta").mode("overwrite").save("/tmp/hello_world_delta")

# COMMAND ----------

# Read it back
df_read = spark.read.format("delta").load("/tmp/hello_world_delta")
display(df_read)

# COMMAND ----------

print("Hello from Databricks!")
"""

w.workspace.import_(
    path="/Users/your-email/hello_world",
    format=ImportFormat.SOURCE,
    language=Language.PYTHON,
    content=base64.b64encode(notebook_content.encode()).decode(),
    overwrite=True
)
print("Notebook created!")

Step 3: Run the Notebook

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import NotebookTask, SubmitTask

w = WorkspaceClient()

# Submit a one-time run and block until it reaches a terminal state
run = w.jobs.submit(
    run_name="hello-world-run",
    tasks=[
        SubmitTask(
            task_key="hello",
            existing_cluster_id="your-cluster-id",
            notebook_task=NotebookTask(
                notebook_path="/Users/your-email/hello_world"
            )
        )
    ]
).result()

print(f"Run completed with state: {run.state.result_state}")

Step 4: Verify with CLI

# List clusters
databricks clusters list

# Get cluster status
databricks clusters get your-cluster-id

# List workspace contents
databricks workspace list /Users/your-email/

# Get run output
databricks jobs get-run-output your-run-id

Output

  • Development cluster created and running
  • Hello world notebook created in workspace
  • Successful notebook execution
  • Delta table created at /tmp/hello_world_delta

Error Handling

Error                  | Cause                    | Solution
-----------------------|--------------------------|-------------------------------
Cluster quota exceeded | Workspace limits         | Terminate unused clusters
Invalid node type      | Wrong instance type      | Check available node types
Notebook path exists   | Duplicate path           | Use overwrite=True
Cluster pending        | Startup in progress      | Wait for RUNNING state
Permission denied      | Insufficient privileges  | Request workspace admin access
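Most of these failures surface as typed exceptions in the Python SDK, so scripts can react to them directly. A minimal sketch, assuming the exception classes exported by databricks.sdk.errors and a placeholder cluster ID:

from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import DatabricksError, PermissionDenied

w = WorkspaceClient()

try:
    w.clusters.get(cluster_id="your-cluster-id")
except PermissionDenied:
    print("Insufficient privileges: request access from a workspace admin")
except DatabricksError as e:
    print(f"Databricks API error: {e}")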

Examples

Interactive Cluster (Cost-Effective Dev)

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Create a single-node cluster for development; single-node mode requires
# the spark_conf entries plus the ResourceClass custom tag
cluster = w.clusters.create_and_wait(
    cluster_name="dev-cluster",
    spark_version="14.3.x-scala2.12",
    node_type_id="Standard_DS3_v2",
    num_workers=0,
    autotermination_minutes=30,
    spark_conf={
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]"
    },
    custom_tags={"ResourceClass": "SingleNode"}
)
print(f"Cluster created: {cluster.cluster_id}")

SQL Warehouse (Serverless)

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import CreateWarehouseRequestWarehouseType

w = WorkspaceClient()

# Create SQL warehouse for queries
warehouse = w.warehouses.create_and_wait(
    name="hello-warehouse",
    cluster_size="2X-Small",
    auto_stop_mins=15,
    warehouse_type="PRO",
    enable_serverless_compute=True
)
print(f"Warehouse created: {warehouse.id}")

Quick DataFrame Test

# Run in notebook or Databricks Connect
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Create sample data
df = spark.range(1000).toDF("id")  # 1000: 1 second in ms
df = df.withColumn("value", df.id * 2)

# Show results
df.show(5)
print(f"Row count: {df.count()}")

Next Steps

Proceed to databricks-local-dev-loop for local development setup.
