databricks-iceberg
Apache Iceberg on Databricks
Databricks provides multiple ways to work with Apache Iceberg: native managed Iceberg tables, UniForm for Delta-to-Iceberg interoperability, and the Iceberg REST Catalog (IRC) for external engine access.
Critical Rules (always follow)
- MUST use Unity Catalog — all Iceberg features require UC-enabled workspaces
- MUST NOT install an Iceberg library into Databricks Runtime (DBR includes built-in Iceberg support; adding a library causes version conflicts)
- MUST NOT set write.metadata.path or write.metadata.previous-versions-max: Databricks manages metadata locations automatically; overriding causes corruption
- MUST determine which Iceberg pattern fits the use case before writing code; see the When to Use section below
- MUST know that both PARTITIONED BY and CLUSTER BY produce the same Iceberg metadata for external engines. UC maintains an Iceberg partition spec with partition fields corresponding to the clustering keys, so external engines reading via IRC see a partitioned Iceberg table (not Hive-style, but proper Iceberg partition fields) and can prune on those fields; internally UC uses those fields as liquid clustering keys. The only differences between the two syntaxes are: (1) PARTITIONED BY is standard Iceberg DDL (any engine can create the table), while CLUSTER BY is DBR-only DDL; (2) PARTITIONED BY auto-handles DV/row-tracking properties, while CLUSTER BY requires manual TBLPROPERTIES on v2
- MUST NOT use expression-based partition transforms (bucket(), years(), months(), days(), hours()) with PARTITIONED BY on managed Iceberg tables: only plain column references are supported; expression transforms cause errors
- MUST disable deletion vectors and row tracking when using CLUSTER BY on Iceberg v2 tables: set 'delta.enableDeletionVectors' = false and 'delta.enableRowTracking' = false in TBLPROPERTIES (Iceberg v3 handles this automatically; PARTITIONED BY handles this automatically on both v2 and v3)
Key Concepts
| Concept | Summary |
|---|---|
| Managed Iceberg Table | Native Iceberg table created with USING ICEBERG — full read/write in Databricks and via external Iceberg engines |
| External Iceberg Reads (UniForm) | Delta table that auto-generates Iceberg metadata — read as Iceberg externally, write as Delta internally |
| Compatibility Mode | UniForm variant for streaming tables and materialized views in Spark Declarative Pipelines (SDP) |
| Iceberg REST Catalog (IRC) | Unity Catalog's built-in REST endpoint implementing the Iceberg REST Catalog spec — lets external engines (Spark, PyIceberg, Snowflake) access UC-managed Iceberg data |
| Iceberg v3 | Next-gen format (Beta, DBR 17.3+) — deletion vectors, VARIANT type, row lineage |
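As an illustration of the IRC row above, here is a rough sketch of how an external Python client might point PyIceberg at the endpoint. The endpoint path `/api/2.1/unity-catalog/iceberg-rest`, the host, and the token are assumptions that do not come from this document, and `irc_properties` and `load_uc_catalog` are hypothetical helper names; confirm the actual URI and auth setup in 3-iceberg-rest-catalog.md before relying on it.

```python
def irc_properties(workspace_host: str, uc_catalog: str, token: str) -> dict:
    """Build PyIceberg REST catalog properties for a Unity Catalog catalog.

    The endpoint path below is an assumption, not taken from this document.
    """
    host = workspace_host.removeprefix("https://").rstrip("/")
    return {
        "type": "rest",
        "uri": f"https://{host}/api/2.1/unity-catalog/iceberg-rest",
        "warehouse": uc_catalog,  # the UC catalog name is passed as the warehouse
        "token": token,           # PAT or OAuth token
    }


def load_uc_catalog(workspace_host: str, uc_catalog: str, token: str):
    """Open the catalog with PyIceberg (imported lazily so the helper above has no dependency)."""
    from pyiceberg.catalog import load_catalog

    return load_catalog("unity", **irc_properties(workspace_host, uc_catalog, token))
```

With a real workspace, `load_uc_catalog(host, "my_catalog", token).load_table("my_schema.events")` would be the natural next step; the external principal also needs the EXTERNAL USE SCHEMA privilege listed in the reference files.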
Quick Start
Create a Managed Iceberg Table
-- No clustering
CREATE TABLE my_catalog.my_schema.events
USING ICEBERG
AS SELECT * FROM raw_events;
-- PARTITIONED BY (recommended for cross-platform): standard Iceberg syntax, works on EMR/OSS Spark/Trino/Flink
-- auto-disables DVs and row tracking — no TBLPROPERTIES needed on v2 or v3
CREATE TABLE my_catalog.my_schema.events
USING ICEBERG
PARTITIONED BY (event_date)
AS SELECT * FROM raw_events;
-- CLUSTER BY on Iceberg v2 (DBR-only syntax): must manually disable DVs and row tracking
CREATE TABLE my_catalog.my_schema.events
USING ICEBERG
TBLPROPERTIES (
'delta.enableDeletionVectors' = false,
'delta.enableRowTracking' = false
)
CLUSTER BY (event_date)
AS SELECT * FROM raw_events;
-- CLUSTER BY on Iceberg v3 (DBR-only syntax): no TBLPROPERTIES needed
CREATE TABLE my_catalog.my_schema.events
USING ICEBERG
TBLPROPERTIES ('format-version' = '3')
CLUSTER BY (event_date)
AS SELECT * FROM raw_events;
Enable UniForm on an Existing Delta Table
ALTER TABLE my_catalog.my_schema.customers
SET TBLPROPERTIES (
'delta.columnMapping.mode' = 'name',
'delta.enableIcebergCompatV2' = 'true',
'delta.universalFormat.enabledFormats' = 'iceberg'
);
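When the same change has to be applied to many Delta tables, the statement above can be generated instead of copied by hand. This is a small sketch under stated assumptions: `uniform_alter_statement` is a hypothetical helper, the three properties are exactly the ones shown above, and `spark` in the usage comment is assumed to be the session object available in a Databricks notebook.

```python
import re

# The three table properties from the ALTER TABLE example above.
UNIFORM_PROPERTIES = {
    "delta.columnMapping.mode": "name",
    "delta.enableIcebergCompatV2": "true",
    "delta.universalFormat.enabledFormats": "iceberg",
}

# catalog.schema.table with simple unquoted identifiers only.
_THREE_PART_NAME = re.compile(r"[A-Za-z_]\w*(\.[A-Za-z_]\w*){2}")


def uniform_alter_statement(table: str) -> str:
    """Return the ALTER TABLE statement that enables UniForm on `table`."""
    if not _THREE_PART_NAME.fullmatch(table):
        raise ValueError(f"expected catalog.schema.table, got {table!r}")
    props = ",\n".join(f"  '{key}' = '{value}'" for key, value in UNIFORM_PROPERTIES.items())
    return f"ALTER TABLE {table}\nSET TBLPROPERTIES (\n{props}\n)"


# Usage in a notebook (assumes a `spark` session exists):
# for name in ["my_catalog.my_schema.customers", "my_catalog.my_schema.orders"]:
#     spark.sql(uniform_alter_statement(name))
```

The name check rejects anything that is not a plain three-part identifier, which keeps the string formatting from being used to smuggle extra SQL into the statement.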
Read/Write Capability Matrix
| Table Type | Databricks Read | Databricks Write | External IRC Read | External IRC Write |
|---|---|---|---|---|
| Managed Iceberg (USING ICEBERG) | Yes | Yes | Yes | Yes |
| Delta + UniForm | Yes (as Delta) | Yes (as Delta) | Yes (as Iceberg) | No |
| Delta + Compatibility Mode | Yes (as Delta) | Yes | Yes (as Iceberg) | No |
Reference Files
| File | Summary | Keywords |
|---|---|---|
| 1-managed-iceberg-tables.md | Creating and managing native Iceberg tables — DDL, DML, Liquid Clustering, Predictive Optimization, Iceberg v3, limitations | CREATE TABLE USING ICEBERG, CTAS, MERGE, time travel, deletion vectors, VARIANT |
| 2-uniform-and-compatibility.md | Making Delta tables readable as Iceberg — UniForm for regular tables, Compatibility Mode for streaming tables and MVs | UniForm, universalFormat, Compatibility Mode, streaming tables, materialized views, SDP |
| 3-iceberg-rest-catalog.md | Exposing Databricks tables to external engines via the IRC endpoint — auth, credential vending, IP access lists | IRC, REST Catalog, credential vending, EXTERNAL USE SCHEMA, PAT, OAuth |
| 4-snowflake-interop.md | Bidirectional Snowflake-Databricks integration — catalog integration, foreign catalogs, vended credentials | Snowflake, catalog integration, external volume, vended credentials, REFRESH_INTERVAL_SECONDS |
| 5-external-engine-interop.md | Connecting PyIceberg, OSS Spark, AWS EMR, Apache Flink, and Kafka Connect via IRC | PyIceberg, OSS Spark, EMR, Flink, Kafka Connect, pyiceberg.yaml |
When to Use
- Creating a new Iceberg table → 1-managed-iceberg-tables.md
- Making an existing Delta table readable as Iceberg → 2-uniform-and-compatibility.md
- Making a streaming table or MV readable as Iceberg → 2-uniform-and-compatibility.md (Compatibility Mode section)
- Choosing between Managed Iceberg vs UniForm vs Compatibility Mode → decision table in 2-uniform-and-compatibility.md
- Exposing Databricks tables to external engines via REST API → 3-iceberg-rest-catalog.md
- Integrating Databricks with Snowflake (either direction) → 4-snowflake-interop.md
- Connecting PyIceberg, OSS Spark, Flink, EMR, or Kafka → 5-external-engine-interop.md
Common Issues
| Issue | Solution |
|---|---|
| No Change Data Feed (CDF) | CDF is not supported on managed Iceberg tables. Use Delta + UniForm if you need CDF. |
| UniForm async delay | Iceberg metadata generation is asynchronous. After a write, there may be a brief delay before external engines see the latest data. Check status with DESCRIBE EXTENDED table_name. |
| Compression codec change | Managed Iceberg tables use zstd compression by default (not snappy). Older Iceberg readers that don't support zstd will fail. Verify reader compatibility or set write.parquet.compression-codec to snappy. |
| Snowflake 1000-commit limit | Snowflake's Iceberg catalog integration can only see the last 1000 Iceberg commits. High-frequency writers must compact metadata or Snowflake will lose visibility of older data. |
| Deletion vectors with UniForm | UniForm requires deletion vectors to be disabled (delta.enableDeletionVectors = false). If your table has deletion vectors enabled, disable them before enabling UniForm. |
| No shallow clone for Iceberg | SHALLOW CLONE is not supported for Iceberg tables. Use DEEP CLONE or CREATE TABLE ... AS SELECT instead. |
| Version mismatch with external engines | Ensure external engines use an Iceberg library version compatible with the format version of your tables. Iceberg v3 tables require Iceberg library 1.9.0+. |
Related Skills
- databricks-unity-catalog — catalog/schema management, governance, system tables
- databricks-spark-declarative-pipelines — SDP pipelines (streaming tables, materialized views with Compatibility Mode)
- databricks-python-sdk — Python SDK and REST API for Databricks operations
- databricks-dbsql — SQL warehouse features, query patterns
Resources
- Iceberg Overview — main hub for Iceberg on Databricks
- UniForm — Delta Universal Format
- Iceberg REST Catalog — IRC endpoint and external engine access
- Compatibility Mode — UniForm for streaming tables and MVs
- Iceberg v3 — next-gen format features (Beta)
- Foreign Tables — reading external catalog data