databricks-iceberg
Apache Iceberg on Databricks
Databricks provides multiple ways to work with Apache Iceberg: native managed Iceberg tables, UniForm for Delta-to-Iceberg interoperability, and the Iceberg REST Catalog (IRC) for external engine access.
Critical Rules (always follow)
- MUST use Unity Catalog — all Iceberg features require UC-enabled workspaces
- MUST NOT install an Iceberg library into Databricks Runtime (DBR includes built-in Iceberg support; adding a library causes version conflicts)
- MUST NOT set write.metadata.path or write.metadata.previous-versions-max. Databricks manages metadata locations automatically, and overriding these properties causes corruption.
- MUST determine which Iceberg pattern fits the use case before writing code; see the When to Use section below.
- MUST know that PARTITIONED BY and CLUSTER BY produce the same Iceberg metadata for external engines. UC maintains an Iceberg partition spec whose partition fields correspond to the clustering keys, so external engines reading via IRC see a partitioned Iceberg table (proper Iceberg partition fields, not Hive-style partitioning) and can prune on those fields. Internally, UC uses the same fields as liquid clustering keys. The two syntaxes differ in only two ways: (1) PARTITIONED BY is standard Iceberg DDL that any engine can use to create the table, while CLUSTER BY is DBR-only DDL; (2) PARTITIONED BY sets the deletion vector (DV) and row-tracking properties automatically, while CLUSTER BY on v2 requires setting them manually in TBLPROPERTIES.
- MUST NOT use expression-based partition transforms (bucket(), years(), months(), days(), hours()) with PARTITIONED BY on managed Iceberg tables. Only plain column references are supported; expression transforms cause errors.
- MUST disable deletion vectors and row tracking when using CLUSTER BY on Iceberg v2 tables by setting 'delta.enableDeletionVectors' = false and 'delta.enableRowTracking' = false in TBLPROPERTIES. Iceberg v3 handles this automatically, and PARTITIONED BY handles it automatically on both v2 and v3.
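Because expression transforms such as days() are rejected, one possible workaround is to materialize the derived value as an ordinary column and partition on that. This is a sketch, not a documented Databricks recipe: it assumes raw_events has a TIMESTAMP column named event_ts and no existing column named event_day (both names are hypothetical).

```sql
-- Rejected on managed Iceberg tables: expression-based transform in PARTITIONED BY
-- CREATE TABLE my_catalog.my_schema.events
-- USING ICEBERG
-- PARTITIONED BY (days(event_ts))
-- AS SELECT * FROM raw_events;

-- Sketch: derive a plain DATE column in the query and partition on it instead.
-- event_ts and event_day are hypothetical column names.
CREATE TABLE my_catalog.my_schema.events
USING ICEBERG
PARTITIONED BY (event_day)
AS SELECT *, CAST(event_ts AS DATE) AS event_day
FROM raw_events;
```

Because event_day is an ordinary column rather than a transform of event_ts, partition pruning applies to filters on event_day; a filter on event_ts alone is not expected to prune partitions, so queries should filter on the derived column.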
Key Concepts
| Concept | Summary |
|---|---|
| Managed Iceberg Table | Native Iceberg table created with USING ICEBERG — full read/write in Databricks and via external Iceberg engines |
| External Iceberg Reads (UniForm) | Delta table that auto-generates Iceberg metadata: read as Iceberg externally, write as Delta internally |
| Compatibility Mode | UniForm variant for streaming tables and materialized views in SDP pipelines |
| Iceberg REST Catalog (IRC) | Unity Catalog's built-in REST endpoint implementing the Iceberg REST Catalog spec — lets external engines (Spark, PyIceberg, Snowflake) access UC-managed Iceberg data |
| Iceberg v3 | Next-gen format (Beta, DBR 17.3+) — deletion vectors, VARIANT type, row lineage |
Quick Start
Create a Managed Iceberg Table
-- No clustering
CREATE TABLE my_catalog.my_schema.events
USING ICEBERG
AS SELECT * FROM raw_events;
-- PARTITIONED BY (recommended for cross-platform): standard Iceberg syntax, works on EMR/OSS Spark/Trino/Flink
-- auto-disables DVs and row tracking — no TBLPROPERTIES needed on v2 or v3
CREATE TABLE my_catalog.my_schema.events
USING ICEBERG
PARTITIONED BY (event_date)
AS SELECT * FROM raw_events;
-- CLUSTER BY on Iceberg v2 (DBR-only syntax): must manually disable DVs and row tracking
CREATE TABLE my_catalog.my_schema.events
USING ICEBERG
TBLPROPERTIES (
'delta.enableDeletionVectors' = false,
'delta.enableRowTracking' = false
)
CLUSTER BY (event_date)
AS SELECT * FROM raw_events;
-- CLUSTER BY on Iceberg v3 (DBR-only syntax): no TBLPROPERTIES needed
CREATE TABLE my_catalog.my_schema.events
USING ICEBERG
TBLPROPERTIES ('format-version' = '3')
CLUSTER BY (event_date)
AS SELECT * FROM raw_events;
Enable UniForm on an Existing Delta Table
ALTER TABLE my_catalog.my_schema.customers
SET TBLPROPERTIES (
'delta.columnMapping.mode' = 'name',
'delta.enableIcebergCompatV2' = 'true',
'delta.universalFormat.enabledFormats' = 'iceberg'
);
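To confirm that the ALTER TABLE above took effect, you can read the properties back. This sketch reuses the example table name; because Iceberg metadata generation is asynchronous, DESCRIBE EXTENDED is also where to check whether the Iceberg metadata has caught up with the latest Delta write.

```sql
-- Read back the UniForm-related properties set above
SHOW TBLPROPERTIES my_catalog.my_schema.customers ('delta.universalFormat.enabledFormats');
SHOW TBLPROPERTIES my_catalog.my_schema.customers ('delta.enableIcebergCompatV2');

-- Inspect table details, including Iceberg metadata generation status
DESCRIBE EXTENDED my_catalog.my_schema.customers;
```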
Read/Write Capability Matrix
| Table Type | Databricks Read | Databricks Write | External IRC Read | External IRC Write |
|---|---|---|---|---|
| Managed Iceberg (USING ICEBERG) | Yes | Yes | Yes | Yes |
| Delta + UniForm | Yes (as Delta) | Yes (as Delta) | Yes (as Iceberg) | No |
| Delta + Compatibility Mode | Yes (as Delta) | Yes | Yes (as Iceberg) | No |
Reference Files
| File | Summary | Keywords |
|---|---|---|
| 1-managed-iceberg-tables.md | Creating and managing native Iceberg tables — DDL, DML, Liquid Clustering, Predictive Optimization, Iceberg v3, limitations | CREATE TABLE USING ICEBERG, CTAS, MERGE, time travel, deletion vectors, VARIANT |
| 2-uniform-and-compatibility.md | Making Delta tables readable as Iceberg — UniForm for regular tables, Compatibility Mode for streaming tables and MVs | UniForm, universalFormat, Compatibility Mode, streaming tables, materialized views, SDP |
| 3-iceberg-rest-catalog.md | Exposing Databricks tables to external engines via the IRC endpoint — auth, credential vending, IP access lists | IRC, REST Catalog, credential vending, EXTERNAL USE SCHEMA, PAT, OAuth |
| 4-snowflake-interop.md | Bidirectional Snowflake-Databricks integration — catalog integration, foreign catalogs, vended credentials | Snowflake, catalog integration, external volume, vended credentials, REFRESH_INTERVAL_SECONDS |
| 5-external-engine-interop.md | Connecting PyIceberg, OSS Spark, AWS EMR, Apache Flink, and Kafka Connect via IRC | PyIceberg, OSS Spark, EMR, Flink, Kafka Connect, pyiceberg.yaml |
When to Use
- Creating a new Iceberg table → 1-managed-iceberg-tables.md
- Making an existing Delta table readable as Iceberg → 2-uniform-and-compatibility.md
- Making a streaming table or MV readable as Iceberg → 2-uniform-and-compatibility.md (Compatibility Mode section)
- Choosing between Managed Iceberg vs UniForm vs Compatibility Mode → decision table in 2-uniform-and-compatibility.md
- Exposing Databricks tables to external engines via REST API → 3-iceberg-rest-catalog.md
- Integrating Databricks with Snowflake (either direction) → 4-snowflake-interop.md
- Connecting PyIceberg, OSS Spark, Flink, EMR, or Kafka → 5-external-engine-interop.md
Common Issues
| Issue | Solution |
|---|---|
| No Change Data Feed (CDF) | CDF is not supported on managed Iceberg tables. Use Delta + UniForm if you need CDF. |
| UniForm async delay | Iceberg metadata generation is asynchronous. After a write, there may be a brief delay before external engines see the latest data. Check status with DESCRIBE EXTENDED table_name. |
| Compression codec change | Managed Iceberg tables use zstd compression by default (not snappy). Older Iceberg readers that don't support zstd will fail. Verify reader compatibility or set write.parquet.compression-codec to snappy. |
| Snowflake 1000-commit limit | Snowflake's Iceberg catalog integration can only see the last 1000 Iceberg commits. High-frequency writers must compact metadata or Snowflake will lose visibility of older data. |
| Deletion vectors with UniForm | UniForm requires deletion vectors to be disabled (delta.enableDeletionVectors = false). If your table has deletion vectors enabled, disable them before enabling UniForm. |
| No shallow clone for Iceberg | SHALLOW CLONE is not supported for Iceberg tables. Use DEEP CLONE or CREATE TABLE ... AS SELECT instead. |
| Version mismatch with external engines | Ensure external engines use an Iceberg library version compatible with the format version of your tables. Iceberg v3 tables require Iceberg library 1.9.0+. |
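Several fixes in the table above reduce to short statements. The sketch below uses placeholder table names (events, customers, and events_copy are illustrative), and the disable-then-purge sequence for deletion vectors is one assumed approach rather than the only one; run only the statements relevant to your situation.

```sql
-- Compression codec: switch a managed Iceberg table from the zstd default to
-- snappy for older readers. Assumption: this affects newly written data files;
-- existing files are not rewritten.
ALTER TABLE my_catalog.my_schema.events
SET TBLPROPERTIES ('write.parquet.compression-codec' = 'snappy');

-- Deletion vectors with UniForm: disable deletion vectors, then rewrite files
-- to purge existing ones before enabling UniForm as in the Quick Start
ALTER TABLE my_catalog.my_schema.customers
SET TBLPROPERTIES ('delta.enableDeletionVectors' = false);
REORG TABLE my_catalog.my_schema.customers APPLY (PURGE);

-- No shallow clone: copy the table with DEEP CLONE (or CTAS) instead
CREATE TABLE my_catalog.my_schema.events_copy
DEEP CLONE my_catalog.my_schema.events;
```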
Related Skills
- databricks-unity-catalog — catalog/schema management, governance, system tables
- databricks-spark-declarative-pipelines — SDP pipelines (streaming tables, materialized views with Compatibility Mode)
- databricks-python-sdk — Python SDK and REST API for Databricks operations
- databricks-dbsql — SQL warehouse features, query patterns
Resources
- Iceberg Overview — main hub for Iceberg on Databricks
- UniForm — Delta Universal Format
- Iceberg REST Catalog — IRC endpoint and external engine access
- Compatibility Mode — UniForm for streaming tables and MVs
- Iceberg v3 — next-gen format features (Beta)
- Foreign Tables — reading external catalog data
Repository: databricks-solu…-dev-kit (GitHub stars: 900)
First seen: Feb 27, 2026