databricks-unity-catalog
Databricks Unity Catalog Management
Discover, create, and manage Unity Catalog resources (catalogs, schemas, tables) via MCP tools. Enforces best practices for naming, governance, and metadata.
When to Use This Skill
- Exploring available catalogs and schemas
- Creating schemas for new projects
- Discovering tables and their metadata
- Setting up data governance (owners, comments)
- Managing table lifecycle (create, update, delete)
- Validating catalog structure before pipeline deployment
Core Concepts
Unity Catalog Hierarchy
Catalog (highest level)
└── Schema (database)
    └── Table (data asset)
        ├── Columns
        ├── Properties
        └── Permissions
Resource Naming
Three-level namespace: catalog.schema.table
Example: ml_production.feature_store.customer_features
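When assembling or picking apart full names in code, a small helper keeps them consistent. A minimal pure-Python sketch (these helpers are illustrative, not MCP tools):
# Illustrative helpers for three-level names (not part of the MCP toolset)
def full_name(catalog: str, schema: str, table: str) -> str:
    # Compose catalog.schema.table
    return f"{catalog}.{schema}.{table}"

def split_name(name: str) -> tuple[str, str, str]:
    # Split catalog.schema.table back into its parts
    catalog, schema, table = name.split(".")
    return catalog, schema, table

full_name("ml_production", "feature_store", "customer_features")
# "ml_production.feature_store.customer_features"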
Discovery Workflows
Workflow 1: Explore Catalog Structure
Discover what's available in Unity Catalog.
Pattern:
- Use list_catalogs MCP tool → Get all catalogs
- Use get_catalog → Get details for specific catalog
- Use list_schemas → Get schemas in catalog
- Use list_tables → Get tables in schema
- Use get_table → Get table details including columns
Example:
# User: "What tables are available in my catalog?"
# Step 1: List all catalogs
list_catalogs()
# Returns: [{name: "main"}, {name: "ml_dev"}, {name: "ml_prod"}, ...]
# Step 2: List schemas in a catalog
list_schemas(catalog_name="ml_dev")
# Returns: [{name: "bronze"}, {name: "silver"}, {name: "gold"}, ...]
# Step 3: List tables in a schema
list_tables(catalog_name="ml_dev", schema_name="silver")
# Returns: [{name: "clean_events"}, {name: "customer_features"}, ...]
# Step 4: Get table details
get_table(full_table_name="ml_dev.silver.clean_events")
# Returns: {
# name: "clean_events",
# columns: [{name: "event_id", type: "string"}, ...],
# table_type: "MANAGED",
# owner: "user@company.com",
# comment: "Cleaned event data from bronze layer"
# }
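The same calls can be chained into one pass that inventories everything you can see. A minimal sketch, assuming each list_* tool returns a list of objects with a name field as in the example above:
# Sketch: walk catalog → schema → table to build an inventory
inventory = {}
for catalog in list_catalogs():
    inventory[catalog["name"]] = {}
    for schema in list_schemas(catalog_name=catalog["name"]):
        tables = list_tables(catalog_name=catalog["name"], schema_name=schema["name"])
        inventory[catalog["name"]][schema["name"]] = [t["name"] for t in tables]
# inventory == {"ml_dev": {"silver": ["clean_events", ...], ...}, ...}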
Workflow 2: Validate Table Schema
Check table structure before using in pipeline.
Pattern:
- Use get_table with full table name
- Examine columns array
- Check data types
- Verify required columns exist
Example:
# User: "Check if the customer table has email column"
get_table(full_table_name="ml_prod.gold.customers")
# Returns table metadata including:
# columns: [
# {name: "customer_id", type: "bigint"},
# {name: "email", type: "string"},
# {name: "created_date", type: "date"}
# ]
# Verify: Yes, email column exists with type string
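The check can also be done programmatically. A short sketch, assuming the columns entries carry name and type keys as in the response above:
# Sketch: verify a column exists and has the expected type
table_info = get_table(full_table_name="ml_prod.gold.customers")
columns = {col["name"]: col["type"] for col in table_info["columns"]}
if "email" not in columns:
    print("ERROR: 'email' column is missing")
elif columns["email"] != "string":
    print(f"WARNING: 'email' has type {columns['email']}, expected string")
else:
    print("OK: 'email' column exists with type string")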
Schema Management
Workflow 3: Create Schema for New Project
Set up schema with proper naming and metadata.
Pattern:
- Verify catalog exists with get_catalog
- Create schema with create_schema:
  - Use lowercase with underscores
  - Add descriptive comment
  - System sets owner automatically
- Verify creation with get_schema
Example:
# User: "Create a schema for my ML feature engineering project"
# Step 1: Verify catalog exists
get_catalog(catalog_name="ml_dev")
# Returns: {name: "ml_dev", owner: "...", ...}
# Step 2: Create schema
create_schema(
catalog_name="ml_dev",
schema_name="feature_engineering",
comment="Feature engineering workspace for customer churn prediction model. Contains raw features, engineered features, and training datasets."
)
# Returns: Schema created successfully
# Step 3: Verify
get_schema(full_schema_name="ml_dev.feature_engineering")
# Returns: {
# name: "feature_engineering",
# full_name: "ml_dev.feature_engineering",
# comment: "Feature engineering workspace for...",
# owner: "user@company.com"
# }
Workflow 4: Update Schema Metadata
Modify schema properties after creation.
Pattern:
- Use update_schema with full schema name
- Provide new values for:
  - comment - Updated description
  - owner - New owner email
  - new_name - Rename schema (optional)
Example:
# User: "Update the schema description"
update_schema(
full_schema_name="ml_dev.feature_engineering",
comment="UPDATED: Feature engineering workspace for customer churn prediction v2. Includes time-series features and demographic aggregations.",
owner="ml-team@company.com"
)
# Returns: Schema updated successfully
Table Management
Workflow 5: Get Table Information
Retrieve comprehensive table metadata.
Pattern:
- Use get_table with full three-level name
- Examine returned metadata:
  - Columns and types
  - Table type (MANAGED vs EXTERNAL)
  - Owner and permissions
  - Properties and comments
  - Storage location
Example:
# User: "Show me the structure of the features table"
get_table(full_table_name="ml_prod.features.customer_360")
# Returns: {
# name: "customer_360",
# catalog_name: "ml_prod",
# schema_name: "features",
# table_type: "MANAGED",
# columns: [
# {name: "customer_id", type_name: "BIGINT", comment: "Primary key"},
# {name: "lifetime_value", type_name: "DOUBLE", comment: "Total spend"},
# {name: "churn_score", type_name: "DOUBLE", comment: "0-1 churn probability"},
# {name: "last_updated", type_name: "TIMESTAMP", comment: "Feature refresh timestamp"}
# ],
# owner: "ml-team@company.com",
# comment: "360-degree customer features for ML models"
# }
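For downstream use it often helps to flatten the column list into a name-to-type map. A sketch based on the type_name keys shown in the example response:
# Sketch: summarize a table's columns as {name: type}
table_info = get_table(full_table_name="ml_prod.features.customer_360")
schema_summary = {col["name"]: col["type_name"] for col in table_info["columns"]}
# {"customer_id": "BIGINT", "lifetime_value": "DOUBLE", ...}
for name, type_name in schema_summary.items():
    print(f"{name}: {type_name}")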
Workflow 6: Delete Table
Remove table from Unity Catalog.
Pattern:
- Confirm with user before deleting
- Use delete_table with full table name
- For MANAGED tables: data is deleted too
- For EXTERNAL tables: only metadata removed
Example:
# User: "Delete the old test table"
# IMPORTANT: Always confirm deletions
# Ask user: "Are you sure you want to delete ml_dev.test.old_data? This cannot be undone."
# If confirmed:
delete_table(full_table_name="ml_dev.test.old_data")
# Returns: Table deleted successfully
Workflow 7: Delete Schema
Remove schema and all its tables.
Pattern:
- Confirm with user (DANGEROUS operation)
- List tables in schema first to show what will be deleted
- Use delete_schema with full schema name
- All tables in schema are deleted too
Example:
# User: "Delete the temp schema"
# IMPORTANT: List contents first
list_tables(catalog_name="ml_dev", schema_name="temp")
# Shows: 3 tables will be deleted
# Ask user: "Are you sure? This will delete the schema 'ml_dev.temp' and all 3 tables. This cannot be undone."
# If confirmed:
delete_schema(full_schema_name="ml_dev.temp")
# Returns: Schema deleted successfully
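Both deletion workflows can share one guard that lists contents and requires explicit confirmation before any delete call. A sketch; confirm is a hypothetical callback standing in for however you prompt the user:
# Sketch: guarded schema deletion (confirm is a hypothetical user-prompt callback)
def delete_schema_with_confirmation(catalog: str, schema: str, confirm) -> None:
    tables = list_tables(catalog_name=catalog, schema_name=schema)
    message = (
        f"Delete schema '{catalog}.{schema}' and its {len(tables)} table(s)? "
        "This cannot be undone."
    )
    if confirm(message):
        delete_schema(full_schema_name=f"{catalog}.{schema}")
    else:
        print("Deletion cancelled")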
Naming Conventions
Required Naming Rules
DO:
- Use lowercase letters only
- Use underscores to separate words
- Use descriptive names
- Examples: ml_production, feature_store, customer_features, daily_aggregates
DON'T:
- Use hyphens: ml-production ❌
- Use spaces: ml production ❌
- Use special characters: ml_prod! ❌
- Use camelCase: mlProduction ❌
- Use numbers at start: 2024_features ❌
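These rules are easy to check before any create call. A minimal pure-Python sketch (the helper name is illustrative, not an MCP tool):
import re

# Sketch: validate a single-level name (catalog, schema, or table)
def is_valid_uc_name(name: str) -> bool:
    # lowercase letters, digits, underscores; must not start with a digit
    return re.fullmatch(r"[a-z_][a-z0-9_]*", name) is not None

is_valid_uc_name("ml_production")   # True
is_valid_uc_name("ml-production")   # False (hyphen)
is_valid_uc_name("2024_features")   # False (starts with a digit)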
Recommended Naming Patterns
Catalogs:
- <domain>_<environment>: ml_dev, ml_prod, analytics_staging
- <team>_<purpose>: data_eng_pipelines, ml_models
Schemas:
- Medallion layers: bronze, silver, gold
- By function: feature_store, models, metrics
- By project: churn_prediction, recommendation_engine
Tables:
- Include domain: customer_features, order_metrics
- Include grain: daily_sales, hourly_events
- Include status: active_users, archived_orders
Best Practices
1. Always Add Comments
Every schema and table should have a descriptive comment explaining:
- What data it contains
- Who owns/maintains it
- When/how it's updated
- Related upstream/downstream dependencies
# Good
create_schema(
catalog_name="ml_dev",
schema_name="feature_engineering",
comment="Feature engineering workspace for churn model v2. Updated daily at 2am UTC. Owned by ML team."
)
# Bad - no comment
create_schema(
catalog_name="ml_dev",
schema_name="feature_engineering"
)
2. Use MANAGED Tables by Default
Unless you have external data in cloud storage:
- Use MANAGED tables (Databricks manages storage)
- Data deleted when table dropped
- Simpler governance
Use EXTERNAL only when:
- Data shared across multiple systems
- Need independent data lifecycle
- Existing data in cloud storage
3. Set Proper Owners
- Individual for personal dev: user@company.com
- Team/group for shared: ml-team@company.com
- Enables proper governance and access control
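Ownership can be reassigned after creation with update_schema, as in Workflow 4. A short sketch moving a schema to a team group (the group address is illustrative):
# Sketch: transfer schema ownership from an individual to a team group
update_schema(
    full_schema_name="ml_dev.feature_engineering",
    owner="ml-team@company.com"  # illustrative group; use your own governance group
)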
4. Verify Before Using
Always verify structure before using tables:
# Before writing pipeline
table_info = get_table(full_table_name="catalog.schema.table")
# Check required columns exist
required_columns = {"customer_id", "purchase_date", "amount"}
actual_columns = {col["name"] for col in table_info["columns"]}
if not required_columns.issubset(actual_columns):
    missing = required_columns - actual_columns
    print(f"ERROR: Missing required columns: {missing}")
5. Use Three-Level Names
Always use full names for clarity:
# Good - explicit
get_table(full_table_name="ml_prod.features.customer_360")
# Bad - ambiguous
# "features" table in which catalog/schema?
Integration with Other Skills
Called By
- databricks-ml-pipeline - Creates schemas for models and features
- databricks-data-engineering - Creates medallion architecture schemas (bronze, silver, gold)
Works With
- databricks-testing - Tests against discovered tables
Common Patterns
Pattern: Set Up Medallion Architecture
# Create bronze, silver, gold schemas for data pipeline
catalog = "de_prod"
# Bronze layer - raw data
create_schema(
catalog_name=catalog,
schema_name="bronze",
comment="Raw ingested data with minimal transformation. Append-only. Retained for 90 days."
)
# Silver layer - cleaned data
create_schema(
catalog_name=catalog,
schema_name="silver",
comment="Cleaned and validated data. Deduplicated, type-cast, quality-checked. Retained for 1 year."
)
# Gold layer - business aggregates
create_schema(
catalog_name=catalog,
schema_name="gold",
comment="Business-ready aggregates and metrics. Optimized for analytics queries. Retained for 3 years."
)
Pattern: Audit Schema Contents
# User: "Show me all tables in the feature store"
# Get schema details
schema_info = get_schema(full_schema_name="ml_prod.feature_store")
print(f"Schema: {schema_info['full_name']}")
print(f"Owner: {schema_info['owner']}")
print(f"Comment: {schema_info['comment']}")
# List all tables
tables = list_tables(catalog_name="ml_prod", schema_name="feature_store")
print(f"\nTables ({len(tables)}):")
for table in tables:
    # Get details for each
    table_info = get_table(full_table_name=f"ml_prod.feature_store.{table['name']}")
    print(f"  - {table_info['name']}")
    print(f"    Type: {table_info['table_type']}")
    print(f"    Columns: {len(table_info['columns'])}")
    print(f"    Comment: {table_info.get('comment', 'No description')}")
Troubleshooting
Issue: Catalog Not Found
Error: "Catalog 'xyz' does not exist"
Solution:
- Use list_catalogs() to see available catalogs
- Check spelling and case (must be lowercase)
- Verify you have access to the catalog
Issue: Schema Already Exists
Error: "Schema 'catalog.schema' already exists"
Solution:
- Use get_schema() to check whether it exists
- Use update_schema() to modify instead of create
- Use a different name or delete the existing schema first
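A get-then-decide pattern avoids the conflict. A sketch, assuming get_schema fails when the schema is missing; adapt the existence check to the tool's actual behavior:
# Sketch: create the schema only if it doesn't exist, otherwise update it
# Assumes get_schema fails (raises) when the schema is missing
try:
    get_schema(full_schema_name="ml_dev.feature_engineering")
    exists = True
except Exception:
    exists = False

if exists:
    update_schema(
        full_schema_name="ml_dev.feature_engineering",
        comment="Refreshed description for existing schema"
    )
else:
    create_schema(
        catalog_name="ml_dev",
        schema_name="feature_engineering",
        comment="Feature engineering workspace"
    )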
Issue: Invalid Name
Error: "Invalid name: contains invalid characters"
Solution:
- Use only lowercase letters, numbers, underscores
- No hyphens, spaces, or special characters
- Don't start with numbers
Security Reminders
- Never grant overly broad permissions
- Use least-privilege access
- Set appropriate owners for governance
- Document who has access in comments
Summary
This skill manages Unity Catalog resources:
- Discovery: Explore catalogs, schemas, tables
- Creation: Set up schemas with proper naming and governance
- Updates: Modify metadata, owners, comments
- Deletion: Remove schemas/tables (with confirmation)
- Best practices: Naming conventions, comments, MANAGED tables, proper ownership
Use this skill to set up catalog structure before building pipelines with other skills.