metric-views-patterns
Metric Views Patterns for Genie & AI/BI
Overview
Metric Views provide a semantic layer for natural language queries via Genie and AI/BI dashboards. This skill standardizes the YAML structure for comprehensive, LLM-friendly metric definitions following Databricks Metric View Specification v1.1.
Predecessor: Gold tables must exist before creating metric views. Use gold-layer-design + gold-layer-setup skills first.
Key Capabilities:
- Create metric views with proper SQL syntax (`WITH METRICS LANGUAGE YAML`)
- Validate schemas before deployment to catch common errors early
- Structure joins (direct and snowflake schema patterns)
- Optimize comments for Genie natural language queries
- Handle SCD2 dimensions with proper `is_current` filtering
When to Use This Skill
Use this skill when:
- Creating new metric views for Genie Spaces
- Troubleshooting metric view creation errors
- Validating schema references before deployment
- Implementing joins (including transitive relationships)
- Optimizing metric views for Genie natural language queries
- Ensuring compliance with v1.1 specification
- Following the requirements gathering template to design metric views
Prerequisites
⚠️ MANDATORY: Complete these before creating metric views:
- Gold layer tables exist in Unity Catalog (use `gold-layer-design` + `gold-layer-setup` skills)
- Gold layer YAML schemas exist in `gold_layer_design/yaml/` (for validation script)
- Serverless SQL warehouse available (for metric view creation and querying)
- Databricks Runtime 17.2+ (required for YAML version 1.1)
MCP Tools (from upstream databricks-metric-views)
The `manage_metric_views` MCP tool supports all metric view operations:

| Action | Description |
|---|---|
| `create` | Create a metric view with dimensions and measures |
| `alter` | Update a metric view's YAML definition |
| `describe` | Get the full definition and metadata |
| `query` | Query measures grouped by dimensions |
| `drop` | Drop a metric view |
| `grant` | Grant SELECT privileges to users/groups |
Quick Start (2 hours)
What You'll Create:
- `metric_views/{view_name}.yaml` — Semantic definitions (dimensions, measures, joins, formats)
- `create_metric_views.py` — Script reads YAML, creates views with `WITH METRICS LANGUAGE YAML`
- `metric_views_job.yml` — Asset Bundle job for deployment

Deploy: `databricks bundle deploy -t dev && databricks bundle run metric_views_job -t dev`
Critical Rules
⚠️ CRITICAL: Correct SQL Syntax
Metric views MUST be created using WITH METRICS LANGUAGE YAML syntax:
```python
create_sql = f"""
CREATE OR REPLACE VIEW {fully_qualified_name}
WITH METRICS
LANGUAGE YAML
COMMENT '{view_comment_escaped}'
AS $$
{yaml_str}
$$
"""
```
Key Requirements:
- `WITH METRICS` — Identifies the view as a metric view
- `LANGUAGE YAML` — Specifies YAML format
- `AS $$ ... $$` — YAML content wrapped in dollar-quote delimiters
- No SELECT statement — The YAML definition IS the view definition
- `version` field — Must be included in each metric view YAML
❌ WRONG: Regular view with TBLPROPERTIES (creates regular VIEW, not METRIC_VIEW)
⚠️ CRITICAL: v1.1 Unsupported Fields
These fields will cause errors and MUST NOT be used:
| Field | Error | Action |
|---|---|---|
| `name` | Unrecognized field "name" | ❌ NEVER include — the name is in the CREATE VIEW statement |
| `time_dimension` | Unrecognized field "time_dimension" | ❌ Remove entirely |
| `window_measures` | Unrecognized field "window_measures" | ❌ Remove top-level `window_measures:` array. Individual measure `window:` property is Experimental (v0.1 only, DBR 16.4-17.1). See references/composability-patterns.md for details. |
| `join_type` | Unsupported | ❌ Remove — defaults to LEFT OUTER JOIN |
| `table` (in joins) | Missing required creator property 'source' | ✅ Use `source` instead |
⚠️ MANDATORY: Pre-Creation Schema Validation
ALWAYS validate schemas BEFORE creating metric view YAML. In practice, virtually all deployment failures trace back to preventable schema issues.
Schema Validation Checklist:
- Verified source table schema (ran DESCRIBE TABLE or checked YAML)
- Verified all joined table schemas
- Created column reference checklist for all tables
- Validated that every dimension `expr` column exists
- Validated that every measure `expr` column exists
- Validated join key columns exist in both tables
- Verified no transitive joins (all joins are source → table)
- For COUNT measures, verified primary key column exists
- For SCD2 joins, verified `is_current` column exists
See references/validation-checklist.md for detailed validation steps.
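The column-existence checks above can be automated with a short sketch along these lines (the function name and the `schemas` shape are ours, not the actual `scripts/validate_metric_view.py` interface — schemas map each alias, `source` or a join name, to its set of columns):

```python
# Sketch: cross-check every dimension/measure expr against known table schemas.
import re

def find_unresolved_refs(view_def: dict, schemas: dict[str, set[str]]) -> list[str]:
    """Return 'alias.column' references whose column is missing from schemas."""
    problems = []
    exprs = [d["expr"] for d in view_def.get("dimensions", [])]
    exprs += [m["expr"] for m in view_def.get("measures", [])]
    for expr in exprs:
        for alias, column in re.findall(r"\b(\w+)\.(\w+)\b", expr):
            if alias in schemas and column not in schemas[alias]:
                problems.append(f"{alias}.{column}")
    return problems

schemas = {
    "source": {"order_id", "net_revenue", "order_date"},
    "dim_store": {"store_id", "store_name"},
}
view_def = {
    "dimensions": [{"name": "store", "expr": "dim_store.store_name"}],
    "measures": [{"name": "revenue", "expr": "SUM(source.net_revnue)"}],  # typo on purpose
}
print(find_unresolved_refs(view_def, schemas))  # -> ['source.net_revnue']
```

Running this against each YAML file before deployment turns a runtime "cannot resolve column" error into an immediate local failure.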
⚠️ CRITICAL: Source Table Selection
Rule: Revenue/bookings/transactions → FACT table. Property/host counts → DIMENSION table.
❌ WRONG: Revenue from dimension table (under-reports by 4x)

```yaml
source: ${catalog}.${schema}.dim_property  # ❌ Wrong for revenue!
```

✅ CORRECT: Revenue from fact table

```yaml
source: ${catalog}.${schema}.fact_booking_daily  # ✅ Correct for revenue!
```
⚠️ CRITICAL: Transitive Join Limitations
Metric Views DO NOT support transitive/chained joins (where join B's `on` clause references join A instead of `source`).
How to detect: If ANY join's `on` clause references a join alias (not `source`), it is transitive and will fail.
❌ WRONG: Transitive join (join B references join A)
```yaml
joins:
  - name: dim_property        # Join A
    source: catalog.schema.dim_property
    'on': source.property_id = dim_property.property_id
  - name: dim_destination     # Join B
    source: catalog.schema.dim_destination
    'on': dim_property.destination_id = dim_destination.destination_id  # ❌ References dim_property!
```
✅ FIX 1 (Preferred — simplest): Use denormalized columns from existing dimension
If dim_property already has destination_name and destination_country, reference them directly — no second join needed:
```yaml
dimensions:
  - name: destination_name
    expr: dim_property.destination_name   # ✅ Already in dim_property
  - name: destination_country
    expr: dim_property.destination_country
```
✅ FIX 2: Snowflake schema (nested joins) — requires DBR 17.1+
```yaml
joins:
  - name: dim_property
    source: catalog.schema.dim_property
    'on': source.property_id = dim_property.property_id
    joins:                                # ✅ Nested under dim_property — snowflake schema
      - name: dim_destination
        source: catalog.schema.dim_destination
        'on': dim_property.destination_id = dim_destination.destination_id
```
Validation gate: Before generating YAML, inspect all join `on` clauses. If the left side of any `on` references a join name (not `source`), restructure as nested joins or use denormalized columns.
See references/advanced-patterns.md for additional snowflake schema examples.
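The validation gate above can be expressed in a few lines of Python (a sketch; the join dicts mirror the parsed YAML shape, and the function name is ours):

```python
# Sketch: flag joins whose `on` clause references another join's alias
# instead of `source` — these are transitive and will fail at first level.
import re

def find_transitive_joins(joins: list[dict]) -> list[str]:
    aliases = {j["name"] for j in joins}
    bad = []
    for j in joins:
        refs = {m.group(1) for m in re.finditer(r"\b(\w+)\.\w+", j.get("on", ""))}
        if refs & (aliases - {j["name"]}):
            bad.append(j["name"])
    return bad

joins = [
    {"name": "dim_property",
     "on": "source.property_id = dim_property.property_id"},
    {"name": "dim_destination",
     "on": "dim_property.destination_id = dim_destination.destination_id"},
]
print(find_transitive_joins(joins))  # -> ['dim_destination']
```

Any join this flags must be either nested under its parent join (snowflake schema) or replaced by denormalized columns.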
Implementation Workflow
Phase 1: Design (30 min)
Read: references/requirements-template.md
- Identify fact table as primary source
- List dimensions to join (2-5 dimension tables)
- Define key measures (5-10 measures with aggregation type)
- List common user questions (guides synonym creation)
- Map synonyms for each dimension and measure (3-5 each)
Phase 2: YAML Creation (1 hour)
Read: references/yaml-reference.md and references/advanced-patterns.md
- Create one YAML file per metric view (filename = view name)
- Define `source` table (fully qualified with `${catalog}` and `${gold_schema}` placeholders)
- Add `joins` with `name`, `source`, `'on'` (include `is_current = true` for SCD2)
- Define dimensions with correct prefix (`source.` or `{join_name}.`)
  - Business-friendly comments and display names
  - 3-10 synonyms each (max 10 per field, max 255 chars each)
- Define measures with correct aggregation (SUM, AVG, COUNT)
  - Proper formatting (currency, number, percentage)
  - Comprehensive comments for Genie
  - 3-10 synonyms each (max 10 per field, max 255 chars each)
Phase 3: Script & Bundle (30 min)
Read: references/implementation-workflow.md
- Validate YAML with `scripts/validate_metric_view.py`
- Use `scripts/create_metric_views.py` for deployment
- Configure Asset Bundle job (see `assets/templates/metric-views-job-template.yml`)
- Add YAML file sync to `databricks.yml`
Phase 4: Deploy & Test (30 min)
Read: references/validation-queries.md
- Deploy: `databricks bundle deploy -t dev`
- Run: `databricks bundle run metric_views_job -t dev`
- Verify: `DESCRIBE EXTENDED` shows Type: METRIC_VIEW
- Test: `` SELECT ... MEASURE(`Total Revenue`) ... GROUP BY ... ``
- Test with Genie: Ask natural language questions
Quick Reference
YAML Structure (v1.1)
```yaml
version: "1.1"

comment: >
  PURPOSE: [One-line description]
  BEST FOR: [Question 1] | [Question 2] | [Question 3]
  NOT FOR: [What to avoid] (use [correct_asset] instead)
  DIMENSIONS: [dim1], [dim2], [dim3]
  MEASURES: [measure1], [measure2], [measure3]
  SOURCE: [fact_table] ([domain] domain)
  JOINS: [dim_table1] ([description])
  NOTE: [Critical caveats]

source: ${catalog}.${gold_schema}.<fact_table>

filter: <sql_boolean_expression>  # Optional: WHERE clause applied to all queries

joins:
  - name: <dim_table_alias>
    source: ${catalog}.${gold_schema}.<dim_table>
    'on': source.<fk> = <dim_table_alias>.<pk> AND <dim_table_alias>.is_current = true

dimensions:
  - name: <dimension_name>
    expr: source.<column>
    comment: <Business description>
    display_name: <User-Friendly Name>
    synonyms: [<alt1>, <alt2>]

measures:
  - name: <measure_name>
    expr: SUM(source.<column>)
    comment: <Business description>
    display_name: <User-Friendly Name>
    format:
      type: currency|number|percentage
      currency_code: USD
      decimal_places:
        type: exact|all
        places: 2
    synonyms: [<alt1>, <alt2>]
```
Valid format types (exhaustive):
| Type | Use For | Common Mistake |
|---|---|---|
| `byte` | Data sizes (storage, memory) | — |
| `currency` | Monetary values (revenue, cost) | — |
| `date` | Date-only values | — |
| `date_time` | Timestamp values | — |
| `number` | Counts, averages, decimals, integers | ❌ decimal, ❌ integer |
| `percentage` | Ratios, rates, percentages | ❌ percent |

⚠️ `percent` is NOT valid (use `percentage`). `decimal` is NOT valid (use `number`).
Column References
- Main table columns: Use the `source.` prefix in all `expr` fields
- Joined table columns: Use the join `name` as prefix (e.g., `dim_store.column_name`)
- Never reference table names directly: Use `source.` or `{join_name}.`
Join Requirements
- Each join MUST have `name`, `source`, and either `'on'` or `using`
  - `'on'` clause: boolean expression using `source.` for the main table and the join name for the joined table (quote the key: `'on'`)
  - `using` clause: array of column names shared between source and join table
- Each first-level join must reference `source` (NOT another join alias — that's transitive)
- For transitive relationships, use nested `joins:` (snowflake schema, DBR 17.1+) or denormalized columns
- SCD2 joins must include `AND {dim_table}.is_current = true`
- MAP type columns are NOT supported in joined tables
Core Patterns
Composability (MEASURE Function)
Metric views support composability — building complex metrics by referencing simpler measures via the MEASURE() function. Define atomic measures first, then compose derived KPIs:
```yaml
measures:
  - name: total_revenue
    expr: SUM(source.net_revenue)
  - name: order_count
    expr: COUNT(source.order_id)
  - name: avg_order_value
    expr: MEASURE(total_revenue) / MEASURE(order_count)  # ✅ Composed measure
```
Measure-level filtering with FILTER clause:
```yaml
  - name: fulfilled_orders
    expr: COUNT(1) FILTER (WHERE source.order_status = 'F')
  - name: fulfillment_rate
    expr: MEASURE(fulfilled_orders) / MEASURE(order_count)
    format:
      type: percentage
```
Best practices: Define atomic measures (SUM, COUNT, AVG) first; always use MEASURE() to reference other measures (never repeat the aggregation logic).
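At query time (whether you write the SQL or Genie generates it), atomic and composed measures are requested the same way — wrapped in `MEASURE()`. A sketch, with a hypothetical metric view name and the measure names from the example above:

```sql
SELECT
  order_date,
  MEASURE(total_revenue)   AS total_revenue,
  MEASURE(avg_order_value) AS avg_order_value
FROM catalog.schema.orders_metrics   -- hypothetical metric view
GROUP BY order_date
ORDER BY order_date;
```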
See references/composability-patterns.md for full guide including conditional logic, window measures (Experimental), and complete examples.
Standardized Comment Format (v3.0)
Use structured format for Genie optimization:
```yaml
comment: >
  PURPOSE: Comprehensive cost analytics for Databricks billing and usage analysis.
  BEST FOR: Total spend by workspace | Cost trend over time | SKU cost breakdown
  NOT FOR: Commit/contract tracking (use commit_tracking) | Real-time cost alerts
  DIMENSIONS: usage_date, workspace_name, sku_name, owner, tag_team
  MEASURES: total_cost, total_dbus, cost_7d, cost_30d
  SOURCE: fact_usage (billing domain)
  JOINS: dim_workspace (workspace details), dim_sku (SKU details)
  NOTE: Cost values are list prices. Actual billed amounts may differ.
```
Dimension & Measure Patterns
See references/advanced-patterns.md for complete dimension patterns (geographic, product, time), measure patterns (revenue, count, percentage), and a full worked retail example.
Common Mistakes to Avoid
Top 5 mistakes (with paired wrong/correct examples): wrong syntax (`TBLPROPERTIES`), unsupported fields (`time_dimension`, `window_measures`), wrong column references, including `name` in YAML, transitive joins.
See references/advanced-patterns.md for detailed wrong/correct code examples for each mistake.
Python Script Error Handling
Key rules: strip `name` before `yaml.dump()`, drop existing VIEW/TABLE before CREATE, track failures and raise `RuntimeError`, verify METRIC_VIEW type via `DESCRIBE EXTENDED`.
See scripts/create_metric_views.py for the full working script and references/implementation-workflow.md for the detailed error handling patterns.
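A condensed sketch of those rules (the working version is `scripts/create_metric_views.py`; `run_sql` here stands in for whatever SQL execution helper the script actually uses):

```python
# Sketch: strip `name`, drop any existing object, collect per-view failures,
# and fail the job if anything broke.
def prepare_definition(view_def: dict) -> dict:
    """v1.1 rejects a top-level `name` field — the name lives in CREATE VIEW."""
    return {k: v for k, v in view_def.items() if k != "name"}

def create_views(definitions: dict[str, dict], run_sql) -> None:
    failures = {}
    for fqn, view_def in definitions.items():
        try:
            body = prepare_definition(view_def)      # then yaml.dump(body)
            run_sql(f"DROP VIEW IF EXISTS {fqn}")
            run_sql(f"DROP TABLE IF EXISTS {fqn}")
            # ... CREATE OR REPLACE VIEW {fqn} WITH METRICS LANGUAGE YAML ...
            # ... DESCRIBE EXTENDED {fqn} and verify Type: METRIC_VIEW ...
        except Exception as exc:
            failures[fqn] = str(exc)
    if failures:
        raise RuntimeError(f"{len(failures)} metric view(s) failed: {failures}")
```

Collecting failures instead of stopping at the first one lets a single job run report every broken view at once.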
Time Estimates
| Metric Views | Design | YAML Creation | Deploy & Test | Total |
|---|---|---|---|---|
| 1 view | 20 min | 30 min | 20 min | ~1 hour |
| 2-3 views | 30 min | 1 hour | 30 min | ~2 hours |
| 5+ views | 1 hour | 2 hours | 30 min | ~3.5 hours |
Reference Files
- `references/yaml-reference.md` — Complete YAML fields, syntax, format options
- `references/advanced-patterns.md` — Dimension/measure patterns, joins, snowflake schema, worked examples
- `references/composability-patterns.md` — MEASURE() function, FILTER clause, window measures (Experimental)
- `references/validation-checklist.md` — Pre-creation validation steps
- `references/requirements-template.md` — Design template for dimensions, measures, joins
- `references/implementation-workflow.md` — Step-by-step creation workflow
- `references/validation-queries.md` — SQL queries for deployment verification
Scripts & Assets
- `scripts/validate_metric_view.py` — Pre-deployment column reference validation
- `scripts/create_metric_views.py` — YAML loading, parameter substitution, METRIC_VIEW verification
- `assets/templates/metric-view-template.yaml` — Starter YAML template
- `assets/templates/metric-views-job-template.yml` — Asset Bundle job template
Materialization (Experimental)
Metric views support optional materialization for pre-computed aggregations. Lakeflow Spark Declarative Pipelines orchestrates materialized views, and the query optimizer automatically routes queries to the best materialized view using aggregate-aware query rewriting.
```yaml
materialization:
  schedule: every 6 hours
  mode: relaxed
  materialized_views:
    - name: baseline
      type: unaggregated
    - name: revenue_breakdown
      type: aggregated
      dimensions: [category, color]
      measures: [total_revenue]
```
Materialized view types: `unaggregated` (full data) and `aggregated` (pre-computed for specific dimension/measure combinations). Check refresh status with `DESCRIBE TABLE EXTENDED`.
Requires serverless compute enabled. Currently experimental — use for high-query-volume metric views where pre-computation reduces latency.
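The refresh-status check mentioned above, spelled out (metric view name hypothetical):

```sql
-- Inspect the extended metadata for materialization/refresh details
DESCRIBE TABLE EXTENDED catalog.schema.orders_metrics;
```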
Common Issues
| Issue | Solution |
|---|---|
| `SELECT *` not supported | Must explicitly list dimensions and use `MEASURE()` for measures |
| "Cannot resolve column" | Dimension/measure names with spaces need backtick quoting |
| JOIN at query time fails | Joins must be in the YAML definition, not in the SELECT query |
| `MEASURE()` required | All measure references must be wrapped: `` MEASURE(`name`) `` |
| DBR version error | Requires Runtime 17.2+ for YAML v1.1, or 16.4+ for v0.1 |
| Materialization not working | Requires serverless compute enabled; currently experimental |
External References
Official Documentation
- Metric Views SQL Creation
- Metric Views YAML Syntax Reference
- Metric Views Joins
- Metric Views Semantic Metadata
- Composability in Metric Views
- Window Measures (Experimental)
- Materialization for Metric Views
Related Skills
- `databricks-table-valued-functions` — TVF patterns for Genie
- `genie-space-patterns` — Genie Space setup
- `databricks-aibi-dashboards` — AI/BI dashboard patterns
Version History
- v5.0 (Feb 2026) — Expanded transitive joins with inline fixes; exhaustive format type table (6 types); composability (MEASURE function) patterns; FILTER clause; USING join clause; filter top-level field; window measures clarification (Experimental v0.1); materialization expansion; progressive disclosure restructure (Notes to Carry Forward + Next Step); 7 upstream_sources from official docs; new composability-patterns.md reference
- v4.0 (Feb 2026) — Merged prompt content: Quick Start, implementation workflow, requirements template, creation script, validation queries, worked examples, common mistakes with paired examples
- v3.0 (Dec 19, 2025) — Standardized structured comment format
- v2.0 (Dec 16, 2025) — Genie optimization patterns from production post-mortem
- v1.0 (Oct 2025) — Initial rule based on metric view deployment learnings
Metric Views Notes to Carry Forward
After completing metric view creation, carry these notes to the next worker:
- Metric View names and paths: List of all created MVs with YAML file paths
- Grain per view: Which fact table sources each MV
- Measure counts: Number of dimensions and measures per MV
- Validation status: Which MVs passed schema validation, any unresolved issues
- Composability notes: Any composed measures using MEASURE() that downstream workers should know about
Next Step
After metric views are deployed and validated, proceed to:
semantic-layer/02-databricks-table-valued-functions/SKILL.md — Create TVFs for Genie Spaces using the Gold tables referenced by your metric views.