anomaly-detection
Overview
Anomaly Detection is a schema-level monitoring capability that automatically assesses data quality by evaluating freshness (is the table up-to-date?) and completeness (did we get the expected number of rows?). It uses ML models built from historical patterns — no custom metric definitions needed.
API Status: Public Preview (POST /api/data-quality/v1/monitors)
SDK Module: databricks.sdk.service.dataquality
When to Use This Skill vs. Lakehouse Monitoring
| Question | Use This Skill | Use Lakehouse Monitoring |
|---|---|---|
| "Did my pipeline break?" | Yes — detects stale/incomplete tables | No |
| "Is my revenue trending correctly?" | No | Yes — custom AGGREGATE/DERIVED/DRIFT metrics |
| "Which tables are unhealthy?" | Yes — schema-wide scan | No (per-table only) |
| "What's the average transaction value?" | No | Yes — custom business KPIs |
| "How quickly can I set up monitoring?" | Yes — minutes (enable on schema) | 2+ hours (custom metrics per table) |
| "Do I need ML model monitoring?" | No | Yes — InferenceLogConfig support |
Use both together for comprehensive monitoring: anomaly detection for baseline reliability + custom metrics for business KPIs.
Core Concepts
Freshness
Analyzes commit history to build a per-table model predicting next commit time. If a commit is unusually late, the table is marked stale.
Completeness
Analyzes historical row counts to predict expected rows in the last 24 hours. If actual rows fall below the lower bound, the table is marked incomplete.
Intelligent Scanning
Automatically aligns scan frequency with table update cadence. High-impact tables (by popularity and downstream usage) are prioritized; less critical tables are scanned less frequently.
Results Storage
All results are stored in the system table:
system.data_quality_monitoring.table_results
Access: Only account admins by default. Grant access to others as needed.
Quick Start
Enable via UI (Simplest)
- Navigate to schema in Catalog Explorer
- Click Details tab
- Click Enable under Data Quality Monitoring
- Click Save
Enable via SDK
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.dataquality import Monitor, AnomalyDetectionConfig
w = WorkspaceClient()
# Get schema UUID (required — NOT the three-level name)
schema_info = w.schemas.get(full_name=f"{catalog}.{schema}")
schema_id = schema_info.schema_id
# Enable anomaly detection
monitor = w.data_quality.create_monitor(
monitor=Monitor(
object_type="schema",
object_id=schema_id,
anomaly_detection_config=AnomalyDetectionConfig(
excluded_table_full_names=[
f"{catalog}.{schema}.staging_table", # Optional: tables to skip
]
)
)
)
Disable via SDK
w.data_quality.delete_monitor(
object_type="schema",
object_id=schema_id
)
Warning: Disabling deletes ALL anomaly detection tables and information. This cannot be undone.
Critical Rules
Rule 1: create_monitor() Takes a Monitor Object with UUID
# ✅ CORRECT: Wrap in Monitor object, use schema UUID
schema_info = w.schemas.get(full_name=f"{catalog}.{schema}")
w.data_quality.create_monitor(
monitor=Monitor(object_type="schema", object_id=schema_info.schema_id, ...)
)
# ❌ WRONG: Three-level name, keyword args instead of Monitor object
w.data_quality.create_monitor(object_id=f"{catalog}.{schema}", ...)
Rule 2: Permissions Required
- MANAGE SCHEMA or MANAGE CATALOG privileges on the target schema
- To query results: SELECT on
system.data_quality_monitoring.table_results
Rule 3: System Table Access Is Restricted
The results table contains data from ALL catalogs in the metastore. Use caution when granting access — it includes sample values from every monitored table.
Rule 4: Anomaly Detection Does NOT Monitor Views
Only tables are scanned. Views are silently skipped.
Rule 5: Completeness Only Checks Row Counts
Completeness does NOT evaluate null fractions, zero values, or NaN. For column-level quality monitoring, use the lakehouse-monitoring-comprehensive skill with custom metrics.
Configuration Options
Exclude Specific Tables
AnomalyDetectionConfig(
excluded_table_full_names=[
f"{catalog}.{schema}.staging_raw",
f"{catalog}.{schema}.temp_processing",
]
)
Advanced Configuration (via REST API)
For freshness/completeness threshold overrides and table-specific settings, see references/configuration-guide.md.
Querying Results
-- Find all unhealthy tables in a schema
SELECT
table_name,
status,
freshness.status as freshness_status,
completeness.status as completeness_status,
downstream_impact.impact_level,
downstream_impact.num_downstream_tables
FROM system.data_quality_monitoring.table_results
WHERE catalog_name = '{catalog}'
AND schema_name = '{schema}'
AND status = 'Unhealthy'
ORDER BY event_time DESC;
For the complete results schema and query patterns, see references/results-schema.md.
Setting Up Alerts
-- Alert on unhealthy tables with downstream impact
SELECT
CONCAT(catalog_name, '.', schema_name, '.', table_name) AS full_table_name,
status,
downstream_impact.num_queries_on_affected_tables AS impacted_queries
FROM system.data_quality_monitoring.table_results
WHERE event_time >= current_timestamp() - INTERVAL 6 HOURS
AND status = 'Unhealthy'
AND downstream_impact.num_queries_on_affected_tables > :min_tables_affected;
For complete alert templates including custom email notifications, see references/alert-patterns.md.
Integration with Silver & Gold Layer Setup
This skill is automatically invoked by silver-layer-setup and gold-layer-setup orchestrators. Every new Silver or Gold schema should have anomaly detection enabled to provide baseline freshness/completeness monitoring from day one.
Reusable Enable Function
Use this pattern in setup scripts to enable anomaly detection as part of schema creation:
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.dataquality import Monitor, AnomalyDetectionConfig
def enable_anomaly_detection_on_schema(
w: WorkspaceClient,
catalog: str,
schema: str,
excluded_tables: list = None,
):
"""
Enable anomaly detection on a schema. Safe to call if already enabled.
Called by Silver and Gold layer setup workflows to ensure every schema
has baseline freshness/completeness monitoring.
Args:
w: WorkspaceClient
catalog: Catalog name
schema: Schema name
excluded_tables: Optional list of table names to exclude
(e.g., staging, quarantine, temp tables)
"""
schema_info = w.schemas.get(full_name=f"{catalog}.{schema}")
schema_id = schema_info.schema_id
excluded_full_names = None
if excluded_tables:
excluded_full_names = [
f"{catalog}.{schema}.{t}" for t in excluded_tables
]
try:
monitor = w.data_quality.create_monitor(
monitor=Monitor(
object_type="schema",
object_id=schema_id,
anomaly_detection_config=AnomalyDetectionConfig(
excluded_table_full_names=excluded_full_names
)
)
)
print(f"✓ Anomaly detection enabled on {catalog}.{schema}")
return monitor
except Exception as e:
if "already exists" in str(e).lower():
print(f"✓ Anomaly detection already enabled on {catalog}.{schema} (skipping)")
else:
print(f"⚠️ Could not enable anomaly detection on {catalog}.{schema}: {e}")
print(" (Non-blocking — schema setup continues)")
When Invoked by Silver Layer Setup
# After Silver schema creation and DLT pipeline deployment
enable_anomaly_detection_on_schema(
w, catalog, silver_schema,
excluded_tables=["dq_rules"] # Exclude metadata tables
)
When Invoked by Gold Layer Setup
# After Gold tables are created via setup_tables.py
enable_anomaly_detection_on_schema(
w, catalog, gold_schema,
excluded_tables=None # Monitor all Gold tables
)
Why Enable at Schema Setup Time?
- Freshness baselines start building immediately — ML models need historical commit patterns
- Completeness models train earlier — more scan cycles = more accurate predictions
- Zero cost to enable — no custom metrics needed, runs automatically
- Catches pipeline breaks early — stale/incomplete tables detected before anyone queries them
- Complements custom metrics — anomaly detection for reliability, Lakehouse Monitoring for business KPIs
Workflow
| Phase | Duration | Activities |
|---|---|---|
| Phase 1: Enable | 5 min | Enable anomaly detection on schema via UI or SDK |
| Phase 2: Wait | 15-30 min | First scan runs automatically |
| Phase 3: Review | 10 min | Check results in Catalog Explorer or system table |
| Phase 4: Alerts | 15 min | Set up SQL alerts on system table |
Reference Files
configuration-guide.md
Complete configuration reference including:
- REST API endpoint and request schema
- SDK-based enable/disable patterns
excluded_table_full_namesconfiguration- Advanced threshold overrides (freshness, completeness)
- Table skip/scan lists
- Event timestamp column configuration
results-schema.md
System results table documentation including:
- Full schema of
system.data_quality_monitoring.table_results - Nested struct documentation (freshness, completeness, downstream_impact, root_cause_analysis)
- Query patterns for common scenarios
- Metastore-level dashboard template reference
alert-patterns.md
SQL alert configuration including:
- Alert query template with downstream impact filtering
- Custom email notification template
- Alert configuration steps
- Integration with Databricks SQL Alerts
Scripts
enable_anomaly_detection.py
SDK-based anomaly detection management including:
- Enable on schema with optional table exclusions
- Disable (with warning about data deletion)
- List monitored schemas
- Check monitor status
query_results.py
Results querying utilities including:
- Query unhealthy tables
- Extract root cause analysis
- Filter by freshness/completeness status
- Generate summary reports
Assets
alert-query-template.sql
Ready-to-use SQL query for Databricks SQL Alerts with configurable thresholds and custom email template.
Limitations
- Does NOT support views (only tables)
- Completeness does not evaluate null fractions, zero values, or NaN
- System table access is restricted to account admins by default
- Disabling deletes all monitoring data (irreversible)
- Event freshness (based on event time columns) is not supported in the current version
Troubleshooting
Monitor Not Detecting Anomalies
- Check permissions: Ensure MANAGE SCHEMA or MANAGE CATALOG
- Wait for scan: First scan may take 15-30 minutes
- Check table activity: Tables that rarely change may be classified as "static"
Cannot Access Results Table
- Default access: Only account admins can read
system.data_quality_monitoring.table_results - Grant access: Account admin must grant SELECT on the system table
Tables Showing as "Training"
- Normal behavior: New tables need historical data to build ML models
- Wait: Models improve after several scan cycles
References
Summary
Anomaly Detection provides automated, schema-level monitoring for table freshness and completeness. It complements the lakehouse-monitoring-comprehensive skill's custom business metrics with baseline data reliability monitoring. Enable it in minutes via UI or SDK, query results from the system table, and set up SQL alerts for proactive notification.