metabase-semantic-checker
Metabase semantic checker
The semantic checker validates a tree of Metabase Representation Format YAML files for referential integrity. Schema-level validation (shape of each file, required fields, enum values) is handled separately by npx @metabase/representations validate-schema; the semantic checker runs after schema validation and focuses on cross-file and cross-system consistency.
It compiles every MBQL query down to SQL against the database metadata and checks that each entity reference and each column reference resolves to something that actually exists. Concretely, it answers:
- Does every collection_id, parent_id, dashboard_id, document_id, based_on_card_id, transform tag, snippet name, etc. resolve to an entity that actually exists in the tree?
- For each MBQL query, does every source-table, field reference, join target, segment, measure, and expression resolve against the database schema? (Verified by compiling the query to SQL.)
- For each native query, do the referenced tables, columns, and snippets exist?
- Do dashboards' and documents' embedded card references point at real cards?
Each run takes at least a minute: roughly a minute of fixed JVM startup and metadata-loading overhead before any checks begin, plus query-compilation time that grows with the size of the tree.
The checker ships inside the Metabase Enterprise JAR and is invoked via --mode checker. Default Docker image: metabase/metabase-enterprise:latest. Use metabase/metabase-enterprise-head:latest only when the user explicitly wants the in-development build — e.g. testing unreleased checker changes.
Inputs
Two inputs, both required:
- The representation tree: the repo root containing collections/, databases/, transforms/, python_libraries/. This is what gets checked.
- The database metadata: a JSON file exported from a Metabase instance, located by default at .metadata/metadata.json. The checker uses it to resolve column and table references inside queries; without it, query-level checks cannot run.
If .metadata/metadata.json is missing, do not run the checker. Tell the user it needs to be exported from their Metabase instance first, and only run the checker once the metadata file is present on disk.
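A minimal pre-flight sketch of that rule, assuming a bash shell and the default metadata location. The function name require_metadata is hypothetical, not part of Metabase; it only checks that the file exists and is non-empty, not that the JSON is valid or current.

```shell
#!/usr/bin/env bash
# require_metadata [ROOT]   (hypothetical helper)
# Succeeds only if ROOT/.metadata/metadata.json exists and is non-empty.
# It does not validate the JSON or check that the export is up to date.
require_metadata() {
  local root="${1:-.}"
  local metadata="$root/.metadata/metadata.json"
  if [ ! -s "$metadata" ]; then
    echo "Missing or empty $metadata: export it from your Metabase instance, then re-run." >&2
    return 1
  fi
}
```

Usage: require_metadata "$PWD" || exit 1, placed before the docker pull and docker run commands.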
When to run
Do not run the semantic checker by default when making edits. It is slow (≥1 minute per run) and in most projects is wired up as a CI step that runs on every push or PR — that is where it belongs. Local runs are for targeted diagnosis, not routine validation.
Only run it locally when the user explicitly asks for one of these:
- verify that all entity references resolve (collections, dashboards, cards, snippets, transform tags, etc.), or
- verify that all column references in queries — MBQL or SQL — are correct.
Phrasings that count as an explicit ask: "semantic check", "check references", "validate queries against the schema", "make sure the columns still exist", or diagnosing a broken reference the user already suspects. A bare "run the checker" does not count — by default "the checker" means the fast schema checker (npx @metabase/representations validate-schema). Only wording that explicitly names references or queries should trigger the semantic checker.
Otherwise, skip it. After editing YAML, rely on npx @metabase/representations validate-schema for local feedback and leave the semantic check to CI. Do not run it proactively at session start, and do not run it as a self-imposed "finishing step" after edits unless the user asked for it.
If you do run it, batch. Make all the YAML changes first, then run the checker once. Each invocation pays the ≥1-minute fixed overhead; running between edits multiplies that cost. If it surfaces issues, fix everything you can see in one pass before re-running.
Running the checker
Once .metadata/metadata.json exists and Docker is available:
```shell
docker pull metabase/metabase-enterprise:latest
docker run --rm \
  -v "$PWD:/workspace" \
  --entrypoint "" \
  -w /app \
  metabase/metabase-enterprise:latest \
  java -jar metabase.jar \
    --mode checker \
    --export /workspace \
    --schema-dir /workspace/.metadata/metadata.json \
    --schema-format concise
```
Flag reference:
- --mode checker: selects semantic-check mode (skips server startup, import, etc.).
- --export /workspace: path inside the container to the representation tree root. With the -v "$PWD:/workspace" mount above, this maps to the current repo root on the host.
- --schema-dir /workspace/.metadata/metadata.json: path to the database metadata JSON. Despite the -dir suffix, the flag accepts a single JSON file. Point it elsewhere only if the user has stored metadata at a non-default path.
- --schema-format concise: the format of the input metadata. concise matches what a Metabase instance exports. Do not change it unless the user explicitly has a different dump format.
The container needs no network access for the check itself — pull the image first if the host is offline-prone.
Exit code is non-zero on findings. Surface the checker's stdout/stderr verbatim to the user; do not summarize away specific paths or entity names, since those are how the user locates the broken reference.
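The sketch below wraps the docker run command from Running the checker so that the full output is kept in a log file and the checker's exit code is passed through unchanged. The function name run_semantic_check and the IMAGE, METADATA and LOG overrides are hypothetical conveniences, not part of Metabase; the container arguments are the documented ones.

```shell
#!/usr/bin/env bash
# run_semantic_check [ROOT]   (hypothetical wrapper around the documented invocation)
# ROOT must be an absolute path: docker -v does not accept relative host paths.
# METADATA is relative to ROOT, so it resolves under the /workspace mount.
# The log lives outside the repo by default so it is never part of the checked tree.
run_semantic_check() {
  local root="${1:-$PWD}"
  local image="${IMAGE:-metabase/metabase-enterprise:latest}"
  local metadata="${METADATA:-.metadata/metadata.json}"
  local log="${LOG:-${TMPDIR:-/tmp}/semantic-check.log}"
  local status=0

  docker run --rm \
    -v "$root:/workspace" \
    --entrypoint "" \
    -w /app \
    "$image" \
    java -jar metabase.jar \
      --mode checker \
      --export /workspace \
      --schema-dir "/workspace/$metadata" \
      --schema-format concise >"$log" 2>&1 || status=$?

  # Print the output verbatim so entity paths and names reach the user intact.
  cat "$log"
  # Non-zero means the checker reported findings (or docker itself failed).
  return "$status"
}
```

Capturing the status with || status=$? keeps the wrapper safe under set -e, and replaying the log afterwards trades live streaming for a file you can re-read after a long run.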
Common failure modes
- "Database metadata not found" / schema load errors: .metadata/metadata.json is missing, stale, or malformed. Ask the user to re-export it from their Metabase instance.
- Unknown collection / card / dashboard / snippet / tag reference: the referenced entity_id or name does not exist in the tree. Either the target YAML is missing, or the reference is a typo; grep the tree for the id or name to confirm which.
- Unknown table or field inside a query: the query references a table or column that the database metadata doesn't know about. Either the warehouse schema has drifted (re-export the metadata), or the query itself is wrong.
- Docker image missing / not pulled: run docker pull metabase/metabase-enterprise:latest first. On slow networks, warn the user; the image is several hundred MB.
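Two of these failure modes have quick local diagnostics. The sketch below assumes bash, grep with --include support (GNU and BSD grep both have it), and representation files ending in .yaml or .yml. Both helpers are hypothetical: find_reference lists every YAML line mentioning an id or name, and ensure_image pulls the image only when it is not already present locally.

```shell
#!/usr/bin/env bash
# find_reference NEEDLE [ROOT]   (hypothetical helper)
# Print file:line:text for every YAML line containing NEEDLE as a literal string.
# A defining line (e.g. "entity_id: NEEDLE") means the target exists; matches only
# on the referencing side mean the target is missing or the id is mistyped.
find_reference() {
  local needle="$1" root="${2:-.}"
  grep -rnF --include='*.yaml' --include='*.yml' -- "$needle" "$root"
}

# ensure_image [IMAGE]   (hypothetical helper)
# Pull the checker image only if it is not already in the local image store.
ensure_image() {
  local image="${1:-metabase/metabase-enterprise:latest}"
  if docker image inspect "$image" >/dev/null 2>&1; then
    echo "Image $image already present."
  else
    echo "Pulling $image (several hundred MB; this can take a while)..." >&2
    docker pull "$image"
  fi
}
```

Note that ensure_image will not refresh a stale :latest tag; use the plain docker pull command when you want the newest build.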