mz-developer-analysis
Materialize Developer Analysis
Analyze a Materialize environment by querying system catalog tables via the MCP
Developer endpoint (query_system_catalog tool), and produce a structured
report with health status, performance findings, and optimization recommendations.
Discovering Tables and Columns
Do NOT guess column names. Before writing queries, check if the mz_ontology
schema is available by running:
SHOW TABLES FROM mz_ontology
If mz_ontology is available
Use it to discover the correct tables, columns, join paths, and ID types:
| Table | What it tells you |
|---|---|
mz_ontology.mz_ontology_entity_types |
What catalog entities exist and which mz_* table they map to. |
mz_ontology.mz_ontology_link_types |
Relationships between entities (foreign keys, metrics, etc.). |
mz_ontology.mz_ontology_properties |
Column names, types, and descriptions for each entity. |
mz_ontology.mz_ontology_semantic_types |
Typed ID domains (CatalogItemId, ReplicaId, etc.). |
Example queries:
-- Find the right table for an entity
SELECT name, relation, description
FROM mz_ontology.mz_ontology_entity_types
WHERE name LIKE '%source%'
-- Find join paths between entities
SELECT name, source_entity, target_entity, properties, description
FROM mz_ontology.mz_ontology_link_types
WHERE source_entity = 'source' OR target_entity = 'source'
-- Find columns for a table
SELECT column_name, semantic_type, description
FROM mz_ontology.mz_ontology_properties
WHERE entity_type = 'source_status'
If mz_ontology is NOT available
Use SHOW COLUMNS FROM <schema>.<table> to verify column names before querying.
Refer to the Critical Rules below for known pitfalls.
Critical Rules
Known column name pitfalls
Even with the ontology, be aware of these common mistakes:
| Wrong | Correct | Table |
|---|---|---|
updated_at |
last_status_change_at |
mz_source_statuses, mz_sink_statuses |
cluster_name |
Must JOIN through replica_id to mz_cluster_replicas then mz_clusters |
mz_cluster_replica_utilization |
When unsure, run SHOW COLUMNS FROM <schema>.<table> to verify.
Do NOT query mz_dataflow_arrangement_sizes
NEVER query mz_introspection.mz_dataflow_arrangement_sizes via MCP. It
fails for two reasons:
- Cluster-scoped: Only returns data for the session's current cluster,
and the MCP tool does not support
SET cluster = ...to switch clusters. - Type mismatch: Its
idcolumn isuint8, nottextlikemz_catalog.mz_objects.id. JOINs fail withoperator does not exist: uint8 = text.
Instead, use:
mz_internal.mz_cluster_replica_utilization— memory/CPU/disk percentagemz_internal.mz_cluster_replica_metrics— raw memory bytesmz_internal.mz_index_advice— find MVs/indexes that can be removed
Type casting notes
Some mz_introspection views use uint8 for ID columns instead of text.
Avoid JOINing mz_introspection views with mz_catalog views unless you
cast IDs explicitly. The mz_internal views all use text IDs and are safe
to JOIN with mz_catalog.
Discovering tables without the ontology
If mz_ontology is not available, use these fallbacks:
SHOW COLUMNS FROM <schema>.<table>to check a table's columns- Do NOT use
SHOW TABLES FROM mz_internal LIKE '...'— this only shows tables, not views. Most system catalog objects are views.
Workflow Overview
- Connect — Verify the MCP Developer tools are available
- Discover — Use the ontology + catalog queries to inventory all deployed objects
- Analyze — Assess performance metrics: freshness, hydration, memory, utilization
- Report — Produce a structured markdown report with findings and recommendations
Step 1: Verify MCP Connection
Confirm you have access to the query_system_catalog tool. Run a quick test:
query_system_catalog: SELECT mz_version()
If this fails, check:
- The MCP server is configured in
.mcp.json - The
enable_mcp_developerfeature flag is enabled on the environment - Your authentication credentials are valid
Running Queries
All queries are run via the query_system_catalog MCP tool. Constraints:
- One statement per call (no semicolons)
- Read-only: SELECT, SHOW, EXPLAIN only
- System tables only: no access to user tables
- No
SETstatements
When filtering out system schemas, always exclude: mz_catalog, mz_internal,
pg_catalog, information_schema, and mz_introspection.
Step 2: Discover — Inventory the Environment
Run the discovery queries to understand what is deployed. See
references/queries.md for the full query set. The discovery phase covers:
Environment Overview
- Materialize version (
SELECT mz_version()) - Clusters and replicas — names, sizes, and replica counts
- Schemas in use
Deployed Objects Inventory
- Sources: type (Kafka, Postgres, MySQL, Webhook, etc.), cluster assignment, status
- Materialized Views: cluster assignment, indexes, dependencies
- Views: (non-materialized) and their usage patterns
- Sinks: type, destination, cluster assignment
- Indexes: what they're on, cluster assignment
- Connections: external system connections configured
Build a mental model of the data pipeline: what data comes in (sources), how it's transformed (views/MVs), and where it goes out (sinks).
Object Definitions
Retrieve SQL definitions for materialized views, views, indexes, and sources
using references/queries.md. This is critical for optimization analysis —
the SQL definitions tell you how things are computed:
- Join patterns and join order
- Filter predicates (or lack thereof — missing temporal filters are a common issue)
- Aggregation strategies
- Whether MVs duplicate logic that could be shared
Step 3: Analyze — Performance and Resource Metrics
Freshness (Lag Analysis)
Query mz_internal.mz_materialization_lag for per-object lag.
Important: The write_frontier column is of type mz_timestamp (a uint8),
not a standard timestamp. You cannot subtract it from now() directly. Cast to
get a human-readable time: to_timestamp(write_frontier::bigint / 1000).
Hydration Status
Query mz_internal.mz_hydration_statuses to check whether all dataflows are
hydrated. Non-hydrated objects after initial startup may indicate resource
pressure or configuration issues.
Memory and Resource Consumption
mz_internal.mz_cluster_replica_utilizationfor memory/CPU percentage per replicamz_internal.mz_cluster_replica_metricsfor raw memory metricsmz_internal.mz_index_adviceto identify which MVs/indexes can be optimized
Index Advice
Query mz_internal.mz_index_advice — Materialize's built-in advisor. Hint types:
- "keep" — the MV/index is needed as-is
- "drop unless queried directly" — no structural dependencies; only useful for direct SELECT queries
- "convert to a view" — MV can be dematerialized entirely, saving all arrangement memory
- "convert to a view with an index" — convert MV to a view but keep its indexes
- "add index" — object would benefit from an index
Cost Analysis (optional)
Query mz_catalog.mz_cluster_replica_sizes to get credit rates per cluster
size, then calculate: credits_per_hour * replication_factor * 730 hours/month.
When writing recommendations, always quantify the credit impact.
Object Dependencies
Query mz_internal.mz_object_dependencies to understand the dependency graph.
Step 4: Report — Generate the Analysis
Produce a structured markdown report:
# Environment Analysis
**Date**: <date>
**Materialize Version**: <version>
## Executive Summary
<2-3 paragraph high-level assessment>
## Cluster Topology
| Cluster | Size | Replicas | Credits/Hr | Monthly Credits | Utilization |
## Deployed Objects
### Sources (<count>)
### Materialized Views (<count>)
### Sinks (<count>)
### Indexes (<count>)
## Performance Analysis
### Freshness
### Hydration
### Cluster Utilization
## Cost Analysis (if requested)
## Index Advice Summary
## SQL-Level Analysis
### Materialized View Definitions
### Index Analysis
## Optimization Recommendations
<numbered list with specific SQL for each>
Writing Recommendations
Always include specific SQL commands. For example:
Good:
Recommendation: Dematerialize
my_schema.unused_mvto save memory.SHOW CREATE MATERIALIZED VIEW my_schema.unused_mv; DROP MATERIALIZED VIEW my_schema.unused_mv; CREATE VIEW my_schema.unused_mv AS <definition>;
Bad:
Recommendation: Consider dematerializing
my_schema.unused_mv.
Troubleshooting Runbooks
For focused troubleshooting, use these diagnostic paths. Always end with specific SQL commands to fix the issue.
"Why is my materialized view stale?"
Diagnostic steps:
- Check
mz_internal.mz_materialization_lagfor the MV's lag - Check
mz_internal.mz_hydration_statuses— is it hydrated? - Check
mz_internal.mz_cluster_replica_statuses— is the replica healthy? - Check
mz_internal.mz_cluster_replica_utilization— memory pressure causing restarts? - Check
mz_internal.mz_source_statuses— upstream source errors?
Common fixes:
If the cluster is overloaded (high memory/CPU):
-- Option A: Scale up the cluster
ALTER CLUSTER <cluster_name> SET (SIZE = '<next_size_up>');
-- Option B: Move the MV to a different cluster
SHOW CREATE MATERIALIZED VIEW <schema>.<mv_name>;
DROP MATERIALIZED VIEW <schema>.<mv_name>;
CREATE MATERIALIZED VIEW <schema>.<mv_name> IN CLUSTER <new_cluster> AS <definition>;
If the MV is not hydrated and the cluster recently restarted: Hydration will complete on its own once the cluster stabilizes. If it persists, the cluster likely needs more memory.
If an upstream source has errors:
SELECT name, status, error, last_status_change_at
FROM mz_internal.mz_source_statuses
WHERE status != 'running'
Fix the upstream source issue first — MV freshness depends on source health.
"Why is my cluster running out of memory?"
Diagnostic steps:
- Check
mz_internal.mz_cluster_replica_utilizationfor memory percentage - Check
mz_internal.mz_index_advicefor MVs that can be dematerialized - Check MV definitions for missing temporal filters
- Check for redundant indexes
Common fixes:
Dematerialize MVs that don't need to be materialized:
SELECT o.name, o.type, sc.name AS schema_name, ia.hint, ia.details
FROM mz_internal.mz_index_advice ia
JOIN mz_catalog.mz_objects o ON ia.object_id = o.id
JOIN mz_catalog.mz_schemas sc ON o.schema_id = sc.id
WHERE ia.hint = 'convert to a view'
-- For each candidate:
SHOW CREATE MATERIALIZED VIEW <schema>.<mv_name>;
DROP MATERIALIZED VIEW <schema>.<mv_name>;
CREATE VIEW <schema>.<mv_name> AS <definition>;
Drop unused indexes:
SELECT o.name, o.type, sc.name AS schema_name, ia.hint, ia.details
FROM mz_internal.mz_index_advice ia
JOIN mz_catalog.mz_objects o ON ia.object_id = o.id
JOIN mz_catalog.mz_schemas sc ON o.schema_id = sc.id
WHERE ia.hint = 'drop unless queried directly'
-- Verify with the user before dropping
DROP INDEX <schema>.<index_name>;
Scale up the cluster:
ALTER CLUSTER <cluster_name> SET (SIZE = '<next_size_up>');
"Are my sources healthy? / Has my source finished snapshotting?"
Diagnostic steps:
- Check
mz_internal.mz_source_statusesfor source errors - Check
mz_internal.mz_source_statisticsfor ingestion progress - Check
mz_internal.mz_materialization_lagfor end-to-end lag
Common fixes:
If a source is stalled or erroring:
SELECT name, status, error
FROM mz_internal.mz_source_statuses
WHERE status != 'running'
-- If the connection credentials are wrong:
ALTER SECRET <secret_name> AS '<new_value>';
If snapshot_committed is false, the source is still loading its initial
snapshot. This is normal for large sources — wait for it to complete.
"What's the health of my environment?"
Run these checks in order:
mz_internal.mz_cluster_replica_statuses— all replicas ready?mz_internal.mz_source_statuses— all sources running?mz_internal.mz_sink_statuses— all sinks running?mz_internal.mz_cluster_replica_utilization— resource pressure?
"What can I optimize to save costs?"
- Check
mz_internal.mz_index_advicefor optimization candidates - Check cluster utilization — are clusters over-provisioned?
- Check credit rates from
mz_catalog.mz_cluster_replica_sizes
Notes
- All queries run through the MCP Developer endpoint are read-only.
- Query results are limited to system catalog tables — no access to user data.
- Access is governed by RBAC — you only see objects your credentials have access to.
- Freshness numbers are point-in-time snapshots. Re-run to check if lag is stable or growing.
More from materializeinc/agent-skills
materialize-docs
Materialize documentation for SQL syntax, data ingestion, concepts, and best practices. Use when users ask about Materialize queries, sources, sinks, views, or clusters.
106mcp-developer-analysis
Analyze a Materialize environment for health, performance, and optimization opportunities using the MCP Developer endpoint. Use this skill when someone wants to check environment health, investigate performance issues, troubleshoot stale materialized views, diagnose memory pressure, audit resource utilization, or get optimization recommendations. Trigger this even if the user just says "check my environment", "why is my MV stale", "why is my cluster slow", or "what can I optimize".
5