genie-metadata-optimizer
Genie Metadata Optimizer
Analyzes evaluation failures and generates metadata change proposals using GEPA or structured LLM introspection. Produces control lever mappings consumed by the Applier worker.
When to Use This Skill
- Evaluation scores are below target after benchmarking
- Need to generate metadata change proposals from failure patterns
- Running GEPA-powered metadata evolution (L2)
- Running failure clustering and ASI-grounded introspective analysis (L1)
- Optimizing judge prompt accuracy (SIMBA)
Inputs (from Orchestrator)
| Input | Type | Description |
|---|---|---|
| eval_results | dict | Evaluation results with per-row `rows` list containing `feedback/*`, `rationale/*`, and `metadata/*` columns. MUST include per-row data from the evaluator's `evaluation/failures.json` artifact, not just aggregate metrics. When ASI metadata fields (`metadata/failure_type`, `metadata/blame_set`, `metadata/counterfactual_fix`) are present in rows, `cluster_failures()` uses them for higher-precision clustering instead of keyword extraction. |
| judge_feedback | list | Judge rationales per question |
| metadata_snapshot | dict | Current Genie metadata snapshot (for GEPA or introspection) |
| space_id | str | Genie Space ID |
| use_asi | bool | Enable ASI-grounded patch proposals (default: True) |
| use_patch_dsl | bool | Enable Patch DSL validation and conflict checking (default: True) |
| use_gepa | bool | Enable GEPA evaluation of candidate patch sets (default: False) |
| target_lever | int or None | When provided, filter proposals to this lever only (1-6). Used by the orchestrator for per-lever optimization (default: None = all levers) |
Hard Constraint
Optimizer MUST consume the evaluator's per-row failures artifact (with ASI metadata), not aggregate metrics. When ASI fields (failure_type, blame_set, counterfactual_fix) are present, use them for clustering instead of keyword extraction from rationale text.
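A minimal sketch of this constraint in code, assuming the row shape described in the inputs table; `failure_signal` is a hypothetical helper, not part of the real optimizer:

```python
# Hypothetical sketch: prefer structured ASI fields over rationale keywords
# when building clustering inputs. Field names follow the input contract above.

def failure_signal(row: dict) -> dict:
    """Return the clustering signal for one per-row failure record."""
    asi_fields = ("metadata/failure_type", "metadata/blame_set",
                  "metadata/counterfactual_fix")
    if all(row.get(f) for f in asi_fields):
        # High-precision path: structured ASI metadata is present.
        return {
            "failure_type": row["metadata/failure_type"],
            "blame_set": row["metadata/blame_set"],
            "fix": row["metadata/counterfactual_fix"],
            "source": "asi",
        }
    # Fallback path: keyword extraction from free-text judge rationale.
    rationale = " ".join(v for k, v in row.items()
                         if k.startswith("rationale/") and isinstance(v, str))
    return {"rationale_text": rationale.lower(), "source": "keywords"}
```

Aggregate metrics alone cannot feed this function, which is why the per-row failures artifact is mandatory.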
Outputs (to Orchestrator)
| Output | Type | Description |
|---|---|---|
| patch_set | list | List of patch dicts (when use_patch_dsl=True) |
| validation_result | dict | Conflict check and blast radius enforcement result |
| proposals | list | Legacy format proposals (for backward compatibility) |
| optimized_candidate | dict | GEPA-optimized metadata (L2) or proposals (L1) |
| lever_mapping | list | Control lever proposals with dual persistence paths |
| pareto_stats | dict | GEPA optimization statistics (L2 only) |
| judge_quality_feedback | list | Per-judge quality signals when counterfactual_fix values are too generic to drive optimization. Each entry: {judge_name, feedback_type, example, desired, count}. Feeds into SIMBA Tier 3 judge alignment when judges are the bottleneck. |
Optimization Tiers (L1/L2/L3 Maturity Model)
| Tier | Engine | Description | When to Use |
|---|---|---|---|
| L1 | Greedy / Introspection | ASI-grounded patch proposals via `propose_patch_set_from_asi()`. Uses FAILURE_TAXONOMY and blame_set grouping. Default mode. | Optimizing UC table/column descriptions from judge feedback |
| L2 | GEPA | `GenieMetadataAdapter` evaluates candidate patch set JSONs. Enabled via `use_gepa=True` flag. Lever 6 only. | Optimizing Genie instructions + example SQLs with >=15 benchmarks |
| L3 | Multi-Objective | Pareto frontier tracking with multiple objectives. Future work. | When multi-objective tradeoffs are needed |
SEQUENCING: Always run L1 introspection for Levers 1-5 BEFORE running L2 (GEPA) for Lever 6. GEPA is NOT a replacement for introspection — it is a complement that handles the lowest-priority lever. Running GEPA first wastes the optimization budget on the least durable lever (~4000 char limit). The orchestrator enforces this via hard constraint #14.
Load: Read gepa-integration.md for L2 GEPA implementation: seed extraction, evaluator, orchestration.
Load: Read feedback-to-metadata-mapping.md for L1 introspection: failure clustering, proposal generation, conflict detection.
Load: Read prompt-registry-patterns.md for judge prompt optimization and the prompt lifecycle.
Patch DSL (P2)
Structured patch language for metadata changes with conflict detection and blast radius enforcement.
- 32 PATCH_TYPES with defined scopes and risk levels (e.g., `ALTER_TABLE_COMMENT`, `ALTER_COLUMN_COMMENT`, `CREATE_METRIC_VIEW`, etc.)
- 16 CONFLICT_RULES pairs: patches that cannot be applied together (e.g., same table + different descriptions)
- `validate_patch_set()` for conflict checking and blast radius enforcement (max 5 objects per patch set)
- Feature flag: `use_patch_dsl=True/False` (default: True)
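An illustrative sketch of the validation contract, assuming one conflict rule (two patches of the same type targeting the same object) and the 5-object blast radius cap; the patch dict shape and rule name are invented for the example, not the real `validate_patch_set()`:

```python
MAX_BLAST_OBJECTS = 5  # blast radius cap: max distinct objects per patch set

def validate_patch_set(patches: list[dict]) -> dict:
    """Check pairwise conflicts and blast radius for a candidate patch set."""
    conflicts = []
    seen = set()
    for p in patches:
        # Example conflict rule: same patch type on the same object twice.
        key = (p["patch_type"], p["object"])
        if key in seen:
            conflicts.append({"rule": "duplicate_target", "object": p["object"]})
        seen.add(key)
    blast_objects = {p["object"] for p in patches}
    ok = not conflicts and len(blast_objects) <= MAX_BLAST_OBJECTS
    return {"valid": ok, "conflicts": conflicts,
            "blast_objects": sorted(blast_objects)}
```

A real implementation would iterate over all 16 CONFLICT_RULES pairs rather than the single rule shown here.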
Actionable Side Information (ASI) (P4)
ASI structures judge feedback for patch synthesis and scoring.
- FAILURE_TAXONOMY: 23 failure types (e.g., wrong_table, wrong_column, wrong_aggregation, wrong_filter, hallucination, etc.)
- ASI_SCHEMA: 12 fields per judge feedback (e.g., question_id, failure_type, blame_set, rationale, suggested_fix, etc.)
- `propose_patch_set_from_asi()` workflow:
  1. Collect failures from judge feedback
  2. Group by blame_set (root cause clustering)
  3. Synthesize patches per group
  4. Score patches (impact, blast radius)
  5. Select best patch set
- Feature flag: `use_asi=True/False` (default: True)
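The five-step workflow can be sketched as follows; the scoring weight and dict shapes are illustrative stand-ins, not the real `propose_patch_set_from_asi()`:

```python
from collections import defaultdict

def propose_patch_set(asi_rows: list[dict], total_objects: int) -> list[dict]:
    groups = defaultdict(list)                  # step 2: group by blame_set
    for row in asi_rows:                        # step 1: collect failures
        groups[tuple(sorted(row["blame_set"]))].append(row)
    candidates = []
    for blame, rows in groups.items():          # step 3: one patch per root cause
        patch = {"objects": list(blame), "evidence": len(rows)}
        impact = len(rows)                      # step 4: impact minus blast penalty
        patch["score"] = impact - 0.1 * (len(blame) / total_objects)
        candidates.append(patch)
    # step 5: best patch set first (highest adjusted score)
    return sorted(candidates, key=lambda p: p["score"], reverse=True)
```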
ASI Consumption Contract
The optimizer reads structured ASI from the evaluator using a priority chain:
1. UC table (primary): `read_asi_from_uc(catalog, schema, run_id, warehouse_id)` queries `genie_eval_asi_results` via the Databricks SQL Statement API. Returns one dict per (question, judge) pair with all ASI fields.
2. `{judge}/metadata` columns: Fall back to reading metadata columns from the `eval_result.tables["eval_results"]` DataFrame.
3. Assessments list: Parse the `assessments` list from evaluation results via `_extract_asi_from_assessments()`.
4. Regex on rationale (last resort): `_infer_blame_from_rationale()` parses free-text judge rationale for blame keywords.
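The chain reduces to "try each source in order, take the first non-empty result." A minimal sketch, with stub callables standing in for the four real readers named above:

```python
def load_asi(readers: list) -> list[dict]:
    """Try each ASI source in priority order; return the first non-empty result."""
    for read in readers:
        try:
            rows = read()
        except Exception:
            rows = []          # a failed source falls through to the next one
        if rows:
            return rows
    return []                  # no source produced structured ASI
```

In practice the `readers` list would be the partially-applied UC, columns, assessments, and regex readers, in that order.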
Structured fields consumed by cluster_failures(): asi_severity, asi_confidence, asi_wrong_clause, asi_expected_value, asi_actual_value, asi_missing_metadata, asi_ambiguity_detected.
Generic counterfactual_fix handling: When counterfactual_fix contains only generic guidance (e.g., "Review table/column references in Genie metadata", "Check X in metadata", "Verify Y"), the optimizer SHOULD ignore it and use blame_set + wrong_clause to synthesize a specific fix. Generic fixes are treated as equivalent to "no fix provided." The _describe_fix() function detects these by checking if the fix starts with generic verbs ("Review", "Check", "Verify") without referencing a specific asset, column, or TVF name.
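A hypothetical version of that generic-fix check; the verb list matches the description above, while the tokenization and asset matching approximate the documented `_describe_fix()` behavior rather than reproduce it:

```python
import re

GENERIC_VERBS = ("review", "check", "verify")

def is_generic_fix(fix: str, known_assets: set[str]) -> bool:
    """True when a counterfactual_fix starts with a generic verb and names
    no specific asset, column, or TVF — i.e., treat it as 'no fix provided'."""
    first_word = re.split(r"\W+", fix.strip().lower(), maxsplit=1)[0]
    if first_word not in GENERIC_VERBS:
        return False
    # A generic verb is still acceptable if a concrete asset is referenced.
    return not any(asset.lower() in fix.lower() for asset in known_assets)
```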
Proposal enrichment: Each proposal emitted by generate_metadata_proposals() includes an asi dict with failure_type, blame_set, severity, counterfactual_fixes, ambiguity_detected from the source cluster. This enables the applier's proposals_to_patches() to generate targeted Patch DSL patches.
Blast Radius Tracking (P13)
Penalizes patch sets that touch too many objects to limit regression risk.
- adjusted_score = raw_score - 0.1 × (blast_objects / total_objects)
- Max 5 objects per patch set (enforced by validate_patch_set when use_patch_dsl=True)
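The penalty formula above, applied to rank candidate patch sets (`rank_patch_sets` and the candidate dict shape are illustrative):

```python
def adjusted_score(raw_score: float, blast_objects: int, total_objects: int) -> float:
    # adjusted_score = raw_score - 0.1 * (blast_objects / total_objects)
    return raw_score - 0.1 * (blast_objects / total_objects)

def rank_patch_sets(candidates: list[dict], total_objects: int) -> list[dict]:
    """Sort candidates so low-blast-radius, high-score sets come first."""
    return sorted(
        candidates,
        key=lambda c: adjusted_score(c["raw_score"], len(c["objects"]), total_objects),
        reverse=True,
    )
```

Note the effect: a slightly lower raw score can outrank a wider patch set once the penalty is applied.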
Introspective Analysis (L1)
When L2 (GEPA) is not used, the optimizer clusters failures and generates proposals:
- Cluster failures by systemic root cause (>=2 questions per cluster). `cluster_failures()` returns cluster dicts with: `cluster_id`, `root_cause`, `question_ids`, `affected_judge`, `confidence`, `asi_failure_type`, `asi_blame_set`, `asi_counterfactual_fixes`.
- Map each cluster to a control lever (1-6)
- Generate proposals with dual persistence paths and net impact scores
- Detect conflicts and batch non-conflicting proposals together
- Sort by net impact descending — highest impact applied first
Judge rationale fields are the primary learning signal. GEPA receives rationales as Actionable Side Information (ASI); introspection clusters by root cause pattern.
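The >=2-question threshold is what separates systemic issues from one-off noise. A condensed sketch (input/output shapes are a subset of the real `cluster_failures()` contract):

```python
from collections import defaultdict

def cluster_failures(failures: list[dict], min_size: int = 2) -> list[dict]:
    """Group failures by root cause; keep only systemic clusters (>= min_size)."""
    by_cause = defaultdict(list)
    for f in failures:
        by_cause[f["root_cause"]].append(f["question_id"])
    return [
        {"cluster_id": i, "root_cause": cause, "question_ids": qids}
        for i, (cause, qids) in enumerate(by_cause.items())
        if len(qids) >= min_size   # singletons are dropped as noise
    ]
```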
Root Cause → Lever Mapping
| Root Cause | Lever | API Command |
|---|---|---|
| Wrong table | 1 (UC Tables) | ALTER TABLE ... SET TBLPROPERTIES |
| Wrong column | 1 (UC Columns) | ALTER COLUMN ... COMMENT |
| Wrong join | 1 (UC Tables) | ALTER TABLE ... SET TBLPROPERTIES |
| Wrong aggregation | 2 (Metric Views) | CREATE OR REPLACE VIEW |
| Wrong filter | 3 (TVFs) | CREATE OR REPLACE FUNCTION |
| Missing filter (TVF param format) | 3 (TVFs) | CREATE OR REPLACE FUNCTION — fix COMMENT PARAMS section |
| Missing filter (temporal on MV) | 2 (Metric Views) | CREATE OR REPLACE VIEW — add temporal filtering guidance to date dimension |
| Missing filter (date param value) | 3 (TVFs) | CREATE OR REPLACE FUNCTION — fix COMMENT SYNTAX example |
| Missing temporal filter | 2 (Metric Views) | CREATE OR REPLACE VIEW — add date dimension comment with temporal patterns |
| Monitoring gap, stale data, data freshness | 4 (Monitoring) | ALTER TABLE ... SET TBLPROPERTIES (monitoring config) |
| ML feature missing, model scoring error, feature store mismatch | 5 (ML Tables) | ALTER TABLE ... SET TBLPROPERTIES (ML feature metadata) |
| Repeatability issue (TABLE/MV) | 1 (UC Tables/Columns) | ALTER TABLE SET TBLPROPERTIES, ALTER COLUMN COMMENT |
| Repeatability issue (TVF) | 6 (Instructions) | PATCH /api/2.0/genie/spaces/{id} |
| Wrong asset routing | 6 (Instructions) | PATCH /api/2.0/genie/spaces/{id} |
Repeatability → Lever routing rationale: Repeatability issues often stem from unstructured metadata creating an ambiguous search space for the LLM. Structured metadata (Lever 1) -- business_definition, synonyms[], grain, join_keys[], do_not_use_when[], preferred_questions[] in column comments, plus UC tags like preferred_for_genie=true, deprecated_for_genie=true, domain=<value> -- narrows the search space and reduces SQL variance. Structured metadata can be added as tags (via TBLPROPERTIES/ALTER TABLE SET TAGS) or within descriptions/column comments depending on the situation. When the asset is already a TVF (output is already constrained by function signature), the optimizer falls back to Lever 6 (instructions for deterministic parameter selection). TVF conversion remains a secondary recommendation for TABLE/MV assets when structured metadata alone is insufficient.
TVF COMMENT Format (Lever 3)
TVF COMMENTs are the primary mechanism for guiding Genie's parameter selection. The COMMENT must follow this structured bullet-point layout — Genie parses these sections to understand when/how to call the function:
```text
• PURPOSE: What this function returns and when to use it
• BEST FOR: Specific query patterns that should use this TVF
• NOT FOR: Patterns that should use a different asset (with redirect to preferred asset)
• RETURNS: Column list (helps Genie understand output schema)
• PARAMS:
  - param_name (TYPE): Description. Format: <format_rule>.
    Pass NULL (not the string 'NULL') to omit this filter.
  - date_param (STRING): Start date. Format: CAST(DATE_ADD(CURRENT_DATE(), -N) AS STRING).
    Do NOT use hardcoded dates like '2025-01-01'.
• SYNTAX: SELECT * FROM schema.tvf_name(CAST(DATE_ADD(CURRENT_DATE(), -365) AS STRING), NULL)
• NOTE: Additional constraints (e.g., ROW_NUMBER ranking, no GROUP BY needed)
```
Critical: The PARAMS and SYNTAX sections directly control Genie's parameter handling. Without explicit NULL-handling guidance and dynamic date syntax examples, Genie consistently: (1) passes the string literal 'NULL' instead of SQL NULL for optional parameters, and (2) uses hardcoded dates like '2025-02-23' instead of dynamic expressions.
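Because these sections are load-bearing, a proposal generator can lint candidate COMMENTs before emitting them. A hypothetical checker (the section names come from the template above; the lint function itself is an invented helper):

```python
REQUIRED_SECTIONS = ("PURPOSE:", "BEST FOR:", "NOT FOR:",
                     "RETURNS:", "PARAMS:", "SYNTAX:")

def lint_tvf_comment(comment: str) -> list[str]:
    """Return a list of problems; an empty list means the COMMENT passes."""
    problems = [f"missing section {s}" for s in REQUIRED_SECTIONS
                if s not in comment]
    if "PARAMS:" in comment and "'NULL'" not in comment:
        problems.append("PARAMS lacks NULL-vs-'NULL' guidance (KGL-1)")
    if "SYNTAX:" in comment and "CURRENT_DATE()" not in comment:
        problems.append("SYNTAX shows no dynamic date expression")
    return problems
```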
Metadata Effectiveness by Complexity
Column comments and Genie Space instructions have varying effectiveness depending on expression complexity. The optimizer MUST factor this into proposal generation — do not waste iterations on metadata changes for patterns with LOW effectiveness.
| Temporal Pattern | Example Phrasing | Required SQL | Metadata Lever | Effectiveness |
|---|---|---|---|---|
| "this year" | "total revenue this year" | `WHERE YEAR(date_col) = YEAR(CURRENT_DATE())` | Column comment (Lever 1) | HIGH — single function, Genie reliably applies |
| "last N days" | "bookings last 30 days" | `WHERE date_col >= DATE_ADD(CURRENT_DATE(), -30)` | Column comment or TVF SYNTAX (Lever 1/3) | HIGH — single function, Genie reliably applies |
| "last month" | "revenue last month" | `WHERE date_col >= DATE_TRUNC('month', ADD_MONTHS(CURRENT_DATE(), -1)) AND date_col < DATE_TRUNC('month', CURRENT_DATE())` | Genie Space instruction (Lever 6) | MEDIUM — range with two boundaries, sometimes partially applied |
| "last quarter" | "total revenue last quarter" | `WHERE date_col >= ADD_MONTHS(DATE_TRUNC('quarter', CURRENT_DATE()), -3) AND date_col < DATE_TRUNC('quarter', CURRENT_DATE())` | Genie Space instruction (Lever 6) | LOW — nested functions, Genie frequently ignores |
| "YoY comparison" | "revenue growth vs last year" | Window functions with `LAG(..., 12)` | TVF (Lever 3) or pre-computed column | LOW — complex logic beyond metadata influence |
Escalation ladder for LOW-effectiveness patterns:
1. Column comment (try first, expect failure for complex expressions)
2. Genie Space instruction with explicit SQL example (try second)
3. TVF with temporal parameter (absorb complexity into function signature)
4. Schema simplification — pre-computed columns (e.g., `fiscal_quarter`, `is_last_quarter` boolean, `quarter_start_date`) that reduce the expression to a simple equality filter
When the optimizer generates a proposal for a LOW-effectiveness pattern and it fails in the subsequent evaluation, it SHOULD escalate to the next rung instead of re-proposing the same lever.
Repeatability Failure Clustering
When cross-iteration repeatability checks detect questions whose SQL changed between iterations (flagged as repeatability_issue synthetic failures), the optimizer uses specialized clustering to identify and fix the root cause:
1. Detect Asset-Type Oscillation
Examine the SQL variants across iterations for each non-repeatable question. Look for oscillation patterns:
- TABLE ↔ MV oscillation: Genie alternates between querying a base table and its metric view (e.g., `SELECT ... FROM dim_product` vs `SELECT ... FROM mv_product_summary`)
- TVF ↔ TABLE/MV oscillation: Genie alternates between calling a TVF and direct table/MV queries
- Parameter oscillation: TVF calls with different parameter values across runs
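A toy detector for the first two patterns; extracting referenced assets with a regex is a simplification of whatever SQL parsing the real clustering uses:

```python
import re

def referenced_assets(sql: str) -> frozenset[str]:
    """Crude FROM/JOIN target extraction — good enough for oscillation checks."""
    return frozenset(m.lower() for m in
                     re.findall(r"(?:FROM|JOIN)\s+([\w.]+)", sql, re.IGNORECASE))

def oscillates(sql_variants: list[str]) -> bool:
    """True when the same question hit different assets across iterations."""
    return len({referenced_assets(s) for s in sql_variants}) > 1
```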
2. Map to Lever Sequence
Based on the oscillation type, apply fixes in this specific order:
- Lever 1 (Structured Metadata): Add decisive metadata to disambiguate:
  - `preferred_for_genie=true` UC tag on the canonical asset
  - `deprecated_for_genie=true` tag on the non-canonical asset
  - `business_definition` in column comments with synonym phrases
  - `do_not_use_when[]` in column comments for the non-preferred asset
- Lever 2 (MV YAML + negative routing): If oscillation involves metric views, add measure-level comments with negative routing (e.g., "Use this MV for aggregated KPIs, NOT for row-level lists")
- Lever 6 (Instructions, last resort): If structured metadata alone is insufficient, add explicit routing instructions to the Genie Space instructions
3. Bilateral Disambiguation (MANDATORY)
Genie routing is probabilistic, not deterministic. Bilateral disambiguation improves routing odds but does not guarantee consistency across runs. A question that routes correctly in iteration N may regress in iteration N+1 when routing remains ambiguous. This is a known platform behavior (KGL-2); residual oscillation usually indicates the metadata still needs clearer positive and negative routing guidance to reduce ambiguity.
When a question oscillates between two asset types, BOTH sides of the disambiguation must be addressed:
- Positive routing (Lever 1/2): Tell the preferred asset it IS the right choice for this question pattern
- Negative routing (Lever 3): Tell the competing asset it is NOT the right choice
Applying only one side is insufficient -- both assets will continue competing. Example: if a TVF COMMENT says "BEST FOR: total revenue last quarter" and the MV has positive routing for the same pattern, the TVF COMMENT must add NOT FOR: Simple total revenue without destination breakdown (use booking_analytics_metrics MEASURE(total_revenue)).
The optimizer MUST auto-detect asset-type oscillation in failure clusters and generate proposals for BOTH the preferred and competing assets.
Routing-sensitive benchmarks require repeatability testing: For any question where bilateral disambiguation was applied, the evaluator SHOULD run 3-5 passes of the same question and flag inconsistent routing. If routing variance exceeds 40% (e.g., 2 of 5 passes route incorrectly), document the question as routing-sensitive in benchmark metadata and accept the variance as a platform limitation rather than generating additional metadata iterations.
Template pair:

```sql
-- Positive routing (preferred asset — Lever 1 or 2):
ALTER TABLE catalog.schema.<preferred_table>
ALTER COLUMN <dimension> COMMENT '... ALWAYS use this column with MEASURE(<measure>) for <pattern>. Do NOT use <competing_tvf> for this pattern.';

-- Or for Metric Views (Lever 2):
-- Update MV YAML dimension description: "ALWAYS use this for <pattern>"

-- Negative routing (competing asset — Lever 3):
-- In the competing TVF's COMMENT, update:
--   BEST FOR: Remove <pattern> from the list
--   NOT FOR: "<pattern> (use <preferred_asset> MEASURE(<measure>) instead)"
```
Concrete example:

```sql
-- Positive: Tell MV it IS the right choice for "revenue by destination country"
ALTER TABLE catalog.schema.booking_analytics_metrics
ALTER COLUMN destination_country COMMENT 'Country of destination property. ALWAYS use this dimension with MEASURE(total_revenue) for revenue by country queries.';

-- Negative: Tell TVF it is NOT the right choice for simple country revenue
-- In get_revenue_summary COMMENT, change:
--   BEST FOR: "Revenue trends by time period with multi-metric breakdown"
--   NOT FOR: "Revenue by destination country (use booking_analytics_metrics MEASURE(total_revenue))"
```
4. Verify
After each lever application, re-run the specific non-repeatable question 3x to confirm oscillation is resolved. Only proceed to the next lever if the question still oscillates.
Lever 6 Instruction Specificity Rules
Lever 6 (Genie Space instructions) is prone to overcorrection when routing instructions are too broad. An overly generic instruction like "Revenue by destination → MV" can inadvertently redirect TVF-appropriate queries to the MV.
Rules:
- Every routing instruction MUST include both the POSITIVE case and NEGATIVE exclusion
- Use the question's exact phrasing or a close paraphrase, not generic patterns
- Distinguish simple aggregations (MV) from date-ranged or parameterized detail queries (TVF)
GOOD:

```text
"Revenue by destination country this year" → booking_analytics_metrics (MV).
"Revenue by destination for last 6 months with breakdown" → get_revenue_summary (TVF).
```

BAD:

```text
"Revenue by destination" → booking_analytics_metrics (MV).
```
The bad example is too broad — it matches "Revenue by destination for last 6 months" which should route to the TVF, causing an overcorrection.
Overcorrection recovery: If a Lever 6 instruction causes a regression (a previously correct question now fails), immediately narrow the instruction by adding the negative exclusion, or remove the instruction and rely on Lever 1/2/3 metadata instead.
Known Genie Limitations
Documented platform behaviors that metadata guidance alone cannot resolve. When the optimizer encounters these, it should recommend code-level workarounds instead of additional metadata iterations.
| # | Limitation | Observed Behavior | Metadata Attempted | Outcome | Recommended Workaround |
|---|---|---|---|---|---|
| KGL-1 | Optional TVF parameters: `'NULL'` string literal | Genie generates `param => 'NULL'` instead of `NULL` when using named argument syntax | COMMENT with "pass NULL not 'NULL'", SYNTAX with positional NULL, Genie Space instructions to avoid named args | Not fixed after 3+ iterations | TVF body coercion: `IF param = 'NULL' THEN NULL ELSE param END`; SYNTAX examples with positional args only |
| KGL-2 | Probabilistic asset routing | Genie non-deterministically routes questions between competing assets (e.g., MV vs TVF) even with bilateral disambiguation | Positive + negative routing on both assets, Genie Space routing instructions | Improved but not guaranteed — routing is probabilistic | Bilateral disambiguation (mandatory), repeatability testing (3-5 passes), accept variance as platform behavior |
| KGL-3 | Complex temporal expressions ignored | Genie does not apply complex date filters (`ADD_MONTHS(DATE_TRUNC('quarter', CURRENT_DATE()), -3)`) despite column comments and instructions | Column comments with explicit SQL, Genie Space instructions | Not fixed for nested date functions | Schema simplification (pre-computed quarter columns), or TVF with temporal parameter |
Escalation rule: If the same failure persists for 2+ iterations with targeted metadata changes, check this table before generating another metadata proposal. If the failure matches a known limitation, emit the code-level workaround as the proposal instead of additional COMMENT changes.
TVF body coercion pattern (KGL-1):
```sql
CREATE OR REPLACE FUNCTION schema.my_tvf(
  start_date STRING,
  optional_filter STRING
)
RETURNS TABLE(...)
COMMENT '...'
RETURN
  SELECT * FROM my_table
  WHERE date_col >= start_date
    AND (
      CASE WHEN optional_filter IS NULL OR optional_filter = 'NULL'
           THEN TRUE
           ELSE category = optional_filter
      END
    );
```
Common Mistakes
| Mistake | Consequence | Fix |
|---|---|---|
| TVF COMMENT missing PARAMS null-handling guidance | Genie passes `'NULL'` string literal for optional params | Add "pass NULL, do NOT pass the string literal 'NULL'" to each optional param in PARAMS section |
| TVF COMMENT SYNTAX shows hardcoded dates | Genie uses hardcoded dates instead of dynamic expressions | Show `CAST(DATE_ADD(CURRENT_DATE(), -N) AS STRING)` in SYNTAX |
| Overly broad Lever 6 routing instruction | Overcorrects — redirects TVF-appropriate queries to MV | Include both positive routing AND negative exclusion per instruction; use exact question phrasing |
| Bare boolean scores (no rationale) | GEPA/introspection can't learn | Always use `Feedback(value=..., rationale=...)` |
| Batch-applying all proposals at once | Can't isolate regressions | Apply one cluster's proposals per iteration |
| Running L2 (GEPA) and judge optimization together | Confounding effects | Always separate: metadata first, judge prompts only if judges are bottleneck |
| Running GEPA before exhausting Levers 1-5 | Optimization budget wasted on lowest-priority lever | Always introspect Levers 1-5 first; GEPA is Lever 6 only |
| Assuming GEPA is unavailable without checking | L2 tier never attempted | Install gepa>=0.1.0 and use the GEPA template notebook |
| Re-proposing column comments for complex temporal patterns after failure | Wasted iterations with no improvement | Check Metadata Effectiveness table — LOW patterns ("last quarter", "YoY") need escalation to TVF or schema simplification, not more comments (KGL-3) |
| Generating 3+ metadata iterations for same failure without checking Known Genie Limitations | Infinite loop against platform ceiling | After 2 failed iterations on the same failure, check KGL table; if it matches, emit code-level workaround instead |
Installation
GEPA L2 requires the gepa package. Install before running the GEPA tier:
```bash
# Local / CI
pip install "gepa>=0.1.0"

# Databricks cluster library (via UI or CLI)
databricks libraries install --cluster-id <CLUSTER_ID> --pypi-package "gepa>=0.1.0"
```
The GEPA template job YAML (gepa-optimization-job-template.yml) includes gepa>=0.1.0 in its environment dependencies automatically.
Scripts
metadata_optimizer.py
Standalone introspection and proposal generation CLI.
Note: UC/ASI reading (read_asi_from_uc(catalog, schema, run_id, warehouse_id)) is triggered by the orchestrator when passing eval_results with run_id and UC connection params — not via CLI args. The CLI uses --eval-results, --metadata-snapshot, --output, --tier, --use-asi, --use-patch-dsl, --use-gepa, --space-id, --target-lever, and --recommended.
run_gepa_optimization.py (template)
Databricks notebook for GEPA L2 optimization. All helper functions are inlined (self-contained). Includes strip_non_exportable_fields(), build_seed_candidate(), evaluate_genie(), apply_candidate_to_space(), and score_genie_response().
gepa-optimization-job-template.yml (template)
DABs job definition for deploying GEPA optimization as a Databricks job. Includes gepa>=0.1.0 dependency.
Reference Index
| Reference | What to Find |
|---|---|
| gepa-integration.md | GEPA optimize_anything, seed extraction, evaluator, configuration |
| feedback-to-metadata-mapping.md | Failure clustering, proposal generation, conflict detection, regression detection |
| prompt-registry-patterns.md | Prompt lifecycle, GEPA judge optimization, rollback, tag conventions |
| run_gepa_optimization.py | GEPA notebook template (all helpers inlined, self-contained) |
| gepa-optimization-job-template.yml | DABs job definition for GEPA optimization |
Version History
- v4.2.0 (Feb 23, 2026) - Phase 8: Optimization loop feedback (Issues 14-16). Known Genie Limitations table (KGL-1 through KGL-3) with code-level workaround patterns. TVF body coercion pattern for `'NULL'` string literal (KGL-1). Bilateral disambiguation updated with probabilistic routing language and 3-5 pass repeatability testing guidance (KGL-2). Metadata Effectiveness by Complexity table with escalation ladder for temporal patterns (KGL-3). 2 new Common Mistakes (complex temporal re-proposal, infinite iteration loop). Escalation rule: after 2 failed iterations on the same failure, check KGL table before generating more metadata proposals.
- v4.1.0 (Feb 23, 2026) - Phase 7: ASI-to-metadata loop gap remediation (13 issues). `missing_filter` sub-type routing in `_map_to_lever()` (TVF param -> L3, temporal -> L2, default -> L3). TVF COMMENT Format section with structured bullet-point template (PURPOSE, BEST FOR, NOT FOR, RETURNS, PARAMS, SYNTAX, NOTE). Concrete bilateral disambiguation templates (positive + negative routing SQL examples). Lever 6 Instruction Specificity Rules with overcorrection prevention (GOOD/BAD examples). `missing_temporal_filter` added to FAILURE_TAXONOMY (23 types). Generic counterfactual_fix detection in `_describe_fix()` — skips vague fixes and falls through to `blame_set` + `wrong_clause` synthesis. `judge_quality_feedback` output added to optimizer for SIMBA Tier 3 alignment. 3 new Common Mistakes (TVF COMMENT PARAMS, TVF COMMENT SYNTAX, Lever 6 overcorrection).
- v4.0.0 (Feb 23, 2026) - Phase 6 architectural lessons. Added ASI Consumption Contract section with UC-first priority chain (`read_asi_from_uc()` -> columns -> assessments -> regex). Flipped feature flag defaults: `use_asi=True`, `use_patch_dsl=True`. Added bilateral disambiguation to Repeatability Failure Clustering (positive + negative routing mandatory). UC/ASI reading is triggered by the orchestrator (not CLI args). Version bumped from v3.8.0.
- v3.8.0 (Feb 22, 2026) - Repeatability v2: structured metadata routing. Changed `_map_to_lever()` for `repeatability_issue`: TABLE/MV now routes to Lever 1 (structured metadata — tags, column comments with business_definition, synonyms, grain, join_keys, do_not_use_when) instead of Lever 3 (TVFs); TVF routes to Lever 6 (instructions). Added `blame_set` parameter to `_map_to_lever()` for context-aware routing. Rewrote `_REPEATABILITY_FIX_BY_ASSET` with structured metadata recommendations. Fixed `_describe_fix()` bug reading `dominant_asset` instead of `asi_blame_set`. Updated `generate_metadata_proposals()` and `_dual_persist_paths()` to pass `blame_set` through.
- v3.7.0 (Feb 22, 2026) - Repeatability judge integration. Added `repeatability_issue` to `_map_to_lever()` mapping (lever 3, TVFs). Enhanced `_describe_fix()` with asset-type-specific recommendations for repeatability clusters: MV-routed questions recommend TVF conversion, TABLE-routed recommend TVF wrappers, TVF-routed recommend deterministic parameter instructions. Updated Root Cause to Lever Mapping table in SKILL.md.
- v3.5.0 (Feb 22, 2026) - ASI-aware clustering (Phase 3). `cluster_failures()` now prefers `failure_type`/`blame_set` from ASI metadata over keyword extraction; `_describe_fix()` uses `counterfactual_fix`; `_map_to_lever()` accepts optional `asi_failure_type` parameter for precise routing.
- v3.0.0 (Feb 2026) - Initial optimizer with introspection, GEPA L2, Patch DSL, blast radius tracking.