Genie Space Export/Import API

Overview

This skill provides comprehensive patterns for programmatically creating, exporting, and importing Databricks Genie Spaces via the REST API. It covers the complete GenieSpaceExport JSON schema, API endpoints, common deployment errors, and production-ready workflows including variable substitution and asset inventory-driven generation.

When to Use This Skill

Use this skill when you need to:

  • Programmatically deploy Genie Spaces via REST API (CI/CD pipelines, environment promotion)
  • Export Genie Space configurations for version control, backup, or migration
  • Troubleshoot API deployment errors (BAD_REQUEST, INVALID_PARAMETER_VALUE, INTERNAL_ERROR)
  • Implement cross-workspace deployment with template variable substitution
  • Generate Genie Spaces from asset inventories to prevent non-existent table errors
  • Validate Genie Space JSON structure before deployment
  • Understand the complete GenieSpaceExport schema (config, data_sources, instructions, benchmarks)

Quick Reference

API Operations

| Operation | Method | Endpoint | Use Case |
|---|---|---|---|
| List Spaces | GET | /api/2.0/genie/spaces | Discover existing spaces |
| Get Space | GET | /api/2.0/genie/spaces/{space_id}?include_serialized_space=true | Export config, backup |
| Create Space | POST | /api/2.0/genie/spaces | New deployment, CI/CD |
| Update Space | PATCH | /api/2.0/genie/spaces/{space_id} | Modify config, add benchmarks |
| Delete Space | DELETE | /api/2.0/genie/spaces/{space_id} | Cleanup, teardown |
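As a sketch of how these endpoints are called, the helper below builds the URLs from the table and lists spaces. The environment-variable names, helper names, and default host are assumptions; adapt them to your workspace's auth setup:

```python
import os
import requests

# Assumed environment variables; adapt to your workspace's auth setup.
BASE = os.environ.get("DATABRICKS_HOST", "https://example.cloud.databricks.com").rstrip("/")
HEADERS = {"Authorization": f"Bearer {os.environ.get('DATABRICKS_TOKEN', '')}"}

def space_url(space_id: str = "", include_serialized: bool = False) -> str:
    """Build a Genie Spaces endpoint URL for the operations in the table above."""
    url = f"{BASE}/api/2.0/genie/spaces"
    if space_id:
        url += f"/{space_id}"
        if include_serialized:
            url += "?include_serialized_space=true"
    return url

def list_spaces() -> list:
    """List Spaces: GET /api/2.0/genie/spaces."""
    resp = requests.get(space_url(), headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json().get("spaces", [])
```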

API Limits

| Resource | Limit | Enforcement |
|---|---|---|
| instructions.sql_functions | Max 50 | Truncate in generation script |
| benchmarks.questions | Max 50 | Truncate in generation script |
| data_sources.tables | No hard limit | Keep ~25-30 for performance |
| data_sources.metric_views | No hard limit | Keep ~5-10 per space |

Core Workflow

Initial Deployment:

  1. List spaces (check if already exists)
  2. Load configuration from JSON file
  3. Substitute template variables (${catalog}, ${gold_schema}, etc.)
  4. Create space with full configuration
  5. Get space to verify deployment

Incremental Updates:

  1. Get current space configuration
  2. Modify specific sections (e.g., add benchmarks)
  3. Update space with PATCH (partial update)

Migration/Backup:

  1. Get space with include_serialized_space=true
  2. Save JSON to version control
  3. Create space in new environment (with variable substitution)
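Steps 1-2 of the migration workflow can be sketched as below. The `serialized_space` response field name mirrors the create/update payloads; treat it as an assumption and verify against your workspace's actual response shape:

```python
import json
import requests

def export_space(host: str, token: str, space_id: str) -> dict:
    """Step 1: fetch the space including its serialized configuration."""
    resp = requests.get(
        f"{host}/api/2.0/genie/spaces/{space_id}",
        params={"include_serialized_space": "true"},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def save_for_version_control(space: dict, path: str) -> None:
    """Step 2: write the config to a stable, diff-friendly JSON file."""
    config = json.loads(space["serialized_space"])
    with open(path, "w") as f:
        json.dump(config, f, indent=2, sort_keys=True)
```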

Key Patterns

1. JSON Structure Requirements

CRITICAL: The serialized_space field must be a JSON string (escaped), not a nested object:

import json

payload = {
    "title": "My Space",
    "warehouse_id": "abc123",
    "serialized_space": json.dumps(genie_config)  # ✅ String, not dict
}

2. ID Generation

All IDs MUST be uuid.uuid4().hex — a 32-character lowercase hex string with no dashes.

import uuid

def generate_id() -> str:
    """Generate a Genie Space compatible ID (32 hex chars, no dashes)."""
    return uuid.uuid4().hex  # e.g., "a1b2c3d4e5f6789012345678abcdef01"

Required ID fields (every one must be a fresh uuid.uuid4().hex):

  • space.id
  • space.tables[].id
  • space.sql_functions[].id
  • space.example_question_sqls[].id
  • space.materialized_views[].id

❌ WRONG IDs (will cause import failures):

"genie_" + uuid.uuid4().hex[:24]    # ❌ Prefixed, wrong length
"aaaa" * 8                           # ❌ Not random
str(uuid.uuid4())                    # ❌ Contains dashes (36 chars)
hashlib.md5(name.encode()).hexdigest()  # ❌ Deterministic, not UUID4

✅ CORRECT: Always use uuid.uuid4().hex — nothing else.

3. Array Format Requirements

ALL string-content fields in the Genie Space JSON MUST be single-element arrays, not plain strings.

| Field | ❌ Wrong | ✅ Correct |
|---|---|---|
| question | "What is revenue?" | ["What is revenue?"] |
| content (in answer) | "SELECT ..." | ["SELECT ..."] |
| description (tables) | "Orders table" | ["Orders table"] |
| description (MVs) | "Revenue metrics" | ["Revenue metrics"] |
| description (TVFs) | "Date range query" | ["Date range query"] |

Rule: If a field contains human-readable text or SQL, wrap it in a single-element array ["value"].

Exception: format in answer objects is a plain string: "SQL" or "INSTRUCTIONS".
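The wrapping rule can be enforced mechanically before deployment. A minimal sketch (`ensure_array` and `normalize_question` are hypothetical helper names):

```python
def ensure_array(value):
    """Wrap a plain string in a single-element array; pass arrays through unchanged."""
    if isinstance(value, str):
        return [value]
    return value

def normalize_question(q: dict) -> dict:
    """Apply the single-element-array rule to a sample question object."""
    q = dict(q)
    if "question" in q:
        q["question"] = ensure_array(q["question"])
    return q
```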

4. Template Variable Substitution

NEVER hardcode schema paths. Use template variables:

{
  "data_sources": {
    "tables": [
      {"identifier": "${catalog}.${gold_schema}.dim_store"}  // ✅ Template
    ]
  }
}

Substitute at runtime:

import json

def substitute_variables(data: dict, variables: dict) -> dict:
    """Replace every ${name} placeholder with its value from the variables dict."""
    json_str = json.dumps(data)
    for name, value in variables.items():
        json_str = json_str.replace("${" + name + "}", value)
    return json.loads(json_str)

5. Asset Inventory-Driven Generation

NEVER manually edit data_sources. Generate from verified inventory:

import json

# Load inventory
with open('actual_assets_inventory.json') as f:
    inventory = json.load(f)

# Generate data_sources from inventory
genie_config['data_sources']['tables'] = [
    {"identifier": table_id}
    for table_id in inventory['genie_space_mappings']['cost_intelligence']['tables']
]

Benefits:

  • ✅ Prevents "table doesn't exist" errors
  • ✅ Enforces API limits automatically
  • ✅ Single source of truth for assets

6. Column Configs Warning

column_configs triggers Unity Catalog validation that can fail for complex spaces:

{
  "data_sources": {
    "metric_views": [
      {
        "identifier": "catalog.schema.mv_sales"
        // ✅ Start without column_configs for reliable deployment
      }
    ]
  }
}

Trade-off:

  • Without column_configs: Reliable deployment, less LLM context
  • With column_configs: More LLM context, higher risk of INTERNAL_ERROR

7. Field Validation Rules

config.sample_questions:

  • ✅ Array of objects (not strings)
  • ✅ Each object: {id: string, question: string[]}
  • ❌ NO name, description fields

data_sources.metric_views:

  • identifier field (full 3-part UC name)
  • ✅ Optional: description, column_configs
  • ❌ NO id, name, full_name fields

instructions.sql_functions:

  • id field (32 hex chars) - REQUIRED
  • identifier field (full 3-part function name) - REQUIRED
  • ❌ NO other fields (name, signature, description)
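These rules are mechanical enough to check before submitting. A sketch for the sql_functions case (`validate_sql_function` is a hypothetical helper, not part of any library):

```python
import re

# 32 lowercase hex characters, no dashes, per the ID generation rules above.
HEX32 = re.compile(r"^[0-9a-f]{32}$")

def validate_sql_function(fn: dict) -> list:
    """Return a list of rule violations for one sql_functions entry."""
    errors = []
    if not HEX32.match(fn.get("id", "")):
        errors.append("id must be 32 lowercase hex chars")
    if fn.get("identifier", "").count(".") != 2:
        errors.append("identifier must be a 3-part UC name")
    extra = set(fn) - {"id", "identifier"}
    if extra:
        errors.append(f"unexpected fields: {sorted(extra)}")
    return errors
```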

8. Array Sorting Requirements

CRITICAL: All arrays in the Genie Space JSON MUST be sorted before any PATCH request. The Genie API uses protobuf serialization which requires deterministic ordering. Unsorted arrays produce: Invalid export proto: data_sources.tables must be sorted by identifier.

Sort keys by array path:

| Array Path | Sort Key | Direction |
|---|---|---|
| data_sources.tables | identifier | Ascending |
| data_sources.metric_views | identifier | Ascending |
| instructions.sql_functions | (id, identifier) | Ascending |
| instructions.text_instructions | id | Ascending |
| instructions.example_question_sqls | id | Ascending |
| config.sample_questions | id | Ascending |
| benchmarks.questions | id | Ascending |

Implementation — sort_genie_config():

def sort_genie_config(config: dict) -> dict:
    """Sort all arrays in Genie config — API rejects unsorted data."""
    if "data_sources" in config:
        for key in ["tables", "metric_views"]:
            if key in config["data_sources"]:
                config["data_sources"][key] = sorted(
                    config["data_sources"][key],
                    key=lambda x: x.get("identifier", ""),
                )
    if "instructions" in config:
        if "sql_functions" in config["instructions"]:
            config["instructions"]["sql_functions"] = sorted(
                config["instructions"]["sql_functions"],
                key=lambda x: (x.get("id", ""), x.get("identifier", "")),
            )
        for key in ["text_instructions", "example_question_sqls"]:
            if key in config["instructions"]:
                config["instructions"][key] = sorted(
                    config["instructions"][key],
                    key=lambda x: x.get("id", ""),
                )
    if "config" in config and "sample_questions" in config["config"]:
        config["config"]["sample_questions"] = sorted(
            config["config"]["sample_questions"],
            key=lambda x: x.get("id", ""),
        )
    if "benchmarks" in config and "questions" in config["benchmarks"]:
        config["benchmarks"]["questions"] = sorted(
            config["benchmarks"]["questions"],
            key=lambda x: x.get("id", ""),
        )
    return config

Always call sort_genie_config() BEFORE submitting to the API. The canonical implementation lives in 04-genie-optimization-applier/scripts/optimization_applier.py.

9. Idempotent Deployment (Update-or-Create)

To prevent duplicate Genie Spaces on re-deployment, implement an update-or-create pattern:

  1. Store space IDs in databricks.yml variables:
variables:
  genie_space_id_<space_name>:
    description: "Existing Genie Space ID (empty for first deployment)"
    default: ""
  2. Deployment logic:
space_id = dbutils.widgets.get("genie_space_id_<space_name>")

if space_id:
    # UPDATE existing space (PATCH without title to avoid " (updated)" suffix)
    payload = {"serialized_space": json.dumps(space_json)}
    # Do NOT include "title" in PATCH to avoid title mutation
    response = requests.patch(f"{base_url}/api/2.0/genie/spaces/{space_id}", ...)
else:
    # CREATE new space
    response = requests.post(f"{base_url}/api/2.0/genie/spaces", ...)
    new_space_id = response.json()["space"]["id"]
    print(f"Created new space: {new_space_id}")
    print(f"Set variable: genie_space_id_<space_name> = {new_space_id}")
  3. ⚠️ PATCH without title: Including title in a PATCH request causes the API to append " (updated)" to the title. Omit title from the PATCH payload to preserve the original name.

  4. After first deployment: Record the returned space IDs and set them as databricks.yml variable defaults for subsequent deployments.
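The fragments above can be combined into a single update-or-create helper. This is a sketch (retries, sorting, and variable substitution omitted); `deploy_space` is a hypothetical function name:

```python
import json
import requests

def deploy_space(host, token, space_json, title, warehouse_id, space_id=""):
    """PATCH the space when its ID is known, otherwise POST a new one.

    title is deliberately omitted from the PATCH payload so the API does
    not append " (updated)" to the existing title.
    """
    headers = {"Authorization": f"Bearer {token}"}
    serialized = json.dumps(space_json)
    if space_id:
        resp = requests.patch(
            f"{host}/api/2.0/genie/spaces/{space_id}",
            headers=headers,
            json={"serialized_space": serialized},  # no "title" key
            timeout=60,
        )
        resp.raise_for_status()
        return space_id
    resp = requests.post(
        f"{host}/api/2.0/genie/spaces",
        headers=headers,
        json={
            "title": title,
            "warehouse_id": warehouse_id,
            "serialized_space": serialized,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["space"]["id"]
```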

Common Errors & Quick Fixes

| Error | Cause | Quick Fix |
|---|---|---|
| BAD_REQUEST: Invalid JSON | sample_questions as strings | Convert to objects with id and question[] |
| BAD_REQUEST: Invalid JSON | metric_views with full_name | Use identifier instead |
| INTERNAL_ERROR: Failed to retrieve schema | Missing id in sql_functions | Add id field (32 hex chars) |
| INVALID_PARAMETER_VALUE: Expected array | question is string | Wrap in array: ["question"] |
| Exceeded maximum number (50) | Too many TVFs/benchmarks | Truncate to 50 in generation script |
| expected_sql field not recognized | Used expected_sql instead of answer | Use answer: [{format: "SQL", content: ["SELECT ..."]}] |
| Invalid export proto: data_sources.tables must be sorted by identifier | Arrays not sorted; sort key is identifier (not table_name) for tables/metric_views, id for all others | Call sort_genie_config() before every PATCH (see Section 8) |
| Invalid ID format | ID is not 32-char hex, contains dashes, or is prefixed | Use uuid.uuid4().hex exclusively |

See Troubleshooting Guide for detailed fix scripts.

Reference Files

  • API Reference: Complete API endpoint documentation, request/response schemas, authentication details, Databricks CLI usage
  • Workflow Patterns: Detailed GenieSpaceExport schema (config, data_sources, instructions, benchmarks), ID generation, serialization patterns, variable substitution, asset inventory-driven generation, complete examples
  • Troubleshooting: Common production errors with Python fix scripts, validation checklists, deployment checklist, error recovery patterns, field-level format requirements

Assets

  • assets/templates/genie-deployment-job-template.yml — Standalone Asset Bundle job using notebook_task for Genie Space deployment (for combined deployment, use the orchestrator's semantic-layer-job-template.yml)
  • assets/templates/deploy_genie_spaces.py — Databricks notebook template for Asset Bundle notebook_task deployment. Uses dbutils.widgets.get() for parameters. Copy to src/{project}_semantic/deploy_genie_spaces.py and customize.

CLI vs Notebook: scripts/import_genie_space.py is the CLI tool (uses argparse) for local/CI use. assets/templates/deploy_genie_spaces.py is the notebook template (uses dbutils.widgets.get()) for Asset Bundle notebook_task deployment.

Scripts

  • export_genie_space.py: Export Genie Space configurations

    python scripts/export_genie_space.py --host <workspace> --token <token> --list
    python scripts/export_genie_space.py --host <workspace> --token <token> --space-id <id> --output space.json
    
  • import_genie_space.py: Create/update Genie Spaces from JSON

    python scripts/import_genie_space.py --host <workspace> --token <token> create \
      --config space.json --title "My Space" --description "..." --warehouse-id <id>
    
    python scripts/import_genie_space.py --host <workspace> --token <token> update \
      --space-id <id> --title "Updated Title"
    

Production Deployment Checklist

  1. Validate JSON Structure

    python scripts/validate_against_reference.py
    
  2. Validate SQL Queries (if benchmarks present)

    databricks bundle run -t dev genie_benchmark_validation_job
    
  3. Deploy Genie Spaces

    databricks bundle deploy -t dev
    databricks bundle run -t dev genie_spaces_deployment_job
    
  4. Verify in UI

    • Navigate to Genie Spaces
    • Test sample questions
    • Verify data sources load correctly

Related Resources

Official Documentation

Related Skills

  • genie-space-patterns - UI-based Genie Space setup
  • metric-views-patterns - Metric view YAML creation
  • databricks-table-valued-functions - TVF patterns for Genie

Genie API Notes to Carry Forward

After completing Genie Space API deployment, carry these notes to the next worker:

  • Deployed Space IDs: Map of space name → space ID (32-char hex) for each deployed space
  • Deployment method: Whether spaces were created (POST) or updated (PATCH)
  • Variable settings for re-deployment: genie_space_id_<name> values to set in databricks.yml for idempotent future deployments
  • Validation results: Benchmark SQL validation pass/fail counts per space
  • Cross-environment status: Which environments (dev/staging/prod) have been deployed to

Common Mistakes

| Mistake | Consequence | Fix |
|---|---|---|
| GET space without ?include_serialized_space=true | Response contains only top-level metadata (title, description, space_id); data_assets, general_instructions, and nested config are omitted, so the space appears empty | Always append ?include_serialized_space=true to the Get Space endpoint |

Next Step

After API deployment is complete:

  • If this is the first deployment: Record space IDs and set them as databricks.yml variable defaults.
  • If benchmarks need tuning: Proceed to semantic-layer/05-genie-optimization-orchestrator/SKILL.md for benchmark testing and the 6-lever optimization loop.
  • If deploying to additional environments: Re-run the deploy notebook with target environment variables.

Version History

  • v3.6.0 (Feb 22, 2026) — Fixed Section 8 array sorting: corrected sort keys from table_name/materialized_view_name/function_name to identifier/id (matching actual API protobuf requirements). Replaced sort_all_arrays() with sort_genie_config() (canonical implementation in applier). Updated Common Errors with specific error message Invalid export proto: data_sources.tables must be sorted by identifier. Added missing arrays (text_instructions, sample_questions, benchmarks.questions) to sort table.
  • v2.0 (Feb 2026) — Array sorting requirements (Section 8); idempotent deployment pattern (Section 9); expanded array format table; strengthened ID generation guidance; 3 new common errors; deploy template major rewrite; benchmark SQL validation templates added; Notes to Carry Forward and Next Step for progressive disclosure
  • v3.0 (January 2026) - Inventory-driven programmatic generation, template variables, 100% deployment success
  • v2.0 (January 2026) - Production deployment patterns, format validation, 8 common error fixes
  • v1.0 (January 2026) - Initial schema documentation and API patterns