genie-space-export-import-api
Genie Space Export/Import API
Overview
This skill provides comprehensive patterns for programmatically creating, exporting, and importing Databricks Genie Spaces via the REST API. It covers the complete GenieSpaceExport JSON schema, API endpoints, common deployment errors, and production-ready workflows including variable substitution and asset inventory-driven generation.
When to Use This Skill
Use this skill when you need to:
- Programmatically deploy Genie Spaces via REST API (CI/CD pipelines, environment promotion)
- Export Genie Space configurations for version control, backup, or migration
- Troubleshoot API deployment errors (`BAD_REQUEST`, `INVALID_PARAMETER_VALUE`, `INTERNAL_ERROR`)
- Implement cross-workspace deployment with template variable substitution
- Generate Genie Spaces from asset inventories to prevent non-existent table errors
- Validate Genie Space JSON structure before deployment
- Understand the complete GenieSpaceExport schema (config, data_sources, instructions, benchmarks)
Quick Reference
API Operations
| Operation | Method | Endpoint | Use Case |
|---|---|---|---|
| List Spaces | GET | `/api/2.0/genie/spaces` | Discover existing spaces |
| Get Space | GET | `/api/2.0/genie/spaces/{space_id}?include_serialized_space=true` | Export config, backup |
| Create Space | POST | `/api/2.0/genie/spaces` | New deployment, CI/CD |
| Update Space | PATCH | `/api/2.0/genie/spaces/{space_id}` | Modify config, add benchmarks |
| Delete Space | DELETE | `/api/2.0/genie/spaces/{space_id}` | Cleanup, teardown |
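A minimal sketch of calling the List Spaces endpoint, assuming the `requests` library, a PAT in `DATABRICKS_TOKEN`, and the workspace URL in `DATABRICKS_HOST` (the env var names here are illustrative, not mandated by the API):

```python
import os
import requests

# Assumed environment: DATABRICKS_HOST like "https://my-workspace.cloud.databricks.com"
HOST = os.environ.get("DATABRICKS_HOST", "https://example.cloud.databricks.com")
HEADERS = {"Authorization": f"Bearer {os.environ.get('DATABRICKS_TOKEN', '')}"}

def list_spaces() -> list:
    """GET /api/2.0/genie/spaces and return the array of space summaries."""
    resp = requests.get(f"{HOST}/api/2.0/genie/spaces", headers=HEADERS)
    resp.raise_for_status()
    return resp.json().get("spaces", [])
```

The same `HOST`/`HEADERS` pair works for the other four operations; only the method and path change.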
API Limits
| Resource | Limit | Enforcement |
|---|---|---|
| `instructions.sql_functions` | Max 50 | Truncate in generation script |
| `benchmarks.questions` | Max 50 | Truncate in generation script |
| `data_sources.tables` | No hard limit | Keep ~25-30 for performance |
| `data_sources.metric_views` | No hard limit | Keep ~5-10 per space |
Core Workflow
Initial Deployment:
- List spaces (check if the space already exists)
- Load configuration from JSON file
- Substitute template variables (`${catalog}`, `${gold_schema}`, etc.)
- Create space with full configuration
- Get space to verify deployment
Incremental Updates:
- Get current space configuration
- Modify specific sections (e.g., add benchmarks)
- Update space with PATCH (partial update)
Migration/Backup:
- Get space with `include_serialized_space=true`
- Save JSON to version control
- Create space in new environment (with variable substitution)
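The backup half of that workflow can be sketched as follows, assuming `requests` is available; `export_space` is an illustrative helper name, not part of the API:

```python
import json
import requests

def export_space(host: str, token: str, space_id: str, path: str) -> dict:
    """Download a Genie Space definition and save it for version control."""
    resp = requests.get(
        f"{host}/api/2.0/genie/spaces/{space_id}",
        headers={"Authorization": f"Bearer {token}"},
        params={"include_serialized_space": "true"},  # without this, config is omitted
    )
    resp.raise_for_status()
    space = resp.json()
    with open(path, "w") as f:
        json.dump(space, f, indent=2, sort_keys=True)  # stable key order for clean git diffs
    return space
```

`sort_keys=True` keeps successive exports diff-friendly in version control.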
Key Patterns
1. JSON Structure Requirements
CRITICAL: The `serialized_space` field must be a JSON string (escaped), not a nested object:

```python
payload = {
    "title": "My Space",
    "warehouse_id": "abc123",
    "serialized_space": json.dumps(genie_config)  # ✅ String, not dict
}
```
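The same rule applies in reverse: on export, `serialized_space` comes back as an escaped string that must be decoded before editing. A small sketch (`parse_serialized_space` is a hypothetical helper name):

```python
import json

def parse_serialized_space(api_response: dict) -> dict:
    """Decode the escaped serialized_space string from a Get Space response."""
    raw = api_response["serialized_space"]
    if isinstance(raw, dict):  # defensive: already decoded
        return raw
    return json.loads(raw)

# Round trip: dict -> escaped string (create) -> dict (export)
exported = {"serialized_space": json.dumps({"config": {"sample_questions": []}})}
config = parse_serialized_space(exported)
```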
2. ID Generation
All IDs MUST be `uuid.uuid4().hex` - a 32-character lowercase hex string with no dashes.

```python
import uuid

def generate_id() -> str:
    """Generate a Genie Space compatible ID (32 hex chars, no dashes)."""
    return uuid.uuid4().hex  # e.g., "a1b2c3d4e5f6789012345678abcdef01"
```

Required ID fields (every one must be a fresh `uuid.uuid4().hex`):
- `space.id`
- `space.tables[].id`
- `space.sql_functions[].id`
- `space.example_question_sqls[].id`
- `space.materialized_views[].id`

❌ WRONG IDs (will cause import failures):

```python
"genie_" + uuid.uuid4().hex[:24]        # ❌ Prefixed, wrong length
"aaaa" * 8                              # ❌ Not random
str(uuid.uuid4())                       # ❌ Contains dashes (36 chars)
hashlib.md5(name.encode()).hexdigest()  # ❌ Deterministic, not UUID4
```

✅ CORRECT: Always use `uuid.uuid4().hex` - nothing else.
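The ID rule can be enforced with a one-line regex check before submission; `is_valid_genie_id` is an illustrative helper name:

```python
import re
import uuid

ID_RE = re.compile(r"^[0-9a-f]{32}$")

def is_valid_genie_id(value: str) -> bool:
    """True only for 32 lowercase hex chars with no dashes or prefixes."""
    return isinstance(value, str) and bool(ID_RE.match(value))
```

Running this over every `id` field in a config before POST/PATCH catches the wrong formats listed above.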
3. Array Format Requirements
ALL string-content fields in the Genie Space JSON MUST be single-element arrays, not plain strings.

| Field | ❌ Wrong | ✅ Correct |
|---|---|---|
| `question` | `"What is revenue?"` | `["What is revenue?"]` |
| `content` (in answer) | `"SELECT ..."` | `["SELECT ..."]` |
| `description` (tables) | `"Orders table"` | `["Orders table"]` |
| `description` (MVs) | `"Revenue metrics"` | `["Revenue metrics"]` |
| `description` (TVFs) | `"Date range query"` | `["Date range query"]` |

Rule: If a field contains human-readable text or SQL, wrap it in a single-element array `["value"]`.
Exception: `format` in answer objects is a plain string: `"SQL"` or `"INSTRUCTIONS"`.
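A recursive normalizer can apply the rule mechanically; `wrap_string_fields` and `ARRAY_FIELDS` are hypothetical names, and the field list is the one from the table above (note `format` is deliberately excluded):

```python
# Fields that must be single-element arrays per the table above (assumed complete)
ARRAY_FIELDS = {"question", "content", "description"}

def wrap_string_fields(obj):
    """Recursively wrap bare strings in array-typed fields; leave everything else alone."""
    if isinstance(obj, dict):
        return {
            k: ([v] if k in ARRAY_FIELDS and isinstance(v, str) else wrap_string_fields(v))
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [wrap_string_fields(item) for item in obj]
    return obj
```

Because `format` is not in `ARRAY_FIELDS`, the exception is honored automatically.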
4. Template Variable Substitution
NEVER hardcode schema paths. Use template variables:

```json
{
  "data_sources": {
    "tables": [
      {"identifier": "${catalog}.${gold_schema}.dim_store"}
    ]
  }
}
```
Substitute at runtime (looping over the variables mapping so new template variables need no code change):

```python
import json

def substitute_variables(data: dict, variables: dict) -> dict:
    """Resolve every ${name} placeholder in the config from the variables mapping."""
    json_str = json.dumps(data)
    for name, value in variables.items():
        json_str = json_str.replace("${" + name + "}", str(value))
    return json.loads(json_str)
```
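For example, resolving a table identifier at deploy time (the substitution helper is restated here so the snippet runs standalone):

```python
import json

def substitute_variables(data: dict, variables: dict) -> dict:
    """Resolve ${name} placeholders anywhere in the config."""
    json_str = json.dumps(data)
    for name, value in variables.items():
        json_str = json_str.replace("${" + name + "}", str(value))
    return json.loads(json_str)

config = {"data_sources": {"tables": [{"identifier": "${catalog}.${gold_schema}.dim_store"}]}}
resolved = substitute_variables(config, {"catalog": "main", "gold_schema": "gold"})
print(resolved["data_sources"]["tables"][0]["identifier"])  # main.gold.dim_store
```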
5. Asset Inventory-Driven Generation
NEVER manually edit data_sources. Generate from verified inventory:

```python
import json

# Load the verified asset inventory
with open('actual_assets_inventory.json') as f:
    inventory = json.load(f)

# Generate data_sources from inventory
genie_config['data_sources']['tables'] = [
    {"identifier": table_id}
    for table_id in inventory['genie_space_mappings']['cost_intelligence']['tables']
]
```
Benefits:
- ✅ Prevents "table doesn't exist" errors
- ✅ Enforces API limits automatically
- ✅ Single source of truth for assets
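Limit enforcement during generation can be a simple truncation step; `enforce_limits` is an illustrative helper applying the max-50 rule from the API Limits table:

```python
def enforce_limits(genie_config: dict, max_items: int = 50) -> dict:
    """Truncate sql_functions and benchmark questions to the API hard limit of 50."""
    funcs = genie_config.get("instructions", {}).get("sql_functions", [])
    if len(funcs) > max_items:
        genie_config["instructions"]["sql_functions"] = funcs[:max_items]
    questions = genie_config.get("benchmarks", {}).get("questions", [])
    if len(questions) > max_items:
        genie_config["benchmarks"]["questions"] = questions[:max_items]
    return genie_config
```

Calling this as the last step of generation prevents the `Exceeded maximum number (50)` error listed below.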
6. Column Configs Warning
`column_configs` triggers Unity Catalog validation that can fail for complex spaces:

```json
{
  "data_sources": {
    "metric_views": [
      {
        "identifier": "catalog.schema.mv_sales"
      }
    ]
  }
}
```

✅ Start without `column_configs` for reliable deployment.

Trade-off:
- Without `column_configs`: Reliable deployment, less LLM context
- With `column_configs`: More LLM context, higher risk of `INTERNAL_ERROR`
7. Field Validation Rules
`config.sample_questions`:
- ✅ Array of objects (not strings)
- ✅ Each object: `{id: string, question: string[]}`
- ❌ NO `name`, `description` fields

`data_sources.metric_views`:
- ✅ `identifier` field (full 3-part UC name)
- ✅ Optional: `description`, `column_configs`
- ❌ NO `id`, `name`, `full_name` fields

`instructions.sql_functions`:
- ✅ `id` field (32 hex chars) - REQUIRED
- ✅ `identifier` field (full 3-part function name) - REQUIRED
- ❌ NO other fields (`name`, `signature`, `description`)
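The `sample_questions` rules can be checked pre-flight; `validate_sample_questions` is a hypothetical helper covering only that section's rules:

```python
def validate_sample_questions(config: dict) -> list:
    """Return a list of rule violations for config.sample_questions (empty = valid)."""
    errors = []
    for i, sq in enumerate(config.get("config", {}).get("sample_questions", [])):
        if not isinstance(sq, dict):
            errors.append(f"sample_questions[{i}]: must be an object, not a string")
            continue
        if not isinstance(sq.get("question"), list):
            errors.append(f"sample_questions[{i}]: question must be a single-element array")
        if "name" in sq or "description" in sq:
            errors.append(f"sample_questions[{i}]: name/description fields are not allowed")
        if "id" not in sq:
            errors.append(f"sample_questions[{i}]: missing id")
    return errors
```

Analogous checks for `metric_views` and `sql_functions` follow the same pattern.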
8. Array Sorting Requirements
CRITICAL: All arrays in the Genie Space JSON MUST be sorted before any PATCH request. The Genie API uses protobuf serialization, which requires deterministic ordering. Unsorted arrays produce: `Invalid export proto: data_sources.tables must be sorted by identifier`.
Sort keys by array path:
| Array Path | Sort Key | Direction |
|---|---|---|
| `data_sources.tables` | `identifier` | Ascending |
| `data_sources.metric_views` | `identifier` | Ascending |
| `instructions.sql_functions` | `(id, identifier)` | Ascending |
| `instructions.text_instructions` | `id` | Ascending |
| `instructions.example_question_sqls` | `id` | Ascending |
| `config.sample_questions` | `id` | Ascending |
| `benchmarks.questions` | `id` | Ascending |
Implementation, `sort_genie_config()`:

```python
def sort_genie_config(config: dict) -> dict:
    """Sort all arrays in Genie config — API rejects unsorted data."""
    if "data_sources" in config:
        for key in ["tables", "metric_views"]:
            if key in config["data_sources"]:
                config["data_sources"][key] = sorted(
                    config["data_sources"][key],
                    key=lambda x: x.get("identifier", ""),
                )
    if "instructions" in config:
        if "sql_functions" in config["instructions"]:
            config["instructions"]["sql_functions"] = sorted(
                config["instructions"]["sql_functions"],
                key=lambda x: (x.get("id", ""), x.get("identifier", "")),
            )
        for key in ["text_instructions", "example_question_sqls"]:
            if key in config["instructions"]:
                config["instructions"][key] = sorted(
                    config["instructions"][key],
                    key=lambda x: x.get("id", ""),
                )
    if "config" in config and "sample_questions" in config["config"]:
        config["config"]["sample_questions"] = sorted(
            config["config"]["sample_questions"],
            key=lambda x: x.get("id", ""),
        )
    if "benchmarks" in config and "questions" in config["benchmarks"]:
        config["benchmarks"]["questions"] = sorted(
            config["benchmarks"]["questions"],
            key=lambda x: x.get("id", ""),
        )
    return config
```
Always call `sort_genie_config()` BEFORE submitting to the API. The canonical implementation lives in `04-genie-optimization-applier/scripts/optimization_applier.py`.
9. Idempotent Deployment (Update-or-Create)
To prevent duplicate Genie Spaces on re-deployment, implement an update-or-create pattern:

- Store space IDs in `databricks.yml` variables:

```yaml
variables:
  genie_space_id_<space_name>:
    description: "Existing Genie Space ID (empty for first deployment)"
    default: ""
```

- Deployment logic:

```python
space_id = dbutils.widgets.get("genie_space_id_<space_name>")
if space_id:
    # UPDATE existing space (PATCH without title to avoid " (updated)" suffix)
    payload = {"serialized_space": json.dumps(space_json)}
    # Do NOT include "title" in PATCH to avoid title mutation
    response = requests.patch(f"{base_url}/api/2.0/genie/spaces/{space_id}", ...)
else:
    # CREATE new space
    response = requests.post(f"{base_url}/api/2.0/genie/spaces", ...)
    new_space_id = response.json()["space"]["id"]
    print(f"Created new space: {new_space_id}")
    print(f"Set variable: genie_space_id_<space_name> = {new_space_id}")
```

- ⚠️ PATCH without title: Including `title` in a PATCH request causes the API to append " (updated)" to the title. Omit `title` from the PATCH payload to preserve the original name.
- After first deployment: Record the returned space IDs and set them as `databricks.yml` variable defaults for subsequent deployments.
Common Errors & Quick Fixes
| Error | Cause | Quick Fix |
|---|---|---|
| `BAD_REQUEST: Invalid JSON` | `sample_questions` as strings | Convert to objects with `id` and `question[]` |
| `BAD_REQUEST: Invalid JSON` | `metric_views` with `full_name` | Use `identifier` instead |
| `INTERNAL_ERROR: Failed to retrieve schema` | Missing `id` in `sql_functions` | Add `id` field (32 hex chars) |
| `INVALID_PARAMETER_VALUE: Expected array` | `question` is a string | Wrap in array: `["question"]` |
| `Exceeded maximum number (50)` | Too many TVFs/benchmarks | Truncate to 50 in generation script |
| `expected_sql field not recognized` | Used `expected_sql` instead of `answer` | Use `answer: [{format: "SQL", content: ["SELECT ..."]}]` |
| `Invalid export proto: data_sources.tables must be sorted by identifier` | Arrays not sorted; sort key is `identifier` (not `table_name`) for tables/metric_views, `id` for all others | Call `sort_genie_config()` before every PATCH (see Section 8) |
| Invalid ID format | ID is not 32-char hex, contains dashes, or is prefixed | Use `uuid.uuid4().hex` exclusively |
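A deployment script can map failures to these quick fixes automatically. The sketch below is illustrative: `remediation_hint` and the `REMEDIATIONS` table are hypothetical names, and the hint strings paraphrase the table above:

```python
# Illustrative mapping from API error codes to the quick fixes in the table above
REMEDIATIONS = {
    "BAD_REQUEST": "Check sample_questions objects and metric_views identifier fields.",
    "INVALID_PARAMETER_VALUE": "Wrap string fields in single-element arrays.",
    "INTERNAL_ERROR": "Check sql_functions ids; consider removing column_configs.",
}

def remediation_hint(error_code: str, message: str = "") -> str:
    """Return a quick-fix hint for a failed create/update response."""
    if "must be sorted" in message:
        return "Call sort_genie_config() before submitting."
    if "Exceeded maximum number" in message:
        return "Truncate TVFs/benchmarks to 50 in the generation script."
    return REMEDIATIONS.get(error_code, "See the troubleshooting guide.")
```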
See Troubleshooting Guide for detailed fix scripts.
Reference Files
- API Reference: Complete API endpoint documentation, request/response schemas, authentication details, Databricks CLI usage
- Workflow Patterns: Detailed GenieSpaceExport schema (config, data_sources, instructions, benchmarks), ID generation, serialization patterns, variable substitution, asset inventory-driven generation, complete examples
- Troubleshooting: Common production errors with Python fix scripts, validation checklists, deployment checklist, error recovery patterns, field-level format requirements
Assets
- `assets/templates/genie-deployment-job-template.yml` - Standalone Asset Bundle job using `notebook_task` for Genie Space deployment (for combined deployment, use the orchestrator's `semantic-layer-job-template.yml`)
- `assets/templates/deploy_genie_spaces.py` - Databricks notebook template for Asset Bundle `notebook_task` deployment. Uses `dbutils.widgets.get()` for parameters. Copy to `src/{project}_semantic/deploy_genie_spaces.py` and customize.

CLI vs Notebook:
- `scripts/import_genie_space.py` is the CLI tool (uses `argparse`) for local/CI use.
- `assets/templates/deploy_genie_spaces.py` is the notebook template (uses `dbutils.widgets.get()`) for Asset Bundle `notebook_task` deployment.
Scripts
- `export_genie_space.py`: Export Genie Space configurations

```bash
python scripts/export_genie_space.py --host <workspace> --token <token> --list
python scripts/export_genie_space.py --host <workspace> --token <token> --space-id <id> --output space.json
```

- `import_genie_space.py`: Create/update Genie Spaces from JSON

```bash
python scripts/import_genie_space.py --host <workspace> --token <token> create \
  --config space.json --title "My Space" --description "..." --warehouse-id <id>
python scripts/import_genie_space.py --host <workspace> --token <token> update \
  --space-id <id> --title "Updated Title"
```
Production Deployment Checklist
- Validate JSON Structure

```bash
python scripts/validate_against_reference.py
```

- Validate SQL Queries (if benchmarks present)

```bash
databricks bundle run -t dev genie_benchmark_validation_job
```

- Deploy Genie Spaces

```bash
databricks bundle deploy -t dev
databricks bundle run -t dev genie_spaces_deployment_job
```

- Verify in UI
  - Navigate to Genie Spaces
  - Test sample questions
  - Verify data sources load correctly
Related Resources
Official Documentation
Related Skills
- `genie-space-patterns` - UI-based Genie Space setup
- `metric-views-patterns` - Metric view YAML creation
- `databricks-table-valued-functions` - TVF patterns for Genie
Genie API Notes to Carry Forward
After completing Genie Space API deployment, carry these notes to the next worker:
- Deployed Space IDs: Map of space name → space ID (32-char hex) for each deployed space
- Deployment method: Whether spaces were created (POST) or updated (PATCH)
- Variable settings for re-deployment: `genie_space_id_<name>` values to set in `databricks.yml` for idempotent future deployments
- Validation results: Benchmark SQL validation pass/fail counts per space
- Cross-environment status: Which environments (dev/staging/prod) have been deployed to
Common Mistakes
| Mistake | Consequence | Fix |
|---|---|---|
| GET space without `?include_serialized_space=true` | Response contains only top-level metadata (title, description, space_id); `data_assets`, `general_instructions`, and nested config are omitted, so the space appears empty | Always append `?include_serialized_space=true` to the Get Space endpoint |
Next Step
After API deployment is complete:
- If this is the first deployment: Record space IDs and set them as `databricks.yml` variable defaults.
- If benchmarks need tuning: Proceed to `semantic-layer/05-genie-optimization-orchestrator/SKILL.md` for benchmark testing and the 6-lever optimization loop.
- If deploying to additional environments: Re-run the deploy notebook with target environment variables.
Version History
- v3.6.0 (Feb 22, 2026) - Fixed Section 8 array sorting: corrected sort keys from `table_name`/`materialized_view_name`/`function_name` to `identifier`/`id` (matching actual API protobuf requirements). Replaced `sort_all_arrays()` with `sort_genie_config()` (canonical implementation in the applier). Updated Common Errors with the specific error message `Invalid export proto: data_sources.tables must be sorted by identifier`. Added missing arrays (`text_instructions`, `sample_questions`, `benchmarks.questions`) to the sort table.
- v2.0 (Feb 2026) - Array sorting requirements (Section 8); idempotent deployment pattern (Section 9); expanded array format table; strengthened ID generation guidance; 3 new common errors; deploy template major rewrite; benchmark SQL validation templates added; Notes to Carry Forward and Next Step for progressive disclosure
- v3.0 (January 2026) - Inventory-driven programmatic generation, template variables, 100% deployment success
- v2.0 (January 2026) - Production deployment patterns, format validation, 8 common error fixes
- v1.0 (January 2026) - Initial schema documentation and API patterns