project-planning
Project Plan Methodology for Databricks Solutions
Planning Mode
Default: Data Product Acceleration — full breadth, all domains, all artifacts. This is the standard behavior described throughout the rest of this skill.
Workshop mode is available for Learning & Enablement scenarios with hard artifact caps. It is NEVER activated unless the user includes the exact phrase `planning_mode: workshop` in their prompt.
Mode Detection Rules
- Default is ALWAYS acceleration. If the user does not explicitly declare workshop mode, use acceleration.
- Workshop mode requires EXPLICIT opt-in. The user must include one of these EXACT phrases: `planning_mode: workshop`, "workshop mode", "use workshop mode"
- Do NOT infer workshop mode from words like "small", "simple", "demo", "limited", "quick", "basic", "training", or "few". These are NOT triggers. A user may want a narrow-scope acceleration plan — that's still acceleration mode with fewer use cases.
- When in doubt, ask. If the user's intent is ambiguous (e.g., "Create a plan for a workshop"), ask: "Would you like full Data Product Acceleration mode (default) or Workshop mode with limited artifacts? To use workshop mode, include `planning_mode: workshop` in your request."
- Confirm mode at the start. The first line of any plan output should state the active mode: `**Planning Mode:** Data Product Acceleration (default)` or `**Planning Mode:** Workshop (explicit opt-in — artifact caps active)`
- When workshop mode is activated, read `references/workshop-mode-profile.md` for artifact caps, phase scope, and selection criteria. Do NOT read that reference otherwise.
- Propagate mode to manifests. Add `planning_mode: workshop` or `planning_mode: acceleration` to all generated manifest YAML files. Downstream orchestrators seeing `workshop` MUST NOT expand beyond the listed artifacts via self-discovery.
Overview
Comprehensive methodology for creating multi-phase project plans for Databricks data platform solutions. This skill combines interactive project planning with architectural methodology, including templates, worked examples, and quality standards.
Key Assumption: Planning starts AFTER Bronze ingestion and Gold layer design are complete. These are prerequisites, not phases.
When to Use This Skill
Use this skill when:
- Creating architectural plans for Databricks data platform projects
- Building observability, analytics, or monitoring solutions
- Planning multi-artifact solutions (TVFs, Metric Views, Dashboards, Genie Spaces, Alerts, ML Models)
- Developing agent-based frameworks for platform management
- Creating frontend applications for data platform interaction
- Starting a new project after Gold layer is complete
Quick Start (5 Minutes)
Fast Track: Create Your Project Plan
# 1. Verify prerequisites are complete:
# - Bronze ingestion ✅
# - Silver DLT streaming ✅
# - Gold dimensional model ✅
# 2. Run this prompt with your project info:
"Create a phased project plan for {project_name} with:
- Gold tables: {n} tables ({d} dimensions + {f} facts)
- Use cases: {use_case_1, use_case_2, use_case_3, etc.}
- Target audience: {executives, analysts, data scientists}
- Agent domains: {domain1, domain2, domain3, domain4, domain5}"
# 3. Output: Complete plan structure in plans/ folder
Key Decisions (Answer These First)
| Decision | Options | Your Choice |
|---|---|---|
| Agent Domains | Derive from business questions (typically 2-5) | __________ |
| Phase 1 Addendums | TVFs, Metric Views, Dashboards, Monitoring, Genie, Alerts, ML | __________ |
| Phase 2 Scope | AI Agents (optional) or skip | __________ |
| Phase 3 Scope | Frontend App (optional) or skip | __________ |
| Genie Space Count | Based on asset count vs 25-asset limit (see Rationalization) | __________ |
| Agent Architecture | Agents use Genie Spaces (recommended) or Direct SQL | __________ |
| Agent-Genie Mapping | 1:1, consolidated, or unified (based on asset volume) | __________ |
Working Memory Management
This orchestrator spans 3 phases. To maintain coherence without context pollution:
After each phase, persist a brief summary note capturing:
- Phase 1: Domain list with Gold table mappings, addendum selections, business questions per domain, artifact count estimates
- Phase 2: Plan document file paths, cross-references verified, total artifact counts by type
- Phase 3: Manifest file paths (semantic-layer, observability, ml, genai-agents), validation results, summary counts
What to keep in working memory: Current phase's template, domain list + artifact inventory, and previous phase's summary. Discard intermediate outputs — they are on disk. Read templates from assets/templates/ and references just-in-time, not upfront.
Step-by-Step Workflow
Phase 1: Requirements Gathering
Project Information
| Field | Your Value |
|---|---|
| Project Name | {project_name} |
| Business Domain | {hospitality, retail, healthcare, finance, etc.} |
| Primary Use Cases | {use_case_1, use_case_2, use_case_3, etc.} |
| Target Stakeholders | {executives, analysts, data scientists, operations} |
Prerequisites Status
| Layer | Count | Status |
|---|---|---|
| Bronze Tables | {n} | ✅ Complete |
| Silver Tables | {m} | ✅ Complete |
| Gold Dimensions | {d} | ✅ Complete |
| Gold Facts | {f} | ✅ Complete |
Define Agent Domains
Derive domains from your business questions and Gold table groupings (see Artifact Rationalization Framework). Do not force a fixed number — let the data model and use cases determine natural boundaries.
| Domain | Icon | Focus Area | Key Gold Tables | Est. Business Questions |
|---|---|---|---|---|
| {Domain 1} | {emoji} | {focus} | {tables} | {count} |
| {Domain 2} | {emoji} | {focus} | {tables} | {count} |
| ... | ... | ... | ... | ... |
Sizing check: If a domain has < 3 business questions, consider merging it. If two domains share > 70% of Gold tables, consolidate.
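The sizing check above can be sketched as a small script. This is an illustrative helper, not part of the skill's shipped tooling; the data shape and the overlap definition (shared tables divided by the smaller domain's table count) are assumptions.

```python
from itertools import combinations

def domain_sizing_issues(domains):
    """domains maps name -> {"tables": set of Gold tables, "questions": int}."""
    issues = []
    for name, d in domains.items():
        if d["questions"] < 3:
            issues.append(f"{name}: only {d['questions']} business questions; consider merging")
    # Pairwise Gold-table overlap check: >70% shared tables means consolidate.
    for a, b in combinations(domains, 2):
        ta, tb = domains[a]["tables"], domains[b]["tables"]
        overlap = len(ta & tb) / min(len(ta), len(tb))
        if overlap > 0.70:
            issues.append(f"{a} + {b}: {overlap:.0%} Gold table overlap; consolidate")
    return issues

print(domain_sizing_issues({
    "Revenue":   {"tables": {"fact_bookings", "dim_property", "dim_date"}, "questions": 6},
    "Occupancy": {"tables": {"fact_bookings", "dim_property"}, "questions": 2},
}))
```

Here "Occupancy" is flagged twice: too few questions, and full table overlap with "Revenue", so the two would merge into one domain.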
See Industry Domain Patterns for examples by industry.
Phase 1 Addendum Selection
| # | Addendum | Include? | Artifact Count |
|---|---|---|---|
| 1.1 | ML Models | {Yes/No} | {count} |
| 1.2 | Table-Valued Functions | {Yes/No} | {count} |
| 1.3 | Metric Views | {Yes/No} | {count} |
| 1.4 | Lakehouse Monitoring | {Yes/No} | {count} |
| 1.5 | AI/BI Dashboards | {Yes/No} | {count} |
| 1.6 | Genie Spaces | {Yes/No} | {count} |
| 1.7 | Alerting Framework | {Yes/No} | {count} |
Key Business Questions by Domain
List 5-10 key questions per domain that the solution must answer:
{Domain 1}:
- {Question 1}
- {Question 2}
- {Question 3}
- {Question 4}
- {Question 5}
Use Case Catalog
After defining business questions and selecting addendums, consolidate into a Use Case Catalog — one entry per distinct analytical or operational problem the solution will address. Each use case ties business questions to the Gold tables and artifacts that solve them. Use assets/templates/use-case-catalog-template.md for the full format.
| UC# | Use Case Name | Domain | Gold Tables | Artifact Types | Example Question |
|---|---|---|---|---|---|
| UC-001 | {Descriptive Name} | {Domain} | fact_*, dim_* | TVF, MV, Dashboard | "{Natural language question}?" |
| UC-002 | ... | ... | ... | ... | ... |
Use Case Catalog Rules:
- Every use case MUST include 3-5 business questions phrased in natural language
- Every business question from the domain sections above MUST map to at least one use case
- Every artifact in the addendum summaries MUST trace back to at least one use case question
- Questions should be phrased as stakeholders would ask them (these become Genie benchmark candidates)
- Group related questions into a single use case when they share the same Gold tables and grain
See Worked Example: Wanderbricks for 3 fully worked-out use case cards.
Stakeholder Checkpoint: After generating the use case catalog, pause and present the Use Case Summary table to the user for confirmation before proceeding to addendum generation. If the user requests changes, update the catalog and domain questions before continuing.
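The traceability rules above lend themselves to a mechanical check. The sketch below is illustrative only (it is not the shipped validator, and the dict shape for use cases is an assumption):

```python
def check_use_case_rules(domain_questions, use_cases):
    """Flag catalog-rule violations; use_cases is a list of
    {"id": ..., "questions": [...]} dicts (hypothetical shape)."""
    errors = []
    covered = set()
    for uc in use_cases:
        n = len(uc["questions"])
        if not 3 <= n <= 5:  # every use case needs 3-5 natural-language questions
            errors.append(f"{uc['id']}: has {n} questions, expected 3-5")
        covered.update(uc["questions"])
    # Every domain question must map to at least one use case.
    for q in set(domain_questions) - covered:
        errors.append(f"Domain question not mapped to any use case: {q!r}")
    return errors

ok = check_use_case_rules(
    ["q1", "q2", "q3"],
    [{"id": "UC-001", "questions": ["q1", "q2", "q3"]}],
)
print(ok)  # [] — fully covered catalog
```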
Phase 2: Plan Document Generation
Create plan documents using templates in the following order:
- README — `assets/templates/plans-readme-template.md` (plan index)
- Prerequisites — `assets/templates/prerequisites-template.md` (data layer summary)
- Use Case Catalog — `assets/templates/use-case-catalog-template.md` (consolidated use case definitions)
- Phase 1 Master — `assets/templates/phase1-use-cases-template.md` (analytics artifacts)
- Addendums (selected in Phase 1):
  - TVFs — `assets/templates/phase1-tvfs-template.md`
  - Alerting — `assets/templates/phase1-alerting-template.md`
  - Genie Spaces — `assets/templates/phase1-genie-spaces-template.md`
- Phase 2 — `assets/templates/phase2-agent-framework-template.md` (AI agents)
- Phase 3 — `assets/templates/phase3-frontend-template.md` (user interface)
Phase 3: Manifest Generation (Plan-as-Contract)
After creating plan documents, generate machine-readable YAML manifests that downstream orchestrators consume as implementation contracts.
Why manifests? The "Extract, Don't Generate" principle applies to the planning-to-implementation handoff. Manifests ensure downstream orchestrators implement exactly what was planned — no missed artifacts, no naming inconsistencies.
MANDATORY: Read the manifest generation guide:
| # | Reference Path | What It Provides |
|---|---|---|
| 1 | `references/manifest-generation-guide.md` | Full manifest workflow, validation, consumption pattern |
Steps:
- Review Gold layer YAML schemas in `gold_layer_design/yaml/`
- For each plan addendum, extract the concrete artifact definitions
- Generate 4 YAML manifests using templates from `assets/templates/manifests/`:
  - `plans/manifests/semantic-layer-manifest.yaml` — TVFs, Metric Views, Genie Spaces
  - `plans/manifests/observability-manifest.yaml` — Monitors, Dashboards, Alerts
  - `plans/manifests/ml-manifest.yaml` — Feature Tables, Models, Experiments
  - `plans/manifests/genai-agents-manifest.yaml` — Agents, Tools, Eval Datasets
- For each artifact in a manifest, add `use_case_refs` listing the UC# it implements (from `plans/use-case-catalog.md`)
- Validate all table/column references exist in Gold YAML
- Verify summary counts match actual artifact counts
- Run `python scripts/validate_use_case_coverage.py plans/use-case-catalog.md` to verify coverage
- Commit manifests alongside plan documents
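The table-reference validation step can be sketched as follows, assuming the manifests and Gold YAML have already been parsed into Python structures. The manifest shape here is illustrative, not the real template format:

```python
def undefined_table_refs(manifest, gold_tables):
    """Return {artifact_name: [unknown tables]} for references missing from Gold."""
    missing = {}
    for artifact in manifest.get("artifacts", []):
        unknown = [t for t in artifact.get("tables", []) if t not in gold_tables]
        if unknown:
            missing[artifact["name"]] = unknown
    return missing

gold = {"fact_bookings", "dim_property", "dim_date"}
manifest = {"artifacts": [
    {"name": "get_revenue_by_property", "tables": ["fact_bookings", "dim_property"]},
    {"name": "get_churn_by_segment", "tables": ["fact_churn"]},  # not in Gold
]}
print(undefined_table_refs(manifest, gold))  # flags fact_churn
```

An empty result means every artifact traces to a real Gold table, which is the precondition for committing the manifests.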
Key principle: Every artifact in a manifest MUST trace back to (a) a Gold layer table and (b) a business question from the plan addendum.
Output Structure:
plans/
├── use-case-catalog.md # Consolidated use case definitions
├── manifests/
│ ├── semantic-layer-manifest.yaml # → consumed by semantic-layer/00-*
│ ├── observability-manifest.yaml # → consumed by monitoring/00-*
│ ├── ml-manifest.yaml # → consumed by ml/00-*
│ └── genai-agents-manifest.yaml # → consumed by genai-agents/00-*
Downstream consumption: Each downstream orchestrator (stages 6-9) has a Phase 0: Read Plan step that reads its manifest. If the manifest doesn't exist (e.g., user skipped Planning), the orchestrator falls back to self-discovery from Gold tables.
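The Phase 0 consumption pattern reduces to a read-or-fallback check. A minimal sketch, where the path and return convention are assumptions about how a downstream orchestrator might be wired:

```python
from pathlib import Path

def load_plan(manifest_path):
    """Return ("manifest", text) when the contract exists; otherwise signal
    fallback to self-discovery from Gold tables."""
    p = Path(manifest_path)
    if p.exists():
        return ("manifest", p.read_text())  # implement exactly what was planned
    return ("self-discovery", None)         # user skipped Planning; derive from Gold

mode, _ = load_plan("plans/manifests/semantic-layer-manifest.yaml")
```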
Plan Structure Framework
Standard Project Phases
plans/
├── README.md # Index and overview
├── use-case-catalog.md # Consolidated use case definitions
├── prerequisites.md # Bronze/Silver/Gold summary (optional)
├── phase1-use-cases.md # Analytics artifacts (master)
│ ├── phase1-addendum-1.1-ml-models.md
│ ├── phase1-addendum-1.2-tvfs.md
│ ├── phase1-addendum-1.3-metric-views.md
│ ├── phase1-addendum-1.4-lakehouse-monitoring.md
│ ├── phase1-addendum-1.5-aibi-dashboards.md
│ ├── phase1-addendum-1.6-genie-spaces.md
│ └── phase1-addendum-1.7-alerting.md
├── phase2-agent-framework.md # AI Agents
├── phase3-frontend-app.md # User Interface
└── manifests/ # Machine-readable contracts
├── semantic-layer-manifest.yaml # → semantic-layer/00-*
├── observability-manifest.yaml # → monitoring/00-*
├── ml-manifest.yaml # → ml/00-*
└── genai-agents-manifest.yaml # → genai-agents/00-*
Phase Dependencies
Prerequisites (Bronze → Silver → Gold) → Phase 1 (Use Cases) → Phase 2 (Agents) → Phase 3 (Frontend)
            [COMPLETE]                          ↓
                                          All Addendums
Agent Domain Framework
Core Principle
ALL artifacts across ALL phases MUST be organized by Agent Domain. This ensures:
- Consistent categorization across 100+ artifacts
- Clear ownership by future AI agents
- Easy discoverability for users
- Aligned tooling for each domain
Agent Domain Application
Every artifact (TVF, Metric View, Dashboard, Alert, ML Model, Monitor, Genie Space) must:
- Be tagged with its Agent Domain
- Use the domain's Gold tables
- Answer domain-specific questions
- Be grouped with related domain artifacts in documentation
Example Pattern:
## {Domain}: get_{metric}_by_{dimension}
**Agent Domain:** {Domain}
**Gold Tables:** `fact_{entity}`, `dim_{entity}`
**Business Questions:** "What are the top {metric} by {dimension}?"
See Industry Domain Patterns for domain templates by industry.
Agent Layer Architecture Pattern
Core Principle: Agents Use Genie Spaces as Query Interface
AI Agents DO NOT query data assets directly. Instead, they use Genie Spaces as their natural language query interface. Genie Spaces translate natural language to SQL and route to appropriate tools.
USERS (Natural Language)
↓
PHASE 2: AI AGENT LAYER (LangChain/LangGraph)
├── Orchestrator Agent (intent classification)
└── Specialized Agents (1 per domain)
↓
PHASE 1.6: GENIE SPACES (NL Query Execution)
├── {Domain 1} Intelligence Genie Space
├── {Domain 2} Intelligence Genie Space
└── Unified {Project} Monitor
↓
PHASE 1: DATA ASSETS (Agent Tools)
├── Metric Views (pre-aggregated - use FIRST)
├── TVFs (parameterized queries)
├── ML Predictions (ML-powered insights)
└── Lakehouse Monitors (drift detection)
↓
PREREQUISITES: GOLD LAYER (Foundation)
Deployment Order (Critical!)
Genie Spaces MUST be deployed BEFORE agents can use them.
Phase 1.1-1.5 (Data Assets) → Phase 1.6 (Genie Spaces) → Phase 2 (Agents)
          ↓                              ↓                       ↓
   Build foundation             Create NL interface      Consume interface
For detailed architecture, design patterns, "Why Genie Spaces" comparison, and testing strategy, see Agent Layer Architecture.
Artifact Rationalization Framework
MANDATORY: Read references/rationalization-framework.md for complete sizing guides, decision matrices, and naming conventions.
Core Principle: Every artifact must trace to a specific business question. Do not create artifacts to fill quotas.
Critical constraints (always enforce, even without reading the reference):
- Genie Spaces: max 25 assets per space; 10-25 per space is optimal; <10 = merge spaces
- TVFs: only when Metric Views cannot answer the question (requires parameterized multi-table logic)
- Metric Views: one per distinct analytical grain, not per domain
- Domains: emerge from business questions (min 3 questions per domain); merge if >70% Gold table overlap
- Naming: `get_{domain}_{metric}` for TVFs, `{domain}_analytics_metrics` for Metric Views
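The Genie Space constraint implies that space count is a function of asset volume, not domain count. A hedged sketch of that sizing rule (the function and its merge heuristic are illustrative, not prescribed by the rationalization reference):

```python
import math

def genie_space_count(total_assets, max_per_space=25, min_per_space=10):
    """Smallest space count keeping each space within the 10-25 asset band."""
    if total_assets == 0:
        return 0
    count = math.ceil(total_assets / max_per_space)
    # If the average space would fall under the minimum, merge thin spaces.
    while count > 1 and total_assets / count < min_per_space:
        count -= 1
    return count

print(genie_space_count(48))  # → 2 spaces of ~24 assets, regardless of domain count
```

With 48 assets across five domains, this yields two consolidated spaces rather than five thin ones, matching the "fewer focused spaces" guidance.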
SQL Query Standards
ALWAYS use Gold layer tables, NEVER system tables directly. Reference pattern: `${catalog}.${gold_schema}.table_name`
- Date parameters: `STRING` type (Genie compatible), cast at query time: `CAST(start_date AS DATE)`
- SCD Type 2 joins: `LEFT JOIN dim_{entity} d ON f.{entity}_id = d.{entity}_id AND d.is_current = TRUE`
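Both standards can be shown in one query template. The sketch below builds the SQL as a string; the catalog, schema, table, and column names are placeholders for illustration, not a real deployment:

```python
def render_revenue_query(catalog, gold_schema):
    """Illustrative template: Gold-only references, STRING date parameter cast
    at query time, and an SCD Type 2 current-row join."""
    return f"""
    SELECT d.property_name, SUM(f.revenue) AS total_revenue
    FROM {catalog}.{gold_schema}.fact_bookings f
    LEFT JOIN {catalog}.{gold_schema}.dim_property d
      ON f.property_id = d.property_id AND d.is_current = TRUE
    WHERE f.booking_date >= CAST(:start_date AS DATE)  -- :start_date arrives as STRING
    GROUP BY d.property_name
    """

sql = render_revenue_query("main", "gold")
```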
Documentation Quality Standards
LLM-Friendly Comments — All artifacts must include: what it does, when to use it, example questions it answers. Pattern: `COMMENT 'LLM: Returns top N {metric}... Example questions: "What are the top 10...?"'`
Summary Tables — Every addendum must include: overview table (all artifacts with domain, dependencies, status), by-domain sections, count summary, and success criteria.
Common Mistakes to Avoid
| Mistake | Correct Approach |
|---|---|
| Querying `system.*` tables directly | Always use Gold layer: `${catalog}.${gold_schema}.fact_*` |
| Omitting Agent Domain on artifacts | Every artifact must be tagged: ## {Domain}: get_{metric} |
| Adding a TVF without cross-addendum check | Also consider: Metric View counterpart? Alert? Dashboard? |
| Using `DATE` type in TVF parameters | Use `STRING COMMENT 'Format: YYYY-MM-DD'` (Genie compatible) |
| Deploying agents before Genie Spaces | Genie Spaces MUST be deployed first — agents consume them |
| Genie Space with 25+ assets | Split by domain cohesion; each space 10-25 assets |
| One Genie Space per domain when assets are thin | Consolidate thin domains (<10 assets) into fewer spaces |
| TVF that duplicates a Metric View | TVFs only when multi-period/multi-table parameterized logic is needed |
| Forcing a fixed domain count | Let business questions determine domains — 2-3 focused > 5-6 thin |
Reference Files
- Phase Details — Full phase and addendum descriptions with deliverables
- Estimation Guide — Effort estimation, dependency management, risks
- Agent Layer Architecture — Detailed architecture, "Why Genie Spaces" comparison, design patterns, testing strategy, multi-agent query example
- Industry Domain Patterns — Domain templates for Hospitality, Retail, Healthcare, Finance, SaaS, and Databricks System Tables
- Worked Example: Wanderbricks — Complete 101-artifact project example with TVF SQL, Metric View YAML, Alert YAML
- Manifest Generation Guide — Plan-as-contract pattern: how to generate YAML manifests for downstream orchestrators
Assets
Plan Templates
- Project Plan Template — Generic phase template with SQL standards
- Prerequisites Template — Data layer summary (Bronze/Silver/Gold)
- Use Case Catalog Template — Consolidated use case definitions with business questions
- Phase 1 Use Cases Template — Master analytics artifacts
- Phase 1 TVFs Template — Table-Valued Functions addendum
- Phase 1 Alerting Template — Alerting framework addendum
- Phase 1 Genie Spaces Template — Genie Spaces addendum with Agent readiness
- Phase 2 Agent Framework Template — AI agents with Genie integration
- Phase 3 Frontend Template — User interface
- Plans README Template — plans/ folder index
Manifest Templates (Plan-as-Contract)
- Semantic Layer Manifest — TVFs, Metric Views, Genie Spaces contract
- Observability Manifest — Monitors, Dashboards, Alerts contract
- ML Manifest — Feature Tables, Models, Experiments contract
- GenAI Agents Manifest — Agents, Tools, Eval Datasets contract
Validation Checklist
Structure
- Follows standard template
- Has Overview with Status, Dependencies, Effort
- Organized by Agent Domain
- Includes code examples
- Has Success Criteria table
- Has References section
Content Quality
- All queries use Gold layer tables (not system tables)
- All artifacts tagged with Agent Domain
- LLM-friendly comments on all artifacts
- Examples use `${catalog}.${gold_schema}` variables
- Summary tables are accurate and complete
Cross-References
- Main phase document links to addendums
- Addendums link back to main phase
- Related artifacts cross-reference each other
- Dependencies are documented
Use Case Traceability
- Use case catalog exists with one entry per distinct business problem
- Every use case includes 3-5 business questions in natural language
- Every business question from domain sections maps to at least one use case
- Every artifact in addendum summaries traces back to at least one use case question
- Use case catalog cross-references addendum documents
Completeness
- Domains derived from business questions (not forced to a fixed count)
- Every TVF traces to a business question that Metric Views cannot answer
- Every Metric View covers a distinct analytical grain (no duplicates)
- Key business questions documented per domain (≥3 per domain)
- All Phase 1 addendums included
- User requirements addressed
- Reference patterns incorporated
Rationalization (Prevent Bloat)
- Each Genie Space has ≤ 25 data assets
- No Genie Space has < 10 assets (merge thin spaces)
- Genie Space count justified by asset volume (not just domain count)
- No TVF duplicates a Metric View query
- No domain has < 3 distinct business questions (merge small domains)
- Domains with >70% Gold table overlap are consolidated
Agent Layer Architecture (If Phase 2 Included)
- Agent-to-Genie Space mapping documented (1:1 recommended)
- Deployment order specified (Genie Spaces before Agents)
- Three-level testing strategy defined
- Orchestrator agent included for multi-domain coordination
- Genie Space instructions documented (become agent system prompts)
- Agent tool definitions reference Genie Spaces (not direct SQL)
Key Learnings
- Agent Domain framework provides consistent organization across all artifacts — every artifact gets a domain tag
- Gold layer references only — never query `system.*` tables directly; use `${catalog}.${gold_schema}.*`
- Cross-addendum updates — user requirements span multiple addendums; update all affected documents
- LLM-friendly comments are critical for Genie/AI/BI integration — include example questions
- Agents use Genie Spaces as abstraction — agents don't write SQL; Genie handles NL-to-SQL translation, optimization, and guardrails
- 1:1 Agent-to-Genie mapping recommended; Orchestrator agent uses Unified Genie Space for intent classification
- Deploy Genie Spaces before agents — three-level testing: assets → Genie → Agents
- Genie Space 25-asset hard limit — plan space count from total asset volume, not domain count; fewer focused spaces > many thin ones
- Rationalize before creating — every artifact must trace to a business question; TVFs only when Metric Views can't answer
- Domains emerge from data — business questions and Gold table groupings determine natural domain boundaries
References
Official Documentation
- Databricks Docs
- Unity Catalog
- Delta Live Tables
- Lakehouse Monitoring
- Metric Views
- Genie Spaces
- Model Serving
- Foundation Models (DBRX)
- Databricks System Tables
- SQL Alerts
- Table-Valued Functions
Related Skills
- databricks-table-valued-functions
- metric-views-patterns
- lakehouse-monitoring-comprehensive
- databricks-aibi-dashboards
- genie-space-patterns — Genie Space setup for agents
Agent Framework Technologies
Pipeline Progression
Previous stage: `gold/01-gold-layer-setup` — Gold layer tables and merge scripts should be complete
Next stage: After completing the project plan for remaining phases, proceed to `semantic-layer/00-semantic-layer-setup` — build Metric Views, TVFs, and Genie Spaces on top of Gold
Post-Completion: Skill Usage Summary (MANDATORY)
After completing all phases of this orchestrator, output a Skill Usage Summary reflecting what you ACTUALLY did — not a pre-written summary.
What to Include
- Every skill `SKILL.md` or `references/` file you read (via the Read tool), in the order you read them
- Which phase you were in when you read it
- Whether it was a Common, Reference, or Template file
- A one-line description of what you specifically used it for in this session
Format
| # | Phase | Skill / Reference Read | Type | What It Was Used For |
|---|---|---|---|---|
| 1 | Phase N | `path/to/SKILL.md` | Common / Reference / Template | One-line description |
Summary Footer
End with:
- Totals: X common skills, Y reference files, Z templates read across N phases
- Manifests emitted: List each manifest file generated and its artifact count
- Skipped: List any expected references or templates that you did NOT need to read, and why
- Unplanned: List any skills you read that were NOT listed in the dependency table (e.g., for troubleshooting, edge cases, or user-requested detours)