---
name: deployment-automation
---

# Deployment Automation Patterns

Production-grade patterns for automating GenAI agent deployment with MLflow job triggers, dataset lineage, evaluation-then-promote workflows, and proper experiment organization.
## When to Use
- Setting up CI/CD pipelines for GenAI agents
- Automating model deployment with evaluation gates
- Linking evaluation datasets for traceability
- Implementing evaluation-then-promote workflows
- Organizing MLflow experiments for agent development
- Troubleshooting deployment job failures
## Deployment Job Trigger (`MODEL_VERSION_CREATED`)
Deployment jobs automatically trigger when a new model version is created, enabling CI/CD workflows.
```yaml
# Asset Bundle configuration
resources:
  jobs:
    deploy_agent_job:
      name: deploy-agent (serverless)
      trigger:
        type: MODEL_VERSION_CREATED
        model_name: "health_monitor_agent"
        stages: ["None"]  # Trigger on new versions
      tasks:
        - task_key: evaluate_and_deploy
          # ... evaluation and deployment logic
```
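When the trigger fires, the first task needs to know which version was just created. A minimal sketch of the task's entry point, assuming the job receives the triggering version's coordinates as job parameters (the parameter names `model_name` and `model_version` are illustrative, not fixed by the trigger):

```python
# Sketch of the evaluate_and_deploy task's entry point (Databricks notebook).
# ASSUMPTION: the job passes the triggering version's coordinates as job
# parameters named model_name / model_version; adjust to your job definition.
model_name = dbutils.widgets.get("model_name")        # e.g. "health_monitor_agent"
model_version = dbutils.widgets.get("model_version")  # e.g. "7"

# URI of the version that triggered this run; hand this to the evaluation step
model_uri = f"models:/{model_name}/{model_version}"
print(f"Evaluating newly created version: {model_uri}")
```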
For complete deployment job patterns, see: `references/deployment-job-patterns.md`
## Dataset Linking Overview (`mlflow.log_input`)
**CRITICAL:** Always link evaluation datasets using `mlflow.log_input()` for traceability.
```python
import mlflow
from mlflow.data import from_spark

# Load evaluation dataset
eval_df = spark.table("gold.evaluation.agent_eval_dataset")

# Link dataset to run
with mlflow.start_run():
    mlflow.log_input(
        from_spark(eval_df),
        context="evaluation",
    )

    # Run evaluation
    results = mlflow.genai.evaluate(...)
```
**Why this matters:**
- Enables dataset lineage tracking
- Links evaluation results to specific dataset versions
- Required for production audit trails
- Enables dataset impact analysis (see the read-back sketch below)
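Once a dataset is logged as an input, the lineage can be read back from the run for audits. A minimal sketch, assuming MLflow's run-inputs API and the `mlflow.data.context` input tag (the run ID is a placeholder):

```python
import mlflow

# Read back the datasets linked to a past evaluation run
run = mlflow.get_run("<run_id>")  # placeholder run ID
for dataset_input in run.inputs.dataset_inputs:
    # The context passed to log_input() is stored as an input tag
    context = [t.value for t in dataset_input.tags if t.key == "mlflow.data.context"]
    print(dataset_input.dataset.name, dataset_input.dataset.digest, context)
```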
For complete dataset lineage patterns, see: `references/dataset-lineage.md`
## Three Experiments (dev, eval, deploy)
Organize agent development across three experiments for clear separation of concerns.
| Experiment | Purpose | Run Naming |
|---|---|---|
| `EXPERIMENT_DEVELOPMENT` | Agent development and testing | `dev_YYYYMMDD_HHMMSS` |
| `EXPERIMENT_EVALUATION` | Pre-deployment evaluation | `eval_pre_deploy_YYYYMMDD_HHMMSS` |
| `EXPERIMENT_DEPLOYMENT` | Production deployment tracking | `deploy_YYYYMMDD_HHMMSS` |
```python
EXPERIMENT_DEVELOPMENT = "/Shared/health_monitor_agent/development"
EXPERIMENT_EVALUATION = "/Shared/health_monitor_agent/evaluation"
EXPERIMENT_DEPLOYMENT = "/Shared/health_monitor_agent/deployment"
```
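A minimal sketch of wiring these paths into a run; `mlflow.set_experiment()` creates the experiment on first use, so no separate setup step is needed (the run name and parameter are illustrative):

```python
import mlflow

# Point the fluent API at the phase-appropriate experiment
# (created automatically on first use)
mlflow.set_experiment(EXPERIMENT_DEVELOPMENT)

with mlflow.start_run(run_name="dev_20260206_143000"):  # naming covered below
    mlflow.log_param("agent_version", "0.1.0")  # illustrative parameter
```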
For complete experiment organization patterns, see: `references/experiment-organization.md`
## Run Naming Conventions
**ALWAYS** use consistent run naming for programmatic querying and CI/CD integration.
```python
from datetime import datetime

# Development runs
run_name = f"dev_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

# Evaluation runs
run_name = f"eval_pre_deploy_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

# Deployment runs
run_name = f"deploy_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
```
**Why this matters:**
- Enables automated threshold checking
- CI/CD pipelines can query the latest results (see the sketch below)
- Clear audit trail for deployments
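For example, a CI/CD gate can fetch the most recent pre-deployment evaluation by name prefix. A minimal sketch using `mlflow.search_runs()`, assuming the evaluation experiment path from above:

```python
import mlflow

# Most recent pre-deployment evaluation run, found by run-name prefix
runs = mlflow.search_runs(
    experiment_names=["/Shared/health_monitor_agent/evaluation"],
    filter_string="attributes.run_name LIKE 'eval_pre_deploy_%'",
    order_by=["attributes.start_time DESC"],
    max_results=1,
)
latest = runs.iloc[0]  # pandas row; metrics appear as "metrics.<name>" columns
```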
## Promotion Workflow Overview
The evaluation-then-promote pattern ensures that only models meeting quality thresholds reach production.
```python
import mlflow
from pyspark.sql import DataFrame


def evaluate_and_promote(model_uri: str, model_version: int, eval_dataset: DataFrame):
    """Evaluate a model version, then promote it if thresholds are met."""
    # Step 1: Run evaluation
    results = mlflow.genai.evaluate(
        model=model_uri,
        data=eval_dataset,
        evaluators=evaluators,  # configured elsewhere
    )

    # Step 2: Check thresholds (see sketch below)
    if check_thresholds(results):
        # Step 3: Promote to production by moving the alias to this version
        client = mlflow.MlflowClient()
        client.set_registered_model_alias(
            name="health_monitor_agent",
            alias="production",
            version=model_version,
        )
    else:
        raise DeploymentThresholdError("Evaluation thresholds not met")
```
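The helpers referenced above are left to the caller. A minimal sketch of one possible `check_thresholds` and the custom exception; the metric names and cutoffs are assumptions, not values prescribed by `mlflow.genai.evaluate()`:

```python
# ASSUMPTION: results.metrics is a dict of aggregate scores; the metric names
# and minimums below are illustrative and should match your configured scorers.
THRESHOLDS = {
    "correctness/mean": 0.8,
    "safety/mean": 0.95,
}


class DeploymentThresholdError(Exception):
    """Raised when evaluation metrics fall below the deployment gate."""


def check_thresholds(results) -> bool:
    # Every gated metric must meet or beat its minimum
    return all(
        results.metrics.get(metric, 0.0) >= minimum
        for metric, minimum in THRESHOLDS.items()
    )
```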
For complete promotion patterns, see: `references/model-promotion.md`
## Validation Checklist
Before rolling out automated deployment workflows, verify:
### Deployment Job Configuration

- [ ] Job configured with `MODEL_VERSION_CREATED` trigger
- [ ] Model name matches registered model name
- [ ] Stages configured correctly (typically `["None"]`)
- [ ] Job runs in serverless environment

### Dataset Lineage

- [ ] `mlflow.log_input()` used for evaluation datasets
- [ ] `from_spark()` used for Spark DataFrames
- [ ] Context set to `"evaluation"`
- [ ] Evaluation dataset stored in Unity Catalog

### Experiment Organization

- [ ] Three experiments created (dev, eval, deploy)
- [ ] Run naming conventions followed
- [ ] Standard tags applied to runs
- [ ] Experiment paths use `/Shared/` prefix

### Promotion Workflow

- [ ] Threshold checking implemented
- [ ] Alias management configured (champion, production, staging)
- [ ] Error handling for threshold failures
- [ ] Deployment logging implemented
## Reference Files

- `references/deployment-job-patterns.md` - Complete deployment job flow and trigger configuration
- `references/dataset-lineage.md` - `mlflow.log_input()` patterns and dataset tracking
- `references/experiment-organization.md` - Three-experiment structure and run naming
- `references/model-promotion.md` - Alias management and promotion logic
- `assets/templates/deployment-job.yml` - Asset Bundle YAML template for deployment jobs
## Related Skills
- `mlflow-genai-evaluation` - Agent evaluation patterns
- `responses-agent-patterns` - ResponsesAgent implementation
- `databricks-asset-bundles` - Asset Bundle configuration patterns
## Version History
| Date | Changes |
|---|---|
| Feb 6, 2026 | Initial version: Deployment automation with dataset lineage |