Deployment Automation Patterns

Production-grade patterns for automating GenAI agent deployment with MLflow job triggers, dataset lineage, evaluation-then-promote workflows, and proper experiment organization.

When to Use

  • Setting up CI/CD pipelines for GenAI agents
  • Automating model deployment with evaluation gates
  • Linking evaluation datasets for traceability
  • Implementing evaluation-then-promote workflows
  • Organizing MLflow experiments for agent development
  • Troubleshooting deployment job failures

Deployment Job Trigger (MODEL_VERSION_CREATED)

Deployment jobs automatically trigger when a new model version is created, enabling CI/CD workflows.

# Asset Bundle configuration
resources:
  jobs:
    deploy_agent_job:
      name: deploy-agent (serverless)
      trigger:
        type: MODEL_VERSION_CREATED
        model_name: "health_monitor_agent"
        stages: ["None"]  # Trigger on new versions
      tasks:
        - task_key: evaluate_and_deploy
          # ... evaluation and deployment logic
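
Because the trigger fires on version creation, registering a new agent version is all it takes to start the pipeline. A minimal sketch, assuming agent is your ResponsesAgent implementation (the logging details are illustrative):

import mlflow

# Registering a new version of health_monitor_agent fires the
# MODEL_VERSION_CREATED trigger configured above
with mlflow.start_run():
    mlflow.pyfunc.log_model(
        name="agent",
        python_model=agent,
        registered_model_name="health_monitor_agent",
    )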

For complete deployment job patterns, see: references/deployment-job-patterns.md


Dataset Linking Overview (mlflow.log_input)

CRITICAL: Always link evaluation datasets using mlflow.log_input() for traceability.

import mlflow
from mlflow.data import from_spark

# Load evaluation dataset
eval_df = spark.table("gold.evaluation.agent_eval_dataset")

# Link dataset to run
with mlflow.start_run():
    mlflow.log_input(
        from_spark(eval_df),
        context="evaluation"
    )
    
    # Run evaluation
    results = mlflow.genai.evaluate(...)

Why this matters:

  • Enables dataset lineage tracking
  • Links evaluation results to specific dataset versions
  • Required for production audit trails (logged inputs can be read back from the run, as sketched below)
  • Enables dataset impact analysis
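
To verify the link, the logged inputs can be read back from the run. A minimal sketch, assuming run_id is the ID of the evaluation run above:

import mlflow

# Retrieve the datasets that were logged as inputs to the run
run = mlflow.get_run(run_id)
for dataset_input in run.inputs.dataset_inputs:
    print(dataset_input.dataset.name, dataset_input.dataset.digest)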

For complete dataset lineage patterns, see: references/dataset-lineage.md


Three Experiments (dev, eval, deploy)

Organize agent development across three experiments for clear separation of concerns.

Experiment              Purpose                         Run Naming
EXPERIMENT_DEVELOPMENT  Agent development and testing   dev_YYYYMMDD_HHMMSS
EXPERIMENT_EVALUATION   Pre-deployment evaluation       eval_pre_deploy_YYYYMMDD_HHMMSS
EXPERIMENT_DEPLOYMENT   Production deployment tracking  deploy_YYYYMMDD_HHMMSS

EXPERIMENT_DEVELOPMENT = "/Shared/health_monitor_agent/development"
EXPERIMENT_EVALUATION = "/Shared/health_monitor_agent/evaluation"
EXPERIMENT_DEPLOYMENT = "/Shared/health_monitor_agent/deployment"
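
Each stage sets its experiment before starting runs; mlflow.set_experiment creates the experiment if it does not already exist. A minimal sketch:

import mlflow

# Point the evaluation stage at its dedicated experiment
mlflow.set_experiment(EXPERIMENT_EVALUATION)
with mlflow.start_run(run_name="eval_pre_deploy_20260206_143000"):
    ...  # evaluation logic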

For complete experiment organization patterns, see: references/experiment-organization.md


Run Naming Conventions

ALWAYS use consistent run naming for programmatic querying and CI/CD integration.

from datetime import datetime

# Development runs
run_name = f"dev_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

# Evaluation runs
run_name = f"eval_pre_deploy_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

# Deployment runs
run_name = f"deploy_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

Why this matters:

  • Enables automated threshold checking
  • CI/CD pipelines can query the latest results by run name (see the sketch below)
  • Clear audit trail for deployments
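
With consistent prefixes, a pipeline can fetch the most recent evaluation run by name. A minimal sketch, assuming the experiment paths defined above:

import mlflow

# Latest pre-deployment evaluation run, newest first
latest = mlflow.search_runs(
    experiment_names=[EXPERIMENT_EVALUATION],
    filter_string="attributes.run_name LIKE 'eval_pre_deploy_%'",
    order_by=["attributes.start_time DESC"],
    max_results=1,
)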

Promotion Workflow Overview

The evaluation-then-promote pattern ensures that only models meeting quality thresholds reach production.

import mlflow
from mlflow import MlflowClient
from pyspark.sql import DataFrame


class DeploymentThresholdError(Exception):
    """Raised when evaluation results do not meet promotion thresholds."""


def evaluate_and_promote(model_uri: str, model_version: int, eval_dataset: DataFrame):
    """
    Evaluate model, then promote if thresholds met.
    """
    # Step 1: Run evaluation (predict_fn wraps the agent loaded from
    # model_uri; scorers are configured elsewhere in the pipeline)
    results = mlflow.genai.evaluate(
        data=eval_dataset,
        predict_fn=predict_fn,
        scorers=scorers
    )

    # Step 2: Check thresholds
    if check_thresholds(results):
        # Step 3: Promote by moving the "production" alias to this version
        MlflowClient().set_registered_model_alias(
            name="health_monitor_agent",
            alias="production",
            version=model_version
        )
    else:
        raise DeploymentThresholdError("Evaluation thresholds not met")
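
check_thresholds is left to the pipeline; a minimal sketch, assuming results.metrics is a dict of aggregated scores (the metric names and floors are illustrative):

def check_thresholds(results) -> bool:
    """Return True when every required metric clears its floor."""
    floors = {"correctness/mean": 0.8, "safety/mean": 0.95}  # illustrative values
    return all(
        results.metrics.get(metric, 0.0) >= floor
        for metric, floor in floors.items()
    )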

For complete promotion patterns, see: references/model-promotion.md


Validation Checklist

Before deploying automated deployment workflows:

Deployment Job Configuration

  • Job configured with MODEL_VERSION_CREATED trigger
  • Model name matches registered model name
  • Stages configured correctly (typically ["None"])
  • Job runs in serverless environment

Dataset Lineage

  • mlflow.log_input() used for evaluation datasets
  • from_spark() used for Spark DataFrames
  • Context set to "evaluation"
  • Evaluation dataset stored in Unity Catalog

Experiment Organization

  • Three experiments created (dev, eval, deploy)
  • Run naming conventions followed
  • Standard tags applied to runs
  • Experiment paths use /Shared/ prefix

Promotion Workflow

  • Threshold checking implemented
  • Alias management configured (champion, production, staging)
  • Error handling for threshold failures
  • Deployment logging implemented

Reference Files

  • references/deployment-job-patterns.md - Complete deployment job flow and trigger configuration
  • references/dataset-lineage.md - mlflow.log_input() patterns and dataset tracking
  • references/experiment-organization.md - Three-experiment structure and run naming
  • references/model-promotion.md - Alias management and promotion logic
  • assets/templates/deployment-job.yml - Asset Bundle YAML template for deployment jobs

References

Related Skills

  • mlflow-genai-evaluation - Agent evaluation patterns
  • responses-agent-patterns - ResponsesAgent implementation
  • databricks-asset-bundles - Asset Bundle configuration patterns

Version History

Date         Changes
Feb 6, 2026  Initial version: Deployment automation with dataset lineage