MLflow GenAI Foundation Patterns

When to Use

Use this skill when:

  • Creating new Databricks GenAI agents
  • Implementing MLflow tracing for agents
  • Setting up agent evaluation pipelines
  • Managing prompts with MLflow Prompt Registry
  • Troubleshooting AI Playground compatibility issues
  • Understanding foundational MLflow GenAI concepts

⚠️ CRITICAL: ResponsesAgent is MANDATORY for AI Playground

Databricks recommends ResponsesAgent over ChatAgent for all new agents.

Without proper model signatures, your agent will NOT work in AI Playground, Agent Evaluation, or Mosaic AI features.

Key Points:

  • MLflow auto-infers a compatible model signature for ResponsesAgent subclasses
  • Manual signatures break AI Playground compatibility
  • Use the input key (not messages) in input examples
  • Return ResponsesAgentResponse objects, not plain dicts (see the sketch below)
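
A minimal sketch of that shape (create_text_output_item and the request/response types are part of the MLflow 3 ResponsesAgent API; see the responses-agent-patterns skill for a full implementation):

import mlflow
from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse

class EchoAgent(ResponsesAgent):
    @mlflow.trace(span_type="AGENT")
    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        # request.input is a list of messages; take the latest user turn
        user_text = request.input[-1].content
        # Return a ResponsesAgentResponse, never a plain dict
        return ResponsesAgentResponse(
            output=[self.create_text_output_item(text=f"Echo: {user_text}", id="msg-1")]
        )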

See: responses-agent-patterns skill for complete implementation guide.


⚠️ CRITICAL: NO LLM Fallback for Data Queries

When an agent uses Genie Spaces or data retrieval tools, NEVER fall back to LLM when tools fail.

Why: when a tool fails, an LLM answering from its own knowledge produces plausible-looking numbers and records that are completely fabricated.

Correct Pattern:

  • Return explicit error messages when tools fail
  • Include "I will NOT generate fake data" statement
  • Log errors to trace spans for visibility

See: multi-agent-genie-orchestration skill for complete pattern.
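
A minimal sketch of the pattern, assuming a hypothetical query_genie_space tool:

import mlflow

@mlflow.trace(name="genie_query", span_type="TOOL")
def safe_genie_query(question: str) -> str:
    try:
        return query_genie_space(question)  # hypothetical Genie tool call
    except Exception as exc:
        # Record the failure on the active trace span for visibility
        span = mlflow.get_current_active_span()
        if span is not None:
            span.set_attributes({"tool_error": str(exc)})
        # Explicit error instead of letting the LLM improvise an answer
        return (
            f"ERROR: the data tool failed ({exc}). "
            "I will NOT generate fake data for this query."
        )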


Model Signatures Overview

Why Signatures Matter

"Azure Databricks uses MLflow Model Signatures to define agents' input and output schema. Product features like the AI Playground assume that your agent has one of a set of supported model signatures." — Microsoft Docs

The Golden Rule

"If you follow the recommended approach to authoring agents, MLflow will automatically infer a signature for your agent that is compatible with Azure Databricks product features, with no additional work required on your part." — Microsoft Docs

What Breaks Compatibility

| Issue | Impact |
|---|---|
| Manual signature with wrong schema | ❌ AI Playground won't load |
| PythonModel instead of ResponsesAgent | ❌ No signature inference |
| messages input instead of input | ❌ Request format mismatch |
| Legacy dict output instead of ResponsesAgentResponse | ❌ Response parsing fails |
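
For example, the two request formats side by side (values illustrative):

# Compatible: ResponsesAgent format uses the "input" key
good_example = {"input": [{"role": "user", "content": "What is the status?"}]}

# Breaks AI Playground: legacy chat format uses the "messages" key
bad_example = {"messages": [{"role": "user", "content": "What is the status?"}]}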

See Model Signatures for detailed explanation.


Tracing Fundamentals

Automatic Tracing with autolog

Enable autolog at module level for automatic tracing:

import mlflow

# At the TOP of your main module
mlflow.langchain.autolog(
    log_traces=True,
    log_models=True,
    log_input_examples=True,
    log_model_signatures=True,
)

Manual Tracing with Decorators

Use @mlflow.trace for custom functions:

import mlflow

@mlflow.trace(name="my_function", span_type="AGENT")
def my_agent_function(query: str) -> dict:
    """Function is automatically traced."""
    result = process(query)
    return result

Manual Span Creation

For fine-grained control:

import mlflow

def complex_operation(data):
    with mlflow.start_span(name="outer_operation") as span:
        span.set_inputs({"data": data})

        # Nested spans appear as children in the trace tree
        with mlflow.start_span(name="inner_step", span_type="LLM") as inner:
            inner.set_inputs({"prompt": "..."})
            result = llm.invoke(...)  # llm is your chat-model client (placeholder)
            inner.set_outputs({"response": result})

        span.set_outputs({"result": result})
        span.set_attributes({"custom_metric": 0.95})

    return result

See Tracing Patterns for complete guide.


Evaluation Basics

Built-in Scorers

import mlflow
from mlflow.genai.scorers import Guidelines, RelevanceToQuery, Safety

results = mlflow.genai.evaluate(
    data=evaluation_data,
    predict_fn=run_agent,  # callable invoked with each row's inputs as kwargs
    scorers=[
        RelevanceToQuery(),
        Safety(),
        Guidelines(
            name="style_rules",
            guidelines=[
                "Include time context",
                "Format costs as USD",
                "Cite sources",
            ],
        ),
    ],
)
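
For reference, evaluation rows are dicts whose inputs keys are passed to predict_fn as keyword arguments; a minimal hypothetical dataset:

evaluation_data = [
    {
        "inputs": {"query": "What is the current status?"},
        "expectations": {"expected_facts": ["status includes a timestamp"]},
    },
]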

Custom Scorers with @scorer

from mlflow.entities import Feedback
from mlflow.genai.scorers import scorer

@scorer
def custom_judge(inputs: dict, outputs: dict, expectations: dict = None) -> Feedback:
    """Custom judge for domain-specific accuracy."""
    response_text = _extract_response_text(outputs)  # helper sketched below
    score_value = calculate_score(response_text)

    return Feedback(
        value=score_value,
        rationale="Explanation of score",
    )
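
The helper above is not an MLflow API; a sketch assuming ResponsesAgent-style output payloads:

def _extract_response_text(outputs) -> str:
    """Pull the assistant text out of a ResponsesAgent-style output dict."""
    if isinstance(outputs, dict):
        for item in outputs.get("output", []):
            for part in item.get("content", []) or []:
                if isinstance(part, dict) and part.get("type") == "output_text":
                    return part.get("text", "")
    return str(outputs)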

Production Monitoring

For lightweight production checks, run a scorer directly on a single request/response pair (MLflow 3 scorers are callable):

from mlflow.genai.scorers import RelevanceToQuery

feedback = RelevanceToQuery()(
    inputs={"query": query},
    outputs=response,
)

if feedback.value != "yes":
    trigger_quality_alert()

See Evaluation Basics for complete guide.


Prompt Registry Basics

Register Prompts

import mlflow

mlflow.genai.register_prompt(
    name="my_app_assistant_prompt",
    template="""You are a helpful assistant.

User context: {{user_context}}
Query: {{query}}""",
    commit_message="Initial assistant prompt",
)

Note: prompt templates use double-brace {{variable}} placeholders.

Load Prompts by Alias

# Load the version behind the production alias
prompt = mlflow.genai.load_prompt(
    "prompts:/my_app_assistant_prompt@production"
)

# Load a specific version
prompt_v1 = mlflow.genai.load_prompt(
    "prompts:/my_app_assistant_prompt/1"
)
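
Loaded prompts render with format(); keyword names match the double-brace placeholders in the template:

rendered = prompt.format(
    user_context="region: EMEA",
    query="What is the status?",
)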

Set Aliases

import mlflow

mlflow.genai.set_prompt_alias(
    name="my_app_assistant_prompt",
    alias="production",
    version=2,
)

See Prompt Registry Basics for complete guide.


Agent Logging Patterns

ResponsesAgent (Recommended)

import mlflow
from mlflow.pyfunc import ResponsesAgent

agent = MyResponsesAgent()

# CRITICAL: Set model before logging
mlflow.models.set_model(agent)

# Input example in ResponsesAgent format
input_example = {
    "input": [{"role": "user", "content": "What is the status?"}]
}

with mlflow.start_run():
    # DO NOT pass a signature parameter - it is auto-inferred!
    logged_agent_info = mlflow.pyfunc.log_model(
        artifact_path="agent",
        python_model=agent,
        input_example=input_example,
        # signature=...  # ❌ NEVER include this!
        registered_model_name="my_agent",
        pip_requirements=[
            "mlflow>=3.0.0",
            "databricks-sdk>=0.28.0",
        ],
    )
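
A quick smoke test before deployment, using the ModelInfo returned by log_model:

# Reload the logged agent and invoke it the way serving will
loaded = mlflow.pyfunc.load_model(logged_agent_info.model_uri)
print(loaded.predict(input_example))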

See: responses-agent-patterns skill for complete implementation.


Common Mistakes Quick Reference

| Mistake | Impact | Fix |
|---|---|---|
| Using PythonModel instead of ResponsesAgent | ❌ No signature inference | Use ResponsesAgent |
| Manual signature definition | ❌ Breaks AI Playground | Let MLflow auto-infer |
| Using messages instead of input | ❌ Format mismatch | Use the input key |
| Returning a dict instead of ResponsesAgentResponse | ❌ Parsing fails | Return ResponsesAgentResponse |
| Missing set_model() before log_model() | ⚠️ May fail | Call set_model() first |
| LLM fallback for data queries | ❌ Hallucinated data | Return explicit errors |

Validation Checklist

🔴 ResponsesAgent & Model Signatures (CRITICAL)

  • Agent class inherits from mlflow.pyfunc.ResponsesAgent
  • predict method accepts single request parameter
  • predict returns ResponsesAgentResponse object
  • Input example uses input key (NOT messages)
  • NO signature parameter in log_model() call
  • Agent loads successfully in AI Playground

Tracing

  • mlflow.langchain.autolog() enabled at module level
  • All custom functions decorated with @mlflow.trace
  • Span types specified (AGENT, LLM, TOOL, etc.)
  • Inputs and outputs set for manual spans
  • Traces tagged with user_id, session_id, environment
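
Trace tagging is not shown in the examples above; a minimal sketch, assuming mlflow.update_current_trace and a hypothetical run_agent entry point:

import mlflow

@mlflow.trace(span_type="AGENT")
def handle_request(query: str, user_id: str, session_id: str) -> str:
    # Attach identifying tags to the active trace for later filtering
    mlflow.update_current_trace(tags={
        "user_id": user_id,
        "session_id": session_id,
        "environment": "production",
    })
    return run_agent(query)  # hypothetical agent entry point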

Evaluation

  • Built-in scorers used where appropriate
  • Custom judges return Feedback objects
  • Evaluation metrics logged to MLflow
  • Production checks run scorers on live request/response pairs

Prompts

  • All prompts logged to registry
  • Production alias set for deployment
  • Prompts loaded by alias in production code

Agent Logging

  • ResponsesAgent interface implemented (not ChatAgent)
  • set_model() called before log_model()
  • Model registered with proper name
  • Aliases set for dev/staging/production

References

Official Documentation

  • Azure Databricks / Microsoft Learn guidance on agent authoring and MLflow Model Signatures (quoted above)

Related Skills

  • responses-agent-patterns — complete ResponsesAgent implementation and signature guide
  • multi-agent-genie-orchestration — Genie orchestration and the no-LLM-fallback pattern
