mlflow-genai-foundation
MLflow GenAI Foundation Patterns
When to Use
Use this skill when:
- Creating new Databricks GenAI agents
- Implementing MLflow tracing for agents
- Setting up agent evaluation pipelines
- Managing prompts with MLflow Prompt Registry
- Troubleshooting AI Playground compatibility issues
- Understanding foundational MLflow GenAI concepts
⚠️ CRITICAL: ResponsesAgent is MANDATORY for AI Playground
Databricks recommends ResponsesAgent over ChatAgent for all new agents.
Without proper model signatures, your agent will NOT work in AI Playground, Agent Evaluation, or Mosaic AI features.
Key Points:
- `ResponsesAgent` automatically infers compatible model signatures
- Manual signatures break AI Playground compatibility
- Use the `input` key (not `messages`) in input examples
- Return `ResponsesAgentResponse` objects (not dicts)
See: responses-agent-patterns skill for complete implementation guide.
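The `input`-versus-`messages` point can be made concrete with a small conversion helper. This is a minimal sketch rather than an MLflow API: `to_responses_input` is a hypothetical name, and it assumes a legacy payload is a dict whose `messages` value is a list of role/content dicts.

```python
from typing import Any


def to_responses_input(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a payload keyed by "input", converting a legacy "messages" key.

    Hypothetical helper for illustration; it is not part of MLflow.
    """
    if "input" in payload:
        return payload  # already in the expected shape
    if "messages" in payload:
        # Move the message list under "input" and keep any other keys as-is.
        converted = {k: v for k, v in payload.items() if k != "messages"}
        converted["input"] = payload["messages"]
        return converted
    raise ValueError("payload must contain an 'input' or 'messages' key")


legacy = {"messages": [{"role": "user", "content": "What is the status?"}]}
input_example = to_responses_input(legacy)
```

The helper only renames the key. It does not validate the individual message dicts, so malformed messages would still surface later, when the agent is called.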
⚠️ CRITICAL: NO LLM Fallback for Data Queries
When an agent uses Genie Spaces or data retrieval tools, NEVER fall back to the LLM when a tool fails.
Why: an LLM fallback produces fabricated data that looks real.
Correct Pattern:
- Return explicit error messages when tools fail
- Include "I will NOT generate fake data" statement
- Log errors to trace spans for visibility
See: multi-agent-genie-orchestration skill for complete pattern.
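The correct pattern above might look roughly like the following. This is a sketch under stated assumptions: `run_data_tool` and `failing_tool` are hypothetical names, the tool is assumed to be any callable that takes the query string, and the standard `logging` module stands in for recording the error on a trace span.

```python
import logging

logger = logging.getLogger("agent.tools")

NO_FAKE_DATA_NOTICE = "I will NOT generate fake data."


def run_data_tool(tool, query: str) -> dict:
    """Call a data tool and return an explicit error payload if it fails.

    `tool` is any callable taking the query string (an assumption for this
    sketch). There is deliberately no LLM fallback branch.
    """
    try:
        return {"status": "ok", "data": tool(query)}
    except Exception as exc:
        # In a traced agent, the same details would also go on the active span.
        logger.error("data tool failed for query %r: %s", query, exc)
        return {
            "status": "error",
            "message": f"The data query failed: {exc}. {NO_FAKE_DATA_NOTICE}",
        }


def failing_tool(query: str):
    """Hypothetical tool that always fails, used to exercise the error path."""
    raise TimeoutError("Genie Space did not respond")


result = run_data_tool(failing_tool, "monthly cost by workspace")
```

The error payload carries no `data` key at all, so downstream code cannot mistake a failure for an empty but valid result.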
Model Signatures Overview
Why Signatures Matter
"Azure Databricks uses MLflow Model Signatures to define agents' input and output schema. Product features like the AI Playground assume that your agent has one of a set of supported model signatures." — Microsoft Docs
The Golden Rule
"If you follow the recommended approach to authoring agents, MLflow will automatically infer a signature for your agent that is compatible with Azure Databricks product features, with no additional work required on your part." — Microsoft Docs
What Breaks Compatibility
| Issue | Impact |
|---|---|
| Manual signature with wrong schema | ❌ AI Playground won't load |
| `PythonModel` instead of `ResponsesAgent` | ❌ No signature inference |
| `messages` input instead of `input` | ❌ Request format mismatch |
| Legacy dict output instead of `ResponsesAgentResponse` | ❌ Response parsing fails |
See Model Signatures for detailed explanation.
Tracing Fundamentals
Automatic Tracing with autolog
Enable autolog at module level for automatic tracing:
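import mlflow

# At the TOP of your main module.
# The keyword arguments autolog() accepts differ between MLflow versions;
# if one is rejected, check the signature for your installed release.
mlflow.langchain.autolog(
    log_models=True,
    log_input_examples=True,
    log_model_signatures=True,
    log_inputs=True,
)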
Manual Tracing with Decorators
Use @mlflow.trace for custom functions:
import mlflow


@mlflow.trace(name="my_function", span_type="AGENT")
def my_agent_function(query: str) -> dict:
    """Function is automatically traced."""
    result = process(query)  # process() stands in for your own logic
    return result
Manual Span Creation
For fine-grained control:
import mlflow


def complex_operation(data):
    with mlflow.start_span(name="outer_operation") as span:
        span.set_inputs({"data": data})

        # Nested span for the LLM call
        with mlflow.start_span(name="inner_step", span_type="LLM") as inner:
            inner.set_inputs({"prompt": "..."})
            result = llm.invoke(...)  # llm is your own client object
            inner.set_outputs({"response": result})

        span.set_outputs({"result": result})
        span.set_attributes({"custom_metric": 0.95})
        return result
See Tracing Patterns for complete guide.
Evaluation Basics
Built-in Scorers
import mlflow
from mlflow.genai.scorers import Guidelines, RelevanceToQuery, Safety

# evaluation_data and predict_fn (a callable wrapping the agent) are defined elsewhere
results = mlflow.genai.evaluate(
    data=evaluation_data,
    predict_fn=predict_fn,
    scorers=[
        RelevanceToQuery(),
        Safety(),
        Guidelines(guidelines=[
            "Include time context",
            "Format costs as USD",
            "Cite sources",
        ]),
    ],
)
Custom Scorers with @scorer
from mlflow.entities import Feedback
from mlflow.genai.scorers import scorer


@scorer
def custom_judge(inputs: dict, outputs: dict, expectations: dict = None) -> Feedback:
    """Custom judge for domain-specific accuracy."""
    response_text = _extract_response_text(outputs)  # Helper required
    score_value = calculate_score(response_text)  # your own scoring logic
    return Feedback(
        value=score_value,
        rationale="Explanation of score",
    )
Production Monitoring
Use mlflow.genai.assess() for real-time assessment:
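import mlflow
from mlflow.genai.scorers import RelevanceToQuery, Safety

# Confirm that mlflow.genai.assess() is available in your MLflow version.
assessment = mlflow.genai.assess(
    inputs={"query": query},
    outputs={"response": response},
    scorers=[RelevanceToQuery(), Safety()],
)

if assessment.scores["relevance"] < 0.6:
    trigger_quality_alert()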
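The single-threshold check can be generalized to several scorers. The sketch below is independent of MLflow: `find_quality_alerts` is a hypothetical helper, and the threshold values are illustrative apart from the 0.6 relevance cutoff used above.

```python
def find_quality_alerts(
    scores: dict[str, float], thresholds: dict[str, float]
) -> list[str]:
    """Return the names of scorers whose value is below its threshold."""
    alerts = []
    for name, minimum in thresholds.items():
        value = scores.get(name)
        # A missing score counts as an alert so gaps are not silently ignored.
        if value is None or value < minimum:
            alerts.append(name)
    return alerts


scores = {"relevance": 0.52, "safety": 1.0}
alerts = find_quality_alerts(scores, {"relevance": 0.6, "safety": 0.9})
```

Here `alerts` contains only `"relevance"`, since 0.52 falls below the 0.6 cutoff while the safety score clears its threshold.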
See Evaluation Basics for complete guide.
Prompt Registry Basics
Log Prompts
import mlflow

# Template variables use double braces
mlflow.genai.register_prompt(
    name="my_app_assistant_prompt",
    template="""You are a helpful assistant.

User context: {{user_context}}
Query: {{query}}""",
)
Load Prompts by Alias
# Load the prompt behind the "production" alias
prompt = mlflow.genai.load_prompt(
    "prompts:/my_app_assistant_prompt@production"
)

# Load by version
prompt_v1 = mlflow.genai.load_prompt(
    "prompts:/my_app_assistant_prompt/1"
)
Set Aliases
import mlflow

# Point the "production" alias at version 2 of the prompt
mlflow.genai.set_prompt_alias(
    name="my_app_assistant_prompt",
    alias="production",
    version=2,
)
See Prompt Registry Basics for complete guide.
Agent Logging Patterns
ResponsesAgent (Recommended)
import mlflow
from mlflow.pyfunc import ResponsesAgent

agent = MyResponsesAgent()  # your ResponsesAgent subclass

# CRITICAL: Set model before logging
mlflow.models.set_model(agent)

# Input example in ResponsesAgent format
input_example = {
    "input": [{"role": "user", "content": "What is the status?"}]
}

with mlflow.start_run():
    # DO NOT pass signature parameter - auto-inferred!
    mlflow.pyfunc.log_model(
        artifact_path="agent",
        python_model=agent,
        input_example=input_example,
        # signature=...  # ❌ NEVER include this!
        registered_model_name="my_agent",
        pip_requirements=[
            "mlflow>=3.0.0",
            "databricks-sdk>=0.28.0",
        ],
    )
See: responses-agent-patterns skill for complete implementation.
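Two of the rules above (no `signature` argument, and an `input`-keyed example) can be checked before calling `log_model()`. The following is a minimal sketch: `check_log_model_kwargs` is a hypothetical helper that inspects a plain dict of keyword arguments and does not call MLflow.

```python
def check_log_model_kwargs(kwargs: dict) -> list[str]:
    """Return a list of problems found in keyword arguments meant for log_model()."""
    problems = []
    if "signature" in kwargs:
        problems.append("remove `signature`; let MLflow infer it")

    example = kwargs.get("input_example")
    if not isinstance(example, dict):
        problems.append("provide a dict `input_example`")
    elif "input" not in example:
        problems.append("`input_example` must use the `input` key (not `messages`)")
    return problems


good_kwargs = {
    "input_example": {
        "input": [{"role": "user", "content": "What is the status?"}]
    },
    "registered_model_name": "my_agent",
}
bad_kwargs = {"signature": "manual", "input_example": {"messages": []}}

good_problems = check_log_model_kwargs(good_kwargs)
bad_problems = check_log_model_kwargs(bad_kwargs)
```

With these inputs, `good_problems` is empty and `bad_problems` has two entries, one per violated rule. Such a check could run in a unit test or just before the `log_model()` call.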
Common Mistakes Quick Reference
| Mistake | Impact | Fix |
|---|---|---|
| Using `PythonModel` instead of `ResponsesAgent` | ❌ No signature inference | Use `ResponsesAgent` |
| Manual signature definition | ❌ Breaks AI Playground | Let MLflow auto-infer |
| Using `messages` instead of `input` | ❌ Format mismatch | Use `input` key |
| Returning dict instead of `ResponsesAgentResponse` | ❌ Parsing fails | Return `ResponsesAgentResponse` |
| Missing `set_model()` before `log_model()` | ⚠️ May fail | Call `set_model()` first |
| LLM fallback for data queries | ❌ Hallucinated data | Return explicit errors |
Validation Checklist
🔴 ResponsesAgent & Model Signatures (CRITICAL)
- Agent class inherits from `mlflow.pyfunc.ResponsesAgent`
- `predict` method accepts a single `request` parameter
- `predict` returns a `ResponsesAgentResponse` object
- Input example uses the `input` key (NOT `messages`)
- NO `signature` parameter in the `log_model()` call
- Agent loads successfully in AI Playground
Tracing
- `mlflow.langchain.autolog()` enabled at module level
- All custom functions decorated with `@mlflow.trace`
- Span types specified (AGENT, LLM, TOOL, etc.)
- Inputs and outputs set for manual spans
- Traces tagged with user_id, session_id, environment
Evaluation
- Built-in scorers used where appropriate
- Custom judges return a structured score object (value and rationale)
- Evaluation metrics logged to MLflow
- Production monitoring with `assess()` implemented
Prompts
- All prompts logged to registry
- Production alias set for deployment
- Prompts loaded by alias in production code
Agent Logging
- `ResponsesAgent` interface implemented (not `ChatAgent`)
- `set_model()` called before `log_model()`
- Model registered with proper name
- Aliases set for dev/staging/production
References
- Model Signatures - Why signatures matter, AI Playground compatibility
- Tracing Patterns - autolog, decorators, manual spans
- Evaluation Basics - Built-in scorers, custom judges, production monitoring
- Prompt Registry Basics - Logging, loading, aliases
Official Documentation
- Model Signatures for Databricks Features (CRITICAL)
- MLflow GenAI Concepts
- MLflow Tracing
- MLflow Scorers
- Prompt Registry
Related Skills
- responses-agent-patterns - Complete ResponsesAgent implementation
- multi-agent-genie-orchestration - NO LLM fallback pattern
- mlflow-genai-evaluation - Advanced evaluation patterns