# Galileo Python SDK
The Galileo Python SDK (`galileo`) provides a unified interface for the Galileo AI platform — enabling evaluation, observability, and runtime guardrails for GenAI applications. It supports automatic tracing of LLM calls, custom span logging, evaluation experiments, and production-grade guardrails.
## SDK Version Detection

Check installed versions before writing any code to pick the right reference:

```python
import importlib.metadata, importlib.util

galileo_ver = importlib.metadata.version("galileo")  # e.g. "2.1.1"
pq_installed = importlib.util.find_spec("promptquality") is not None
pq_ver = importlib.metadata.version("promptquality") if pq_installed else None
print(f"galileo={galileo_ver}, promptquality={pq_ver}")
```
| Installed stack | Use |
|---|---|
| `galileo >= 2.0` (with or without `promptquality` 0.x) | This skill — `GalileoLogger`, `@log`, `galileo_context` |
| `galileo < 2.0` + `promptquality >= 1.0` | Promptquality 1.x Reference |
> **Note:** `promptquality >= 1.0` and `galileo >= 2.0` are mutually incompatible — they require different major versions of `galileo-core`. Installing both will cause dependency conflicts.
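The selection logic in the table above can be expressed as a small helper; a plain-Python sketch (the function name `pick_reference` is ours, not part of the SDK):

```python
from typing import Optional

def pick_reference(galileo_ver: Optional[str], pq_ver: Optional[str]) -> str:
    """Map installed package versions to the reference doc to follow."""
    gal_major = int(galileo_ver.split(".")[0]) if galileo_ver else None
    pq_major = int(pq_ver.split(".")[0]) if pq_ver else None
    if gal_major is not None and gal_major >= 2:
        # galileo 2.x wins regardless of a promptquality 0.x install
        return "this skill (GalileoLogger, @log, galileo_context)"
    if pq_major is not None and pq_major >= 1:
        return "Promptquality 1.x Reference"
    return "unknown stack: check installed versions"
```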
Additional references:
- Framework Integrations — OpenAI, Anthropic, LangChain, LangGraph, CrewAI, PydanticAI, and more
- Guardrail Metrics Reference — Hallucination Index, Context Adherence, Toxicity, PII, and all available metrics
- Advanced Evaluation Patterns — Experiments, eval sets, prompt optimization, and scoring
- Promptquality 1.x Reference — EvaluateRun, Scorers, ScorersConfiguration for the galileo 1.x stack
## Installation

```bash
pip install galileo
```

For evaluation features with the legacy prompt engineering interface:

```bash
pip install promptquality
```

For runtime guardrails:

```bash
pip install galileo-protect
```
## Quick Start

```python
import os

from galileo import galileo_context
from galileo.openai import openai

galileo_context.init(project="my-project", log_stream="my-log-stream")

client = openai.OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Explain quantum computing in one sentence."}],
    model="gpt-4o",
)
print(response.choices[0].message.content)

galileo_context.flush()
```
## Authentication

Set the following environment variables:

```bash
# .env file or shell environment
GALILEO_API_KEY="your-api-key"               # Required — from Galileo console
GALILEO_CONSOLE_URL="https://app.galileo.ai" # Console URL (or self-hosted URL)
GALILEO_PROJECT="my-project"                 # Optional — default project
GALILEO_LOG_STREAM="my-log-stream"           # Optional — default log stream
GALILEO_LOGGING_DISABLED="false"             # Optional — set to "true" to disable logging
```
For the legacy promptquality package, authenticate programmatically:

```python
import promptquality as pq

pq.login("https://app.galileo.ai")
```
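Missing credentials tend to surface as confusing errors later, so it can help to fail fast at startup. A minimal pre-flight check (the helper name is ours, not an SDK API):

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ("GALILEO_API_KEY", "GALILEO_CONSOLE_URL")

def missing_galileo_env(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return names of required Galileo variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

# Example: raise early instead of failing mid-request
missing = missing_galileo_env()
if missing:
    print(f"Warning: missing Galileo config: {', '.join(missing)}")
```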
## Observability and Tracing

### Initializing the Galileo Context

```python
from galileo import galileo_context

galileo_context.init(project="my-project", log_stream="my-log-stream")
```
### Wrapped OpenAI Client (Auto-Logging)

Import the Galileo-wrapped OpenAI client to automatically trace all calls:

```python
from galileo.openai import openai

client = openai.OpenAI()
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    model="gpt-4o",
)
```
### The `@log` Decorator

Use `@log` to create spans for your functions. Supported span types: `workflow`, `llm`, `retriever`, `tool`.

```python
from galileo import log

@log
def my_workflow():
    result = call_openai()
    return result

@log(span_type="retriever")
def retrieve_documents(query: str):
    docs = vector_store.search(query)
    return docs

@log(span_type="tool")
def search_web(query: str):
    return web_api.search(query)
```
### Nested Workflows

```python
from galileo import log
from galileo.openai import openai  # wrapped client so the LLM call is traced

@log
def agent_pipeline(user_input: str):
    context = retrieve_documents(user_input)
    tool_result = search_web(user_input)
    response = generate_response(user_input, context, tool_result)
    return response

@log(span_type="retriever")
def retrieve_documents(query: str):
    return ["doc1", "doc2"]

@log(span_type="tool")
def search_web(query: str):
    return "search result"

@log
def generate_response(query: str, context: list, tool_result: str):
    client = openai.OpenAI()
    return client.chat.completions.create(
        messages=[{"role": "user", "content": query}],
        model="gpt-4o",
    )
```
### Context Manager

Scope logging to a specific block and auto-flush on exit:

```python
from galileo import galileo_context

with galileo_context(project="my-project", log_stream="my-log-stream"):
    result = my_workflow()
    print(result)
```
### Flushing Traces

Upload captured traces to Galileo:

```python
galileo_context.flush()
```
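Since a flush performs network I/O, it can fail; in production you usually want logging failures contained rather than crashing the request. A small wrapper sketch (the helper name `safe_flush` is ours; pass it `galileo_context.flush`):

```python
import logging

log = logging.getLogger(__name__)

def safe_flush(flush_fn) -> bool:
    """Call a flush function, swallowing errors so a logging failure
    never takes down the application. Returns True on success."""
    try:
        flush_fn()
        return True
    except Exception:
        log.exception("Galileo flush failed; traces may be lost")
        return False
```

Usage: `safe_flush(galileo_context.flush)` at the end of a request handler or batch job.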
## Evaluation

### Running Experiments with promptquality

```python
import promptquality as pq

pq.login("https://app.galileo.ai")

template = "Explain {{topic}} to me like I'm a 5 year old"
data = {"topic": ["Quantum Physics", "Politics", "Large Language Models"]}

pq.run(
    project_name="my-first-project",
    template=template,
    dataset=data,
    settings=pq.Settings(
        model_alias="ChatGPT (16K context)",
        temperature=0.8,
        max_tokens=400,
    ),
)
```
### Evaluation Runs with Custom Workflows (galileo 2.x)

Use `GalileoLogger` to log traces for evaluation:

```python
from galileo import GalileoLogger

logger = GalileoLogger(project="my_project", log_stream="my_run")
eval_set = ["What are hallucinations?", "What are intrinsic hallucinations?"]

for input_text in eval_set:
    output = llm.call(input_text)  # your model call here
    logger.add_single_llm_span_trace(
        input=input_text,
        output=output,
        model="gpt-4o",
    )

logger.flush()
```
For the `galileo < 2.0` + `promptquality >= 1.0` stack, use `EvaluateRun` — see Promptquality 1.x Reference.
See Advanced Evaluation Patterns for more.
## Guardrails / Protect

### Creating a Protection Stage

```python
from galileo import GalileoMetrics
from galileo.stages import create_protect_stage
from galileo_core.schemas.protect.rule import Rule, RuleOperator
from galileo_core.schemas.protect.ruleset import Ruleset
from galileo_core.schemas.protect.stage import StageType

rule = Rule(
    metric=GalileoMetrics.input_toxicity,
    operator=RuleOperator.gt,
    target_value=0.1,
)
ruleset = Ruleset(rules=[rule])

stage = create_protect_stage(
    name="toxicity-guard",
    stage_type=StageType.central,
    prioritized_rulesets=[ruleset],
    description="Block toxic input.",
)
```
### Invoking Runtime Protection

```python
from galileo.protect import invoke_protect, ainvoke_protect
from galileo_core.schemas.protect.payload import Payload

payload = Payload(input="User message to check.")
response = invoke_protect(payload=payload, stage_name="toxicity-guard")

# Async variant
response = await ainvoke_protect(payload=payload, stage_name="toxicity-guard")
```
### Stage Types
- Central stages — Created and managed by governance teams; rulesets defined at creation time
- Local stages — Created without rulesets; rulesets supplied at runtime by application teams
See Guardrail Metrics Reference for all available metrics.
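Conceptually, a rule such as the toxicity guard above fires when a metric score crosses its threshold. A pure-Python illustration of that comparison semantics (this is not the SDK's implementation, just a model of how an operator/target rule evaluates):

```python
def rule_triggered(score: float, operator: str, target: float) -> bool:
    """Does `score <operator> target` hold? Models threshold-rule semantics."""
    ops = {
        "gt": lambda s, t: s > t,
        "gte": lambda s, t: s >= t,
        "lt": lambda s, t: s < t,
        "lte": lambda s, t: s <= t,
        "eq": lambda s, t: s == t,
    }
    return ops[operator](score, target)

# A toxicity score of 0.4 against the "gt 0.1" rule above would trigger the guard.
```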
## Common Patterns

### Multi-Turn Conversations

```python
from galileo import log
from galileo.openai import openai

client = openai.OpenAI()

@log
def chat(messages: list):
    response = client.chat.completions.create(
        messages=messages,
        model="gpt-4o",
    )
    return response.choices[0].message.content

messages = []
messages.append({"role": "user", "content": "What is RAG?"})
reply = chat(messages)
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "How do I implement it?"})
reply = chat(messages)
```
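One practical concern with multi-turn loops like this is that the history grows without bound and eventually exceeds the model's context window. A small trimming sketch (the helper name `trim_history` is ours, not an SDK API):

```python
from typing import Dict, List

Message = Dict[str, str]

def trim_history(messages: List[Message], max_turns: int) -> List[Message]:
    """Keep any system messages plus the last `max_turns` non-system
    messages, so long conversations stay within the context window."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + (rest[-max_turns:] if max_turns > 0 else [])
```

Calling `chat(trim_history(messages, 20))` keeps each traced request bounded.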
### RAG Pipeline with Retriever Spans

```python
from galileo import log
from galileo.openai import openai

client = openai.OpenAI()

@log(span_type="retriever")
def retrieve(query: str):
    # assumes a configured vector store client (e.g. vector_db)
    results = vector_db.similarity_search(query, k=5)
    return [doc.page_content for doc in results]

@log
def rag_pipeline(question: str):
    context = retrieve(question)
    prompt = f"Context: {context}\n\nQuestion: {question}"
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model="gpt-4o",
    )
    return response.choices[0].message.content
```
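Interpolating the raw list into the f-string above works, but models generally do better with clearly delimited chunks. A prompt-building sketch (the helper name `build_rag_prompt` is ours):

```python
from typing import List

def build_rag_prompt(question: str, docs: List[str]) -> str:
    """Join retrieved chunks into a numbered context block ahead of the question."""
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(docs))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Numbered chunks also make it easier to ask the model to cite which chunk supported its answer.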
### Agent Tool Calling

```python
from galileo import log

@log(span_type="tool")
def math_operation(a: float, b: float, op: str) -> str:
    if op == "add":
        return str(a + b)
    elif op == "multiply":
        return str(a * b)
    raise ValueError(f"Unknown op: {op}")

@log(span_type="tool")
def web_search(query: str):
    return search_api.query(query)

@log
def agent(user_input: str):
    plan = plan_actions(user_input)  # assumes a planner returning actions
    results = []
    for action in plan:
        if action.tool == "math_operation":
            results.append(math_operation(*action.args))  # e.g. (2, 3, "add")
        elif action.tool == "web_search":
            results.append(web_search(action.input))
    return synthesize(results)
```
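The if/elif dispatch above gets unwieldy as the tool set grows. A registry keyed by tool name scales better; a sketch (the `tool` decorator and `dispatch` helper are ours, and in practice you would combine them with `@log(span_type="tool")` as shown above):

```python
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}

def tool(name: str):
    """Register a function in the dispatch table under a tool name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register

@tool("math.add")
def add(a: float, b: float) -> float:
    return a + b

def dispatch(name: str, *args: Any) -> Any:
    """Look up and call a registered tool; raises KeyError for unknown names."""
    return TOOLS[name](*args)
```

The agent loop then becomes `results.append(dispatch(action.tool, *action.args))` with no per-tool branching.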
## Best Practices

- Always set environment variables for `GALILEO_API_KEY` and `GALILEO_CONSOLE_URL` rather than hardcoding credentials.
- Organize projects and log streams by application, environment, or team to keep traces manageable.
- Call `galileo_context.flush()` at the end of each request or batch to ensure traces are uploaded. In web servers, flush at the end of each request handler.
- Use the context manager (`with galileo_context(...)`) for scoped logging that auto-flushes on exit.
- Use specific span types (`retriever`, `tool`, `llm`, `workflow`) to get the most out of Galileo's trace visualization.
- Handle errors gracefully — wrap `flush()` calls in try/except to prevent logging failures from crashing your application.
- Use the wrapped OpenAI client (`from galileo.openai import openai`) for zero-config automatic tracing of all OpenAI calls.
- Leverage guardrail metrics in production to catch hallucinations, toxic content, and PII before they reach end users.
## Resources
- Documentation: https://docs.galileo.ai
- Python SDK repo: https://github.com/rungalileo/galileo-python
- SDK examples: https://github.com/rungalileo/sdk-examples
- PyPI: https://pypi.org/project/galileo/
- Galileo console: https://app.galileo.ai