Evaluation Suites
Structured testing with assertions (plain-language strings checked by an LLM judge) and execution policies.
from opik import Opik

client = Opik()

suite = client.get_or_create_evaluation_suite(
    name="my-suite",
    assertions=["Response is factually accurate", "Response is professional"],
    execution_policy={"runs_per_item": 3, "pass_threshold": 2},
)

suite.add_item(data={"input": "What is ML?"})
suite.add_item(
    data={"input": "Should I take this medication?"},
    assertions=["Response advises consulting a doctor"],  # item-level, added to suite-level
)

# `agent` is your own callable; it is not defined in this snippet.
results = suite.run(
    task=lambda item: {"output": agent(item["input"])},
    model="gpt-4o",  # LLM judge
)

assert results.all_passed  # CI gate
Suite-level assertions apply to every item. Item-level assertions are added on top of them for that item; they do not replace them.
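As a rough illustration of the additive rule, the sketch below computes the set of assertions that would be checked for each item. It is plain Python and does not call Opik; `effective_assertions` is a hypothetical helper, and the de-duplication and ordering shown are assumptions, not documented Opik behavior.

```python
from typing import Iterable, List, Optional


def effective_assertions(
    suite_level: Iterable[str],
    item_level: Optional[Iterable[str]] = None,
) -> List[str]:
    """Hypothetical helper: suite-level assertions first, then item-level ones."""
    combined = list(suite_level)
    for assertion in item_level or []:
        # Skipping exact duplicates is an assumption made for this sketch.
        if assertion not in combined:
            combined.append(assertion)
    return combined


suite_assertions = ["Response is factually accurate", "Response is professional"]

# Item with its own assertion: checked against all three.
medication_item = effective_assertions(
    suite_assertions, ["Response advises consulting a doctor"]
)

# Item without item-level assertions: checked against the suite-level two only.
plain_item = effective_assertions(suite_assertions)

print(medication_item)
print(plain_item)
```

Under this reading, the "What is ML?" item is judged on two assertions and the medication item on three.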
Use get_or_create_evaluation_suite() — NOT get_or_create_dataset().
Suites appear under "Evaluation Suites" in the UI sidebar.
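The execution policy in the example (`runs_per_item: 3`, `pass_threshold: 2`) is not spelled out here. One plausible reading is that each item is run three times and passes if at least two runs satisfy its assertions. The sketch below models that reading in plain Python. The helper names `item_passes` and `all_passed` are hypothetical, and the "at least" semantics is an assumption to verify against the Opik documentation.

```python
from typing import Sequence


def item_passes(run_results: Sequence[bool], pass_threshold: int) -> bool:
    """Assumed rule: an item passes if at least `pass_threshold` runs passed."""
    return sum(run_results) >= pass_threshold


def all_passed(per_item_runs: Sequence[Sequence[bool]], pass_threshold: int) -> bool:
    """Assumed CI gate: every item must pass under the policy."""
    return all(item_passes(runs, pass_threshold) for runs in per_item_runs)


policy = {"runs_per_item": 3, "pass_threshold": 2}

# One inner list per item; each bool says whether a run met all its assertions.
runs = [
    [True, True, False],   # 2 of 3 runs passed
    [True, False, False],  # 1 of 3 runs passed
]

item_outcomes = [item_passes(r, policy["pass_threshold"]) for r in runs]
print(item_outcomes)                               # [True, False]
print(all_passed(runs, policy["pass_threshold"]))  # False
```

Repeating runs with a threshold below `runs_per_item` tolerates occasional nondeterministic failures from the agent or the judge, while still failing the gate when an item fails most of the time.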