google-agents-cli-eval
Summary
Evaluate ADK agents with metrics, evalsets, and the iterative eval-fix loop.
- Run evaluations with `agents-cli eval run` using configurable criteria (tool trajectory, response matching, rubric-based scoring, hallucination detection, safety checks) and match types (EXACT, IN_ORDER, ANY_ORDER)
- Build evalsets with multi-turn conversation cases, expected tool trajectories, intermediate responses, and session state overrides
- Iterate through 5-10+ eval-fix cycles: diagnose failures, fix agent instructions or tool logic, rerun, and track progress with task lists
- Avoid common pitfalls: don't lower thresholds to hide failures, handle extra tool calls with IN_ORDER matching, ensure app name matches directory, and initialize state with callbacks to prevent KeyError crashes
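To make the three match types concrete, here is a minimal sketch of how they might treat an agent that makes an extra tool call. It is an illustration only and not the ADK implementation: the function names `is_subsequence` and `trajectory_matches`, the `(tool_name, args)` tuple representation, and the example tool names are all assumptions made for this example. It also assumes that IN_ORDER and ANY_ORDER both tolerate extra calls, while EXACT does not.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical representation of a tool call: (tool_name, arguments).
ToolCall = Tuple[str, Dict[str, Any]]


def is_subsequence(expected: List[ToolCall], actual: List[ToolCall]) -> bool:
    """True if every expected call appears in actual, in the same relative order."""
    remaining = iter(actual)
    # Each any() consumes the iterator up to the match, so order is enforced.
    return all(any(call == seen for seen in remaining) for call in expected)


def trajectory_matches(
    expected: List[ToolCall], actual: List[ToolCall], match_type: str = "EXACT"
) -> bool:
    """Illustrative comparison of an expected and an actual tool trajectory."""
    if match_type == "EXACT":
        # Same calls, same order, no extras.
        return expected == actual
    if match_type == "IN_ORDER":
        # Expected calls must appear in order; extra calls in between are allowed.
        return is_subsequence(expected, actual)
    if match_type == "ANY_ORDER":
        # Every expected call must appear; order and extra calls are ignored.
        unmatched = list(actual)
        for call in expected:
            if call not in unmatched:
                return False
            unmatched.remove(call)
        return True
    raise ValueError(f"Unknown match type: {match_type}")


# Hypothetical trajectories: the agent makes one extra call (get_weather).
expected = [("search_flights", {"dest": "SFO"}), ("book_flight", {"id": "F1"})]
actual = [
    ("search_flights", {"dest": "SFO"}),
    ("get_weather", {"city": "SFO"}),
    ("book_flight", {"id": "F1"}),
]

results = {m: trajectory_matches(expected, actual, m)
           for m in ("EXACT", "IN_ORDER", "ANY_ORDER")}
print(results)
```

In this sketch the extra `get_weather` call fails EXACT but passes IN_ORDER and ANY_ORDER, which is the situation the pitfall about extra tool calls describes. Swapping the order of the two expected calls in `actual` would pass only ANY_ORDER.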
SKILL.md
Agent Evaluation Guide
Requires: `agents-cli` (`uv tool install google-agents-cli`). Install uv first if needed.
Scaffolded project? If you used `/google-agents-cli-scaffold`, you already have `agents-cli eval run` (chains `generate` + `grade`), `tests/eval/datasets/`, and `tests/eval/eval_config.yaml`. Start by running `eval run` and iterate from there.
Reference Files
| File | Contents |
|---|---|
| `references/dataset_schema.md` | Canonical EvaluationDataset schema: all field types, JSON examples for single-turn / multi-turn / multi-agent, common mistakes |
| `references/metrics-guide.md` | Complete metrics reference: all built-in metrics, match types, custom metrics, judge model config |
| `references/user-simulation.md` | Dynamic conversation testing: `eval dataset synthesize` flags, what scenarios are, compatible metrics |
| `references/builtin-tools-eval.md` | `google_search` and model-internal tools: trajectory behavior, metric compatibility |
| `references/multimodal-eval.md` | Multimodal inputs: eval dataset schema, built-in metric limitations, custom evaluator pattern |