Agent Platform Eval Flywheel Skill
Help users evaluate and iteratively improve GenAI models and agents using
the Agent Platform GenAI Evaluation SDK (google.genai / agentplatform).
When to use this skill
- Evaluating GenAI agents or models with the Agent Platform GenAI
Evaluation SDK (client.evals.evaluate()).
- Creating evaluation datasets from session traces, pandas DataFrames, or synthetic generation.
- Selecting, configuring, or writing custom evaluation metrics.
- Analyzing rubric verdicts, loss patterns, and clustering failures.
- Suggesting concrete code/prompt improvements based on eval results.
- Evaluating a model served on an Agent Platform endpoint (bring your own
model, BYOM) or a Model-as-a-Service (MaaS) model by ID, including deploying
the model first if needed. For this case, follow references/deployment.md
and use the endpoint_evaluation.py and maas_evaluation.py scripts.
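The sketch below illustrates the "analyze rubric verdicts and cluster failures" step. It uses only the standard library and is a minimal sketch under an assumption: each evaluated case has already been reduced to a mapping from rubric name to pass/fail. The `CaseResult` type, the helper functions, and the rubric names are hypothetical. They are not the Evaluation SDK's result objects. The code groups failing cases by the exact set of rubrics they failed. That is one simple way to cluster failures, and the skill itself may use a different one.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class CaseResult:
    """One evaluated example and its per-rubric verdicts (True = passed).

    Hypothetical shape for illustration only; not an SDK type.
    """
    case_id: str
    verdicts: dict[str, bool]


def rubric_pass_rates(results: list[CaseResult]) -> dict[str, float]:
    """Return the fraction of cases that passed each rubric."""
    passed: Counter[str] = Counter()
    seen: Counter[str] = Counter()
    for r in results:
        for rubric, ok in r.verdicts.items():
            seen[rubric] += 1
            if ok:
                passed[rubric] += 1
    return {rubric: passed[rubric] / seen[rubric] for rubric in seen}


def cluster_failures(results: list[CaseResult]) -> dict[frozenset[str], list[str]]:
    """Group failing case IDs by the exact set of rubrics each case failed."""
    clusters: dict[frozenset[str], list[str]] = defaultdict(list)
    for r in results:
        failed = frozenset(rubric for rubric, ok in r.verdicts.items() if not ok)
        if failed:  # cases that passed every rubric are not failures
            clusters[failed].append(r.case_id)
    # Largest clusters first: the most common loss patterns to fix next.
    return dict(sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True))


if __name__ == "__main__":
    results = [
        CaseResult("c1", {"grounded": True, "follows_instructions": True}),
        CaseResult("c2", {"grounded": False, "follows_instructions": True}),
        CaseResult("c3", {"grounded": False, "follows_instructions": False}),
        CaseResult("c4", {"grounded": False, "follows_instructions": True}),
    ]
    print(rubric_pass_rates(results))
    # {'grounded': 0.25, 'follows_instructions': 0.75}
    top_failed, top_cases = next(iter(cluster_failures(results).items()))
    print(sorted(top_failed), top_cases)
    # ['grounded'] ['c2', 'c4']
```

In this toy data, the largest cluster is cases that fail only `grounded`. A single targeted fix, such as a prompt or retrieval change, could address that group before the next evaluation run.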