Agent Platform Eval Flywheel Skill

Help users evaluate and iteratively improve GenAI models and agents using the Agent Platform GenAI Evaluation SDK (google.genai / agentplatform).

When to use this skill

Evaluating GenAI agents or models with the Agent Platform GenAI Evaluation SDK (client.evals.evaluate()).
Creating evaluation datasets from session traces, pandas DataFrames, or synthetic generation.
Selecting, configuring, or writing custom evaluation metrics.
Analyzing rubric verdicts and loss patterns, and clustering failures into recurring categories.
Suggesting concrete code/prompt improvements based on eval results.
Evaluating a model served on an Agent Platform endpoint (BYOM) or a Model-as-a-Service (MaaS) model by ID, including deploying the model first if needed. For this case, follow references/deployment.md and use the endpoint_evaluation.py or maas_evaluation.py script, as appropriate.
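
As a rough illustration of the failure analysis mentioned in the list above, the sketch below groups failing rubric verdicts by rubric so the most common loss pattern surfaces first. It uses only the Python standard library. The record fields (case_id, rubric, passed), the sample data, and the helper names cluster_failures and pass_rates are all hypothetical and chosen for this example; they are not the SDK's actual result schema, so adapt them to whatever client.evals.evaluate() returns in your setup.

```python
from collections import Counter, defaultdict

# Hypothetical verdict records. The field names are assumptions made for
# this sketch, not the SDK's real result schema.
verdicts = [
    {"case_id": "c1", "rubric": "cites_sources", "passed": False},
    {"case_id": "c1", "rubric": "answers_question", "passed": True},
    {"case_id": "c2", "rubric": "cites_sources", "passed": False},
    {"case_id": "c3", "rubric": "uses_correct_tool", "passed": False},
    {"case_id": "c3", "rubric": "answers_question", "passed": True},
]


def cluster_failures(records):
    """Group failing case IDs by rubric, largest cluster first."""
    clusters = defaultdict(list)
    for record in records:
        if not record["passed"]:
            clusters[record["rubric"]].append(record["case_id"])
    # Sort by cluster size (descending), then by rubric name for stable output.
    return sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))


def pass_rates(records):
    """Return the fraction of passing verdicts for each rubric."""
    totals, passes = Counter(), Counter()
    for record in records:
        totals[record["rubric"]] += 1
        passes[record["rubric"]] += int(record["passed"])
    return {rubric: passes[rubric] / totals[rubric] for rubric in totals}


if __name__ == "__main__":
    for rubric, cases in cluster_failures(verdicts):
        print(f"{rubric}: {len(cases)} failing case(s) -> {cases}")
```

Grouping by rubric is only one possible clustering key. Grouping by tool name, prompt template, or a free-text failure label would follow the same shape.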