agent-platform-eval-flywheel


Agent Platform Eval Flywheel Skill

Help users evaluate and iteratively improve GenAI models and agents using the Agent Platform GenAI Evaluation SDK (google.genai / agentplatform).

When to use this skill

  • Evaluating GenAI agents or models with the Agent Platform GenAI Evaluation SDK (client.evals.evaluate()).
  • Creating evaluation datasets from session traces, pandas DataFrames, or synthetic generation.
  • Selecting, configuring, or writing custom evaluation metrics.
  • Analyzing rubric verdicts, loss patterns, and clustering failures.
  • Suggesting concrete code/prompt improvements based on eval results.
  • Evaluating a model served on an Agent Platform endpoint (BYOM) or a Model-as-a-Service (MaaS) model by ID, including deploying the model first if needed. For this case, follow references/deployment.md and use the endpoint_evaluation.py or maas_evaluation.py script, as appropriate.
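
To make the "analyzing rubric verdicts and loss patterns" use case concrete, here is a minimal sketch in plain Python. It does not call the Evaluation SDK: the record fields (case_id, rubric, passed) and the summarize_failures helper are hypothetical, chosen only for illustration, and the SDK's actual result schema may differ.

```python
from collections import Counter, defaultdict

# Hypothetical verdict records: one row per (eval case, rubric) pair.
# The real SDK's result objects may be shaped differently.
verdicts = [
    {"case_id": "c1", "rubric": "cites_sources", "passed": False},
    {"case_id": "c1", "rubric": "answers_question", "passed": True},
    {"case_id": "c2", "rubric": "cites_sources", "passed": False},
    {"case_id": "c3", "rubric": "answers_question", "passed": False},
    {"case_id": "c3", "rubric": "cites_sources", "passed": True},
]


def summarize_failures(records):
    """Group failing verdicts by rubric.

    Returns a list of (rubric, fail_count, fail_rate, failing_case_ids)
    tuples, sorted by fail_count descending, then rubric name.
    """
    totals = Counter()
    failing = defaultdict(list)
    for record in records:
        totals[record["rubric"]] += 1
        if not record["passed"]:
            failing[record["rubric"]].append(record["case_id"])
    rows = [
        (rubric, len(ids), len(ids) / totals[rubric], ids)
        for rubric, ids in failing.items()
    ]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


summary = summarize_failures(verdicts)
for rubric, count, rate, case_ids in summary:
    print(f"{rubric}: {count} failures ({rate:.0%}) in cases {case_ids}")
```

Sorting by failure count puts the most frequent loss pattern first, which is usually the best place to start when deciding which prompt or code change to try next.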
Repository: google/skills
First Seen: Jun 2, 2026