agent:eval


Agent Evaluation System

Guides the user through building a comprehensive evaluation system for their AI agent. Applies patterns 10-17 from "Patterns for Building AI Agents" (Bhagwat & Gienow, 2025): failure mode taxonomy, business metrics, cross-referencing, iterating against evals, test suites, subject-matter expert (SME) labeling, production datasets, and live evaluation.

When to use

Use this skill when the user needs to:

  • Define what "good" looks like for an AI agent
  • Create a failure mode taxonomy
  • Set up business metrics for agent performance
  • Build an evaluation test suite
  • Design SME labeling workflows
  • Plan production data evaluation pipelines
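
As a rough illustration of how a failure mode taxonomy and a test suite fit together, the sketch below labels each test case with an optional failure mode and rolls the results up into a pass rate and a per-mode failure count. Everything here is an assumption made for illustration: the `FailureMode` categories, the `LabeledCase` fields, and the `summarize` helper are hypothetical and are not taken from the skill or the book.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    # Hypothetical categories; a real taxonomy comes from reviewing
    # the agent's own failed runs.
    WRONG_TOOL = "wrong_tool"
    HALLUCINATED_FACT = "hallucinated_fact"
    INCOMPLETE_ANSWER = "incomplete_answer"


@dataclass
class LabeledCase:
    case_id: str
    passed: bool
    # Set only when the case failed and a reviewer assigned a mode.
    failure_mode: Optional[FailureMode] = None


def summarize(cases):
    """Return the overall pass rate and a count of failures per mode."""
    total = len(cases)
    passed = sum(1 for c in cases if c.passed)
    by_mode = Counter(
        c.failure_mode.value
        for c in cases
        if not c.passed and c.failure_mode is not None
    )
    return {
        "pass_rate": passed / total if total else 0.0,
        "failures_by_mode": dict(by_mode),
    }


# Illustrative data only, not real evaluation results.
cases = [
    LabeledCase("c1", True),
    LabeledCase("c2", False, FailureMode.WRONG_TOOL),
    LabeledCase("c3", False, FailureMode.WRONG_TOOL),
    LabeledCase("c4", False, FailureMode.INCOMPLETE_ANSWER),
]
report = summarize(cases)
print(report)
```

Keeping the failure mode on each case, rather than only a pass/fail flag, is what lets the same results feed both a headline metric and a breakdown of where the agent goes wrong.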

Instructions

Step 1: Understand the Agent

Use the AskUserQuestion tool to gather context:

Installs: 6
Repository: ikatsuba/skills
First Seen: Mar 6, 2026