agent-evaluation

Agent Evaluation

Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring. Even top agents achieve less than 50% on real-world benchmarks.

Capabilities

  • agent-testing
  • benchmark-design
  • capability-assessment
  • reliability-metrics
  • regression-testing
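
As a rough illustration of the reliability-metrics and regression-testing capabilities listed above, the following is a minimal sketch, not the skill's own implementation. It assumes an agent can be modeled as a plain callable from prompt to output string; the names `evaluate_reliability`, `find_regressions`, and `stub_agent` are hypothetical and chosen only for this example.

```python
from typing import Callable, Dict, List

# Hypothetical types: an agent maps a prompt to an output string,
# and a check decides whether a given output counts as a pass.
Agent = Callable[[str], str]
Check = Callable[[str], bool]


def evaluate_reliability(
    agent: Agent, tasks: Dict[str, Check], trials: int = 5
) -> Dict[str, float]:
    """Run each task `trials` times and return the fraction of passing runs."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    rates: Dict[str, float] = {}
    for prompt, check in tasks.items():
        passes = sum(1 for _ in range(trials) if check(agent(prompt)))
        rates[prompt] = passes / trials
    return rates


def find_regressions(
    baseline: Dict[str, float], current: Dict[str, float], tolerance: float = 0.1
) -> List[str]:
    """Return tasks whose pass rate dropped by more than `tolerance`."""
    return sorted(
        task
        for task, rate in current.items()
        if baseline.get(task, 0.0) - rate > tolerance
    )


def stub_agent(prompt: str) -> str:
    """Deterministic stand-in for a real agent (assumption for the demo)."""
    return "4" if prompt == "2+2" else "unknown"


tasks: Dict[str, Check] = {
    "2+2": lambda out: out == "4",
    "capital of France": lambda out: out == "Paris",
}
rates = evaluate_reliability(stub_agent, tasks, trials=3)
regressed = find_regressions({"2+2": 1.0, "capital of France": 0.5}, rates)
```

Repeating each task several times matters because agent outputs are typically non-deterministic; a single run says little about reliability. With a real agent, the stub would be replaced by a call into the agent under test, and the baseline rates would come from a previously recorded run.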

Prerequisites

  • Knowledge: testing methodologies, statistical analysis basics, LLM behavior patterns
  • Recommended skills: autonomous-agents, multi-agent-orchestration
  • Required skills: testing-fundamentals, llm-fundamentals

Scope

Installs: 687
GitHub Stars: 39.9K
First Seen: Jan 19, 2026
agent-evaluation — sickn33/antigravity-awesome-skills