eval


/hub:eval — Evaluate Agent Results

Ranks all agent results for a session. Supports three evaluation modes: metric-based (run a command), LLM judge (compare diffs), or a hybrid of the two.

Usage

/hub:eval                           # Eval latest session using configured criteria
/hub:eval 20260317-143022           # Eval specific session
/hub:eval --judge                   # Force LLM judge mode (ignore metric config)

What It Does

Metric Mode (eval command configured)

Run the evaluation command in each agent's worktree:
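A minimal sketch of how this step could work, not the skill's actual implementation. It assumes each agent has its own worktree directory, that the evaluation command prints a numeric metric to stdout (the last number printed is taken as the score), and that higher is better by default. The names `run_metric`, `rank_agents`, and `higher_is_better` are hypothetical and used only for illustration.

```python
import re
import subprocess
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def run_metric(worktree: Path, command: str, timeout: float = 600.0) -> Optional[float]:
    """Run `command` inside `worktree`; return the last number on stdout, or None on failure."""
    try:
        proc = subprocess.run(
            command,
            shell=True,          # the eval command is assumed to be a shell string
            cwd=worktree,        # execute inside this agent's worktree
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    # Assumption: the metric is the last numeric token the command prints.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", proc.stdout)
    return float(numbers[-1]) if numbers else None


def rank_agents(
    worktrees: Dict[str, Path],
    command: str,
    higher_is_better: bool = True,
) -> List[Tuple[str, Optional[float]]]:
    """Score every agent with the same command and return (agent, score) pairs, best first."""
    scores = {agent: run_metric(path, command) for agent, path in worktrees.items()}
    scored = [(agent, s) for agent, s in scores.items() if s is not None]
    failed = [(agent, None) for agent, s in scores.items() if s is None]
    scored.sort(key=lambda pair: pair[1], reverse=higher_is_better)
    # Agents whose command failed, timed out, or printed no number go last.
    return scored + failed
```

Running one identical command in every worktree keeps the comparison fair; agents whose run produces no usable score are listed last rather than dropped, so a failed run stays visible in the ranking.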

Installs: 1.4K
Repository: alirezarezvani/claude-skills
GitHub Stars: 21.5K
First Seen: Mar 17, 2026
Security Audits: Gen Agent Trust Hub (Warn), Socket (Pass), Snyk (Pass)