Eval — Eval-Driven Development
Define evals before coding, check during development, report after implementation.
Usage
/eval define <name> # Create eval definition before coding
/eval check <name> # Run evals during development
/eval report <name> # Generate full report after implementation
/eval list # List all evals in project
Subcommands
define
Create an eval definition at .claude/evals/<name>.md:
- Ask the user for a feature description
- Generate capability evals (what the feature should do)
- Generate regression evals (what must not break)
- Define success metrics (pass@k targets)
- Save the definition to .claude/evals/<name>.md
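The steps above leave the layout of the definition file open. The sketch below is one way to produce it. It assumes a Markdown file with three sections ("Capability Evals", "Regression Evals", "Success Metrics"). The EvalDefinition class, the section names, and the helper functions are all hypothetical and are not part of the skill.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class EvalDefinition:
    """Hypothetical in-memory shape of an eval definition."""
    name: str
    description: str
    capability: list[str] = field(default_factory=list)  # what the feature should do
    regression: list[str] = field(default_factory=list)  # what must not break
    targets: dict[str, float] = field(default_factory=dict)  # e.g. {"pass@3": 1.0}


def render_definition(d: EvalDefinition) -> str:
    """Render the definition as Markdown (section names are an assumption)."""
    lines = [f"# Eval: {d.name}", "", d.description, "", "## Capability Evals"]
    lines += [f"- {item}" for item in d.capability]
    lines += ["", "## Regression Evals"]
    lines += [f"- {item}" for item in d.regression]
    lines += ["", "## Success Metrics"]
    lines += [f"- {metric}: {target:.0%}" for metric, target in d.targets.items()]
    return "\n".join(lines) + "\n"


def save_definition(project_root: Path, d: EvalDefinition) -> Path:
    """Write the definition to .claude/evals/<name>.md, creating the directory if needed."""
    path = project_root / ".claude" / "evals" / f"{d.name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_definition(d), encoding="utf-8")
    return path
```

For example, EvalDefinition("add-auth", "...", ["register", "login"], ["public-routes"], {"pass@3": 1.0}) would render "- pass@3: 100%" under Success Metrics. Storing targets as fractions keeps them directly comparable with the computed pass@k values.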
check
Run evals and show status:
- Read the eval definition from .claude/evals/<name>.md
- Run code-based graders (grep, test, build)
- Run model-based graders if needed
- Flag items for human review
- Show pass/fail status
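A code-based grader can be as simple as a command whose exit status decides the result. Under that assumption, grep, a test runner, or a build all fit the same shape. The sketch below treats exit code 0 as PASS and treats any other code, a missing executable, or a timeout as FAIL. It marks human-review items separately and does not cover model-based graders. The Grader class, the check function, and the 300-second default timeout are illustrative choices, not part of the skill.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Grader:
    """A code-based grader: a command whose exit status decides pass/fail."""
    name: str
    command: list[str]  # e.g. ["grep", "-q", "login", "src/routes.ts"] (hypothetical path)


def run_grader(g: Grader, timeout: float = 300.0) -> bool:
    """Run one grader. Exit code 0 counts as PASS; anything else, or a timeout, is FAIL."""
    try:
        result = subprocess.run(g.command, capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0


def check(graders: list[Grader], needs_review: list[str]) -> dict[str, str]:
    """Status per eval: PASS/FAIL from graders, REVIEW for items a human must judge."""
    status = {g.name: "PASS" if run_grader(g) else "FAIL" for g in graders}
    status.update({name: "REVIEW" for name in needs_review})
    return status


if __name__ == "__main__":
    # Demo with a grader that always succeeds, using the current Python interpreter.
    demo = [Grader("python-runs", [sys.executable, "-c", "print('ok')"])]
    print(check(demo, needs_review=["copy-review"]))
```

Passing the command as a list, rather than a shell string, avoids quoting problems with file paths. Wrap the command in ["sh", "-c", ...] only when a grader really needs shell features such as pipes.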
report
Generate a comprehensive report:
- Run all evals
- Calculate pass@k metrics
- Compare against the baseline (if one exists)
- Generate a markdown report
- Append it to .claude/evals/<name>.log
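The sample report annotates each capability eval as "PASS (pass@n)". A natural reading is that n is the attempt on which the eval first passed, and pass@k is then the fraction of evals that passed within k attempts. This reading reproduces the sample numbers: two of three evals pass on attempt 1 (67%), and all three pass by attempt 3 (100%). The sketch below assumes that interpretation. It is a simple empirical count over one sequence of attempts per eval, not the unbiased n-sample pass@k estimator used in code-generation benchmarks.

```python
from typing import Optional


def pass_at_k(first_pass_attempt: dict[str, Optional[int]], k: int) -> float:
    """Fraction of evals whose first passing attempt is <= k.

    `first_pass_attempt` maps eval name -> 1-based attempt of the first PASS,
    or None if the eval never passed.
    """
    if not first_pass_attempt:
        return 0.0
    passed = sum(1 for a in first_pass_attempt.values() if a is not None and a <= k)
    return passed / len(first_pass_attempt)


attempts = {"register": 1, "login": 2, "invalid-reject": 1}
print(f"pass@1: {pass_at_k(attempts, 1):.0%}")  # pass@1: 67%
print(f"pass@3: {pass_at_k(attempts, 3):.0%}")  # pass@3: 100%
```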
list
Show all evals and their status.
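The skill does not say where list gets its status from. One minimal approach, assuming each report appended to <name>.log contains a "Status: ..." line like the sample output, is to scan .claude/evals/*.md and take the last status line from the matching log. Evals that have no log yet are shown as "NOT RUN", which is a label invented for this sketch.

```python
from pathlib import Path


def list_evals(project_root: Path) -> dict[str, str]:
    """Map each eval name to its most recent status.

    Assumes every run appends a report with a 'Status: ...' line to
    .claude/evals/<name>.log; evals without a log are reported as 'NOT RUN'.
    """
    evals_dir = project_root / ".claude" / "evals"
    if not evals_dir.is_dir():
        return {}
    result: dict[str, str] = {}
    for definition in sorted(evals_dir.glob("*.md")):
        name = definition.stem
        log = evals_dir / f"{name}.log"
        status = "NOT RUN"
        if log.exists():
            for line in log.read_text(encoding="utf-8").splitlines():
                if line.startswith("Status:"):
                    # Later lines overwrite earlier ones, so the newest run wins.
                    status = line.removeprefix("Status:").strip()
        result[name] = status
    return result
```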
Output Format
EVAL REPORT: add-auth
=====================
Capability Evals:
  register: PASS (pass@1)
  login: PASS (pass@2)
  invalid-reject: PASS (pass@1)
  Overall: 3/3
Regression Evals:
  public-routes: PASS
  api-responses: PASS
  Overall: 2/2
Metrics:
  pass@1: 67%
  pass@3: 100%
Status: READY FOR REVIEW
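The report layout above can be generated from two result maps. The sketch assumes two-space indentation for the nested lines, which the extracted sample does not preserve with certainty. It also assumes "NOT READY" as the status when any eval fails, since the source only shows "READY FOR REVIEW". The function name and its parameters are hypothetical.

```python
from typing import Optional


def format_report(name: str,
                  capability: dict[str, Optional[int]],
                  regression: dict[str, bool],
                  ks: tuple[int, ...] = (1, 3)) -> str:
    """Render eval results in the report layout.

    capability: eval name -> 1-based attempt of first PASS (None = never passed)
    regression: eval name -> True if it still passes
    """
    title = f"EVAL REPORT: {name}"
    lines = [title, "=" * len(title), "Capability Evals:"]
    for ev, attempt in capability.items():
        lines.append(f"  {ev}: PASS (pass@{attempt})" if attempt is not None else f"  {ev}: FAIL")
    cap_ok = sum(1 for a in capability.values() if a is not None)
    lines.append(f"  Overall: {cap_ok}/{len(capability)}")
    lines.append("Regression Evals:")
    for ev, ok in regression.items():
        lines.append(f"  {ev}: {'PASS' if ok else 'FAIL'}")
    reg_ok = sum(regression.values())
    lines.append(f"  Overall: {reg_ok}/{len(regression)}")
    lines.append("Metrics:")
    for k in ks:
        # Fraction of capability evals that passed within k attempts.
        hit = sum(1 for a in capability.values() if a is not None and a <= k)
        lines.append(f"  pass@{k}: {hit / max(len(capability), 1):.0%}")
    ready = cap_ok == len(capability) and reg_ok == len(regression)
    lines.append(f"Status: {'READY FOR REVIEW' if ready else 'NOT READY'}")  # 'NOT READY' is assumed
    return "\n".join(lines)
```

Calling format_report("add-auth", {"register": 1, "login": 2, "invalid-reject": 1}, {"public-routes": True, "api-responses": True}) reproduces the sample report line for line.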
File Structure
.claude/evals/
├── add-auth.md # Definition
├── add-auth.log # Run history
└── baseline.json # Regression baselines
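The format of baseline.json is not specified. Assuming it maps each regression eval to whether it passed in the baseline run, a regression is any eval that passed in the baseline but fails (or is missing) now. The sketch below loads, compares, and rewrites that file. The JSON shape and the helper names are assumptions.

```python
import json
from pathlib import Path


def load_baseline(path: Path) -> dict[str, bool]:
    """Read baseline.json if present; assumed shape: {"public-routes": true, ...}."""
    if not path.exists():
        return {}
    raw = json.loads(path.read_text(encoding="utf-8"))
    return {name: bool(ok) for name, ok in raw.items()}


def find_regressions(baseline: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Evals that passed in the baseline but fail, or were not run, now."""
    return sorted(name for name, ok in baseline.items() if ok and not current.get(name, False))


def save_baseline(path: Path, current: dict[str, bool]) -> None:
    """Overwrite the baseline with the current results (e.g. after a clean run)."""
    path.write_text(json.dumps(current, indent=2, sort_keys=True) + "\n", encoding="utf-8")
```

With this design, "no baseline" is an empty dict, so the comparison step in report reports nothing instead of failing on a first run.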
Repository: fancive/claude-skills
First seen: 12 days ago