agent-evals
SKILL.md
Agent Evals
Create repeatable checks so agent behavior improves safely over time.
Evaluation Layers
- Unit evals: prompt-level correctness
- Tool evals: API/tool call decision quality
- End-to-end evals: realistic multi-step tasks
- Safety evals: prompt injection and data leak resistance
CI/CD Integration
# Example eval pipeline steps
make evals-smoke
make evals-regression
make evals-safety
Best Practices
- Version datasets with expected outputs.
- Track pass rates and score drift over time.
- Block deploys on critical safety regressions.
Related Skills
- github-actions - Eval automation in CI
- ai-agent-security - Security-focused eval cases
Weekly Installs
9
Repository
bagelhole/devop…t-skillsGitHub Stars
13
First Seen
Feb 21, 2026
Security Audits
Installed on
cline9
github-copilot9
codex9
kimi-cli9
gemini-cli9
cursor9