ML Model Evaluation
Overview
Use this skill to evaluate models with decision-grade evidence across aggregate and high-risk segments.
Scope Boundaries
- Use this skill when the task matches the trigger condition described in the skill description.
- Do not use this skill when the primary task falls outside this skill's domain.
Shared References
- Threshold and segmentation rules: references/threshold-and-segmentation-rules.md
Templates And Assets
- Evaluation report template: assets/evaluation-report-template.md
Inputs To Gather
- Dataset splits and baseline/candidate definitions.
- Business cost trade-offs for false positives/negatives.
- Segment definitions for fairness/risk-critical cohorts.
- Acceptance thresholds and calibration requirements.
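The cost trade-offs gathered above can drive threshold selection directly. A minimal sketch, assuming illustrative `COST_FP`/`COST_FN` values and hypothetical helper names (real costs come from the business trade-offs, not these placeholders):

```python
# Illustrative costs; replace with the business trade-offs gathered above.
COST_FP = 1.0   # cost of acting on a false alarm
COST_FN = 5.0   # cost of missing a true positive

def expected_cost(y_true, y_score, threshold):
    """Total business cost of errors at a given decision threshold."""
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < threshold)
    return COST_FP * fp + COST_FN * fn

def pick_threshold(y_true, y_score, grid=None):
    """Scan candidate thresholds and return the one with the lowest cost."""
    grid = grid or [i / 100 for i in range(1, 100)]
    return min(grid, key=lambda t: expected_cost(y_true, y_score, t))
```

Because missed positives cost more than false alarms here, the scan tends to settle on a lower threshold than accuracy alone would suggest.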
Deliverables
- Evaluation report with thresholds and decision.
- Segment-level failure analysis.
- Acceptance/rejection rationale and follow-ups.
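The segment-level failure analysis above can be sketched as a per-cohort error breakdown, assuming hypothetical segment labels supplied alongside each example:

```python
from collections import defaultdict

def segment_error_rates(y_true, y_pred, segments):
    """Return {segment: error_rate} so no cohort hides inside the aggregate.

    `segments` is a hypothetical per-example cohort tag (e.g. region, age band).
    """
    errors, counts = defaultdict(int), defaultdict(int)
    for y, p, seg in zip(y_true, y_pred, segments):
        counts[seg] += 1
        errors[seg] += int(y != p)
    return {seg: errors[seg] / counts[seg] for seg in counts}
```

A model with a good aggregate error rate can still fail a single high-risk cohort badly; this breakdown is what makes that visible in the report.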
Workflow
- Build the evaluation report in assets/evaluation-report-template.md.
- Apply the threshold/segment policy via references/threshold-and-segmentation-rules.md.
- Validate calibration and error concentration risks.
- Compare baseline vs candidate under same conditions.
- Publish release recommendation and unresolved risks.
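The calibration-validation step above can be sketched with a simple expected calibration error (ECE) over equal-width score bins; `n_bins` is an illustrative choice, and the acceptance cut-off comes from the calibration requirements gathered as inputs:

```python
def expected_calibration_error(y_true, y_score, n_bins=10):
    """Bin-size-weighted average of |observed positive rate - mean score|."""
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(y_true, y_score):
        idx = min(int(s * n_bins), n_bins - 1)  # clamp s == 1.0 into last bin
        bins[idx].append((y, s))
    total, ece = len(y_true), 0.0
    for b in bins:
        if not b:
            continue
        avg_label = sum(y for y, _ in b) / len(b)
        avg_score = sum(s for _, s in b) / len(b)
        ece += (len(b) / total) * abs(avg_label - avg_score)
    return ece
```

A low ECE means predicted probabilities can be read as risk estimates, which is what lets the threshold policy translate scores into business decisions.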
Quality Standard
- Thresholds are tied to business risk trade-offs.
- Critical segments are explicitly evaluated.
- Decision rationale is traceable to evidence.
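To keep the decision rationale traceable to evidence, the baseline-vs-candidate comparison should run on the same split and threshold. A minimal sketch, with hypothetical function and key names:

```python
def compare_models(y_true, baseline_score, candidate_score, threshold):
    """Error counts for both models under identical conditions."""
    def error_counts(scores):
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
        fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
        return {"fp": fp, "fn": fn}
    return {
        "baseline": error_counts(baseline_score),
        "candidate": error_counts(candidate_score),
    }
```

Holding the split and threshold fixed means any metric difference is attributable to the model change alone, which is the evidence the accept/reject decision cites.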
Failure Conditions
- Stop when evaluation omits high-risk segments.
- Stop when acceptance thresholds are undefined.
- Escalate when model risk is unacceptable for rollout.
Repository: kentoshimizu/sw…t-skills