Skill Evals Optimize

Triage failed eval cases using the steering guide, apply limited fixes, and retest with a strict iteration cap.

Inputs

Locate latest results and list failed cases
```
bash scripts/list-fails.sh
```
For each failed case
- Read the case entry in opencode_skill_loading_eval_dataset.jsonl.
- Consult the steering guide for the appropriate fix strategy.
- Propose the smallest targeted change (skill description, prompt, permissions, or tests).
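
The lookup step can be sketched as a small shell helper. This is only an illustration: `show_case` is a hypothetical name, and it assumes each record in the dataset stores the case ID as a quoted string value, which the source does not confirm.

```shell
# Hypothetical helper: print the dataset record(s) that mention one case ID.
# Assumption: the ID appears in the JSONL record as a quoted string, e.g. "gh_001".
show_case() {
  case_id="$1"
  dataset="${2:-opencode_skill_loading_eval_dataset.jsonl}"
  # -F matches the quoted ID literally, so regex metacharacters in it are inert.
  grep -F "\"${case_id}\"" "$dataset"
}
```

For example, `show_case gh_001` would print the matching line from the default dataset file, where `gh_001` is a made-up ID.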

Retest only the failed cases

```
bash scripts/retest-fails.sh --parallel 3
```

Stop optimizing after two fix-and-retest cycles.
Treat any remaining failures as legitimate model limitations or out-of-scope behavior.
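
One way to enforce the two-cycle cap is a bounded loop. The sketch below is not part of the skill. It assumes `list-fails.sh` prints one failing case ID per line and prints nothing when every case passes. `apply_fixes`, `run_capped_cycles`, `LIST_CMD`, and `RETEST_CMD` are hypothetical names introduced for illustration.

```shell
# Commands under test; overridable so the loop can be tried with stand-ins.
LIST_CMD="${LIST_CMD:-bash scripts/list-fails.sh}"
RETEST_CMD="${RETEST_CMD:-bash scripts/retest-fails.sh --parallel 3}"

# Hypothetical placeholder: replace with the real (usually manual) fix step.
apply_fixes() {
  echo "apply the smallest targeted fix for: $*"
}

# Run at most two fix-and-retest cycles, then report whatever still fails.
run_capped_cycles() {
  max_cycles=2
  cycle=1
  while [ "$cycle" -le "$max_cycles" ]; do
    fails=$($LIST_CMD)
    # Assumption: empty output means no failing cases remain.
    [ -z "$fails" ] && return 0
    echo "cycle $cycle of $max_cycles"
    # Unquoted on purpose so each failing ID becomes its own argument.
    apply_fixes $fails
    $RETEST_CMD || true
    cycle=$((cycle + 1))
  done
  fails=$($LIST_CMD)
  if [ -n "$fails" ]; then
    echo "stopping after $max_cycles cycles; remaining failures:"
    echo "$fails"
  fi
}
```

Calling `run_capped_cycles` runs the loop. The hard cap keeps the process from drifting into open-ended tuning when a case reflects a model limitation rather than a fixable defect.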

bash scripts/list-fails.sh lists FAIL case IDs from the latest run.
bash scripts/retest-fails.sh re-runs only failing cases.
- Use --filter-id to scope (e.g., --filter-id "gh_|c7_").
- Use --dry-run to print the command without executing.
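
The two flags can be used together as a preview-then-run pattern. This is a sketch that assumes `--filter-id` and `--dry-run` may be combined and that a dry run exits with status 0, neither of which the source confirms. `preview_then_retest` and `RETEST_SCRIPT` are hypothetical names.

```shell
# Hypothetical wrapper: print the scoped command first, then run it for real.
# Assumption: --filter-id and --dry-run can be passed in the same invocation.
preview_then_retest() {
  filter="$1"
  script="${RETEST_SCRIPT:-scripts/retest-fails.sh}"
  # The real run only starts if the dry run succeeds.
  bash "$script" --filter-id "$filter" --dry-run &&
    bash "$script" --filter-id "$filter"
}
```

For example, `preview_then_retest "gh_|c7_"` would first print the command for cases matching `gh_` or `c7_` and then execute it.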

First Seen

Mar 13, 2026