skill-evals-optimize
SKILL.md
Skill Evals Optimize
Triage failed eval cases using the steering guide, apply limited fixes, and retest with a strict iteration cap.
Inputs
- Results root: `evals/skill-loading/.tmp/opencode-eval-results`
- Max optimization iterations: 2
- Steering guide: `evals/skill-loading/docs/skill-optimization-steering.md`
Workflow
- Locate latest results and list failed cases
  `bash scripts/list-fails.sh`
- For each failed case
  - Read the case entry in `opencode_skill_loading_eval_dataset.jsonl`.
  - Consult the steering guide for the appropriate fix strategy.
  - Propose the smallest targeted change (skill description, prompt, permissions, or tests).
- Retest only the failed cases
  `bash scripts/retest-fails.sh --parallel 3`
- Enforce the iteration cap (2 max)
  - After two fix+retest cycles, stop optimizing.
  - Acknowledge remaining failures as legitimate model limitations or out-of-scope behaviors.
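The capped fix+retest cycle above can be sketched as a small shell function. This is only an illustration under stated assumptions: it assumes `scripts/list-fails.sh` prints one failing case ID per line and prints nothing when every case passes, and the names `optimize_loop`, `apply_fixes`, `LIST_FAILS_CMD`, and `RETEST_CMD` are hypothetical, not part of the skill.

```shell
# Sketch of the capped fix+retest cycle. The two default commands are the ones
# named in the workflow; every other name here is illustrative.
MAX_ITERATIONS=2
LIST_FAILS_CMD=${LIST_FAILS_CMD:-"bash scripts/list-fails.sh"}
RETEST_CMD=${RETEST_CMD:-"bash scripts/retest-fails.sh --parallel 3"}

# Placeholder for the manual step: consult the steering guide and make the
# smallest targeted change for each failing case.
apply_fixes() {
  echo "apply targeted fixes for: $*"
}

optimize_loop() {
  local iteration=0
  local fails
  while :; do
    # Assumption: one failing case ID per line, empty output when all pass.
    fails=$($LIST_FAILS_CMD)
    if [ -z "$fails" ]; then
      echo "all cases pass after $iteration iteration(s)"
      return 0
    fi
    if [ "$iteration" -ge "$MAX_ITERATIONS" ]; then
      echo "iteration cap reached; accepting as legitimate failures:"
      echo "$fails"
      return 1
    fi
    iteration=$((iteration + 1))
    # Word splitting is intentional: one argument per case ID.
    apply_fixes $fails
    $RETEST_CMD
  done
}
```

Calling `optimize_loop` from the repository root would run at most two fix+retest cycles and then report whatever still fails, instead of looping until everything passes.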
Helper Scripts
- `bash scripts/list-fails.sh` lists FAIL case IDs from the latest run.
- `bash scripts/retest-fails.sh` re-runs only failing cases.
  - Use `--filter-id` to scope (e.g., `--filter-id "gh_|c7_"`).
  - Use `--dry-run` to print the command without executing.
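The example pattern `"gh_|c7_"` reads like a regular-expression alternation. Assuming `--filter-id` is matched as an extended regular expression against case IDs (the skill text does not say how it is matched), the selection could be previewed with `grep -E` before retesting. The helper name `preview_filter` and the case IDs below are made up for illustration.

```shell
# preview_filter PATTERN
# Reads case IDs from stdin (one per line) and prints those the pattern selects.
# Assumption: the retest script treats the pattern as an extended regex.
preview_filter() {
  grep -E -- "$1" || true   # no match is not an error for a preview
}

# Hypothetical IDs; real ones would come from scripts/list-fails.sh.
printf '%s\n' gh_issue_01 c7_docs_02 web_search_03 | preview_filter 'gh_|c7_'
# prints:
# gh_issue_01
# c7_docs_02
```

If the preview matches the intended cases, passing the same pattern to `retest-fails.sh` together with `--dry-run` should show the command that would run without executing it.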
Rules
- Do not broaden permissions or weaken tests to force a pass.
- Prefer minimal, reversible changes.
- If a failure persists after two iterations, label it as legitimate and move on.
Output
- Summarize PASS/FAIL counts.
- List failed case IDs and which ones were accepted as legitimate failures.
- Reference the steering guide for any remaining follow-up.
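A rough sketch of how such a summary might be produced is shown below. The input format (one `<case_id> <PASS|FAIL>` pair per line) is a hypothetical stand-in, since the layout of the results files is not described here, and `summarize` is an illustrative name.

```shell
# summarize "ID1 ID2 ..."
# stdin: lines of "<case_id> <PASS|FAIL>" (hypothetical format).
# $1: space-separated case IDs accepted as legitimate failures.
summarize() {
  awk -v accepted="$1" '
    BEGIN {
      n = split(accepted, ids, " ")
      for (i = 1; i <= n; i++) ok[ids[i]] = 1
    }
    $2 == "PASS" { pass++ }
    $2 == "FAIL" { fail++; failed[++k] = $1 }
    END {
      printf "PASS: %d  FAIL: %d\n", pass, fail
      for (i = 1; i <= k; i++)
        printf "- %s%s\n", failed[i], ((failed[i] in ok) ? " (accepted as legitimate)" : "")
    }'
}

# Made-up results, with web_03 accepted after the iteration cap.
printf 'gh_01 PASS\nc7_02 FAIL\nweb_03 FAIL\n' | summarize 'web_03'
```

Failed cases without the "accepted" marker are the ones to route back to the steering guide for follow-up.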
Repository
chandima/opencode-config