skill-evals-optimize
Skill Evals Optimize
Triage failed eval cases using the steering guide, apply limited fixes, and retest with a strict iteration cap.
Inputs
- Results root: evals/skill-loading/.tmp/opencode-eval-results
- Max optimization iterations: 2
- Steering guide: evals/skill-loading/docs/skill-optimization-steering.md
Workflow
- Locate latest results and list failed cases
  bash scripts/list-fails.sh
- For each failed case
  - Read the case entry in opencode_skill_loading_eval_dataset.jsonl.
  - Consult the steering guide for the appropriate fix strategy.
  - Propose the smallest targeted change (skill description, prompt, permissions, or tests).
- Retest only the failed cases
  bash scripts/retest-fails.sh --parallel 3
- Enforce the iteration cap (2 max)
  - After two fix+retest cycles, stop optimizing.
  - Acknowledge remaining failures as legitimate model limitations or out-of-scope behaviors.
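The capped cycle above can be sketched as a small shell loop. This is only an illustration under stated assumptions: run_capped_cycles is a hypothetical helper that is not part of the repo, and it assumes the list command prints one failing case ID per line and prints nothing when every case passes. The actual output format of scripts/list-fails.sh is not specified here, so adjust accordingly.

```shell
#!/usr/bin/env bash
# Sketch of the capped fix+retest cycle.
# ASSUMPTION: the list command prints one failing case ID per line and
# prints nothing when every case passes.

MAX_ITERATIONS=2  # iteration cap from the Inputs section

# run_capped_cycles LIST_CMD RETEST_CMD  (hypothetical helper, not in the repo)
run_capped_cycles() {
  local list_cmd="$1"
  local retest_cmd="$2"
  local iteration=0
  local fails

  while [ "$iteration" -lt "$MAX_ITERATIONS" ]; do
    fails="$($list_cmd)"
    if [ -z "$fails" ]; then
      echo "no failing cases after $iteration cycle(s)"
      return 0
    fi
    iteration=$((iteration + 1))
    echo "cycle $iteration: apply fixes for: $(printf '%s' "$fails" | tr '\n' ' ')"
    # The targeted fix itself is a manual step guided by the steering guide.
    # Tolerate a non-zero exit in case the retest command signals failures that way.
    $retest_cmd || true
  done

  # Cap reached: whatever still fails is accepted, not optimized further.
  fails="$($list_cmd)"
  if [ -n "$fails" ]; then
    echo "cap reached; accepted as legitimate: $(printf '%s' "$fails" | tr '\n' ' ')"
  else
    echo "no failing cases after $iteration cycle(s)"
  fi
}

# Possible usage (commented out so the sketch has no side effects when sourced):
# run_capped_cycles "bash scripts/list-fails.sh" "bash scripts/retest-fails.sh --parallel 3"
```

Passing the two commands as arguments keeps the cap logic separate from the scripts themselves, so the loop can be exercised with stand-in commands before pointing it at a real run.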
Helper Scripts
- bash scripts/list-fails.sh lists FAIL case IDs from the latest run.
- bash scripts/retest-fails.sh re-runs only failing cases.
  - Use --filter-id to scope (e.g., --filter-id "gh_|c7_").
  - Use --dry-run to print the command without executing.
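One way to combine the two flags is to preview a scoped retest with --dry-run and only then run it. The wrapper below is a minimal sketch: retest_scoped and the RETEST_CMD variable are hypothetical names, and it assumes that --filter-id, --dry-run, and --parallel can be combined in a single call, which the text above does not confirm.

```shell
#!/usr/bin/env bash
# ASSUMPTION: --filter-id can be combined with --dry-run and with --parallel.
# RETEST_CMD is a hypothetical override hook; it defaults to the documented script.
RETEST_CMD="${RETEST_CMD:-bash scripts/retest-fails.sh}"

# retest_scoped FILTER  (hypothetical wrapper, not part of the repo)
retest_scoped() {
  local filter="$1"
  # Preview first: --dry-run prints the command without executing it.
  $RETEST_CMD --filter-id "$filter" --dry-run || return 1
  # Then run the scoped retest for real.
  $RETEST_CMD --filter-id "$filter" --parallel 3
}

# Possible usage (commented out so the sketch has no side effects when sourced):
# retest_scoped "gh_|c7_"
```

The filter is quoted at every step so that the | in a pattern such as "gh_|c7_" reaches the script as part of the argument instead of being read by the shell as a pipe.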
Rules
- Do not broaden permissions or weaken tests to force a pass.
- Prefer minimal, reversible changes.
- If a failure persists after two iterations, label it as legitimate and move on.
Output
- Summarize PASS/FAIL counts.
- List failed case IDs and which ones were accepted as legitimate failures.
- Reference the steering guide for any remaining follow-up.
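The summary itself could be produced with a few lines of awk. The sketch below is illustrative only: summarize_results is a hypothetical helper, and the input format (one "<case_id> <PASS|FAIL>" pair per line on stdin) is an assumption, since the layout of the real results files is not described here.

```shell
#!/usr/bin/env bash
# summarize_results: hypothetical helper, not part of the repo.
# ASSUMPTION: stdin carries one "<case_id> <PASS|FAIL>" pair per line.
summarize_results() {
  awk '
    $2 == "PASS" { pass++ }
    $2 == "FAIL" { fail++; ids = ids (ids ? " " : "") $1 }
    END {
      printf "PASS: %d  FAIL: %d\n", pass, fail
      if (fail) printf "Failed case IDs: %s\n", ids
    }
  '
}

# Possible usage with made-up case IDs:
# printf 'gh_01 PASS\nc7_02 FAIL\n' | summarize_results
```

Which of the failed IDs were accepted as legitimate still has to be recorded by hand, since that decision comes from the triage step and not from the results themselves.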