skill-evals-run

SKILL.md

Skill Evals Run

Run the local skill-loading eval suite with the shell runner.

Prerequisites

  • opencode is installed and on PATH.
  • Provider config is available under ~/.config/opencode/ (used even with --isolate-config).
  • If using --disable-models-fetch, ~/.cache/opencode/models.json exists and includes the target model.
  • Auth/credentials are present (typically ~/.local/share/opencode/auth.json).
  • Network access is available for model calls.

Command

Run:

evals/skill-loading/opencode_skill_eval_runner.sh \
  --repo "$PWD" \
  --dataset evals/skill-loading/opencode_skill_loading_eval_dataset.jsonl \
  --matrix evals/skill-loading/opencode_skill_eval_matrix.json \
  --disable-models-fetch \
  --isolate-config \
  --parallel 3

Arguments

If the user provides any of the following flags, append them to the command:

  • --filter-id <regex>
  • --filter-category <substring>
  • --parallel <n>

If --parallel is omitted, keep the default of 3.

After the run

  • Summarize PASS/FAIL counts and list failed case IDs.
  • If failures exist (PASS/FAIL, not ERROR), reference evals/skill-loading/docs/skill-optimization-steering.md and suggest the next remediation step.
  • If any cases are ERROR, do not suggest optimization. Instead, inspect evals/skill-loading/.tmp/opencode-eval-results/<run>/results.json and any traces to identify the crash, then re-run the evals once the error is resolved.

Notes

  • Run from the repo root so relative paths resolve.
  • --isolate-config also disables project config, so no extra flag is required to avoid loading repo config/plugins during evals.
  • With --disable-models-fetch, the runner falls back to ~/.cache/opencode/models.json when present. Use --models-url file://... if you need a different cache file.
  • Include the exact command used in the response.
Weekly Installs
1
GitHub Stars
1
First Seen
4 days ago
Installed on
amp1
cline1
openclaw1
opencode1
cursor1
kimi-cli1