skill-evals-run
Skill Evals Run
Run the local skill-loading eval suite with the shell runner.
Prerequisites
- opencode is installed and on PATH.
- Provider config is available under ~/.config/opencode/ (used even with --isolate-config).
- If using --disable-models-fetch, ~/.cache/opencode/models.json exists and includes the target model.
- Auth/credentials are present (typically ~/.local/share/opencode/auth.json).
- Network access is available for model calls.
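The file-based prerequisites can be checked before a run. The following is a minimal sketch, not part of the runner: `preflight` is a hypothetical helper name, and it only tests the default paths listed above. It does not verify that models.json contains the target model, and it does not test network access.

```shell
# Hypothetical preflight helper (an assumption, not shipped with the runner).
# Optional $1 overrides the home directory to inspect; defaults to $HOME.
# The body runs in a subshell so its variables do not leak into the caller.
preflight() (
  home_dir=${1:-$HOME}
  status=0

  # opencode must be resolvable on PATH.
  command -v opencode >/dev/null 2>&1 ||
    { echo "missing: opencode on PATH"; status=1; }

  # Provider config directory (read even with --isolate-config).
  [ -d "$home_dir/.config/opencode" ] ||
    { echo "missing: $home_dir/.config/opencode/"; status=1; }

  # Model cache, only required when --disable-models-fetch is used.
  [ -f "$home_dir/.cache/opencode/models.json" ] ||
    { echo "missing: $home_dir/.cache/opencode/models.json"; status=1; }

  # Typical credentials location; yours may differ.
  [ -f "$home_dir/.local/share/opencode/auth.json" ] ||
    { echo "missing: $home_dir/.local/share/opencode/auth.json"; status=1; }

  exit "$status"
)
```

Calling `preflight` with no arguments checks the current user's home directory and returns a non-zero status if anything is missing.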
Command
Run:
evals/skill-loading/opencode_skill_eval_runner.sh \
--repo "$PWD" \
--dataset evals/skill-loading/opencode_skill_loading_eval_dataset.jsonl \
--matrix evals/skill-loading/opencode_skill_eval_matrix.json \
--disable-models-fetch \
--isolate-config \
--parallel 3
Arguments
If the user provides any of the following flags, append them to the command:
- --filter-id <regex>
- --filter-category <substring>
- --parallel <n>
If --parallel is omitted, keep the default of 3.
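One way to apply the argument rules above is a small wrapper that accepts only the three supported flags. This is a sketch under assumptions: `run_skill_evals` and the `DRY_RUN` variable are hypothetical names, and a user-supplied --parallel is treated as replacing the default of 3 instead of being passed twice.

```shell
# Hypothetical wrapper (an assumption, not part of the repo).
# Accepts --filter-id, --filter-category, and --parallel, each with one value.
# Set DRY_RUN=1 to print the command, one argument per line, without running it.
run_skill_evals() (
  parallel=3
  n=$#
  # Walk the original arguments; re-append the filters we keep to the end of
  # "$@" so that only they remain once the first n arguments are shifted off.
  while [ "$n" -gt 0 ]; do
    if [ "$n" -lt 2 ]; then
      echo "flag $1 needs a value" >&2
      exit 1
    fi
    case "$1" in
      --parallel) parallel=$2 ;;
      --filter-id | --filter-category) set -- "$@" "$1" "$2" ;;
      *) echo "unsupported flag: $1" >&2; exit 1 ;;
    esac
    shift 2
    n=$((n - 2))
  done

  # Base command from the Command section, then any kept filters.
  set -- evals/skill-loading/opencode_skill_eval_runner.sh \
    --repo "$PWD" \
    --dataset evals/skill-loading/opencode_skill_loading_eval_dataset.jsonl \
    --matrix evals/skill-loading/opencode_skill_eval_matrix.json \
    --disable-models-fetch \
    --isolate-config \
    --parallel "$parallel" \
    "$@"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$@"
  else
    "$@"
  fi
)
```

For example, `run_skill_evals --filter-category github` would run the base command with the filter appended and --parallel left at 3. The dry-run output is also a convenient source for the exact command to include in the response.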
After the run
- Summarize PASS/FAIL counts and list failed case IDs.
- If failures exist (PASS/FAIL, not ERROR), reference evals/skill-loading/docs/skill-optimization-steering.md and suggest the next remediation step.
- If any cases are ERROR, do not suggest optimization. Instead, inspect evals/skill-loading/.tmp/opencode-eval-results/<run>/results.json and any traces to identify the crash, then re-run the evals once the error is resolved.
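The triage rules above can be sketched as a small filter. The schema of results.json is not described here, so this example does not parse it. Instead it assumes you have already extracted one `<case-id> <status>` pair per line, which is a hypothetical intermediate format, and `summarize_cases` is a hypothetical helper name.

```shell
# Hypothetical helper (an assumption, not part of the runner).
# Reads "<case-id> <status>" lines on stdin, where status is PASS, FAIL, or ERROR.
summarize_cases() {
  awk '
    { count[$2]++ }
    $2 == "FAIL"  { failed  = failed  " " $1 }
    $2 == "ERROR" { errored = errored " " $1 }
    END {
      printf "PASS=%d FAIL=%d ERROR=%d\n", count["PASS"], count["FAIL"], count["ERROR"]
      if (count["FAIL"] > 0)  print "failed:" failed
      if (count["ERROR"] > 0) print "errored:" errored
      # ERROR takes priority: a crash must be fixed before any optimization advice.
      if (count["ERROR"] > 0)
        print "next: inspect results.json and traces, fix the crash, then re-run"
      else if (count["FAIL"] > 0)
        print "next: see evals/skill-loading/docs/skill-optimization-steering.md"
      else
        print "next: nothing to do, all cases passed"
    }'
}
```

Checking ERROR before FAIL mirrors the rule that optimization should not be suggested while any case has crashed.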
Notes
- Run from the repo root so relative paths resolve.
- --isolate-config also disables project config, so no extra flag is required to avoid loading repo config/plugins during evals.
- With --disable-models-fetch, the runner falls back to ~/.cache/opencode/models.json when present. Use --models-url file://... if you need a different cache file.
- Include the exact command used in the response.