skills/ajrlewis/ai-skills/addon-deterministic-eval-suite

addon-deterministic-eval-suite

SKILL.md

Add-on: Deterministic Eval Suite

Use this skill when a project needs reproducible, merge-blocking evaluation checks.

Compatibility

  • Works with all architect-* scaffolds.
  • Recommended default for production-default mode.

Inputs

Collect:

  • EVAL_SCOPE: skill-only | project-only | both (default both).
  • BLOCK_ON_FAIL: yes | no (default yes).
  • RUN_DOCKER_CHECKS: yes | no (default yes for production-default).

Integration Workflow

  1. Add deterministic eval artifacts:
evals/deterministic/manifest.yaml
evals/deterministic/run.sh
evals/deterministic/checks/
.github/workflows/evals-deterministic.yml
  1. Baseline checks (always include):
  • file/contract existence checks
  • lint/type/test/build command checks
  • docker artifact checks (Dockerfile, docker-compose.yml, image build)
  • decision trace checks (docs/DECISION_LOG.md, REVIEW_BUNDLE/DECISION_TRACE.md)
  • non-zero exit on failure
  • for skills repositories: add repository-local checks that validate skill folder/frontmatter naming
  • for skills repositories: add repository-local checks that validate required decision-policy language
  1. Skill-specific checks:
  • one check file per selected skill
  • examples:
  • check_nostr_profile.sh
  • check_rag_ingest_query.sh
  • check_review_bundle.sh
  • check_decision_trace.sh
  • check_skill_repo_policy.sh
  1. Output summary:
  • write deterministic run summary to REVIEW_BUNDLE/TEST_EVIDENCE.md.

Required Template

evals/deterministic/manifest.yaml

version: 1
checks:
  - id: contracts
    command: "bash evals/deterministic/checks/check_contracts.sh"
  - id: tests
    command: "bash evals/deterministic/checks/check_tests.sh"
  - id: build
    command: "bash evals/deterministic/checks/check_build.sh"
  - id: decision_trace
    command: "bash evals/deterministic/checks/check_decision_trace.sh"

Guardrails

  • Documentation contract for generated code:

    • Python: write module docstrings and docstrings for public classes, methods, and functions.
    • Next.js/TypeScript: write JSDoc for exported components, hooks, utilities, and route handlers.
    • Add concise rationale comments only for non-obvious logic, invariants, or safety constraints.
    • Apply this contract even when using template snippets below; expand templates as needed.
  • Deterministic evals are source-of-truth merge gates.

  • Avoid network-dependent assertions unless explicitly required.

  • Keep commands idempotent and non-destructive.

  • Fail closed: missing required checks must fail the run.

  • Treat missing decision rationale artifacts as deterministic failure.

Validation Checklist

  • Confirm generated code includes required docstrings/JSDoc and rationale comments for non-obvious logic.
test -f evals/deterministic/manifest.yaml
test -f evals/deterministic/run.sh
test -f .github/workflows/evals-deterministic.yml
bash evals/deterministic/run.sh

Decision Justification Rule

  • Every non-trivial decision must include a concrete justification.
  • Capture the alternatives considered and why they were rejected.
  • State tradeoffs and residual risks for the chosen option.
  • If justification is missing, treat the task as incomplete and surface it as a blocker.
Weekly Installs
9
First Seen
14 days ago
Installed on
gemini-cli9
github-copilot9
codex9
kimi-cli9
cursor9
opencode9