skills/0xsero/vllm-studio/evidence-heavy-evaluator

Evidence Heavy Evaluator

Run a deterministic repo evaluation and emit auditable artifacts in test-output.

Workflow

Choose inputs:

target_dir: repo or subdirectory to evaluate.
profile: readiness, maintainability, or release-readiness.
depth: quick or deep.
execute_checks: include this flag to run the lint, test, typecheck, and build checks and record their results as evidence.

Collect evidence:

skills/evidence-heavy-evaluator/scripts/collect_evidence.sh \
  --target-dir <target_dir> \
  --profile <profile> \
  --depth <depth> \
  [--execute-checks]
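
When the collector is driven from another tool, the invocation above can be assembled programmatically. The following is a minimal sketch, assuming the script path and flags exactly as shown above; the helper name `build_collector_command` and the validation sets are illustrative and not part of the skill.

```python
from typing import List

# Allowed values copied from the input list above.
PROFILES = {"readiness", "maintainability", "release-readiness"}
DEPTHS = {"quick", "deep"}
SCRIPT = "skills/evidence-heavy-evaluator/scripts/collect_evidence.sh"


def build_collector_command(target_dir: str, profile: str, depth: str,
                            execute_checks: bool = False) -> List[str]:
    """Return the argv list for the collector (hypothetical helper)."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    if depth not in DEPTHS:
        raise ValueError(f"unknown depth: {depth!r}")
    cmd = [SCRIPT, "--target-dir", target_dir,
           "--profile", profile, "--depth", depth]
    if execute_checks:
        # Optional flag: only added when checks should actually be run.
        cmd.append("--execute-checks")
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=False)`, so that a non-zero exit from the collector is inspected rather than raised.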

Read outputs from <target_dir>/test-output/evidence-heavy-evaluator/:

readiness-scorecard.json
readiness-report.md
checks-summary.tsv
metrics.tsv
signals.tsv
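
Before summarizing, it can help to confirm which of the listed artifacts were actually written. This is a small sketch that uses only the directory layout and file names given above; the function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# File names copied from the output list above.
ARTIFACTS = (
    "readiness-scorecard.json",
    "readiness-report.md",
    "checks-summary.tsv",
    "metrics.tsv",
    "signals.tsv",
)


def artifact_paths(target_dir: str) -> Dict[str, Path]:
    """Map each expected artifact name to its path under target_dir."""
    out_dir = Path(target_dir) / "test-output" / "evidence-heavy-evaluator"
    return {name: out_dir / name for name in ARTIFACTS}


def missing_artifacts(target_dir: str) -> List[str]:
    """Return the expected artifacts that are not present, in listed order."""
    return [name for name, path in artifact_paths(target_dir).items()
            if not path.is_file()]
```

A non-empty result from `missing_artifacts` is itself worth reporting, since a summary should cite only artifacts that exist.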

Summarize results for the user:

Lead with highest-impact failed criteria.
Cite the exact artifact paths used as evidence.
Separate failed checks from skipped/not-evaluated checks.
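
The separation of failed checks from skipped or not-evaluated ones could be sketched as follows. The column names `check` and `status` and the status values `fail`, `skipped`, and `not-evaluated` are assumptions made for illustration; the actual schema of checks-summary.tsv is not documented here and should be verified against a real file.

```python
import csv
import io
from typing import Dict, List


def split_checks(tsv_text: str) -> Dict[str, List[str]]:
    """Group check names into failed vs. not-run buckets.

    Assumes a header row with 'check' and 'status' columns (hypothetical).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    failed: List[str] = []
    not_run: List[str] = []
    for row in reader:
        name = (row.get("check") or "").strip()
        status = (row.get("status") or "").strip().lower()
        if status == "fail":
            failed.append(name)
        elif status in {"skipped", "not-evaluated"}:
            not_run.append(name)
    # Sort so the summary order does not depend on file order.
    return {"failed": sorted(failed), "not_run": sorted(not_run)}
```

Keeping the two buckets apart stops a skipped check from being read as a failure, or as a pass.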

Guardrails

Keep evaluation read-only: do not edit code as part of this skill.
Treat command failures as evidence, not blockers.
Preserve deterministic ordering in report summaries.
If --execute-checks is omitted, state explicitly that the criteria that depend on running lint, test, typecheck, and build were not evaluated.
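
To illustrate "treat command failures as evidence, not blockers", the sketch below runs a command and records its outcome instead of raising on a non-zero exit. It is not the collector's actual implementation; the record fields (`check`, `ran`, `exit_code`, `stderr_tail`) are hypothetical names chosen for this example.

```python
import subprocess
import sys
from typing import Dict, List


def run_as_evidence(name: str, argv: List[str]) -> Dict[str, object]:
    """Run argv and return an evidence record; never raise on failure."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, check=False)
    except OSError as exc:
        # The command could not be started at all (e.g. not installed).
        return {"check": name, "ran": False, "exit_code": None,
                "stderr_tail": str(exc)}
    return {"check": name, "ran": True, "exit_code": proc.returncode,
            "stderr_tail": proc.stderr[-500:]}


if __name__ == "__main__":
    # A deliberately failing command still yields a usable record.
    print(run_as_evidence("demo", [sys.executable, "-c", "import sys; sys.exit(3)"]))
```

Recording `ran` separately from `exit_code` keeps "the check failed" distinct from "the check could not be executed".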

Criteria

Use references/criteria-matrix.md as the source of truth for scoring criteria and profile weights.

Notes

The collector automatically runs render_report.py after evidence collection.
uv is required because render_report.py is executed with uv run.
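
Because the report step depends on uv, a caller may want to check for it before starting a run. This is a minimal sketch using the standard library; the function name is hypothetical and the check only confirms that an executable named `uv` is on PATH.

```python
import shutil


def uv_available() -> bool:
    """Return True if an executable named 'uv' is found on PATH."""
    return shutil.which("uv") is not None


if __name__ == "__main__":
    if not uv_available():
        print("uv not found on PATH; render_report.py cannot be run with uv run")
```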
