doppler-bench
SKILL.md
DOPPLER Bench Skill
Use this skill for repeatable performance measurement and cross-product comparisons.
Mandatory Style Guides
Read these before non-trivial edits or benchmark-methodology changes:
- docs/style/general-style-guide.md
- docs/style/javascript-style-guide.md
- docs/style/config-style-guide.md
- docs/style/command-interface-design-guide.md
- docs/style/harness-style-guide.md
- docs/style/benchmark-style-guide.md
Developer Guide Routing
When benchmark work becomes extension work, also open:
docs/developer-guides/README.md
Common routes:
- tuning or adding execution identities: docs/developer-guides/06-kernel-path-preset.md
- kernel-level perf changes: docs/developer-guides/11-wgsl-kernel.md
- attention-path throughput work: docs/developer-guides/13-attention-variant.md
- cache/layout throughput work: docs/developer-guides/15-kvcache-layout.md
- command-surface or harness-contract additions: docs/developer-guides/12-command-surface.md
Execution Plane Contract
- JSON governs benchmark policy and engine selection (runtimeConfig, presets, rule assets).
- JS wraps execution: contract validation, harness/runtime assembly, config isolation, and dispatch orchestration.
- WGSL remains deterministic compute; it must not own benchmark semantics or fallback logic.
- Any benchmark fairness axis (sampling, seed, budget, run policy) must come from shared contract JSON and be identical across engines.
- Any unrepresented behavior choice must fail fast instead of falling back.
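The fairness and fail-fast rules above can be sketched as a small check. This is a minimal illustration, not the repository's validator: the function name assertFairnessParity, the per-engine config shape, and the axis key names (sampling, seed, budget, runPolicy) are all assumptions made for the example.

```javascript
// Hypothetical sketch: axis names and config shape are assumptions,
// not Doppler's actual contract schema.
const FAIRNESS_AXES = ["sampling", "seed", "budget", "runPolicy"];

// Throws if any fairness axis is missing or differs between engines.
// Values are compared via JSON.stringify, so object key order matters.
function assertFairnessParity(engineConfigs) {
  const engines = Object.keys(engineConfigs);
  if (engines.length < 2) {
    throw new Error("need at least two engines to compare");
  }
  const [reference, ...others] = engines;
  for (const axis of FAIRNESS_AXES) {
    const expected = engineConfigs[reference][axis];
    if (expected === undefined) {
      // Fail fast: never substitute a silent default for a missing choice.
      throw new Error(`fairness axis "${axis}" missing for engine "${reference}"`);
    }
    for (const engine of others) {
      const actual = engineConfigs[engine][axis];
      if (JSON.stringify(actual) !== JSON.stringify(expected)) {
        throw new Error(`fairness axis "${axis}" differs: ${reference} vs ${engine}`);
      }
    }
  }
  return true;
}
```

Throwing on a missing axis, instead of filling in a default, mirrors the "fail fast instead of falling back" rule.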
Cross-Engine Compare (Canonical)
# Fair compute comparison (default parity decode cadence)
node tools/compare-engines.js --mode compute --warmup 1 --runs 3 --decode-profile parity --save --json
# Doppler throughput-tuned decode cadence
node tools/compare-engines.js --mode compute --warmup 1 --runs 3 --decode-profile throughput --save --json
# Warm-start only (includes model load)
node tools/compare-engines.js --mode warm --warmup 1 --runs 3 --decode-profile parity --save --json
Notes:
- --decode-profile parity maps Doppler to batchSize=1, readbackInterval=1 for closer TJS cadence parity.
- --decode-profile throughput maps Doppler to batchSize=4, readbackInterval=4.
- Prefill is normalized as prompt_tokens / ttft_ms in compare output.
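The profile mapping and the prefill normalization in the notes can be sketched as follows. The batchSize and readbackInterval values come from the notes; the names DECODE_PROFILES, resolveDecodeProfile, and prefillTokensPerSec are hypothetical. The conversion from milliseconds to seconds is also an assumption about the reported unit.

```javascript
// Values taken from the notes above; object and function names are hypothetical.
const DECODE_PROFILES = {
  parity: { batchSize: 1, readbackInterval: 1 },
  throughput: { batchSize: 4, readbackInterval: 4 },
};

// Returns a copy of the cadence settings, failing fast on unknown profiles.
function resolveDecodeProfile(name) {
  if (!Object.prototype.hasOwnProperty.call(DECODE_PROFILES, name)) {
    throw new Error(`unknown decode profile: ${name}`);
  }
  return { ...DECODE_PROFILES[name] };
}

// Prefill throughput from prompt length and time to first token.
// Assumption: the result is reported in tokens per second (ms -> s).
function prefillTokensPerSec(promptTokens, ttftMs) {
  if (!(ttftMs > 0)) {
    throw new Error("ttftMs must be a positive number");
  }
  return (promptTokens / ttftMs) * 1000;
}
```

Under that unit assumption, a 64-token prompt with a 128 ms TTFT works out to 500 tokens per second.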
Doppler Benchmark (Primary)
# Warm-cache benchmark (recommended baseline)
npm run bench -- --config '{"request":{"modelId":"MODEL_ID","runtimePreset":"experiments/bench/gemma3-bench-q4k","cacheMode":"warm"},"run":{"surface":"browser","bench":{"save":true}}}' --json
# Cold-cache benchmark (cache disabled per run)
npm run bench -- --config '{"request":{"modelId":"MODEL_ID","runtimePreset":"experiments/bench/gemma3-bench-q4k","cacheMode":"cold"},"run":{"surface":"browser","bench":{"save":true}}}' --json
# Compare against last saved run
npm run bench -- --config '{"request":{"modelId":"MODEL_ID","runtimePreset":"experiments/bench/gemma3-bench-q4k","cacheMode":"warm"},"run":{"surface":"browser","bench":{"save":true,"compare":"last"}}}' --json
Notes:
- bench defaults to --surface auto; set run.surface="browser" when you explicitly want the browser relay.
- Saved artifacts go to benchmarks/vendors/results/ when --save is used.
- For instrumentation-heavy investigation, run debug with request.runtimePreset="experiments/gemma3-profile".
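Hand-writing the inline --config JSON is easy to get wrong, so a small builder can help. The sketch below mirrors the field names used in the commands above; the helper buildBenchConfig is hypothetical, and it accepts only the two cacheMode values shown in this document.

```javascript
// Hypothetical helper: builds the JSON string passed to `--config`.
// Field names mirror the commands above; nothing here is Doppler's own API.
function buildBenchConfig({ modelId, runtimePreset, cacheMode = "warm", compare }) {
  if (!modelId || !runtimePreset) {
    throw new Error("modelId and runtimePreset are required");
  }
  // Only the two cache modes shown in this document are accepted here.
  if (cacheMode !== "warm" && cacheMode !== "cold") {
    throw new Error(`unsupported cacheMode: ${cacheMode}`);
  }
  const bench = { save: true };
  if (compare !== undefined) {
    bench.compare = compare; // e.g. "last"
  }
  return JSON.stringify({
    request: { modelId, runtimePreset, cacheMode },
    run: { surface: "browser", bench },
  });
}
```

Passing the returned string as a single argument after --config, for example through an argv array instead of a shell string, avoids quoting problems.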
Performance Investigation Loop (Squeeze Workflow)
# 1) Baseline parity
node tools/compare-engines.js --mode compute --warmup 1 --runs 3 --decode-profile parity --save --json
# 2) Throughput probe
node tools/compare-engines.js --mode compute --warmup 1 --runs 3 --decode-profile throughput --save --json
# 3) Readback sensitivity (fixed workload, warm cache)
npm run bench -- --config '{"request":{"modelId":"MODEL_ID","runtimePreset":"experiments/gemma3-investigate-readback-r1","cacheMode":"warm"},"run":{"surface":"browser","bench":{"save":true}}}' --json
npm run bench -- --config '{"request":{"modelId":"MODEL_ID","runtimePreset":"experiments/gemma3-investigate-readback-r8","cacheMode":"warm"},"run":{"surface":"browser","bench":{"save":true}}}' --json
# 4) Profile traces (investigate intent + profiler)
npm run debug -- --config '{"request":{"modelId":"MODEL_ID","runtimePreset":"experiments/gemma3-profile"},"run":{"surface":"auto"}}' --json
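Step 3 is only useful once the two readback runs are compared on the same metric. A minimal sketch of that comparison, assuming each saved result can be reduced to an object with a numeric decode_tokens_per_sec field (the saved artifact's actual shape is not documented here, and relativeChangePct is a hypothetical name):

```javascript
// Hypothetical result shape: { decode_tokens_per_sec: number, ... }.
// Returns the percentage change of `candidate` relative to `baseline`.
function relativeChangePct(baseline, candidate, metric = "decode_tokens_per_sec") {
  const a = baseline[metric];
  const b = candidate[metric];
  if (typeof a !== "number" || typeof b !== "number" || a === 0) {
    // Fail fast instead of reporting a misleading delta.
    throw new Error(`metric "${metric}" is missing, non-numeric, or zero in baseline`);
  }
  return ((b - a) / a) * 100;
}
```

With made-up numbers, a baseline of 40 tokens/s and a candidate of 50 tokens/s gives a change of 25 percent.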
Vendor Benchmark (Transformers.js)
# Raw Transformers.js benchmark with ORT op profiling summary
node benchmarks/runners/transformersjs-bench.js --workload g3-p064-d064-t0-k1 --cache-mode warm --profile-ops on --profile-top 20 --json
# Normalize result into vendor registry output
node tools/vendor-bench.js run --target transformersjs --workload g3-p064-d064-t0-k1 -- node benchmarks/runners/transformersjs-bench.js --workload g3-p064-d064-t0-k1 --cache-mode warm --profile-ops on --profile-top 20 --json
Coverage Tracking (Bench vs Profile)
# Validate registry + harness + capability matrix
node tools/vendor-bench.js validate
# Show capability coverage for all targets
node tools/vendor-bench.js capabilities
# Show exact Doppler -> Transformers.js feature gaps
node tools/vendor-bench.js gap --base doppler --target transformersjs
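Conceptually, a feature gap between two targets is a set difference. The sketch below illustrates that idea only: it does not reproduce what tools/vendor-bench.js does, and the capability shape (target name mapped to a list of feature names) is an assumption, not the schema of benchmarks/vendors/capabilities.json.

```javascript
// Hypothetical shape: { [targetName]: string[] } of supported feature names.
// Returns features the base supports that the target does not.
function capabilityGap(capabilities, base, target) {
  for (const name of [base, target]) {
    if (!Array.isArray(capabilities[name])) {
      throw new Error(`unknown target: ${name}`);
    }
  }
  const supported = new Set(capabilities[target]);
  return capabilities[base].filter((feature) => !supported.has(feature));
}
```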
Key Metrics
- decode_tokens_per_sec
- prefill_tokens_per_sec_ttft (preferred normalized prefill metric)
- prefill_tokens_per_sec (legacy alias)
- ttft_ms
- decode_ms_per_token_p50/p95
- model_load_ms
- ort_profiled_total_ms (Transformers.js harness)
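To make the decode metrics concrete, here is one way they could be derived from per-token latency samples. This is a sketch under assumptions: the harness's actual percentile method and aggregation are not stated in this document, the nearest-rank percentile is swapped in as a simple stand-in, and the function names are hypothetical.

```javascript
// Nearest-rank percentile over a sorted copy of the samples (0 < p <= 100).
function percentile(samples, p) {
  if (samples.length === 0) {
    throw new Error("no samples");
  }
  const sorted = [...samples].sort((x, y) => x - y);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Summarizes per-token decode latencies (in ms) into the metric names
// listed above. Assumption: throughput is tokens divided by total decode time.
function summarizeDecode(perTokenMs) {
  const totalMs = perTokenMs.reduce((sum, ms) => sum + ms, 0);
  return {
    decode_ms_per_token_p50: percentile(perTokenMs, 50),
    decode_ms_per_token_p95: percentile(perTokenMs, 95),
    decode_tokens_per_sec: (perTokenMs.length / totalMs) * 1000,
  };
}
```

Reporting p95 next to p50 exposes tail latency, such as stalls around readback, that a mean would hide.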
Canonical Files
- tools/doppler-cli.js
- benchmarks/runners/transformersjs-bench.js
- benchmarks/runners/transformersjs-runner.html
- benchmarks/vendors/registry.json
- benchmarks/vendors/capabilities.json
- benchmarks/vendors/results/
- docs/developer-guides/README.md
Related Skills
- doppler-debug for correctness regressions discovered during bench runs
- doppler-convert when conversion format/quantization differences affect perf
Repository
clocksmith/doppler