DOPPLER Bench Skill

Use this skill for repeatable performance measurement and cross-product comparisons.

Mandatory Style Guides

Read these before non-trivial edits or benchmark-methodology changes:

  • docs/style/general-style-guide.md
  • docs/style/javascript-style-guide.md
  • docs/style/config-style-guide.md
  • docs/style/command-interface-design-guide.md
  • docs/style/harness-style-guide.md
  • docs/style/benchmark-style-guide.md

Developer Guide Routing

When benchmark work becomes extension work, also open:

  • docs/developer-guides/README.md

Common routes:

  • tuning or adding execution identities: docs/developer-guides/06-kernel-path-preset.md
  • kernel-level perf changes: docs/developer-guides/11-wgsl-kernel.md
  • attention-path throughput work: docs/developer-guides/13-attention-variant.md
  • cache/layout throughput work: docs/developer-guides/15-kvcache-layout.md
  • command-surface or harness-contract additions: docs/developer-guides/12-command-surface.md

Execution Plane Contract

  • JSON governs benchmark policy and engine selection (runtimeConfig, presets, rule assets).
  • JS wraps execution: contract validation, harness/runtime assembly, config isolation, and dispatch orchestration.
  • WGSL remains deterministic compute; it must not own benchmark semantics or fallback logic.
  • Any benchmark fairness axis (sampling, seed, budget, run policy) must come from shared contract JSON and be identical across engines.
  • Any behavior choice the shared contract does not represent must fail fast instead of silently falling back.
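The fairness and fail-fast rules above can be sketched as a small validator. This is an illustrative sketch only: the axis names and the `configure` method are assumptions, not the actual contract schema.

```javascript
// Hypothetical fairness axes, following the bullet list above; the real
// contract JSON fields may differ.
const FAIRNESS_AXES = ["sampling", "seed", "budget", "runPolicy"];

function assertFairnessContract(contract, engines) {
  for (const axis of FAIRNESS_AXES) {
    if (!(axis in contract)) {
      // Unrepresented behavior choice: fail fast, never fall back.
      throw new Error(`contract missing fairness axis: ${axis}`);
    }
  }
  // Every engine must see identical contract values; clone so no engine
  // can mutate the shared policy object.
  for (const engine of engines) {
    engine.configure(structuredClone(contract));
  }
}
```

The point of the clone is config isolation: JS owns dispatch, but no engine may drift from the shared JSON policy.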

Cross-Engine Compare (Canonical)

# Fair compute comparison (default parity decode cadence)
node tools/compare-engines.js --mode compute --warmup 1 --runs 3 --decode-profile parity --save --json

# Doppler throughput-tuned decode cadence
node tools/compare-engines.js --mode compute --warmup 1 --runs 3 --decode-profile throughput --save --json

# Warm-start only (includes model load)
node tools/compare-engines.js --mode warm --warmup 1 --runs 3 --decode-profile parity --save --json

Notes:

  • --decode-profile parity maps Doppler to batchSize=1, readbackInterval=1 for closer cadence parity with Transformers.js (TJS).
  • --decode-profile throughput maps Doppler to batchSize=4, readbackInterval=4.
  • Prefill is normalized as prompt_tokens / ttft_ms in compare output.
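The prefill normalization in the last note can be reproduced offline from a saved result. A minimal sketch, assuming the compare output exposes `prompt_tokens` and `ttft_ms` fields (the field names are an assumption about the output shape):

```javascript
// Normalized prefill throughput: prompt_tokens / ttft_ms, scaled from
// tokens-per-millisecond to tokens-per-second.
function prefillTokensPerSecTtft(result) {
  const { prompt_tokens, ttft_ms } = result;
  if (!ttft_ms || ttft_ms <= 0) {
    throw new Error("ttft_ms must be positive");
  }
  return (prompt_tokens / ttft_ms) * 1000;
}
```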

Doppler Benchmark (Primary)

# Warm-cache benchmark (recommended baseline)
npm run bench -- --config '{"request":{"modelId":"MODEL_ID","runtimePreset":"experiments/bench/gemma3-bench-q4k","cacheMode":"warm"},"run":{"surface":"browser","bench":{"save":true}}}' --json

# Cold-cache benchmark (cache disabled per run)
npm run bench -- --config '{"request":{"modelId":"MODEL_ID","runtimePreset":"experiments/bench/gemma3-bench-q4k","cacheMode":"cold"},"run":{"surface":"browser","bench":{"save":true}}}' --json

# Compare against last saved run
npm run bench -- --config '{"request":{"modelId":"MODEL_ID","runtimePreset":"experiments/bench/gemma3-bench-q4k","cacheMode":"warm"},"run":{"surface":"browser","bench":{"save":true,"compare":"last"}}}' --json

Notes:

  • bench defaults to --surface auto; set run.surface="browser" when you explicitly want the browser relay.
  • Saved artifacts go to benchmarks/vendors/results/ when saving is enabled (--save or "save":true).
  • For instrumentation-heavy investigation, run debug with request.runtimePreset="experiments/gemma3-profile".

Performance Investigation Loop (Squeeze Workflow)

# 1) Baseline parity
node tools/compare-engines.js --mode compute --warmup 1 --runs 3 --decode-profile parity --save --json

# 2) Throughput probe
node tools/compare-engines.js --mode compute --warmup 1 --runs 3 --decode-profile throughput --save --json

# 3) Readback sensitivity (fixed workload, warm cache)
npm run bench -- --config '{"request":{"modelId":"MODEL_ID","runtimePreset":"experiments/gemma3-investigate-readback-r1","cacheMode":"warm"},"run":{"surface":"browser","bench":{"save":true}}}' --json
npm run bench -- --config '{"request":{"modelId":"MODEL_ID","runtimePreset":"experiments/gemma3-investigate-readback-r8","cacheMode":"warm"},"run":{"surface":"browser","bench":{"save":true}}}' --json

# 4) Profile traces (investigate intent + profiler)
npm run debug -- --config '{"request":{"modelId":"MODEL_ID","runtimePreset":"experiments/gemma3-profile"},"run":{"surface":"auto"}}' --json
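When reading steps 1 and 2 side by side, a throughput probe only counts as a win if it clears the run-to-run noise of the parity baseline. A sketch of that decision, using per-run decode rates from the two saved comparisons; the 2x-spread threshold is an illustrative choice, not the harness's actual rule:

```javascript
// Decide whether the throughput probe beats the parity baseline by more
// than the baseline's own run-to-run spread (a crude noise floor).
function isRealWin(parityRuns, throughputRuns) {
  const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const spread = Math.max(...parityRuns) - Math.min(...parityRuns);
  return mean(throughputRuns) - mean(parityRuns) > 2 * spread;
}
```

With only 3 runs per configuration, the spread is a rough proxy; raise --runs before trusting small deltas.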

Vendor Benchmark (Transformers.js)

# Raw Transformers.js benchmark with ORT op profiling summary
node benchmarks/runners/transformersjs-bench.js --workload g3-p064-d064-t0-k1 --cache-mode warm --profile-ops on --profile-top 20 --json

# Normalize result into vendor registry output
node tools/vendor-bench.js run --target transformersjs --workload g3-p064-d064-t0-k1 -- node benchmarks/runners/transformersjs-bench.js --workload g3-p064-d064-t0-k1 --cache-mode warm --profile-ops on --profile-top 20 --json

Coverage Tracking (Bench vs Profile)

# Validate registry + harness + capability matrix
node tools/vendor-bench.js validate

# Show capability coverage for all targets
node tools/vendor-bench.js capabilities

# Show exact Doppler -> Transformers.js feature gaps
node tools/vendor-bench.js gap --base doppler --target transformersjs
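The gap command above amounts to a set difference over the capability matrix. A sketch of that computation, assuming capabilities.json maps each target to a flat list of feature flags (the schema and direction of the difference are assumptions):

```javascript
// Features the base engine exercises that the target harness does not
// cover, i.e. the benchmark coverage gap when comparing base -> target.
function featureGap(capabilities, base, target) {
  const targetSet = new Set(capabilities[target]);
  return capabilities[base].filter((f) => !targetSet.has(f));
}
```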

Key Metrics

  • decode_tokens_per_sec
  • prefill_tokens_per_sec_ttft (preferred normalized prefill metric)
  • prefill_tokens_per_sec (legacy alias)
  • ttft_ms
  • decode_ms_per_token_p50/p95
  • model_load_ms
  • ort_profiled_total_ms (Transformers.js harness)
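The p50/p95 metrics in the list above are percentiles over per-token decode latencies. A sketch using the nearest-rank method; the exact percentile method the harness uses is an assumption:

```javascript
// Nearest-rank percentile over a list of per-token decode times (ms).
// p is in [0, 100]; samples must be non-empty.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

A large p95/p50 gap usually points at periodic stalls (e.g. readback cadence) rather than uniformly slow decode.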

Canonical Files

  • tools/doppler-cli.js
  • benchmarks/runners/transformersjs-bench.js
  • benchmarks/runners/transformersjs-runner.html
  • benchmarks/vendors/registry.json
  • benchmarks/vendors/capabilities.json
  • benchmarks/vendors/results/
  • docs/developer-guides/README.md

Related Skills

  • doppler-debug for correctness regressions discovered during bench runs
  • doppler-convert when conversion format/quantization differences affect perf