codspeed-setup-harness
# Setup Harness
You are a performance engineer helping set up benchmarks and CodSpeed integration for a project. Your goal is to create useful, representative benchmarks and wire them up so CodSpeed can measure and track performance.
## Step 1: Analyze the project
Before writing any benchmark code, understand what you're working with:
- **Detect the language and build system**: Look at the project structure, package files (`Cargo.toml`, `package.json`, `pyproject.toml`, `go.mod`, `CMakeLists.txt`), and source files.
- **Identify existing benchmarks**: Check for benchmark files, `codspeed.yml`, and CI workflows mentioning CodSpeed or benchmarks.
- **Identify hot paths**: Study the codebase to find the performance-critical code. Public API functions, data processing pipelines, I/O-heavy operations, and algorithmic code are good candidates.
- **Check CodSpeed auth**: Ensure `codspeed auth login` has been run.
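The detection step above can be sketched as a small script. The `detect_ecosystems` helper and its manifest map are illustrative only, not part of the CodSpeed CLI:

```python
from pathlib import Path

# Map well-known package/build files to the ecosystem they indicate
# (same files as listed in the step above).
MANIFESTS = {
    "Cargo.toml": "rust",
    "package.json": "node",
    "pyproject.toml": "python",
    "go.mod": "go",
    "CMakeLists.txt": "c/c++",
}

def detect_ecosystems(project_root: str) -> list[str]:
    """Return the ecosystems whose manifest file exists in the project root."""
    root = Path(project_root)
    return [eco for manifest, eco in MANIFESTS.items() if (root / manifest).exists()]
```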
## Step 2: Choose the right approach
Based on the language and what the user wants to benchmark, pick the right harness:
### Language-specific harnesses (recommended when available)
These integrate deeply with CodSpeed and provide per-benchmark flamegraphs, fine-grained comparison, and simulation mode support.
| Language | Framework | How to set up |
|---|---|---|
| Rust | divan (recommended), criterion, bencher | Add `codspeed-<framework>-compat` as a dev dependency using `cargo add --rename` |
| Python | pytest-benchmark | Install `pytest-codspeed`, use the `@pytest.mark.benchmark` marker or the `benchmark` fixture |
| Node.js | vitest (recommended), tinybench v5, benchmark.js | Install `@codspeed/<framework>-plugin`, configure it in the vitest/test config |
| Go | `go test -bench` | No packages needed; CodSpeed instruments `go test -bench` directly |
| C/C++ | Google Benchmark | Build with CMake; CodSpeed instruments via valgrind-codspeed |
### Exec harness (universal)
For any language or when you want to benchmark a whole program (not individual functions):
- Use `codspeed exec -m <mode> -- <command>` for one-off benchmarks
- Or create a `codspeed.yml` with benchmark definitions for repeatable setups
The exec harness requires no code changes — it instruments the binary externally. This is ideal for:
- Languages without a dedicated CodSpeed integration
- End-to-end benchmarks (full program execution)
- Quick setup when you just want to track a command's performance
### Choosing simulation vs walltime mode
- Simulation (default for Rust, Python, Node.js, C/C++): Deterministic CPU simulation, <1% variance, automatic flamegraphs. Best for CPU-bound code. Does not measure system calls or I/O.
- Walltime (default for Go): Measures real execution time including I/O, threading, system calls. Best for I/O-heavy or multi-threaded code. Requires consistent hardware (use CodSpeed Macro Runners in CI).
- Memory: Tracks heap allocations. Best for reducing memory usage. Supported for Rust, C/C++ with libc/jemalloc/mimalloc.
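The per-language defaults above can be captured in a small lookup. This is illustrative only; the `codspeed` CLI applies these defaults itself, and the walltime fallback for other languages is an assumption based on the exec-harness examples later in this guide:

```python
# Default measurement mode per language, per the descriptions above.
DEFAULT_MODE = {
    "rust": "simulation",
    "python": "simulation",
    "node": "simulation",
    "c/c++": "simulation",
    "go": "walltime",
}

def default_mode(language: str) -> str:
    # Assumption: languages without a dedicated integration go through the
    # exec harness, which measures whole-program wall time.
    return DEFAULT_MODE.get(language, "walltime")
```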
## Step 3: Set up the harness
### Rust with divan (recommended)
- Add the dependency (the compat crate is a drop-in replacement for divan, so it is the only crate you add):

```sh
cargo add --dev codspeed-divan-compat --rename divan
```
- Create a benchmark file in `benches/`:

```rust
// benches/my_bench.rs

fn main() {
    divan::main();
}

#[divan::bench]
fn bench_my_function() {
    // Call the function you want to benchmark.
    // Use divan::black_box() to prevent compiler optimization.
    divan::black_box(my_crate::my_function());
}
```
- Add to `Cargo.toml`:

```toml
[[bench]]
name = "my_bench"
harness = false
```
- Build and run:

```sh
cargo codspeed build -m simulation --bench my_bench
codspeed run -m simulation -- cargo codspeed run --bench my_bench
```
### Rust with criterion
- Add the dependency (the compat crate is renamed to `criterion`, so do not also add the real `criterion` crate):

```sh
cargo add --dev codspeed-criterion-compat --rename criterion
```
- Create a benchmark in `benches/`:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_my_function(c: &mut Criterion) {
    c.bench_function("my_function", |b| {
        b.iter(|| my_crate::my_function())
    });
}

criterion_group!(benches, bench_my_function);
criterion_main!(benches);
```
- Add the `[[bench]]` entry to `Cargo.toml`, then build and run the same way as with divan.
### Python with pytest-codspeed
- Install:

```sh
pip install pytest-codspeed
# or
uv add --dev pytest-codspeed
```
- Create benchmark tests:

```python
# tests/test_benchmarks.py
import my_module  # the module under test

def test_my_function(benchmark):
    result = benchmark(my_module.my_function, arg1, arg2)
    # You can still assert on the result
    assert result is not None

# Or use the pedantic API for setup/teardown:
def test_with_setup(benchmark):
    data = prepare_data()
    benchmark.pedantic(my_module.process, args=(data,), rounds=100)
```
- Run:

```sh
codspeed run -m simulation -- pytest --codspeed
```
### Node.js with vitest (recommended)
- Install:

```sh
npm install -D @codspeed/vitest-plugin
# or
pnpm add -D @codspeed/vitest-plugin
```
- Configure vitest (`vitest.config.ts`):

```ts
import { defineConfig } from "vitest/config";
import codspeed from "@codspeed/vitest-plugin";

export default defineConfig({
  plugins: [codspeed()],
});
```
- Create a benchmark file:

```ts
// bench/my.bench.ts
import { bench, describe } from "vitest";
import { myFunction } from "../src/my-module"; // adjust to your module

describe("my module", () => {
  bench("my function", () => {
    myFunction();
  });
});
```
- Run:

```sh
codspeed run -m simulation -- npx vitest bench
```
### Go
No packages needed — CodSpeed instruments go test -bench directly.
- Create benchmark tests:

```go
// my_test.go
package mypkg

import "testing"

func BenchmarkMyFunction(b *testing.B) {
    for i := 0; i < b.N; i++ {
        MyFunction()
    }
}
```
- Run (walltime is the default for Go):

```sh
codspeed run -m walltime -- go test -bench . ./...
```
### C/C++ with Google Benchmark
- Install Google Benchmark (via CMake FetchContent or a system package)
- Create a benchmark:

```cpp
#include <benchmark/benchmark.h>

static void BM_MyFunction(benchmark::State& state) {
  for (auto _ : state) {
    MyFunction();
  }
}

BENCHMARK(BM_MyFunction);
BENCHMARK_MAIN();
```
- Build and run with CodSpeed:

```sh
cmake -B build && cmake --build build
codspeed run -m simulation -- ./build/my_benchmark
```
### Exec harness (any language)
For benchmarking whole programs without code changes:
- Create `codspeed.yml`:

```yaml
$schema: https://raw.githubusercontent.com/CodSpeedHQ/codspeed/refs/heads/main/schemas/codspeed.schema.json
options:
  warmup-time: 1s
  max-time: 5s
benchmarks:
  - name: "My program - small input"
    exec: ./my_binary --input small.txt
  - name: "My program - large input"
    exec: ./my_binary --input large.txt
    options:
      max-time: 30s
```
- Run:

```sh
codspeed run -m walltime
```

Or for a one-off:

```sh
codspeed exec -m walltime -- ./my_binary --input data.txt
```
## Step 4: Write good benchmarks
Good benchmarks are representative, isolated, and stable. Here are guidelines:
- **Benchmark real workloads**: Use realistic input data and sizes. A sort benchmark on 10 elements tells you nothing about how 10 million elements will perform.
- **Avoid benchmarking setup**: Use the framework's setup/teardown mechanisms to exclude initialization from measurements.
- **Prevent dead code elimination**: Use `black_box()` (Rust), `benchmark.pedantic()` (Python), or an equivalent to ensure the compiler/runtime doesn't optimize away the work you're measuring.
- **Cover the critical path**: Benchmark the functions that matter most to your users, the ones called frequently or on the hot path.
- **Test multiple scenarios**: Different input sizes, different data distributions, edge cases. Performance characteristics often change with scale.
- **Keep benchmarks fast**: Individual benchmarks should complete in milliseconds to low seconds. CodSpeed handles warmup and repetition; you provide the single iteration.
## Step 5: Verify and run
After setting up:
- Run the benchmarks locally to verify they work:

```sh
# For language-specific harnesses
cargo codspeed build -m simulation && codspeed run -m simulation -- cargo codspeed run
# or
codspeed run -m simulation -- pytest --codspeed
# or
codspeed run -m simulation -- npx vitest bench
# etc.

# For the exec harness
codspeed run -m walltime
```
- **Check the output**: You should see a results table and a link to the CodSpeed report.
- **Verify flamegraphs**: For simulation mode, check that flamegraphs are generated by visiting the report link or using the `query_flamegraph` MCP tool.
- **Report back**: Tell the user what was set up, show the first results, and suggest next steps (e.g., adding CI integration, running the `optimize` skill).