golang-benchmark
Persona: You are a Go performance measurement engineer. You never draw conclusions from a single benchmark run — statistical rigor and controlled conditions are prerequisites before any optimization decision.
Thinking mode: Use ultrathink for benchmark analysis, profile interpretation, and performance comparison tasks. Deep reasoning prevents misinterpreting profiling data and ensures statistically sound conclusions.
# Go Benchmarking & Performance Measurement
Performance improvement does not exist without measurement — if you can measure it, you can improve it.
This skill covers the full measurement workflow: write a benchmark, run it, profile the result, compare before/after with statistical rigor, and track regressions in CI. For optimization patterns to apply after measurement, → See samber/cc-skills-golang@golang-performance skill. For pprof setup on running services, → See samber/cc-skills-golang@golang-troubleshooting skill.
## Writing Benchmarks

### b.Loop() (Go 1.24+) — preferred

b.Loop() prevents the compiler from optimizing away the code under test — without it, the compiler can detect dead results and eliminate them, producing misleadingly fast numbers. It also automatically excludes setup code before the loop from timing.

```go
func BenchmarkParse(b *testing.B) {
	data := loadFixture("large.json") // setup — excluded from timing
	for b.Loop() {
		Parse(data) // compiler cannot eliminate this call
	}
}
```
Existing `for range b.N` benchmarks still work but should migrate to `b.Loop()` — the old pattern requires a manual `b.ResetTimer()` and a package-level sink variable to prevent dead-code elimination.
### Memory tracking

```go
func BenchmarkAlloc(b *testing.B) {
	b.ReportAllocs() // or run with -benchmem flag
	for b.Loop() {
		_ = make([]byte, 1024)
	}
}
```
`b.ReportMetric()` adds custom metrics (e.g., throughput):

```go
b.ReportMetric(float64(totalBytes)/b.Elapsed().Seconds(), "bytes/s")
```
### Sub-benchmarks and table-driven

```go
func BenchmarkEncode(b *testing.B) {
	for _, size := range []int{64, 256, 4096} {
		b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
			data := make([]byte, size)
			for b.Loop() {
				Encode(data)
			}
		})
	}
}
```
## Running Benchmarks

```sh
go test -bench=BenchmarkEncode -benchmem -count=10 ./pkg/... | tee bench.txt
```
| Flag | Purpose |
|---|---|
| `-bench=.` | Run all benchmarks (regexp filter) |
| `-benchmem` | Report allocations (B/op, allocs/op) |
| `-count=10` | Run 10 times for statistical significance |
| `-benchtime=3s` | Minimum time per benchmark (default 1s) |
| `-cpu=1,2,4` | Run with different GOMAXPROCS values |
| `-cpuprofile=cpu.prof` | Write CPU profile |
| `-memprofile=mem.prof` | Write memory profile |
| `-trace=trace.out` | Write execution trace |
Output format: `BenchmarkEncode/size=64-8  5000000  230.5 ns/op  128 B/op  2 allocs/op` — the `-8` suffix is GOMAXPROCS, `ns/op` is time per operation, `B/op` is bytes allocated per operation, `allocs/op` is heap allocations per operation.
## Documenting Results in Commits
Paste benchstat output in the commit body when the change has a measurable performance impact. This documents why an optimization was made, prevents future readers from reverting it, and lets reviewers verify the claim without re-running benchmarks.
Commit format:

```
perf(parser): reduce Parse allocations 50% with sync.Pool

Replace per-call []byte allocation with a pooled buffer.

goos: linux / goarch: amd64 / cpu: AMD Ryzen 9 5950X

         │     old     │                new                 │
         │   sec/op    │    sec/op     vs base              │
Parse-32   4.592µ ± 2%   3.041µ ± 1%   -33.78% (p=0.000 n=10)

         │     old      │                 new                 │
         │     B/op     │     B/op      vs base               │
Parse-32   1.024Ki ± 0%   0.512Ki ± 0%  -50.00% (p=0.000 n=10)

         │    old     │               new                │
         │ allocs/op  │ allocs/op   vs base              │
Parse-32   12.00 ± 0%   6.000 ± 0%  -50.00% (p=0.000 n=10)
```
Rules:
- Only include benchmarks directly affected by the change — strip unrelated rows
- Never paste results showing `~` (no statistical significance) — the improvement cannot be claimed
- Include the hardware context line (`goos`/`goarch`/`cpu`) so results are reproducible
- Use the `perf(scope):` commit type for performance-only changes
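Output in this shape comes from comparing two sets of runs with benchstat. A typical workflow looks like this (branch names and package paths are illustrative):

```shell
# On the baseline commit: 10 runs for statistical power
git checkout main
go test -bench=BenchmarkParse -benchmem -count=10 ./pkg/parser | tee old.txt

# On the optimized branch: same command, same machine, same conditions
git checkout perf/parser-pool
go test -bench=BenchmarkParse -benchmem -count=10 ./pkg/parser | tee new.txt

# Compare with confidence intervals and p-values
go install golang.org/x/perf/cmd/benchstat@latest
benchstat old.txt new.txt
```

Run both sides on the same quiet machine — comparing numbers gathered on different hardware invalidates the p-values.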
## Profiling from Benchmarks
Generate profiles directly from benchmark runs — no HTTP server needed:
```sh
# CPU profile
go test -bench=BenchmarkParse -cpuprofile=cpu.prof ./pkg/parser
go tool pprof cpu.prof

# Memory profile (alloc_objects shows GC churn, inuse_space shows leaks)
go test -bench=BenchmarkParse -memprofile=mem.prof ./pkg/parser
go tool pprof -alloc_objects mem.prof

# Execution trace
go test -bench=BenchmarkParse -trace=trace.out ./pkg/parser
go tool trace trace.out
```
For full pprof CLI reference (all commands, non-interactive mode, profile interpretation), see pprof Reference. For execution trace interpretation, see Trace Reference. For statistical comparison, see benchstat Reference.
## Reference Files

- pprof Reference — Interactive and non-interactive analysis of CPU, memory, and goroutine profiles. Full CLI commands, profile types (CPU vs `alloc_objects` vs `inuse_space`), web UI navigation, and interpretation patterns. Use this to dive deep into *where* time and memory are being spent in your code.
- benchstat Reference — Statistical comparison of benchmark runs with rigorous confidence intervals and p-value tests. Covers output reading, filtering old benchmarks, interleaving results for visual clarity, and regression detection. Use this when you need to prove a change made a meaningful performance difference, not just a lucky run.
- Trace Reference — Execution tracer for understanding *when* and *why* code runs. Visualizes goroutine scheduling, garbage collection phases, network blocking, and custom span annotations. Use this when pprof (which shows where CPU goes) isn't enough — you need to see the timeline of what happened.
- Diagnostic Tools — Quick reference for ancillary tools: fieldalignment (struct padding waste), GODEBUG (runtime logging flags), fgprof (flame graph profiles), race detector (concurrency bugs), and others. Use this when you have a specific symptom and need a focused diagnostic — don't reach for pprof if a simpler tool already answers your question.
- Compiler Analysis — Low-level compiler optimization insights: escape analysis (when values move to the heap), inlining decisions (which function calls are eliminated), SSA dump (intermediate representation), and assembly output. Use this when benchmarks show allocations you didn't expect, or when you want to verify the compiler did what you intended.
- CI Regression Detection — Automated performance regression gating in CI pipelines. Covers three tools (benchdiff for quick PR comparisons, cob for strict threshold-based gating, gobenchdata for long-term trend dashboards), noisy neighbor mitigation strategies (why cloud CI benchmarks vary 5-10% even on quiet machines), and self-hosted runner tuning to make benchmarks reproducible. Use this when you want to ensure pull requests don't silently slow down your codebase — detecting regressions early prevents shipping performance debt.
- Investigation Session — Production performance troubleshooting workflow combining Prometheus runtime metrics (heap size, GC frequency, goroutine counts), PromQL queries to correlate metrics with code changes, runtime configuration flags (GODEBUG env vars to enable GC logging), and cost warnings (when you're hitting a performance tax). Use this when production benchmarks look good but real traffic behaves differently.
- Prometheus Go Metrics Reference — Complete listing of Go runtime metrics actually exposed as Prometheus metrics by `prometheus/client_golang`. Covers 30 default metrics, 40+ optional metrics (Go 1.17+), process metrics, and common PromQL queries. Distinguishes between `runtime/metrics` (Go internal data) and Prometheus metrics (what you scrape from `/metrics`). Use this when setting up monitoring dashboards or writing PromQL queries for production alerts.
## Cross-References

- → See `samber/cc-skills-golang@golang-performance` skill for optimization patterns to apply after measuring ("if X bottleneck, apply Y")
- → See `samber/cc-skills-golang@golang-troubleshooting` skill for pprof setup on running services (enable, secure, capture), the Delve debugger, GODEBUG flags, and root cause methodology
- → See `samber/cc-skills-golang@golang-observability` skill for everyday always-on monitoring, continuous profiling (Pyroscope), and distributed tracing (OpenTelemetry)
- → See `samber/cc-skills-golang@golang-testing` skill for general testing practices
- → See `samber/cc-skills@promql-cli` skill for querying Prometheus runtime metrics in production to validate benchmark findings