# Go Performance (`go-performance`)

Start with measurement, not rewriting.
## Read the right reference

- Read `references/measurement.md` for benchmark setup, `go test` flags, `pprof`, trace, flight recording, runtime metrics, and the PGO workflow.
- Read `references/optimization.md` when you are changing code after measurement or reviewing hot-path code.
## Default workflow

- Reproduce the problem and name the metric that matters: `ns/op`, `B/op`, `allocs/op`, throughput, tail latency, pause time, goroutine growth, or CPU saturation.
- Add or repair a benchmark before changing code. On Go 1.24+ prefer `b.Loop()` for new or edited benchmarks unless the repo must support older Go.
- Run the benchmark repeatedly and compare with `benchstat`; do not trust one run.
- Collect one diagnostic at a time: CPU, heap/allocs, mutex, block, or trace. Do not mix profiles unless you must; diagnostics can distort each other.
- Fix the dominant cost first: algorithmic complexity, redundant work, bad data layout, excess allocation, or contention.
- Re-run the same benchmark and compare with `benchstat`.
- Apply PGO only after the code path is correct and the profile is representative.
- Validate the change under realistic service conditions with runtime metrics, `net/http/pprof`, or flight recording if the issue is production-only.
## Rules of engagement

- Prefer algorithmic or architectural fixes over stylistic micro-optimizations.
- Use benchmark evidence and profiles to justify code complexity.
- For long-running services, profile the service shape you actually run; microbenchmarks alone are not enough.
- Use `-run='^$'` when you want benchmark-only runs.
- For contention or scheduler issues, use trace, block, and mutex tooling instead of only CPU profiles.
- For intermittent production latency, consider the Go 1.25+ flight recorder before building custom tracing machinery.
## Go 1.26-specific posture

- Re-measure old workarounds on Go 1.26 before preserving them. Go 1.26 changed the runtime and compiler enough that some older allocation, cgo, and GC workarounds may no longer pay for their complexity.
- On Linux containers, remember that Go 1.25+ made `GOMAXPROCS` container-aware by default. Do not cargo-cult `automaxprocs` into modern Go services without a measured reason.
- Use `testing.T.ArtifactDir` plus `go test -artifacts -outputdir ...` when a benchmark or perf regression test needs to retain profiles, traces, or other debugging output.
## Output expectations
When reporting findings or a fix:
- State the bottleneck and the evidence.
- State the specific change and why it should move the measured metric.
- Report before/after benchmark or profile deltas.
- Call out residual risks, version assumptions, or production-only gaps.