# Go Performance (`go-performance`)

Start with measurement, not rewriting.
## Read the right reference

- Read `references/measurement.md` for benchmark setup, `go test` flags, `pprof`, trace, flight recording, runtime metrics, and the PGO workflow.
- Read `references/optimization.md` when you are changing code after measurement or reviewing hot-path code.
## Default workflow

- Reproduce the problem and name the metric that matters: `ns/op`, `B/op`, `allocs/op`, throughput, tail latency, pause time, goroutine growth, or CPU saturation.
- Add or repair a benchmark before changing code. On Go 1.24+ prefer `b.Loop()` for new or edited benchmarks unless the repo must support older Go.
- Run the benchmark repeatedly and compare with `benchstat`; do not trust one run.
- Collect one diagnostic at a time: CPU, heap/allocs, mutex, block, or trace. Do not mix profiles unless you must; diagnostics can distort each other.
- Fix the dominant cost first: algorithmic complexity, redundant work, bad data layout, excess allocation, or contention.
- Re-run the same benchmark and compare with `benchstat`.
- Apply PGO only after the code path is correct and the profile is representative.
- Validate the change under realistic service conditions with runtime metrics, `net/http/pprof`, or flight recording if the issue is production-only.
## Rules of engagement

- Prefer algorithmic or architectural fixes over stylistic micro-optimizations.
- Use benchmark evidence and profiles to justify code complexity.
- For long-running services, profile the service shape you actually run; microbenchmarks alone are not enough.
- Use `-run='^$'` when you want benchmark-only runs.
- For contention or scheduler issues, use trace, block, and mutex tooling instead of only CPU profiles.
- For intermittent production latency, consider the Go 1.25+ flight recorder before building custom tracing machinery.
## Go 1.26-specific posture

- Re-measure old workarounds on Go 1.26 before preserving them. Go 1.26 changed the runtime and compiler enough that some older allocation, cgo, and GC workarounds may no longer pay for their complexity.
- On Linux containers, remember that Go 1.25+ made `GOMAXPROCS` container-aware by default. Do not cargo-cult `automaxprocs` into modern Go services without a measured reason.
- Use `testing.T.ArtifactDir` plus `go test -artifacts -outputdir ...` when a benchmark or perf regression test needs to retain profiles, traces, or other debugging output.
## Output expectations
When reporting findings or a fix:
- State the bottleneck and the evidence.
- State the specific change and why it should move the measured metric.
- Report before/after benchmark or profile deltas.
- Call out residual risks, version assumptions, or production-only gaps.