Rust Profiling
Purpose
Guide agents through Rust performance profiling: flamegraphs via cargo-flamegraph, binary size analysis, monomorphization bloat measurement, Criterion microbenchmarks, and interpreting profiling results with inlined Rust frames.
Triggers
- "How do I generate a flamegraph for a Rust program?"
- "My Rust binary is huge — how do I find what's causing it?"
- "How do I write Criterion benchmarks?"
- "How do I measure monomorphization bloat?"
- "Rust performance is worse than expected — how do I profile it?"
- "How do I use perf with Rust?"
Workflow
1. Build for profiling
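# Cargo.toml: release profile with debug symbols (needed for readable profiles)
[profile.release-with-debug]
inherits = "release"
debug = true
# Shell: build with the custom profile (binary lands in target/release-with-debug/)
cargo build --profile release-with-debug
# Quick alternative: enable debug info for the stock release profile via an environment variable
# (binary lands in target/release/)
CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release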
2. Flamegraphs with cargo-flamegraph
# Install
cargo install flamegraph
# Linux: uses perf (requires perf_event_paranoid ≤ 1)
sudo sh -c 'echo 1 > /proc/sys/kernel/perf_event_paranoid'
cargo flamegraph --bin myapp -- arg1 arg2
# macOS: uses DTrace (requires sudo)
sudo cargo flamegraph --bin myapp -- arg1 arg2
# Profile tests
cargo flamegraph --test mytest -- test_filter
# Profile benchmarks
cargo flamegraph --bench mybench -- --bench
# Output
# Generates flamegraph.svg in current directory
# Open in browser: firefox flamegraph.svg
Custom flamegraph options:
# More samples
cargo flamegraph --freq 1000 --bin myapp
# Suppress stderr noise (this does not filter by thread)
cargo flamegraph --bin myapp -- args 2>/dev/null
# Using perf directly for more control
perf record -g -F 999 ./target/release-with-debug/myapp args
perf script | stackcollapse-perf.pl | flamegraph.pl > out.svg
3. Binary size analysis with cargo-bloat
# Install
cargo install cargo-bloat
# Show top functions by size
cargo bloat --release -n 20
# Show per-crate size breakdown
cargo bloat --release --crates
# Include only specific crate
cargo bloat --release --filter myapp
# Compare before/after a change
cargo bloat --release --crates > before.txt
# make changes
cargo bloat --release --crates > after.txt
diff before.txt after.txt
Typical output:
File .text Size Crate Name
2.4% 3.0% 47.0KiB std <std macros>
1.8% 2.3% 35.5KiB myapp myapp::heavy_module::process
1.2% 1.5% 23.1KiB serde serde::de::...
4. Monomorphization bloat with cargo-llvm-lines
# Install
cargo install cargo-llvm-lines
# Show LLVM IR line counts (proxy for monomorphization)
cargo llvm-lines --release | head -40
# Filter to your crate only (each row starts with the line count, so match anywhere in the row)
cargo llvm-lines --release | grep 'myapp::'
Typical output:
Lines Copies Function name
85330 1 [LLVM passes]
7761 92 core::fmt::write
4672 11 myapp::process::<impl MyTrait for T>
3201 47 <alloc::vec::Vec<T> as core::ops::Drop>::drop
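A high Copies count means the function was instantiated once per concrete type it is used with. To shrink it, move the body into a non-generic inner function so that only a thin wrapper is duplicated: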
// Before: generic, gets monomorphized for every T
fn process<T: AsRef<[u8]>>(data: T) -> usize {
do_work(data.as_ref())
}
// After: thin generic wrapper + concrete inner
fn process<T: AsRef<[u8]>>(data: T) -> usize {
fn inner(data: &[u8]) -> usize { do_work(data) }
inner(data.as_ref())
}
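The thin-wrapper pattern above can be tried end to end with a small program. This is a minimal sketch: `do_work` is a hypothetical stand-in (a byte sum) for whatever the real body does, and the three call sites are only there to force three instantiations of `process`.

```rust
// Hypothetical stand-in for the document's `do_work`: sums the bytes.
fn do_work(data: &[u8]) -> usize {
    data.iter().map(|&b| b as usize).sum()
}

// Thin generic wrapper: only the `as_ref()` call is duplicated per `T`.
fn process<T: AsRef<[u8]>>(data: T) -> usize {
    // Non-generic inner function: compiled once, shared by every instantiation.
    fn inner(data: &[u8]) -> usize {
        do_work(data)
    }
    inner(data.as_ref())
}

fn main() {
    // Three different `T`s (Vec<u8>, &str, [u8; 2]), but a single copy of `inner`.
    let from_vec = process(vec![1u8, 2, 3]);
    let from_str = process("abc");
    let from_array = process([10u8, 20]);
    println!("{from_vec} {from_str} {from_array}");
}
```

Running `cargo llvm-lines` before and after such a refactor should show the Copies count staying the same for the wrapper while its Lines count drops, since the bulk of the IR now lives in `inner`.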
5. Criterion microbenchmarks
# Cargo.toml
[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }
[[bench]]
name = "my_bench"
harness = false
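// benches/my_bench.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion, BenchmarkId};
// Assumption: `process` is exported by the library crate under test (here called `myapp`)
use myapp::process;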
fn bench_process(c: &mut Criterion) {
// Simple benchmark
c.bench_function("process 1000 items", |b| {
let data: Vec<i32> = (0..1000).collect();
b.iter(|| process(black_box(&data))) // black_box prevents optimization
});
}
fn bench_sizes(c: &mut Criterion) {
let mut group = c.benchmark_group("process_sizes");
for size in [100, 1000, 10000].iter() {
let data: Vec<i32> = (0..*size).collect();
group.bench_with_input(
BenchmarkId::from_parameter(size),
&data,
|b, data| b.iter(|| process(black_box(data))),
);
}
group.finish();
}
criterion_group!(benches, bench_process, bench_sizes);
criterion_main!(benches);
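The benchmark file above calls a `process` function that it does not define. A minimal stand-in, assuming a hypothetical sum-of-squares workload over `&[i32]`, might look like this; in a real project it would live in the library crate being benchmarked rather than in the bench file.

```rust
/// Hypothetical workload for the benchmarks: sum of squares,
/// widened to i64 so large inputs do not overflow.
pub fn process(data: &[i32]) -> i64 {
    data.iter().map(|&x| i64::from(x) * i64::from(x)).sum()
}

fn main() {
    // Same input shape as the "process 1000 items" benchmark.
    let data: Vec<i32> = (0..1000).collect();
    println!("{}", process(&data));
}
```

Taking a slice rather than a `Vec` keeps the function callable from both benchmark styles shown above, since `&Vec<i32>` coerces to `&[i32]` at the call site.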
# Run all benchmarks
cargo bench
# Run specific benchmark
cargo bench --bench my_bench
# Run with filter
cargo bench -- process_sizes
# Compare with baseline (save/load)
cargo bench -- --save-baseline before
# make changes
cargo bench -- --baseline before
# View HTML report
open target/criterion/report/index.html   # macOS; on Linux use xdg-open
6. perf with Rust (Linux)
# Record
perf record -g ./target/release-with-debug/myapp args
perf record -g -F 999 ./target/release-with-debug/myapp args # higher freq
# Report
perf report # interactive TUI
perf report --stdio --no-call-graph | head -40 # text
# Annotate specific function
perf annotate myapp::hot_function
# stat (quick counters)
perf stat ./target/release/myapp args
Rust-specific perf tips:
- Build with debug = "line-tables-only" for faster builds with line-level attribution (debug = 1 enables the slightly richer "limited" level)
- Use RUSTFLAGS="-C force-frame-pointers=yes" for better call graphs without DWARF unwinding
- Disable ASLR for reproducible addresses: setarch $(uname -m) -R ./myapp
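Release builds inline aggressively, so a function you want to inspect with `perf annotate` or find in a flamegraph may not exist as a separate symbol. One common workaround, sketched here with a hypothetical `hot_function`, is to mark the function `#[inline(never)]` while profiling:

```rust
use std::hint::black_box;

// `#[inline(never)]` keeps this function as its own symbol, so samples are
// attributed to `hot_function` instead of being merged into its caller.
#[inline(never)]
fn hot_function(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, x| acc.wrapping_add(x.wrapping_mul(x)))
}

fn main() {
    // black_box stops the optimizer from constant-folding the whole call.
    let total = hot_function(black_box(10_000_000));
    println!("{total}");
}
```

The attribute changes the generated code, so it is best treated as a temporary profiling aid and removed before comparing final numbers.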
7. heaptrack / DHAT for allocations
# heaptrack (Linux)
heaptrack ./target/release/myapp args
heaptrack_print heaptrack.myapp.*.zst | head -50
# DHAT via Valgrind
valgrind --tool=dhat ./target/debug/myapp args
# Open dhat-out.* with dh_view.html
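When heaptrack or Valgrind is not available, a rough allocation count can be obtained from inside the program. The sketch below is not how heaptrack or DHAT work; it is a std-only stand-in that wraps the system allocator with a counter. The names `CountingAlloc`, `count_allocs`, `build_growing`, and `build_preallocated` are all hypothetical.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

// Process-wide count of allocation calls (all threads).
static ALLOCS: AtomicUsize = AtomicUsize::new(0);

struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // `realloc` keeps its default implementation, which calls `alloc` above,
    // so buffer growth is counted as well.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

// Runs `f` and returns how many allocations happened while it ran.
fn count_allocs<R>(f: impl FnOnce() -> R) -> usize {
    let before = ALLOCS.load(Ordering::Relaxed);
    black_box(f());
    ALLOCS.load(Ordering::Relaxed) - before
}

// Grows the buffer repeatedly as elements are pushed.
fn build_growing(n: u64) -> Vec<u64> {
    let mut v = Vec::new();
    for i in 0..n {
        v.push(i);
    }
    v
}

// Reserves the full capacity up front.
fn build_preallocated(n: u64) -> Vec<u64> {
    let mut v = Vec::with_capacity(n as usize);
    for i in 0..n {
        v.push(i);
    }
    v
}

fn main() {
    let growing = count_allocs(|| build_growing(1000));
    let prealloc = count_allocs(|| build_preallocated(1000));
    println!("growing: {growing} allocations, preallocated: {prealloc}");
}
```

The counter is global, so allocations from other threads are included; for call-site attribution and sizes, the external tools above remain the better choice.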
For flamegraph setup and Criterion configuration, see references/cargo-flamegraph-setup.md.
Related skills
- Use skills/rust/rustc-basics for build configuration (debug symbols, profiles)
- Use skills/profilers/linux-perf for perf fundamentals
- Use skills/profilers/flamegraphs for reading and interpreting flamegraph SVGs
- Use skills/profilers/valgrind for allocation profiling with massif/DHAT
Repository: mohitmishra786/…v-skills
GitHub Stars: 27
First Seen: Feb 21, 2026