rust-build-times
Rust Build Times
Purpose
Guide agents through diagnosing and improving Rust compilation speed: cargo-timings for build profiling, sccache for caching, the Cranelift codegen backend for faster dev builds, workspace crate splitting, LTO configuration trade-offs, and fast linkers (mold/lld).
Triggers
- "My Rust project takes too long to compile"
- "How do I profile which crates are slow to build?"
- "How do I set up sccache for Rust?"
- "What is the Cranelift backend and how does it help?"
- "Should I use thin LTO or fat LTO?"
- "How do I use the mold linker with Rust?"
Workflow
1. Diagnose with cargo-timings
# Build with timing report
cargo build --timings
# Report is written to target/cargo-timings/cargo-timing.html — open in a browser
# Shows: crate compilation timeline, parallelism, bottlenecks
# For release builds
cargo build --release --timings
# Key things to look for in the timing report:
# - Long sequential chains (no parallelism)
# - Individual crates taking > 10s (candidates for optimization)
# - Proc-macro crates blocking everything downstream
# cargo-llvm-lines — count LLVM IR lines per function (monomorphization)
cargo install cargo-llvm-lines
cargo llvm-lines --release | head -20
# Shows functions generating the most LLVM IR (template explosion)
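A hedged sketch of why `cargo llvm-lines` output balloons: every concrete type a generic function is called with gets its own compiled copy of the body. A common mitigation is to keep the generic wrapper thin and delegate to a single non-generic inner function (the function names below are made up for illustration):

```rust
// Sketch: monomorphization and the non-generic inner-fn trick.
// Each call site with a new type T stamps out a separate copy of `process`.
fn process<T: AsRef<str>>(input: T) -> usize {
    // Convert at the generic boundary, then delegate to one shared body.
    process_inner(input.as_ref())
}

// Only ONE copy of this body is codegen'd, regardless of how many T's above.
fn process_inner(s: &str) -> usize {
    s.chars().filter(|c| c.is_alphanumeric()).count()
}

fn main() {
    // Three distinct T's -> three thin `process` wrappers, one shared body.
    let a = process("hello!");                 // T = &str
    let b = process(String::from("a b"));      // T = String
    let c = process(Box::<str>::from("xyz"));  // T = Box<str>
    println!("{} {} {}", a, b, c);
}
```

In `cargo llvm-lines` output, the generic wrapper still appears once per instantiation, but its body is a single call, so the total IR stays small.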
2. sccache — compilation caching for Rust
# Install
cargo install sccache
# or: brew install sccache
# Configure for Rust builds
export RUSTC_WRAPPER=sccache
# Add to .cargo/config.toml (project or global)
# ~/.cargo/config.toml
[build]
rustc-wrapper = "sccache"
# Check cache stats
sccache --show-stats
# S3 backend for CI teams
export SCCACHE_BUCKET=my-rust-cache
export SCCACHE_REGION=us-east-1
export AWS_ACCESS_KEY_ID=xxx
export AWS_SECRET_ACCESS_KEY=yyy
sccache --start-server
# GitHub Actions with sccache
# - uses: mozilla-actions/sccache-action@v0.0.4
3. Cranelift codegen backend
Cranelift is an alternative codegen backend to LLVM: it compiles much faster but produces less optimized code, which makes it well suited to development builds:
# Install nightly (Cranelift requires nightly for now)
rustup toolchain install nightly
rustup component add rustc-codegen-cranelift-preview --toolchain nightly
# Use Cranelift for dev builds only
# .cargo/config.toml
[unstable]
codegen-backend = true
[profile.dev]
codegen-backend = "cranelift"
# Use per-build (profile env var plus the unstable cargo flag)
CARGO_PROFILE_DEV_CODEGEN_BACKEND=cranelift \
cargo +nightly build -Zcodegen-backend
Cranelift vs LLVM trade-off:
- Dev builds: 20–40% faster compilation with Cranelift
- Runtime performance: LLVM-compiled code is faster (Cranelift skips many optimizations)
- Release builds: always use LLVM
4. Workspace splitting for parallelism
A single crate's frontend (parsing, type-checking, borrow-checking) runs serially; splitting into smaller crates lets Cargo compile independent crates in parallel:
# Before: one giant crate
[package]
name = "monolith" # everything in one crate = sequential compile
# After: workspace with parallel crates
[workspace]
members = [
"core", # compiled in parallel
"networking", # no deps on ui → parallel with ui
"ui", # no deps on networking → parallel
"server", # depends on core + networking
"cli", # depends on core + ui
]
# Visualize dependency graph
cargo tree | head -30
# cargo tree has no graph output; use cargo-depgraph + graphviz for a visual graph
cargo install cargo-depgraph
cargo depgraph | dot -Tsvg > deps.svg
# Check how many crates compile in parallel
cargo build -j$(nproc) --timings # maximize parallelism
Rules for effective workspace splitting:
- Break circular dependencies first
- Separate proc-macros into their own crate (they block everything)
- Keep frequently-changed code isolated (less invalidation)
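One way to keep a frequently edited crate from forcing dependents to re-codegen against its internals is to expose trait objects rather than generic parameters at crate boundaries. This is a sketch under assumed names (the `Sink` trait and both functions are invented for illustration); the trade-off is an indirect call through a vtable:

```rust
// Sketch: dyn Trait at an API boundary instead of generics.
// A generic fn is monomorphized (codegen'd) in every downstream crate;
// the `dyn` version is compiled once, here, and called through a vtable.
pub trait Sink {
    fn write_line(&mut self, line: &str);
}

// Generic version: a fresh copy per concrete S, per dependent crate.
pub fn log_generic<S: Sink>(sink: &mut S, msg: &str) {
    sink.write_line(msg);
}

// dyn version: one compiled body, dynamic dispatch at the call site.
pub fn log_dyn(sink: &mut dyn Sink, msg: &str) {
    sink.write_line(msg);
}

struct VecSink(Vec<String>);
impl Sink for VecSink {
    fn write_line(&mut self, line: &str) {
        self.0.push(line.to_string());
    }
}

fn main() {
    let mut sink = VecSink(Vec::new());
    log_generic(&mut sink, "from generic");
    log_dyn(&mut sink, "from dyn");
    println!("{:?}", sink.0);
}
```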
5. LTO configuration
LTO improves runtime performance but increases link time:
# Cargo.toml profile configuration
[profile.release]
lto = "thin" # thin LTO: good performance, much faster than "fat"
codegen-units = 1 # best optimization, but serializes codegen (slower build)
[profile.release-fast]
inherits = "release"
lto = "fat" # full LTO: maximum performance, very slow link
[profile.dev]
lto = "off" # never use LTO in dev (compilation speed)
# codegen-units already defaults to 256 in dev; lowering it reduces parallel codegen
LTO comparison:
| Setting | Link time | Runtime perf | Use when |
|---|---|---|---|
| lto = false | Fast | Baseline | Dev builds |
| lto = "thin" | Moderate | +5–15% | Most release builds |
| lto = "fat" | Slow | +15–30% | Maximum performance |
| codegen-units = 1 | Slowest | Best | With LTO for release |
6. Fast linkers
The linker is often the bottleneck for large Rust projects:
# mold — fastest general-purpose linker (Linux)
sudo apt-get install mold
# .cargo/config.toml
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
# Or use cargo-zigbuild (uses zig cc as linker)
cargo install cargo-zigbuild
cargo zigbuild --release
# lld — LLVM's linker (faster than GNU ld, available everywhere)
# .cargo/config.toml
[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "link-arg=-fuse-ld=lld"]
# On macOS: zld (now archived) or lld; Apple's newer default linker is also fast
[target.x86_64-apple-darwin]
rustflags = ["-C", "link-arg=-fuse-ld=/usr/local/bin/zld"]
Linker speed comparison (large project, typical):
- GNU ld: baseline
- gold: ~1.5× faster
- lld: ~2× faster
- mold: ~5–10× faster
7. Other quick wins
# Reduce debug info level (faster but less debuggable)
# Cargo.toml
[profile.dev]
debug = 1 # 0=off, 1=line tables, 2=full (default)
# debug=1 saves 20-40% on debug build time
# Split debug info out of the binary (reduces linker input)
[profile.dev]
split-debuginfo = "unpacked" # debug info stays in .o files (like -gsplit-dwarf on Linux)
# Disable incremental compilation (sometimes faster for full rebuilds)
CARGO_INCREMENTAL=0 cargo build
# Trim proc-macro compile time: heavy proc-macro deps (serde, tokio, axum)
# recompile on every version bump, so pin versions and disable unused features
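A small, concrete example of trimming proc-macro work: for an internal type, a hand-written trait impl avoids pulling in or expanding a derive macro at all. This is a sketch, not project-specific advice; the `Version` type is invented for illustration:

```rust
use std::fmt;

// Sketch: hand-written Display instead of a derive macro
// (e.g. a derive_more-style derive) for one small internal type.
struct Version {
    major: u32,
    minor: u32,
    patch: u32,
}

impl fmt::Display for Version {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}.{}.{}", self.major, self.minor, self.patch)
    }
}

fn main() {
    let v = Version { major: 1, minor: 72, patch: 0 };
    println!("{}", v); // prints 1.72.0
}
```

The win per type is small, but dropping the last user of a proc-macro crate removes that crate (and its expansion cost) from the build graph entirely.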
Related skills
- Use skills/rust/cargo-workflows for Cargo workspace and profile configuration
- Use skills/build-systems/build-acceleration for C/C++ equivalent build acceleration
- Use skills/debuggers/dwarf-debug-format for debug info size/split-dwarf tradeoffs
- Use skills/binaries/linkers-lto for LTO internals