build-acceleration
Build Acceleration
Purpose
Guide agents through reducing C/C++ build times using caching (ccache/sccache), distributed compilation (distcc), unity/jumbo builds, precompiled headers, split-DWARF for faster linking, and include pruning with IWYU.
Triggers
- "My C++ build is too slow — how do I speed it up?"
- "How do I set up ccache / sccache?"
- "How do precompiled headers work with CMake?"
- "How do I set up distributed compilation with distcc?"
- "How do I reduce link times with split-DWARF?"
- "How do I find which headers are slowing down compilation?"
Workflow
1. Diagnose the bottleneck first
# Time the full build
time cmake --build build -j$(nproc)
# Find the slowest TUs (CMake ≥3.16 with --profiling-output)
cmake -S . -B build -DCMAKE_CXX_FLAGS="-ftime-report"
cmake --build build 2>&1 | grep "Total" | sort -t: -k2 -rn | head -20
# Ninja build timings (use ninja -j1 for serial timing)
ninja -C build -j1 2>&1 | grep "^\[" | sort -t" " -k2 -rn | head -20
2. ccache — compiler cache
# Install
apt-get install ccache # Ubuntu/Debian
brew install ccache # macOS
# Check hit rate
ccache -s
# Configure cache size (default 5GB)
ccache -M 20G
# Invalidate cache if needed
ccache -C
CMake integration (recommended over prefix hacks):
# CMakeLists.txt
find_program(CCACHE_PROGRAM ccache)
if(CCACHE_PROGRAM)
set(CMAKE_C_COMPILER_LAUNCHER ${CCACHE_PROGRAM})
set(CMAKE_CXX_COMPILER_LAUNCHER ${CCACHE_PROGRAM})
endif()
Key ~/.config/ccache/ccache.conf options:
max_size = 20G
compression = true
compression_level = 6
# For CI: share cache across jobs
cache_dir = /shared/ccache
3. sccache — cloud-compatible cache (Rust, C/C++)
cargo install sccache
# Or: brew install sccache
# Set as compiler launcher
export RUSTC_WRAPPER=sccache # for Rust
export CMAKE_C_COMPILER_LAUNCHER=sccache # for CMake
# With S3 backend
export SCCACHE_BUCKET=my-build-cache
export SCCACHE_REGION=us-east-1
sccache --start-server
sccache --show-stats
4. Precompiled headers (PCH)
PCH compiles a large header once and reuses the binary form.
# CMake ≥3.16 native PCH support
target_precompile_headers(mylib PRIVATE
<vector>
<string>
<unordered_map>
"myproject/common.h"
)
# Share PCH across targets (avoids recompilation)
target_precompile_headers(myapp REUSE_FROM mylib)
// Traditional: stdafx.h / pch.h approach
// All TUs include pch.h as the very first include
// pch.h includes heavy system headers
#pragma once
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
PCH is most effective when headers are large and stable (STL, Boost, Qt). Avoid PCH for frequently-changing project headers.
5. Unity / jumbo builds
Combine multiple .cpp files into one TU to reduce header parsing overhead and improve inlining.
# CMake ≥3.16 unity build
set_target_properties(mylib PROPERTIES UNITY_BUILD ON)
# Control batch size (default 8 files per unity TU)
set_target_properties(mylib PROPERTIES UNITY_BUILD_BATCH_SIZE 16)
# Exclude specific files from unity (e.g., if they have ODR issues)
set_source_files_properties(problem.cpp PROPERTIES SKIP_UNITY_BUILD_INCLUSION ON)
Manual unity file:
// unity_build.cpp
#include "module_a.cpp"
#include "module_b.cpp"
#include "module_c.cpp"
Watch out for: anonymous namespaces (each TU has its own), using namespace in headers, duplicate static variables.
6. split-DWARF — reduce link time
Split DWARF puts debug info in .dwo sidecar files, dramatically reducing what the linker must process.
# GCC / Clang
gcc -g -gsplit-dwarf -o prog main.c
# CMake global
add_compile_options(-gsplit-dwarf)
# Combine .dwo files for distribution (optional)
dwp -o prog.dwp prog # GNU dwp tool
Pair with --gdb-index for faster GDB startup:
gcc -g -gsplit-dwarf -Wl,--gdb-index -o prog main.c
Link time comparison (large project, typical): -g full DWARF ~4×–6× longer link vs -gsplit-dwarf.
7. distcc — distributed compilation
# Install on all machines
apt-get install distcc
# Start daemon on worker machines
distccd --daemon --allow 192.168.1.0/24 --jobs 8
# Client: set DISTCC_HOSTS
export DISTCC_HOSTS="localhost/4 worker1/8 worker2/8"
make -j20 CC="distcc gcc"
# CMake integration
set(CMAKE_C_COMPILER_LAUNCHER distcc)
set(CMAKE_CXX_COMPILER_LAUNCHER distcc)
Stack with ccache: CC="ccache distcc gcc" — ccache checks local cache first, falls back to distcc.
8. Include pruning with IWYU
# Install
apt-get install iwyu
# Run via CMake
cmake -S . -B build -DCMAKE_CXX_INCLUDE_WHAT_YOU_USE=iwyu
cmake --build build 2>&1 | tee iwyu.log
# Apply fixes automatically
fix_include < iwyu.log --nosafe_headers
See skills/build-systems/include-what-you-use for full IWYU workflow.
For ccache configuration options, see references/ccache-config.md.
Related skills
- Use
skills/build-systems/cmakefor CMake project structure - Use
skills/build-systems/include-what-you-usefor IWYU header pruning - Use
skills/rust/rust-build-timesfor Rust-specific build acceleration - Use
skills/debuggers/dwarf-debug-formatfor split-DWARF internals
More from mohitmishra786/low-level-dev-skills
cmake
CMake build system skill for C/C++ projects. Use when writing or refactoring CMakeLists.txt, configuring out-of-source builds, selecting generators (Ninja, Make, VS), managing targets and dependencies with target_link_libraries, integrating external packages via find_package or FetchContent, enabling sanitizers, setting up toolchain files for cross-compilation, or exporting CMake packages. Activates on queries about CMakeLists.txt, cmake configure errors, target properties, install rules, CPack, or CMake presets.
586static-analysis
Static analysis skill for C/C++ codebases. Use when hardening code quality, triaging noisy builds, running clang-tidy, cppcheck, or scan-build, interpreting check categories, suppressing false positives, or integrating static analysis into CI. Activates on queries about clang-tidy checks, cppcheck, scan-build, compile_commands.json, code hardening, or static analysis warnings.
409llvm
LLVM IR and pass pipeline skill. Use when working directly with LLVM Intermediate Representation (IR), running opt passes, generating IR with llc, inspecting or writing LLVM IR for custom passes, or understanding how the LLVM backend lowers IR to assembly. Activates on queries about LLVM IR, opt, llc, llvm-dis, LLVM passes, IR transformations, or building LLVM-based tools.
362gdb
GDB debugger skill for C/C++ programs. Use when starting a GDB session, setting breakpoints, stepping through code, inspecting variables, debugging crashes, using reverse debugging (record/replay), remote debugging with gdbserver, or loading core dumps. Activates on queries about GDB commands, segfaults, hangs, watchpoints, conditional breakpoints, pretty-printers, Python GDB scripting, or multi-threaded debugging.
156linux-perf
Linux perf profiler skill for CPU performance analysis. Use when collecting sampling profiles with perf record, generating perf report, measuring hardware counters (cache misses, branch mispredicts, IPC), identifying hot functions, or feeding perf data into flamegraph tools. Activates on queries about perf, Linux performance counters, PMU events, off-CPU profiling, perf stat, perf annotate, or sampling-based profiling on Linux.
145core-dumps
Core dump analysis skill for production crash triage. Use when loading core files in GDB or LLDB, enabling core dump generation on Linux/macOS, mapping symbols with debuginfo or debuginfod, or extracting backtraces from crashes without re-running the program. Activates on queries about core files, ulimit, coredumpctl, debuginfod, crash triage, or analyzing segfaults from production binaries.
132