memory-model
Memory Model
Purpose
Guide agents through C++ and Rust memory models: memory orderings, the happens-before relation, atomic operations, fences, and practical patterns for lock-free data structures.
Triggers
- "What is the C++ memory model?"
- "What memory order should I use for my atomic operation?"
- "What is the difference between acquire-release and seq_cst?"
- "How do I use std::atomic in C++?"
- "What is acquire/release in Rust atomics?"
- "How do I implement a lock-free queue?"
Workflow
1. Memory ordering overview
Modern CPUs and compilers reorder operations for performance. The memory model specifies what reorderings are allowed and how synchronisation is achieved.
Ordering strength (weakest to strongest):
Relaxed < Release/Acquire < AcqRel < SeqCst
Stronger ordering = more synchronization = more correct, but slower
Weaker ordering = fewer barriers = faster, but needs careful analysis
2. Memory orderings
| Order | C++ | Rust | What it means |
|---|---|---|---|
| Relaxed | memory_order_relaxed |
Ordering::Relaxed |
No ordering guarantee; just atomicity |
| Consume | memory_order_consume |
(use Acquire) | Data dependency ordering |
| Acquire | memory_order_acquire |
Ordering::Acquire |
This load sees all writes before the matching release |
| Release | memory_order_release |
Ordering::Release |
All writes before this store are visible to acquire |
| AcqRel | memory_order_acq_rel |
Ordering::AcqRel |
Both acquire and release on RMW ops |
| SeqCst | memory_order_seq_cst |
Ordering::SeqCst |
Total order across all seq_cst operations |
3. C++ std::atomic
#include <atomic>
#include <thread>
std::atomic<int> counter{0};
std::atomic<bool> ready{false};
// Producer thread
void producer() {
data = 42; // (1) write data
ready.store(true, std::memory_order_release); // (2) signal
}
// Consumer thread
void consumer() {
while (!ready.load(std::memory_order_acquire)); // (3) wait
assert(data == 42); // (4) guaranteed to see (1)
}
Acquire-release guarantees: if thread A does a release store to X, and thread B does an acquire load that sees A's value, then all writes by A before the release are visible to B after the acquire.
4. Choosing the right ordering
Use case?
├── Counter (just needs atomicity, order irrelevant) → Relaxed
├── Reference counting (decrement + final check) → AcqRel (dec), Acquire (load 0 check)
├── Publish data from one thread to another → Release (store), Acquire (load)
├── Mutual exclusion / mutex implementation → AcqRel / SeqCst
├── Lock-free queue multiple producers/consumers → SeqCst (safest to start)
└── Sequence number check (simple flag) → Release + Acquire
5. Common patterns
// Pattern 1: Spinlock
class Spinlock {
std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
void lock() {
while (flag.test_and_set(std::memory_order_acquire))
; // spin
}
void unlock() {
flag.clear(std::memory_order_release);
}
};
// Pattern 2: Reference counting
class RefCounted {
std::atomic<int> refcount{1};
public:
void addref() {
refcount.fetch_add(1, std::memory_order_relaxed); // only need atomicity
}
void release() {
if (refcount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
// AcqRel ensures we see all writes from other releasers
delete this;
}
}
};
// Pattern 3: One-time initialisation
class LazyInit {
std::atomic<void*> ptr{nullptr};
std::mutex mtx;
public:
void* get() {
void* p = ptr.load(std::memory_order_acquire);
if (p == nullptr) {
std::lock_guard lock(mtx);
p = ptr.load(std::memory_order_relaxed);
if (p == nullptr) {
p = create();
ptr.store(p, std::memory_order_release);
}
}
return p;
}
};
6. Rust atomics
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
// Simple counter
let counter = Arc::new(AtomicUsize::new(0));
// Increment
counter.fetch_add(1, Ordering::Relaxed);
// Read
let val = counter.load(Ordering::Relaxed);
// Publish/subscribe pattern
static READY: AtomicBool = AtomicBool::new(false);
// Publisher thread
unsafe { DATA = 42; } // Write data
READY.store(true, Ordering::Release); // Signal
// Subscriber thread
while !READY.load(Ordering::Acquire) {}
let d = unsafe { DATA }; // Safe: guaranteed to see publisher's write
7. Fences
Fences provide ordering without an atomic operation on a specific variable:
// C++ fence — equivalent to a global memory barrier
std::atomic_thread_fence(std::memory_order_acquire); // Acquire fence
std::atomic_thread_fence(std::memory_order_release); // Release fence
// Typical use: multiple atomic writes then one fence
relaxed_atomic_a.store(1, std::memory_order_relaxed);
relaxed_atomic_b.store(2, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_release); // barrier for all above
sentinel.store(true, std::memory_order_relaxed);
8. Common mistakes
| Mistake | Fix |
|---|---|
| Using Relaxed for publish/subscribe | Use Release on store, Acquire on load |
| Using SeqCst everywhere | Profile first; use weakest correct ordering |
| Forgetting that non-atomic loads are not atomic | All shared mutable data needs atomic or mutex |
Using volatile for thread safety in C++ |
volatile is not a memory ordering tool; use atomic |
| Assuming sequential consistency without SeqCst | Each platform has different default consistency |
For memory ordering rules and happens-before reference, see references/cpp-memory-ordering.md.
Related skills
- Use
skills/runtimes/sanitizers— TSan detects data races involving non-atomic accesses - Use
skills/rust/rust-sanitizers-mirifor detecting Rust memory ordering violations with Miri - Use
skills/low-level-programming/assembly-x86to understand generated fence instructions - Use
skills/debuggers/gdbfor debugging concurrent programs with thread inspection
More from mohitmishra786/low-level-dev-skills
cmake
CMake build system skill for C/C++ projects. Use when writing or refactoring CMakeLists.txt, configuring out-of-source builds, selecting generators (Ninja, Make, VS), managing targets and dependencies with target_link_libraries, integrating external packages via find_package or FetchContent, enabling sanitizers, setting up toolchain files for cross-compilation, or exporting CMake packages. Activates on queries about CMakeLists.txt, cmake configure errors, target properties, install rules, CPack, or CMake presets.
580static-analysis
Static analysis skill for C/C++ codebases. Use when hardening code quality, triaging noisy builds, running clang-tidy, cppcheck, or scan-build, interpreting check categories, suppressing false positives, or integrating static analysis into CI. Activates on queries about clang-tidy checks, cppcheck, scan-build, compile_commands.json, code hardening, or static analysis warnings.
407llvm
LLVM IR and pass pipeline skill. Use when working directly with LLVM Intermediate Representation (IR), running opt passes, generating IR with llc, inspecting or writing LLVM IR for custom passes, or understanding how the LLVM backend lowers IR to assembly. Activates on queries about LLVM IR, opt, llc, llvm-dis, LLVM passes, IR transformations, or building LLVM-based tools.
361gdb
GDB debugger skill for C/C++ programs. Use when starting a GDB session, setting breakpoints, stepping through code, inspecting variables, debugging crashes, using reverse debugging (record/replay), remote debugging with gdbserver, or loading core dumps. Activates on queries about GDB commands, segfaults, hangs, watchpoints, conditional breakpoints, pretty-printers, Python GDB scripting, or multi-threaded debugging.
153linux-perf
Linux perf profiler skill for CPU performance analysis. Use when collecting sampling profiles with perf record, generating perf report, measuring hardware counters (cache misses, branch mispredicts, IPC), identifying hot functions, or feeding perf data into flamegraph tools. Activates on queries about perf, Linux performance counters, PMU events, off-CPU profiling, perf stat, perf annotate, or sampling-based profiling on Linux.
142core-dumps
Core dump analysis skill for production crash triage. Use when loading core files in GDB or LLDB, enabling core dump generation on Linux/macOS, mapping symbols with debuginfo or debuginfod, or extracting backtraces from crashes without re-running the program. Activates on queries about core files, ulimit, coredumpctl, debuginfod, crash triage, or analyzing segfaults from production binaries.
131