Memory consistency models are formal specifications that define the order in which memory operations (loads and stores) performed by one processor become visible to other processors in a shared-memory multiprocessor system. Choosing the right consistency model is critical because it determines both the correctness guarantees available to programmers and the optimization opportunities open to hardware and compilers.
Sequential Consistency (SC):
- Definition: the result of any execution is the same as if operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program — the strongest and most intuitive model
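- Implications: all processors observe stores in the same total order, and no operation may appear to be reordered with an earlier load or store from the same processor; this severely limits hardware optimization
- Performance Cost: a straightforward SC implementation cannot let loads bypass buffered stores, combine writes, or issue memory accesses out of order; by rough estimate, modern processors would lose 30-50% performance under strict SC, with the actual cost depending on the workload and on how much speculation the hardware uses to hide the ordering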
- Historical Significance: defined by Lamport (1979), serves as the reference model against which all relaxed models are compared
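The definition above can be made concrete by enumerating every interleaving that respects program order. The sketch below does this for the classic store-buffering litmus test (thread 0: x = 1; r0 = y; thread 1: y = 1; r1 = x). The names State, explore, and sc_outcomes are hypothetical helpers introduced only for this illustration; the code is a toy model of SC, not a description of how hardware implements it.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Toy model of sequential consistency for the store-buffering litmus test:
//   Thread 0: x = 1; r0 = y;
//   Thread 1: y = 1; r1 = x;
// Every helper name here is illustrative, not part of any library.
struct State {
    int x = 0, y = 0;    // shared memory locations
    int r0 = 0, r1 = 0;  // per-thread result registers
};

// pc0 / pc1 count how many of each thread's two steps have already run.
void explore(State s, int pc0, int pc1,
             std::set<std::pair<int, int>>& outcomes) {
    if (pc0 == 2 && pc1 == 2) {  // both threads finished: record the result
        outcomes.insert(std::make_pair(s.r0, s.r1));
        return;
    }
    if (pc0 < 2) {  // let thread 0 take its next step in program order
        State n = s;
        if (pc0 == 0) n.x = 1; else n.r0 = n.y;
        explore(n, pc0 + 1, pc1, outcomes);
    }
    if (pc1 < 2) {  // let thread 1 take its next step in program order
        State n = s;
        if (pc1 == 0) n.y = 1; else n.r1 = n.x;
        explore(n, pc0, pc1 + 1, outcomes);
    }
}

// Returns the set of (r0, r1) pairs reachable under SC.
std::set<std::pair<int, int>> sc_outcomes() {
    std::set<std::pair<int, int>> outcomes;
    explore(State{}, 0, 0, outcomes);
    return outcomes;
}
```

Calling sc_outcomes() yields only (0,1), (1,0), and (1,1). The pair (0,0) never appears, because in any single total order at least one store precedes the other thread's load. That missing outcome is exactly what weaker models such as TSO allow.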
Total Store Order (TSO):
- Relaxation: allows a processor's own stores to be buffered and read by subsequent loads before becoming globally visible — store-to-load reordering is permitted (FIFO store buffer)
- x86 Implementation: Intel and AMD processors implement TSO (with minor exceptions) — stores are ordered with respect to each other and loads see the most recent store from the local store buffer
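- Store Buffer Forwarding: a load can read a value from the local store buffer before it is written to cache, so a processor sees its own stores earlier than other processors do; together with store-to-load reordering (a later load to a different address completing before an earlier buffered store becomes globally visible), this is the only relaxation TSO makes relative to SC
- Programming Impact: most intuitive algorithms work correctly under TSO without explicit fences; only algorithms relying on store-to-load ordering (like Dekker's algorithm) require an MFENCE or a LOCK-prefixed instruction between the store and the subsequent load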
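The store-to-load case can be sketched in portable C++ as follows. This is a minimal illustration, assuming a C++11 compiler; the function names both_read_zero_once and count_forbidden are hypothetical. Each thread stores to one flag and then loads the other, which is the core of Dekker-style mutual exclusion, and a sequentially consistent fence sits between the store and the load. On x86, compilers typically lower this fence to MFENCE or a LOCK-prefixed instruction.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the store-buffering pattern once. Returns true if both threads read 0,
// which is the outcome that store-to-load reordering would otherwise allow.
bool both_read_zero_once() {
    std::atomic<int> x{0}, y{0};
    int r0 = -1, r1 = -1;

    std::thread t0([&] {
        x.store(1, std::memory_order_relaxed);
        // Full fence: orders the store above before the load below.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r0 = y.load(std::memory_order_relaxed);
    });
    std::thread t1([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = x.load(std::memory_order_relaxed);
    });
    t0.join();
    t1.join();
    return r0 == 0 && r1 == 0;
}

// Repeats the experiment and counts how often the forbidden outcome occurs.
int count_forbidden(int iterations) {
    int hits = 0;
    for (int i = 0; i < iterations; ++i)
        if (both_read_zero_once()) ++hits;
    return hits;
}
```

With both fences in place the C++ memory model guarantees that at least one thread observes the other's store, so count_forbidden should always return 0. Removing the fences makes the (0,0) outcome legal, and it can be observed on real x86 hardware, although how often depends on timing.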
Relaxed Consistency Models:
- Weak Ordering: divides memory operations into ordinary and synchronization operations — ordinary operations can be freely reordered, synchronization operations enforce ordering barriers
- Release Consistency (RC): refines weak ordering by distinguishing acquire (lock) and release (unlock) operations — acquires prevent subsequent operations from moving before them, releases prevent prior operations from moving after them
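- ARM and POWER Models: extremely relaxed; they allow store-to-store, load-to-load, and load-to-store reordering in addition to store-to-load, and require explicit barrier instructions for ordering (dmb on ARM; lwsync on POWER, or the heavier sync when store-to-load ordering is also needed)
- Alpha Model: historically the most relaxed; it even allowed a dependent load to observe stale data relative to the pointer load it depends on (attributed in practice to split cache banks processing invalidations independently, though value speculation could produce the same effect), requiring an explicit memory barrier between a pointer load and its dereference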
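The acquire/release pairing described under release consistency maps directly onto the message-passing idiom. The following is a minimal sketch using C++11 atomics; message_passing and its local variable names are hypothetical. The payload is an ordinary operation, and the flag is the synchronization operation that orders it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Producer publishes a payload, consumer waits for the flag and reads it.
int message_passing() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // synchronization variable
    int seen = -1;

    std::thread producer([&] {
        payload = 42;  // ordinary write
        // Release: the write to payload cannot move after this store.
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        // Acquire: the read of payload cannot move before this load.
        while (!ready.load(std::memory_order_acquire))
            std::this_thread::yield();
        seen = payload;  // guaranteed to observe 42
    });
    producer.join();
    consumer.join();
    return seen;
}
```

On x86 both the release store and the acquire load compile to plain moves, because TSO already provides this ordering. On ARM or POWER the compiler has to emit barrier or acquire/release instructions. If both operations were memory_order_relaxed instead, the program would contain a data race on payload.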
Memory Fences and Barriers:
- Full Fence (MFENCE on x86): prevents all reordering across the fence — loads and stores before the fence complete before any loads or stores after the fence begin
- Store Fence (SFENCE): ensures all prior stores are globally visible before subsequent stores — used with non-temporal stores that bypass cache
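- Load Fence (LFENCE): ensures all prior loads complete before subsequent loads execute; rarely needed for ordering on x86 (TSO already orders loads with respect to each other), whereas the equivalent load barriers on ARM and POWER (for example dmb ishld and lwsync) are critical because those architectures do not order loads by default
- Acquire/Release Semantics: one-directional barriers; an acquire prevents later operations from moving up above it, and a release prevents earlier operations from moving down below it; sufficient for most synchronization patterns and cheaper than full fences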
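A lock is the typical synchronization pattern for which one-directional barriers are enough. The sketch below builds a small spinlock from std::atomic_flag; the class name SpinLock and the driver function locked_count are hypothetical. It is meant as an illustration, not a production-quality lock, since it has no backoff or fairness.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Minimal spinlock: acquire on lock, release on unlock.
class SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // Acquire: critical-section accesses cannot move above this point.
        while (flag.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();
    }
    void unlock() {
        // Release: critical-section accesses cannot move below this point.
        flag.clear(std::memory_order_release);
    }
};

// Several threads increment a plain (non-atomic) counter under the lock.
long locked_count(int threads, int per_thread) {
    SpinLock lock;
    long counter = 0;  // protected by lock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

No full fence is needed. Operations outside the critical section may still drift into it, which is harmless, but nothing inside can leak out, and that is all mutual exclusion requires.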
Language-Level Memory Models:
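- C++11/C11 Memory Model: defines memory_order_seq_cst (the default), memory_order_acq_rel, memory_order_acquire, memory_order_release, memory_order_consume, and memory_order_relaxed; the same source code is portable across architectures because the compiler emits whatever fences each target needs
- Java Memory Model (JMM): a volatile write has release semantics and a volatile read has acquire semantics, and volatile accesses are additionally totally ordered with respect to each other; final fields are safely published once the constructor completes, provided the this reference does not escape during construction; the happens-before relationship defines visibility guarantees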
- Compiler Barriers: prevent compiler reordering without emitting hardware fence instructions — asm volatile("" ::: "memory") in GCC, std::atomic_signal_fence in C++
- Data Race Freedom (DRF): if a program is correctly synchronized (no data races), it behaves as if executed under sequential consistency — the DRF guarantee is the foundation of modern language memory models
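As a small illustration of the weakest C++ ordering, the sketch below counts events with memory_order_relaxed. The function name relaxed_count is hypothetical. Relaxed atomic operations are still atomic, so the program has no data race; they simply impose no ordering on surrounding memory accesses. The final read is safe because thread::join establishes a happens-before edge with everything the joined thread did.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Several threads bump a shared counter using relaxed atomic increments.
long relaxed_count(int threads, int per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i)
                // Atomic read-modify-write with no ordering constraints.
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    // join() synchronizes with each thread's completion, so the load below
    // observes every increment.
    for (auto& th : pool) th.join();
    return counter.load(std::memory_order_relaxed);
}
```

Relaxed ordering suits counters and statistics, where only the final value matters. It would not be appropriate for a flag that publishes other data, which needs at least release/acquire.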
Correctly understanding memory consistency is essential for writing portable parallel code — a program that works on x86 (TSO) may fail on ARM (relaxed) if it relies on implicit ordering guarantees that don't exist on weaker architectures.