Lock-free data structures are concurrent data structures that guarantee system-wide progress without mutual exclusion locks. They rely instead on atomic hardware primitives (Compare-And-Swap, Load-Linked/Store-Conditional, Fetch-And-Add) to coordinate access, eliminating the deadlock, priority-inversion, and convoying problems inherent in lock-based designs while scaling better on many-core systems.
Traditional lock-based data structures serialize all access through critical sections: when one thread holds the lock, all other threads block regardless of whether they conflict. Lock-free structures allow concurrent operations to proceed independently, synchronizing only at the point of actual conflict.
Progress Guarantees:
| Guarantee | Definition | Practical Implication |
|-----------|-----------|----------------------|
| Obstruction-free | Single thread in isolation completes | Weakest; may livelock |
| Lock-free | At least one thread makes progress | System-wide progress guaranteed |
| Wait-free | Every thread completes in bounded steps | Strongest; individual progress guaranteed |
Compare-And-Swap (CAS): The workhorse atomic primitive. CAS(address, expected, desired) atomically checks whether *address == expected and, if so, writes desired; otherwise it leaves memory unchanged and returns the value it observed. Lock-free algorithms use CAS in retry loops: read the current state, compute the new state, and CAS to install it. If the CAS fails (another thread modified the state), re-read and retry. This pattern is the foundation of lock-free stacks (the Treiber stack), queues (the Michael-Scott queue), and hash tables.
The ABA Problem: CAS cannot distinguish between "value was A the entire time" and "value changed from A to B and back to A." This causes correctness bugs in pointer-based structures where a freed and reallocated node reappears at the same address. Solutions: tagged pointers (embed a version counter in the pointer, so an ABA change alters the tag even if the address recycles), hazard pointers (defer memory reclamation until no thread holds a reference), and epoch-based reclamation (free memory only after all threads have passed a global epoch boundary).
Lock-Free Queue (Michael-Scott): The most widely deployed lock-free queue uses a linked list with separate head and tail pointers and a dummy node at the head. Enqueue: allocate a node, CAS tail->next from NULL to the new node, then CAS tail to the new node. Dequeue: read the value from head->next, then CAS head forward to head->next (the dequeued node becomes the new dummy). Helping mechanism: if a thread observes that tail->next is non-NULL but tail has not advanced, it helps advance tail, ensuring system-wide progress even if the enqueuing thread stalls.
Memory Ordering Considerations: Lock-free algorithms require careful memory-ordering specification: acquire semantics (subsequent reads/writes cannot be reordered before the load), release semantics (prior reads/writes cannot be reordered after the store), and sequential consistency (a single total order across all threads). C++11/C11 atomics expose these ordering levels. Using weaker ordering (acquire/release instead of sequential consistency) can improve performance severalfold on architectures with relaxed memory models (ARM, POWER), which must otherwise insert full fences.
Lock-free data structures are a cornerstone of concurrent programming on modern many-core hardware: they replace the coarse serialization of locks with fine-grained atomic coordination, enabling scalability that lock-based designs cannot match as core counts continue to grow.