Home Knowledge Base Parallel Prefix Sum (Scan)

Parallel Prefix Sum (Scan) is the foundational parallel algorithm that computes all prefix sums of an input array in O(log N) steps — transforming [a₀, a₁, a₂, ...] into [a₀, a₀+a₁, a₀+a₁+a₂, ...] (inclusive scan) — and serving as the core building block for parallel sorting, stream compaction, radix sort, histogram computation, and memory allocation in GPU computing.

Why Scan Is Fundamental

Scan converts a seemingly sequential operation (running sum) into a parallel operation. More importantly, scan solves the "parallel addressing" problem: when each thread produces a variable number of outputs and needs to know where in the output array to write them, an exclusive scan of the output counts gives each thread its starting write position. This makes scan the gateway to parallelizing any irregular, data-dependent computation.

Algorithm Variants

1. Up-Sweep (Reduce): Build a reduction tree bottom-up, computing partial sums at each level. O(N) work in log2(N) steps. 2. Down-Sweep: Propagate partial sums back down the tree to compute all prefix sums. O(N) work in log2(N) steps. Total work: O(N) — optimal. Total steps: O(log N). This is the standard GPU scan algorithm.

GPU Implementation (Large Arrays)

For arrays larger than one thread block: 1. Each thread block scans its local chunk (e.g., 1024 elements) and writes the block's total sum to an auxiliary array. 2. A second kernel scans the auxiliary array (block sums). 3. A third kernel adds each block's prefix to all elements in that block.

This three-kernel approach handles arrays of any size. CUB and Thrust libraries provide optimized implementations that achieve >90% of peak memory bandwidth on modern GPUs.

Applications

Parallel Prefix Sum is the Swiss Army knife of parallel algorithms — appearing inside nearly every non-trivial GPU algorithm as the mechanism that converts irregular, data-dependent parallelism into efficient, conflict-free parallel execution.

parallel prefix sum scaninclusive exclusive scangpu scan algorithmblelloch scanwork efficient scan

Explore 500+ Semiconductor & AI Topics

From EUV lithography to CUDA optimization — search the full knowledge base or chat with our AI assistant.