Systolic arrays are specialized hardware architectures that efficiently compute matrix multiplications by flowing data rhythmically through a grid of processing elements, with Google's TPUs being the most prominent implementation for AI acceleration. Architecture: 2D array of processing elements (PEs), each performing multiply-accumulate; data flows in from edges and propagates through array; results exit from opposite edges. Data flow pattern: one matrix's rows flow horizontally, other matrix's columns flow vertically; each PE computes one output element's partial sum. TPU implementation: Google TPU v1 had 256ร256 systolic array; data flows in waves (systolic like heartbeat); no separate control for each PEโsimplicity enables density. Efficiency: once data is in the array, it is reused multiple times before exiting; high arithmetic intensity; minimizes memory bandwidth requirements. Comparison to GPUs: systolic arrays more specialized but more efficient for dense matrix operations; GPUs more flexible for varied computation patterns. Matrix multiply importance: core of neural network inference and training (linear layers, attention); optimizing this operation provides largest gains. Modern implementations: TPU, AWS Inferentia/Trainium, Intel Nervana, and others use systolic or systolic-like architectures. Systolic arrays exemplify domain-specific architecture design for matrix-heavy AI workloads.
Explore 500+ Semiconductor & AI Topics
From EUV lithography to CUDA optimization โ search the full knowledge base or chat with our AI assistant.