Ray | ChipFoundryServices

Home› Knowledge Base› Ray

Systolic arrays are specialized hardware architectures that efficiently compute matrix multiplications by flowing data rhythmically through a grid of processing elements, with Google's TPUs being the most prominent implementation for AI acceleration. Architecture: 2D array of processing elements (PEs), each performing multiply-accumulate; data flows in from edges and propagates through array; results exit from opposite edges. Data flow pattern: one matrix's rows flow horizontally, other matrix's columns flow vertically; each PE computes one output element's partial sum. TPU implementation: Google TPU v1 had 256×256 systolic array; data flows in waves (systolic like heartbeat); no separate control for each PE—simplicity enables density. Efficiency: once data is in the array, it is reused multiple times before exiting; high arithmetic intensity; minimizes memory bandwidth requirements. Comparison to GPUs: systolic arrays more specialized but more efficient for dense matrix operations; GPUs more flexible for varied computation patterns. Matrix multiply importance: core of neural network inference and training (linear layers, attention); optimizing this operation provides largest gains. Modern implementations: TPU, AWS Inferentia/Trainium, Intel Nervana, and others use systolic or systolic-like architectures. Systolic arrays exemplify domain-specific architecture design for matrix-heavy AI workloads.

systolic arraymatrix multiply

Explore 500+ Semiconductor & AI Topics

From EUV lithography to CUDA optimization — search the full knowledge base or chat with our AI assistant.

🔍 Search Topics 💬 Ask CFSGPT 📚 Browse All