Home Knowledge Base GPU Kernel Fusion

GPU Kernel Fusion is the optimization technique of combining multiple sequential GPU kernel launches into a single kernel — eliminating kernel launch overhead, reducing global memory round-trips for intermediate results, and increasing arithmetic intensity by keeping data in registers or shared memory across combined operations.

Motivation:

Fusion Categories:

Fusion Compilers and Frameworks:

Fusion Limitations:

GPU kernel fusion is arguably the most impactful compiler optimization for deep learning workloads — frameworks like PyTorch 2.0 (TorchInductor) and JAX (XLA) achieve 1.5-3× end-to-end training speedup primarily through aggressive kernel fusion, making it the default optimization strategy for modern deep learning compilers.

gpu kernel fusion optimizationoperator fusion deep learningkernel launch overheadfused kernel computationfusion compiler optimization

Explore 500+ Semiconductor & AI Topics

From EUV lithography to CUDA optimization — search the full knowledge base or chat with our AI assistant.