Butterfly All-Reduce Algorithm

The Butterfly All-Reduce Algorithm is a recursive communication pattern based on hypercube topology. Processes exchange and reduce data in log₂(N) steps. At each step, every process pairs with the partner whose rank differs from its own in that step's bit, so partner distances grow exponentially (1, 2, 4, ...). In its recursive halving/doubling form (2·log₂(N) steps), it combines the bandwidth optimality of ring all-reduce with the logarithmic latency of tree all-reduce for power-of-2 process counts. This makes it theoretically one of the most efficient all-reduce algorithms when that process-count constraint is satisfied.

Algorithm Mechanics:
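A minimal sketch of the basic butterfly (recursive-doubling) exchange follows. It simulates all ranks inside one Python process, with each list standing in for one rank's buffer. Summation is assumed as the reduction operator, and `butterfly_allreduce` is a hypothetical name, not any library's API. At distance d, rank r pairs with rank r XOR d, and after log₂(N) steps every rank holds the full sum.

```python
from typing import List


def butterfly_allreduce(data: List[List[float]]) -> List[List[float]]:
    """Simulate butterfly (recursive-doubling) all-reduce with summation.

    data[r] is the vector held by rank r. The rank count must be a power
    of 2. Returns every rank's buffer after log2(p) exchange steps.
    """
    p = len(data)
    if p == 0 or p & (p - 1):
        raise ValueError("process count must be a power of 2")
    bufs = [list(v) for v in data]
    distance = 1
    while distance < p:
        # Each rank r swaps its whole buffer with partner r XOR distance and
        # both sides add what they received. All exchanges in a step happen
        # "at once", so new buffers are built from the previous step's values.
        bufs = [
            [a + b for a, b in zip(bufs[r], bufs[r ^ distance])]
            for r in range(p)
        ]
        distance <<= 1  # partner distance doubles: 1, 2, 4, ...
    return bufs


# Four hypothetical ranks, each holding a 2-element vector.
print(butterfly_allreduce([[1, 2], [3, 4], [5, 6], [7, 8]]))
# prints [[16, 20], [16, 20], [16, 20], [16, 20]]
```

This plain form sends the whole vector at every step, so each rank moves n·log₂(N) elements. That suits small, latency-bound messages. The bandwidth-optimal variant instead splits the vector as it goes.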

Bandwidth and Latency Optimality:
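A short derivation shows where the two properties come from. It uses the standard α-β cost model: α is per-message latency, β is per-byte transfer time, n is the bytes held per process, and N is a power-of-2 process count. Reduction arithmetic is ignored. Plain recursive doubling sends the whole vector in each of its log₂N steps. Recursive halving instead sends half of the segment a process still owns, and the all-gather phase sends the same volume in reverse:

```latex
\begin{aligned}
T_{\text{doubling}} &= \log_2 N\,(\alpha + n\beta) \\
V_{\text{halving}} &= \frac{n}{2} + \frac{n}{4} + \cdots + \frac{n}{N} = \frac{N-1}{N}\,n \\
T_{\text{halving+doubling}} &= 2\log_2 N\,\alpha + 2\,\frac{N-1}{N}\,n\beta
\end{aligned}
```

The bandwidth term 2(N-1)/N·nβ matches the ring algorithm's, which is the usual lower bound for all-reduce. The latency term grows only as 2·log₂N rather than the ring's 2(N-1).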

Implementation Challenges:
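The power-of-2 requirement is the main practical obstacle. One widely used workaround is sketched below as a single-process simulation with a hypothetical helper, `allreduce_non_pof2`, assuming summation. Let p' be the largest power of 2 not exceeding N. The workaround folds the N - p' surplus ranks into neighbours before the butterfly and returns the result to them afterwards, which costs two extra communication steps.

```python
from typing import List


def allreduce_non_pof2(data: List[List[int]]) -> List[List[int]]:
    """Hypothetical sum all-reduce for any rank count: fold, butterfly, unfold."""
    p = len(data)
    if p == 0:
        raise ValueError("need at least one process")
    bufs = [list(v) for v in data]
    pof2 = 1
    while pof2 * 2 <= p:  # largest power of 2 not exceeding p
        pof2 *= 2
    rem = p - pof2  # number of surplus ranks
    # Fold: among ranks 0 .. 2*rem-1, each even rank sends its vector to the
    # odd rank above it, which adds it in; the even ranks then sit idle.
    for r in range(1, 2 * rem, 2):
        bufs[r] = [a + b for a, b in zip(bufs[r - 1], bufs[r])]
    active = list(range(1, 2 * rem, 2)) + list(range(2 * rem, p))
    # Butterfly among exactly pof2 active ranks, addressed by virtual rank v.
    distance = 1
    while distance < pof2:
        snapshot = [bufs[r] for r in active]
        for v, r in enumerate(active):
            bufs[r] = [a + b for a, b in zip(snapshot[v], snapshot[v ^ distance])]
        distance <<= 1
    # Unfold: each odd rank returns the final result to its idle even partner.
    for r in range(0, 2 * rem, 2):
        bufs[r] = list(bufs[r + 1])
    return bufs


print(allreduce_non_pof2([[1], [2], [3], [4], [5], [6]]))
# prints [[21], [21], [21], [21], [21], [21]]
```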

Rabenseifner Algorithm (Practical Butterfly):
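Rabenseifner's approach replaces the full-vector exchange with two phases: a reduce-scatter by recursive halving, then an all-gather by recursive doubling. The simulation below is a minimal sketch. It assumes summation, a power-of-2 rank count, and a vector length divisible by that count. `rabenseifner_allreduce` is a hypothetical name, and the per-rank `lo`/`hi` bounds track the segment each rank is currently responsible for.

```python
from typing import List


def rabenseifner_allreduce(data: List[List[int]]) -> List[List[int]]:
    """Simulated sum all-reduce: recursive-halving reduce-scatter, then
    recursive-doubling all-gather. Needs a power-of-2 rank count p and a
    vector length divisible by p."""
    p = len(data)
    if p == 0 or p & (p - 1):
        raise ValueError("process count must be a power of 2")
    n = len(data[0])
    if n % p:
        raise ValueError("vector length must be divisible by process count")
    bufs = [list(v) for v in data]
    lo = [0] * p  # start of the segment each rank is responsible for
    hi = [n] * p  # end (exclusive) of that segment

    # Phase 1: reduce-scatter by recursive halving (distance p/2, ..., 1).
    distance = p // 2
    while distance >= 1:
        snap = [list(b) for b in bufs]
        for r in range(p):
            partner = r ^ distance  # partner shares r's current segment
            mid = (lo[r] + hi[r]) // 2
            if r & distance:
                lo[r] = mid  # upper rank keeps the upper half
            else:
                hi[r] = mid  # lower rank keeps the lower half
            # Add the partner's partial sums for the half this rank keeps.
            for i in range(lo[r], hi[r]):
                bufs[r][i] = snap[r][i] + snap[partner][i]
        distance //= 2
    # Rank r now holds the fully reduced chunk r in bufs[r][lo[r]:hi[r]].

    # Phase 2: all-gather by recursive doubling (distance 1, 2, ..., p/2).
    distance = 1
    while distance < p:
        snap = [list(b) for b in bufs]
        old_lo, old_hi = list(lo), list(hi)
        for r in range(p):
            partner = r ^ distance
            # Copy the partner's finished segment; the union stays contiguous.
            for i in range(old_lo[partner], old_hi[partner]):
                bufs[r][i] = snap[partner][i]
            lo[r] = min(old_lo[r], old_lo[partner])
            hi[r] = max(old_hi[r], old_hi[partner])
        distance *= 2
    return bufs


ranks = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
print(rabenseifner_allreduce(ranks)[0])  # prints [1111, 2222, 3333, 4444]
```

In each phase a rank sends (N-1)/N of the vector in total. This matches the ring algorithm's volume but takes 2·log₂(N) steps instead of 2(N-1).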

Comparison with Ring and Tree:
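Below is a rough comparison under the α-β model, with reduction arithmetic ignored. Here "tree" means a binomial-tree reduce followed by a binomial-tree broadcast, and `p` is the process count N. The function name and the example values (α = 5 microseconds, β = 1 ns per byte, p = 16) are hypothetical. They were chosen only to illustrate the crossover and were not measured on any system.

```python
import math
from typing import Dict


def allreduce_costs(p: int, n_bytes: float, alpha: float, beta: float) -> Dict[str, float]:
    """Estimated all-reduce time in seconds per algorithm (alpha-beta model).

    p: process count (power of 2 so every formula applies as written)
    n_bytes: vector size per process; alpha: per-message latency (s);
    beta: transfer time per byte (s/byte). Reduction arithmetic is ignored.
    """
    lg = math.log2(p)
    frac = (p - 1) / p
    return {
        "ring": 2 * (p - 1) * alpha + 2 * frac * n_bytes * beta,
        "binomial_tree": 2 * lg * (alpha + n_bytes * beta),
        "recursive_doubling": lg * (alpha + n_bytes * beta),
        "rabenseifner": 2 * lg * alpha + 2 * frac * n_bytes * beta,
    }


# Hypothetical parameters: 16 processes, alpha = 5 us, beta = 1 ns/byte.
small = allreduce_costs(16, 8, 5e-6, 1e-9)
large = allreduce_costs(16, 1e8, 5e-6, 1e-9)
print(min(small, key=small.get))  # prints recursive_doubling
print(min(large, key=large.get))  # prints rabenseifner
```

For tiny messages the latency term dominates, so the plain butterfly wins. For large messages the ring and Rabenseifner variants share the optimal bandwidth term, and Rabenseifner's smaller latency term gives it a slight edge in this model.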

Optimization Techniques:

Use Cases:

Performance Characteristics:
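This uses the same α-β assumptions: power-of-2 N, n bytes per process, and reduction cost ignored. Setting the cost of plain recursive doubling equal to that of the halving/doubling (Rabenseifner) variant gives the message size n* above which the latter should be faster:

```latex
\begin{aligned}
\log_2 N\,(\alpha + n\beta) &= 2\log_2 N\,\alpha + 2\,\frac{N-1}{N}\,n\beta \\
n\beta\left(\log_2 N - 2\,\frac{N-1}{N}\right) &= \log_2 N\,\alpha \\
n^{*} &= \frac{\log_2 N\,\alpha}{\beta\left(\log_2 N - 2\,\frac{N-1}{N}\right)}
\end{aligned}
```

The denominator is positive only for N ≥ 4. At N = 2 both variants move the same volume, and recursive doubling needs fewer messages. For N = 16, n* = 4α/(2.125β) ≈ 1.88·α/β.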

Butterfly all-reduce shows that bandwidth optimality and logarithmic latency can be achieved together, so it serves as the performance target that practical all-reduce algorithms strive for. Its Rabenseifner variant often provides the best all-around performance for medium-sized messages in MPI-based scientific computing.
