
InfiniBand is a high-bandwidth, low-latency networking technology that uses RDMA for GPU cluster communication. Current generations provide 200-400 Gbps per port with microsecond-scale latencies, which makes InfiniBand the interconnect of choice for large-scale AI training, where multi-node communication efficiency determines how well a job scales.

What Is InfiniBand?

Why InfiniBand for AI

InfiniBand Generations

Speed Evolution:

Generation | Speed (per port) | Year
-----------|------------------|------
EDR        | 100 Gbps         | 2014
HDR        | 200 Gbps         | 2019
NDR        | 400 Gbps         | 2022
XDR        | 800 Gbps         | 2024
GDR        | 1600 Gbps        | Future
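
To put the per-port speeds in training terms, the sketch below estimates a bandwidth-only lower bound for a ring all-reduce at each generation. It is a rough model, not a measurement: it ignores latency, protocol overhead, and multiple NICs per node, and the 7B-parameter payload and 16-node count are hypothetical values chosen for illustration.

```python
# Per-port line rates from the table above, in Gbps.
PORT_SPEED_GBPS = {"EDR": 100, "HDR": 200, "NDR": 400, "XDR": 800}


def ring_allreduce_seconds(payload_bytes: float, num_nodes: int, port_gbps: float) -> float:
    """Bandwidth-only lower bound for a ring all-reduce across num_nodes."""
    if num_nodes < 2:
        return 0.0
    bytes_per_second = port_gbps * 1e9 / 8
    # In a ring all-reduce each node sends (and receives) 2 * (N - 1) / N
    # of the payload over its own link.
    traffic_bytes = 2 * (num_nodes - 1) / num_nodes * payload_bytes
    return traffic_bytes / bytes_per_second


if __name__ == "__main__":
    grad_bytes = 7e9 * 2  # hypothetical: 7B parameters in 16-bit precision
    for generation, gbps in PORT_SPEED_GBPS.items():
        seconds = ring_allreduce_seconds(grad_bytes, num_nodes=16, port_gbps=gbps)
        print(f"{generation}: {seconds:.3f} s per all-reduce")
```

Under this model, doubling the port speed halves the time, so each generation step roughly halves the communication cost of a bandwidth-bound collective.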

Comparison with Ethernet:

Aspect        | InfiniBand NDR | 400G Ethernet
--------------|----------------|---------------
Bandwidth     | 400 Gbps       | 400 Gbps
Latency       | ~1 μs          | ~10-50 μs (TCP stack; RoCE is lower)
RDMA          | Native         | RoCE (extra)
Congestion    | Credit-based   | Drop-based
CPU overhead  | Minimal        | Higher
AI training   | Optimized      | Improving
Cost          | Higher         | Lower
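
Equal bandwidth with different latency matters most for small messages. The sketch below uses a simple latency-plus-serialization model to show this. The 1 μs and 10 μs figures are taken from the table as assumptions (10 μs is the low end of the Ethernet range), and the message sizes are arbitrary examples.

```python
def transfer_time_us(message_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """One-way time for a single message: fixed latency plus serialization."""
    bytes_per_us = bandwidth_gbps * 1e9 / 8 / 1e6
    return latency_us + message_bytes / bytes_per_us


if __name__ == "__main__":
    for size in (4 * 1024, 1024 * 1024, 256 * 1024 * 1024):
        ib = transfer_time_us(size, latency_us=1.0, bandwidth_gbps=400)
        eth = transfer_time_us(size, latency_us=10.0, bandwidth_gbps=400)
        print(f"{size:>12} B  IB {ib:10.2f} us  Eth {eth:10.2f} us  ratio {eth / ib:.2f}x")
```

In this model the gap is close to the raw latency ratio for kilobyte-sized messages and shrinks toward 1x for very large ones, which is why latency-sensitive collectives on small tensors are where the interconnects differ most.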

RDMA Explained

How RDMA Works:

Traditional Network:
CPU → Copy to buffer → NIC → Network → NIC → Copy to buffer → CPU

RDMA:
Registered memory → NIC → Network → NIC → Registered memory
(CPU and kernel bypassed on the data path, zero-copy)

GPUDirect RDMA:

┌─────────┐    NVLink    ┌─────────┐
│  GPU 0  │◄────────────►│  GPU 1  │
└────┬────┘              └────┬────┘
     │ PCIe                   │ PCIe
     ▼                        ▼
┌─────────┐  InfiniBand  ┌─────────┐
│   NIC   │◄────────────►│   NIC   │
└─────────┘   (RDMA)     └─────────┘

GPUDirect RDMA: the NIC reads and writes GPU memory directly over PCIe
No staging copy through host memory and no CPU on the data path, so latency stays minimal

AI Training Infrastructure

Typical Large Cluster:

┌─────────────────────────────────────────────────────────┐
│                    Spine Switches                       │
│  (InfiniBand NDR, high-radix, non-blocking)            │
└─────────────────────────────────────────────────────────┘
           │         │         │         │
           ▼         ▼         ▼         ▼
     ┌─────────┐┌─────────┐┌─────────┐┌─────────┐
     │  Leaf   ││  Leaf   ││  Leaf   ││  Leaf   │
     │ Switch  ││ Switch  ││ Switch  ││ Switch  │
     └────┬────┘└────┬────┘└────┬────┘└────┬────┘
          │          │          │          │
    ┌─────┼─────┐   ...        ...        ...
    │     │     │
┌──────┐┌──────┐┌──────┐
│DGX 1 ││DGX 2 ││DGX 3 │  (8 H100s each)
└──────┘└──────┘└──────┘
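
A rough way to size the leaf-spine fabric in the diagram is sketched below. It assumes a two-tier, non-blocking design built from identical switches, with half of each leaf's ports facing hosts and half facing spines. The function name, the 64-port radix, and the one-NIC-per-GPU count in the usage note are illustrative assumptions, not figures from this article.

```python
from dataclasses import dataclass


@dataclass
class LeafSpineFabric:
    leaves: int
    spines: int
    capacity: int  # endpoints the chosen leaves can attach


def size_nonblocking_fabric(switch_radix: int, endpoints_needed: int) -> LeafSpineFabric:
    """Size a two-tier non-blocking leaf-spine fabric built from identical switches."""
    if switch_radix < 2 or switch_radix % 2:
        raise ValueError("switch_radix must be an even number >= 2")
    down_ports = switch_radix // 2             # host-facing ports per leaf
    max_endpoints = switch_radix * down_ports  # a spine can reach at most `radix` leaves
    if endpoints_needed > max_endpoints:
        raise ValueError(f"two tiers support at most {max_endpoints} endpoints")
    leaves = -(-endpoints_needed // down_ports)  # ceiling division
    # Non-blocking: every leaf has as many uplinks as downlinks, and all
    # uplinks must land on spine ports.
    spines = -(-(leaves * down_ports) // switch_radix)
    return LeafSpineFabric(leaves=leaves, spines=spines, capacity=leaves * down_ports)


if __name__ == "__main__":
    # Hypothetical: 128 nodes with 8 NICs each, 64-port switches.
    print(size_nonblocking_fabric(switch_radix=64, endpoints_needed=128 * 8))
```

With these assumed inputs the sketch yields 32 leaf switches and 16 spine switches. Production designs often add a third tier or rail-optimized wiring, which this model does not cover.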

NCCL with InfiniBand

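import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

# Set NCCL environment for InfiniBand (must be set before init_process_group)
os.environ["NCCL_IB_DISABLE"] = "0"  # Keep the InfiniBand transport enabled (the default)
os.environ["NCCL_NET_GDR_LEVEL"] = "5"  # Permit GPUDirect RDMA between GPU and NIC

# Initialize distributed; env:// reads RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT
dist.init_process_group(
    backend="nccl",
    init_method="env://",
)

# Bind this process to its local GPU (LOCAL_RANK is set by launchers such as torchrun)
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Placeholder model; substitute the real network
model = torch.nn.Linear(1024, 1024).cuda(local_rank)

# Training code - NCCL selects InfiniBand automatically when it is available
model = DistributedDataParallel(model, device_ids=[local_rank])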
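
To judge whether NCCL is actually driving the InfiniBand links, a timed all-reduce can be converted to bandwidth. The helper below is a minimal sketch that follows the algorithm-bandwidth and bus-bandwidth convention used by the nccl-tests benchmarks, but reports Gbps so the result compares directly with port speeds. The function name is hypothetical, and the timing itself (for example around `dist.all_reduce` with proper CUDA synchronization) is assumed to be collected separately.

```python
def allreduce_bandwidth_gbps(payload_bytes: float, seconds: float, world_size: int) -> tuple[float, float]:
    """Return (algorithm bandwidth, bus bandwidth) in Gbps for one all-reduce."""
    if seconds <= 0 or world_size < 1:
        raise ValueError("seconds must be positive and world_size at least 1")
    algbw_bytes = payload_bytes / seconds
    # Bus bandwidth scales by 2 * (n - 1) / n so that the figure reflects
    # per-link traffic and stays comparable across different world sizes.
    busbw_bytes = algbw_bytes * 2 * (world_size - 1) / world_size
    to_gbps = 8 / 1e9
    return algbw_bytes * to_gbps, busbw_bytes * to_gbps


if __name__ == "__main__":
    # Hypothetical measurement: 1 GiB all-reduced across 16 ranks in 50 ms.
    algbw, busbw = allreduce_bandwidth_gbps(1024**3, 0.050, 16)
    print(f"algbw {algbw:.1f} Gbps, busbw {busbw:.1f} Gbps")
```

A bus bandwidth far below the aggregate NIC line rate of a node is a hint that traffic may be falling back to a slower transport or that GPUDirect RDMA is not in use.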

Checking InfiniBand

# List InfiniBand devices
ibstat

# Show port status
ibstatus

# Check link speed
ibstat mlx5_0 | grep Rate

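# Performance test: start the server side on one node
ib_write_bw -d mlx5_0
# Then run the client on a second node, pointing at the server
ib_write_bw -d mlx5_0 <server-hostname>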
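
For fleet-wide checks, the `ibstat` output can be parsed rather than read by eye. The sketch below assumes the usual `CA '<name>'` / `Port <n>:` / `Key: value` layout. The sample text is an abridged illustration written for this example, not output captured from a real host, and the function names are hypothetical.

```python
import re

# Illustrative, abridged sample; real output contains more fields.
SAMPLE_IBSTAT = """\
CA 'mlx5_0'
    CA type: MT4129
    Number of ports: 1
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 400
        Link layer: InfiniBand
"""


def parse_ibstat(text: str) -> list[dict]:
    """Parse ibstat-style text into one dict per port."""
    ports: list[dict] = []
    ca_name = None
    port = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if m := re.match(r"CA '(.+)'", line):
            ca_name, port = m.group(1), None  # new adapter: reset current port
        elif m := re.match(r"Port (\d+):", line):
            port = {"ca": ca_name, "port": int(m.group(1))}
            ports.append(port)
        elif port is not None and ":" in line:
            key, value = line.split(":", 1)
            port[key.strip()] = value.strip()
    return ports


def unhealthy_ports(ports: list[dict], expected_rate_gbps: float) -> list[dict]:
    """Return ports that are not Active or that run below the expected rate."""
    return [
        p for p in ports
        if p.get("State") != "Active" or float(p.get("Rate", 0)) < expected_rate_gbps
    ]


if __name__ == "__main__":
    # On a real host, the text could come from:
    # subprocess.run(["ibstat"], capture_output=True, text=True, check=True).stdout
    print(unhealthy_ports(parse_ibstat(SAMPLE_IBSTAT), expected_rate_gbps=400))
```

A link that trained at a lower rate than expected is a common cause of one slow node dragging down a whole collective, so a check like this is worth running before long jobs.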

InfiniBand vs. Alternatives

Use Case                 | Best Choice
-------------------------|-------------------------
AI training (1000+ GPU)  | InfiniBand NDR
Small clusters (<64 GPU) | Either (cost-dependent)
Cloud/flexibility        | Ethernet (easier)
Maximum performance      | InfiniBand
Budget constrained       | 400G Ethernet + RoCE

Cost Considerations

Component          | InfiniBand      | 400G Ethernet
-------------------|-----------------|---------------------------
NIC/HCA            | $3-5K           | $1-2K
Switch (per port)  | $500-1K         | $200-400
Total system cost  | Higher          | Lower
Performance/$      | Better at scale | Better for small clusters

InfiniBand is the performance backbone of large-scale AI training. When frontier models are trained across thousands of GPUs, the efficiency of collective operations, enabled by InfiniBand's low latency and RDMA capabilities, directly determines how well training scales.
