1-Bit SGD is a gradient quantization method that compresses each gradient component to a single bit (its sign), transmitting only +1 or -1 per element. This yields 32× compression compared to 32-bit floating point, while error feedback preserves convergence.
1-Bit SGD Algorithm
- Sign: $\hat{g}_i = \text{sign}(g_i + e_i)$ — quantize to +1 or -1.
- Scale: Multiply by the mean absolute gradient magnitude so the reconstructed gradient matches the original scale.
- Error Feedback: $e_i \leftarrow (g_i + e_i) - \hat{g}_i$ — accumulate the quantization error and add it back on the next step.
- Communication: 1 bit per gradient component + 1 scalar (mean magnitude) per layer.
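The steps above can be sketched in NumPy. This is a minimal illustration, not the original implementation; the function name `one_bit_quantize` and the use of the scaled reconstruction (sign times mean magnitude) when computing the residual are assumptions, though the scaled residual is what makes error feedback consistent with the rescaling step.

```python
import numpy as np

def one_bit_quantize(grad, error):
    """Quantize a gradient to signs with error feedback.

    Returns (signs, scale, new_error), where signs * scale
    is the reconstructed gradient the receiver would apply.
    """
    # Error feedback: add the residual left over from the previous step.
    corrected = grad + error
    # Scale: one scalar per tensor, the mean absolute magnitude.
    scale = np.mean(np.abs(corrected))
    # Sign: 1 bit per component.
    signs = np.where(corrected >= 0, 1.0, -1.0)
    # Residual: whatever the 1-bit representation failed to capture,
    # carried forward into the next iteration.
    new_error = corrected - scale * signs
    return signs, scale, new_error

# Example: only `signs` (1 bit each) and `scale` (one float) are communicated.
g = np.array([0.5, -1.0, 0.25, -0.25])
signs, scale, err = one_bit_quantize(g, np.zeros_like(g))
# Reconstruction plus residual recovers the corrected gradient exactly.
assert np.allclose(scale * signs + err, g)
```

Note that the residual `err` is never transmitted; each worker keeps it locally and folds it into its next gradient, which is why convergence survives such aggressive compression.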
Why It Matters
- 32× Compression: Reduces gradient communication by 32× compared to full precision.
- Error Feedback Essential: Without error feedback, 1-bit SGD diverges. With it, convergence is preserved.
- Microsoft: Originally proposed by Microsoft Research, where it was used to scale distributed speech recognition training.
1-Bit SGD is extreme gradient quantization: compress gradients to their signs for massive communication savings, and rely on error feedback to preserve convergence.