Dropout — a regularization technique that randomly deactivates neurons during training, forcing the network to learn redundant representations and reducing overfitting.
How It Works
- During training: Each neuron is set to zero with probability $p$ (typically 0.1–0.5)
- During inference: All neurons stay active; classic dropout scales outputs by $(1-p)$ to match the training-time expectation. Modern frameworks instead use inverted dropout, scaling kept activations by $1/(1-p)$ during training so inference needs no rescaling
- Effect: The network can't rely on any single neuron — must learn distributed, robust features
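The mechanics above can be sketched in a few lines. This is a minimal numpy illustration (the function name `dropout_forward` is my own) using the inverted-dropout convention that frameworks such as PyTorch follow, where rescaling happens at training time rather than at inference:

```python
import numpy as np

def dropout_forward(x, p, training, rng):
    """Inverted dropout: each unit is kept with probability 1-p and
    rescaled by 1/(1-p) so the expected activation is unchanged;
    inference is then a plain identity pass."""
    if not training or p == 0.0:
        return x  # inference: all neurons active, no rescaling needed
    mask = rng.random(x.shape) >= p  # True = keep, with probability 1-p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones(8)
train_out = dropout_forward(x, p=0.5, training=True, rng=rng)   # some zeros, survivors scaled to 2.0
eval_out = dropout_forward(x, p=0.5, training=False, rng=rng)   # identical to x
```

Surviving units come out as 2.0 rather than 1.0 precisely because of the $1/(1-p)$ training-time rescaling.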
Why It Works
- Approximate ensemble: Each training step samples a different sub-network; with $n$ droppable units, dropout implicitly trains up to $2^n$ weight-sharing sub-networks whose predictions are averaged at inference
- Prevents co-adaptation: Neurons can't learn to depend on specific partners
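The ensemble view can be checked numerically: averaging the outputs of many randomly masked copies of a layer converges to the deterministic $(1-p)$-scaled output that classic dropout uses at inference. A small numpy sketch:

```python
import numpy as np

# Average many sampled "ensemble members" (random binary masks) and
# compare against the deterministic (1-p)-scaled activations.
rng = np.random.default_rng(42)
x = np.array([1.0, 2.0, 3.0, 4.0])
p = 0.5
masks = rng.random((100_000, x.size)) >= p   # one mask per ensemble member
ensemble_mean = (masks * x).mean(axis=0)     # average sub-network output
# ensemble_mean is close to (1 - p) * x, i.e. roughly [0.5, 1.0, 1.5, 2.0]
```

This is only the expectation argument, not a proof that the ensemble's *predictions* average out for a deep nonlinear network; in practice the weight-scaling rule is a well-tested approximation.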
Variants
- Standard Dropout: Applied to fully connected layers
- Spatial Dropout (Dropout2D): Drops entire feature maps in CNNs (more effective than per-pixel)
- DropConnect: Drops weights instead of activations
- DropPath/Stochastic Depth: Drops entire residual blocks (introduced for deep ResNets; standard in Vision Transformers)
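Spatial dropout is worth a concrete sketch: neighboring pixels in a feature map are highly correlated, so zeroing them independently barely removes information, whereas zeroing a whole channel does. A minimal numpy version (function name `spatial_dropout2d` is my own; the real PyTorch layer is `nn.Dropout2d`):

```python
import numpy as np

def spatial_dropout2d(x, p, rng):
    """Drop whole channels of an (N, C, H, W) batch, inverted-dropout style.
    One keep/drop decision per (sample, channel), broadcast over H and W."""
    mask = rng.random((x.shape[0], x.shape[1], 1, 1)) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones((2, 4, 3, 3))
out = spatial_dropout2d(x, p=0.5, rng=rng)
# each channel is now either entirely zero or uniformly scaled by 1/(1-p)
```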
Practical Tips
- Typically $p=0.5$ for hidden layers, $p=0.1$–$0.2$ for input layers
- Be careful combining with Batch Normalization: dropout before BN shifts activation variance between train and test, and BN already provides some regularization, so the pair often hurts more than it helps
- Always switch dropout off for evaluation: model.eval() in PyTorch (model.train() re-enables it)
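The train/eval switch is the most common dropout bug in practice. The sketch below mimics the semantics of PyTorch's model.train() / model.eval() toggle in plain numpy; this `Dropout` class is my own illustration, not the real torch.nn.Dropout:

```python
import numpy as np

class Dropout:
    """Illustrative stand-in for a framework dropout layer with a
    training flag toggled by train() / eval(), as in PyTorch modules."""
    def __init__(self, p=0.5, rng=None):
        self.p = p
        self.training = True
        self.rng = rng or np.random.default_rng()

    def eval(self):   # like module.eval(): dropout becomes a no-op
        self.training = False
        return self

    def train(self):  # like module.train(): re-enable stochastic masking
        self.training = True
        return self

    def __call__(self, x):
        if not self.training:
            return x  # deterministic identity at evaluation time
        mask = self.rng.random(x.shape) >= self.p
        return x * mask / (1.0 - self.p)  # inverted scaling

layer = Dropout(p=0.5, rng=np.random.default_rng(0))
x = np.ones(6)
noisy = layer(x)          # stochastic during training
clean = layer.eval()(x)   # identical to x during evaluation
```

Forgetting the eval-mode switch leaves masking active at test time, which adds noise to every prediction.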
Dropout remains one of the most effective and widely-used regularization techniques despite its simplicity.