Huber Loss

Huber loss is a robust loss function that combines properties of Mean Squared Error (MSE) and Mean Absolute Error (MAE). It is well suited to regression problems where the data contains outliers: gradients are smooth near zero error, and the loss grows only linearly for large errors, so the gradient magnitude stays bounded. This makes it a common choice for outlier-resistant regression in deep learning and reinforcement learning.

What Is Huber Loss?

Huber loss is designed to be less sensitive to outliers in data compared to MSE while maintaining the smoothness advantages of squared error near zero. The loss function transitions smoothly from quadratic behavior for small errors to linear behavior for large errors, controlled by a delta parameter δ that determines where this transition occurs. For errors smaller than δ, Huber loss behaves like MSE (quadratic), and for errors larger than δ, it behaves like MAE (linear).

Formula and Mathematical Definition

The mathematical definition of Huber loss is:

L(y, ŷ) = 
  0.5 * (y - ŷ)²           if |y - ŷ| ≤ δ  (quadratic region)
  δ * |y - ŷ| - 0.5 * δ²   if |y - ŷ| > δ  (linear region)

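Where y is the true value, ŷ is the prediction, and δ is the transition parameter. The gradient with respect to the prediction ŷ is:

∂L/∂ŷ =
  -(y - ŷ)            if |y - ŷ| ≤ δ  (quadratic region)
  -δ * sign(y - ŷ)    if |y - ŷ| > δ  (linear region)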
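As a minimal sketch, the piecewise loss and its derivative with respect to the prediction can be written in plain Python for a single (target, prediction) pair. The function names `huber_loss` and `huber_grad` are illustrative choices, not part of any library.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Huber loss for a single (target, prediction) pair."""
    error = y_true - y_pred
    if abs(error) <= delta:
        return 0.5 * error ** 2                     # quadratic region
    return delta * abs(error) - 0.5 * delta ** 2    # linear region


def huber_grad(y_true, y_pred, delta=1.0):
    """Derivative of the Huber loss with respect to the prediction."""
    error = y_true - y_pred
    if abs(error) <= delta:
        return -error                               # quadratic region
    return -delta if error > 0 else delta           # linear region


# Both branches agree at |error| == delta, so the loss is continuous there.
for err in (0.5, 1.0, 3.0):
    print(err, huber_loss(err, 0.0), huber_grad(err, 0.0))
```

With δ = 1, an error of 3.0 gives a loss of 2.5 and a gradient of magnitude 1, whereas half the squared error would give 4.5 and a gradient of magnitude 3.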

Why Huber Loss Matters

Huber vs MSE vs MAE Comparison

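| Aspect | MSE | MAE | Huber |
| --- | --- | --- | --- |
| Small errors | Quadratic penalty | Linear penalty | Quadratic |
| Large errors | Quadratic (grows rapidly) | Linear | Linear (bounded gradient) |
| Gradient at 0 | 2(y - ŷ), goes to 0 smoothly | Undefined (jumps between ±1) | Smooth (goes to 0) |
| Outlier sensitivity | Very high | Low | Low (set by δ) |
| Optimization | Smooth, stable | Less smooth | Smooth, stable |
| Use case | Clean data | Robust regression | Noisy data with outliers |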
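One way to see the difference in outlier sensitivity is to fit a single constant prediction to a small sample under each loss. The sketch below uses a made-up sample with one outlier. The MSE fit is the mean, the MAE fit is the median, and the Huber fit is found by bisection on the summed gradient. The helper `huber_location` is a hypothetical name used only for this illustration.

```python
from statistics import mean, median


def huber_location(values, delta=1.0, tol=1e-9):
    """Constant prediction c minimising the summed Huber loss, by bisection."""
    def score(c):
        # Negative derivative of the summed Huber loss with respect to c:
        # each residual is clipped to [-delta, delta].
        return sum(max(-delta, min(delta, v - c)) for v in values)

    lo, hi = min(values), max(values)   # score(lo) >= 0 >= score(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid                    # minimiser lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


data = [1.0, 1.2, 0.9, 1.1, 50.0]       # made-up sample with one outlier
mse_fit = mean(data)                    # minimises squared error
mae_fit = median(data)                  # minimises absolute error
huber_fit = huber_location(data, delta=1.0)
print(mse_fit, mae_fit, huber_fit)
```

On this sample the mean is pulled to about 10.84 by the outlier, while the median (1.1) and the Huber fit (about 1.3) stay close to the bulk of the data.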

Implementation in Major Frameworks

PyTorch implementation:

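import torch
import torch.nn.functional as F

# Example tensors (illustrative values; the last target is an outlier)
predictions = torch.tensor([2.5, 0.0, 2.0, 8.0])
targets = torch.tensor([3.0, -0.5, 2.0, 20.0])

# Huber loss with an explicit delta (default delta=1.0)
loss = F.huber_loss(predictions, targets, delta=1.0)

# Smooth L1 is parameterized by beta; it equals Huber loss divided by beta,
# so the two coincide only when beta = delta = 1.0
loss = F.smooth_l1_loss(predictions, targets, beta=1.0)

# Module form of Smooth L1
criterion = torch.nn.SmoothL1Loss(beta=1.0)
loss = criterion(predictions, targets)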
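Huber loss and Smooth L1 are often treated as the same function, but they differ by a scale factor whenever the threshold is not 1. The sketch below restates the element-wise formulas from the PyTorch documentation in plain Python; it does not call the library. The names `huber_elementwise` and `smooth_l1_elementwise` are hypothetical.

```python
def huber_elementwise(error, delta=1.0):
    """Element-wise Huber loss as documented for torch.nn.HuberLoss."""
    if abs(error) <= delta:
        return 0.5 * error ** 2
    return delta * (abs(error) - 0.5 * delta)


def smooth_l1_elementwise(error, beta=1.0):
    """Element-wise Smooth L1 loss as documented for torch.nn.SmoothL1Loss."""
    if abs(error) < beta:
        return 0.5 * error ** 2 / beta
    return abs(error) - 0.5 * beta


# With the same threshold t, Huber equals t times Smooth L1.
threshold = 2.0
for err in (0.5, 2.0, 5.0):
    h = huber_elementwise(err, delta=threshold)
    s = smooth_l1_elementwise(err, beta=threshold)
    print(err, h, s, threshold * s)
```

In practice this means that switching between the two with a threshold other than 1 rescales the loss, and with it the effective learning rate.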

TensorFlow/Keras:

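import tensorflow as tf

# Minimal single-output regression model, for illustration only
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])

loss = tf.keras.losses.Huber(delta=1.0)
model.compile(loss=loss, optimizer='adam')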

When to Use Huber Loss

Tuning the Delta Parameter δ

Relationship to Other Robust Losses

Practical Applications

Computer Vision: Bounding box regression in detectors such as Fast R-CNN, Faster R-CNN, and SSD. Smooth L1 keeps badly misaligned boxes from dominating the gradients.

Reinforcement Learning: Q-learning in DQN and Double DQN. Huber loss caps the gradient contribution of very large TD errors, such as those produced during exploration, so they do not destabilize value function learning.

Time Series: Stock price and sensor data prediction. Occasional sensor spikes or market anomalies have limited influence on the fitted model.

Geometry and Pose: 3D pose estimation and 6D object pose estimation, where the scale of the errors differs dramatically between translation and rotation components.
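For the reinforcement learning case, using Huber loss on the TD error is equivalent, in terms of gradients, to using squared error with the error term clipped to [-δ, δ]. The sketch below checks this on made-up TD errors. The function names are hypothetical, and the TD error is assumed to be defined as target minus the current Q-value.

```python
def huber_grad_wrt_q(td_error, delta=1.0):
    """d/dq of the Huber loss, where td_error = target - q."""
    if abs(td_error) <= delta:
        return -td_error                        # quadratic region
    return -delta if td_error > 0 else delta    # linear region


def clipped_error_grad(td_error, delta=1.0):
    """Squared-error gradient with the error term clipped to [-delta, delta]."""
    return -max(-delta, min(delta, td_error))


# Made-up TD errors, including two very large ones
td_errors = [-250.0, -0.3, 0.0, 0.8, 40.0]
huber_grads = [huber_grad_wrt_q(e) for e in td_errors]
clipped_grads = [clipped_error_grad(e) for e in td_errors]
mse_grads = [-e for e in td_errors]             # unclipped squared-error gradient

print(huber_grads)
print(clipped_grads)
print(mse_grads)
```

The first two lists match, and no entry exceeds δ in magnitude, while the unclipped squared-error gradients scale with the raw TD error.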

Huber loss is a practical choice for robust regression on noisy data. It applies across domains where outliers contaminate the data, and δ sets the trade-off between MSE's optimization efficiency and MAE's outlier robustness.
