Cauchy loss (also called Lorentzian loss) is a highly robust loss function based on the Cauchy probability distribution. It offers strong resistance to outliers and anomalies by bounding the influence of any error magnitude, making it well suited to datasets with heavy-tailed noise, extreme-value contamination, or unknown outlier distributions.
What Is Cauchy Loss?
Cauchy loss is derived from the negative log-likelihood of the Cauchy probability distribution, a theoretically grounded choice for systems where even very large errors should have bounded influence on parameter updates. Unlike MSE, where large errors dominate (quadratic growth), and unlike Huber, where large errors still grow linearly, Cauchy loss grows logarithmically: any error, no matter how large, contributes a bounded amount to the gradient.
Mathematical Definition
Cauchy loss formula:
```
L(x) = (c²/2) * log(1 + (x/c)²)
```
Where:
- x = error (y − ŷ)
- c = scale parameter controlling sensitivity
Key properties:
- As x → 0: L(x) → x²/2 (quadratic, like MSE)
- As x → ∞: L(x) → c² * log(|x|/c) (logarithmic growth)
- Gradient: ∂L/∂x = x / (1 + (x/c)²), bounded in magnitude by c/2
- Curvature: the second derivative turns negative for |x| > c, so Cauchy loss is non-convex; this is the price paid for its robustness
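These properties are easy to check numerically; a minimal sketch in plain Python (function names are illustrative):

```python
import math

def cauchy_loss(x, c=1.0):
    # L(x) = (c^2 / 2) * log(1 + (x/c)^2)
    return (c**2 / 2) * math.log(1 + (x / c)**2)

def cauchy_grad(x, c=1.0):
    # dL/dx = x / (1 + (x/c)^2); peaks at x = c with value c/2
    return x / (1 + (x / c)**2)

# Near zero the loss tracks x^2 / 2 (the quadratic regime)
print(cauchy_loss(0.01), 0.01**2 / 2)

# The gradient never exceeds c/2, even for enormous errors
for x in [0.5, 1.0, 10.0, 1e6]:
    print(x, cauchy_grad(x))
```

Note that the gradient at x = 1e6 is tiny: far-out outliers are effectively ignored, which is exactly the bounded-influence behavior described above.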
Why Cauchy Loss Matters
- Extreme Outliers OK: Outliers with magnitude 10×, 100×, or 1000× typical errors still contribute bounded gradients
- Heavy-Tailed Distributions: Matches distributions with occasional extreme events (Pareto, Zipf)
- No Explosive Gradients: Unlike MSE, gradients are bounded, so huge errors cannot blow up updates or push the loss toward numerical overflow
- Theoretically Grounded: Maximum likelihood estimator for Cauchy-distributed errors
- Robust Statistics: Classical choice in robust statistics literature
- Stability: Critical for adversarial robustness and noisy sensor data
Cauchy vs Huber vs MSE: Outlier Sensitivity
| Error Magnitude | MSE (x²/2) | Huber (δ=1) | Cauchy (c=1) |
|-----------------|-----------|-------------|--------------|
| 0.5 | 0.125 | 0.125 | 0.112 |
| 1.0 | 0.5 | 0.5 | 0.347 |
| 2.0 | 2.0 | 1.5 | 0.805 |
| 5.0 | 12.5 | 4.5 | 1.629 |
| 10.0 | 50.0 | 9.5 | 2.308 |
| 100.0 | 5000.0 | 99.5 | 4.605 |
Cauchy grows only logarithmically, while Huber grows linearly and MSE quadratically. (All three columns use the x²/2 small-error convention.)
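A short sketch for computing all three losses at any error magnitude, using the x²/2 small-error convention throughout (function names are illustrative):

```python
import math

def mse(x):
    # squared error under the x^2 / 2 convention
    return x**2 / 2

def huber(x, delta=1.0):
    # quadratic inside |x| <= delta, linear beyond
    a = abs(x)
    return a**2 / 2 if a <= delta else delta * (a - delta / 2)

def cauchy(x, c=1.0):
    # logarithmic growth for large |x|
    return (c**2 / 2) * math.log(1 + (x / c)**2)

for x in [0.5, 1.0, 2.0, 5.0, 10.0, 100.0]:
    print(f"{x:7.1f}  {mse(x):10.3f}  {huber(x):8.3f}  {cauchy(x):8.3f}")
```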
Tuning the Scale Parameter c
- c = 0.5: More sensitive, smaller errors emphasized
- c = 1.0: Balanced default choice
- c = 2.0: More tolerant, extreme outliers have less influence
- Strategy: Set c to expected noise level in residuals; larger c for noisier data
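One practical way to estimate the residual noise level is a robust scale statistic; the MAD-based heuristic below is a common choice in the robust statistics literature, not something the text prescribes, and the constants are illustrative:

```python
import numpy as np

def estimate_scale(residuals):
    # Median absolute deviation, scaled by 1.4826 so it matches
    # the standard deviation under Gaussian noise (a standard heuristic).
    mad = np.median(np.abs(residuals - np.median(residuals)))
    return 1.4826 * mad

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 0.5, size=1000)            # typical noise scale ~0.5
polluted = np.concatenate([clean, [50.0, -80.0]])  # two extreme outliers

# The MAD-based estimate stays near 0.5 despite the outliers,
# while the plain standard deviation is dragged far upward.
print(estimate_scale(polluted), polluted.std())
```

The point of the robust estimate is that c itself should not be corrupted by the very outliers you are trying to down-weight.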
Implementation
PyTorch:
```python
import torch

def cauchy_loss(predictions, targets, c=1.0):
    errors = predictions - targets
    loss = (c**2 / 2) * torch.log(1 + (errors / c)**2)
    return loss.mean()
```
JAX:
```python
import jax.numpy as jnp

def cauchy_loss(pred, target, c=1.0):
    error = pred - target
    return jnp.mean((c**2 / 2) * jnp.log(1 + (error / c)**2))
```
When to Use Cauchy Loss
- Heavy-Tailed Noise: Data follows distribution with occasional extreme events
- Contaminated Data: Unknown percentage of outliers or measurement errors
- Adversarial Setting: Need robustness to malicious extreme perturbations
- Astronomical Data: Dealing with rare transient events and artifacts
- Sensor Networks: Occasional sensor malfunction producing impossibly large readings
- Financial Data: Stock prices with market shocks and circuit-breaker events
- Biological Data: Occasional experimental artifacts or setup failures
Comparison to Alternatives
| Loss | Robustness | Convexity | Interpretability | Speed |
|------|-----------|-----------|------------------|-------|
| MSE | None | Convex | Simple | Fast |
| Huber | Moderate | Convex | Clear cutoff | Fast |
| Cauchy | Extreme | Non-convex | Theory-based | Fast |
| Tukey | Very High | Non-convex | Hard rejection | Slower |
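The "hard rejection" contrast with Tukey's biweight can be sketched directly: beyond its cutoff, Tukey's loss goes perfectly flat (zero gradient), while Cauchy keeps climbing slowly. Constants are illustrative (4.685 is the conventional Tukey tuning constant):

```python
import math

def tukey(x, c=4.685):
    # Tukey's biweight: constant beyond |x| > c, so outliers past the
    # cutoff contribute exactly zero gradient (hard rejection)
    if abs(x) <= c:
        return (c**2 / 6) * (1 - (1 - (x / c)**2)**3)
    return c**2 / 6

def cauchy(x, c=1.0):
    return (c**2 / 2) * math.log(1 + (x / c)**2)

print(tukey(100.0), tukey(1000.0))    # identical: flat plateau
print(cauchy(100.0), cauchy(1000.0))  # still slowly increasing
```

Tukey's plateau is why it is listed as non-convex with hard rejection, whereas Cauchy (also non-convex) retains a small but nonzero gradient at every magnitude.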
Practical Applications
3D Computer Vision: Structure-from-motion where occasional faulty matches cause nonsensical depth estimates; Cauchy loss permits robust triangulation even with erroneous correspondence matches.
Depth Estimation: Monocular depth prediction where rare images contain strong artifacts (transparency, extreme lighting); Cauchy prevents outlier frames from corrupting learned depth relationships.
LiDAR Processing: Autonomous vehicles ignoring occasional reflector artifacts or multi-bounce returns that spoil density-based matching.
Audio Processing: Noise robustness in speech enhancement where occasional impulse noise spikes shouldn't destroy learned acoustic models.
Cauchy loss is the ultimate outlier-robust loss ā providing theoretical grounding and practical robustness for datasets where extreme deviations must be tolerated, enabling principled learning from contaminated, heavy-tailed, or adversarially-perturbed data.