Tukey's biweight loss

Keywords: tukey biweight,m-estimator,outlier rejection

Tukey's biweight loss is an M-estimator loss function that ignores errors beyond a threshold entirely: the gradient vanishes for extreme deviations, giving hard outlier rejection. This lets models learn the underlying data pattern even under heavy contamination by gross errors.

What Is Tukey's Biweight Loss?

Tukey's biweight (also called bisquare) is a redescending M-estimator from robust statistics. It behaves like a quadratic penalty near zero, gradually reduces the influence of moderate errors, and completely rejects (zero gradient) errors beyond the threshold c. This is the strongest form of outlier rejection among common robust losses: unlike Huber and Cauchy, where large errors still contribute some gradient, Tukey ignores them entirely.

Mathematical Definition

Tukey biweight loss:
```
ρ(x) = (c²/6) · [1 − (1 − (x/c)²)³]   if |x| ≤ c   (influence region)
ρ(x) = c²/6                           if |x| > c   (rejection region)

Weight:   w(x)   = (1 − (x/c)²)²       if |x| ≤ c, else 0
Gradient: ∂ρ/∂x = x · (1 − (x/c)²)²   if |x| ≤ c, else 0
```

Three distinct regions:
1. |x| ≪ c: Near-quadratic behavior — the gradient grows roughly linearly with the error
2. |x| = c/√5 ≈ 0.447c: Influence peaks here, then redescends as |x| approaches c
3. |x| ≥ c: Gradient exactly zero — complete outlier rejection
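The three regions can be checked directly from the gradient formula above. A minimal NumPy sketch (the function name `tukey_gradient` is ours, not a library API):

```python
import numpy as np

def tukey_gradient(x, c=1.0):
    """Influence function: psi(x) = x * (1 - (x/c)^2)^2 for |x| <= c, else 0."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= c, x * (1 - (x / c) ** 2) ** 2, 0.0)

print(tukey_gradient(0.1))   # near-linear regime: ~0.098, close to the raw error
print(tukey_gradient(0.9))   # redescending regime: influence has shrunk to ~0.03
print(tukey_gradient(2.0))   # rejection regime: exactly 0.0
```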

Why Tukey's Biweight Matters

- Hard Rejection: Errors beyond the threshold are completely ignored — the strongest rejection among common robust losses
- Redescending Property: Influence increases with error magnitude, then decreases back to zero
- Classical Foundation: Developed by John Tukey, a pioneer of robust statistics and exploratory data analysis
- RANSAC-Like: Plays a role similar to RANSAC's inlier consensus, but via smooth downweighting rather than random sampling
- Parameter Control: The threshold c tunes what counts as an outlier
- High Breakdown: Paired with a robust scale estimate (as in MM-estimation), it attains the maximal 50% breakdown point

The Redescending Property

Unlike Huber, whose influence plateaus at a constant, and Cauchy, whose influence decays only asymptotically, Tukey's biweight influence peaks at error = c/√5 ≈ 0.447c, then redescends, reaching exactly zero at c:

```
Influence ψ(x) vs error magnitude:

ψ |     ╱╲
  |    ╱  ╲
  |   ╱    ╲
  |  ╱      ╲
  |_╱________╲____________   (zero influence beyond c)
  0    c/√5   c
```
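Setting the derivative of ψ(x) = x(1 − (x/c)²)² to zero gives the peak at x = c/√5 ≈ 0.447c; a quick numerical check (plain NumPy, illustrative):

```python
import numpy as np

c = 1.0
x = np.linspace(0.0, c, 100_001)
psi = x * (1 - (x / c) ** 2) ** 2    # influence function on [0, c]

peak = x[np.argmax(psi)]
print(peak)                          # ~0.4472, i.e. c / sqrt(5)
```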

Comparison: Outlier Rejection Approaches

| Error = 5c | MSE | Huber | Cauchy | Tukey |
|-----------|-----|-------|--------|-------|
| Loss | (5c)² = 25c² | c(5c − c/2) = 4.5c² | (c²/2)·ln(26) ≈ 1.63c² | c²/6 ≈ 0.167c² |
| Influence | Extreme | Constant | Small | Zero |
| Gradient magnitude | 10c | c | ≈ 0.19c | Exactly 0 |

(Cauchy values use the convention ρ(x) = (c²/2)·ln(1 + (x/c)²).)
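These entries can be verified numerically. The sketch below assumes the standard Huber loss c(|x| − c/2) for |x| > c and the Cauchy convention ρ(x) = (c²/2)·ln(1 + (x/c)²); Cauchy scaling conventions vary:

```python
import math

c, x = 1.0, 5.0                                      # error = 5c with c = 1

mse    = x ** 2                                      # (5c)^2 = 25 c^2
huber  = c * (abs(x) - 0.5 * c)                      # c(5c - c/2) = 4.5 c^2
cauchy = (c ** 2 / 2) * math.log(1 + (x / c) ** 2)   # (c^2/2) ln 26 ≈ 1.63 c^2
tukey  = c ** 2 / 6                                  # saturated: c^2/6 ≈ 0.167 c^2

print(mse, huber, round(cauchy, 3), round(tukey, 3))  # 25.0 4.5 1.629 0.167
```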

Parameter Selection

- c = 1.0: Common default when residuals are already normalized
- c = 4.685σ: Gives 95% asymptotic efficiency under Gaussian noise with standard deviation σ
- Strategy for tuning:
  - Compute the median absolute deviation (MAD) of the residuals
  - Estimate the scale as σ̂ = 1.4826 · MAD, then set c = 4.685 · σ̂
  - Or cross-validate c on a validation set
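The MAD-based strategy above can be sketched as follows (the helper name `tune_c` and the synthetic residuals are ours; 1.4826 · MAD is the standard consistent scale estimate for Gaussian data):

```python
import numpy as np

def tune_c(residuals, k=4.685):
    """Robust threshold: sigma_hat = 1.4826 * MAD, then c = k * sigma_hat."""
    med = np.median(residuals)
    mad = np.median(np.abs(residuals - med))
    return k * 1.4826 * mad

rng = np.random.default_rng(0)
r = rng.normal(0.0, 2.0, 10_000)    # residuals with true sigma = 2
r[:500] = 100.0                     # inject 5% gross outliers
print(tune_c(r))                    # stays near 4.685 * 2 despite the outliers
```

Because the median and MAD are themselves robust, the injected outliers barely move the estimate, whereas a sample standard deviation would explode.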

Implementation

PyTorch:
```python
import torch

def tukey_biweight_loss(predictions, targets, c=1.0):
    errors = predictions - targets
    mask = (errors.abs() <= c).float()        # 1 inside the threshold, 0 outside
    term = 1 - (errors / c) ** 2
    # Inside: (c^2/6) * [1 - (1 - (x/c)^2)^3]; outside: constant c^2/6 (zero gradient)
    loss = (c ** 2 / 6) * (mask * (1 - term ** 3) + (1 - mask))
    return loss.mean()
```

NumPy (for offline analysis):
```python
import numpy as np

def tukey_biweight(x, c=1.0):
    x = np.asarray(x, dtype=float)
    mask = np.abs(x) <= c
    loss = np.full_like(x, c ** 2 / 6)        # rejection region: constant c^2/6
    loss[mask] = (c ** 2 / 6) * (1 - (1 - (x[mask] / c) ** 2) ** 3)
    return loss
```

When to Use Tukey's Biweight

- Gross Outliers: Data contains obviously wrong values (sensor failures, data entry errors)
- Contaminated Data: Unknown large percentage of corrupted observations
- Automatic Outlier Detection: Threshold enables identifying rejected samples
- Robust Fitting: Least-squares-style fitting that ignores points with gross residuals
- High Breakdown: Contamination approaching 50% can be tolerated when paired with a robust scale estimate
- Heavily Contaminated Fits: When softer losses (Huber, Cauchy) still let outliers bias the result
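Because rejected samples receive weight exactly zero, the weight function doubles as an outlier detector. A minimal sketch (plain NumPy, names illustrative):

```python
import numpy as np

def tukey_weights(residuals, c=1.0):
    """w(x) = (1 - (x/c)^2)^2 inside the threshold, exactly 0 outside."""
    r = np.asarray(residuals, dtype=float)
    return np.where(np.abs(r) <= c, (1 - (r / c) ** 2) ** 2, 0.0)

residuals = np.array([0.1, -0.4, 0.9, 3.0, -7.5])
w = tukey_weights(residuals, c=1.0)
outliers = np.flatnonzero(w == 0)   # samples fully rejected by the loss
print(outliers)                     # indices 3 and 4 are flagged
```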

Practical Applications

Robust Least Squares: Fitting lines, planes, and curves to data with gross errors — automatic rejection of large-residual points enables good fits despite bad measurements.

Astronomical Data: Detecting planets in stellar brightness measurements, where cosmic rays and instrumental glitches contaminate a significant portion of the data; Tukey enables using all measurements while ignoring artifact-corrupted observations.

Survey Data: Statistical analysis of survey responses with occasional fraudulent/nonsense entries; Tukey automatically downweights or ignores impossible values without manual cleaning.

Geospatial Analysis: GPS trajectories with occasional wild spikes (multipath, jamming); Tukey filters outlier positions while preserving real movements.

Quality Control: Manufacturing processes flagging and ignoring equipment malfunctions while maintaining statistical model of normal operations.

Tukey's biweight offers the strongest outlier rejection among common robust losses: hard rejection of gross errors enables learning from contaminated data that would break least squares, and, paired with a robust scale estimate, it tolerates contamination approaching 50%.
