Quantile loss

Keywords: quantile loss, pinball loss, prediction interval

Quantile loss (also called pinball loss) is an asymmetric loss function used to train models that predict a specific quantile of a conditional distribution rather than its mean. It penalizes underprediction and overprediction at different rates, set by the quantile parameter τ. Combining predictions at several quantile levels yields prediction intervals that quantify uncertainty, which is useful in demand forecasting, risk assessment, weather prediction, and any domain that needs interpretable uncertainty bounds alongside point predictions.

Mathematical Definition

For a target quantile τ ∈ (0, 1), the quantile loss for prediction ŷ and true value y is:

L_τ(y, ŷ) = τ · max(y − ŷ, 0) + (1 − τ) · max(ŷ − y, 0)

Equivalently:
- If y ≥ ŷ (underprediction): L_τ = τ · (y − ŷ), so each unit of shortfall costs τ
- If y < ŷ (overprediction): L_τ = (1 − τ) · (ŷ − y), so each unit of excess costs (1 − τ)
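
The definition above can be sketched in a few lines of plain Python. The function name pinball_loss and the sample numbers are illustrative assumptions, not part of any particular library:

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss for a quantile level tau in (0, 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        diff = y - y_hat
        # Underprediction (diff >= 0) costs tau per unit,
        # overprediction (diff < 0) costs (1 - tau) per unit.
        total += tau * diff if diff >= 0 else (1.0 - tau) * (-diff)
    return total / len(y_true)

# Same absolute error of 2, very different penalties at tau = 0.9.
under = pinball_loss([10.0], [8.0], tau=0.9)   # 0.9 * 2
over = pinball_loss([10.0], [12.0], tau=0.9)   # 0.1 * 2
```

At τ = 0.9 the underprediction is charged nine times as much as an overprediction of the same size, which is the asymmetry the two cases describe.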

Calibration Property

The key property of quantile loss: minimizing the expected loss E[L_τ(y, ŷ)] over all functions ŷ(x) yields the conditional τ-quantile Q_τ(y | x), the value below which a fraction τ of outcomes fall for a given x.
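
A short derivation sketch of why the minimizer is a quantile, assuming that y given x has a continuous distribution. The symbols F (the conditional CDF) and q (a candidate prediction) are introduced here for illustration only:

```latex
\begin{aligned}
\mathbb{E}[L_\tau(y, q)] &= \tau \int_{q}^{\infty} (y - q)\, dF(y) + (1 - \tau) \int_{-\infty}^{q} (q - y)\, dF(y) \\
\frac{d}{dq}\, \mathbb{E}[L_\tau(y, q)] &= -\tau \bigl(1 - F(q)\bigr) + (1 - \tau)\, F(q) = F(q) - \tau \\
F(q^{*}) &= \tau \quad\Longrightarrow\quad q^{*} = Q_\tau(y \mid x)
\end{aligned}
```

The derivative vanishes exactly where a fraction τ of the probability mass lies below the prediction.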

For τ = 0.5: The loss is symmetric (τ = 1-τ = 0.5) and equals half the absolute error, so minimization yields the conditional median, the value that 50% of outcomes fall below.

For τ = 0.9: The loss penalizes underprediction 9× more than overprediction (τ/(1-τ) = 9:1). The optimizer is pushed to predict high, landing at the 90th percentile.

For τ = 0.1: The loss penalizes overprediction 9× more than underprediction. The optimizer predicts low, landing at the 10th percentile.
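
As a rough numerical check of these three cases, the sketch below searches for the constant prediction with the lowest mean pinball loss on a synthetic right-skewed sample and reports what fraction of the sample lies at or below it. The data, the seed, and the helper names pinball and best_constant are assumptions made for illustration:

```python
import random

def pinball(y, y_hat, tau):
    diff = y - y_hat
    return tau * diff if diff >= 0 else (tau - 1.0) * diff

def best_constant(samples, tau):
    """Return the sample value with the lowest mean pinball loss as a constant prediction."""
    return min(samples, key=lambda c: sum(pinball(y, c, tau) for y in samples))

random.seed(0)
samples = [random.expovariate(1.0) for _ in range(500)]  # right-skewed toy data

for tau in (0.1, 0.5, 0.9):
    c = best_constant(samples, tau)
    frac_below = sum(y <= c for y in samples) / len(samples)
    print(f"tau={tau}: minimizer={c:.3f}, fraction of samples <= minimizer={frac_below:.3f}")
```

The printed fraction should sit close to τ in each case, which is the empirical counterpart of landing at the 10th, 50th, and 90th percentile.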

Building Prediction Intervals

The power of quantile regression lies in combining multiple quantile predictions:

Train three separate models (or a multi-output model with three heads):
- Model for τ = 0.1: Predicts the 10th percentile lower bound
- Model for τ = 0.5: Predicts the median (central forecast)
- Model for τ = 0.9: Predicts the 90th percentile upper bound

The interval [Q_0.1(y|x), Q_0.9(y|x)] is an 80% prediction interval: if the model is well calibrated, 10% of true outcomes fall below it, 10% fall above it, and 80% fall inside.
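
A minimal sketch of the three-quantile recipe, under strong simplifying assumptions: instead of a learned model, it uses one constant per bin of x, for which the pinball-loss minimizer is simply the bin's empirical quantile. The synthetic data, the bin layout, and the names bin_index and predict_interval are hypothetical:

```python
import random
from statistics import quantiles

random.seed(1)

# Hypothetical data: the noise grows with x.
xs = [random.uniform(0.0, 10.0) for _ in range(5000)]
ys = [2.0 * x + random.gauss(0.0, 0.5 + 0.3 * x) for x in xs]

def bin_index(x, n_bins=10, lo=0.0, hi=10.0):
    return min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)

# Group the targets by bin of x.
groups = {}
for x, y in zip(xs, ys):
    groups.setdefault(bin_index(x), []).append(y)

# One "model" per quantile level: the empirical 10%, 50% and 90% points of each bin.
model = {}
for b, values in groups.items():
    deciles = quantiles(values, n=10)  # 9 cut points: 10%, 20%, ..., 90%
    model[b] = (deciles[0], deciles[4], deciles[8])

def predict_interval(x):
    """Return (Q_0.1, Q_0.5, Q_0.9) for the bin that contains x."""
    return model[bin_index(x)]

# In-sample share of targets inside the [Q_0.1, Q_0.9] interval.
covered = sum(predict_interval(x)[0] <= y <= predict_interval(x)[2] for x, y in zip(xs, ys))
coverage = covered / len(xs)
```

With a gradient-boosted or neural model the structure is the same: one output per quantile level, with the lower and upper outputs forming the interval and the middle one serving as the central forecast.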

Advantages over Gaussian Assumptions

A common way to build prediction intervals assumes Gaussian residuals with constant variance: ŷ ± 1.28σ for an 80% interval. Quantile regression makes no distributional assumption:
- Asymmetric intervals: If demand is right-skewed (rare spikes), the interval can extend further upward than downward
- Heteroscedasticity: Interval width can vary with x (predictions are more uncertain in some regions)
- Non-Gaussian distributions: Naturally captures fat tails, multimodality, or truncated distributions
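
To illustrate the contrast, the sketch below builds both kinds of 80% interval on a synthetic right-skewed sample. The log-normal "demand" data and all variable names are assumptions chosen for illustration:

```python
import random
from statistics import mean, stdev, quantiles

random.seed(2)

# Hypothetical right-skewed, strictly positive "demand" sample.
demand = [random.lognormvariate(3.0, 0.8) for _ in range(10000)]

# Gaussian-style interval: symmetric around the mean.
mu, sigma = mean(demand), stdev(demand)
gauss_lo, gauss_hi = mu - 1.28 * sigma, mu + 1.28 * sigma

# Quantile-based interval: read directly from the empirical distribution.
deciles = quantiles(demand, n=10)
q_lo, q_med, q_hi = deciles[0], deciles[4], deciles[8]

def tail_fractions(lo, hi):
    """Share of the sample below lo and above hi."""
    below = sum(d < lo for d in demand) / len(demand)
    above = sum(d > hi for d in demand) / len(demand)
    return below, above

gauss_tails = tail_fractions(gauss_lo, gauss_hi)
quant_tails = tail_fractions(q_lo, q_hi)
```

On skewed data like this, the symmetric interval can place its lower bound below zero even though demand is never negative, while the quantile-based interval leaves roughly 10% in each tail and extends further above the median than below it.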

Gradient Properties

Quantile loss is piecewise linear with a kink at ŷ = y, so it is not differentiable there and gradient-based optimization relies on subgradients:

∂L_τ/∂ŷ = 𝟙[ŷ > y] − τ

This is:
- +(1-τ) when ŷ > y (we overpredicted: a gradient descent step moves the prediction down)
- -τ when ŷ < y (we underpredicted: a gradient descent step moves the prediction up)
- Undefined at ŷ = y (the subgradient can be any value in [-τ, 1-τ])

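For tree-based models (LightGBM, XGBoost): a quantile objective is built in. Because the second derivative of quantile loss is zero almost everywhere, the usual gradient-and-Hessian leaf updates do not apply directly, and these libraries use special handling in place of the true Hessian.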
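
A minimal sketch of subgradient descent on the quantile loss, fitting a single constant to synthetic data. Differentiating the definition of L_τ with respect to ŷ gives a slope of -τ when underpredicting and 1-τ when overpredicting, which is what the code uses. The step-size schedule, the seed, and the function names are illustrative assumptions:

```python
import random

def pinball_subgradient(y, y_hat, tau):
    """One valid subgradient of L_tau with respect to y_hat."""
    if y_hat > y:
        return 1.0 - tau  # overprediction: positive slope, a descent step lowers y_hat
    if y_hat < y:
        return -tau       # underprediction: negative slope, a descent step raises y_hat
    return 0.0            # at the kink any value in [-tau, 1 - tau] is valid; 0 is one choice

def fit_constant_quantile(samples, tau, epochs=300, lr=2.0):
    """Subgradient descent on the mean pinball loss of a single constant prediction."""
    q = sum(samples) / len(samples)  # start from the sample mean
    for epoch in range(epochs):
        grad = sum(pinball_subgradient(y, q, tau) for y in samples) / len(samples)
        q -= lr / (1 + epoch) ** 0.5 * grad  # decaying step size
    return q

random.seed(3)
samples = [random.gauss(0.0, 1.0) for _ in range(2000)]
q90 = fit_constant_quantile(samples, tau=0.9)
frac_below = sum(y <= q90 for y in samples) / len(samples)
```

The averaged subgradient equals the fraction of samples below the current prediction minus τ, so the iteration settles where about 90% of the samples lie below q90.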

Quantile Regression Forests

Random forests can estimate conditional quantiles directly: instead of averaging the target values in each leaf, keep all training samples that reach the leaf and report the τ-quantile of their target values. This non-parametric approach needs only one model for all quantile levels and, because every quantile is read from the same empirical distribution, prevents quantile crossing (a lower quantile exceeding a higher one).
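
The idea can be sketched with a deliberately simplified toy forest. This is not a real random forest: each "tree" is just a random partition of a one-dimensional x, and pooling the leaf samples is a simplification of the weighting used in published quantile regression forests. All names, data, and parameters are hypothetical:

```python
import random
from bisect import bisect_right
from statistics import quantiles

def build_random_tree(xs, ys, n_splits, rng):
    """Toy 1-D 'tree': random thresholds partition x; each leaf keeps its training targets."""
    thresholds = sorted(rng.uniform(min(xs), max(xs)) for _ in range(n_splits))
    leaves = [[] for _ in range(n_splits + 1)]
    for x, y in zip(xs, ys):
        leaves[bisect_right(thresholds, x)].append(y)
    return thresholds, leaves

def forest_quantiles(forest, x, taus):
    """Pool the targets from the leaf containing x in every tree, then read off quantiles."""
    pooled = []
    for thresholds, leaves in forest:
        pooled.extend(leaves[bisect_right(thresholds, x)])
    cuts = quantiles(pooled, n=100, method="inclusive")  # 99 percentile cut points
    return [cuts[round(t * 100) - 1] for t in taus]

rng = random.Random(4)
xs = [rng.uniform(0.0, 10.0) for _ in range(3000)]
ys = [x + rng.gauss(0.0, 1.0 + 0.2 * x) for x in xs]

# Each tree is built on a bootstrap resample of the training data.
forest = []
for _ in range(25):
    idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
    forest.append(build_random_tree([xs[i] for i in idx], [ys[i] for i in idx], n_splits=8, rng=rng))

q10, q50, q90 = forest_quantiles(forest, x=5.0, taus=(0.1, 0.5, 0.9))
```

Because q10, q50, and q90 are all read from the same pooled sample, they are ordered by construction, which is why this style of estimator cannot produce crossing quantiles.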

Interval Calibration

A critical evaluation metric: a 90% prediction interval should contain the true value 90% of the time (interval coverage). Models with poor calibration produce intervals that are systematically too narrow (overconfident) or too wide (underconfident). Reliability diagrams plot nominal vs. actual coverage across quantile levels.
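
Both checks are straightforward to compute on held-out data. The sketch below uses made-up numbers, and the function names interval_coverage and quantile_calibration are hypothetical:

```python
def interval_coverage(y_true, lower, upper):
    """Fraction of true values that fall inside [lower, upper]."""
    inside = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return inside / len(y_true)

def quantile_calibration(y_true, quantile_preds):
    """Map each nominal level tau to the observed fraction of y at or below the prediction."""
    return {
        tau: sum(y <= q for y, q in zip(y_true, preds)) / len(y_true)
        for tau, preds in quantile_preds.items()
    }

# Hypothetical held-out targets with predicted 10th and 90th percentiles.
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
lower = [2.0, 4.5, 7.5, 8.0, 9.0]
upper = [4.0, 6.0, 9.0, 10.0, 10.5]

coverage = interval_coverage(y_true, lower, upper)
calibration = quantile_calibration(y_true, {0.1: lower, 0.9: upper})
```

Plotting the keys of calibration (nominal levels) against its values (observed fractions) gives the reliability diagram; points far from the diagonal indicate miscalibration. In practice this needs far more than five held-out points to be meaningful.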

Applications

- Retail demand forecasting: Predict the 80th percentile of demand to set safety stock levels, balancing overstock cost against stockout risk
- Energy grid planning: Forecast peak demand distribution for capacity planning
- Clinical trial endpoints: Report confidence bounds on treatment effect estimates
- Financial VaR: The 95% Value at Risk corresponds to the 5th percentile of the daily return distribution, so estimating it is a quantile regression problem
- Weather: Temperature forecast with uncertainty bounds for agricultural planning
