
Quantile loss (also called pinball loss) is an asymmetric loss function for training models that predict a specific quantile of the conditional distribution of the target rather than its mean. It penalizes underprediction and overprediction at different rates, set by the quantile parameter τ. Combining several quantile predictions yields prediction intervals that quantify uncertainty, which is useful in demand forecasting, risk assessment, weather prediction, and any domain that needs interpretable confidence bounds alongside point predictions.

Mathematical Definition

For a target quantile τ ∈ (0, 1), the quantile loss for prediction ŷ and true value y is:

L_τ(y, ŷ) = τ · max(y − ŷ, 0) + (1 − τ) · max(ŷ − y, 0)

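Equivalently, in piecewise form:

L_τ(y, ŷ) = τ · (y − ŷ) if y ≥ ŷ; (1 − τ) · (ŷ − y) if y < ŷ

Each unit of underprediction (y > ŷ) costs τ, and each unit of overprediction (y < ŷ) costs 1 − τ.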
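The definition translates directly into a few lines of NumPy. This is a minimal sketch: the name `pinball_loss` is hypothetical, and averaging over samples is an assumption about how the loss is aggregated during training. (scikit-learn offers a comparable metric, `sklearn.metrics.mean_pinball_loss`, whose `alpha` argument plays the role of τ.)

```python
import numpy as np

def pinball_loss(y, y_hat, tau):
    """Mean quantile (pinball) loss for a target quantile tau in (0, 1)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    diff = y - y_hat  # positive when the model underpredicts
    # tau * max(y - y_hat, 0) + (1 - tau) * max(y_hat - y, 0), averaged over samples
    return np.mean(tau * np.maximum(diff, 0.0) + (1.0 - tau) * np.maximum(-diff, 0.0))

# Same absolute error (2.0) in opposite directions, with tau = 0.9:
under = pinball_loss([10.0], [8.0], 0.9)   # underprediction, costs tau * 2
over = pinball_loss([10.0], [12.0], 0.9)   # overprediction, costs (1 - tau) * 2
```

With τ = 0.9 the underprediction costs nine times as much as the overprediction of the same size.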

Calibration Property

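The key property of quantile loss: minimizing the expected loss E[L_τ(y, ŷ(x))] over all functions ŷ(x) yields the conditional τ-quantile Q_τ(y | x), the value below which a fraction τ of outcomes fall. The reason is that the derivative of the expected loss with respect to ŷ is P(y < ŷ | x) − τ, which is zero exactly where a fraction τ of the outcomes lies below ŷ.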

For τ = 0.5: The loss is symmetric (τ = 1 − τ = 0.5), and minimization yields the conditional median, the value with 50% of outcomes below it.

For τ = 0.9: The loss penalizes underprediction 9× more than overprediction (τ/(1 − τ) = 0.9/0.1 = 9). The optimizer is pushed to predict high, landing at the 90th percentile.

For τ = 0.1: The loss penalizes overprediction 9× more than underprediction. The optimizer predicts low, landing at the 10th percentile.
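The calibration property can be checked numerically in the simplest setting: no features, so ŷ is a single constant c. The c that minimizes the average pinball loss over a sample should coincide with that sample's τ-quantile. This sketch assumes an exponential (skewed) sample and uses a brute-force grid search purely for illustration; `mean_pinball` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=20_000)  # assumed skewed sample

def mean_pinball(y, c, tau):
    # max(tau * d, (tau - 1) * d) with d = y - c is an equivalent way to write L_tau.
    d = y - c
    return np.mean(np.maximum(tau * d, (tau - 1.0) * d))

grid = np.linspace(0.0, 5.0, 2001)  # candidate constant predictions, spacing 0.0025
results = {}
for tau in (0.1, 0.5, 0.9):
    losses = np.array([mean_pinball(y, c, tau) for c in grid])
    best = grid[np.argmin(losses)]
    results[tau] = (best, np.quantile(y, tau))
    print(f"tau={tau}: loss minimizer {best:.3f}, empirical quantile {results[tau][1]:.3f}")
```

For each τ, the grid minimizer and `np.quantile` should agree to within about one grid step, and the minimizers move up as τ increases.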

Building Prediction Intervals

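The power of quantile regression lies in combining multiple quantile predictions.

Train three separate models (or a multi-output model with three heads), one per quantile level:

τ = 0.1: lower bound Q_0.1(y|x)
τ = 0.5: median Q_0.5(y|x), which serves as the point prediction
τ = 0.9: upper bound Q_0.9(y|x)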

The interval [Q_0.1(y|x), Q_0.9(y|x)] is an 80% prediction interval: in a well-calibrated model, 80% of true outcomes fall within this range.
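The three-model recipe can be sketched in NumPy under simplifying assumptions: a single feature, a linear model ŷ = a + b·x per quantile, toy heteroscedastic data whose true conditional quantiles happen to be linear in x, and plain full-batch subgradient descent using the subgradient 𝟙[y < ŷ] − τ with a fixed step size. The helpers `make_data` and `fit_linear_quantile` are hypothetical names for illustration, not a library API.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_data(n):
    # Toy heteroscedastic data: y = 2x + (0.2 + x) * noise, so the spread grows with x.
    x = rng.uniform(0.0, 1.0, n)
    y = 2.0 * x + (0.2 + x) * rng.standard_normal(n)
    return x, y

def fit_linear_quantile(x, y, tau, lr=1.0, steps=3000):
    """Fit y_hat = a + b * x by full-batch subgradient descent on the mean pinball loss."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        y_hat = a + b * x
        g = (y < y_hat).astype(float) - tau  # per-sample subgradient w.r.t. y_hat
        a -= lr * g.mean()                   # chain rule: d y_hat / d a = 1
        b -= lr * np.mean(g * x)             # chain rule: d y_hat / d b = x
    return a, b

x_train, y_train = make_data(5_000)
# One independent model per quantile level (the "three separate models" option).
models = {tau: fit_linear_quantile(x_train, y_train, tau) for tau in (0.1, 0.5, 0.9)}

# Assemble the 80% interval on fresh data and measure how often it contains y.
x_test, y_test = make_data(20_000)
lower = models[0.1][0] + models[0.1][1] * x_test
upper = models[0.9][0] + models[0.9][1] * x_test
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"held-out coverage of [Q_0.1, Q_0.9]: {coverage:.3f}")
```

Because the noise grows with x, the fitted lower and upper lines fan out (their slopes differ), yet held-out coverage should land near 0.8. In practice a library quantile objective or a multi-head network would replace the hand-written loop.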

Advantages over Gaussian Assumptions

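Standard parametric prediction intervals assume Gaussian residuals, giving ŷ ± 1.28σ for an 80% interval (1.28 is the 90th percentile of the standard normal). Quantile regression makes no distributional assumption: each bound is learned directly, so the interval can be asymmetric around the median when the noise is skewed and can widen or narrow with x when the noise is heteroscedastic.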
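A small simulation illustrates what goes wrong when residuals are skewed. The sketch assumes exponential residuals (mean 1, always positive) and, for simplicity, evaluates both nominally 80% intervals on the same sample they were built from; `tail_rates` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed skewed residual distribution: exponential with mean 1 (always positive).
residuals = rng.exponential(scale=1.0, size=100_000)

# Gaussian recipe: mean +/- 1.28 sigma, nominally an 80% interval.
mu, sigma = residuals.mean(), residuals.std()
gauss_lo, gauss_hi = mu - 1.28 * sigma, mu + 1.28 * sigma

# Quantile recipe: empirical 10th and 90th percentiles, also nominally 80%.
quant_lo, quant_hi = np.quantile(residuals, [0.1, 0.9])

def tail_rates(lo, hi):
    """Fractions below lo and above hi; a calibrated 80% interval gives 0.10 each."""
    return float(np.mean(residuals < lo)), float(np.mean(residuals > hi))

gauss_tails = tail_rates(gauss_lo, gauss_hi)
quant_tails = tail_rates(quant_lo, quant_hi)
print("Gaussian tails:", gauss_tails)
print("Quantile tails:", quant_tails)
```

The Gaussian lower bound falls below zero, so no residual lies under it and total coverage is about 90% rather than 80%; the quantile interval leaves about 10% in each tail.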

Gradient Properties

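Quantile loss is piecewise linear and not differentiable at y = ŷ, so gradient-based optimization uses a subgradient:

∂L_τ/∂ŷ = 𝟙[y < ŷ] − τ

This is:

−τ when y > ŷ (underprediction, so a descent step raises ŷ)
1 − τ when y < ŷ (overprediction, so a descent step lowers ŷ)
any value in [−τ, 1 − τ] when y = ŷ (the kink)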
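A short check of the subgradient, sketched in NumPy with a hypothetical helper `pinball_subgradient` and an assumed standard-normal sample. Averaged over a sample at a constant prediction c, the subgradient equals (fraction of y below c) − τ, so it is negative below the τ-quantile, positive above it, and zero at it; this is the first-order condition behind the calibration property.

```python
import numpy as np

def pinball_subgradient(y, y_hat, tau):
    # d/d y_hat of the pinball loss: -tau if y > y_hat, 1 - tau if y < y_hat.
    # At y == y_hat any value in [-tau, 1 - tau] is valid; this formula picks -tau.
    y = np.asarray(y, dtype=float)
    return (y < y_hat).astype(float) - tau

rng = np.random.default_rng(0)
y = rng.normal(size=10_000)  # assumed sample
tau = 0.9

checks = {"below quantile": 0.0, "at quantile": float(np.quantile(y, tau)), "above quantile": 2.5}
for name, c in checks.items():
    m = pinball_subgradient(y, c, tau).mean()
    print(f"{name:>15}: c={c:.3f}  mean subgradient={m:+.4f}")
```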

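For tree-based models (LightGBM, XGBoost), quantile loss is built in: LightGBM exposes it as objective="quantile" with alpha set to τ, and XGBoost 2.0 and later as objective="reg:quantileerror" with quantile_alpha. Because the pinball loss has zero second derivative almost everywhere, these implementations cannot use a true Hessian and substitute a surrogate, such as a constant, when computing leaf values.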

Quantile Regression Forests

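Random forests can estimate conditional quantiles directly. Instead of averaging leaf values, each leaf keeps the target values of the training samples it contains; for a query point, the training samples that share its leaves across all trees are pooled with weights, and the τ-quantile of this weighted empirical distribution is reported. This non-parametric approach avoids the model-per-quantile limitation, and because every quantile comes from the same estimated distribution it prevents quantile crossing (lower quantiles exceeding higher quantiles).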
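The pooling step can be sketched independently of how the trees are grown. This assumes the leaf index of every training sample and of the query point in each tree is already known (for example, a fitted scikit-learn `RandomForestRegressor`'s `apply` method returns such indices, one column per tree, so its output would need transposing). Each tree contributes a total weight of 1, split evenly among the training samples in the query's leaf; this follows the usual quantile-regression-forest weighting. The helpers `qrf_weights` and `weighted_quantile` and the toy leaf ids are hypothetical.

```python
import numpy as np

def qrf_weights(train_leaves, query_leaves):
    """Weights over training samples for one query point.

    train_leaves: (n_trees, n_train) leaf id of every training sample in each tree.
    query_leaves: (n_trees,) leaf id that the query point reaches in each tree.
    """
    n_trees, n_train = train_leaves.shape
    w = np.zeros(n_train)
    for t in range(n_trees):
        same_leaf = train_leaves[t] == query_leaves[t]
        count = same_leaf.sum()
        if count > 0:  # split this tree's unit weight among co-located samples
            w[same_leaf] += 1.0 / count
    return w / n_trees

def weighted_quantile(values, weights, tau):
    """Smallest value whose cumulative weight reaches tau (step-function quantile)."""
    order = np.argsort(values)
    v = values[order]
    cum = np.cumsum(weights[order])
    idx = np.searchsorted(cum, tau * cum[-1], side="left")
    return v[min(idx, len(v) - 1)]

# Toy example: 2 trees, 6 training targets, made-up leaf ids.
y_train = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
train_leaves = np.array([[0, 0, 0, 1, 1, 1],
                         [0, 0, 1, 1, 2, 2]])
query_leaves = np.array([0, 0])  # the query lands in leaf 0 of both trees

w = qrf_weights(train_leaves, query_leaves)
qs = [weighted_quantile(y_train, w, tau) for tau in (0.1, 0.5, 0.9)]
print(qs)
```

Every quantile is read off the same weighted cumulative distribution, so the estimates are non-decreasing in τ by construction.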

Interval Calibration

The key calibration check is interval coverage: a 90% prediction interval should contain the true value 90% of the time. Poorly calibrated models produce intervals that are systematically too narrow (overconfident, coverage below nominal) or too wide (underconfident, coverage above nominal). Reliability diagrams plot nominal against empirical coverage across quantile levels.
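A coverage table of this kind can be computed with a few lines of NumPy. In this sketch the "model" quantiles are simulated stand-ins (an assumption): Gaussian quantiles around a point prediction, once with the correct spread and once with a spread that is too small. The same `interval_coverage` check (a hypothetical helper) applies unchanged to quantiles from any model, and each row is one point of a reliability diagram.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(7)
n = 50_000
mu = rng.uniform(-1.0, 1.0, n)    # toy point predictions
y = mu + rng.normal(0.0, 1.0, n)  # true noise: standard normal around mu

def predicted_quantile(mu, sd, tau):
    # Stand-in for a quantile model's output: Gaussian quantile with spread sd.
    return mu + sd * NormalDist().inv_cdf(tau)

def interval_coverage(y, lower, upper):
    return float(np.mean((y >= lower) & (y <= upper)))

rows = []
for level in (0.5, 0.8, 0.9, 0.95):
    lo_tau, hi_tau = (1 - level) / 2, (1 + level) / 2
    calibrated = interval_coverage(
        y, predicted_quantile(mu, 1.0, lo_tau), predicted_quantile(mu, 1.0, hi_tau))
    narrow = interval_coverage(  # overconfident: spread 0.6 instead of 1.0
        y, predicted_quantile(mu, 0.6, lo_tau), predicted_quantile(mu, 0.6, hi_tau))
    rows.append((level, calibrated, narrow))
    print(f"nominal {level:.2f}  calibrated {calibrated:.3f}  overconfident {narrow:.3f}")
```

The calibrated column tracks the nominal level, while the overconfident intervals fall well short of it at every level.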

Applications
