
Model calibration is the property of a probabilistic classifier whose predicted confidence scores match empirical outcome frequencies. A well-calibrated model that reports 70% confidence is correct on approximately 70% of the predictions it makes at that confidence level. Calibration is essential in risk-sensitive applications, where downstream decisions depend on the model's expressed uncertainty.

What Is Model Calibration?

Why Calibration Matters

Measuring Calibration

Reliability Diagram (Calibration Plot):

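Expected Calibration Error (ECE):

$$\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n} \left| \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \right|$$

where $B_m$ is the set of predictions in bin $m$, $M$ is the number of bins, $n$ is the total number of predictions, $\mathrm{acc}(B_m)$ is the accuracy in the bin, and $\mathrm{conf}(B_m)$ is its mean confidence. Lower ECE means better calibration.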

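Maximum Calibration Error (MCE): The worst-case calibration error across all bins, that is, the largest value of |acc(B_m) - conf(B_m)| over the non-empty bins. It is more conservative than ECE because a single badly calibrated bin determines the score.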
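The ECE and MCE definitions above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `calibration_errors`, the default of 10 equal-width bins, and the choice to place a confidence of exactly 1.0 in the last bin are all assumptions made for the example.

```python
from typing import List, Tuple


def calibration_errors(
    confidences: List[float], correct: List[bool], n_bins: int = 10
) -> Tuple[float, float]:
    """Return (ECE, MCE) using equal-width confidence bins over [0, 1]."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(confidences)
    # Each bin accumulates [count, sum of confidences, number correct].
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        m = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[m][0] += 1
        bins[m][1] += c
        bins[m][2] += int(ok)
    ece, mce = 0.0, 0.0
    for count, conf_sum, n_correct in bins:
        if count == 0:
            continue  # empty bins contribute nothing
        gap = abs(n_correct / count - conf_sum / count)  # |acc(B_m) - conf(B_m)|
        ece += (count / n) * gap  # weight each bin by its share of predictions
        mce = max(mce, gap)  # track the worst bin
    return ece, mce


# Hypothetical data: four predictions at 0.9 confidence (three correct)
# and two predictions at 0.6 confidence (one correct).
confs = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6]
hits = [True, True, True, False, True, False]
ece, mce = calibration_errors(confs, hits)
print(f"ECE={ece:.4f} MCE={mce:.4f}")
```

In this toy data the 0.9 bin has accuracy 0.75 and the 0.6 bin has accuracy 0.5, so both bins are overconfident and the 0.9 bin sets the MCE.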

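Negative Log-Likelihood (NLL): A proper scoring rule, meaning its expected value is minimized when the predicted probabilities equal the true conditional probabilities. NLL reflects both accuracy and calibration, so a low NLL indicates good calibration only when compared at similar accuracy.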
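A small sketch can show how NLL separates two models with identical accuracy but different confidence. The helper name `negative_log_likelihood`, the `eps` clamp, and the toy probabilities are assumptions chosen for illustration.

```python
import math
from typing import List


def negative_log_likelihood(
    probs: List[List[float]], labels: List[int], eps: float = 1e-12
) -> float:
    """Mean NLL: the average of -log p_i[y_i] over all examples."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for p, y in zip(probs, labels):
        total -= math.log(max(p[y], eps))  # clamp to avoid log(0)
    return total / len(labels)


# Both hypothetical models get the first example right and the second wrong
# (the true class is 0 in both cases), so their accuracy is identical.
labels = [0, 0]
hedged = [[0.8, 0.2], [0.4, 0.6]]
overconfident = [[0.99, 0.01], [0.01, 0.99]]

nll_hedged = negative_log_likelihood(hedged, labels)
nll_overconfident = negative_log_likelihood(overconfident, labels)
print(nll_hedged, nll_overconfident)
```

The overconfident model is penalized much more heavily for its one confident mistake, which is why NLL is a natural objective for fitting post-hoc calibrators.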

Why Modern Neural Networks Are Overconfident

Guo et al. (ICML 2017) showed that modern deep neural networks trained with cross-entropy loss are significantly overconfident: they are more accurate than older networks but worse calibrated.

Calibration Techniques

| Technique | Method | When to Use | Complexity |
|---|---|---|---|
| Temperature Scaling | Single parameter T: softmax(logits/T) | Post-training, simple models | Very low |
| Platt Scaling | Sigmoid on output scores | Binary classification | Low |
| Isotonic Regression | Non-parametric monotonic mapping | When data abundant | Medium |
| Dirichlet Calibration | Multi-class generalization of Platt | Multi-class classification | Medium |
| Bayesian Deep Learning | Uncertainty in weights | Built-in calibration | High |

Temperature Scaling in Practice

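Temperature scaling is the simplest post-hoc calibration method for neural networks and is often among the most effective:

1. Train the model normally, then keep its weights frozen.
2. On a held-out calibration set, find the scalar T > 0 that minimizes NLL: T = argmin_T NLL(softmax(logits/T)).
3. At inference, use softmax(logits/T) as the calibrated probability.

Because dividing every logit by the same positive T does not change their ranking, the predicted class and the model's accuracy are unchanged. Only the confidence is rescaled.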
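The fitting step can be sketched without any ML framework. The version below is an illustrative assumption rather than the method of any particular library: it uses a golden-section search over log T (valid because NLL is convex in 1/T and therefore has a single minimum in T), and the names `fit_temperature`, `t_min`, and `t_max`, the search bounds, and the toy logits are all hypothetical.

```python
import math
from typing import List


def log_prob(logits: List[float], label: int, T: float) -> float:
    """Log of softmax(logits / T)[label], computed stably via log-sum-exp."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return scaled[label] - log_norm


def nll(logits_batch: List[List[float]], labels: List[int], T: float) -> float:
    """Mean negative log-likelihood of the labels at temperature T."""
    return -sum(log_prob(z, y, T) for z, y in zip(logits_batch, labels)) / len(labels)


def fit_temperature(
    logits_batch: List[List[float]],
    labels: List[int],
    t_min: float = 0.05,
    t_max: float = 20.0,
    iters: int = 60,
) -> float:
    """Golden-section search for the T in [t_min, t_max] that minimizes NLL."""
    inv_phi = (math.sqrt(5) - 1) / 2
    lo, hi = math.log(t_min), math.log(t_max)  # search in log space
    for _ in range(iters):
        a = hi - inv_phi * (hi - lo)
        b = lo + inv_phi * (hi - lo)
        if nll(logits_batch, labels, math.exp(a)) < nll(logits_batch, labels, math.exp(b)):
            hi = b  # the minimum lies in [lo, b]
        else:
            lo = a  # the minimum lies in [a, hi]
    return math.exp((lo + hi) / 2)


# Hypothetical held-out calibration set: the model is about 98% confident on
# every example (logit margin of 4) but only 3 of 4 predictions are correct.
cal_logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
cal_labels = [0, 0, 1, 1]

T = fit_temperature(cal_logits, cal_labels)
calibrated_conf = math.exp(log_prob(cal_logits[0], 0, T))
print(f"T={T:.3f}, confidence after scaling={calibrated_conf:.3f}")
```

A fitted T greater than 1 softens the probabilities, which is the expected direction for an overconfident model. In practice the logits would come from the frozen model's forward pass on the calibration set, and a gradient-based optimizer is a common alternative to the line search used here.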

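For LLMs, temperature scaling uses the same operation as the sampling temperature: both divide the logits by T before the softmax. The purposes differ, however. A calibration temperature is fitted on held-out data to minimize NLL, whereas a sampling temperature is chosen by the user to control output diversity. The temperature parameter itself comes from the Boltzmann distribution in statistical mechanics rather than from calibration.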

Model calibration is the bridge between predicted confidence and trustworthy communication of uncertainty. In every domain where AI predictions inform real decisions, the gap between expressed confidence and empirical accuracy determines whether AI assistance improves or degrades human judgment.
