Perplexity and Cross-Entropy Loss
Cross-Entropy Loss Explained
Cross-entropy loss is the primary training objective for language models. It measures how well the model's predicted probability distribution matches the actual next token.
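Mathematical Definition
For a sequence of N tokens, where y_i is the true token at position i and P(y_i | x_{<i}) is the probability the model assigns to it given the preceding tokens:
$$ Loss = -\frac{1}{N} \sum_{i=1}^{N} \log P(y_i \mid x_{<i}) $$
Here log is the natural logarithm.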
Lower loss means the model assigns higher probability to correct tokens.
Example
If the model predicts:
- "the" with 60% probability
- "a" with 30% probability
- all other tokens with the remaining 10% combined
And the correct next word is "the":
- Loss = -log(0.6) ≈ 0.51
If "the" was predicted with 90% probability:
- Loss = -log(0.9) ≈ 0.11 (better!)
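The two cases above can be checked numerically. The following is a minimal sketch using only the standard library; the function name `mean_cross_entropy` is a hypothetical helper, and it assumes you already have the probability the model assigned to each correct token.

```python
import math

def mean_cross_entropy(correct_token_probs):
    """Average negative natural-log probability of the correct tokens.

    correct_token_probs: one probability per position, each being the
    probability the model gave to the token that actually occurred.
    """
    if not correct_token_probs:
        raise ValueError("need at least one token")
    total = sum(math.log(p) for p in correct_token_probs)
    return -total / len(correct_token_probs)

# Single-token cases from the example
loss_60 = mean_cross_entropy([0.6])
loss_90 = mean_cross_entropy([0.9])
print(round(loss_60, 2), round(loss_90, 2))  # 0.51 0.11

# Over a longer sequence, the loss is the mean of the per-token values
loss_seq = mean_cross_entropy([0.6, 0.9])
```

For a two-token sequence with probabilities 0.6 and 0.9, the result is simply the average of the two single-token losses.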
Perplexity
What is Perplexity?
Perplexity (PPL) is the exponentiated cross-entropy loss. It represents "how confused" the model is about predictions.
Formula
$$ PPL = \exp(Loss) = \exp\left(-\frac{1}{N} \sum_{i=1}^{N} \log P(y_i \mid x_{<i})\right) $$
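As a rough illustration of the formula, perplexity can be computed by exponentiating the mean negative log-probability. This sketch assumes natural logarithms and a list of probabilities assigned to the correct tokens; `perplexity` is a hypothetical helper name.

```python
import math

def perplexity(correct_token_probs):
    """exp of the mean negative natural-log probability of the correct tokens."""
    n = len(correct_token_probs)
    if n == 0:
        raise ValueError("need at least one token")
    loss = -sum(math.log(p) for p in correct_token_probs) / n
    return math.exp(loss)

# For a single token, perplexity is the reciprocal of its probability
ppl_single = perplexity([0.6])

# For several tokens, it is the reciprocal of the geometric mean probability
ppl_pair = perplexity([0.5, 0.25])
```

One consequence worth noting: because the loss is averaged in log space, a single very low-probability token raises perplexity more than an arithmetic average of probabilities would suggest.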
Interpreting Perplexity
| Perplexity | Interpretation |
|---|---|
| 1 | Perfect prediction (impossible in practice) |
| 10-20 | Excellent for domain-specific models |
| 20-50 | Good general language model |
| 50-100 | Average quality |
| >100 | Poor, model is "confused" |
Intuitive Meaning
A perplexity of 50 means the model is, on average, as uncertain as if it were choosing uniformly among 50 possible tokens at each step.
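The uniform-choice reading can be verified directly: a model that spreads probability evenly over k tokens gives every correct token probability 1/k, so its perplexity comes out as k. A small sketch, with `uniform_perplexity` and `n_tokens` as illustrative names only:

```python
import math

def uniform_perplexity(k, n_tokens=1000):
    """Perplexity of a model that always predicts uniformly over k tokens."""
    p = 1.0 / k  # probability assigned to the correct token at every step
    loss = -sum(math.log(p) for _ in range(n_tokens)) / n_tokens
    return math.exp(loss)

ppl_50 = uniform_perplexity(50)
ppl_2 = uniform_perplexity(2)
```

Up to floating-point rounding, `ppl_50` equals 50 and `ppl_2` equals 2, regardless of the sequence length chosen.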
Practical Use
- Training: Minimize cross-entropy loss
- Evaluation: Report perplexity on held-out test sets
- Model comparison: Lower perplexity generally means better language modeling when the models share a tokenizer and test set (but it does not always correlate with downstream task performance)
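When reporting perplexity on a held-out set made of several sequences, one common approach is to sum the negative log-probabilities over all tokens and divide by the total token count, which matches the 1/N averaging in the formula. The sketch below assumes that convention; `corpus_perplexity` is a hypothetical helper, and the input is a list of per-sequence probability lists for the correct tokens.

```python
import math

def corpus_perplexity(sequences):
    """Token-weighted perplexity over several sequences.

    sequences: iterable of lists; each inner list holds the probability
    the model assigned to the correct token at each position.
    """
    total_nll = 0.0    # summed negative log-likelihood across all tokens
    total_tokens = 0
    for probs in sequences:
        total_nll -= sum(math.log(p) for p in probs)
        total_tokens += len(probs)
    if total_tokens == 0:
        raise ValueError("need at least one token")
    return math.exp(total_nll / total_tokens)

held_out = [[0.5], [0.5, 0.5, 0.5]]
ppl = corpus_perplexity(held_out)
```

Pooling by token count means long sequences contribute in proportion to their length, which generally differs from averaging per-sequence perplexities.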