Sampling Parameters: Temperature and Top-P
How LLM Generation Works
LLMs predict the next token by computing a probability distribution over their vocabulary. Sampling parameters control how a token is selected from this distribution.
Temperature
What is Temperature?
Temperature scales the logits (raw prediction scores) before softmax is applied, controlling the "sharpness" of the resulting probability distribution.
Temperature Effects
| Temperature | Behavior | Use Case |
|---|---|---|
| 0.0 | Deterministic (greedy) | Factual, code |
| 0.3-0.5 | Low randomness | Technical writing |
| 0.7-0.8 | Balanced | General chat |
| 1.0 | Standard randomness | Creative tasks |
| 1.5+ | High randomness | Brainstorming |
Mathematical Effect
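Softmax with temperature T, where z_i is the logit of token i and the sum runs over every token j in the vocabulary:
$$P(x_i) = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$
T < 1: Sharpens the distribution (more deterministic)
T > 1: Flattens the distribution (more random)
T = 0: The formula is undefined (division by zero); as T approaches 0 the distribution collapses onto the highest logit, so a temperature of 0 is treated as argmax (greedy decoding)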
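The effect of T on the distribution can be checked numerically. The following is a minimal sketch in plain Python, not any particular library's implementation; the function name `softmax_with_temperature` and the example logits are illustrative assumptions.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return token probabilities for raw logits scaled by `temperature`."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Limit T -> 0: all probability mass goes to the highest logit (greedy).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a three-token vocabulary.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it shows the top token's probability rising as T drops below 1 and falling as T rises above 1, while the ranking of tokens stays the same.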
Top-P (Nucleus Sampling)
What is Top-P?
Top-P sampling keeps the smallest set of most-probable tokens whose cumulative probability reaches or exceeds P, renormalizes their probabilities, and samples from that set.
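As a rough illustration of the selection step, the sketch below sorts tokens by probability, accumulates them until the threshold P is reached, and renormalizes the survivors. The helper names `top_p_filter` and `sample_top_p` and the example probabilities are assumptions made for this example, not part of any specific API.

```python
import random

def top_p_filter(probs, top_p):
    """Return {token_index: renormalized probability} for the nucleus."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    # Rank tokens from most to least probable.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for index, p in ranked:
        nucleus.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break  # smallest set whose cumulative probability reaches top_p
    total = sum(p for _, p in nucleus)
    return {index: p / total for index, p in nucleus}

def sample_top_p(probs, top_p, rng=random):
    """Draw one token index from the renormalized nucleus."""
    nucleus = top_p_filter(probs, top_p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]

# Hypothetical distribution over a four-token vocabulary.
probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, 0.7))
print(sample_top_p(probs, 0.7))
```

With these probabilities and P = 0.7, only the first two tokens survive (0.5 + 0.3 reaches the threshold), and the low-probability tail can never be sampled. With P = 1.0 every token stays in the set.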
Top-P Values
| Top-P | Behavior |
|---|---|
| 0.1 | Very restrictive (few options) |
| 0.5 | Moderate diversity |
| 0.9 | Standard recommendation |
| 1.0 | Include all tokens |
Recommended Settings by Task
| Task | Temp | Top-P |
|---|---|---|
| Code generation | 0.0-0.2 | 0.95 |
| Data extraction | 0.0 | 1.0 |
| Technical Q&A | 0.3 | 0.9 |
| Creative writing | 0.8-1.0 | 0.95 |
| Brainstorming | 1.0-1.5 | 0.95 |
Best Practice
Generally adjust either temperature or top-p, not both at once, so the effect of each change stays easy to reason about. Many APIs default top-p to 1.0 and let you adjust temperature.