Sampling Parameters: Temperature and Top-P

How LLM Generation Works

LLMs predict the next token by computing a probability distribution over their vocabulary. Sampling parameters control how tokens are selected from this distribution.
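As a rough illustration of that pipeline, the sketch below turns a handful of logits into probabilities with softmax and draws one token from the result. The four-token vocabulary, the logit values, and the function names `softmax` and `sample_next_token` are made up for this example; a real model works over a vocabulary of many thousands of tokens.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, rng=random):
    """Draw one token at random, weighted by its softmax probability."""
    probs = softmax(logits)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical vocabulary and logits, for illustration only.
vocab = ["cat", "dog", "car", "tree"]
logits = [2.0, 1.0, 0.5, -1.0]

probs = softmax(logits)
token = sample_next_token(vocab, logits, random.Random(0))
```

Tokens with higher logits are drawn more often, but any token with non-zero probability can still be selected.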

Temperature

What is Temperature?

Temperature divides the logits (raw prediction scores) by a constant T before softmax is applied, controlling the "sharpness" of the resulting probability distribution.

Temperature Effects

| Temperature | Behavior | Use Case |
|---|---|---|
| 0.0 | Deterministic (greedy) | Factual, code |
| 0.3-0.5 | Low randomness | Technical writing |
| 0.7-0.8 | Balanced | General chat |
| 1.0 | Standard randomness | Creative tasks |
| 1.5+ | High randomness | Brainstorming |

Mathematical Effect

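Softmax with temperature T, where z_i is the logit of token x_i and the sum runs over every token j in the vocabulary:

$$P(x_i) = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$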

T < 1: Sharpens distribution (more deterministic)
T > 1: Flattens distribution (more random)
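T → 0: Approaches argmax (greedy decoding). T = 0 itself would divide by zero in the formula, so it is typically handled as a special case that picks the highest-scoring token.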
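The three regimes above can be checked numerically. The following is a minimal sketch, assuming a hypothetical set of four logits; the function name `softmax_with_temperature` and the choice to treat T = 0 as an explicit argmax branch are illustrative assumptions, not any particular library's behavior.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return token probabilities after dividing the logits by `temperature`."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Division by zero is undefined, so treat T = 0 as the greedy limit:
        # all probability goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical values

sharp = softmax_with_temperature(logits, 0.5)   # T < 1
base = softmax_with_temperature(logits, 1.0)    # T = 1, plain softmax
flat = softmax_with_temperature(logits, 2.0)    # T > 1
greedy = softmax_with_temperature(logits, 0.0)  # T = 0, argmax
```

Comparing the first entry of each list shows the top token's probability shrinking as T grows: it is largest in `sharp`, smaller in `base`, and smallest in `flat`.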

Top-P (Nucleus Sampling)

What is Top-P?

Top-P sampling ranks tokens from most to least probable, keeps the smallest set whose cumulative probability reaches P, renormalizes the probabilities within that set, and samples from it. Every token outside the set is excluded.
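A minimal sketch of nucleus sampling follows, assuming the input is already a list of probabilities. The names `top_p_filter` and `sample_top_p` and the example distribution are hypothetical. This sketch stops as soon as the cumulative probability is greater than or equal to P; the exact handling of that boundary is an assumption here.

```python
import random

def top_p_filter(probs, p):
    """Keep the most probable tokens until their cumulative probability
    reaches p, then renormalize. Returns {token_index: probability}."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    # Token indices sorted from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:  # boundary choice is an assumption of this sketch
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def sample_top_p(probs, p, rng=random):
    """Sample one token index from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical distribution
nucleus = top_p_filter(probs, 0.9)
```

With this distribution and P = 0.9, the first three tokens are kept and the least likely one is dropped; with P = 0.5 only the top token survives, which makes sampling effectively greedy.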

Top-P Values

| Top-P | Behavior |
|---|---|
| 0.1 | Very restrictive (few options) |
| 0.5 | Moderate diversity |
| 0.9 | Standard recommendation |
| 1.0 | Include all tokens |

Recommended Settings by Task

| Task | Temp | Top-P |
|---|---|---|
| Code generation | 0.0-0.2 | 0.95 |
| Data extraction | 0.0 | 1.0 |
| Technical Q&A | 0.3 | 0.9 |
| Creative writing | 0.8-1.0 | 0.95 |
| Brainstorming | 1.0-1.5 | 0.95 |
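One way to use these recommendations in code is a small lookup table. The sketch below copies the values from the table; the names `RECOMMENDED_SETTINGS` and `settings_for` are hypothetical, and picking the midpoint of a temperature range is an assumption made here for illustration, not part of the recommendation itself.

```python
# Values copied from the table above; temperature is stored as (low, high).
RECOMMENDED_SETTINGS = {
    "code generation": {"temperature": (0.0, 0.2), "top_p": 0.95},
    "data extraction": {"temperature": (0.0, 0.0), "top_p": 1.0},
    "technical q&a": {"temperature": (0.3, 0.3), "top_p": 0.9},
    "creative writing": {"temperature": (0.8, 1.0), "top_p": 0.95},
    "brainstorming": {"temperature": (1.0, 1.5), "top_p": 0.95},
}

def settings_for(task):
    """Return one temperature/top_p pair for a task.

    Assumption: when the table gives a range, use its midpoint.
    """
    entry = RECOMMENDED_SETTINGS[task.lower()]
    low, high = entry["temperature"]
    return {"temperature": round((low + high) / 2, 2), "top_p": entry["top_p"]}

code_settings = settings_for("Code generation")
creative_settings = settings_for("Creative writing")
```

The resulting dictionary can be passed along to whatever generation call is in use, provided that call accepts parameters with these names.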

Best Practice

Generally adjust either temperature or top-p, not both. Most APIs default top-p to 1.0 and let you adjust temperature.
