Sampling Parameters: Temperature and Top-P

How LLM Generation Works

LLMs predict the next token by computing a probability distribution over their vocabulary. Sampling parameters control how tokens are selected from this distribution.
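As a rough illustration of that pipeline, the sketch below turns raw scores into probabilities with softmax and then draws one token. The three-word vocabulary, the logit values, and the helper names `softmax` and `sample_token` are made up for the example; a real model works over tens of thousands of tokens.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(vocab, logits, rng=random):
    """Draw one token at random, weighted by its softmax probability."""
    probs = softmax(logits)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical toy vocabulary and scores, for illustration only.
vocab = ["cat", "dog", "fish"]
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
next_token = sample_token(vocab, logits, random.Random(0))
```

Higher logits get higher probability, but any token with non-zero probability can still be drawn.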

Temperature

What is Temperature?

Temperature divides the logits (raw prediction scores) by a constant T before softmax is applied, controlling the "sharpness" of the resulting probability distribution.

Temperature Effects

Temperature	Behavior	Use Case
0.0	Deterministic (greedy)	Factual, code
0.3-0.5	Low randomness	Technical writing
0.7-0.8	Balanced	General chat
1.0	Standard randomness	Creative tasks
1.5+	High randomness	Brainstorming

Mathematical Effect

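Softmax with temperature T, where logit_i is the raw score of token i and the sum runs over every token j in the vocabulary:
P(token_i) = exp(logit_i / T) / Σ_j exp(logit_j / T)

T < 1: Sharpens distribution (more deterministic)
T > 1: Flattens distribution (more random)
T → 0: Approaches argmax (greedy decoding). The formula divides by T, so T = 0 is handled as a special case that picks the highest-scoring token.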
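The effect of T can be checked numerically. The following is a minimal sketch, assuming a hypothetical helper `softmax_with_temperature` and made-up logits; treating T = 0 as a one-hot argmax is a convention chosen here, not something the formula itself defines.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by a temperature T >= 0."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Special case: put all probability on the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for three candidate tokens.
logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, 0.5)   # T < 1
base = softmax_with_temperature(logits, 1.0)    # T = 1
flat = softmax_with_temperature(logits, 2.0)    # T > 1
greedy = softmax_with_temperature(logits, 0.0)  # T = 0
```

The top token's probability is largest in `sharp`, smaller in `base`, and smallest in `flat`, while `greedy` assigns it all of the probability mass.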

Top-P (Nucleus Sampling)

What is Top-P?

Top-P sampling ranks tokens by probability, keeps the smallest set whose cumulative probability reaches or exceeds P, renormalizes the probabilities within that set, and then samples from it.
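A minimal sketch of nucleus filtering, assuming hypothetical helpers `top_p_filter` and `sample_top_p` and a made-up four-token distribution. This version stops once the cumulative probability reaches p; implementations differ on whether that boundary is inclusive.

```python
import random

def top_p_filter(probs, p):
    """Return {token_index: renormalized_prob} for the top-p nucleus."""
    # Visit token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:  # assumption: inclusive boundary
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def sample_top_p(probs, p, rng=random):
    """Sample one token index from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]

# Made-up distribution over four tokens.
probs = [0.5, 0.3, 0.15, 0.05]
nucleus = top_p_filter(probs, 0.9)   # keeps tokens 0, 1, 2
narrow = top_p_filter(probs, 0.1)    # keeps only token 0
everything = top_p_filter(probs, 1.0)
choice = sample_top_p(probs, 0.9, random.Random(0))
```

With p = 0.9 the least likely token is dropped, so it can never be sampled; with p = 1.0 nothing is filtered out.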

Top-P Values

Top-P	Behavior
0.1	Very restrictive (few options)
0.5	Moderate diversity
0.9	Standard recommendation
1.0	Include all tokens

Recommended Settings by Task

Task	Temp	Top-P
Code generation	0.0-0.2	0.95
Data extraction	0.0	1.0
Technical Q&A	0.3	0.9
Creative writing	0.8-1.0	0.95
Brainstorming	1.0-1.5	0.95
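The table above can be captured as a small lookup, sketched here under a few assumptions: the names `RECOMMENDED_SETTINGS` and `settings_for` are hypothetical, temperature ranges are stored as (low, high) pairs, and picking the midpoint of a range is an arbitrary choice made for this example.

```python
# Values copied from the table; single temperatures are stored as (x, x).
RECOMMENDED_SETTINGS = {
    "code generation": {"temperature": (0.0, 0.2), "top_p": 0.95},
    "data extraction": {"temperature": (0.0, 0.0), "top_p": 1.0},
    "technical q&a": {"temperature": (0.3, 0.3), "top_p": 0.9},
    "creative writing": {"temperature": (0.8, 1.0), "top_p": 0.95},
    "brainstorming": {"temperature": (1.0, 1.5), "top_p": 0.95},
}

def settings_for(task):
    """Return one concrete temperature/top_p pair for a task name."""
    entry = RECOMMENDED_SETTINGS[task.lower()]
    low, high = entry["temperature"]
    # Assumption: use the midpoint when the table gives a range.
    return {"temperature": (low + high) / 2, "top_p": entry["top_p"]}

extraction = settings_for("Data extraction")
brainstorm = settings_for("Brainstorming")
```

The returned dictionary uses the parameter names `temperature` and `top_p`, which many generation APIs accept, though exact names and allowed ranges vary by provider.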

Best Practice

Generally adjust either temperature or top-p, not both at once. Many APIs default top-p to 1.0 and let you adjust temperature.
