Beam search and nucleus sampling are complementary decoding strategies for generating high-quality text from language models, balancing quality against diversity → beam search explores the most likely continuations, while nucleus sampling maintains coherence through probabilistic token selection from an adaptively sized vocabulary.
Beam Search Algorithm:
- Multiple Hypotheses: maintaining the B best partial sequences (beams) sorted by cumulative log probability → B=3-5 typical, with diminishing returns beyond 10
- Expansion Step: extending each beam by one token, computing a softmax over the full vocabulary (~50K tokens) → O(B·V) complexity per step, where V is the vocabulary size
- Pruning: keeping only the top B hypotheses among the B×V candidates using a priority queue → reduces memory from exponential to linear in B
- Length Normalization: dividing scores by length^α (α=0.6-0.7) to prevent bias toward short sequences → stops the algorithm from favoring 1-2 word outputs
- Coverage Penalty: penalizing repeated coverage of the same input tokens (for encoder-decoder models like T5) → improves summary diversity
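The expand/prune/normalize loop above can be sketched in pure Python. Here `step_fn`, which maps a partial sequence to a next-token probability distribution, is a hypothetical stand-in for a language model's softmax output; the interface is illustrative, not a library API:

```python
import math

def beam_search(step_fn, bos, eos, beam_width=3, max_len=10, alpha=0.6):
    """Minimal beam search sketch. step_fn(seq) -> {token: probability}."""
    beams = [([bos], 0.0)]          # (sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        # Expansion: extend every beam by one token
        candidates = []
        for seq, score in beams:
            for tok, p in step_fn(seq).items():
                candidates.append((seq + [tok], score + math.log(p)))
        # Pruning: keep only the top B of the B*V candidates
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates:
            if seq[-1] == eos:
                # Length normalization: score / len^alpha
                finished.append((seq, score / (len(seq) ** alpha)))
            else:
                beams.append((seq, score))
            if len(beams) == beam_width:
                break
        if not beams:
            break
    # Normalize any unfinished beams that hit max_len
    finished.extend((s, sc / (len(s) ** alpha)) for s, sc in beams)
    return max(finished, key=lambda c: c[1])[0]
```

On a toy distribution where the locally best first token leads to a low-probability continuation, this returns the globally better path that greedy decoding would miss.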
Beam Search Characteristics:
- Quality Improvement: 5-10 BLEU point gains on machine translation vs greedy decoding (e.g., 28.0→33.5 BLEU) → noticeable in benchmarks but smaller in human evaluation
- Computational Cost: B=5 increases compute roughly 5x, since B sequences are processed per step (batching reduces the wall-clock impact) → trading generation speed for modestly better quality
- Determinism: identical outputs for the same input, reproducible across runs → useful for testing but unsuitable for creative tasks
- Hallucination Rate: 40-60% reduction in factual errors compared to greedy decoding on QA tasks → especially beneficial for knowledge-critical applications
Nucleus (Top-P) Sampling:
- Cumulative Probability: selecting the smallest vocabulary subset whose cumulative probability exceeds P (P=0.9 typical) → dynamically sized vocabulary per token
- Sorted Selection: ranking tokens by probability and accumulating until the threshold P is crossed → adaptive nucleus of roughly 20-200 tokens depending on the distribution
- Sampling: applying temperature scaling, then sampling from the renormalized distribution over the nucleus → introduces beneficial stochasticity without the long low-probability tail
- Temperature Interaction: combining nucleus (P) with temperature T for fine-grained control → P=0.9, T=0.8 balances quality and diversity
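A minimal sketch of nucleus sampling over a plain probability list, assuming temperature is applied before the nucleus is formed (the function name and interface are illustrative, not a library API):

```python
import random

def nucleus_sample(probs, p=0.9, temperature=1.0, rng=random):
    """Sample a token index via top-p (nucleus) filtering."""
    # Temperature scaling (probs**(1/T) is equivalent to logits/T), renormalize
    scaled = [q ** (1.0 / temperature) for q in probs]
    total = sum(scaled)
    scaled = [q / total for q in scaled]
    # Sort tokens by probability, accumulate until cumulative mass >= p
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    nucleus, cum = [], 0.0
    for i in order:
        nucleus.append(i)
        cum += scaled[i]
        if cum >= p:
            break
    # Renormalize within the nucleus and draw a sample
    mass = sum(scaled[i] for i in nucleus)
    r = rng.random() * mass
    for i in nucleus:
        r -= scaled[i]
        if r <= 0:
            return i
    return nucleus[-1]
```

For a peaked distribution the nucleus may contain only a handful of tokens; for a flat one it grows, which is exactly the adaptive-vocabulary behavior described above.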
Top-K Sampling Approach:
- Fixed Vocabulary: sampling only from the K highest-probability tokens (K=40-50 typical) → prevents sampling from extremely low-probability tokens
- Hyperparameter Sensitivity: K=10 produces very focused outputs, K=100 allows more diversity → requires manual tuning per application
- Computational Simplicity: partial selection of the top K runs in O(V log K) vs a full sort's O(V log V) → marginal speedup compared to nucleus sampling
- Comparison: nucleus sampling outperforms fixed top-K on diversity while maintaining quality (65-75% human preference in studies)
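The fixed-K variant is even simpler, since the cutoff does not depend on the shape of the distribution; a sketch using `heapq.nlargest`, which performs the O(V log K) partial selection:

```python
import heapq
import random

def top_k_sample(probs, k=40, rng=random):
    """Sample a token index from the k highest-probability tokens only."""
    # Partial selection of the top k indices, O(V log K)
    top = heapq.nlargest(k, range(len(probs)), key=lambda i: probs[i])
    # Renormalize over the top k and draw a sample
    mass = sum(probs[i] for i in top)
    r = rng.random() * mass
    for i in top:
        r -= probs[i]
        if r <= 0:
            return i
    return top[-1]
```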
Temperature Scaling Impact:
- T→0: approaches greedy decoding, selecting the arg-max token → deterministic, prone to repetition
- T=0.7: sharpened distribution suppressing rare tokens, reducing diversity → recommended for factual tasks (QA, summarization)
- T=1.0: no scaling, using the model's calibrated probabilities → baseline setting
- T=1.5: flattened distribution boosting rare tokens, increasing diversity → recommended for creative tasks (story generation, dialogue)
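Temperature scaling is a one-line modification to the softmax: dividing logits by T before exponentiation sharpens the distribution for T < 1 and flattens it toward uniform for T > 1. A small sketch:

```python
import math

def softmax_with_temperature(logits, t):
    """Numerically stable softmax over logits / t."""
    scaled = [l / t for l in logits]
    m = max(scaled)                      # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]
```

Running this on the same logits at T=0.7, 1.0, and 1.5 shows the top token's probability shrinking as T rises, which is the sharpening/flattening effect described in the bullets above.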
Practical Decoding Strategies:
- Repetition Penalty: dividing the positive logits of previously generated tokens by a penalty factor (1.0-2.0), and multiplying negative ones → curbs the repetitive loops common in greedy and low-temperature decoding
- Length Penalty: suppressing the end-of-sequence token's logit while the sequence is short → encourages longer generations (useful for minimum-length requirements)
- Bad Words Filter: setting logits of disallowed tokens to -inf (zero probability after softmax) before sampling → prevents toxic or off-topic outputs
- Constraint Satisfaction: modifying probabilities to steer generation toward particular semantic constraints (commonsense reasoning, QA answer formats)
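The first and third strategies are simple logit post-processing steps applied before the softmax. A sketch combining a CTRL-style repetition penalty with a bad-words mask; the function and parameter names are illustrative:

```python
def apply_penalties(logits, generated_ids, repetition_penalty=1.2, bad_ids=()):
    """Post-process logits before sampling (illustrative sketch)."""
    out = list(logits)
    # Repetition penalty: shrink positive logits of already-seen tokens,
    # push negative ones further down (CTRL-style asymmetric rule)
    for i in set(generated_ids):
        if out[i] > 0:
            out[i] = out[i] / repetition_penalty
        else:
            out[i] = out[i] * repetition_penalty
    # Bad-words filter: -inf logit becomes zero probability after softmax
    for i in bad_ids:
        out[i] = float("-inf")
    return out
```

A sampler would call this on each step's logits, passing the ids generated so far, before applying temperature and the nucleus/top-K cutoff.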
Beam search and nucleus sampling remain complementary techniques → beam search provides quality gains for deterministic tasks such as translation and summarization, while nucleus sampling enables diverse, creative text generation for conversational and open-ended applications.