Mechanism

Repetition penalty lowers the scores of tokens that have already been generated, which discourages repetitive output.

Mechanism: For each token already in the output, rescale its logit by a penalty factor (1.0 = no effect, >1.0 = suppress). Plain division only suppresses a token whose logit is positive; dividing a negative logit moves it toward zero and makes the token more likely. Common implementations therefore divide positive logits and multiply negative ones. Some implementations use an additive penalty instead.

Formula: if the token has appeared, logit_new = logit / penalty when logit > 0, and logit_new = logit * penalty when logit <= 0. Otherwise the logit is unchanged.

Scope options: Penalize all previous tokens, only a sliding window of recent tokens, or by frequency (the more often a token has appeared, the larger the penalty).

Typical values: 1.0-1.2 (subtle), 1.2-1.5 (moderate), 1.5+ (aggressive).

Related techniques: Presence penalty (a flat penalty for any token that has appeared at least once), frequency penalty (scales with the token's occurrence count), no-repeat-ngram (forbids exact n-gram repeats).

Implementation: Applied to the logits before softmax, at each token-selection step.

Trade-offs: Too low a value leaves repetitive "loop" outputs. Too high a value causes unnatural topic changes and forced vocabulary diversity, because common words the text legitimately needs to reuse are also suppressed.

Use cases: Open-ended generation, chatbots, creative writing.

Best practices: Start with 1.1-1.2, adjust based on output quality, and combine with nucleus (top-p) sampling.
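Example: The sketch below illustrates the multiplicative penalty and the additive presence/frequency variant on a plain list of logits. It is a minimal illustration under stated assumptions, not any particular library's implementation: the function names, the window parameter, and the toy logits are all hypothetical, and token IDs are assumed to be list indices.

```python
import math
from collections import Counter


def apply_repetition_penalty(logits, generated_ids, penalty=1.1, window=None):
    """Return a copy of `logits` with already-generated tokens penalized.

    `window` (hypothetical parameter) limits the penalty to the most
    recent `window` tokens; None means the whole history is considered.
    """
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    if window is None:
        recent = generated_ids
    else:
        recent = generated_ids[max(len(generated_ids) - window, 0):]
    out = list(logits)
    for tok in set(recent):
        # Divide positive logits and multiply non-positive ones, so the
        # logit never increases when penalty > 1.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def apply_additive_penalties(logits, generated_ids, presence=0.0, frequency=0.0):
    """Additive variant: subtract a flat presence term plus a per-occurrence term."""
    out = list(logits)
    for tok, count in Counter(generated_ids).items():
        out[tok] -= presence + frequency * count
    return out


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy vocabulary of 4 tokens; tokens 0 and 1 were already generated.
logits = [2.0, -1.0, 0.5, 0.0]
history = [0, 1, 0]

penalized = apply_repetition_penalty(logits, history, penalty=1.25)
additive = apply_additive_penalties(logits, history, presence=0.5, frequency=0.2)
windowed = apply_repetition_penalty(logits, history, penalty=1.25, window=1)

probs_before = softmax(logits)
probs_after = softmax(penalized)
```

The penalty is applied to the logits first and softmax afterwards, matching the order described above. Note that the penalty lowers a repeated token's logit, but its final probability also depends on how the other logits changed, so it is safest to judge the effect on the renormalized distribution.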
