Repetition Penalty

Repetition penalty is a decoding modification that reduces the probability of tokens that have already appeared in the context (the prompt plus the text generated so far). It counters a common failure mode in which a language model gets stuck in a loop, repeating the same phrases or patterns over and over.

What Is Repetition Penalty?

Why Repetition Occurs

Example Problem:

Without penalty:
"I love AI. I love AI. I love AI. I love AI..."

With penalty:
"I love AI. It enables incredible applications, 
from healthcare to creative writing..."
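The loop can be reproduced with a toy example. The sketch below uses a hypothetical "model": a hand-written table `NEXT_LOGITS` of next-token logits that depend only on the previous token (no real language model is involved). An illustrative `greedy_decode` helper decodes it greedily. The vocabulary and scores are invented. Without a penalty, greedy decoding cycles through the same four tokens. With a penalty of 1.2 applied once to every distinct token already in the context, a different continuation wins at the sentence boundary.

```python
# Toy bigram "model": next-token logits depend only on the previous token.
# Vocabulary and scores are invented for illustration.
NEXT_LOGITS: dict[str, dict[str, float]] = {
    "I":       {"love": 2.0, "AI": 0.5},
    "love":    {"AI": 2.0, "I": 0.3},
    "AI":      {".": 2.0, "love": 0.2},
    ".":       {"I": 2.0, "It": 1.8},
    "It":      {"enables": 2.0, "I": 0.1},
    "enables": {"apps": 2.0, "AI": 1.0},
    "apps":    {".": 2.0, "I": 0.5},
}

def greedy_decode(start: str, steps: int, penalty: float = 1.0) -> list[str]:
    """Greedy decoding over NEXT_LOGITS; penalty > 1.0 penalizes seen tokens."""
    tokens = [start]
    for _ in range(steps):
        scores = dict(NEXT_LOGITS[tokens[-1]])
        seen = set(tokens)
        for tok, logit in list(scores.items()):
            if tok in seen:
                # Shrink positive logits, push negative ones further down
                scores[tok] = logit / penalty if logit > 0 else logit * penalty
        # Greedy choice: the highest adjusted score wins
        tokens.append(max(scores, key=scores.get))
    return tokens

print(" ".join(greedy_decode("I", 8)))               # no penalty
print(" ".join(greedy_decode("I", 8, penalty=1.2)))  # with penalty
```

A fixed penalty only tilts each choice. In this toy table the sequence can still return to "I" once every alternative has also been seen.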

How It Works

Algorithm:

For each next token prediction:
  1. Get logits from model
  2. For each distinct token that appeared in the context:
     - If logit > 0: logit = logit / penalty
     - If logit <= 0: logit = logit * penalty
  3. Apply softmax
  4. Sample or argmax
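The sign check in step 2 matters because dividing a negative logit by a penalty above 1.0 would move it toward zero and make the token *more* likely. The minimal pure-Python sketch below illustrates this with made-up logits. The helper names `penalize` and `softmax` are illustrative and do not come from any library.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def penalize(logits: list[float], seen: set[int], penalty: float) -> list[float]:
    """Apply the sign-aware repetition penalty rule to seen token ids."""
    out = list(logits)
    for i in seen:
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

logits = [2.0, -1.0, 0.5]  # made-up scores; tokens 0 and 1 already appeared
seen = {0, 1}

penalized = penalize(logits, seen, penalty=1.2)        # token 1: -1.0 -> -1.2
naive = [logits[0] / 1.2, logits[1] / 1.2, logits[2]]  # divide-only: token 1 -> about -0.83

print([round(p, 3) for p in softmax(logits)])
print([round(p, 3) for p in softmax(penalized)])
print([round(p, 3) for p in softmax(naive)])
```

Token 1 ends up less likely under the sign-aware rule than under divide-only. Because softmax couples all tokens, a seen token's final probability also depends on how much the other seen tokens were penalized.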

Implementation:

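import torch

def apply_repetition_penalty(
    logits: torch.Tensor,
    input_ids: torch.Tensor,
    penalty: float = 1.2,
) -> torch.Tensor:
    """Apply repetition penalty to a 1-D logits vector of shape (vocab_size,).

    input_ids: 1-D tensor of token ids seen so far (prompt + generated).
    penalty:   values > 1.0 make every previously seen token less likely.
    """
    logits = logits.clone()  # avoid modifying the caller's tensor in place

    # Each distinct token is penalized once, however often it appeared
    for token_id in input_ids.unique():
        if logits[token_id] > 0:
            # Positive logit: divide to shrink it toward zero
            logits[token_id] = logits[token_id] / penalty
        else:
            # Negative (or zero) logit: multiply to push it further down
            logits[token_id] = logits[token_id] * penalty

    return logits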

Hugging Face Usage:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("The weather today is", return_tensors="pt")

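outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    repetition_penalty=1.2,  # >1.0 penalizes repetition; 1.0 disables it
    do_sample=True,
    top_p=0.92,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))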

Related Techniques

No-Repeat N-gram:

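outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    no_repeat_ngram_size=3,  # Block any 3-token sequence from repeating
)

# Effect: once the tokens for "the big red" have been generated,
# that exact 3-token sequence cannot appear again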
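The blocking rule can be sketched in plain Python. Before choosing the next token, look at the last n-1 tokens and ban any token that would complete an n-gram already present in the sequence. This is an illustrative re-implementation of the idea behind `no_repeat_ngram_size`, not the Hugging Face code. The function names are hypothetical, and token ids are plain integers.

```python
def banned_next_tokens(tokens: list[int], n: int) -> set[int]:
    """Token ids that would complete an n-gram that already occurs in `tokens`."""
    if n < 1 or len(tokens) < n:
        return set()
    # The (n-1)-token prefix that the next token would extend
    prefix = tuple(tokens[len(tokens) - (n - 1):])
    banned = set()
    for i in range(len(tokens) - n + 1):
        ngram = tokens[i:i + n]
        if tuple(ngram[:-1]) == prefix:
            banned.add(ngram[-1])
    return banned

def block_ngrams(logits: list[float], tokens: list[int], n: int) -> list[float]:
    """Set the logits of banned tokens to -inf so they can never be chosen."""
    out = list(logits)
    for t in banned_next_tokens(tokens, n):
        out[t] = float("-inf")
    return out

print(banned_next_tokens([1, 2, 3, 1, 2], 3))
```

For example, with tokens `[1, 2, 3, 1, 2]` and `n=3`, the only banned next token is `3`, because `1 2 3` has already occurred.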

Frequency/Presence Penalty (OpenAI-style):

# OpenAI API (Python SDK)
import openai

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[...],
    frequency_penalty=0.5,  # Based on count
    presence_penalty=0.5,   # Binary: appeared or not
)

# frequency_penalty: Stronger for more frequent tokens
# presence_penalty: Same penalty regardless of count
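OpenAI's API documentation has described these two penalties as additive adjustments to the logits, roughly `logit[j] -= count[j] * frequency_penalty + (count[j] > 0) * presence_penalty`. The sketch below applies that formula to a list of logits. Treat it as an illustration of the described behavior, not OpenAI's server-side code. The helper name and inputs are hypothetical.

```python
from collections import Counter

def apply_frequency_presence(
    logits: list[float],
    seen_tokens: list[int],
    frequency_penalty: float,
    presence_penalty: float,
) -> list[float]:
    """Subtract count-based and presence-based penalties from seen tokens' logits."""
    counts = Counter(seen_tokens)
    out = list(logits)
    for token_id, count in counts.items():
        # Frequency term grows with every repeat; presence term is a flat one-off
        out[token_id] -= count * frequency_penalty + presence_penalty
    return out

print(apply_frequency_presence([1.0, 1.0, 1.0], [0, 0, 0, 1], 0.5, 0.5))
```

Token 0 (seen three times) drops by 2.0, while token 1 (seen once) drops by 1.0. This shows how `frequency_penalty` scales with the count and `presence_penalty` does not.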

Comparison:

Technique          | Mechanism
-------------------|----------------------------------
repetition_penalty | Multiplicative on seen tokens
frequency_penalty  | Additive based on count
presence_penalty   | Additive if seen at all
no_repeat_ngram    | Hard block on n-gram sequences

Parameter Tuning

Guidelines:

Value     | Effect
----------|----------------------------------
1.0       | No penalty (default/off)
1.1-1.2   | Light penalty (most uses)
1.2-1.5   | Moderate penalty
1.5-2.0   | Strong penalty
>2.0      | Very strong (may hurt quality)
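To get a feel for these ranges, the sketch below penalizes a single seen token with a made-up logit of 3.0 and reports its softmax probability for several penalty values. The competing logits are also invented, and real effects depend on the model's actual logit distribution. `seen_token_probability` is a hypothetical helper.

```python
import math

def seen_token_probability(
    penalty: float,
    seen_logit: float = 3.0,
    others: tuple[float, ...] = (2.5, 1.0, 0.0),
) -> float:
    """Softmax probability of a previously seen token after the penalty."""
    adjusted = seen_logit / penalty if seen_logit > 0 else seen_logit * penalty
    logits = [adjusted, *others]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[0] / sum(exps)

for p in (1.0, 1.1, 1.2, 1.5, 2.0):
    print(f"penalty={p:.1f}  P(seen token)={seen_token_probability(p):.3f}")
```

With these invented numbers, the seen token loses its lead to the 2.5 competitor once 3.0 / penalty falls below 2.5, that is, for penalties above 1.2.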

By Use Case:

Use Case             | repetition_penalty
---------------------|--------------------
Conversational       | 1.1-1.2
Creative writing     | 1.0-1.15
Technical writing    | 1.15-1.3
Summarization        | 1.1-1.2
Code generation      | 1.0-1.1 (code repeats naturally)

Potential Issues

Issue                | Mitigation
---------------------|----------------------------------
Over-penalizing      | Use lower penalty value
Hurts coherence      | Limit to generated tokens only
Blocks needed words  | Use frequency_penalty instead
Affects stop words   | Exclude common tokens from penalty
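Two of these mitigations, penalizing only generated tokens and exempting common tokens, amount to choosing which token ids the penalty sees. Below is a minimal sketch. It assumes the context is a flat list of token ids whose first `prompt_len` entries are the prompt. `exempt_ids` is a hypothetical caller-supplied set, for example ids of punctuation or stop words. These names are illustrative and not a library API, so check your framework's documentation for an equivalent option.

```python
def penalized_ids(
    context: list[int],
    prompt_len: int,
    exempt_ids: frozenset[int] = frozenset(),
) -> set[int]:
    """Token ids to penalize: those generated after the prompt, minus exemptions."""
    return set(context[prompt_len:]) - exempt_ids

def apply_penalty(logits: list[float], ids: set[int], penalty: float) -> list[float]:
    """Standard repetition-penalty rule applied only to the selected ids."""
    out = list(logits)
    for i in ids:
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

ids = penalized_ids([7, 8, 1, 2, 1], prompt_len=2, exempt_ids=frozenset({2}))
print(ids)
```

With `context = [7, 8, 1, 2, 1]` and `prompt_len=2`, only ids 1 and 2 are candidates. Passing `exempt_ids=frozenset({2})` leaves just id 1 to be penalized.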

Repetition penalty is a simple, widely supported fix for degenerate text generation. Greedy and beam search are especially prone to loops, and sampling can still drift into repetition on long outputs. As a result, some form of repetition control is a standard tuning knob in production generation pipelines.
