Stop sequences | ChipFoundryServices


Stop sequences are special tokens or strings that end a language model's generation as soon as they appear in the output. Configuring them gives precise control over output boundaries and prevents rambling, unwanted continuations, and runaway generation loops.

What Are Stop Sequences?

Definition: Tokens or strings that halt generation when they appear in the output.
Mechanism: The decoding loop checks the output after each new token and stops as soon as a stop sequence is detected.
Purpose: Control output length and structure.
Examples: "\n\n", "User:", EOS token.

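The mechanism above can be sketched with a toy decoding loop. This is an illustration only: `generate_until_stop` and `fake_stream` are hypothetical names, and the list of string pieces stands in for the tokens a real model would emit one at a time.

```python
from typing import Iterable, List


def generate_until_stop(
    pieces: Iterable[str], stop_sequences: List[str], max_pieces: int = 100
) -> str:
    """Append pieces one at a time and halt once any stop sequence appears."""
    text = ""
    for count, piece in enumerate(pieces):
        if count >= max_pieces:
            break  # length cap: the fallback when no stop sequence fires
        text += piece
        # Positions of every stop sequence present in the text so far
        hits = [text.find(s) for s in stop_sequences if s in text]
        if hits:
            # Cut at the earliest match, dropping the stop sequence itself
            return text[: min(hits)]
    return text


# Hypothetical model output, split into token-like pieces
fake_stream = ["Red", ",", " green", ",", " blue", ".", "\n", "\n", "User", ":", " thanks"]
result = generate_until_stop(fake_stream, ["\n\n", "User:"])
print(repr(result))  # 'Red, green, blue.'
```

The check runs on the accumulated text rather than on single pieces, because a stop string such as "\n\n" can be split across two tokens.
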
Why Stop Sequences Matter

Structured Output: Stop at expected boundaries.
Conversation: Stop when assistant turn ends.
Cost Control: Prevent unnecessary token generation.
Format Compliance: Ensure proper structure.
Agent Safety: Prevent uncontrolled generation.

Types of Stop Sequences

Built-in:

Token Type | Example             | Purpose
-----------|---------------------|-----------------------------------------------
EOS        | </s>, <|endoftext|> | End-of-sequence marker the model learned in training
Pad        | <pad>               | Batch padding; not used to stop generation

Custom:

Application     | Stop Sequences
----------------|----------------------------------
Chat            | "User:", "Human:", "\n\nUser"
QA              | "\n\n", "Question:"
JSON            | "}", "\n"
Code            | "```", "# End"
Function call   | ")", "]}"

Implementation

OpenAI API:

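import openai  # the client reads OPENAI_API_KEY from the environment

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "List 3 colors:"}
    ],
    stop=["4.", "\n\n"],  # Stop at 4th item or double newline
)
print(response.choices[0].message.content)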
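
Before a `stop` value is sent to the API, it can help to normalize it. The sketch below assumes the documented limit of four stop sequences per request for the OpenAI `stop` parameter. `normalize_stop` and `MAX_STOP_SEQUENCES` are hypothetical names introduced here, not part of the OpenAI client.

```python
from typing import List, Optional, Sequence, Union

# Assumption: the API accepts at most four stop sequences per request.
MAX_STOP_SEQUENCES = 4


def normalize_stop(stop: Union[None, str, Sequence[str]]) -> Optional[List[str]]:
    """Return a clean list of stop sequences, or None if there are none."""
    if stop is None:
        return None
    items = [stop] if isinstance(stop, str) else list(stop)
    # Drop empty strings and duplicates while keeping the original order
    cleaned = list(dict.fromkeys(s for s in items if s))
    if len(cleaned) > MAX_STOP_SEQUENCES:
        raise ValueError(
            f"at most {MAX_STOP_SEQUENCES} stop sequences allowed, got {len(cleaned)}"
        )
    return cleaned or None


print(normalize_stop(["4.", "\n\n", "4.", ""]))  # ['4.', '\n\n']
```

The text returned by the API does not include the stop sequence that ended generation, and `finish_reason` is `"stop"` both when a stop sequence matched and when the model ended its turn on its own.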

Hugging Face:

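from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
inputs = tokenizer("List 3 colors:", return_tensors="pt")

# Method 1: Using eos_token_id
outputs = model.generate(
    **inputs,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=64,
)

# Method 2: Custom stopping criteria
class StopOnTokens(StoppingCriteria):
    def __init__(self, stop_ids):
        self.stop_ids = list(stop_ids)

    def __call__(self, input_ids, scores, **kwargs):
        # Stop only when the full stop sequence sits at the end of the output.
        # Comparing just the last token against each id would fire on any
        # single piece of the sequence, such as a lone ":".
        n = len(self.stop_ids)
        return input_ids[0, -n:].tolist() == self.stop_ids

# "User:" usually encodes to several token ids, and the ids can differ
# depending on the text that precedes it (for example a newline).
stop_tokens = tokenizer.encode("User:", add_special_tokens=False)
stopping_criteria = StoppingCriteriaList([StopOnTokens(stop_tokens)])

outputs = model.generate(
    **inputs,
    stopping_criteria=stopping_criteria,
    max_new_tokens=64,
)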

String-Based Stopping:

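from transformers import StoppingCriteria

class StopOnString(StoppingCriteria):
    def __init__(self, tokenizer, stop_strings, prompt_len):
        self.tokenizer = tokenizer
        self.stop_strings = stop_strings
        self.prompt_len = prompt_len  # number of prompt tokens to skip

    def __call__(self, input_ids, scores, **kwargs):
        # Decode only the newly generated tokens. Decoding the full sequence
        # would stop at once if the prompt already contains a stop string.
        generated = self.tokenizer.decode(input_ids[0, self.prompt_len:])
        return any(stop in generated for stop in self.stop_strings)

# Usage: StopOnString(tokenizer, ["User:"], prompt_len=inputs["input_ids"].shape[1])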

Common Patterns

Chat Applications:

stop_sequences = [
    "User:",
    "Human:",
    "\nUser\n",
    "<|eot_id|>",  # Llama 3 turn end
]

Structured Output:

# For JSON output
stop_sequences = ["```", "\n}\n"]

# For function calls
stop_sequences = [")\n", ")]"]

# For lists
stop_sequences = ["\n\n", "---"]

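A brace-based stop sequence for JSON can end generation at the first closing brace and truncate a nested object. APIs that strip the stop sequence then return JSON without its final brace. One possible alternative, sketched here as an assumption rather than a standard recipe, is to let the model run slightly long and cut the text at the first balanced object. `first_json_object` is a hypothetical helper name.

```python
import json
from typing import Optional


def first_json_object(text: str) -> Optional[str]:
    """Return the first balanced {...} span in text, or None if it never closes."""
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            # Braces inside string literals must not change the depth
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return None


raw = '{"color": {"name": "red", "note": "uses } inside"}} trailing text'
obj = first_json_object(raw)
print(json.loads(obj))
```

A `None` result signals that the output was cut off before the object closed, which is a useful cue to retry with a higher token limit.
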
Agent/Tool Use:

# Stop when action specified
stop_sequences = [
    "Action:",
    "Observation:",
    "PAUSE",
]
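
The reason for stopping at "Observation:" can be shown with a small agent loop. Everything here is illustrative: `fake_llm` is a canned stand-in for a real model call, and the `add` tool and the `Action: name[args]` format are assumptions made for the sketch.

```python
from typing import List


def fake_llm(prompt: str, stop: List[str]) -> str:
    """Stand-in for a model call: returns a canned reply cut at the first stop."""
    if "Observation: 4" in prompt:
        reply = "Thought: I have the result.\nAnswer: 4"
    else:
        # Left unchecked, the model invents its own (wrong) observation
        reply = "Thought: I need to add.\nAction: add[2, 2]\nObservation: 5\nAnswer: 5"
    cut = min((reply.find(s) for s in stop if s in reply), default=len(reply))
    return reply[:cut]


def add(args: str) -> str:
    return str(sum(int(x) for x in args.split(",")))


TOOLS = {"add": add}


def run_agent(question: str, max_steps: int = 3) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        # Stop before the model can write the observation itself
        reply = fake_llm(prompt, stop=["Observation:"])
        prompt += reply
        if "Answer:" in reply:
            return reply.split("Answer:", 1)[1].strip()
        if "Action:" not in reply:
            break
        # Parse "Action: name[args]" and run the real tool
        action = reply.split("Action:", 1)[1].strip()
        name, args = action.rstrip("]").split("[", 1)
        prompt += f"Observation: {TOOLS[name.strip()](args)}\n"
    return "no answer"


print(run_agent("What is 2 + 2?"))  # 4
```

Without the stop sequence, the first canned reply would run through to its invented "Observation: 5" and the loop would return the wrong answer.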

Best Practices

✅ Good Practices:
- Include multiple relevant stop sequences
- Test with edge cases
- Consider partial matches
- Handle stop sequence in output (trim if needed)
- Use model-specific tokens when available

❌ Common Mistakes:
- Forgetting newlines in stop sequences
- Stop sequence too common (premature stop)
- Stop sequence too rare (never triggers)
- Not trimming stop sequence from output
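
Partial matches matter most when output is streamed to a user. A chunk may end with the first few characters of a stop sequence, and text that has already been shown cannot be taken back. A minimal sketch of one way to handle this holds back any tail that could still grow into a stop sequence. `stream_with_stop` is a hypothetical helper, and the chunk list stands in for a real streaming response.

```python
from typing import Iterable, Iterator, List


def stream_with_stop(chunks: Iterable[str], stop_sequences: List[str]) -> Iterator[str]:
    """Yield text that is safe to display, ending at the first stop sequence."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        hits = [buffer.find(s) for s in stop_sequences if s in buffer]
        if hits:
            if min(hits) > 0:
                yield buffer[: min(hits)]
            return
        # Longest buffer suffix that is a proper prefix of some stop sequence
        hold = 0
        for s in stop_sequences:
            for k in range(min(len(s) - 1, len(buffer)), 0, -1):
                if buffer.endswith(s[:k]):
                    hold = max(hold, k)
                    break
        safe = buffer[: len(buffer) - hold]
        if safe:
            yield safe
        buffer = buffer[len(buffer) - hold :]
    if buffer:
        yield buffer  # stream ended without hitting a stop sequence


pieces = list(stream_with_stop(["Hello", " there.\nUs", "er: hi"], ["\nUser:"]))
print(pieces)  # ['Hello', ' there.']
```

The held-back tail is released as soon as the next chunk shows it was not the start of a stop sequence, so the delay is at most one stop sequence's length.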

Trimming Output:

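def generate_with_stop(prompt, stop_sequences):
    # `model.generate` stands in for whichever text-generation call is in use
    output = model.generate(prompt, stop=stop_sequences)

    # Cut at the earliest stop sequence, wherever it appears in the output
    cut = len(output)
    for stop in stop_sequences:
        idx = output.find(stop)
        if idx != -1:
            cut = min(cut, idx)

    return output[:cut].strip()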

Stop sequences are fundamental to controlled generation. Without a proper termination signal, a language model keeps generating until it emits its EOS token or reaches the maximum token limit, wasting compute and potentially producing harmful or incoherent continuations.
