Stop sequences are special tokens or strings that end generation as soon as the model produces them. Configuring them gives precise control over output boundaries and prevents rambling, unwanted continuations, and runaway generation loops.
What Are Stop Sequences?
- Definition: Tokens/strings that halt generation when produced.
- Mechanism: Generation stops immediately when a stop sequence is detected.
- Purpose: Control output length and structure.
- Examples: "\n", "", "User:", EOS token.
Why Stop Sequences Matter
- Structured Output: Stop at expected boundaries.
- Conversation: Stop when assistant turn ends.
- Cost Control: Prevent unnecessary token generation.
- Format Compliance: Ensure proper structure.
- Agent Safety: Prevent uncontrolled generation.
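To make the mechanism concrete, here is a minimal sketch of a decoding loop that checks the accumulated text for stop strings after every step. It is an illustration under stated assumptions, not how any particular library implements it: `generate_until_stop` is a hypothetical helper, the token stream is a hard-coded list standing in for a model, and the `"end_of_stream"` finish reason is made up for the example.

```python
from typing import Iterable, List, Tuple


def generate_until_stop(
    token_stream: Iterable[str],
    stop_sequences: List[str],
    max_tokens: int = 256,
) -> Tuple[str, str]:
    """Consume text pieces until a stop string appears or max_tokens is hit.

    Returns (text, finish_reason). The stop string itself is excluded.
    """
    text = ""
    for count, piece in enumerate(token_stream, start=1):
        text += piece
        # Check the accumulated text, because a stop string may span pieces.
        hits = [text.find(s) for s in stop_sequences if s in text]
        if hits:
            return text[: min(hits)], "stop"
        if count >= max_tokens:
            return text, "length"
    return text, "end_of_stream"


# Toy stream standing in for a model (assumed output, for illustration only).
fake_model_output = ["Red", ", green", ", blue.", "\n", "User", ":", " thanks"]
text, reason = generate_until_stop(fake_model_output, ["\nUser:", "\n\n"])
```

Note that the stop string "\nUser:" arrives split over three pieces, which is why the check runs on the accumulated text rather than on the latest piece alone.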
Types of Stop Sequences
Built-in:
Token Type | Example              | Purpose
-----------|----------------------|---------------------
EOS        | </s>, <|endoftext|>  | Model's trained end
Pad        | <pad>                | Unused in generation
Custom:
Application   | Stop Sequences
--------------|-----------------------------
Chat          | "User:", "Human:", "\nUser"
QA            | "\n", "Question:"
JSON          | "}", "\n"
Code          | "```", "# End"
Function call | ")", "]}"
Implementation
OpenAI API:
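import openai  # assumes openai>=1.0 with OPENAI_API_KEY set in the environment

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "List 3 colors:"}
    ],
    stop=["4.", "\n\n"],  # Stop at 4th item or double newline
)
# The matched stop sequence is not included in the returned text
print(response.choices[0].message.content)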
Hugging Face:
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
inputs = tokenizer("List 3 colors:", return_tensors="pt")

# Method 1: Using eos_token_id
outputs = model.generate(
    **inputs,
    eos_token_id=tokenizer.eos_token_id,
)

# Method 2: Custom stopping criteria
class StopOnTokens(StoppingCriteria):
    def __init__(self, stop_ids):
        self.stop_ids = list(stop_ids)

    def __call__(self, input_ids, scores, **kwargs):
        # Stop only when the last tokens match the whole stop sequence
        # (checks the first sequence in the batch)
        n = len(self.stop_ids)
        return input_ids[0, -n:].tolist() == self.stop_ids

# "User:" usually encodes to several token ids, so match them as a sequence
stop_tokens = tokenizer.encode("User:", add_special_tokens=False)
stopping_criteria = StoppingCriteriaList([StopOnTokens(stop_tokens)])
outputs = model.generate(
    **inputs,
    stopping_criteria=stopping_criteria,
)
String-Based Stopping:
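class StopOnString(StoppingCriteria):
    def __init__(self, tokenizer, stop_strings, prompt_length):
        self.tokenizer = tokenizer
        self.stop_strings = stop_strings
        # Number of prompt tokens, e.g. inputs["input_ids"].shape[1],
        # so that stop strings already present in the prompt are ignored
        self.prompt_length = prompt_length

    def __call__(self, input_ids, scores, **kwargs):
        # Decode only the newly generated tokens of the first sequence
        generated = self.tokenizer.decode(input_ids[0, self.prompt_length:])
        return any(stop in generated for stop in self.stop_strings)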
Common Patterns
Chat Applications:
stop_sequences = [
    "User:",
    "Human:",
    "\nUser\n",
    "<|eot_id|>",  # Llama 3 turn end
]
Structured Output:
# For JSON output
stop_sequences = ["```", "\n}\n"]
# For function calls
stop_sequences = [")\n", ")]"]
# For lists
stop_sequences = ["\n\n", "---"]
Agent/Tool Use:
# Stop when action specified
stop_sequences = [
    "Action:",
    "Observation:",
    "PAUSE",
]
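As a rough illustration of why agents stop at "Observation:", the sketch below runs a tiny ReAct-style loop: generation is cut before the model can invent a tool result, the application runs the tool, and the real observation is appended to the prompt. Everything here is an assumption made for the example: `fake_llm` is a scripted stand-in for a model call, and `TOOLS`, `run_agent`, and the `Action: name[args]` format are hypothetical.

```python
import re
from typing import Callable, Dict, List

# Hypothetical tool registry for the sketch.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split(","))),
}


def fake_llm(prompt: str, stop: List[str]) -> str:
    """Scripted stand-in for a model call that honors stop sequences."""
    if "Observation:" not in prompt:
        # Without a stop sequence the model would invent its own observation.
        raw = "Thought: I should add the numbers.\nAction: add[2,3]\nObservation: 999"
    else:
        raw = "Thought: I have the result.\nFinal Answer: 5"
    # Emulate the API: cut at the earliest stop sequence.
    cuts = [raw.find(s) for s in stop if s in raw]
    return raw[: min(cuts)] if cuts else raw


def run_agent(question: str, max_steps: int = 3) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        # Stopping at "Observation:" keeps the model from inventing tool output.
        step = fake_llm(prompt, stop=["Observation:"])
        prompt += step
        match = re.search(r"Action: (\w+)\[(.*?)\]", step)
        if match is None:
            return step.split("Final Answer:")[-1].strip()
        name, arg = match.groups()
        # Append the real tool result, then let the model continue.
        prompt += f"Observation: {TOOLS[name](arg)}\n"
    return "no answer within max_steps"


answer = run_agent("What is 2 + 3?")
```

The scripted first reply contains a made-up "Observation: 999"; the stop sequence removes it, so the loop only ever sees the value computed by the tool.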
Best Practices
✅ Good Practices:
- Include multiple relevant stop sequences
- Test with edge cases
- Consider partial matches (a stop string can be split across tokens or streamed chunks)
- Handle stop sequence in output (trim if needed)
- Use model-specific tokens when available
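One way to act on "test with edge cases" is to scan a handful of outputs you expect the model to produce and flag any stop string that occurs inside them, since such a string would cut the response short. This is a minimal sketch; `find_premature_stops` is a hypothetical helper and the sample texts are invented.

```python
from typing import Dict, List


def find_premature_stops(
    expected_outputs: List[str], stop_sequences: List[str]
) -> Dict[str, List[int]]:
    """Map each stop string to the indices of samples it would cut short."""
    problems: Dict[str, List[int]] = {}
    for stop in stop_sequences:
        hit = [i for i, text in enumerate(expected_outputs) if stop in text]
        if hit:
            problems[stop] = hit
    return problems


# Invented examples of complete, desirable outputs.
samples = [
    "1. Red\n2. Green\n3. Blue",
    "The user asked about colors.\n\nHere are three: red, green, blue.",
]
report = find_premature_stops(samples, ["\n\n", "4.", "User:"])
```

Here the double newline is flagged for the second sample, which suggests it is too common a pattern for prompts that allow multi-paragraph answers.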
❌ Common Mistakes:
- Forgetting newlines in stop sequences (e.g. "User:" where "\nUser:" was intended)
- Stop sequence too common (premature stop)
- Stop sequence too rare (never triggers)
- Not trimming stop sequence from output
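Partial matches matter most when output is streamed to a user: if a stop string arrives split across chunks, its first half may already be on screen by the time the match completes. A common remedy, sketched below under the assumption that chunks arrive as plain strings, is to hold back any buffer suffix that could still grow into a stop string. `stream_with_stops` is a hypothetical helper, not a library function.

```python
from typing import Iterable, Iterator, List


def stream_with_stops(chunks: Iterable[str], stop_sequences: List[str]) -> Iterator[str]:
    """Yield text that is safe to show, holding back possible stop prefixes."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A complete stop string ends the stream; emit only what precedes it.
        hits = [buffer.find(s) for s in stop_sequences if s in buffer]
        if hits:
            if min(hits) > 0:
                yield buffer[: min(hits)]
            return
        # Hold back the longest buffer suffix that is a prefix of a stop string.
        hold = 0
        for stop in stop_sequences:
            for k in range(min(len(stop) - 1, len(buffer)), 0, -1):
                if buffer.endswith(stop[:k]):
                    hold = max(hold, k)
                    break
        emit_upto = len(buffer) - hold
        if emit_upto > 0:
            yield buffer[:emit_upto]
            buffer = buffer[emit_upto:]
    if buffer:
        yield buffer  # stream ended without a stop; flush what was held back


shown = "".join(stream_with_stops(["Hello", " there.\nUs", "er: hi"], ["\nUser:"]))
```

The trade-off is a small display delay: text that looks like the start of a stop string is shown only once the next chunk rules the match out.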
Trimming Output:
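def generate_with_stop(prompt, stop_sequences):
    # `model.generate(prompt, stop=...)` stands in for whatever client you use;
    # it is assumed to return the generated text as a string
    output = model.generate(prompt, stop=stop_sequences)
    # Cut at the earliest stop sequence, wherever it appears in the output
    cut = len(output)
    for stop in stop_sequences:
        idx = output.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return output[:cut].strip()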
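Trimming has a flip side: when the stop string is itself part of the structure, such as the "\n}\n" pattern listed for JSON output, an API that excludes the stop sequence from the returned text also removes the closing brace. The sketch below restores it before parsing. It assumes a single missing top-level brace, and `parse_json_cut_at_brace` is a hypothetical helper.

```python
import json
from typing import Any


def parse_json_cut_at_brace(text: str) -> Any:
    """Parse JSON whose closing brace was consumed by a stop sequence."""
    candidate = text.rstrip()
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # The stop string removed the closing brace; put it back and retry.
        return json.loads(candidate + "\n}")


# Assumed model output, cut just before the "\n}\n" stop sequence.
truncated = '{\n  "name": "red",\n  "hex": "#ff0000"'
data = parse_json_cut_at_brace(truncated)
```

If the second parse also fails, the `json.JSONDecodeError` propagates to the caller, which is usually the right signal to retry the request or log the raw output.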
Stop sequences are fundamental to controlled generation. Without a termination signal that fits the application, a model can keep generating until it reaches the max-token limit, wasting compute and potentially producing harmful or incoherent continuations.