Early Stopping

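Early stopping is a regularization technique that halts training once validation performance stops improving. After each evaluation it compares a monitored validation metric with the best value seen so far and saves a checkpoint whenever that value improves. It stops only after a set number of evaluations without improvement (the patience), so training can ride out temporary plateaus. The best checkpoint, not the last one, is the model you keep.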

What Is Early Stopping?

Why Early Stopping Works

Training Dynamics

Typical Pattern (validation every 5 epochs, patience = 3 checks):

Epoch    | Train Loss | Val Loss  | Action
---------|------------|-----------|----------------------------------
1        | 2.5        | 2.4       | Save best (first check)
5        | 1.8        | 1.6       | Save best
10       | 1.2        | 1.3       | Save best
15       | 0.8        | 1.2       | Save best ✓ (lowest val loss)
20       | 0.5        | 1.3       | No improvement (counter 1)
25       | 0.3        | 1.4       | No improvement (counter 2)
30       | 0.2        | 1.5       | Stop (counter reaches patience)

Return model from epoch 15 (best val loss: 1.2)
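The table can be replayed with a short, framework-free sketch. It assumes validation runs every 5 epochs and that patience counts validation checks (3 here), which is what the table's counter implies; the function name `replay` is hypothetical.

```python
def replay(checks, patience=3, min_delta=0.0):
    """Return (best_epoch, best_loss, stop_epoch) for (epoch, val_loss) pairs.

    stop_epoch is None if patience never runs out.
    """
    best_epoch, best_loss = None, float("inf")
    stale = 0  # validation checks since the last improvement
    for epoch, val_loss in checks:
        if val_loss < best_loss - min_delta:
            best_epoch, best_loss = epoch, val_loss  # "save best"
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch, best_loss, epoch
    return best_epoch, best_loss, None


# (epoch, val_loss) pairs from the table above
table = [(1, 2.4), (5, 1.6), (10, 1.3), (15, 1.2), (20, 1.3), (25, 1.4), (30, 1.5)]
print(replay(table))  # → (15, 1.2, 30)
```

With `patience=4` the same losses never trigger a stop, which shows how directly the patience setting decides how much of the overfitting tail gets trained.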

Overfitting Visualization:

Loss
 │
 │   Train ─────────────────────
 │         ╲
 │          ╲    
 │           ╲_________________ (continues down)
 │
 │   Val   ─────╲
 │               ╲____╱─────────
 │                    ↑
 │              Best checkpoint
 └────────────────────────────────── Epoch
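The divergence in the sketch can be quantified as the generalization gap, validation loss minus training loss. A minimal sketch using the numbers from the table above (the helper name `generalization_gap` is hypothetical):

```python
def generalization_gap(train_losses, val_losses):
    """Validation loss minus training loss at each recorded epoch."""
    return [round(v - t, 2) for t, v in zip(train_losses, val_losses)]


epochs = [1, 5, 10, 15, 20, 25, 30]
train = [2.5, 1.8, 1.2, 0.8, 0.5, 0.3, 0.2]
val = [2.4, 1.6, 1.3, 1.2, 1.3, 1.4, 1.5]

for epoch, gap in zip(epochs, generalization_gap(train, val)):
    print(f"epoch {epoch:>2}: gap {gap:+.2f}")
```

The gap already grows between epochs 10 and 15 while validation loss is still improving, which is why early stopping keys on the validation metric itself rather than on the gap.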

Implementation

PyTorch Training Loop:

import copy


class EarlyStopping:
    """Stop when the monitored metric stops improving; keep the best weights in memory."""
    def __init__(self, patience=5, min_delta=0.001, mode="min"):
        self.patience = patience
        self.min_delta = min_delta
        self.mode = mode  # "min" for loss, "max" for accuracy
        self.counter = 0
        self.best_score = None
        self.best_model = None
        self.should_stop = False
    
    def __call__(self, score, model):
        if self.best_score is None:
            self.best_score = score
            self.save_checkpoint(model)
        elif self._is_improvement(score):
            self.best_score = score
            self.save_checkpoint(model)
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.should_stop = True
        
        return self.should_stop
    
    def _is_improvement(self, score):
        if self.mode == "min":
            return score < self.best_score - self.min_delta
        return score > self.best_score + self.min_delta
    
    def save_checkpoint(self, model):
        self.best_model = copy.deepcopy(model.state_dict())

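# Usage: model, train_loader, val_loader, max_epochs, train_epoch() and
# validate() are placeholders for your own training setup
early_stopping = EarlyStopping(patience=5)

for epoch in range(max_epochs):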
    train_loss = train_epoch(model, train_loader)
    val_loss = validate(model, val_loader)
    
    if early_stopping(val_loss, model):
        print(f"Early stopping at epoch {epoch}")
        break

# Load best model
model.load_state_dict(early_stopping.best_model)

With Transformers:

from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

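training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",          # named evaluation_strategy in older transformers releases
    save_strategy="epoch",          # must match the evaluation schedule
    load_best_model_at_end=True,    # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
    greater_is_better=False,        # lower eval_loss is better
)

# model, train_dataset and val_dataset are placeholders for your own objects
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)

trainer.train()  # stops after 3 evaluations without improvement, then reloads the best checkpoint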

Key Parameters

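Configuring Early Stopping:

Parameter      | Typical Values     | Effect
---------------|--------------------|------------------------------------------------------------
patience       | 3-10 evaluations   | Checks without improvement before stopping; higher tolerates longer plateaus
min_delta      | 0.001-0.01         | Smallest change that counts as an improvement (in the metric's units)
metric         | val_loss           | Validation metric to monitor
mode           | min / max          | "min" for losses, "max" for accuracy-style metrics
restore_best   | True               | Reload the best checkpoint after stopping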
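To see what `min_delta` and `mode` change, here is a small, framework-free sketch that reuses the improvement test from the `EarlyStopping` class above. The functions `is_improvement` and `stop_index` are hypothetical helpers, and the loss and accuracy sequences are made up for illustration.

```python
def is_improvement(score, best, min_delta, mode):
    """Same rule as EarlyStopping._is_improvement: beat best by more than min_delta."""
    if mode == "min":
        return score < best - min_delta
    return score > best + min_delta


def stop_index(scores, patience, min_delta, mode="min"):
    """Return the index of the check at which training would stop, or None."""
    best, counter = scores[0], 0
    for i, score in enumerate(scores[1:], start=1):
        if is_improvement(score, best, min_delta, mode):
            best, counter = score, 0
        else:
            counter += 1
            if counter >= patience:
                return i
    return None


# Loss falls every check, but only 0.009 in total over the first three
creeping_loss = [1.000, 0.996, 0.993, 0.991, 0.989]
print(stop_index(creeping_loss, patience=3, min_delta=0.01))   # → 3
print(stop_index(creeping_loss, patience=3, min_delta=0.001))  # → None

accuracy = [0.80, 0.85, 0.84, 0.86, 0.86, 0.85]
print(stop_index(accuracy, patience=2, min_delta=0.005, mode="max"))  # → 5
```

A larger `min_delta` treats slow creep as a plateau and stops sooner; because the comparison is against the best value so far, several tiny steps can still add up to one counted improvement.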

Best Practices

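✅ Use a validation set that is separate from the test set
✅ Save the complete model state (e.g., the full state_dict) so the best checkpoint can be restored
✅ Track other metrics alongside the one that drives stopping
✅ Set a reasonable patience so a few noisy epochs do not end training
✅ Combine with learning-rate scheduling, giving the scheduler a shorter patience than early stopping

❌ Monitor only training loss (it keeps falling while the model overfits)
❌ Set patience = 1 (too aggressive; one noisy epoch ends training)
❌ Forget to restore the best model after stopping
❌ Use the test set as the early-stopping criterion (it leaks test data into model selection)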
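The advice to pair early stopping with learning-rate scheduling can be illustrated with a simplified, framework-free simulation: a plateau rule halves the learning rate after 2 checks without improvement, while early stopping waits 4. The function name `simulate`, the factor, the patience values, and the loss sequence are illustrative assumptions, and the plateau rule is a simplification rather than PyTorch's `ReduceLROnPlateau`.

```python
def simulate(val_losses, lr=1e-3, lr_patience=2, lr_factor=0.5, stop_patience=4):
    """Replay precomputed validation losses through a plateau LR rule plus
    early stopping. Returns (stop_epoch or None, best_epoch, lrs), where
    lrs[i] is the learning rate in effect during epoch i."""
    best, best_epoch = float("inf"), None
    since_best = 0  # early-stopping counter
    since_cut = 0   # bad epochs since the last LR cut
    lrs = []
    for epoch, loss in enumerate(val_losses):
        lrs.append(lr)
        if loss < best:
            best, best_epoch = loss, epoch
            since_best = since_cut = 0
            continue
        since_best += 1
        since_cut += 1
        if since_best >= stop_patience:
            return epoch, best_epoch, lrs
        if since_cut >= lr_patience:
            lr *= lr_factor  # shrink the step size before giving up
            since_cut = 0
    return None, best_epoch, lrs


losses = [1.0, 0.8, 0.81, 0.82, 0.75, 0.76, 0.77, 0.78, 0.79]
stop_epoch, best_epoch, lrs = simulate(losses)
print(stop_epoch, best_epoch)  # → 8 4
```

Here the first LR cut (after epoch 3) is followed by a new best at epoch 4, which a stopper with equal or shorter patience would never have reached. In PyTorch the scheduler half would typically be `torch.optim.lr_scheduler.ReduceLROnPlateau`, stepped with `scheduler.step(val_loss)` after each validation; its patience counting differs slightly from this sketch.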

Early stopping is a cheap and effective guard against overfitting. By detecting when validation performance stops improving, a sign that the model has begun memorizing training data rather than learning generalizable patterns, it keeps the best checkpoint without requiring you to tune the number of epochs by hand.

