Online learning

Keywords: online learning, machine learning

Online learning is a machine learning paradigm where the model is updated incrementally as new data arrives, one example (or small batch) at a time, rather than being trained on a fixed, complete dataset. The model continuously adapts to new data throughout its lifetime.

Online vs. Batch Learning

| Aspect | Online Learning | Batch Learning |
|--------|----------------|----------------|
| Data | Streaming, one at a time | Fixed, complete dataset |
| Updates | After each example | After processing entire dataset |
| Adaptation | Immediate | Requires retraining |
| Memory | Low (doesn't store all data) | High (needs all data in memory) |
| Staleness | Always current | Becomes stale between retraining |

How Online Learning Works

- Receive a new example (x, y).
- Predict using the current model.
- Observe the true label and compute the loss.
- Update model parameters based on the loss.
- Repeat for the next example.
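The predict-observe-update loop above can be sketched with online gradient descent on a tiny linear regression model. This is a minimal illustration, not a library API; the class name `OnlineLinearModel` and the learning rate are assumptions for the example.

```python
import random

class OnlineLinearModel:
    """Linear regression updated one example at a time (online gradient descent)."""

    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, y):
        # Squared-error loss L = (pred - y)^2; dL/dw_i = 2 * (pred - y) * x_i
        err = self.predict(x) - y
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * 2 * err * xi
        self.b -= self.lr * 2 * err

random.seed(0)
model = OnlineLinearModel(n_features=1)

# Simulate a stream drawn from y = 3x + 1 plus small noise:
# receive an example, update immediately, discard it.
for _ in range(2000):
    x = [random.uniform(-1, 1)]
    y = 3 * x[0] + 1 + random.gauss(0, 0.01)
    model.update(x, y)
```

After a few thousand streamed examples the weights approach the true parameters (w ≈ 3, b ≈ 1) without any example ever being stored, which is the low-memory property noted in the comparison table.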

Online Learning Algorithms

- Online Gradient Descent: Apply stochastic gradient descent with each new example.
- Perceptron: Classic online linear classifier — update weights only on misclassified examples.
- Passive-Aggressive: Remain passive (no update) when an example is classified correctly with sufficient margin; otherwise update just aggressively enough to satisfy the margin constraint.
- Online Newton Step: Second-order online optimization for faster convergence.
- Bandit Algorithms: Online learning with partial feedback — UCB, Thompson Sampling.
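The Perceptron is the simplest of these to write down: it updates only on mistakes. A minimal sketch (the helper name `perceptron_step` and the toy dataset are assumptions for illustration):

```python
def perceptron_step(w, b, x, y):
    """One online Perceptron update; labels y must be +1 or -1."""
    pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
    if pred != y:  # mistake-driven: update only on misclassification
        w = [wi + y * xi for wi, xi in zip(w, x)]
        b = b + y
    return w, b

# A linearly separable toy stream: label = sign(x0 + x1).
data = [([2.0, 1.0], 1), ([-1.0, -2.0], -1),
        ([1.5, -0.5], 1), ([-2.0, 0.5], -1)]

w, b = [0.0, 0.0], 0.0
for _ in range(10):  # a few passes over the stream
    for x, y in data:
        w, b = perceptron_step(w, b, x, y)
```

On separable data the mistake-driven rule converges to a separating hyperplane after finitely many updates (the classic Perceptron convergence guarantee).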

Applications

- Recommendation Systems: Update user preferences as new interactions arrive.
- Fraud Detection: Adapt to new fraud patterns as they emerge in real-time.
- Ad Optimization: Continuously optimize ad targeting based on click-through data.
- Search Ranking: Update ranking models as user behavior evolves.
- Stream Processing: Analyze and learn from sensor data, logs, or financial streams.

Challenges

- Concept Drift: The underlying data distribution may change over time, requiring the model to adapt.
- Catastrophic Forgetting: Adapting too aggressively to new data can lose old knowledge.
- Noisy Data: Individual examples may be noisy — the model must be robust to outliers.
- Evaluation: Traditional held-out test sets assume a fixed distribution; streaming settings instead typically use prequential (test-then-train) evaluation, scoring each example before learning from it.
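One common way to surface concept drift is to compare a fast-moving error average against a slow-moving baseline and flag when they diverge. The sketch below is a simplified illustration of that idea, not an implementation of established detectors such as DDM or ADWIN; the class name and thresholds are assumptions.

```python
class DriftMonitor:
    """Flags drift when recent error rate pulls away from the long-term rate."""

    def __init__(self, fast=0.2, slow=0.01, threshold=0.2):
        self.fast_err = 0.0   # exponentially weighted, reacts quickly
        self.slow_err = 0.0   # exponentially weighted, long-term baseline
        self.fast, self.slow, self.threshold = fast, slow, threshold

    def update(self, error):
        """error: 1.0 if the model's prediction was wrong, else 0.0."""
        self.fast_err += self.fast * (error - self.fast_err)
        self.slow_err += self.slow * (error - self.slow_err)
        return self.fast_err - self.slow_err > self.threshold

monitor = DriftMonitor()
# Stable period: the model is mostly right (error = 0).
flags = [monitor.update(0.0) for _ in range(200)]
# Sudden drift: the model starts making errors (error = 1).
flags += [monitor.update(1.0) for _ in range(50)]
```

When the flag fires, typical responses are to raise the learning rate, reset part of the model, or retrain on a recent window, trading some stability for faster adaptation.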

Online learning is the natural paradigm for applications where data arrives continuously and the world changes over time — it trades the stability of batch training for continuous adaptation.
