Bayesian Optimization is a sample-efficient hyperparameter tuning strategy that builds a probabilistic model of the objective function and uses it to decide which configuration to try next. Unlike Random Search (blind sampling) or Grid Search (exhaustive enumeration), Bayesian Optimization learns from past trials which regions of the hyperparameter space are promising. It balances exploration (trying unexplored regions) against exploitation (refining known good regions), which typically lets it find strong configurations in far fewer trials.
What Is Bayesian Optimization?
- Definition: A sequential model-based optimization strategy that (1) builds a surrogate model (typically a Gaussian Process or Tree-structured Parzen Estimator) of the objective function from evaluated trials, (2) uses an acquisition function to determine the most informative point to evaluate next, and (3) updates the surrogate model with the new result, repeating until the budget is exhausted.
- Why "Bayesian"?: The algorithm maintains a probabilistic belief (posterior distribution) about the objective function — it knows both the predicted performance AND the uncertainty at every point in the search space, using uncertainty to drive exploration.
- When It Shines: When each trial is expensive (hours of GPU training, expensive API calls, physical experiments) and you need to find a good configuration in 20-50 trials instead of 500.
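For a Gaussian Process surrogate, the posterior belief described above has a closed form. Assuming a zero prior mean, a kernel $k$, evaluated configurations $x_1, \dots, x_n$ with observed scores $\mathbf{y}$, and observation noise variance $\sigma_n^2$, the prediction at a new configuration $x$ is Gaussian with mean $\mu(x)$ and variance $\sigma^2(x)$:

$$
\begin{aligned}
\mu(x) &= \mathbf{k}_*^\top \left(K + \sigma_n^2 I\right)^{-1} \mathbf{y} \\
\sigma^2(x) &= k(x, x) - \mathbf{k}_*^\top \left(K + \sigma_n^2 I\right)^{-1} \mathbf{k}_*
\end{aligned}
$$

where $K_{ij} = k(x_i, x_j)$ and $(\mathbf{k}_*)_i = k(x_i, x)$. The variance $\sigma^2(x)$ is small near configurations that have already been evaluated and grows back toward the prior variance $k(x, x)$ far from them; that is the uncertainty the acquisition function uses to decide where to explore.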
How Bayesian Optimization Works
| Step | Process | What Happens |
|---|---|---|
| 1. Initial trials | Evaluate 5-10 random configurations | Build initial understanding |
| 2. Fit surrogate model | Gaussian Process on (config → performance) pairs | Model predicts performance + uncertainty for any config |
| 3. Acquisition function | Find config that maximizes Expected Improvement | Balance: try where predicted good OR where very uncertain |
| 4. Evaluate | Train model with chosen config | Get actual performance |
| 5. Update surrogate | Add new result, refit GP | Surrogate becomes more accurate |
| 6. Repeat | Go to step 3 | Converge toward optimum |
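The six steps above can be sketched end to end in a few dozen lines. This is a minimal, illustrative sketch, not how any particular library implements it. It assumes a single hyperparameter scaled to [0, 1], a hypothetical `objective` standing in for an expensive training run, a squared-exponential kernel with a fixed length scale of 0.2 (real libraries fit kernel hyperparameters to the data), and a dense grid of 501 candidates in place of a proper acquisition optimizer.

```python
import math

import numpy as np


def objective(x):
    # Hypothetical stand-in for an expensive training run (maximum at x = 0.37)
    return -(x - 0.37) ** 2


def rbf_kernel(a, b, length_scale=0.2):
    # Squared-exponential kernel between two 1-D arrays of points
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    # Zero-mean GP posterior mean and standard deviation at x_query
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_star = rbf_kernel(x_obs, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))  # K^-1 y
    mu = k_star.T @ alpha
    v = np.linalg.solve(L, k_star)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)  # k(x, x) = 1
    return mu, np.sqrt(var)


_erf = np.vectorize(math.erf)


def expected_improvement(mu, sigma, best, xi=0.01):
    # Closed-form EI for maximization under a Gaussian predictive distribution
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mu - best - xi) * cdf + sigma * pdf


def bayes_opt(n_init=5, n_iter=15, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 501)  # candidate configurations
    x_obs = rng.uniform(0.0, 1.0, n_init)  # step 1: random initial trials
    y_obs = objective(x_obs)
    for _ in range(n_iter):
        # Steps 2 and 5: (re)fit the surrogate on standardized scores
        y_norm = (y_obs - y_obs.mean()) / y_obs.std()
        mu, sigma = gp_posterior(x_obs, y_norm, grid)
        # Step 3: choose the candidate with the highest expected improvement
        ei = expected_improvement(mu, sigma, y_norm.max())
        x_next = grid[np.argmax(ei)]
        # Step 4: evaluate the expensive objective and record the result
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    best = np.argmax(y_obs)
    return x_obs[best], y_obs[best]


best_x, best_y = bayes_opt()
print(f"best x = {best_x:.3f}, objective = {best_y:.5f}")
```

Standardizing the observed scores before each refit keeps them on the same scale as the GP's unit prior variance, and `xi` is a small margin that nudges EI slightly toward exploration.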
Surrogate Models
| Model | How It Works | Pros | Cons |
|---|---|---|---|
| Gaussian Process (GP) | Non-parametric regression with uncertainty estimates | Gold standard, principled uncertainty | Exact inference cost grows as O(n³) in the number of trials; scales poorly beyond ~1000 trials |
| TPE (Tree-structured Parzen Estimator) | Models the density of good configurations, l(x), and of bad ones, g(x), separately, then favors points where l(x)/g(x) is high | Handles categorical/conditional params well | Less principled than GP; common implementations model each parameter independently by default |
| Random Forest | Ensemble regression as surrogate | Scales well, handles mixed types | Less smooth uncertainty estimates |
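The TPE row can be made concrete with a small sketch. It is a deliberate simplification, not Optuna's or Hyperopt's implementation: it assumes a single parameter in [0, 1], a hypothetical `loss` to minimize, a quantile `gamma = 0.25` separating good trials from bad ones, and a fixed kernel-density bandwidth of 0.1 (real TPE adapts bandwidths and mixes in a prior). Here l(x) is a density fitted to the good trials and g(x) a density fitted to the bad ones.

```python
import numpy as np


def loss(x):
    # Hypothetical stand-in for a validation loss (minimum at x = 0.37)
    return (x - 0.37) ** 2


def kde_pdf(x, centers, bandwidth):
    # Gaussian kernel density estimate of `centers`, evaluated at points x
    d = (x[:, None] - centers[None, :]) / bandwidth
    norm = len(centers) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * d ** 2).sum(axis=1) / norm


def tpe_suggest(x_obs, y_obs, rng, gamma=0.25, n_candidates=64, bandwidth=0.1):
    # Split trials: the lowest-loss gamma fraction is "good", the rest "bad"
    n_good = max(1, int(np.ceil(gamma * len(y_obs))))
    order = np.argsort(y_obs)
    good, bad = x_obs[order[:n_good]], x_obs[order[n_good:]]
    # Sample candidates from l(x) (good density), clipped to the search range
    noise = rng.normal(0.0, bandwidth, n_candidates)
    cand = np.clip(rng.choice(good, n_candidates) + noise, 0.0, 1.0)
    # Keep the candidate with the highest ratio l(x) / g(x)
    ratio = kde_pdf(cand, good, bandwidth) / (kde_pdf(cand, bad, bandwidth) + 1e-12)
    return cand[np.argmax(ratio)]


rng = np.random.default_rng(0)
x_obs = rng.uniform(0.0, 1.0, 10)  # random startup trials
y_obs = loss(x_obs)
for _ in range(30):
    x_next = tpe_suggest(x_obs, y_obs, rng)
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, loss(x_next))

best = np.argmin(y_obs)
best_x = x_obs[best]
print(f"best x = {best_x:.3f}, loss = {y_obs[best]:.5f}")
```

Because g(x) sits in the denominator, candidates close to known bad trials are penalized even when they are near a good one. In the TPE derivation, maximizing l(x)/g(x) is equivalent to maximizing Expected Improvement, which is why no explicit EI computation appears.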
Acquisition Functions
| Function | Strategy | Behavior |
|---|---|---|
| Expected Improvement (EI) | Choose point with highest expected improvement over current best | Good balance of exploration/exploitation |
| Upper Confidence Bound (UCB) | Choose point with highest (predicted mean + κ × uncertainty) | κ controls explore/exploit |
| Probability of Improvement (PI) | Choose point most likely to beat current best | Greedy, can get stuck |
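For a maximization problem with a Gaussian surrogate whose prediction at $x$ has mean $\mu(x)$ and standard deviation $\sigma(x)$, and with $f^*$ the best value observed so far, the three acquisition functions in the table have closed forms ($\Phi$ and $\phi$ are the standard normal CDF and PDF; PI and EI are taken as 0 wherever $\sigma(x) = 0$):

$$
\begin{aligned}
z(x) &= \frac{\mu(x) - f^*}{\sigma(x)} \\
\mathrm{PI}(x) &= \Phi\big(z(x)\big) \\
\mathrm{EI}(x) &= \big(\mu(x) - f^*\big)\,\Phi\big(z(x)\big) + \sigma(x)\,\phi\big(z(x)\big) \\
\mathrm{UCB}(x) &= \mu(x) + \kappa\,\sigma(x)
\end{aligned}
$$

PI only asks whether a point is likely to beat $f^*$, not by how much, so it keeps favoring near-certain tiny gains; EI weights each possible improvement by its size, and the $\sigma(x)\,\phi(z)$ term rewards uncertainty directly.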
Libraries
| Library | Surrogate | Strengths |
|---|---|---|
| Optuna | TPE (default) | Modern, Python-native, pruning support, visualization |
| Hyperopt | TPE | Classic, widely tested |
| BoTorch / Ax | Gaussian Process | Meta's (formerly Facebook's) PyTorch-based framework; full GP-based BO with flexible acquisition functions |
| Ray Tune | Delegates to integrated libraries (e.g., Optuna, Hyperopt, Ax) | Distributed execution |
| Scikit-Optimize | GP, RF, ExtraTrees | sklearn-compatible interface |
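Example (Optuna):
```python
import optuna

def objective(trial):
    # Search the learning rate on a log scale and tree depth as an integer
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    depth = trial.suggest_int("max_depth", 3, 12)
    # train_model and evaluate are placeholders for your own training and
    # validation code; evaluate should return a score where higher is better
    model = train_model(lr=lr, max_depth=depth)
    return evaluate(model)

# Optuna uses a TPE sampler by default for single-objective studies
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```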
Bayesian Optimization is typically the most sample-efficient of the common hyperparameter tuning strategies. Because it builds a probabilistic model of the objective function and uses it to choose which configurations to evaluate, it is the preferred approach when each trial is computationally expensive and the budget is limited to tens rather than hundreds of evaluations.