Core idea

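Bayesian optimization searches hyperparameters efficiently by building a probabilistic model of the objective function (for example, validation loss as a function of the hyperparameters).

Core idea: Maintain a belief about how the hyperparameters affect performance. Sample next where that belief is uncertain or predicts good results. Update the belief with each new result.

Components: The surrogate model, usually a Gaussian process or a tree-based model, approximates the objective and gives both a mean prediction and an uncertainty estimate for untried configurations. The acquisition function uses those two quantities to balance exploration (sampling uncertain regions) against exploitation (sampling regions predicted to be good). Expected improvement is a common choice.

Process: Fit the surrogate on the trials observed so far, maximize the acquisition function to select the next trial, evaluate that trial, and repeat.

Advantages over random search: It typically needs fewer evaluations to reach the same quality, which matters most when each evaluation is expensive (for example, training a neural network).

When to use: Evaluations are expensive (full training runs), the hyperparameters are mostly continuous, and the dimensionality is moderate (roughly under 20).

Limitations: Fitting the surrogate adds overhead on every iteration, performance degrades in very high dimensions, and discrete or categorical variables need special handling because standard Gaussian-process kernels assume continuous inputs.

Tools: Optuna, scikit-optimize, BoTorch, Ax, Spearmint.

Practical tips: Start from a good initial design, allow enough trials (20 to 50 or more is typical), and handle crashed trials gracefully so that one failure does not end the search.

Multi-fidelity: Use early stopping or cheaper evaluations (fewer epochs, a subset of the data) to filter out bad configurations quickly, before spending a full budget on them.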
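The fit, maximize, evaluate loop can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not how any of the listed tools implement it: the surrogate is a Gaussian process with a fixed squared-exponential kernel (length scale 0.2, no kernel fitting), the acquisition function is expected improvement for minimization, the acquisition is maximized by brute force over a grid of 101 values of a single hyperparameter in [0, 1], and the objective is a toy quadratic standing in for validation loss. The names `fit_gp`, `expected_improvement`, and `bayes_opt` are hypothetical.

```python
import math
import random

def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit variance (fixed length scale is an assumption)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(m):
    """Return lower-triangular L with L L^T = m for a symmetric positive-definite m."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_t(L, b):
    """Solve L^T x = b by back substitution (L is lower-triangular)."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, jitter=1e-6):
    """Fit the GP surrogate; return predict(x) -> (mean, std) in the units of ys."""
    n = len(xs)
    y_mean = sum(ys) / n
    y_scale = (sum((y - y_mean) ** 2 for y in ys) / n) ** 0.5 or 1.0
    t = [(y - y_mean) / y_scale for y in ys]  # standardized targets
    K = [[rbf(xs[i], xs[j]) + (jitter if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, t))  # alpha = K^{-1} t

    def predict(x):
        k_star = [rbf(x_obs, x) for x_obs in xs]
        mean = sum(k * a for k, a in zip(k_star, alpha))
        v = solve_lower(L, k_star)
        var = max(1.0 - sum(vi * vi for vi in v), 1e-12)  # prior variance is 1.0
        return y_mean + y_scale * mean, y_scale * math.sqrt(var)

    return predict

def expected_improvement(mean, std, best, xi=0.01):
    """Expected improvement over `best` when minimizing; xi nudges toward exploration."""
    if std <= 0.0:
        return 0.0
    gain = best - mean - xi
    z = gain / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return gain * cdf + std * pdf

def bayes_opt(objective, n_init=3, n_iter=12, seed=0):
    """Minimize objective over a 1-D grid in [0, 1]; return (best_x, best_y, xs, ys)."""
    rng = random.Random(seed)
    grid = [i / 100 for i in range(101)]  # candidate hyperparameter values
    xs = rng.sample(grid, n_init)  # random initial design
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        predict = fit_gp(xs, ys)  # 1. fit the surrogate on observed trials
        best = min(ys)
        candidates = [c for c in grid if c not in xs]
        # 2. pick the untried candidate with the highest acquisition value
        next_x = max(candidates, key=lambda c: expected_improvement(*predict(c), best))
        xs.append(next_x)
        ys.append(objective(next_x))  # 3. evaluate, then repeat
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best], xs, ys

if __name__ == "__main__":
    # Toy stand-in for "validation loss as a function of one hyperparameter".
    best_x, best_y, _, _ = bayes_opt(lambda x: (x - 0.3) ** 2)
    print(f"best x = {best_x:.2f}, loss = {best_y:.4f}")
```

Targets are standardized before fitting so the prior scale of the GP matches the observed losses. Production libraries typically also fit the kernel hyperparameters and use a numerical optimizer for the acquisition step instead of a fixed grid.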

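One concrete multi-fidelity strategy is successive halving (the building block of Hyperband), swapped in here purely as an illustration; the notes above do not prescribe a specific method. The sketch evaluates every configuration on a small budget, keeps the best 1/eta, multiplies the budget by eta, and repeats until one configuration remains. `toy_learning_curve` is a hypothetical stand-in for "train for N epochs and return validation loss", and the learning-rate grid is made up for the example.

```python
import math

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of the configurations at each rung, multiplying the
    budget by eta, until one remains. Returns (winner, total budget spent)."""
    survivors = list(configs)
    budget = min_budget
    spent = 0
    while len(survivors) > 1:
        losses = [evaluate(c, budget) for c in survivors]  # cheap, low-fidelity runs
        spent += budget * len(survivors)
        ranked = sorted(range(len(survivors)), key=losses.__getitem__)
        keep = max(1, len(survivors) // eta)
        survivors = [survivors[i] for i in ranked[:keep]]  # drop the worst early
        budget *= eta  # survivors get a larger budget on the next rung
    return survivors[0], spent

def toy_learning_curve(lr, epochs):
    """Hypothetical stand-in for 'train for `epochs` epochs and return validation loss'."""
    final_loss = 0.05 + 0.1 * (math.log10(lr) + 2.0) ** 2  # best near lr = 1e-2
    return final_loss + 1.0 / (epochs + 1)  # loss falls as training continues

if __name__ == "__main__":
    lrs = [10 ** (-k / 2) for k in range(1, 10)]  # 9 learning rates, about 0.32 down to 3.2e-5
    best_lr, cost = successive_halving(lrs, toy_learning_curve)
    print(best_lr, cost)
```

With 9 configurations and eta = 3 this spends 18 epochs in total (9 × 1 + 3 × 3), compared with 81 epochs to train all nine configurations for 9 epochs each. The savings depend on early rankings predicting final rankings, which the toy curve guarantees but real learning curves do not.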