Bayesian Optimization for Design is a sample-efficient optimization technique that builds a probabilistic surrogate model (typically a Gaussian process) of an expensive-to-evaluate objective function and uses acquisition functions to intelligently select the next design point to evaluate — maximizing information gain while balancing exploration and exploitation. This makes it ideal for chip design problems where each evaluation requires hours of synthesis, simulation, or physical implementation.
Bayesian Optimization Framework:
- Surrogate Model (Gaussian Process): probabilistic model that provides both mean prediction μ(x) and uncertainty σ(x) for any design point x; trained on observed data points (x_i, y_i) from previous evaluations; kernel function (RBF, Matérn) encodes smoothness assumptions about objective landscape
- Acquisition Function: determines which point to evaluate next; balances exploitation (sampling where μ(x) is high) and exploration (sampling where σ(x) is high); common functions include Expected Improvement (EI), Upper Confidence Bound (UCB), and Probability of Improvement (PI)
- Sequential Decision Making: iterative process — fit GP to observed data, optimize acquisition function to find next point, evaluate expensive objective at that point, update GP with new observation; continues until budget exhausted or convergence
- Multi-Fidelity Extension: leverages cheap low-fidelity evaluations (fast simulation, analytical models) and expensive high-fidelity evaluations (full synthesis, gate-level simulation); GP models correlation between fidelities; reduces total cost by 5-10×
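The fit–select–evaluate–update loop above can be sketched end to end. The following is a minimal pure-Python illustration (no BO library), assuming a 1-D design space, a fixed RBF length scale, a UCB acquisition maximized over a grid, and a toy quadratic objective standing in for an expensive synthesis or simulation run:

```python
import math

def rbf(a, b, ls=0.2):
    # RBF kernel: encodes the smoothness assumption about the landscape
    return math.exp(-0.5 * (a - b) ** 2 / ls ** 2)

def solve(A, b):
    # Gaussian elimination with partial pivoting (fine for small systems)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(X, y, xq, noise=1e-4):
    # Posterior mean mu(xq) and std sigma(xq) of a zero-mean GP
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    kq = [rbf(a, xq) for a in X]
    alpha = solve(K, y)                  # K^-1 y
    v = solve(K, kq)                     # K^-1 k(xq)
    mu = sum(k * a for k, a in zip(kq, alpha))
    var = max(rbf(xq, xq) - sum(k * w for k, w in zip(kq, v)), 1e-12)
    return mu, math.sqrt(var)

def objective(x):
    # Stand-in for an expensive evaluation (hours of tool runtime in practice)
    return -(x - 0.6) ** 2

X = [0.1, 0.9]                           # initial design points
y = [objective(x) for x in X]
grid = [i / 200 for i in range(201)]     # candidate design points
for _ in range(10):
    # Fit GP to observed data, then maximize UCB = mu + 2*sigma over the grid
    def ucb(x):
        mu, sigma = gp_posterior(X, y, x)
        return mu + 2.0 * sigma
    x_next = max(grid, key=ucb)
    X.append(x_next)
    y.append(objective(x_next))          # evaluate expensive objective

best = X[max(range(len(y)), key=lambda i: y[i])]
print("best design point found:", round(best, 2))
```

In a real flow the grid search over the acquisition would be replaced by a proper optimizer, the length scale and noise would be learned by maximizing the marginal likelihood, and a library such as BoTorch would supply the GP machinery.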
Acquisition Functions:
- Expected Improvement (EI): EI(x) = E[max(f(x) - f_best, 0)] where f_best is current best observation; analytically computable for GP; balances exploration and exploitation naturally; most widely used acquisition function
- Upper Confidence Bound (UCB): UCB(x) = μ(x) + β·σ(x) where β controls exploration-exploitation trade-off; β=2-3 typical; theoretical regret bounds available; simpler than EI but requires tuning β
- Probability of Improvement (PI): PI(x) = P(f(x) > f_best + ξ) where ξ is exploration parameter; more exploitative than EI; useful when finding any improvement is valuable
- Knowledge Gradient: estimates value of information from evaluating x; considers not just immediate improvement but future optimization benefit; more sophisticated but computationally expensive
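For a GP surrogate, the three closed-form acquisitions above reduce to simple arithmetic on the posterior mean and standard deviation at a candidate point. A minimal sketch (the candidate values at the bottom are illustrative, not from any real design run):

```python
import math

def phi(z):
    # Standard normal pdf
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    # Standard normal cdf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, f_best):
    # EI(x) = (mu - f_best) * Phi(z) + sigma * phi(z), z = (mu - f_best) / sigma
    if sigma <= 0.0:
        return max(mu - f_best, 0.0)
    z = (mu - f_best) / sigma
    return (mu - f_best) * Phi(z) + sigma * phi(z)

def upper_confidence_bound(mu, sigma, beta=2.0):
    # UCB(x) = mu(x) + beta * sigma(x)
    return mu + beta * sigma

def probability_of_improvement(mu, sigma, f_best, xi=0.01):
    # PI(x) = P(f(x) > f_best + xi)
    if sigma <= 0.0:
        return float(mu > f_best + xi)
    return Phi((mu - f_best - xi) / sigma)

# A candidate with modest mean but high uncertainty versus a confident
# candidate that is just below the incumbent (f_best = 0.5):
print(expected_improvement(0.4, 0.5, 0.5))    # uncertain candidate
print(expected_improvement(0.49, 0.01, 0.5))  # confident but marginal
```

Note how EI scores the uncertain candidate higher than the confident-but-marginal one even though its mean is lower; that is the exploration term sigma·phi(z) at work.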
Applications in Chip Design:
- EDA Tool Parameter Tuning: optimize synthesis, placement, and routing tool settings; 20-50 parameters typical (effort levels, optimization strategies, timing constraints); each evaluation requires 1-6 hours of tool runtime; BO finds near-optimal settings in 50-200 evaluations vs thousands for grid search
- Analog Circuit Optimization: optimize transistor sizes, bias currents, and component values; objectives include gain, bandwidth, power, noise; constraints on stability, linearity, and supply voltage; BO handles expensive SPICE simulations efficiently
- Architecture Design Space Exploration: optimize processor microarchitecture parameters (cache sizes, pipeline depth, issue width); each evaluation requires RTL synthesis and cycle-accurate simulation; BO discovers high-performance configurations with 10-100× fewer evaluations than random search
- Process Variation Optimization: optimize design parameters for robustness to manufacturing variations; each evaluation requires Monte Carlo SPICE simulation (100-1000 samples); BO with multi-fidelity (few samples for exploration, many samples for promising designs) reduces total simulation time
Advanced BO Techniques:
- Batch Bayesian Optimization: selects multiple points to evaluate in parallel; acquisition functions extended to the batch setting (q-EI, q-UCB); enables parallel evaluation on a compute cluster; reduces wall-clock time roughly in proportion to batch size
- Constrained Bayesian Optimization: handles design constraints (timing closure, power budget, area limit); separate GP models constraint functions; acquisition function modified to favor feasible regions; discovers optimal designs satisfying all constraints
- Multi-Objective Bayesian Optimization: discovers Pareto frontier for competing objectives (power vs performance); acquisition functions extended to multi-objective setting (EHVI, ParEGO); provides designer with diverse trade-off options
- Transfer Learning: leverages data from previous design projects; GP prior incorporates knowledge from related designs; reduces cold-start problem; achieves good results with fewer evaluations on new design
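One common formulation of the constrained acquisition above weights EI by the probability that the constraint GP predicts a feasible value. A sketch assuming a single constraint (say, predicted power against a budget) with a Gaussian posterior; all numbers below are illustrative:

```python
import math

def phi(z):
    # Standard normal pdf
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    # Standard normal cdf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, f_best):
    if sigma <= 0.0:
        return max(mu - f_best, 0.0)
    z = (mu - f_best) / sigma
    return (mu - f_best) * Phi(z) + sigma * phi(z)

def constrained_ei(mu_f, sigma_f, f_best, mu_c, sigma_c, limit):
    # A separate GP models the constraint c(x); feasible when c(x) <= limit.
    # Acquisition = EI on the objective, weighted by P(feasibility).
    p_feasible = Phi((limit - mu_c) / sigma_c)
    return expected_improvement(mu_f, sigma_f, f_best) * p_feasible

# Two candidates with identical objective posteriors; the one whose
# predicted power is well over the 1.0 budget scores much lower:
print(constrained_ei(1.0, 0.3, 0.8, mu_c=0.9, sigma_c=0.1, limit=1.0))
print(constrained_ei(1.0, 0.3, 0.8, mu_c=1.3, sigma_c=0.1, limit=1.0))
```

The product form steers the search toward feasible regions without ever hard-rejecting a candidate, which matters early on when the constraint model is still uncertain.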
Practical Considerations:
- Kernel Selection: RBF kernel assumes smooth objective; Matérn kernel allows roughness control; automatic relevance determination (ARD) learns per-dimension length scales; kernel choice affects sample efficiency
- Initialization: Latin hypercube sampling or Sobol sequences for initial design points; 5-10× dimensionality typical (50-100 points for 10D problem); good initialization accelerates convergence
- Computational Cost: exact GP training is O(n³) in the number of observations; becomes expensive beyond ~1,000 observations; sparse GP approximations (inducing points, variational inference) scale to 10,000+ observations
- Hyperparameter Optimization: GP hyperparameters (length scales, noise variance) optimized by maximizing marginal likelihood; critical for good performance; periodic re-optimization as more data collected
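The Latin hypercube initialization mentioned above is easy to sketch in pure Python: each dimension is partitioned into n equal strata and each stratum is sampled exactly once, so every 1-D projection of the initial design is evenly covered. This is an illustrative sketch; in practice a library routine (e.g. SciPy's quasi-Monte Carlo samplers) would be used:

```python
import random

def latin_hypercube(n_points, n_dims, seed=0):
    # Each column: shuffled stratum indices, jittered within each stratum,
    # scaled to the unit interval. Rows are the sample points.
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_points))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_points for s in strata])
    return [[col[i] for col in columns] for i in range(n_points)]

# 50 initial points for a 10-D problem (the ~5x-dimensionality rule of thumb)
init = latin_hypercube(50, 10)
print(len(init), len(init[0]))
```

Unlike uniform random sampling, no dimension ends up with clustered values and empty gaps, which gives the initial GP fit a better-conditioned picture of per-dimension length scales.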
Commercial and Research Tools:
- Synopsys DSO.ai: uses Bayesian optimization (among other techniques) for design space exploration; reported 10-20% PPA improvements; deployed in production tape-outs
- Cadence Cerebrus: ML-driven optimization includes BO-like techniques; predicts design outcomes and guides parameter selection
- Academic Tools (BoTorch, GPyOpt, Spearmint): open-source BO libraries; demonstrated on processor design, FPGA optimization, and analog circuit sizing; enable research and prototyping
- Case Studies: ARM processor design (30% energy reduction with 200 BO evaluations); FPGA place-and-route (15% frequency improvement with 100 evaluations); analog amplifier (meets specs with 50 evaluations vs 500 for manual tuning)
Performance Comparison:
- BO vs Random Search: BO achieves same quality with 10-100× fewer evaluations; critical when evaluations are expensive (hours each); random search only competitive for very cheap evaluations
- BO vs Genetic Algorithms: BO more sample-efficient (fewer evaluations); GA better for very high-dimensional spaces (>50D) and discrete combinatorial problems; BO preferred for continuous optimization with expensive evaluations
- BO vs Gradient-Based: BO handles non-differentiable, noisy, and black-box objectives; gradient methods faster when gradients available; BO preferred for EDA tools where gradients unavailable
Bayesian optimization represents the state-of-the-art in sample-efficient design optimization — its principled probabilistic approach to balancing exploration and exploitation makes it the method of choice for expensive chip design problems where evaluation budgets are limited and each design iteration costs hours of computation, enabling discovery of high-quality designs with minimal wasted effort.