Bayesian Optimization for Process

Bayesian Optimization for Process is a sample-efficient probabilistic framework for finding optimal semiconductor process conditions with as few experimental runs as possible. It uses a Gaussian Process surrogate model to build a probabilistic map of the process response surface, and an acquisition function to balance exploration of uncertain regions against exploitation of known high-performance areas. This lets engineers optimize complex multi-variable recipes against targets such as etch rate, uniformity, and defect density with what is often reported as 5-20x fewer experiments than traditional Design of Experiments (DoE) approaches.

The Core Challenge: Expensive Black-Box Optimization

Semiconductor process optimization faces constraints that make exhaustive grid searches and gradient-based optimizers impractical:
- Each experiment costs hours of tool time and thousands of dollars in wafer cost
- Process responses are noisy (wafer-to-wafer variation, measurement uncertainty)
- The parameter space is high-dimensional (10-50+ variables: power, pressure, gas flows, temperature, time)
- The objective function has no analytical form — only experimental measurements exist

Bayesian Optimization was developed precisely for this setting: find the global optimum of an expensive, noisy, black-box function in as few evaluations as possible.

Algorithm Structure

Bayesian Optimization iterates three steps:

Step 1 — Surrogate model fitting: A Gaussian Process (GP) is fit to all previously observed (parameter, response) pairs. The GP provides both a mean prediction μ(x) and uncertainty estimate σ(x) at every point in parameter space.

Step 2 — Acquisition function optimization: An acquisition function α(x) is maximized over the parameter space to select the next experiment. This is a cheap optimization (no physical experiments required) that determines where to explore next.

Step 3 — Experiment and update: Run the physical experiment at the selected parameters, observe the response, add to the dataset, return to Step 1.
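The three-step loop can be sketched with NumPy alone. This is a minimal sketch under stated assumptions, not a production implementation: a single input normalized to [0, 1], a squared-exponential kernel with fixed (not fitted) hyperparameters, Expected Improvement maximized over a dense grid, and a hypothetical `hypothetical_process` function standing in for a real wafer run. All function names are illustrative.

```python
import math

import numpy as np


def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential kernel between row sets a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_query, noise=1e-3):
    """Step 1: posterior mean and std of a zero-mean GP with fixed hyperparameters."""
    chol = np.linalg.cholesky(rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train)))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    k_tq = rbf_kernel(x_train, x_query)
    mu = k_tq.T @ alpha
    v = np.linalg.solve(chol, k_tq)
    var = np.clip(1.0 - (v**2).sum(axis=0), 1e-12, None)  # prior variance is 1.0
    return mu, np.sqrt(var)


def expected_improvement(mu, sigma, f_best):
    """Step 2 (maximization): closed-form E[max(f(x) - f_best, 0)]."""
    z = (mu - f_best) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mu - f_best) * cdf + sigma * pdf


def hypothetical_process(x, rng):
    """Hypothetical stand-in for a wafer run: noisy response peaking at x = 0.7."""
    return float(-(x[0] - 0.7) ** 2 + 0.005 * rng.normal())


def bayes_opt(objective, n_init=3, n_iter=12, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 501)[:, None]  # cheap grid searched in Step 2
    x_obs = rng.uniform(0.0, 1.0, size=(n_init, 1))
    y_obs = np.array([objective(x, rng) for x in x_obs])
    for _ in range(n_iter):
        # Standardize responses so the unit-variance GP prior is a reasonable fit.
        y_mean, y_std = y_obs.mean(), y_obs.std() + 1e-9
        y_scaled = (y_obs - y_mean) / y_std
        mu, sigma = gp_posterior(x_obs, y_scaled, candidates)  # Step 1
        ei = expected_improvement(mu, sigma, y_scaled.max())
        x_next = candidates[np.argmax(ei)]  # Step 2
        y_next = objective(x_next, rng)  # Step 3: run the experiment
        x_obs = np.vstack([x_obs, x_next])
        y_obs = np.append(y_obs, y_next)
    best = int(np.argmax(y_obs))
    return x_obs[best], y_obs[best]


best_x, best_y = bayes_opt(hypothetical_process)
print(f"best setting x = {best_x[0]:.3f}, response = {best_y:.4f}")
```

Searching a grid keeps Step 2 trivially cheap in one dimension; with many recipe variables a grid is infeasible, and a random-candidate or multi-start optimizer over the acquisition function would be the usual substitute.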

Acquisition Functions: Balancing Exploration vs Exploitation

| Acquisition Function | Formula (maximization) | Behavior |
|---------------------|---------|---------|
| Expected Improvement (EI) | E[max(f(x) - f_best, 0)] | Weighs both the probability and the size of improvement; often leans toward exploitation near known optima |
| Upper Confidence Bound (UCB) | μ(x) + κ·σ(x) | κ controls the exploration-exploitation trade-off |
| Probability of Improvement (PI) | P(f(x) > f_best + ξ) | Exploitative; with small ξ it can stall near the current best and miss the global optimum |
| Thompson Sampling | Draw a function from the GP posterior, pick its maximizer | Independent draws parallelize naturally for batch experiments |

EI and UCB are most commonly used in semiconductor applications. κ in UCB is the key hyperparameter — large κ explores uncertain regions, small κ exploits known good areas.
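The acquisition functions in the table can be written directly in terms of the GP posterior mean μ and standard deviation σ. The sketch below assumes maximization and Gaussian posteriors; the two candidate recipes A and B and their μ, σ values are made up purely to contrast the behaviors. Thompson sampling is shown as a batch variant over a finite candidate set, assuming the posterior mean vector and covariance matrix for those candidates are already available from the GP.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def norm_pdf(z):
    z = np.asarray(z, dtype=float)
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)


def expected_improvement(mu, sigma, f_best):
    """Closed form of E[max(f(x) - f_best, 0)] for f(x) ~ N(mu, sigma^2)."""
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm_cdf(z) + sigma * norm_pdf(z)


def upper_confidence_bound(mu, sigma, kappa):
    """mu + kappa * sigma; a larger kappa rewards uncertainty (exploration)."""
    return mu + kappa * sigma


def probability_of_improvement(mu, sigma, f_best, xi=0.01):
    """P(f(x) > f_best + xi) for f(x) ~ N(mu, sigma^2)."""
    return norm_cdf((mu - f_best - xi) / sigma)


def thompson_batch(mean, cov, batch_size, rng):
    """One posterior draw per batch slot; each draw's argmax is one experiment."""
    draws = rng.multivariate_normal(mean, cov, size=batch_size)
    return np.argmax(draws, axis=1)


# Two hypothetical candidate recipes: A is well characterized, B is barely explored.
mu = np.array([1.0, 0.8])
sigma = np.array([0.05, 0.5])
f_best = 0.95  # best response observed so far (illustrative)

picks = {
    "EI": int(np.argmax(expected_improvement(mu, sigma, f_best))),  # 1 (recipe B)
    "UCB, kappa=0.1": int(np.argmax(upper_confidence_bound(mu, sigma, 0.1))),  # 0 (A)
    "UCB, kappa=2.0": int(np.argmax(upper_confidence_bound(mu, sigma, 2.0))),  # 1 (B)
    "PI": int(np.argmax(probability_of_improvement(mu, sigma, f_best))),  # 0 (A)
}
print(picks)

# Diagonal covariance here for brevity; a real GP would supply the full posterior covariance.
rng = np.random.default_rng(0)
batch = thompson_batch(mu, np.diag(sigma**2), batch_size=4, rng=rng)
print("Thompson batch picks:", batch)
```

With these numbers, PI and small-κ UCB choose the well-characterized recipe A, while EI and large-κ UCB choose the uncertain recipe B, because B's large σ makes a sizable improvement plausible. Which candidate "wins" therefore depends on both the acquisition function and its tuning parameter.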

Gaussian Process Surrogate Model

The GP models the process response as a random function with prior covariance structure defined by a kernel:
- Matérn 5/2 kernel: Standard choice for responses that are smooth but not infinitely differentiable
- RBF (squared exponential): Assumes infinitely smooth responses and can oversmooth semiconductor data
- Automatic Relevance Determination (ARD): Not a separate kernel but an option for either kernel above, using a separate length scale per input dimension; the fitted length scales identify influential parameters (short length scale) and weakly influential ones (long length scale)

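The GP posterior supplies the uncertainty estimates that acquisition functions rely on: regions with sparse data have high σ(x), which attracts exploration. How well calibrated σ(x) is depends on the kernel and noise hyperparameters, which are typically fit by maximizing the GP's log marginal likelihood.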
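A Matérn 5/2 kernel with ARD length scales, and the GP posterior it induces, can be sketched with NumPy. Assumptions: inputs already scaled to [0, 1], fixed kernel variance and noise, and a synthetic two-input "recipe" whose response depends only on the first input. Instead of a full hyperparameter optimizer, the sketch compares the log marginal likelihood of two candidate length-scale settings, which is the quantity such an optimizer would maximize. Function names are illustrative.

```python
import numpy as np


def matern52_ard(a, b, length_scales, variance=1.0):
    """Matérn 5/2 kernel with one length scale per input dimension (ARD)."""
    diff = (a[:, None, :] - b[None, :, :]) / np.asarray(length_scales, dtype=float)
    s = np.sqrt(5.0) * np.sqrt((diff**2).sum(axis=-1))  # sqrt(5) * scaled distance r
    return variance * (1.0 + s + s**2 / 3.0) * np.exp(-s)


def gp_posterior_and_lml(x_train, y_train, x_query, length_scales, noise=1e-2):
    """Zero-mean GP posterior (mu, sigma) plus the log marginal likelihood of the data."""
    n = len(x_train)
    chol = np.linalg.cholesky(matern52_ard(x_train, x_train, length_scales) + noise * np.eye(n))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    k_q = matern52_ard(x_train, x_query, length_scales)
    mu = k_q.T @ alpha
    v = np.linalg.solve(chol, k_q)
    sigma = np.sqrt(np.clip(1.0 - (v**2).sum(axis=0), 1e-12, None))  # prior variance 1.0
    # log p(y) = -0.5 y^T K^-1 y - 0.5 log|K| - (n/2) log(2 pi), with log|K| from Cholesky
    lml = -0.5 * y_train @ alpha - np.log(np.diag(chol)).sum() - 0.5 * n * np.log(2.0 * np.pi)
    return mu, sigma, lml


# Synthetic 2-input recipe (hypothetical): only input 0 affects the response.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=(30, 2))
y = np.sin(6.0 * x[:, 0]) + 0.05 * rng.normal(size=30)
queries = np.array([[0.5, 0.5], [5.0, 5.0]])  # inside vs far outside the sampled region

_, _, lml_input0 = gp_posterior_and_lml(x, y, queries, length_scales=[0.2, 10.0])
_, _, lml_input1 = gp_posterior_and_lml(x, y, queries, length_scales=[10.0, 0.2])
mu, sigma, _ = gp_posterior_and_lml(x, y, queries, length_scales=[0.2, 10.0])

print(f"log marginal likelihood, input 0 relevant: {lml_input0:.1f}")
print(f"log marginal likelihood, input 1 relevant: {lml_input1:.1f}")
print("posterior sigma (near data, far away):", sigma.round(3))
```

The setting with a short length scale on the influential input and a long one on the irrelevant input scores the higher log marginal likelihood, which is how ARD flags influential parameters. The far-away query point gets σ close to the prior value of 1, the behavior that draws acquisition functions toward unexplored regions.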

Multi-Objective Extensions

Real semiconductor process optimization involves trade-offs:
- Etch rate vs. selectivity vs. profile angle
- Deposition rate vs. film stress vs. step coverage
- Throughput vs. particle contamination

Multi-objective Bayesian Optimization, for example with Expected Hypervolume Improvement (EHVI), searches for the Pareto front: the set of recipes where no objective can be improved without worsening another. This maps the trade-off curves between competing objectives without requiring the engineer to pre-specify objective weights.
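For two objectives, EHVI can be estimated by Monte Carlo: sample a candidate's objective values from the GP posterior, measure how much each sample grows the hypervolume dominated by the current Pareto front (relative to a reference point), and average the gains. The sketch assumes both objectives are maximized and that the two posteriors are independent Gaussians; the observed values (read as etch rate and selectivity in arbitrary units) and the candidate posteriors are made up. This brute-force estimate is chosen for clarity, not speed.

```python
import numpy as np


def pareto_front(points):
    """Non-dominated rows of an (n, 2) array; both objectives are maximized."""
    keep = []
    for p in points:
        dominated = np.any(np.all(points >= p, axis=1) & np.any(points > p, axis=1))
        if not dominated:
            keep.append(p)
    return np.array(keep)


def hypervolume_2d(points, ref):
    """Area dominated by the Pareto front of `points` and bounded below by `ref`."""
    front = pareto_front(points)
    front = front[np.all(front > ref, axis=1)]
    front = front[np.argsort(-front[:, 0])]  # objective 1 descending => objective 2 ascending
    area, prev_y = 0.0, ref[1]
    for x, y in front:
        area += (x - ref[0]) * (y - prev_y)  # add one horizontal strip per front point
        prev_y = y
    return area


def ehvi_monte_carlo(mu, sigma, observed, ref, n_samples=1000, rng=None):
    """Monte Carlo EHVI: average hypervolume gain over posterior samples of one candidate."""
    rng = np.random.default_rng(0) if rng is None else rng
    base = hypervolume_2d(observed, ref)
    samples = rng.normal(mu, sigma, size=(n_samples, 2))  # independent objectives (assumption)
    gains = [hypervolume_2d(np.vstack([observed, s]), ref) - base for s in samples]
    return float(np.mean(gains))


# Hypothetical observed recipes: (etch rate, selectivity) in arbitrary units.
observed = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
ref = np.array([0.0, 0.0])

print("current hypervolume:", hypervolume_2d(observed, ref))  # 6.0
ehvi_a = ehvi_monte_carlo(np.array([2.5, 2.5]), np.array([0.1, 0.1]), observed, ref)
ehvi_b = ehvi_monte_carlo(np.array([1.0, 1.0]), np.array([0.1, 0.1]), observed, ref)
print(f"EHVI candidate A: {ehvi_a:.3f}, candidate B: {ehvi_b:.3f}")
```

Candidate A is predicted to push the front outward, so its EHVI is positive; candidate B sits well inside the dominated region, so none of its samples add hypervolume and its EHVI is zero. No weighting between the two objectives is needed at any point.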

Semiconductor Applications

- Etch recipe optimization: RF power vs. pressure vs. gas ratio for target CD, profile, and selectivity
- CVD process development: Temperature, pressure, precursor ratio for target deposition rate and film properties
- CMP recipe tuning: Pressure, velocity, slurry flow rate for planarization rate and WIWNU (within-wafer non-uniformity)
- Lithography dose/focus optimization: Scanner parameters for maximizing process window

Industrial implementations are reported to cut recipe development time from weeks to days, with Bayesian Optimization needing 20-50 experiments where classical DoE needs 100-500 to cover an equivalent parameter space.
