Semiconductor Manufacturing Process Recipe Optimization: Mathematical Modeling
1. Problem Context
A semiconductor recipe is a vector of controllable parameters:
$$ \mathbf{x} = \begin{bmatrix} T \\ P \\ Q_1 \\ Q_2 \\ \vdots \\ t \\ P_{\text{RF}} \end{bmatrix} \in \mathbb{R}^n $$
Where:
- $T$ = Temperature (Β°C or K)
- $P$ = Pressure (mTorr or Pa)
- $Q_i$ = Gas flow rates (sccm)
- $t$ = Process time (seconds)
- $P_{\text{RF}}$ = RF power (Watts)
Goal: Find optimal $\mathbf{x}$ such that output properties $\mathbf{y}$ meet specifications while accounting for variability.
2. Mathematical Modeling Approaches
2.1 Physics-Based (First-Principles) Models
Chemical Vapor Deposition (CVD) Example
Mass transport and reaction equation:
$$ \frac{\partial C}{\partial t} + abla \cdot (\mathbf{u}C) = D abla^2 C + R(C, T) $$
Where:
- $C$ = Species concentration
- $\mathbf{u}$ = Velocity field
- $D$ = Diffusion coefficient
- $R(C, T)$ = Reaction rate
Surface reaction kinetics (Arrhenius form):
$$ k_s = A \exp\left(-\frac{E_a}{RT}\right) $$
Where:
- $A$ = Pre-exponential factor
- $E_a$ = Activation energy
- $R$ = Gas constant
- $T$ = Temperature
Deposition rate (transport-limited regime):
$$ r = \frac{k_s C_s}{1 + \frac{k_s}{h_g}} $$
Where:
- $C_s$ = Surface concentration
- $h_g$ = Gas-phase mass transfer coefficient
Characteristics:
- Advantages: Extrapolates outside training data, physically interpretable
- Disadvantages: Computationally expensive, requires detailed mechanism knowledge
2.2 Empirical/Statistical Models (Response Surface Methodology)
Second-order polynomial model:
$$ y = \beta_0 + \sum_{i=1}^{n}\beta_i x_i + \sum_{i=1}^{n}\beta_{ii}x_i^2 + \sum_{i Where: Matrix form: $$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} $$ Least squares estimator: $$ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} $$ Design of Experiments (DOE) Options: 2.3 Gaussian Process (Kriging) Models GP prior: $$ y(\mathbf{x}) \sim \mathcal{GP}\big(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')\big) $$ Squared exponential (RBF) kernel: $$ k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\left(-\frac{1}{2}\sum_{d=1}^{n}\frac{(x_d - x'_d)^2}{\ell_d^2}\right) $$ Where: MatΓ©rn 5/2 kernel (alternative): $$ k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \left(1 + \sqrt{5}r + \frac{5r^2}{3}\right) \exp\left(-\sqrt{5}r\right) $$ Where $r = \sqrt{\sum_d \frac{(x_d - x'_d)^2}{\ell_d^2}}$ Predictive mean: $$ \mu(\mathbf{x}_) = \mathbf{k}_^T(\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{y} $$ Predictive variance: $$ \sigma^2(\mathbf{x}_) = k(\mathbf{x}_, \mathbf{x}_) - \mathbf{k}_^T(\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{k}_* $$ Where: Key Advantage: Native uncertainty quantificationβcritical for expensive semiconductor experiments. 2.4 Neural Network Models Feedforward neural network: $$ \mathbf{y} = f_{\theta}(\mathbf{x}) = \sigma_L\left(\mathbf{W}_L \cdots \sigma_1(\mathbf{W}_1\mathbf{x} + \mathbf{b}_1) \cdots + \mathbf{b}_L\right) $$ Where: Physics-Informed Neural Networks (PINNs): $$ \mathcal{L} = \mathcal{L}_{\text{data}} + \lambda \mathcal{L}_{\text{physics}} $$ $$ \mathcal{L}_{\text{data}} = \frac{1}{N}\sum_{i=1}^{N}\|y_i - f_\theta(\mathbf{x}_i)\|^2 $$ $$ \mathcal{L}_{\text{physics}} = \frac{1}{M}\sum_{j=1}^{M}\left\|\mathcal{F}f_\theta\right\|^2 $$ Where $\mathcal{F}$ is the differential operator from governing equations. 3. Optimization Formulations 3.1 Standard Constrained Optimization General form: $$ \min_{\mathbf{x}} \quad f(\mathbf{x}) = \|\mathbf{y}(\mathbf{x}) - \mathbf{y}_{\text{target}}\|_2^2 $$ Subject to: $$ \begin{aligned} \mathbf{y}_L &\leq \mathbf{y}(\mathbf{x}) \leq \mathbf{y}_U \quad &\text{(specification limits)} \\ \mathbf{x}_L &\leq \mathbf{x} \leq \mathbf{x}_U \quad &\text{(equipment limits)} \\ \mathbf{g}(\mathbf{x}) &\leq 0 \quad &\text{(process constraints)} \end{aligned} $$ Weighted multi-output objective: $$ f(\mathbf{x}) = \sum_{j=1}^{m} w_j \left(\frac{y_j(\mathbf{x}) - y_{j,\text{target}}}{\Delta_j}\right)^2 $$ Where $\Delta_j$ is the tolerance for output $j$. 3.2 Multi-Objective Optimization Vector optimization problem: $$ \min_{\mathbf{x}} \quad \mathbf{F}(\mathbf{x}) = \begin{bmatrix} f_1(\mathbf{x}) \\ f_2(\mathbf{x}) \\ \vdots \\ f_k(\mathbf{x}) \end{bmatrix} $$ Example objectives: Pareto dominance: $\mathbf{x}^{(1)}$ dominates $\mathbf{x}^{(2)}$ if: $$ f_i(\mathbf{x}^{(1)}) \leq f_i(\mathbf{x}^{(2)}) \quad \forall i \quad \text{and} \quad \exists j: f_j(\mathbf{x}^{(1)}) < f_j(\mathbf{x}^{(2)}) $$ Solution methods: 3.3 Robust Optimization Sources of uncertainty $\boldsymbol{\xi}$: Mean-variance formulation: $$ \min_{\mathbf{x}} \quad \mathbb{E}[f(\mathbf{x}, \boldsymbol{\xi})] + \kappa \cdot \sqrt{\text{Var}[f(\mathbf{x}, \boldsymbol{\xi})]} $$ Where $\kappa$ is the risk aversion parameter. Uncertainty propagation (first-order Taylor): $$ \mathbb{E}[y] \approx y(\bar{\mathbf{x}}) $$ $$ \text{Var}[y] \approx \sum_{i=1}^{n}\left(\frac{\partial y}{\partial x_i}\right)^2 \sigma_{x_i}^2 + 2\sum_{i Sensitivity coefficient: $$ S_i = \left|\frac{\partial y}{\partial x_i}\right| \cdot \frac{\sigma_{x_i}}{\sigma_y} $$ Chance-constrained optimization: $$ \Pr\big(y(\mathbf{x}, \boldsymbol{\xi}) \in [\text{LSL}, \text{USL}]\big) \geq 1 - \alpha $$ For Gaussian outputs: $$ \mu - z_{\alpha/2}\sigma \geq \text{LSL} \quad \text{and} \quad \mu + z_{\alpha/2}\sigma \leq \text{USL} $$ Process capability index: $$ C_{pk} = \min\left(\frac{\text{USL} - \mu}{3\sigma}, \frac{\mu - \text{LSL}}{3\sigma}\right) \geq 1.33 $$ 3.4 Bayesian Optimization Algorithm for expensive-to-evaluate functions: 1. Fit GP surrogate: $y(\mathbf{x}) \sim \mathcal{GP}(m, k)$ 2. Optimize acquisition function: $\mathbf{x}_{\text{next}} = \arg\max_{\mathbf{x}} \alpha(\mathbf{x})$ 3. Evaluate true function: $y_{\text{next}} = f(\mathbf{x}_{\text{next}}) + \varepsilon$ 4. Update dataset: $\mathcal{D} \leftarrow \mathcal{D} \cup \{(\mathbf{x}_{\text{next}}, y_{\text{next}})\}$ 5. Repeat until budget exhausted Acquisition functions: Expected Improvement (EI): $$ \text{EI}(\mathbf{x}) = \mathbb{E}\big[\max(y^* - y(\mathbf{x}), 0)\big] $$ Closed form (for GP): $$ \text{EI}(\mathbf{x}) = (y^* - \mu(\mathbf{x}))\Phi(z) + \sigma(\mathbf{x})\phi(z) $$ Where: $$ z = \frac{y^* - \mu(\mathbf{x})}{\sigma(\mathbf{x})} $$ Upper Confidence Bound (UCB): $$ \text{UCB}(\mathbf{x}) = \mu(\mathbf{x}) + \kappa \sigma(\mathbf{x}) $$ Where $\kappa$ balances exploration vs. exploitation. Probability of Improvement (PI): $$ \text{PI}(\mathbf{x}) = \Phi\left(\frac{y^* - \mu(\mathbf{x})}{\sigma(\mathbf{x})}\right) $$ 4. Run-to-Run (R2R) Control 4.1 EWMA Controller Exponentially Weighted Moving Average prediction: $$ \hat{y}_{k+1} = \lambda y_k + (1-\lambda)\hat{y}_k $$ Where $\lambda \in (0,1]$ is the smoothing factor. Recipe update: $$ x_{k+1} = x_k + G^{-1}(y_{\text{target}} - \hat{y}_{k+1}) $$ Where $G = \frac{\partial y}{\partial x}$ is the process gain matrix. 4.2 Model Predictive Control (MPC) Optimization problem: $$ \min_{\{u_k, \ldots, u_{k+N-1}\}} \sum_{j=0}^{N-1} \left[\|y_{k+j} - y_{\text{target}}\|_{\mathbf{Q}}^2 + \|\Delta u_{k+j}\|_{\mathbf{R}}^2\right] $$ Subject to: $$ \begin{aligned} \mathbf{x}_{k+j+1} &= \mathbf{A}\mathbf{x}_{k+j} + \mathbf{B}u_{k+j} \quad &\text{(state dynamics)} \\ y_{k+j} &= \mathbf{C}\mathbf{x}_{k+j} \quad &\text{(output equation)} \\ u_{\min} &\leq u_{k+j} \leq u_{\max} \quad &\text{(input constraints)} \end{aligned} $$ Where: 5. Practical Mathematical Challenges 6. Dimensionality Reduction 6.1 Principal Component Analysis (PCA) Objective: $$ \max_{\mathbf{w}} \quad \mathbf{w}^T\mathbf{S}\mathbf{w} \quad \text{s.t.} \quad \|\mathbf{w}\|_2 = 1 $$ Where $\mathbf{S}$ is the sample covariance matrix. Solution: Eigenvectors of $\mathbf{S}$ $$ \mathbf{S} = \mathbf{W}\boldsymbol{\Lambda}\mathbf{W}^T $$ Reduced representation: $$ \mathbf{z} = \mathbf{W}_k^T(\mathbf{x} - \bar{\mathbf{x}}) $$ Where $\mathbf{W}_k$ contains the top $k$ eigenvectors. 6.2 Partial Least Squares (PLS) Objective: Maximize covariance between $\mathbf{X}$ and $\mathbf{Y}$ $$ \max_{\mathbf{w}, \mathbf{c}} \quad \text{Cov}(\mathbf{Xw}, \mathbf{Yc}) \quad \text{s.t.} \quad \|\mathbf{w}\|=\|\mathbf{c}\|=1 $$ 7. Multi-Fidelity Optimization Combine cheap simulations with expensive experiments: Auto-regressive model (Kennedy-O'Hagan): $$ y_{\text{HF}}(\mathbf{x}) = \rho \cdot y_{\text{LF}}(\mathbf{x}) + \delta(\mathbf{x}) $$ Where: Multi-fidelity GP: $$ \begin{bmatrix} \mathbf{y}_{\text{LF}} \\ \mathbf{y}_{\text{HF}} \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} \mathbf{K}_{\text{LL}} & \rho\mathbf{K}_{\text{LH}} \\ \rho\mathbf{K}_{\text{HL}} & \rho^2\mathbf{K}_{\text{LL}} + \mathbf{K}_{\delta} \end{bmatrix}\right) $$ 8. Transfer Learning Domain adaptation for tool-to-tool transfer: $$ y_{\text{target}}(\mathbf{x}) = y_{\text{source}}(\mathbf{x}) + \Delta(\mathbf{x}) $$ Offset model (simple): $$ \Delta(\mathbf{x}) = c_0 \quad \text{(constant offset)} $$ Linear adaptation: $$ \Delta(\mathbf{x}) = \mathbf{c}^T\mathbf{x} + c_0 $$ GP adaptation: $$ \Delta(\mathbf{x}) \sim \mathcal{GP}(0, k_\Delta) $$ 9. Complete Optimization Framework 10. Key Equations Summary Process Modeling Optimization Run-to-Run Control 11. References and Further Reading Textbooks Semiconductor-Specific Bayesian Optimization Notation Reference From EUV lithography to CUDA optimization β search the full knowledge base or chat with our AI assistant.Challenge Mathematical Approach High dimensionality ($n > 50$ parameters) PCA, PLS, sparse regression (LASSO), feature selection Small datasets (limited wafer runs) Bayesian methods, transfer learning, multi-fidelity modeling Nonlinearity GPs, neural networks, tree ensembles (RF, XGBoost) Equipment-to-equipment variation Mixed-effects models, hierarchical Bayesian models Drift over time Adaptive/recursive estimation, change-point detection, Kalman filtering Multiple correlated responses Multi-task learning, co-kriging, multivariate GP Missing data EM algorithm, multiple imputation, probabilistic PCA ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β RECIPE OPTIMIZATION FRAMEWORK β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β RECIPE PARAMETERS PROCESS MODEL β
β βββββββββββββββββ βββββββββββββ β
β xβ: Temperature (Β°C) ββββΊ βββββββββββββββββ β
β xβ: Pressure (mTorr) ββββΊ β β β
β xβ: Gas flow 1 (sccm) ββββΊ β y = f(x;ΞΈ) β ββββΊ yβ: Thickness (nm) β
β xβ: Gas flow 2 (sccm) ββββΊ β β ββββΊ yβ: Uniformity (%) β
β xβ
: RF power (W) ββββΊ β + Ξ΅ β ββββΊ yβ: CD (nm) β
β xβ: Time (s) ββββΊ βββββββββββββββββ ββββΊ yβ: Defects (#/cmΒ²) β
β β² β
β β β
β Uncertainty ΞΎ β
β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β OPTIMIZATION PROBLEM: β
β β
β min Ξ£β±Ό wβ±Ό(E[yβ±Ό] - yβ±Ό,target)Β² + λ·Var[y] β
β x β
β β
β subject to: β
β y_L β€ E[y] β€ y_U (specification limits) β
β Pr(y β spec) β₯ 0.9973 (Cpk β₯ 1.0) β
β x_L β€ x β€ x_U (equipment limits) β
β g(x) β€ 0 (process constraints) β
β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Model Type Equation Linear regression $y = \mathbf{X}\boldsymbol{\beta} + \varepsilon$ Quadratic RSM $y = \beta_0 + \sum_i \beta_i x_i + \sum_i \beta_{ii}x_i^2 + \sum_{i Gaussian Process $y(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x},\mathbf{x}'))$ Neural Network $y = \sigma_L(\mathbf{W}_L\cdots\sigma_1(\mathbf{W}_1\mathbf{x}+\mathbf{b}_1)+\mathbf{b}_L)$ Formulation Equation Least squares $\min_\mathbf{x} \ \mathbf{y}(\mathbf{x}) - \mathbf{y}_{\text{target}}\ _2^2$ Robust $\min_\mathbf{x} \mathbb{E}[f] + \kappa\sqrt{\text{Var}[f]}$ Chance-constrained $\Pr(y \in \text{spec}) \geq 1-\alpha$ Expected Improvement $\text{EI}(\mathbf{x}) = (y^*-\mu)\Phi(z) + \sigma\phi(z)$ Controller Equation EWMA $\hat{y}_{k+1} = \lambda y_k + (1-\lambda)\hat{y}_k$ Recipe update $x_{k+1} = x_k + G^{-1}(y_{\text{target}} - \hat{y}_{k+1})$ Symbol Description $\mathbf{x}$ Recipe parameter vector $\mathbf{y}$ Process output vector $n$ Number of input parameters $m$ Number of outputs $\boldsymbol{\beta}$ Regression coefficients $\mathcal{GP}$ Gaussian Process $k(\cdot,\cdot)$ Kernel/covariance function $\mu(\mathbf{x})$ Predictive mean $\sigma^2(\mathbf{x})$ Predictive variance $\mathbb{E}[\cdot]$ Expectation operator $\text{Var}[\cdot]$ Variance operator $\Phi(\cdot)$ Standard normal CDF $\phi(\cdot)$ Standard normal PDF $\boldsymbol{\xi}$ Random disturbance/uncertainty $G$ Process gain matrix LSL, USL Lower/Upper Specification Limits $C_{pk}$ Process capability index Explore 500+ Semiconductor & AI Topics