
AI Factory Glossary

923 technical terms and definitions


precision,metrology

Closeness of agreement among repeated measurements of the same quantity (repeatability and reproducibility).

predictive metrology, metrology

Forecast future metrology results.

pressure sensor packaging, packaging

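Packaging considerations specific to pressure sensors, such as exposing the sensing element to the measured medium while isolating it from package-induced stress.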

process monitor structures, metrology

Test structures that track process performance.

process monitoring, semiconductor process control, spc, statistical process control, sensor data, fault detection, run-to-run control, process optimization

# Semiconductor Manufacturing Process Parameters Monitoring: Mathematical Modeling

## 1. The Fundamental Challenge

Modern semiconductor fabrication involves 500–1000+ sequential process steps. Each step has dozens of parameters requiring nanometer-scale precision.

### Key Process Types and Parameters

- **Lithography**: exposure dose, focus, overlay alignment, resist thickness
- **Etching (dry/wet)**: etch rate, selectivity, uniformity, plasma parameters (power, pressure, gas flows)
- **Deposition (CVD, PVD, ALD)**: deposition rate, film thickness, uniformity, stress, composition
- **CMP (Chemical Mechanical Polishing)**: removal rate, within-wafer non-uniformity, dishing, erosion
- **Implantation**: dose, energy, angle, uniformity
- **Thermal processes**: temperature uniformity, ramp rates, time

## 2. Statistical Process Control (SPC) — The Foundation

### 2.1 Univariate Control Charts

For a process parameter $X$ with samples $x_1, x_2, \ldots, x_n$:

**Sample Mean:**

$$ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i $$

**Sample Standard Deviation** (an estimate of the process standard deviation $\sigma$):

$$ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} $$

**Control Limits (3-sigma):**

$$ \text{UCL} = \bar{x} + 3s $$

$$ \text{LCL} = \bar{x} - 3s $$

### 2.2 Process Capability Indices

These indices quantify how well a process meets its specification limits (LSL, USL):

- **$C_p$ (Potential Capability):**
  $$ C_p = \frac{USL - LSL}{6\sigma} $$
- **$C_{pk}$ (Actual Capability)** — accounts for centering:
  $$ C_{pk} = \min\left[\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right] $$
- **$C_{pm}$ (Taguchi Index)** — penalizes deviation from target $T$:
  $$ C_{pm} = \frac{C_p}{\sqrt{1 + \left(\frac{\mu - T}{\sigma}\right)^2}} $$

Semiconductor fabs typically require $C_{pk} \geq 1.67$, which corresponds to defect rates below ~1 ppm.

## 3. Multivariate Statistical Monitoring

Process parameters are highly correlated, so univariate methods miss interaction effects.
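The control limits and capability indices from Section 2 are easy to compute directly. The Python sketch below uses illustrative function names. It assumes the sample standard deviation $s$ is an adequate estimate of $\sigma$; real SPC practice often uses subgroup or moving-range estimates instead.

```python
import statistics


def control_limits(samples):
    """Return (LCL, center line, UCL) as x_bar -/+ 3s, using the sample standard deviation s."""
    x_bar = statistics.mean(samples)
    s = statistics.stdev(samples)  # n - 1 denominator, as in Section 2.1
    return x_bar - 3 * s, x_bar, x_bar + 3 * s


def capability(mu, sigma, lsl, usl, target=None):
    """Return Cp and Cpk (and Cpm if a target T is given) for mean mu and std sigma."""
    cp = (usl - lsl) / (6 * sigma)
    cpk = min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))
    indices = {"Cp": cp, "Cpk": cpk}
    if target is not None:
        # Taguchi index: penalize the distance of the mean from the target
        indices["Cpm"] = cp / (1 + ((mu - target) / sigma) ** 2) ** 0.5
    return indices
```

For example, `capability(10.1, 0.1, 9.5, 10.5, target=10.0)` gives $C_p \approx 1.67$, $C_{pk} \approx 1.33$ and $C_{pm} \approx 1.18$. The spread is unchanged, but the 0.1 offset from target still costs capability.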
### 3.1 Principal Component Analysis (PCA)

Given a mean-centered data matrix $\mathbf{X}$ ($n$ samples × $p$ variables):

1. **Compute the covariance matrix:**
   $$ \mathbf{S} = \frac{1}{n-1}\mathbf{X}^T\mathbf{X} $$
2. **Eigendecomposition:**
   $$ \mathbf{S} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^T $$
3. **Project onto the principal components:**
   $$ \mathbf{T} = \mathbf{X}\mathbf{V} $$

### 3.2 Monitoring Statistics

#### Hotelling's $T^2$ Statistic

Captures variation **within** the PCA model:

$$ T^2 = \sum_{i=1}^{k} \frac{t_i^2}{\lambda_i} $$

Here $k$ is the number of retained components, $t_i$ is the score on component $i$, and $\lambda_i$ is the corresponding eigenvalue. Under normal operation, $T^2$ follows a scaled F-distribution.

#### Q-Statistic (Squared Prediction Error)

Captures variation **outside** the model:

$$ Q = \sum_{j=1}^{p}(x_j - \hat{x}_j)^2 = \|\mathbf{x} - \mathbf{x}\mathbf{V}_k\mathbf{V}_k^T\|^2 $$

Here $\mathbf{x}$ is a centered row vector and $\mathbf{V}_k$ holds the first $k$ eigenvectors.

> Often more sensitive to novel faults than $T^2$.

### 3.3 Partial Least Squares (PLS)

When relating process inputs $\mathbf{X}$ to quality outputs $\mathbf{Y}$:

$$ \mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E} $$

PLS finds latent variables that maximize the covariance between $\mathbf{X}$ and $\mathbf{Y}$. This gives both a monitoring capability and a predictive model.

## 4. Virtual Metrology (VM) Models

Virtual metrology predicts physical measurement outcomes from process sensor data. This enables 100% wafer coverage without the cost of physically measuring every wafer.
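The $T^2$ and $Q$ statistics of Section 3.2 can be sketched with NumPy as below. Two assumptions apply:

- Each variable is autoscaled to unit variance. This is common practice, but Section 3.1 only requires centering.
- Control limits for $T^2$ and $Q$ are not computed.

The function names are illustrative.

```python
import numpy as np


def fit_pca_monitor(X_train, k):
    """Fit a k-component PCA model on in-control data (rows = samples, columns = variables)."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Xs = (X_train - mean) / std                # autoscale (assumption, see lead-in)
    S = np.cov(Xs, rowvar=False)               # equals Xs.T @ Xs / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(S)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]      # keep the k largest
    return mean, std, eigvecs[:, order], eigvals[order]


def t2_and_q(x, model):
    """Return (Hotelling T^2, Q/SPE) for a single new sample x."""
    mean, std, V_k, lam_k = model
    xs = (x - mean) / std
    t = xs @ V_k                               # scores on the retained components
    t2 = float(np.sum(t**2 / lam_k))           # variation inside the model
    residual = xs - t @ V_k.T                  # x - x V_k V_k^T
    q = float(residual @ residual)             # variation outside the model
    return t2, q
```

Consider a sample that breaks the learned correlation structure, for example one variable rising while its usual partner falls. It shows up mainly in $Q$. A large excursion that still respects the correlations shows up in $T^2$ instead.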
### 4.1 Linear Models

For process parameters $\mathbf{x} \in \mathbb{R}^p$ and metrology target $y$:

- **Ordinary Least Squares (OLS):**
  $$ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} $$
- **Ridge Regression** ($L_2$ regularization for collinearity):
  $$ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y} $$
- **LASSO** ($L_1$ regularization for sparsity/feature selection):
  $$ \min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|^2 + \lambda\|\boldsymbol{\beta}\|_1 $$

### 4.2 Nonlinear Models

#### Gaussian Process Regression (GPR)

$$ y \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')) $$

**Posterior predictive distribution** (zero prior mean, noise variance $\sigma_n^2$):

- **Mean:**
  $$ \mu_* = \mathbf{K}_*^T(\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{y} $$
- **Variance:**
  $$ \sigma_*^2 = K_{**} - \mathbf{K}_*^T(\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{K}_* $$

The terms are defined as follows:

- $\mathbf{K}$ is the kernel matrix over the training inputs.
- $\mathbf{K}_*$ holds the kernel values between the training inputs and the test input $\mathbf{x}_*$.
- $K_{**} = k(\mathbf{x}_*, \mathbf{x}_*)$.

GPs provide uncertainty quantification, which is critical for deciding when to trigger actual metrology.

#### Support Vector Regression (SVR)

$$ \min_{\mathbf{w}, b, \xi, \xi^*} \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i(\xi_i + \xi_i^*) $$

This minimization is subject to the $\epsilon$-insensitive tube constraints:

- $y_i - \mathbf{w}^T\phi(\mathbf{x}_i) - b \leq \epsilon + \xi_i$
- $\mathbf{w}^T\phi(\mathbf{x}_i) + b - y_i \leq \epsilon + \xi_i^*$
- $\xi_i, \xi_i^* \geq 0$

The kernel trick enables nonlinear modeling.

#### Neural Networks

- **MLPs**: multi-layer perceptrons for general function approximation
- **CNNs**: convolutional neural networks for wafer map pattern recognition
- **LSTMs**: Long Short-Term Memory networks for time-series FDC traces

## 5. Run-to-Run (R2R) Control

R2R control adjusts recipe setpoints between wafers/lots to compensate for drift and disturbances.
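The GPR posterior equations of Section 4.2 translate directly to NumPy. This sketch makes three choices:

- It assumes a zero prior mean.
- It uses the squared-exponential kernel with fixed, hand-picked hyperparameters. There is no marginal-likelihood fitting.
- It uses a Cholesky factorization instead of forming $(\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}$ explicitly.

Function names are illustrative.

```python
import numpy as np


def rbf_kernel(A, B, sigma_f=1.0, length=1.0):
    """Squared-exponential kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return sigma_f**2 * np.exp(-0.5 * np.maximum(sq, 0.0) / length**2)


def gp_predict(X, y, X_star, sigma_n=0.1, **kern):
    """Zero-mean GP posterior mean and variance at the test points X_star."""
    K = rbf_kernel(X, X, **kern) + sigma_n**2 * np.eye(len(X))
    K_s = rbf_kernel(X, X_star, **kern)          # K_*: train x test
    K_ss = rbf_kernel(X_star, X_star, **kern)    # K_**: test x test
    L = np.linalg.cholesky(K)                    # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # (K + s_n^2 I)^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)   # K_** - K_*^T (...)^{-1} K_*
    return mean, var
```

Far from the training data, the returned variance grows toward the prior variance $\sigma_f^2$. A VM system could use that signal to request a physical measurement.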
### 5.1 EWMA Controller

Consider a process with model $y = a_0 + a_1 u + \epsilon$, where the gain $a_1$ is known and the offset $a_0$ drifts. The EWMA controller tracks the offset.

**Offset estimate update:**

$$ \hat{a}_{0,k+1} = \lambda (y_k - a_1 u_k) + (1-\lambda)\hat{a}_{0,k} $$

**Control action:**

$$ u_{k+1} = \frac{T - \hat{a}_{0,k+1}}{a_1} $$

where:

- $T$ is the target
- $\lambda \in (0,1)$ is the smoothing weight

### 5.2 Double EWMA (for Linear Drift)

When the process drifts linearly, a second EWMA tracks the trend $b_k$ alongside the level $a_k$:

$$ \hat{y}_{k+1} = a_k + b_k $$

$$ a_k = \lambda y_k + (1-\lambda)(a_{k-1} + b_{k-1}) $$

$$ b_k = \gamma(a_k - a_{k-1}) + (1-\gamma)b_{k-1} $$

### 5.3 State-Space Formulation

A more general framework:

**State equation:**

$$ \mathbf{x}_{k+1} = \mathbf{A}\mathbf{x}_k + \mathbf{B}\mathbf{u}_k + \mathbf{w}_k $$

**Observation equation:**

$$ \mathbf{y}_k = \mathbf{C}\mathbf{x}_k + \mathbf{D}\mathbf{u}_k + \mathbf{v}_k $$

Use **Kalman filtering** for state estimation and **LQR/MPC** for optimal control.

### 5.4 Model Predictive Control (MPC)

**Objective function:**

$$ \min \sum_{i=1}^{N} \|\mathbf{y}_{k+i} - \mathbf{r}_{k+i}\|_\mathbf{Q}^2 + \sum_{j=0}^{N-1}\|\Delta\mathbf{u}_{k+j}\|_\mathbf{R}^2 $$

This is minimized subject to the process model and operational constraints.

> MPC handles multivariable systems with constraints naturally.

## 6. Fault Detection and Classification (FDC)

### 6.1 Detection Methods

#### Mahalanobis Distance

$$ D^2 = (\mathbf{x} - \boldsymbol{\mu})^T\mathbf{S}^{-1}(\mathbf{x} - \boldsymbol{\mu}) $$

Under multivariate normality with known mean and covariance, $D^2$ follows a $\chi^2$ distribution with $p$ degrees of freedom.
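EWMA run-to-run control can be simulated in a few lines. This sketch uses the common offset-tracking form, in which the EWMA filter acts on the estimated intercept:

$$ \hat{a}_{0,k+1} = \lambda(y_k - a_1 u_k) + (1-\lambda)\hat{a}_{0,k}, \qquad u_{k+1} = \frac{T - \hat{a}_{0,k+1}}{a_1} $$

It assumes the gain $a_1$ is known exactly and that measurements are noise-free. `simulate_ewma_r2r` and its arguments are hypothetical names.

```python
def simulate_ewma_r2r(target, a1, true_offsets, lam=0.3, a0_init=0.0):
    """Simulate EWMA run-to-run control of y = a0 + a1 * u with a known gain a1.

    true_offsets[k] is the intercept on run k (unknown to the controller), so a
    drifting sequence models tool drift. Returns the measured output of each run.
    """
    a0_hat = a0_init
    u = (target - a0_hat) / a1              # initial recipe from the nominal model
    outputs = []
    for a0_true in true_offsets:
        y = a0_true + a1 * u                # run the wafer and "measure" it
        outputs.append(y)
        # EWMA update of the intercept estimate from the observed offset
        a0_hat = lam * (y - a1 * u) + (1 - lam) * a0_hat
        u = (target - a0_hat) / a1          # setpoint for the next run
    return outputs
```

After a step change in $a_0$, the output returns to target geometrically: the error is multiplied by $1-\lambda$ on each run. With a linear drift of $d$ per run, this controller instead settles to a constant offset of $d/\lambda$ from target. That residual offset is what the double EWMA of Section 5.2 is designed to remove.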
#### Other Detection Methods

- **One-Class SVM**: learns the boundary of normal operation
- **Autoencoders**: detect anomalies via reconstruction error

### 6.2 Classification Features

For trace data (time series from sensors), extract features:

- **Statistical moments**: mean, variance, skewness, kurtosis
- **Frequency domain**: FFT coefficients, spectral power
- **Wavelet coefficients**: multi-resolution analysis
- **DTW distances**: Dynamic Time Warping distances to reference signatures

### 6.3 Classification Algorithms

- Support Vector Machines (SVM)
- Random Forest
- CNNs for pattern recognition on wafer maps
- Gradient Boosting (XGBoost, LightGBM)

## 7. Spatial Modeling (Within-Wafer Variation)

Systematic spatial patterns require explicit modeling.

### 7.1 Polynomial Basis Expansion

#### Zernike Polynomials (common in lithography)

$$ z(\rho, \theta) = \sum_{n,m} c_n^m Z_n^m(\rho, \theta) $$

Here $c_n^m$ are fitted coefficients. The $Z_n^m$ form an orthogonal basis on the unit disk and capture radial and azimuthal variation.

### 7.2 Gaussian Process Spatial Models

$$ y(\mathbf{s}) \sim \mathcal{GP}(\mu(\mathbf{s}), k(\mathbf{s}, \mathbf{s}')) $$

#### Common Covariance Kernels

- **Squared Exponential (RBF):**
  $$ k(\mathbf{s}, \mathbf{s}') = \sigma^2 \exp\left(-\frac{\|\mathbf{s} - \mathbf{s}'\|^2}{2\ell^2}\right) $$
- **Matérn** (more flexible smoothness):
  $$ k(r) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)}\left(\frac{\sqrt{2\nu}r}{\ell}\right)^\nu K_\nu\left(\frac{\sqrt{2\nu}r}{\ell}\right) $$

Here $r = \|\mathbf{s} - \mathbf{s}'\|$ and $K_\nu$ is the modified Bessel function of the second kind.

## 8. Dynamic/Time-Series Modeling

These models apply to plasma processes, endpoint detection, and transient behavior.

### 8.1 Autoregressive Models

**AR(p) model:**

$$ x_t = \sum_{i=1}^{p} \phi_i x_{t-i} + \epsilon_t $$

ARIMA extends this to non-stationary series through differencing.

### 8.2 Dynamic PCA

Augment the data with time-lagged values:

$$ \tilde{\mathbf{X}} = [\mathbf{X}(t), \mathbf{X}(t-1), \ldots, \mathbf{X}(t-l)] $$

Then apply standard PCA to $\tilde{\mathbf{X}}$ to capture temporal dynamics.
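Building the augmented matrix $\tilde{\mathbf{X}}$ for dynamic PCA is mostly index bookkeeping. The minimal NumPy sketch below uses an illustrative function name. Its result can be passed to any ordinary PCA routine.

```python
import numpy as np


def lagged_matrix(X, lags):
    """Return [X(t), X(t-1), ..., X(t-lags)] stacked column-wise for dynamic PCA.

    X has shape (n_samples, n_vars). The first `lags` rows are dropped because
    they lack a full history, so the result has shape
    (n_samples - lags, n_vars * (lags + 1)).
    """
    X = np.asarray(X)
    n = X.shape[0]
    if not 0 <= lags < n:
        raise ValueError("lags must satisfy 0 <= lags < n_samples")
    # Block j holds the values observed j steps before each retained time t
    blocks = [X[lags - j : n - j] for j in range(lags + 1)]
    return np.hstack(blocks)
```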
### 8.3 Deep Sequence Models

#### LSTM Networks

Gating mechanisms:

- **Forget gate:** $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
- **Input gate:** $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
- **Output gate:** $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$
- **Candidate cell state:** $\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$

Here $\sigma$ denotes the logistic sigmoid, not a standard deviation.

**Cell state update:**

$$ c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t $$

**Hidden state:**

$$ h_t = o_t \odot \tanh(c_t) $$

## 9. Model Maintenance and Adaptation

Semiconductor processes drift, so models must adapt.

### 9.1 Drift Detection Methods

#### CUSUM (Cumulative Sum)

$$ S_k = \max(0, S_{k-1} + (x_k - \mu_0) - K) $$

Here $K$ is the allowance (slack), often set to half the shift to be detected. Signal when $S_k$ exceeds a threshold $H$.

#### Page-Hinkley Test

$$ m_k = \sum_{i=1}^{k}(x_i - \bar{x}_k - \delta) $$

$$ M_k = \min_{i \leq k} m_i $$

Alarm when $m_k - M_k > \lambda$. This form detects an increase in the mean, and $\delta$ is the tolerated magnitude of change.

#### ADWIN (Adaptive Windowing)

Automatically detects distribution changes and adjusts the window size.

### 9.2 Online Model Updating

#### Recursive Least Squares (RLS)

$$ \hat{\boldsymbol{\beta}}_k = \hat{\boldsymbol{\beta}}_{k-1} + \mathbf{K}_k(y_k - \mathbf{x}_k^T\hat{\boldsymbol{\beta}}_{k-1}) $$

Here $\mathbf{K}_k$ is the gain vector. $\mathbf{P}_k$ (the inverse correlation matrix) is updated by a Riccati-type recursion with forgetting factor $0 < \lambda \leq 1$:

$$ \mathbf{K}_k = \frac{\mathbf{P}_{k-1}\mathbf{x}_k}{\lambda + \mathbf{x}_k^T\mathbf{P}_{k-1}\mathbf{x}_k} $$

$$ \mathbf{P}_k = \frac{1}{\lambda}(\mathbf{P}_{k-1} - \mathbf{K}_k\mathbf{x}_k^T\mathbf{P}_{k-1}) $$

#### Just-in-Time (JIT) Learning

Build local models around each new prediction point using the nearest historical samples.

## 10. Integrated Framework

A complete monitoring system layers these methods:

| Layer | Methods | Purpose |
|-------|---------|---------|
| **Preprocessing** | Cleaning, synchronization, normalization | Data quality |
| **Feature Engineering** | Domain features, wavelets, PCA | Dimensionality management |
| **Monitoring** | $T^2$, Q-statistic, control charts | Detect out-of-control states |
| **Virtual Metrology** | PLS, GPR, neural networks | Predict quality without measurement |
| **FDC** | Classification models | Diagnose fault root causes |
| **Control** | R2R, MPC | Compensate for drift/disturbances |
| **Adaptation** | Online learning, drift detection | Maintain model validity |

## 11. Key Mathematical Challenges

1. **High dimensionality** — hundreds of sensors, requiring regularization and dimension reduction
2. **Collinearity** — process variables are physically coupled
3. **Non-stationarity** — drift, maintenance events, recipe changes
4. **Small sample sizes** — new recipes have limited historical data (transfer learning and Bayesian methods help)
5. **Real-time constraints** — decisions are needed within seconds
6. **Rare events** — faults are infrequent, creating class imbalance

## 12. Key Equations

### Process Capability

$$ C_{pk} = \min\left[\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right] $$

### Multivariate Monitoring

$$ T^2 = \sum_{i=1}^{k} \frac{t_i^2}{\lambda_i}, \quad Q = \|\mathbf{x} - \hat{\mathbf{x}}\|^2 $$

### Virtual Metrology (Ridge Regression)

$$ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y} $$

### EWMA Control

$$ \hat{y}_{k+1} = \lambda y_k + (1-\lambda)\hat{y}_k $$

### Mahalanobis Distance

$$ D^2 = (\mathbf{x} - \boldsymbol{\mu})^T\mathbf{S}^{-1}(\mathbf{x} - \boldsymbol{\mu}) $$
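The RLS recursion of Section 9.2 maps onto a small class. This sketch rests on a few assumptions:

- It uses a forgetting factor $\lambda$.
- The initial matrix is $\mathbf{P}_0 = \delta\mathbf{I}$ with a large $\delta$, which acts as a weak prior. $\delta$ is a tuning choice not specified in the text.
- The numerical safeguards that production code usually adds are omitted.

The class name is illustrative.

```python
import numpy as np


class RecursiveLeastSquares:
    """RLS with exponential forgetting factor lam (0 < lam <= 1)."""

    def __init__(self, n_features, lam=0.99, delta=1000.0):
        self.beta = np.zeros(n_features)
        self.P = delta * np.eye(n_features)   # large P_0 = weak prior on beta
        self.lam = lam

    def update(self, x, y):
        """Incorporate one observation (x, y) and return the updated coefficients."""
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        K = Px / (self.lam + x @ Px)                          # gain vector K_k
        self.beta = self.beta + K * (y - x @ self.beta)       # correct by prediction error
        self.P = (self.P - np.outer(K, x @ self.P)) / self.lam  # Riccati-type update
        return self.beta
```

Setting $\lambda = 1$ weights all past wafers equally. Values slightly below 1 (for example 0.98) discount old data so the model can follow drift.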

process window analysis, lithography

Determine usable focus-dose range.

process window qualification, pwq, lithography

Verify adequate process window.

process window,exposure-defocus,bossung,depth of focus,dof,exposure latitude,cpk,lithography window,semiconductor process window

# Process Window

## 1. Fundamentals

A process window is the region in parameter space where a manufacturing step yields acceptable results. Consider a response function $y(\mathbf{x})$ that depends on the parameter vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$. The window is then:

$$ \text{Process Window} = \{\mathbf{x} : y_{\min} \leq y(\mathbf{x}) \leq y_{\max}\} $$

## 2. Single-Parameter Statistics

For a single parameter with lower and upper specification limits (LSL, USL):

### Process Capability Indices

- $C_p$ (Process Capability): measures window width relative to process variation
  $$ C_p = \frac{USL - LSL}{6\sigma} $$
- $C_{pk}$ (Process Capability Index): accounts for process centering
  $$ C_{pk} = \min\left[\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right] $$

### Industry Standards

- $C_p \geq 1.0$: process variation fits within specifications
- $C_{pk} \geq 1.33$: 4σ capability (standard requirement)
- $C_{pk} \geq 1.67$: 5σ capability (high-reliability applications)
- $C_{pk} \geq 2.0$: 6σ capability (Six Sigma standard)

## 3. Lithography: Exposure-Defocus (E-D) Window

This is the most critical and most mathematically developed process window in semiconductor manufacturing.
### 3.1 Bossung Curve Model

Critical dimension (CD) as a function of exposure dose $E$ and defocus $F$:

$$ CD(E, F) = CD_0 + a_1 E + a_2 F + a_{11} E^2 + a_{22} F^2 + a_{12} EF + \ldots $$

The process window boundary is defined by:

$$ |CD(E, F) - CD_{\text{target}}| = \Delta CD_{\text{tolerance}} $$

### 3.2 Key Metrics

- Exposure Latitude (EL): percentage dose range giving acceptable CD
  $$ EL = \frac{E_{\max} - E_{\min}}{E_{\text{nominal}}} \times 100\% $$
- Depth of Focus (DOF): focus range giving acceptable CD (at a given EL)
  $$ DOF = F_{\max} - F_{\min} $$
- Process Window Area: total acceptable region
  $$ A_{PW} = \iint_{\text{acceptable}} dE \, dF $$

### 3.3 Rayleigh Equations

Resolution and DOF scale with wavelength $\lambda$ and numerical aperture $NA$:

- Resolution (minimum feature size):
  $$ R = k_1 \frac{\lambda}{NA} $$
- Depth of Focus:
  $$ DOF = \pm k_2 \frac{\lambda}{NA^2} $$

Critical insight: at fixed wavelength and fixed $k$ factors, printing smaller features requires a higher $NA$. Since $NA = k_1\lambda/R$, we get $DOF \propto k_2 R^2/(k_1^2\lambda)$, so DOF shrinks with the square of the feature size. Process windows therefore collapse rapidly at advanced nodes.

| Technology Node | $k_1$ Factor | Relative DOF (illustrative, $(k_1/0.6)^2$) |
|-----------------|--------------|--------------------------------------------|
| 180nm | 0.6 | 1.0 |
| 65nm | 0.4 | 0.44 |
| 14nm | 0.3 | 0.25 |
| 5nm (EUV) | 0.25 | 0.17 |

## 4. Image Quality Metrics

### 4.1 Normalized Image Log-Slope (NILS)

$$ NILS = w \cdot \frac{1}{I} \left|\frac{dI}{dx}\right|_{\text{edge}} $$

Where:
- $w$ = feature width
- $I$ = aerial image intensity
- $\frac{dI}{dx}$ = intensity gradient at the feature edge

For a partially coherent imaging system (partial coherence factor $\sigma$), a rough approximation is:

$$ NILS \approx \pi \cdot \frac{w}{\lambda/NA} \cdot \text{(contrast factor)} $$

Interpretation:
- Higher NILS → larger process window
- NILS > 2.0: robust process
- NILS < 1.5: marginal process window
- NILS < 1.0: near the resolution limit

### 4.2 Mask Error Enhancement Factor (MEEF)

$$ MEEF = \frac{\partial CD_{\text{wafer}}}{\partial CD_{\text{mask}}} $$

Here the mask CD is expressed in wafer-scale units, that is, divided by the reduction ratio.

Characteristics:
- MEEF = 1: ideal (1:1 transfer from mask to wafer)
- MEEF > 1: mask errors are amplified on the wafer
- Near the resolution limit: MEEF is typically 3–4 or higher
- Impact on the effective process window: mask CD tolerance = wafer CD tolerance / MEEF (in wafer-scale units)

## 5. Multi-Parameter Process Windows

### 5.1 Ellipsoid Model

For $n$ interacting parameters, the window is often approximated as an $n$-dimensional ellipsoid:

$$ (\mathbf{x} - \mathbf{x}_0)^T \mathbf{A} (\mathbf{x} - \mathbf{x}_0) \leq 1 $$

Where:
- $\mathbf{x}$ = parameter vector $(x_1, x_2, \ldots, x_n)$
- $\mathbf{x}_0$ = optimal operating point (center of the ellipsoid)
- $\mathbf{A}$ = positive definite matrix encoding parameter correlations

Geometric interpretation:
- Eigenvalues of $\mathbf{A}$: $\lambda_1, \lambda_2, \ldots, \lambda_n$
- Principal axis (semi-axis) lengths: $a_i = 1/\sqrt{\lambda_i}$
- Eigenvectors: orientation of the principal axes

### 5.2 Overlapping Windows

Real processes require several steps to work simultaneously:

$$ PW_{\text{total}} = \bigcap_{i=1}^{N} PW_i $$

Example: combined lithography + etch window

$$ PW_{\text{combined}} = PW_{\text{litho}}(E, F) \cap PW_{\text{etch}}(P, W, T) $$

If the individual windows are ellipsoids, their intersection is convex but generally not an ellipsoid. It is therefore often computed numerically via:
- Linear programming (on linearized boundaries)
- Convex hull algorithms
- Monte Carlo sampling

## 6. Response Surface Methodology (RSM)

### 6.1 Quadratic Model

$$ y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \sum_{i=1}^{n} \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j + \varepsilon $$

…

- Selectivity > 3–5 (typical)
- Selectivity > 10 (high aspect ratio features)
- Selectivity > 50 (critical etch stop layers)

## 13. CMP Process Windows

### 13.1 Preston Equation

$$ RR = K_p \cdot P \cdot V $$

Where:
- $RR$ = removal rate (nm/min or Å/min)
- $K_p$ = Preston coefficient (material/consumable dependent)
- $P$ = applied pressure (psi or kPa)
- $V$ = relative velocity (m/s)

### 13.2 Within-Wafer Non-Uniformity (WIWNU)

$$ WIWNU = \frac{\sigma_{RR}}{\mu_{RR}} \times 100\% $$

Target: WIWNU < 3–5%

### 13.3 Dishing and Erosion

- Dishing: excess removal at the center of wide metal features
  $$ \text{Dishing} = t_{\text{initial}} - t_{\text{center}} $$
- Erosion: thinning of the dielectric between metal lines
  $$ \text{Erosion} = t_{\text{field}} - t_{\text{local}} $$

## 14. Key Equations Summary Table

| Metric | Formula | Significance |
|--------|---------|--------------|
| Resolution | $R = k_1 \frac{\lambda}{NA}$ | Minimum feature size |
| Depth of Focus | $DOF = \pm k_2 \frac{\lambda}{NA^2}$ | Focus tolerance |
| NILS | $NILS = \frac{w}{I} \left\lvert\frac{dI}{dx}\right\rvert$ | Image contrast at edge |
| MEEF | $MEEF = \frac{\partial CD_w}{\partial CD_m}$ | Mask error amplification |
| Process Capability | $C_{pk} = \frac{\min(USL-\mu, \mu-LSL)}{3\sigma}$ | Process capability |
| Exposure Latitude | $EL = \frac{E_{max} - E_{min}}{E_{nom}} \times 100\%$ | Dose tolerance |
| Stochastic LER | $LER \propto \frac{1}{\sqrt{Dose}}$ | Shot noise floor |
| Yield (Poisson) | $Y = e^{-DA}$ | Defect-limited yield ($D$: defect density, $A$: die area) |
| Preston Equation | $RR = K_p P V$ | CMP removal rate |

## 15. Modern Computational Approaches

### 15.1 Monte Carlo Simulation

Algorithm: Monte Carlo Yield Estimation

1. Define parameter distributions: $x_i \sim N(\mu_i, \sigma_i^2)$
2. For trial = 1 to $N_{\text{trials}}$:
   a. Sample $\mathbf{x}$ from the joint distribution
   b. Evaluate $y(\mathbf{x})$ for all responses
   c. Check whether $y \in [y_{\min}, y_{\max}]$ for all responses
   d. Record pass/fail
3. Yield: $Y = N_{\text{pass}} / N_{\text{trials}}$
4. Confidence interval: $Y \pm z_{\alpha/2} \sqrt{Y(1-Y)/N_{\text{trials}}}$

### 15.2 Machine Learning Classification

- Support Vector Machine (SVM): the decision boundary defines the process window
- Neural Networks: complex, non-convex window shapes
- Random Forest: ensemble method for robustness
- Gaussian Process: probabilistic boundaries with uncertainty

### 15.3 Digital Twin Approach

$$ \hat{y}_{t+1} = f(y_t, \mathbf{x}_t, \boldsymbol{\theta}) $$

Where:
- $\hat{y}_{t+1}$ = predicted next-step output
- $y_t$ = current measured output
- $\mathbf{x}_t$ = current process parameters
- $\boldsymbol{\theta}$ = model parameters (updated via Bayesian inference)

## 16. Advanced Node Challenges

### 16.1 Process Window Shrinkage

At advanced nodes (sub-7nm), multiple factors compound:

$$ PW_{\text{effective}} = PW_{\text{optical}} \cap PW_{\text{stochastic}} \cap PW_{\text{overlay}} \cap PW_{\text{etch}} $$

### 16.2 Multi-Patterning Complexity

For N-patterning (e.g., SAQP with N=4), assuming independent step errors:

$$ \sigma_{\text{total}}^2 = \sum_{i=1}^{N} \sigma_{\text{step}_i}^2 $$

Error budget per step (equal allocation):

$$ \sigma_{\text{step}} = \frac{\sigma_{\text{target}}}{\sqrt{N}} $$

### 16.3 Design-Technology Co-Optimization (DTCO)

$$ \text{Objective: } \max_{\text{design}, \text{process}} \left[ \text{Performance} \times Y(\text{design}, \text{process}) \right] $$

Subject to:
- Design rules: $DR_i(\text{layout}) \geq 0$
- Process windows: $\mathbf{x} \in PW$
- Reliability: $MTTF \geq \text{target}$
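The Monte Carlo yield algorithm of Section 15.1 can be vectorized with NumPy. This sketch simplifies the "joint distribution" to independent normal parameters. `cd_nm` is a hypothetical quadratic Bossung model whose coefficients are illustrative only, not fitted to any real process.

```python
import numpy as np


def monte_carlo_yield(means, sigmas, response_fns, limits, n_trials=100_000, z=1.96, seed=0):
    """Estimate yield and a normal-approximation confidence interval.

    Each column of the sampled matrix is one parameter, x_i ~ N(mu_i, sigma_i^2),
    drawn independently. A trial passes only if every response lies in its limits.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(means, sigmas, size=(n_trials, len(means)))   # steps 1 and 2a
    ok = np.ones(n_trials, dtype=bool)
    for fn, (lo, hi) in zip(response_fns, limits):               # steps 2b-2d
        y = fn(x)
        ok &= (y >= lo) & (y <= hi)
    Y = ok.mean()                                                # step 3
    half = z * np.sqrt(Y * (1 - Y) / n_trials)                   # step 4
    return Y, (Y - half, Y + half)


def cd_nm(x):
    """Hypothetical Bossung model: column 0 = dose (mJ/cm^2), column 1 = defocus (um)."""
    dE = x[:, 0] - 30.0
    F = x[:, 1]
    return 45.0 - 0.8 * dE + 0.02 * dE**2 - 400.0 * F**2
```

As an example, take a 45 nm target printed within ±2 nm, with dose varying by 0.6 mJ/cm² and focus by 50 nm (1σ). The call `monte_carlo_yield([30.0, 0.0], [0.6, 0.05], [cd_nm], [(43.0, 47.0)])` estimates the fraction of exposures that meet that spec. Additional responses, such as an etch CD, are added by extending `response_fns` and `limits` together.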

product representative structures, metrology

Test structures that replicate the layout of actual devices.

profilometry,metrology

Measure surface height profile mechanically or optically.

ptychography, metrology

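Phase-retrieval imaging technique that reconstructs amplitude and phase from diffraction patterns recorded at overlapping illumination positions.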

pvd,physical vapor deposition,what is pvd,sputtering,magnetron sputtering,ipvd,ionized pvd,evaporation

# Mathematical Modeling of Metal Deposition in Semiconductor Manufacturing 1. Overview: Metal Deposition Processes Metal deposition is a critical step in semiconductor fabrication, creating interconnects, contacts, barrier layers, and various metallic structures. The primary deposition methods require distinct mathematical treatments: | Process | Physics Domain | Key Mathematics | |---------|----------------|-----------------| | **PVD (Sputtering)** | Ballistic transport, plasma physics | Boltzmann transport, Monte Carlo | | **CVD/PECVD** | Gas-phase transport, surface reactions | Navier-Stokes, reaction-diffusion | | **ALD** | Self-limiting surface chemistry | Site-balance kinetics | | **Electroplating (ECD)** | Electrochemistry, mass transport | Butler-Volmer, Nernst-Planck | 2. Transport Phenomena Models 2.1 Gas-Phase Transport (CVD/PECVD) The precursor concentration field follows the **convection-diffusion-reaction equation**: $$ \frac{\partial C}{\partial t} + \mathbf{v} \cdot \nabla C = D \nabla^2 C + R_{gas} $$ Where: - $C$ — precursor concentration (mol/m³) - $\mathbf{v}$ — velocity field vector (m/s) - $D$ — diffusion coefficient (m²/s) - $R_{gas}$ — gas-phase reaction source term (mol/m³·s) 2.2 Flow Field Equations The **incompressible Navier-Stokes equations** govern the velocity field: $$ \rho \left( \frac{\partial \mathbf{v}}{\partial t} + \mathbf{v} \cdot \nabla \mathbf{v} \right) = -\nabla p + \mu \nabla^2 \mathbf{v} $$ With continuity equation: $$ \nabla \cdot \mathbf{v} = 0 $$ Where: - $\rho$ — gas density (kg/m³) - $p$ — pressure (Pa) - $\mu$ — dynamic viscosity (Pa·s) ### 2.3 Knudsen Number and Transport Regimes At low pressures, the **Knudsen number** determines the transport regime: $$ Kn = \frac{\lambda}{L} = \frac{k_B T}{\sqrt{2} \pi d^2 p L} $$ Where: - $\lambda$ — mean free path (m) - $L$ — characteristic length (m) - $k_B$ — Boltzmann constant ($1.38 \times 10^{-23}$ J/K) - $T$ — temperature (K) - $d$ — molecular diameter (m) - $p$ — 
pressure (Pa) **Transport regime classification:** - $Kn < 0.01$ — **Continuum regime** → Navier-Stokes CFD - $0.01 < Kn < 0.1$ — **Slip flow regime** → Modified NS with slip boundary conditions - $0.1 < Kn < 10$ — **Transitional regime** → DSMC, Boltzmann equation - $Kn > 10$ — **Free molecular regime** → Ballistic/Monte Carlo methods 3. Surface Reaction Kinetics 3.1 Langmuir-Hinshelwood Mechanism For bimolecular surface reactions (common in CVD): $$ r = \frac{k \cdot K_A K_B \cdot p_A p_B}{(1 + K_A p_A + K_B p_B)^2} $$ Where: - $r$ — reaction rate (mol/m²·s) - $k$ — surface reaction rate constant (mol/m²·s) - $K_A, K_B$ — adsorption equilibrium constants (Pa⁻¹) - $p_A, p_B$ — partial pressures of reactants A and B (Pa) 3.2 Sticking Coefficient Model The probability that an impinging molecule adsorbs on the surface: $$ S = S_0 \exp\left( -\frac{E_a}{k_B T} \right) \cdot f(\theta) $$ Where: - $S$ — sticking coefficient (dimensionless) - $S_0$ — pre-exponential sticking factor - $E_a$ — activation energy (J) - $f(\theta) = (1 - \theta)^n$ — site blocking function - $\theta$ — surface coverage (dimensionless, 0 to 1) - $n$ — order of site blocking 3.3 Arrhenius Temperature Dependence $$ k(T) = A \exp\left( -\frac{E_a}{RT} \right) $$ Where: - $A$ — pre-exponential factor (frequency factor) - $E_a$ — activation energy (J/mol) - $R$ — universal gas constant (8.314 J/mol·K) - $T$ — absolute temperature (K) 4. 
Film Growth Models 4.1 Continuum Surface Evolution Edwards-Wilkinson Equation (Linear Growth) $$ \frac{\partial h}{\partial t} = \nu \nabla^2 h + F + \eta(\mathbf{x}, t) $$ Kardar-Parisi-Zhang (KPZ) Equation (Nonlinear Growth) $$ \frac{\partial h}{\partial t} = \nu \nabla^2 h + \frac{\lambda}{2} |\nabla h|^2 + F + \eta $$ Where: - $h(\mathbf{x}, t)$ — surface height at position $\mathbf{x}$ and time $t$ - $\nu$ — surface diffusion coefficient (m²/s) - $\lambda$ — nonlinear growth parameter - $F$ — mean deposition flux (m/s) - $\eta$ — stochastic noise term (Gaussian white noise) 4.2 Scaling Relations Surface roughness evolves according to: $$ W(L, t) = L^\alpha f\left( \frac{t}{L^z} \right) $$ Where: - $W$ — interface width (roughness) - $L$ — system size - $\alpha$ — roughness exponent - $z$ — dynamic exponent - $f$ — scaling function 5. Step Coverage and Conformality 5.1 Thiele Modulus For high-aspect-ratio features, the **Thiele modulus** determines conformality: $$ \phi = L \sqrt{\frac{k_s}{D_{eff}}} $$ Where: - $\phi$ — Thiele modulus (dimensionless) - $L$ — feature depth (m) - $k_s$ — surface reaction rate constant (m/s) - $D_{eff}$ — effective diffusivity (m²/s) **Step coverage regimes:** - $\phi \ll 1$ — **Reaction-limited** → Excellent conformality - $\phi \gg 1$ — **Transport-limited** → Poor step coverage (bread-loafing) 5.2 Knudsen Diffusion in Trenches $$ D_K = \frac{w}{3} \sqrt{\frac{8 R T}{\pi M}} $$ Where: - $D_K$ — Knudsen diffusion coefficient (m²/s) - $w$ — trench width (m) - $R$ — universal gas constant (J/mol·K) - $T$ — temperature (K) - $M$ — molecular weight (kg/mol) 5.3 Feature-Scale Concentration Profile Solving for concentration in a trench with reactive walls: $$ D_{eff} \frac{d^2 C}{dy^2} = \frac{2 k_s C}{w} $$ General solution: $$ C(y) = C_0 \frac{\cosh\left( \phi \frac{L - y}{L} \right)}{\cosh(\phi)} $$ 6. 
Atomic Layer Deposition (ALD) Models 6.1 Self-Limiting Surface Kinetics Surface site balance equation: $$ \frac{d\theta}{dt} = k_a C (1 - \theta) - k_d \theta $$ Where: - $\theta$ — fractional surface coverage - $k_a$ — adsorption rate constant (m³/mol·s) - $k_d$ — desorption rate constant (s⁻¹) - $C$ — gas-phase precursor concentration (mol/m³) At equilibrium saturation: $$ \theta_{eq} = \frac{k_a C}{k_a C + k_d} \approx 1 \quad \text{(for strong chemisorption)} $$ 6.2 Growth Per Cycle (GPC) $$ \text{GPC} = \Gamma_0 \cdot \Omega \cdot \eta $$ Where: - $\Gamma_0$ — surface site density (sites/m²) - $\Omega$ — volume per deposited atom (m³) - $\eta$ — reaction efficiency (dimensionless) 6.3 Saturation Dose-Time Relationship $$ \theta(t) = 1 - \exp\left( -\frac{S \cdot \Phi \cdot t}{\Gamma_0} \right) $$ **Impingement flux** from kinetic theory: $$ \Phi = \frac{p}{\sqrt{2 \pi m k_B T}} $$ Where: - $\Phi$ — molecular impingement flux (molecules/m²·s) - $p$ — precursor partial pressure (Pa) - $m$ — molecular mass (kg) 7. 
Plasma Modeling (PVD/PECVD) 7.1 Plasma Sheath Physics **Child-Langmuir law** for ion current density: $$ J_{ion} = \frac{4 \varepsilon_0}{9} \sqrt{\frac{2e}{M_i}} \frac{V_s^{3/2}}{d_s^2} $$ Where: - $J_{ion}$ — ion current density (A/m²) - $\varepsilon_0$ — vacuum permittivity ($8.85 \times 10^{-12}$ F/m) - $e$ — elementary charge ($1.6 \times 10^{-19}$ C) - $M_i$ — ion mass (kg) - $V_s$ — sheath voltage (V) - $d_s$ — sheath thickness (m) 7.2 Ion Energy at Substrate $$ \varepsilon_{ion} \approx e V_s + \frac{1}{2} M_i v_{Bohm}^2 $$ **Bohm velocity:** $$ v_{Bohm} = \sqrt{\frac{k_B T_e}{M_i}} $$ Where: - $T_e$ — electron temperature (K or eV) 7.3 Sputtering Yield (Sigmund Formula) $$ Y(E) = \frac{3 \alpha}{4 \pi^2} \cdot \frac{4 M_1 M_2}{(M_1 + M_2)^2} \cdot \frac{E}{U_0} $$ Where: - $Y$ — sputtering yield (atoms/ion) - $\alpha$ — dimensionless factor (~0.2–0.4) - $M_1$ — incident ion mass - $M_2$ — target atom mass - $E$ — incident ion energy (eV) - $U_0$ — surface binding energy (eV) 7.4 Electron Energy Distribution Function (EEDF) The Boltzmann equation in energy space: $$ \frac{\partial f}{\partial t} + \mathbf{v} \cdot \nabla f + \frac{e \mathbf{E}}{m_e} \cdot \nabla_v f = C[f] $$ Where: - $f$ — electron energy distribution function - $\mathbf{E}$ — electric field - $m_e$ — electron mass - $C[f]$ — collision integral 8. 
MDP: Markov Decision Process for Process Control 8.1 MDP Formulation A Markov Decision Process is defined by the tuple: $$ \mathcal{M} = (S, A, P, R, \gamma) $$ **Components in semiconductor context:** - **State space $S$**: Film thickness, resistivity, uniformity, equipment state, wafer position - **Action space $A$**: Temperature, pressure, flow rates, RF power, deposition time - **Transition probability $P(s' | s, a)$**: Stochastic process model - **Reward function $R(s, a)$**: Yield, uniformity, throughput, quality metrics - **Discount factor $\gamma$**: Time preference (typically 0.9–0.99) 8.2 Bellman Optimality Equation $$ V^*(s) = \max_{a \in A} \left[ R(s, a) + \gamma \sum_{s'} P(s' | s, a) V^*(s') \right] $$ **Q-function formulation:** $$ Q^*(s, a) = R(s, a) + \gamma \sum_{s'} P(s' | s, a) \max_{a'} Q^*(s', a') $$ 8.3 Run-to-Run (R2R) Control Optimal recipe adjustment after each wafer: $$ \mathbf{u}_{k+1} = \mathbf{u}_k + \mathbf{K} (\mathbf{y}_{target} - \mathbf{y}_k) $$ Where: - $\mathbf{u}_k$ — process recipe parameters at run $k$ - $\mathbf{y}_k$ — measured output at run $k$ - $\mathbf{K}$ — controller gain matrix (from MDP policy optimization) 8.4 Reinforcement Learning Approaches | Method | Application | Characteristics | |--------|-------------|-----------------| | **Q-Learning** | Discrete parameter optimization | Model-free, tabular | | **Deep Q-Network (DQN)** | High-dimensional state spaces | Neural network approximation | | **Policy Gradient** | Continuous process control | Direct policy optimization | | **Actor-Critic (A2C/PPO)** | Complex control tasks | Combined value and policy | | **Model-Based RL** | Physics-informed control | Sample efficient | 9. 
Electrochemical Deposition (Copper Damascene) 9.1 Butler-Volmer Equation $$ i = i_0 \left[ \exp\left( \frac{\alpha_a F \eta}{RT} \right) - \exp\left( -\frac{\alpha_c F \eta}{RT} \right) \right] $$ Where: - $i$ — current density (A/m²) - $i_0$ — exchange current density (A/m²) - $\alpha_a, \alpha_c$ — anodic and cathodic transfer coefficients - $F$ — Faraday constant (96,485 C/mol) - $\eta = E - E_{eq}$ — overpotential (V) - $R$ — gas constant (J/mol·K) - $T$ — temperature (K) 9.2 Mass Transport Limited Current $$ i_L = \frac{n F D C_b}{\delta} $$ Where: - $i_L$ — limiting current density (A/m²) - $n$ — number of electrons transferred - $D$ — diffusion coefficient of Cu²⁺ (m²/s) - $C_b$ — bulk concentration (mol/m³) - $\delta$ — diffusion layer thickness (m) 9.3 Nernst-Planck Equation $$ \mathbf{J}_i = -D_i \nabla C_i - \frac{z_i F D_i}{RT} C_i \nabla \phi + C_i \mathbf{v} $$ Where: - $\mathbf{J}_i$ — flux of species $i$ - $z_i$ — charge number - $\phi$ — electric potential 9.4 Superfilling (Bottom-Up Fill) The curvature-enhanced accelerator mechanism: $$ v_n = v_0 (1 + \kappa \cdot \Gamma_{acc}) $$ Where: - $v_n$ — local growth velocity normal to surface - $v_0$ — baseline growth velocity - $\kappa$ — local surface curvature (1/m) - $\Gamma_{acc}$ — accelerator surface concentration 10. 
Multiscale Modeling Framework

10.1 Hierarchical Scale Integration

```
┌────────────────────────────────────────────────────┐
│ REACTOR SCALE                                      │
│ CFD: Flow, temperature, concentration              │
│ Time: seconds | Length: cm                         │
└─────────────────────────┬──────────────────────────┘
                          │ Boundary fluxes
                          ▼
┌────────────────────────────────────────────────────┐
│ FEATURE SCALE                                      │
│ Level-set / String method for surface evolution    │
│ Time: seconds | Length: μm                         │
└─────────────────────────┬──────────────────────────┘
                          │ Local rates
                          ▼
┌────────────────────────────────────────────────────┐
│ MESOSCALE (kMC)                                    │
│ Kinetic Monte Carlo: nucleation, island growth     │
│ Time: ms | Length: nm                              │
└─────────────────────────┬──────────────────────────┘
                          │ Rate parameters
                          ▼
┌────────────────────────────────────────────────────┐
│ ATOMISTIC (MD/DFT)                                 │
│ Molecular dynamics, ab initio: binding energies,   │
│ diffusion barriers, reaction paths                 │
│ Time: ps | Length: Å                               │
└────────────────────────────────────────────────────┘
```

10.2 Kinetic Monte Carlo (kMC)

Event rate from transition state theory:
$$ k_i = \nu_0 \exp\left( -\frac{E_{a,i}}{k_B T} \right) $$
Where $\nu_0$ is the attempt frequency and $E_{a,i}$ is the activation barrier of event $i$.

Total rate and time step:
$$ k_{total} = \sum_i k_i, \quad \Delta t = -\frac{\ln(r)}{k_{total}} $$
Where $r \in (0, 1]$ is a uniform random number. At each step, event $i$ is selected with probability $k_i / k_{total}$.

10.3 Molecular Dynamics

Newton's equations of motion:
$$ m_i \frac{d^2 \mathbf{r}_i}{dt^2} = -\nabla_i U(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N) $$

**Lennard-Jones potential:**
$$ U_{LJ}(r) = 4\varepsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^6 \right] $$
Where $\varepsilon$ is the depth of the potential well and $\sigma$ is the separation at which $U_{LJ} = 0$.

**Embedded Atom Method (EAM) for metals:**
$$ U = \sum_i F_i(\rho_i) + \frac{1}{2} \sum_{i \neq j} \phi_{ij}(r_{ij}) $$
Where $F_i$ is the embedding function, $\phi_{ij}$ is a pair potential, and $\rho_i = \sum_{j \neq i} f_j(r_{ij})$ is the electron density at atom $i$.

11. Uniformity Modeling

11.1 Wafer-Scale Thickness Distribution (Sputtering)

For a circular magnetron target with cosine (Lambertian) emission:
$$ t(r) = \int_{target} \frac{Y \cdot J_{ion} \cdot \cos\theta_t \cdot \cos\theta_w}{\pi R^2} \, dA $$
Where:
- $t(r)$ — thickness at radial position $r$
- $\theta_t$ — emission angle from target
- $\theta_w$ — incidence angle at wafer

Here $Y$ is the sputter yield (atoms per incident ion), $J_{ion}$ is the local ion flux onto the target, and $R$ is the distance from the target element $dA$ to the wafer point. Strictly, the integral gives the arriving atom flux. Thickness follows after multiplying by the deposition time and the film's atomic volume.

11.2 Uniformity Metrics

**Within-Wafer Uniformity (WIW):**
$$ \sigma_{WIW} = \frac{1}{\bar{t}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} (t_i - \bar{t})^2} \times 100\% $$

**Wafer-to-Wafer Uniformity (WTW):**
$$ \sigma_{WTW} = \frac{1}{\bar{t}_{avg}} \sqrt{\frac{1}{M} \sum_{j=1}^{M} (\bar{t}_j - \bar{t}_{avg})^2} \times 100\% $$
Where $t_i$ is the thickness at measurement site $i$ of $N$, and $\bar{t}_j$ is the mean thickness of wafer $j$ of $M$.

**Target specifications:**
- $\sigma_{WIW} < 1\%$ for advanced nodes (≤7 nm)
- $\sigma_{WTW} < 0.5\%$ for high-volume manufacturing

12. Virtual Metrology and Statistical Models

12.1 Gaussian Process Regression (GPR)
$$ f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')) $$

**Squared exponential (RBF) kernel:**
$$ k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\left( -\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\ell^2} \right) $$

**Predictive distribution:**
$$ f_* | \mathbf{X}, \mathbf{y}, \mathbf{x}_* \sim \mathcal{N}(\bar{f}_*, \text{var}(f_*)) $$

12.2 Partial Least Squares (PLS)
$$ \mathbf{Y} = \mathbf{X} \mathbf{B} + \mathbf{E} $$
Where:
- $\mathbf{X}$ — process parameter matrix
- $\mathbf{Y}$ — quality outcome matrix
- $\mathbf{B}$ — regression coefficient matrix
- $\mathbf{E}$ — residual matrix

12.3 Principal Component Analysis (PCA)
$$ \mathbf{X} = \mathbf{T} \mathbf{P}^T + \mathbf{E} $$

**Hotelling's $T^2$ statistic for fault detection:**
$$ T^2 = \sum_{i=1}^{k} \frac{t_i^2}{\lambda_i} $$
Where $t_i$ is the score on the $i$-th retained principal component and $\lambda_i$ is its eigenvalue (the variance of that score).

13.
Process Optimization 13.1 Response Surface Methodology (RSM) **Second-order polynomial model:** $$ y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \sum_{i < j} \beta_{ij} x_i x_j + \varepsilon $$ 13.2 Constrained Optimization $$ \min_{\mathbf{x}} f(\mathbf{x}) \quad \text{subject to} \quad g_i(\mathbf{x}) \leq 0, \quad h_j(\mathbf{x}) = 0 $$ **Example constraints:** - $g_1$: Non-uniformity ≤ 3% - $g_2$: Resistivity within spec - $g_3$: Throughput ≥ target - $h_1$: Total film thickness = target 13.3 Pareto Multi-Objective Optimization $$ \min_{\mathbf{x}} \left[ f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_m(\mathbf{x}) \right] $$ Common trade-offs: - Uniformity vs. throughput - Film quality vs. cost - Conformality vs. deposition rate 14. Summary: Mathematical Toolkit Reference | Domain | Key Equations | Application | |--------|---------------|-------------| | **Transport** | Navier-Stokes, Convection-Diffusion | Gas flow, precursor delivery | | **Kinetics** | Arrhenius, Langmuir-Hinshelwood | Reaction rates | | **Surface Evolution** | KPZ, Level-set, Edwards-Wilkinson | Film morphology | | **Plasma** | Boltzmann, Child-Langmuir | Ion/electron dynamics | | **Electrochemistry** | Butler-Volmer, Nernst-Planck | Copper plating | | **Control** | Bellman, MDP, RL algorithms | Recipe optimization | | **Statistics** | GPR, PLS, PCA | Virtual metrology | | **Multiscale** | MD, kMC, Continuum | Integrated simulation | 15. Key Physical Constants | Constant | Symbol | Value | Units | |----------|--------|-------|-------| | Boltzmann constant | $k_B$ | $1.38 \times 10^{-23}$ | J/K | | Gas constant | $R$ | $8.314$ | J/(mol·K) | | Faraday constant | $F$ | $96,485$ | C/mol | | Elementary charge | $e$ | $1.60 \times 10^{-19}$ | C | | Vacuum permittivity | $\varepsilon_0$ | $8.85 \times 10^{-12}$ | F/m | | Avogadro's number | $N_A$ | $6.02 \times 10^{23}$ | mol⁻¹ | | Electron mass | $m_e$ | $9.11 \times 10^{-31}$ | kg |
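The kMC time-step rule in Section 10.2 can be illustrated with a rejection-free step in the BKL/Gillespie style. This is a sketch only. The event names, the barrier values and the attempt frequency $\nu_0 = 10^{13}$ s⁻¹ are illustrative assumptions, not values from the text.

```python
import math
import random

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def event_rates(barriers_ev, temperature_k, nu0=1e13):
    """Transition-state-theory rates k_i = nu0 * exp(-E_a,i / (k_B * T)).

    barriers_ev: mapping of event name -> activation barrier in eV (illustrative).
    nu0: attempt frequency in 1/s (assumed to be 1e13 here).
    """
    return {name: nu0 * math.exp(-ea / (K_B_EV * temperature_k))
            for name, ea in barriers_ev.items()}


def kmc_step(rates, rng=random):
    """One rejection-free kMC step.

    Picks event i with probability k_i / k_total and draws the waiting time
    dt = -ln(r) / k_total with r uniform on (0, 1].
    """
    k_total = sum(rates.values())
    target = rng.random() * k_total          # uniform on [0, k_total)
    cumulative = 0.0
    chosen = None
    for name, k in rates.items():
        cumulative += k
        if target < cumulative:
            chosen = name
            break
    if chosen is None:                       # guard against round-off at the top end
        chosen = name
    r = 1.0 - rng.random()                   # maps [0, 1) onto (0, 1], so ln(r) is finite
    dt = -math.log(r) / k_total
    return chosen, dt
```

For example, `kmc_step(event_rates({"surface_hop": 0.5, "desorption": 1.2}, 600.0))` returns an event name and a time increment in seconds. The event names here are hypothetical. A full simulation repeats this step and updates the rate catalogue after each event.

Low-barrier events dominate the event sequence. Meanwhile $\Delta t$ shrinks as $k_{total}$ grows.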

pvd,thin film,physical vapor deposition

Deposit thin films, mainly conductors and barriers (Al, Cu, Ti, TiN, Ta, TaN, W, etc.), by sputtering or evaporation.

quad flat no-lead, qfn, packaging

Leadless package with contact pads on the underside instead of leads extending from the body.

quad flat package, qfp, packaging

Package with gull-wing leads extending from all four sides of the body.

qualification wafers, production

Wafers used to qualify process or equipment.

quantification limit, metrology

Lowest amount that can be measured with acceptable precision and accuracy.

quantum yield,lithography

Fraction of absorbed photons that cause reaction.

quasi-steady-state photoconductance, qsspc, metrology

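Contactless measurement of effective minority-carrier lifetime from wafer photoconductance under slowly decaying illumination.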

queueing theory, queuing theory, queue, cycle time, fab scheduling, little law, wip, reentrant, utilization, throughput, semiconductor queueing

# Semiconductor Manufacturing & Queueing Theory: A Mathematical Deep Dive

## 1. Introduction

Semiconductor fabrication is one of the most complex queueing environments in manufacturing. Key characteristics include:

- **Reentrant flow**: Wafers visit the same machine groups multiple times (e.g., photolithography 20–30 times)
- **Process complexity**: 400–800 processing steps over 2–3 months
- **Batch processing**: Furnaces and wet benches process multiple wafers simultaneously
- **Sequence-dependent setups**: Recipe changes require significant time
- **Tool dedication**: Some products can only run on specific tools
- **High variability**: Equipment failures, rework, yield issues
- **Multiple product mix**: Hundreds of different products run simultaneously

## 2. Foundational Queueing Mathematics

### 2.1 The M/M/1 Queue

The foundational single-server queue has:

- **Arrival rate**: $\lambda$ (Poisson process)
- **Service rate**: $\mu$ (exponential service times)
- **Utilization**: $\rho = \frac{\lambda}{\mu}$ (the queue is stable only if $\rho < 1$)

**Key metrics**:

$$ W_q = \frac{\rho}{\mu(1-\rho)} \qquad L_q = \frac{\rho^2}{1-\rho} $$

$$ W = W_q + \frac{1}{\mu} = \frac{1}{\mu - \lambda} \qquad L = \lambda W = \frac{\rho}{1-\rho} $$

Where:
- $W_q$ = Average waiting time in queue
- $L_q$ = Average number waiting in queue
- $W$ = Average time in system (queue plus service)
- $L$ = Average number in system

### 2.2 Kingman's Formula (G/G/1 Approximation)

The core approximation for semiconductor manufacturing is Kingman's G/G/1 formula:

$$ W_q \approx \left(\frac{\rho}{1-\rho}\right) \cdot \left(\frac{C_a^2 + C_s^2}{2}\right) \cdot \bar{s} $$

**Variable definitions**:

| Symbol | Definition |
|--------|------------|
| $\rho$ | Utilization (arrival rate / service rate) |
| $C_a^2$ | Squared coefficient of variation of interarrival times |
| $C_s^2$ | Squared coefficient of variation of service times |
| $\bar{s}$ | Mean service time |

**Critical insight**: The term $\frac{\rho}{1-\rho}$ is **explosively nonlinear** and grows without bound as $\rho \to 1$:

| Utilization ($\rho$) | Queueing Multiplier $\frac{\rho}{1-\rho}$ |
|---------------------|-------------------------------------------|
| 50% | 1.0× |
| 70% | 2.3× |
| 80% | 4.0× |
| 90% | 9.0× |
| 95% | 19.0× |
| 99% | 99.0× |

### 2.3 Pollaczek-Khinchine Formula (M/G/1)

For Poisson arrivals with a general service distribution:

$$ W_q = \frac{\lambda \mathbb{E}[S^2]}{2(1-\rho)} = \frac{\rho}{1-\rho} \cdot \frac{1+C_s^2}{2} \cdot \frac{1}{\mu} $$

### 2.4 Little's Law

Little's Law is the **universal connector** in queueing theory:

$$ L = \lambda W $$

Where:
- $L$ = Average number in system (WIP)
- $\lambda$ = Throughput (arrival rate)
- $W$ = Average time in system (cycle time)

**Properties**:
- Exact for long-run averages (not an approximation)
- Distribution-free
- Holds for any stable system, from a single tool to the whole fab
- Foundational for fab metrics

## 3. The VUT Equation (Factory Physics)

The practical "working equation" for cycle time at a single station:

$$ CT = T_0 \cdot \left[1 + \left(\frac{C_a^2 + C_s^2}{2}\right) \cdot \left(\frac{\rho}{1-\rho}\right)\right] $$

### 3.1 Component Breakdown

| Factor | Symbol | Meaning |
|--------|--------|---------|
| **V** (Variability) | $\frac{C_a^2 + C_s^2}{2}$ | Process and arrival randomness |
| **U** (Utilization) | $\frac{\rho}{1-\rho}$ | Congestion penalty |
| **T** (Time) | $T_0$ | Raw (irreducible) processing time |

### 3.2 Cycle Time Bounds

Consider a line with raw processing time $T_0$, bottleneck rate $r_b$ and WIP level $w$. Its **critical WIP** is $W_0 = r_b T_0$.

**Best Case Cycle Time**:

$$ CT_{best} = \begin{cases} T_0, & w \leq W_0 \\ \dfrac{w}{r_b}, & w > W_0 \end{cases} $$

**Practical Worst Case (PWC)**:

$$ CT_{PWC} = T_0 + \frac{w - 1}{r_b} $$

Where:
- $T_0$ = Raw processing time (sum of mean process times along the route)
- $w$ = WIP level
- $W_0$ = Critical WIP
- $r_b$ = Bottleneck rate

## 4. Reentrant Line Theory

### 4.1 Mathematical Formulation

A reentrant line has:
- $K$ stations (machine groups)
- $J$ steps (operations)
- Each step $j$ processed at station $s(j)$
- Products visiting the same station multiple times

**State descriptor**:

$$ \mathbf{n} = (n_1, n_2, \ldots, n_J) $$

where $n_j$ = number of jobs at step $j$.
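The M/M/1, Kingman and VUT expressions in Sections 2 and 3 can be checked numerically with a few Python functions. This is a minimal sketch and the function names are hypothetical. With $C_a^2 = C_s^2 = 1$, Kingman's approximation reduces exactly to the M/M/1 queue time, which gives a convenient sanity check.

```python
def mm1_metrics(lam, mu):
    """Steady-state M/M/1 metrics; requires 0 <= lam < mu."""
    if not 0 <= lam < mu:
        raise ValueError("M/M/1 is stable only for 0 <= lambda < mu")
    rho = lam / mu
    wq = rho / (mu * (1 - rho))   # mean wait in queue
    lq = rho ** 2 / (1 - rho)     # mean number in queue
    w = wq + 1 / mu               # mean time in system = 1 / (mu - lam)
    l = lam * w                   # Little's law: L = lambda * W
    return {"rho": rho, "Wq": wq, "Lq": lq, "W": w, "L": l}


def kingman_wq(rho, ca2, cs2, mean_service):
    """Kingman (G/G/1) approximation of the mean time spent in queue."""
    if not 0 <= rho < 1:
        raise ValueError("utilization must be in [0, 1)")
    return (rho / (1 - rho)) * ((ca2 + cs2) / 2) * mean_service


def vut_cycle_time(t0, rho, ca2, cs2):
    """Single-station VUT cycle time: CT = T0 * [1 + V * U]."""
    return t0 + kingman_wq(rho, ca2, cs2, t0)
```

For example, `vut_cycle_time(2.0, 0.85, 1.0, 0.5)` gives about 10.5 in the units of $T_0$. At 85% utilization the queue adds $V \cdot U = 0.75 \times 5.67 \approx 4.25$ process times on top of the process time itself.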
### 4.2 Stability Conditions

A reentrant line can be stable only if every station is loaded below capacity:

$$ \rho_k = \sum_{j:\, s(j)=k} \frac{\lambda}{\mu_j} < 1 \quad \forall k \in \{1, \ldots, K\} $$

> **Critical Result**: This condition is **necessary but NOT sufficient**!
>
> The **Lu-Kumar network** demonstrated that, even with all $\rho_k < 1$, a static buffer-priority scheduling policy can make the system **unstable**, with queues growing without bound. Bramson later showed that FIFO can also be unstable in certain multiclass networks.

### 4.3 Fluid Models

A deterministic approximation that treats jobs as a continuous flow:

$$ \frac{dq_j(t)}{dt} = \lambda_j(t) - \mu_j(t) $$

where $\lambda_j(t)$ and $\mu_j(t)$ are the instantaneous inflow and outflow rates of step $j$.

**Applications**:
- Capacity planning
- Stability analysis
- Bottleneck identification
- Long-run behavior prediction

### 4.4 Diffusion Limits (Heavy Traffic)

In heavy traffic ($\rho \to 1$), the diffusion-scaled queue length process converges to **Reflected Brownian Motion (RBM)**:

$$ Z(t) = X(t) + L(t) $$

Where:
- $Z(t)$ = Queue length process
- $X(t)$ = Net input process (Brownian motion)
- $L(t)$ = Regulator process (reflection at zero)

**Brownian motion parameters**:
- Drift: $\theta = \lambda - \mu$
- Variance: $\sigma^2 = \lambda \cdot C_a^2 + \mu \cdot C_s^2$

## 5. Variability Propagation

### 5.1 Sources of Variability

1. **Arrival variability** ($C_a^2$): Order patterns, lot releases
2. **Process variability** ($C_s^2$): Equipment, recipes, operators
3. **Flow variability**: Propagation through the network
4. **Failure variability**: Random equipment downs

### 5.2 The Linking Equations

For departures from a single-server queue:

$$ C_d^2 = \rho^2 C_s^2 + (1-\rho^2) C_a^2 $$

**Interpretation**:
- High-utilization stations ($\rho \to 1$): Export **service variability**
- Low-utilization stations ($\rho \to 0$): Export **arrival variability**

### 5.3 Equipment Failures and Effective Variability

When tools fail randomly during processing, the effective process time is $t_e = t_0 / A$. For exponentially distributed repair times, the effective squared CV is:

$$ C_{s,eff}^2 = C_{s,0}^2 + 2A(1-A) \cdot \frac{MTTR}{t_0} $$

Where:
- $C_{s,0}^2$ = Inherent process variability
- $A = \frac{MTBF}{MTBF + MTTR}$ = Availability
- $MTBF$ = Mean Time Between Failures
- $MTTR$ = Mean Time To Repair
- $t_0$ = Processing time

**Example calculation**: For $A = 0.95$, $MTTR = t_0$:

$$ \Delta C_s^2 = 2 \cdot 0.95 \cdot 0.05 \cdot 1 \approx 0.095 $$

At fixed availability, the added variability grows linearly with $MTTR$. Infrequent long repairs therefore hurt more than frequent short ones.

## 6. Batch Processing Mathematics

### 6.1 Bulk Service Queues (M/G^b/1)

Characteristics:
- Customers arrive singly (Poisson)
- Server processes up to $b$ customers simultaneously
- Service time is independent of batch size

**Analysis tools**:
- Probability generating functions
- Embedded Markov chains at departure epochs

### 6.2 Minimum Batch Trigger (MBT) Policies

Wait until at least $b$ items accumulate before processing.

**Effects**:
- Creates artificial correlation between arrivals
- Dramatically increases effective $C_a^2$
- Higher cycle times despite efficient tool usage

**Effective arrival variability** can increase by factors of **2–5×**.

### 6.3 Optimal Batch Size

An EOQ-style rule balances setup cost against holding cost:

$$ B^* = \sqrt{\frac{2DS}{ph}} $$

Where:
- $D$ = Demand rate
- $S$ = Setup cost per batch
- $p$ = Processing cost (value) per item
- $h$ = Holding cost rate (fraction of item value per unit time), so $ph$ is the holding cost per item per unit time

**Trade-off**:
- Smaller batches → More setups, less waiting
- Larger batches → Fewer setups, longer queues

## 7.
Queueing Network Analysis ### 7.1 Jackson Networks **Assumptions**: - Poisson external arrivals - Exponential service times - Probabilistic routing **Product-form solution**: $$ \pi(\mathbf{n}) = \prod_{i=1}^{K} \pi_i(n_i) $$ Each queue behaves independently in steady state. ### 7.2 BCMP Networks Extensions to Jackson networks: - Multiple job classes - Various service disciplines (FCFS, PS, LCFS-PR, IS) - General service time distributions (with constraints) **Product-form maintained**: $$ \pi(n_1, n_2, \ldots, n_K) = C \prod_{i=1}^{K} f_i(n_i) $$ ### 7.3 Mean Value Analysis (MVA) For closed networks (fixed WIP): $$ W_k(n) = \frac{1}{\mu_k}\left(1 + Q_k(n-1)\right) $$ **Iterative algorithm**: 1. Compute wait times given queue lengths at $n-1$ jobs 2. Calculate queue lengths at $n$ jobs 3. Determine throughput 4. Repeat ### 7.4 Decomposition Approximations (QNA) For realistic fabs, use **decomposition methods**: 1. **Traffic equations**: Solve for effective arrival rates $\lambda_i$ $$ \lambda_i = \gamma_i + \sum_{j=1}^{K} \lambda_j p_{ji} $$ 2. **Linking equations**: Track $C_a^2$ propagation 3. **G/G/m formulas**: Apply at each station independently 4. **Aggregation**: Combine results for system metrics ## 8. 
Scheduling Theory for Fabs ### 8.1 Basic Priority Rules | Rule | Description | Optimal For | |------|-------------|-------------| | FIFO | First In, First Out | Fairness | | SRPT | Shortest Remaining Processing Time | Mean flow time | | EDD | Earliest Due Date | On-time delivery | | SPT | Shortest Processing Time | Mean waiting time | ### 8.2 Fluctuation Smoothing Policies Developed specifically for semiconductor manufacturing: - **FSMCT** (Fluctuation Smoothing for Mean Cycle Time): - Prioritizes jobs that smooth the output stream - Reduces mean cycle time - **FSVCT** (Fluctuation Smoothing for Variance of Cycle Time): - Reduces cycle time variability - Improves delivery predictability ### 8.3 Heavy Traffic Scheduling In the limit as $\rho \to 1$, optimal policies often take forms: - **cμ-rule**: Prioritize class with highest $c_i \mu_i$ $$ \text{Priority index} = c_i \cdot \mu_i $$ where $c_i$ = holding cost, $\mu_i$ = service rate - **Threshold policies**: Switch based on queue length thresholds - **State-dependent priorities**: Dynamic adjustment based on system state ### 8.4 Computational Complexity **State space dimension** = Number of (step × product) combinations For realistic fabs: **thousands of dimensions** Dynamic programming approaches suffer the **curse of dimensionality**: $$ |\mathcal{S}| = \prod_{j=1}^{J} (N_{max} + 1) $$ Where $J$ = number of steps, $N_{max}$ = maximum queue size per step. ## 9. 
Key Mathematical Insights ### 9.1 Summary Table | Insight | Mathematical Expression | Practical Implication | |---------|------------------------|----------------------| | Nonlinear congestion | $\frac{\rho}{1-\rho}$ | Small utilization increases near capacity cause huge cycle time jumps | | Variability multiplies | $\frac{C_a^2 + C_s^2}{2}$ | Reducing variability is as powerful as reducing utilization | | Variability propagates | $C_d^2 = \rho^2 C_s^2 + (1-\rho^2) C_a^2$ | Upstream problems cascade downstream | | Batching costs | MBT inflates $C_a^2$ | "Efficient" batching often increases total cycle time | | Reentrant instability | Lu-Kumar example | Simple policies can destabilize feasible systems | | Universal law | $L = \lambda W$ | Connects WIP, throughput, and cycle time | ### 9.2 The Central Trade-off $$ \text{Cycle Time} \propto \frac{1}{1-\rho} \times \text{Variability} $$ **The fundamental tension**: Pushing utilization higher improves asset ROI but triggers explosive cycle time growth through the $\frac{\rho}{1-\rho}$ nonlinearity—amplified by every source of variability. ## 10. Modern Developments ### 10.1 Stochastic Processing Networks Generalizations of classical queueing: - Simultaneous resource possession - Complex synchronization constraints - Non-idling constraints ### 10.2 Robust Queueing Theory Optimize for **worst-case performance** over uncertainty sets: $$ \min_{\pi} \max_{\theta \in \Theta} J(\pi, \theta) $$ Rather than assuming specific stochastic distributions. 
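The Mean Value Analysis recursion from Section 7.3 is short enough to sketch directly. This version assumes single-server FCFS stations with exponential service, the setting in which MVA is exact. It also adds per-station visit counts $v_k$ to represent reentrant routes, which goes beyond the text's formula. With all $v_k = 1$ it reduces to the recursion as written. The function name is hypothetical.

```python
def mva(service_times, visits, n_jobs):
    """Exact MVA for a closed network of single-server FCFS stations.

    service_times[k]: mean service time 1/mu_k per visit to station k
    visits[k]:        visits per job to station k (> 1 for reentrant flow)
    Returns (throughput, per-visit residence times, mean queue lengths) at n_jobs.
    """
    q = [0.0] * len(service_times)          # Q_k(0) = 0: empty network
    x, w = 0.0, list(q)
    for n in range(1, n_jobs + 1):
        # Residence time per visit: W_k(n) = (1/mu_k) * (1 + Q_k(n-1))
        w = [s * (1 + qk) for s, qk in zip(service_times, q)]
        # System throughput: X(n) = n / sum_k v_k * W_k(n)
        x = n / sum(v * wk for v, wk in zip(visits, w))
        # Little's law per station: Q_k(n) = X(n) * v_k * W_k(n)
        q = [x * v * wk for v, wk in zip(visits, w)]
    return x, w, q
```

As WIP grows, the throughput returned by `mva` rises and saturates at the bottleneck rate $1/\max_k (v_k / \mu_k)$. The queue lengths always sum to the WIP level, which is a useful check when extending the recursion.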
### 10.3 Machine Learning Integration

- **Reinforcement Learning**: Train dispatch policies from simulation, for example with the Q-learning update (learning rate $\alpha$, discount factor $\gamma$):

$$ Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] $$

- **Neural Networks**: Approximate complex distributions
- **Data-driven estimation**: Real-time parameter learning

### 10.4 Digital Twin Technology

Combines:
- Analytical queueing models (fast, interpretable)
- High-fidelity simulation (detailed, accurate)
- Real-time sensor data (current state)

for predictive control and optimization.

## Common Notation Reference

| Symbol | Meaning |
|--------|---------|
| $\lambda$ | Arrival rate |
| $\mu$ | Service rate |
| $\rho$ | Utilization ($\lambda/\mu$) |
| $C_a^2$ | Squared CV of interarrival times |
| $C_s^2$ | Squared CV of service times |
| $W$ | Average time in system |
| $W_q$ | Average waiting time in queue |
| $L$ | Average number in system |
| $L_q$ | Average number in queue |
| $CT$ | Cycle time |
| $T_0$ | Raw processing time |
| $WIP$ | Work in process |

## Key Formulas Quick Reference

### B.1 Single Server Queues

```
M/M/1:             W = 1/(μ - λ)
M/G/1:             W_q = λE[S²]/(2(1-ρ))
G/G/1 (Kingman):   W_q ≈ (ρ/(1-ρ)) × ((C_a² + C_s²)/2) × (1/μ)
```

### B.2 Factory Physics

```
VUT Equation:      CT = T₀ × [1 + ((C_a² + C_s²)/2) × (ρ/(1-ρ))]
Little's Law:      L = λW
Departure CV:      C_d² = ρ²C_s² + (1-ρ²)C_a²
```

### B.3 Availability

```
Availability:      A = MTBF/(MTBF + MTTR)
Effective time:    t_e = t₀/A
Effective C_s²:    C_s² = C_s0² + 2A(1-A)(MTTR/t₀)   (exponential repair times)
```
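The departure-CV linking equation and the VUT equation combine into a simple decomposition for a serial flow line: each station's departure variability becomes the next station's arrival variability. This is a minimal sketch under three assumptions:

- all stations are single-server;
- every lot visits each station once;
- each station's effective process time and $C_s^2$, including any failure-induced inflation, are already known.

The `Station` type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Station:
    name: str
    te: float   # mean effective process time per lot (e.g., hours)
    cs2: float  # squared CV of the effective process time


def flow_line_cycle_time(stations, arrival_rate, ca2):
    """Approximate total cycle time of a serial line of single-server stations.

    Queue time at each station uses the VUT/Kingman form; departures feed the
    next station via C_d^2 = rho^2 * C_s^2 + (1 - rho^2) * C_a^2.
    Returns (total cycle time, per-station (name, rho, ca2_in, queue_time)).
    """
    total = 0.0
    details = []
    for st in stations:
        rho = arrival_rate * st.te
        if rho >= 1:
            raise ValueError(f"{st.name} is overloaded (rho = {rho:.2f})")
        ctq = ((ca2 + st.cs2) / 2) * (rho / (1 - rho)) * st.te
        total += ctq + st.te
        details.append((st.name, rho, ca2, ctq))
        ca2 = rho ** 2 * st.cs2 + (1 - rho ** 2) * ca2  # arrival CV for next station
    return total, details
```

The $\rho^2$ weight on the service term means a heavily loaded station mostly passes on its own process variability, while a lightly loaded one mostly passes through its arrival variability.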

raman mapping, metrology

Spatial stress or composition mapping.

raman spectroscopy,metrology

Analyze lattice and molecular vibrations to determine stress, composition, and crystallinity.

ramp rate, packaging

Heating/cooling speed.

random defects,metrology

Randomly distributed defects, typically particle-caused, with no systematic spatial pattern.

random signature, metrology

No clear pattern in failures.

reactive ion etching (sample prep),reactive ion etching,sample prep,metrology

Etch samples for analysis.

recombination parameter extraction, metrology

Determine Shockley-Read-Hall (SRH) recombination parameters, such as defect energy level and capture cross sections.

redistribution layer (rdl),redistribution layer,rdl,advanced packaging

Reroute connections from die pads to larger pitch for packaging.

redistribution layer for tsv, rdl, advanced packaging

Routing layer connecting TSVs.

reel diameter, packaging

Size of component reel.

reference material,metrology

Standard sample with certified properties for tool calibration.

reference standard,metrology

Certified artifact for calibration.

reflection high-energy electron diffraction (rheed),reflection high-energy electron diffraction,rheed,metrology

In-situ surface crystallography.

reflection interferometry,metrology

Monitor etch depth using interference.

reflective optics (euv),reflective optics,euv,lithography

Mirrors instead of lenses for EUV light.

reflectometry,metrology

Measure film thickness from interference of reflected light.

reflow profile, packaging

Temperature vs time during reflow.

reflow soldering for smt, packaging

Melting printed solder paste in a reflow oven to attach surface-mount components.

regression analysis,regression,ols,least squares,pls,partial least squares,ridge,lasso,semiconductor regression,process regression

# Regression Analysis

Semiconductor fabrication involves hundreds of sequential process steps, each governed by dozens of parameters. Regression analysis serves critical functions:
- Process Modeling: Understanding relationships between inputs and quality outputs
- Virtual Metrology: Predicting measurements from real-time sensor data
- Run-to-Run Control: Adaptive process adjustment
- Yield Optimization: Maximizing device performance and throughput
- Fault Detection: Identifying and diagnosing process excursions

Core Mathematical Framework

Ordinary Least Squares (OLS)

The foundational linear regression model:
$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} $$
Variable Definitions:
- $\mathbf{y}$ — $n \times 1$ response vector (e.g., film thickness, etch rate, yield)
- $\mathbf{X}$ — $n \times (k+1)$ design matrix of process parameters
- $\boldsymbol{\beta}$ — $(k+1) \times 1$ coefficient vector
- $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$ — error term

OLS Estimator:
$$ \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y} $$
Variance-Covariance Matrix of Estimator:
$$ \text{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}^\top\mathbf{X})^{-1} $$
Unbiased Variance Estimate:
$$ \hat{\sigma}^2 = \frac{\mathbf{e}^\top\mathbf{e}}{n - k - 1} = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n - k - 1} $$

Response Surface Methodology (RSM)

Critical for semiconductor process optimization, RSM uses second-order polynomial models.

Second-Order Model
$$ y = \beta_0 + \sum_{i=1}^{k}\beta_i x_i + \sum_{i=1}^{k}\beta_{ii}x_i^2 + \sum_{i < j}\beta_{ij}x_i x_j + \varepsilon $$

Partial Least Squares (PLS)
- Handles high-dimensional data with more predictors than observations
- Addresses multicollinearity
- Captures latent variable structures
- Simultaneously models X and Y relationships

NIPALS Algorithm
1. Initialize: $\mathbf{u} = \mathbf{y}$
2. X-weight: $$\mathbf{w} = \frac{\mathbf{X}^\top\mathbf{u}}{\|\mathbf{X}^\top\mathbf{u}\|}$$
3. X-score: $$\mathbf{t} = \mathbf{X}\mathbf{w}$$
4. Y-loading: $$q = \frac{\mathbf{y}^\top\mathbf{t}}{\mathbf{t}^\top\mathbf{t}}$$
5. Y-score update: $$\mathbf{u} = \frac{\mathbf{y}q}{q^2}$$
6. Iterate until convergence
7. Deflate X and Y, extract next component

Model Structure
$$ \mathbf{X} = \mathbf{T}\mathbf{P}^\top + \mathbf{E} $$
$$ \mathbf{Y} = \mathbf{T}\mathbf{Q}^\top + \mathbf{F} $$
Where:
- $\mathbf{T}$ — score matrix (latent variables)
- $\mathbf{P}$ — X-loadings
- $\mathbf{Q}$ — Y-loadings
- $\mathbf{E}, \mathbf{F}$ — residuals

Spatial Regression for Wafer Maps

Wafer-level variation exhibits spatial patterns that require specialized models.

Zernike Polynomial Decomposition

General Form (with $r$ normalized to the wafer radius, $0 \le r \le 1$):
$$ Z(r,\theta) = \sum_{n,m} a_{nm} Z_n^m(r,\theta) $$
Standard Zernike Polynomials (first few terms):

| Index | Name | Formula |
|-------|------|---------|
| $Z_0^0$ | Piston | $1$ |
| $Z_1^{-1}$ | Tilt Y | $r\sin\theta$ |
| $Z_1^{1}$ | Tilt X | $r\cos\theta$ |
| $Z_2^{-2}$ | Astigmatism 45° | $r^2\sin 2\theta$ |
| $Z_2^{0}$ | Defocus | $2r^2 - 1$ |
| $Z_2^{2}$ | Astigmatism 0° | $r^2\cos 2\theta$ |
| $Z_3^{-1}$ | Coma Y | $(3r^3 - 2r)\sin\theta$ |
| $Z_3^{1}$ | Coma X | $(3r^3 - 2r)\cos\theta$ |
| $Z_4^{0}$ | Spherical | $6r^4 - 6r^2 + 1$ |

Orthogonality Property (for the unnormalized polynomials in the table):
$$ \int_0^1 \int_0^{2\pi} Z_n^m(r,\theta) Z_{n'}^{m'}(r,\theta) \, r \, dr \, d\theta = \frac{\epsilon_m \pi}{2(n+1)}\delta_{nn'}\delta_{mm'}, \qquad \epsilon_m = \begin{cases} 2, & m = 0 \\ 1, & m \neq 0 \end{cases} $$

Gaussian Process Regression (Kriging)

Prior Distribution:
$$ f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')) $$
Common Kernel Functions:

*Squared Exponential (RBF)*:
$$ k(\mathbf{x}, \mathbf{x}') = \sigma^2 \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\ell^2}\right) $$

*Matérn Kernel* (with $r = \|\mathbf{x} - \mathbf{x}'\|$ and smoothness parameter $\nu$):
$$ k(r) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)}\left(\frac{\sqrt{2\nu}r}{\ell}\right)^\nu K_\nu\left(\frac{\sqrt{2\nu}r}{\ell}\right) $$
Where $K_\nu$ is the modified Bessel function of the second kind.
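For a single response (PLS1), the NIPALS steps above converge in one pass. The Y-score update $\mathbf{u} = \mathbf{y}q/q^2 = \mathbf{y}/q$ only rescales $\mathbf{y}$, so the normalized weight does not change. Below is a NumPy sketch under that assumption. It adds mean-centering, a common preprocessing step not stated in the text, and the function names are hypothetical. With as many components as predictors, PLS reproduces the OLS solution, which makes a convenient check.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """PLS1 (single response) by NIPALS on mean-centered data.

    Each component: w = X'y/||X'y||, t = Xw, q = y't/(t't), p = X't/(t't),
    then deflate X and y. Returns (B, x_mean, y_mean) with B for centered data.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)        # X-weight (step 2)
        t = Xk @ w                       # X-score (step 3)
        tt = t @ t
        q = (yk @ t) / tt                # Y-loading (step 4)
        p = (Xk.T @ t) / tt              # X-loading, used for deflation
        Xk = Xk - np.outer(t, p)         # deflate X (step 7)
        yk = yk - q * t                  # deflate y (step 7)
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)  # B = W (P'W)^-1 q
    return B, x_mean, y_mean


def pls1_predict(X, B, x_mean, y_mean):
    """Predict y for new rows of X using the centered-data coefficients."""
    return (np.asarray(X, dtype=float) - x_mean) @ B + y_mean
```

In virtual-metrology use, `n_components` would normally be chosen by cross-validation rather than set to the number of predictors.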
Posterior Predictive Mean: $$ \bar{f}_* = \mathbf{k}_*^\top(\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{y} $$ Posterior Predictive Variance: $$ \text{Var}(f_*) = k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^\top(\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{k}_* $$ Mixed Effects Models Semiconductor data has hierarchical structure (wafers within lots, lots within tools). General Model $$ y_{ijk} = \mathbf{x}_{ijk}^\top\boldsymbol{\beta} + b_i^{(\text{tool})} + b_{ij}^{(\text{lot})} + \varepsilon_{ijk} $$ Random Effects Distribution: - $b_i^{(\text{tool})} \sim N(0, \sigma_{\text{tool}}^2)$ - $b_{ij}^{(\text{lot})} \sim N(0, \sigma_{\text{lot}}^2)$ - $\varepsilon_{ijk} \sim N(0, \sigma^2)$ Matrix Notation $$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} + \boldsymbol{\varepsilon} $$ Where: - $\mathbf{b} \sim N(\mathbf{0}, \mathbf{G})$ - $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \mathbf{R})$ - $\text{Var}(\mathbf{y}) = \mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}^\top + \mathbf{R}$ REML Estimation Restricted Log-Likelihood: $$ \ell_{\text{REML}}(\boldsymbol{\theta}) = -\frac{1}{2}\left[\log|\mathbf{V}| + \log|\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X}| + \mathbf{r}^\top\mathbf{V}^{-1}\mathbf{r}\right] $$ Where $\mathbf{r} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}$. 
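The GP posterior predictive mean and variance above translate directly into NumPy. This sketch assumes the RBF kernel with fixed, hand-picked hyperparameters ($\sigma_f$, $\ell$, $\sigma_n$); in practice these are usually fitted by maximizing the marginal likelihood. It uses a Cholesky factorization instead of an explicit matrix inverse, a standard choice for numerical stability. The function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma_f=1.0, length=1.0):
    """k(x, x') = sigma_f^2 * exp(-||x - x'||^2 / (2 * length^2)) for row-wise inputs."""
    A = np.asarray(A, dtype=float).reshape(len(A), -1)
    B = np.asarray(B, dtype=float).reshape(len(B), -1)
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return sigma_f**2 * np.exp(-np.maximum(sq, 0.0) / (2.0 * length**2))


def gp_posterior(X, y, X_star, sigma_f=1.0, length=1.0, sigma_n=0.1):
    """Posterior mean k*'(K + sn^2 I)^-1 y and variance k** - k*'(K + sn^2 I)^-1 k*."""
    y = np.asarray(y, dtype=float)
    K = rbf_kernel(X, X, sigma_f, length) + sigma_n**2 * np.eye(len(y))
    K_s = rbf_kernel(X, X_star, sigma_f, length)         # column j holds k* for test point j
    L = np.linalg.cholesky(K)                            # K + sn^2 I = L L'
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + sn^2 I)^-1 y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = sigma_f**2 - np.sum(v**2, axis=0)              # k(x*, x*) = sigma_f^2 for RBF
    return mean, var
```

Far from the training data the variance returns to $\sigma_f^2$. This is the behavior that lets GPR flag virtual-metrology predictions made outside the region the model was trained on.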
Physics-Informed Regression Models Arrhenius-Based Models (Thermal Processes) Rate Equation: $$ k = A \exp\left(-\frac{E_a}{RT}\right) $$ Linearized Form (for regression): $$ \ln(k) = \ln(A) - \frac{E_a}{R} \cdot \frac{1}{T} $$ Parameters: - $k$ — rate constant - $A$ — pre-exponential factor - $E_a$ — activation energy (J/mol) - $R$ — gas constant (8.314 J/mol·K) - $T$ — absolute temperature (K) Preston's Equation (CMP) Basic Form: $$ \text{MRR} = K_p \cdot P \cdot V $$ Extended Model: $$ \text{MRR} = K_p \cdot P^a \cdot V^b \cdot f(\text{slurry}, \text{pad}) $$ Where: - MRR — material removal rate - $K_p$ — Preston coefficient - $P$ — applied pressure - $V$ — relative velocity Lithography Focus-Exposure Model $$ \text{CD} = \beta_0 + \beta_1 E + \beta_2 F + \beta_3 E^2 + \beta_4 F^2 + \beta_5 EF + \varepsilon $$ Variables: - CD — critical dimension - $E$ — exposure dose - $F$ — focus offset Bossung Curve: Plot of CD vs. focus at various exposure levels. Virtual Metrology Mathematics Predicting quality measurements from equipment sensor data in real-time. Model Structure $$ \hat{y} = f(\mathbf{x}_{\text{FDC}}; \boldsymbol{\theta}) $$ Where $\mathbf{x}_{\text{FDC}}$ is Fault Detection and Classification sensor data. 
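The linearized Arrhenius form turns activation-energy extraction into ordinary least squares on $(1/T, \ln k)$. A minimal pure-Python sketch follows; the function name is hypothetical. Fitting in log space implicitly assumes multiplicative (relative) errors in the measured rates.

```python
import math


def fit_arrhenius(temps_k, rates, gas_constant=8.314):
    """Fit ln(k) = ln(A) - (Ea/R) * (1/T) by least squares.

    Returns (A, Ea), with Ea in J/mol when gas_constant is in J/(mol*K).
    """
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k) for k in rates]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # slope = -Ea / R
    intercept = y_bar - slope * x_bar   # intercept = ln(A)
    return math.exp(intercept), -slope * gas_constant
```

A wide temperature range improves the conditioning of the fit, because the $1/T$ values of closely spaced temperatures are nearly collinear with the intercept.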
EWMA Run-to-Run Control Exponentially Weighted Moving Average: $$ \hat{T}_{n+1} = \lambda y_n + (1-\lambda)\hat{T}_n $$ Properties: - $\lambda \in (0,1]$ — smoothing parameter - Smaller $\lambda$ → more smoothing - Larger $\lambda$ → faster response to changes Kalman Filter Approach State Equation: $$ \mathbf{x}_{k} = \mathbf{A}\mathbf{x}_{k-1} + \mathbf{w}_k, \quad \mathbf{w}_k \sim N(\mathbf{0}, \mathbf{Q}) $$ Measurement Equation: $$ y_k = \mathbf{H}\mathbf{x}_k + v_k, \quad v_k \sim N(0, R) $$ Update Equations: *Predict*: $$ \hat{\mathbf{x}}_{k|k-1} = \mathbf{A}\hat{\mathbf{x}}_{k-1|k-1} $$ $$ \mathbf{P}_{k|k-1} = \mathbf{A}\mathbf{P}_{k-1|k-1}\mathbf{A}^\top + \mathbf{Q} $$ *Update*: $$ \mathbf{K}_k = \mathbf{P}_{k|k-1}\mathbf{H}^\top(\mathbf{H}\mathbf{P}_{k|k-1}\mathbf{H}^\top + R)^{-1} $$ $$ \hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k(y_k - \mathbf{H}\hat{\mathbf{x}}_{k|k-1}) $$ Classification and Count Models Logistic Regression (Binary Outcomes) For pass/fail or defect/no-defect classification: Model: $$ P(Y=1|\mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{x}^\top\boldsymbol{\beta})} = \sigma(\mathbf{x}^\top\boldsymbol{\beta}) $$ Logit Link: $$ \text{logit}(p) = \ln\left(\frac{p}{1-p}\right) = \mathbf{x}^\top\boldsymbol{\beta} $$ Log-Likelihood: $$ \ell(\boldsymbol{\beta}) = \sum_{i=1}^{n}\left[y_i \log(\pi_i) + (1-y_i)\log(1-\pi_i)\right] $$ Newton-Raphson Update: $$ \boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t)} + (\mathbf{X}^\top\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^\top(\mathbf{y} - \boldsymbol{\pi}) $$ Where $\mathbf{W} = \text{diag}(\pi_i(1-\pi_i))$. 
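In run-to-run control, the EWMA filter is typically applied to a model offset rather than to the raw output. Below is a sketch of a single-EWMA controller. It assumes a linear process model $y = a + b\,u$ with a known gain $b$ and an unknown offset $a$; the class and its method names are hypothetical.

```python
class EwmaR2R:
    """Single-EWMA run-to-run controller for an assumed model y = a + b*u.

    The offset estimate is filtered as a_hat <- lam*(y - b*u) + (1 - lam)*a_hat,
    and each run's recipe u is chosen so the model predicts the target.
    """

    def __init__(self, target, b, lam=0.3, a_hat=0.0):
        if not 0 < lam <= 1:
            raise ValueError("lambda must be in (0, 1]")
        self.target = target
        self.b = b
        self.lam = lam
        self.a_hat = a_hat

    def recipe(self):
        """Recipe setting for the next run: solve target = a_hat + b*u for u."""
        return (self.target - self.a_hat) / self.b

    def update(self, u, y):
        """EWMA update of the offset estimate from the run's recipe and measurement."""
        self.a_hat = self.lam * (y - self.b * u) + (1 - self.lam) * self.a_hat
        return self.a_hat
```

Each run calls `recipe()`, processes the wafer or lot, then calls `update(u, y)` with the measured output. For a constant true offset and no noise, the estimate error shrinks by a factor of $(1 - \lambda)$ per run.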
Poisson Regression (Defect Counts)

Model:
$$ \log(\mu) = \mathbf{x}^\top\boldsymbol{\beta}, \quad Y \sim \text{Poisson}(\mu) $$
Probability Mass Function:
$$ P(Y = y) = \frac{\mu^y e^{-\mu}}{y!} $$

Model Validation and Diagnostics

Goodness of Fit Metrics

Coefficient of Determination:
$$ R^2 = 1 - \frac{\text{SSE}}{\text{SST}} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} $$
Adjusted R-Squared:
$$ R^2_{\text{adj}} = 1 - (1-R^2)\frac{n-1}{n-k-1} $$
Root Mean Square Error:
$$ \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} $$
Mean Absolute Error:
$$ \text{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i| $$

Cross-Validation

K-Fold CV Error (with $\text{MSE}_k$ the error on held-out fold $k$):
$$ \text{CV}_{(K)} = \frac{1}{K}\sum_{k=1}^{K}\text{MSE}_k $$
Leave-One-Out CV:
$$ \text{LOOCV} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_{(-i)})^2 $$

Information Criteria

Akaike Information Criterion:
$$ \text{AIC} = 2p - 2\ln(\hat{L}) $$
Bayesian Information Criterion:
$$ \text{BIC} = p\ln(n) - 2\ln(\hat{L}) $$
Where $p$ is the number of estimated parameters and $\hat{L}$ is the maximized likelihood.

Diagnostic Statistics

Variance Inflation Factor:
$$ \text{VIF}_j = \frac{1}{1-R_j^2} $$
Where $R_j^2$ is the $R^2$ from regressing $x_j$ on all other predictors. A common rule of thumb: VIF > 10 indicates problematic multicollinearity.

Cook's Distance:
$$ D_i = \frac{(\hat{\mathbf{y}} - \hat{\mathbf{y}}_{(-i)})^\top(\hat{\mathbf{y}} - \hat{\mathbf{y}}_{(-i)})}{(k+1) \cdot \text{MSE}} $$
Where $\hat{\mathbf{y}}_{(-i)}$ are the fitted values from the model refit without observation $i$, and $k+1$ is the number of coefficients (including the intercept).

Leverage:
$$ h_{ii} = [\mathbf{H}]_{ii} $$
Where $\mathbf{H} = \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top$ is the hat matrix.

Studentized Residuals (internal):
$$ r_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - h_{ii}}} $$

Bayesian Regression

Provides full uncertainty quantification for risk-sensitive manufacturing decisions.
Bayesian Linear Regression Prior: $$ \boldsymbol{\beta} | \sigma^2 \sim N(\boldsymbol{\beta}_0, \sigma^2\mathbf{V}_0) $$ $$ \sigma^2 \sim \text{Inverse-Gamma}(a_0, b_0) $$ Posterior: $$ \boldsymbol{\beta} | \mathbf{y}, \sigma^2 \sim N(\boldsymbol{\beta}_n, \sigma^2\mathbf{V}_n) $$ Posterior Parameters: $$ \mathbf{V}_n = (\mathbf{V}_0^{-1} + \mathbf{X}^\top\mathbf{X})^{-1} $$ $$ \boldsymbol{\beta}_n = \mathbf{V}_n(\mathbf{V}_0^{-1}\boldsymbol{\beta}_0 + \mathbf{X}^\top\mathbf{y}) $$ Predictive Distribution $$ p(y_*|\mathbf{x}_*, \mathbf{y}) = \int p(y_*|\mathbf{x}_*, \boldsymbol{\beta}, \sigma^2) \, p(\boldsymbol{\beta}, \sigma^2|\mathbf{y}) \, d\boldsymbol{\beta} \, d\sigma^2 $$ For conjugate priors, this is a Student-t distribution. Credible Intervals 95% Credible Interval for $\beta_j$: $$ \beta_j \in \left[\hat{\beta}_j - t_{0.025,\nu}\cdot \text{SE}(\hat{\beta}_j), \quad \hat{\beta}_j + t_{0.025,\nu}\cdot \text{SE}(\hat{\beta}_j)\right] $$ Design of Experiments (DOE) Full Factorial Design For $k$ factors at 2 levels: $$ N = 2^k \text{ runs} $$ Fractional Factorial Design $$ N = 2^{k-p} \text{ runs} $$ Resolution: - Resolution III: Main effects aliased with 2-factor interactions - Resolution IV: Main effects clear; 2FIs aliased with each other - Resolution V: Main effects and 2FIs clear Central Composite Design (CCD) Components: - $2^k$ factorial points - $2k$ axial (star) points at distance $\alpha$ - $n_0$ center points Rotatability Condition: $$ \alpha = (2^k)^{1/4} $$ D-Optimal Design Maximizes the determinant of the information matrix: $$ \max_{\mathbf{X}} |\mathbf{X}^\top\mathbf{X}| $$ Equivalently, minimizes the generalized variance of $\hat{\boldsymbol{\beta}}$. 
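The CCD components and the rotatability condition can be generated in coded units with the standard library. This is a sketch. The function name is hypothetical, and the default of 4 center points is an illustrative choice rather than a recommendation from the text.

```python
from itertools import product


def central_composite(k, n_center=4, rotatable=True):
    """Central composite design in coded units.

    Returns (runs, alpha): 2^k factorial points at +/-1, 2k axial points at
    +/-alpha on each axis, and n_center center points. alpha = (2^k)^(1/4)
    gives rotatability; alpha = 1 gives a face-centred design instead.
    """
    alpha = (2 ** k) ** 0.25 if rotatable else 1.0
    factorial = [list(pt) for pt in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            pt = [0.0] * k
            pt[i] = sign * alpha
            axial.append(pt)
    center = [[0.0] * k for _ in range(n_center)]
    return factorial + axial + center, alpha
```

For $k = 3$ factors this gives $\alpha = 8^{1/4} \approx 1.682$ and $8 + 6 + n_0$ runs. Each run is mapped back to real process settings by scaling each coded value to that factor's range.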
I-Optimal Design Minimizes average prediction variance: $$ \min_{\mathbf{X}} \int_{\mathcal{R}} \text{Var}(\hat{y}(\mathbf{x})) \, d\mathbf{x} $$ Reliability Analysis Cox Proportional Hazards Model Hazard Function: $$ h(t|\mathbf{x}) = h_0(t) \cdot \exp(\mathbf{x}^\top\boldsymbol{\beta}) $$ Where: - $h(t|\mathbf{x})$ — hazard at time $t$ given covariates $\mathbf{x}$ - $h_0(t)$ — baseline hazard - $\boldsymbol{\beta}$ — regression coefficients Partial Likelihood $$ L(\boldsymbol{\beta}) = \prod_{i: \delta_i = 1} \frac{\exp(\mathbf{x}_i^\top\boldsymbol{\beta})}{\sum_{j \in \mathcal{R}(t_i)} \exp(\mathbf{x}_j^\top\boldsymbol{\beta})} $$ Where $\mathcal{R}(t_i)$ is the risk set at time $t_i$. Challenge-Method Mapping | Manufacturing Challenge | Mathematical Approach | |------------------------|----------------------| | High dimensionality | PLS, LASSO, Elastic Net | | Multicollinearity | Ridge regression, PCR, VIF analysis | | Spatial wafer patterns | Zernike polynomials, GP regression | | Hierarchical data | Mixed effects models, REML | | Nonlinear processes | RSM, polynomial models, transformations | | Physics constraints | Arrhenius, Preston equation integration | | Uncertainty quantification | Bayesian methods, bootstrap, prediction intervals | | Binary outcomes | Logistic regression | | Count data | Poisson regression | | Real-time control | Kalman filter, EWMA | | Time-to-failure | Cox proportional hazards | Equations Quick Reference Estimation $$ \hat{\boldsymbol{\beta}}_{\text{OLS}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y} $$ $$ \hat{\boldsymbol{\beta}}_{\text{Ridge}} = (\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y} $$ Prediction Interval $$ \hat{y}_0 \pm t_{\alpha/2, n-k-1} \cdot \sqrt{\text{MSE}\left(1 + \mathbf{x}_0^\top(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{x}_0\right)} $$ Confidence Interval for $\beta_j$ $$ \hat{\beta}_j \pm t_{\alpha/2, n-k-1} \cdot \text{SE}(\hat{\beta}_j) $$ Process Capability $$ C_p 
= \frac{\text{USL} - \text{LSL}}{6\sigma} $$ $$ C_{pk} = \min\left(\frac{\text{USL} - \mu}{3\sigma}, \frac{\mu - \text{LSL}}{3\sigma}\right) $$ Reference | Symbol | Description | |--------|-------------| | $\mathbf{y}$ | Response vector | | $\mathbf{X}$ | Design matrix | | $\boldsymbol{\beta}$ | Coefficient vector | | $\hat{\boldsymbol{\beta}}$ | Estimated coefficients | | $\boldsymbol{\varepsilon}$ | Error vector | | $\sigma^2$ | Error variance | | $\lambda$ | Regularization parameter | | $\mathbf{I}$ | Identity matrix | | $\|\cdot\|_1$ | L1 norm (sum of absolute values) | | $\|\cdot\|_2$ | L2 norm (Euclidean) | | $\mathbf{A}^\top$ | Matrix transpose | | $\mathbf{A}^{-1}$ | Matrix inverse | | $|\mathbf{A}|$ | Matrix determinant | | $N(\mu, \sigma^2)$ | Normal distribution | | $\mathcal{GP}$ | Gaussian Process |
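The capability indices can be estimated from sample data as below. This sketch plugs in the overall sample standard deviation. Many SPC practices instead use a within-subgroup estimate (for example $\bar{R}/d_2$) for $C_p$ and $C_{pk}$, so treat the choice of $\sigma$ as an assumption. The function name is hypothetical.

```python
import statistics


def capability(samples, lsl, usl):
    """Return (Cp, Cpk) using the sample mean and sample standard deviation."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)   # n - 1 denominator
    cp = (usl - lsl) / (6 * sigma)
    cpk = min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))
    return cp, cpk
```

Shifting every sample toward one limit leaves $C_p$ unchanged but lowers $C_{pk}$, which is exactly the centering penalty the $\min$ term encodes.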

regression-based ocd, metrology

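Extract profile parameters by fitting a simulated optical response directly to the measured spectrum, instead of searching a precomputed library.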

resin bleed, packaging

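Resin that separates from the mold compound or die-attach material and spreads onto adjacent surfaces such as leads or pads.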

resist profile simulation,lithography

Predict 3D resist shape after develop.

resist sensitivity,lithography

Amount of energy needed to expose resist.

resist spin coating,lithography

Apply liquid resist by spinning wafer at high speed.

resist strip / ashing,lithography

Remove resist after etching using plasma or solvents.

resolution,lithography

Smallest feature size that can be printed.

resolution,metrology

Smallest measurable difference.

resonant ionization mass spectrometry, rims, metrology

Selective ionization for ultra-sensitive detection.

resonant raman, metrology

Enhanced scattering at absorption resonance.

resonant soft x-ray scatterometry, metrology

Element-specific CD measurement.