Regression Analysis
Semiconductor fabrication involves hundreds of sequential process steps, each governed by dozens of parameters. Regression analysis serves critical functions:
- Process Modeling: Understanding relationships between inputs and quality outputs
- Virtual Metrology: Predicting measurements from real-time sensor data
- Run-to-Run Control: Adaptive process adjustment
- Yield Optimization: Maximizing device performance and throughput
- Fault Detection: Identifying and diagnosing process excursions
Core Mathematical Framework
Ordinary Least Squares (OLS)
The foundational linear regression model:
$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} $$
Variable Definitions:
- $\mathbf{y}$: $n \times 1$ response vector (e.g., film thickness, etch rate, yield)
- $\mathbf{X}$: $n \times (k+1)$ design matrix of process parameters
- $\boldsymbol{\beta}$: $(k+1) \times 1$ coefficient vector
- $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$: error term
OLS Estimator:
$$ \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y} $$
Variance-Covariance Matrix of Estimator:
$$ \text{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}^\top\mathbf{X})^{-1} $$
Unbiased Variance Estimate:
$$ \hat{\sigma}^2 = \frac{\mathbf{e}^\top\mathbf{e}}{n - k - 1} = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n - k - 1} $$
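The estimator, its variance-covariance matrix, and the unbiased variance estimate above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not a production routine: the function name `ols_fit`, the coded `temp` and `pressure` inputs, and the "true" coefficients are all assumptions made for the example.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares fit of y = X beta + eps.

    X is assumed to already contain the intercept column, so it has
    p = k + 1 columns.
    """
    n, p = X.shape
    # lstsq solves the least-squares problem without forming an explicit inverse
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    # Unbiased variance estimate: e'e / (n - k - 1)
    sigma2_hat = resid @ resid / (n - p)
    # Var(beta_hat) = sigma^2 (X'X)^{-1}, with sigma^2 replaced by its estimate
    cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)
    return beta_hat, sigma2_hat, cov_beta

# Synthetic example: a hypothetical response driven by two coded inputs
rng = np.random.default_rng(0)
n = 200
temp = rng.normal(size=n)       # hypothetical coded temperature
pressure = rng.normal(size=n)   # hypothetical coded pressure
X = np.column_stack([np.ones(n), temp, pressure])
beta_true = np.array([100.0, 5.0, -2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_hat, sigma2_hat, cov_beta = ols_fit(X, y)
std_err = np.sqrt(np.diag(cov_beta))  # standard errors of the coefficients
```

Using `np.linalg.lstsq` rather than inverting $\mathbf{X}^\top\mathbf{X}$ for the coefficients is a numerical-stability choice; the explicit inverse is kept only for the covariance matrix, where it is needed anyway.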
Response Surface Methodology (RSM)
Critical for semiconductor process optimization, RSM uses second-order polynomial models.
Second-Order Model
$$ y = \beta_0 + \sum_{i=1}^{k}\beta_i x_i + \sum_{i=1}^{k}\beta_{ii}x_i^2 + \sum_{i<j}\beta_{ij}x_i x_j + \varepsilon $$
Matrix Form:
$$ y = \beta_0 + \mathbf{x}^\top\mathbf{b} + \mathbf{x}^\top\mathbf{B}\mathbf{x} + \varepsilon $$
Where:
- $\mathbf{b}$: $k \times 1$ vector of first-order coefficients $\beta_i$
- $\mathbf{B}$: $k \times k$ symmetric matrix with diagonal entries $\beta_{ii}$ and off-diagonal entries $\beta_{ij}/2$
Stationary Point Analysis
Stationary Point (Candidate Optimum):
$$ \mathbf{x}_s = -\frac{1}{2}\mathbf{B}^{-1}\mathbf{b} $$
Nature Determination (from the eigenvalues $\lambda_i$ of $\mathbf{B}$):
- All $\lambda_i < 0$: maximum
- All $\lambda_i > 0$: minimum
- Mixed signs: saddle point
Canonical Analysis
$$ \hat{y} = \hat{y}_s + \sum_{i=1}^{k}\lambda_i w_i^2 $$
Where $\lambda_i$ are the eigenvalues of $\mathbf{B}$ and $w_i$ are the canonical variables.
Regularized Regression Methods
Semiconductor data often exhibits multicollinearity and high dimensionality.
Ridge Regression (L2 Penalty)
Objective Function:
$$ \min_{\boldsymbol{\beta}} \left\{ \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda\|\boldsymbol{\beta}\|_2^2 \right\} $$
Closed-Form Solution:
$$ \hat{\boldsymbol{\beta}}_{\text{ridge}} = (\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y} $$
Properties:
LASSO (L1 Penalty)
Objective Function:
$$ \min_{\boldsymbol{\beta}} \left\{ \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda\|\boldsymbol{\beta}\|_1 \right\} $$
Properties:
Elastic Net
Objective Function:
$$ \min_{\boldsymbol{\beta}} \left\{ \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda_1\|\boldsymbol{\beta}\|_1 + \lambda_2\|\boldsymbol{\beta}\|_2^2 \right\} $$
Alternative Parameterization:
$$ \min_{\boldsymbol{\beta}} \left\{ \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda\left[\alpha\|\boldsymbol{\beta}\|_1 + (1-\alpha)\|\boldsymbol{\beta}\|_2^2\right] \right\} $$
Where $\alpha \in [0,1]$ controls the mix between L1 and L2 penalties.
Partial Least Squares (PLS) Regression
Among the most widely used techniques for semiconductor process modeling.
Why PLS?
NIPALS Algorithm
1. Initialize: $\mathbf{u} = \mathbf{y}$
2. X-weight: $$\mathbf{w} = \frac{\mathbf{X}^\top\mathbf{u}}{\|\mathbf{X}^\top\mathbf{u}\|}$$
3. X-score: $$\mathbf{t} = \mathbf{X}\mathbf{w}$$
4. Y-loading: $$q = \frac{\mathbf{y}^\top\mathbf{t}}{\mathbf{t}^\top\mathbf{t}}$$
5. Y-score update: $$\mathbf{u} = \frac{\mathbf{y}q}{q^2}$$
6. Iterate steps 2 to 5 until convergence
7. Deflate $\mathbf{X}$ and $\mathbf{y}$, then extract the next component
Model Structure
$$ \mathbf{X} = \mathbf{T}\mathbf{P}^\top + \mathbf{E} $$
$$ \mathbf{Y} = \mathbf{T}\mathbf{Q}^\top + \mathbf{F} $$
Where:
- $\mathbf{T}$: score matrix (columns are the X-scores $\mathbf{t}$)
- $\mathbf{P}$, $\mathbf{Q}$: X-loading and Y-loading matrices
- $\mathbf{E}$, $\mathbf{F}$: residual matrices
Spatial Regression for Wafer Maps
Wafer-level variation exhibits spatial patterns requiring specialized models.
Zernike Polynomial Decomposition
General Form:
$$ Z(r,\theta) = \sum_{n,m} a_{nm} Z_n^m(r,\theta) $$
Standard Zernike Polynomials (first few terms) are tabulated at the end of this document.
Orthogonality Property (for the unnormalized polynomials):
$$ \int_0^1 \int_0^{2\pi} Z_n^m(r,\theta) Z_{n'}^{m'}(r,\theta) \, r \, dr \, d\theta = \frac{\epsilon_m \pi}{2(n+1)}\delta_{nn'}\delta_{mm'} $$
Where $\epsilon_m = 2$ for $m = 0$ and $\epsilon_m = 1$ otherwise, so the constant reduces to $\pi/(n+1)$ for the rotationally symmetric terms.
Gaussian Process Regression (Kriging)
Prior Distribution:
$$ f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')) $$
Common Kernel Functions:
Squared Exponential (RBF):
$$ k(\mathbf{x}, \mathbf{x}') = \sigma^2 \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\ell^2}\right) $$
Matérn Kernel:
$$ k(r) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)}\left(\frac{\sqrt{2\nu}\,r}{\ell}\right)^\nu K_\nu\left(\frac{\sqrt{2\nu}\,r}{\ell}\right) $$
Where $r = \|\mathbf{x} - \mathbf{x}'\|$ and $K_\nu$ is the modified Bessel function of the second kind.
Posterior Predictive Mean:
$$ \bar{f}_* = \mathbf{k}_*^\top(\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{y} $$
Posterior Predictive Variance:
$$ \text{Var}(f_*) = k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^\top(\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{k}_* $$
Where $\mathbf{x}_*$ is the prediction location, $\mathbf{K}$ is the kernel matrix of the training inputs, and $\mathbf{k}_*$ is the vector of kernel values between the training inputs and $\mathbf{x}_*$.
Mixed Effects Models
Semiconductor data has hierarchical structure (wafers within lots, lots within tools).
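General Model
$$ y_{ijk} = \mathbf{x}_{ijk}^\top\boldsymbol{\beta} + b_i^{(\text{tool})} + b_{ij}^{(\text{lot})} + \varepsilon_{ijk} $$
Random Effects Distribution (the usual assumption of independent, zero-mean normal effects):
$$ b_i^{(\text{tool})} \sim N(0, \sigma_{\text{tool}}^2), \quad b_{ij}^{(\text{lot})} \sim N(0, \sigma_{\text{lot}}^2), \quad \varepsilon_{ijk} \sim N(0, \sigma^2) $$
Matrix Notation
$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} + \boldsymbol{\varepsilon} $$
Where:
- $\mathbf{X}$, $\boldsymbol{\beta}$: fixed-effects design matrix and coefficients
- $\mathbf{Z}$: random-effects design matrix
- $\mathbf{b}$: vector of random effects (tool and lot)
REML Estimation
Restricted Log-Likelihood (up to an additive constant):
$$ \ell_{\text{REML}}(\boldsymbol{\theta}) = -\frac{1}{2}\left[\log|\mathbf{V}| + \log|\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X}| + \mathbf{r}^\top\mathbf{V}^{-1}\mathbf{r}\right] $$
Where $\mathbf{V} = \text{Var}(\mathbf{y})$ is the marginal covariance of $\mathbf{y}$, which depends on the variance parameters $\boldsymbol{\theta}$, and $\mathbf{r} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}$ with $\hat{\boldsymbol{\beta}}$ the generalized least squares estimate.
Physics-Informed Regression Models
Arrhenius-Based Models (Thermal Processes)
Rate Equation:
$$ k = A \exp\left(-\frac{E_a}{RT}\right) $$
Linearized Form (for regression):
$$ \ln(k) = \ln(A) - \frac{E_a}{R} \cdot \frac{1}{T} $$
Parameters:
- $k$: rate constant
- $A$: pre-exponential factor
- $E_a$: activation energy
- $R$: gas constant
- $T$: absolute temperature
Regressing $\ln(k)$ on $1/T$ therefore gives a slope of $-E_a/R$ and an intercept of $\ln(A)$.
Preston's Equation (CMP)
Basic Form:
$$ \text{MRR} = K_p \cdot P \cdot V $$
Extended Model:
$$ \text{MRR} = K_p \cdot P^a \cdot V^b \cdot f(\text{slurry}, \text{pad}) $$
Where:
- $\text{MRR}$: material removal rate
- $K_p$: Preston coefficient
- $P$: applied pressure
- $V$: relative velocity between pad and wafer
- $a$, $b$: empirical exponents
Lithography Focus-Exposure Model
$$ \text{CD} = \beta_0 + \beta_1 E + \beta_2 F + \beta_3 E^2 + \beta_4 F^2 + \beta_5 EF + \varepsilon $$
Variables:
- $\text{CD}$: critical dimension
- $E$: exposure dose
- $F$: focus
Bossung Curve: Plot of CD vs. focus at various exposure levels.
Virtual Metrology Mathematics
Predicting quality measurements from equipment sensor data in real time.
Model Structure
$$ \hat{y} = f(\mathbf{x}_{\text{FDC}}; \boldsymbol{\theta}) $$
Where $\mathbf{x}_{\text{FDC}}$ is Fault Detection and Classification sensor data.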
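The model structure above leaves $f$ unspecified. As one hedged illustration, the sketch below takes $f$ to be a ridge regression, using the closed-form ridge solution given earlier, and trains it on synthetic data standing in for FDC sensor summaries. The function names `fit_ridge_vm` and `predict_vm`, the feature count, the deliberately collinear sensor pair, and the penalty `lam=1.0` are all assumptions for the example; a real virtual metrology model would be chosen and tuned against actual metrology data.

```python
import numpy as np

def fit_ridge_vm(X_fdc, y, lam):
    """Ridge fit on centered data; returns (intercept, beta).

    Centering keeps the intercept out of the penalty.
    """
    x_mean = X_fdc.mean(axis=0)
    y_mean = y.mean()
    Xc = X_fdc - x_mean
    p = Xc.shape[1]
    # beta = (X'X + lam I)^{-1} X'y on the centered data
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ beta
    return intercept, beta

def predict_vm(X_fdc, intercept, beta):
    """Predicted metrology value for each row of sensor features."""
    return intercept + X_fdc @ beta

# Synthetic stand-in for FDC features (purely illustrative)
rng = np.random.default_rng(1)
n_train, n_test, p = 150, 50, 20
X = rng.normal(size=(n_train + n_test, p))
# Make two "sensors" nearly collinear, as often happens with real tool data
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n_train + n_test)
beta_true = np.zeros(p)
beta_true[[0, 5, 9]] = [2.0, -1.0, 0.5]
y = 50.0 + X @ beta_true + rng.normal(scale=0.3, size=n_train + n_test)

X_tr, y_tr = X[:n_train], y[:n_train]
X_te, y_te = X[n_train:], y[n_train:]

intercept, beta = fit_ridge_vm(X_tr, y_tr, lam=1.0)
y_pred = predict_vm(X_te, intercept, beta)
rmse = np.sqrt(np.mean((y_te - y_pred) ** 2))  # hold-out prediction error
```

Evaluating on held-out rows rather than the training rows mirrors how a virtual metrology model is judged in practice: by how well it predicts wafers that were not measured.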
EWMA Run-to-Run Control
Exponentially Weighted Moving Average:
$$ \hat{T}_{n+1} = \lambda y_n + (1-\lambda)\hat{T}_n $$
Properties:
Kalman Filter Approach
State Equation:
$$ \mathbf{x}_{k} = \mathbf{A}\mathbf{x}_{k-1} + \mathbf{w}_k, \quad \mathbf{w}_k \sim N(\mathbf{0}, \mathbf{Q}) $$
Measurement Equation:
$$ y_k = \mathbf{H}\mathbf{x}_k + v_k, \quad v_k \sim N(0, R) $$
Update Equations:
Predict:
$$ \hat{\mathbf{x}}_{k|k-1} = \mathbf{A}\hat{\mathbf{x}}_{k-1|k-1} $$
$$ \mathbf{P}_{k|k-1} = \mathbf{A}\mathbf{P}_{k-1|k-1}\mathbf{A}^\top + \mathbf{Q} $$
Update:
$$ \mathbf{K}_k = \mathbf{P}_{k|k-1}\mathbf{H}^\top(\mathbf{H}\mathbf{P}_{k|k-1}\mathbf{H}^\top + R)^{-1} $$
$$ \hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k(y_k - \mathbf{H}\hat{\mathbf{x}}_{k|k-1}) $$
Classification and Count Models
Logistic Regression (Binary Outcomes)
For pass/fail or defect/no-defect classification:
Model:
$$ P(Y=1|\mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{x}^\top\boldsymbol{\beta})} = \sigma(\mathbf{x}^\top\boldsymbol{\beta}) $$
Logit Link:
$$ \text{logit}(p) = \ln\left(\frac{p}{1-p}\right) = \mathbf{x}^\top\boldsymbol{\beta} $$
Log-Likelihood:
$$ \ell(\boldsymbol{\beta}) = \sum_{i=1}^{n}\left[y_i \log(\pi_i) + (1-y_i)\log(1-\pi_i)\right] $$
Newton-Raphson Update:
$$ \boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t)} + (\mathbf{X}^\top\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^\top(\mathbf{y} - \boldsymbol{\pi}) $$
Where $\pi_i = \sigma(\mathbf{x}_i^\top\boldsymbol{\beta}^{(t)})$ and $\mathbf{W} = \text{diag}(\pi_i(1-\pi_i))$.
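The Newton-Raphson update above translates almost line for line into NumPy. The following is a minimal sketch on synthetic pass/fail data; the function names, the iteration cap, the tolerance, and the "true" coefficients are assumptions for illustration, and it includes no safeguards (step halving, regularization) for separable or ill-conditioned data.

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_newton(X, y, n_iter=25, tol=1e-8):
    """Maximize the logistic log-likelihood by Newton-Raphson.

    X is assumed to include an intercept column; y holds 0/1 labels.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = sigmoid(X @ beta)
        w = pi * (1.0 - pi)              # diagonal of W
        grad = X.T @ (y - pi)            # X'(y - pi)
        hess = X.T @ (X * w[:, None])    # X'WX
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Synthetic pass/fail data driven by one hypothetical process variable
rng = np.random.default_rng(2)
n = 2000
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
beta_true = np.array([-1.0, 2.0])
y = (rng.uniform(size=n) < sigmoid(X @ beta_true)).astype(float)

beta_hat = fit_logistic_newton(X, y)
```

Solving the linear system with `np.linalg.solve` avoids forming $(\mathbf{X}^\top\mathbf{W}\mathbf{X})^{-1}$ explicitly, which is the usual choice for numerical stability.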
Poisson Regression (Defect Counts)
Model:
$$ \log(\mu) = \mathbf{x}^\top\boldsymbol{\beta}, \quad Y \sim \text{Poisson}(\mu) $$
Probability Mass Function:
$$ P(Y = y) = \frac{\mu^y e^{-\mu}}{y!} $$
Model Validation and Diagnostics
Goodness of Fit Metrics
Coefficient of Determination:
$$ R^2 = 1 - \frac{\text{SSE}}{\text{SST}} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} $$
Adjusted R-Squared:
$$ R^2_{\text{adj}} = 1 - (1-R^2)\frac{n-1}{n-k-1} $$
Root Mean Square Error:
$$ \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} $$
Mean Absolute Error:
$$ \text{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i| $$
Cross-Validation
K-Fold CV Error:
$$ \text{CV}_{(K)} = \frac{1}{K}\sum_{k=1}^{K}\text{MSE}_k $$
Leave-One-Out CV:
$$ \text{LOOCV} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_{(-i)})^2 $$
Information Criteria
Akaike Information Criterion:
$$ \text{AIC} = 2p - 2\ln(\hat{L}) $$
Bayesian Information Criterion:
$$ \text{BIC} = p\ln(n) - 2\ln(\hat{L}) $$
Where $p$ is the number of estimated parameters and $\hat{L}$ is the maximized likelihood.
Diagnostic Statistics
Variance Inflation Factor:
$$ \text{VIF}_j = \frac{1}{1-R_j^2} $$
Where $R_j^2$ is the $R^2$ from regressing $x_j$ on all other predictors. Rule of thumb: VIF > 10 is commonly taken to indicate problematic multicollinearity.
Cook's Distance:
$$ D_i = \frac{(\hat{\mathbf{y}} - \hat{\mathbf{y}}_{(-i)})^\top(\hat{\mathbf{y}} - \hat{\mathbf{y}}_{(-i)})}{(k+1) \cdot \text{MSE}} $$
Where $k+1$ is the number of regression coefficients, including the intercept.
Leverage:
$$ h_{ii} = [\mathbf{H}]_{ii} $$
Where $\mathbf{H} = \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top$ is the hat matrix.
Studentized Residuals:
$$ r_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - h_{ii}}} $$
Bayesian Regression
Provides full uncertainty quantification for risk-sensitive manufacturing decisions.
Bayesian Linear Regression
Prior:
$$ \boldsymbol{\beta} | \sigma^2 \sim N(\boldsymbol{\beta}_0, \sigma^2\mathbf{V}_0) $$
$$ \sigma^2 \sim \text{Inverse-Gamma}(a_0, b_0) $$
Posterior:
$$ \boldsymbol{\beta} | \mathbf{y}, \sigma^2 \sim N(\boldsymbol{\beta}_n, \sigma^2\mathbf{V}_n) $$
Posterior Parameters:
$$ \mathbf{V}_n = (\mathbf{V}_0^{-1} + \mathbf{X}^\top\mathbf{X})^{-1} $$
$$ \boldsymbol{\beta}_n = \mathbf{V}_n(\mathbf{V}_0^{-1}\boldsymbol{\beta}_0 + \mathbf{X}^\top\mathbf{y}) $$
Predictive Distribution
$$ p(y_*|\mathbf{x}_*, \mathbf{y}) = \int p(y_*|\mathbf{x}_*, \boldsymbol{\beta}, \sigma^2) \, p(\boldsymbol{\beta}, \sigma^2|\mathbf{y}) \, d\boldsymbol{\beta} \, d\sigma^2 $$
For conjugate priors, this is a Student-t distribution.
Credible Intervals
95% Credible Interval for $\beta_j$:
$$ \beta_j \in \left[\hat{\beta}_j - t_{0.025,\nu}\cdot \text{SE}(\hat{\beta}_j), \quad \hat{\beta}_j + t_{0.025,\nu}\cdot \text{SE}(\hat{\beta}_j)\right] $$
Where $\nu$ is the degrees of freedom of the posterior Student-t distribution.
Design of Experiments (DOE)
Full Factorial Design
For $k$ factors at 2 levels:
$$ N = 2^k \text{ runs} $$
Fractional Factorial Design
$$ N = 2^{k-p} \text{ runs} $$
Resolution:
Central Composite Design (CCD)
Components:
- Factorial points: the $2^k$ (or fractional) corner runs
- Axial (star) points: $2k$ runs at distance $\pm\alpha$ along each factor axis
- Center points: replicated runs at the design center
Rotatability Condition (for a full $2^k$ factorial core):
$$ \alpha = (2^k)^{1/4} $$
D-Optimal Design
Maximizes the determinant of the information matrix:
$$ \max_{\mathbf{X}} |\mathbf{X}^\top\mathbf{X}| $$
Equivalently, minimizes the generalized variance of $\hat{\boldsymbol{\beta}}$.
I-Optimal Design
Minimizes average prediction variance over the design region $\mathcal{R}$:
$$ \min_{\mathbf{X}} \int_{\mathcal{R}} \text{Var}(\hat{y}(\mathbf{x})) \, d\mathbf{x} $$
Reliability Analysis
Cox Proportional Hazards Model
Hazard Function:
$$ h(t|\mathbf{x}) = h_0(t) \cdot \exp(\mathbf{x}^\top\boldsymbol{\beta}) $$
Where:
- $h_0(t)$: baseline hazard function, left unspecified
- $\exp(\mathbf{x}^\top\boldsymbol{\beta})$: relative hazard associated with covariates $\mathbf{x}$
Partial Likelihood
$$ L(\boldsymbol{\beta}) = \prod_{i: \delta_i = 1} \frac{\exp(\mathbf{x}_i^\top\boldsymbol{\beta})}{\sum_{j \in \mathcal{R}(t_i)} \exp(\mathbf{x}_j^\top\boldsymbol{\beta})} $$
Where $\delta_i = 1$ marks an observed (uncensored) failure and $\mathcal{R}(t_i)$ is the risk set at time $t_i$.
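To make the risk-set bookkeeping in the partial likelihood concrete, the sketch below evaluates its negative logarithm for a given $\boldsymbol{\beta}$. It assumes there are no tied event times, so none of the usual tie corrections are applied; the function name and the four-unit toy data set are hypothetical. Sorting by decreasing time lets a running log-sum-exp play the role of the denominator over $\mathcal{R}(t_i)$.

```python
import numpy as np

def cox_neg_log_partial_likelihood(beta, X, time, event):
    """Negative log partial likelihood of the Cox model.

    Assumes no tied event times. `event` is 1 for an observed failure
    and 0 for a censored unit.
    """
    eta = X @ beta                    # linear predictor x_i' beta
    order = np.argsort(-time)         # sort by decreasing time
    eta_s = eta[order]
    event_s = event[order]
    # After sorting, the risk set of unit i is every unit up to and including i,
    # so a cumulative log-sum-exp gives log(sum_{j in R(t_i)} exp(eta_j)).
    log_risk = np.logaddexp.accumulate(eta_s)
    return -np.sum((eta_s - log_risk)[event_s == 1])

# Toy data: four hypothetical units, one censored
time = np.array([5.0, 3.0, 8.0, 1.0])
event = np.array([1, 1, 0, 1])
X = np.array([[0.5], [-1.0], [0.2], [1.5]])

# With beta = 0 every unit has the same hazard, so each factor is
# 1 / |risk set|; the risk sets have sizes 2, 3 and 4, giving log(24).
nll_at_zero = cox_neg_log_partial_likelihood(np.zeros(1), X, time, event)
```

This function can be passed to a general-purpose optimizer to estimate $\boldsymbol{\beta}$. The running sum is accumulated in log space so that large values of $\mathbf{x}^\top\boldsymbol{\beta}$ do not overflow.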
Challenge-Method Mapping

| Manufacturing Challenge | Mathematical Approach |
|---|---|
| High dimensionality | PLS, LASSO, Elastic Net |
| Multicollinearity | Ridge regression, PCR, VIF analysis |
| Spatial wafer patterns | Zernike polynomials, GP regression |
| Hierarchical data | Mixed effects models, REML |
| Nonlinear processes | RSM, polynomial models, transformations |
| Physics constraints | Arrhenius, Preston equation integration |
| Uncertainty quantification | Bayesian methods, bootstrap, prediction intervals |
| Binary outcomes | Logistic regression |
| Count data | Poisson regression |
| Real-time control | Kalman filter, EWMA |
| Time-to-failure | Cox proportional hazards |

Equations Quick Reference
Estimation
$$ \hat{\boldsymbol{\beta}}_{\text{OLS}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y} $$
$$ \hat{\boldsymbol{\beta}}_{\text{Ridge}} = (\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y} $$
Prediction Interval
$$ \hat{y}_0 \pm t_{\alpha/2, n-k-1} \cdot \sqrt{\text{MSE}\left(1 + \mathbf{x}_0^\top(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{x}_0\right)} $$
Confidence Interval for $\beta_j$
$$ \hat{\beta}_j \pm t_{\alpha/2, n-k-1} \cdot \text{SE}(\hat{\beta}_j) $$
Process Capability
$$ C_p = \frac{\text{USL} - \text{LSL}}{6\sigma} $$
$$ C_{pk} = \min\left(\frac{\text{USL} - \mu}{3\sigma}, \frac{\mu - \text{LSL}}{3\sigma}\right) $$
Standard Zernike Polynomials (first few terms)

| Index | Name | Formula |
|---|---|---|
| $Z_0^0$ | Piston | $1$ |
| $Z_1^{-1}$ | Tilt Y | $r\sin\theta$ |
| $Z_1^{1}$ | Tilt X | $r\cos\theta$ |
| $Z_2^{-2}$ | Astigmatism 45° | $r^2\sin 2\theta$ |
| $Z_2^{0}$ | Defocus | $2r^2 - 1$ |
| $Z_2^{2}$ | Astigmatism 0° | $r^2\cos 2\theta$ |
| $Z_3^{-1}$ | Coma Y | $(3r^3 - 2r)\sin\theta$ |
| $Z_3^{1}$ | Coma X | $(3r^3 - 2r)\cos\theta$ |
| $Z_4^{0}$ | Spherical | $6r^4 - 6r^2 + 1$ |

Reference

| Symbol | Description |
|---|---|
| $\mathbf{y}$ | Response vector |
| $\mathbf{X}$ | Design matrix |
| $\boldsymbol{\beta}$ | Coefficient vector |
| $\hat{\boldsymbol{\beta}}$ | Estimated coefficients |
| $\boldsymbol{\varepsilon}$ | Error vector |
| $\sigma^2$ | Error variance |
| $\lambda$ | Regularization parameter |
| $\mathbf{I}$ | Identity matrix |
| $\lVert \cdot \rVert_1$ | L1 norm (sum of absolute values) |
| $\lVert \cdot \rVert_2$ | L2 norm (Euclidean) |
| $\mathbf{A}^\top$ | Matrix transpose |
| $\mathbf{A}^{-1}$ | Matrix inverse |
| $\lvert \mathbf{A} \rvert$ | Matrix determinant |
| $N(\mu, \sigma^2)$ | Normal distribution |
| $\mathcal{GP}$ | Gaussian Process |