Homeβ€Ί Knowledge Baseβ€Ί Semiconductor Manufacturing Process Parameters Monitoring: Mathematical Modeling

Semiconductor Manufacturing Process Parameters Monitoring: Mathematical Modeling

Keywords: process monitoring, semiconductor process control, spc, statistical process control, sensor data, fault detection, run-to-run control, process optimization


Semiconductor Manufacturing Process Parameters Monitoring: Mathematical Modeling

1. The Fundamental Challenge

Modern semiconductor fabrication involves 500–1000+ sequential process steps, each with dozens of parameters requiring nanometer-scale precision.

Key Process Types and Parameters

2. Statistical Process Control (SPC) β€” The Foundation

2.1 Univariate Control Charts

For a process parameter $X$ with samples $x_1, x_2, \ldots, x_n$:

Sample Mean:

$$ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i $$

Sample Standard Deviation:

$$ \sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} $$

Control Limits (3-sigma):

$$ \text{UCL} = \bar{x} + 3\sigma $$

$$ \text{LCL} = \bar{x} - 3\sigma $$

2.2 Process Capability Indices

These quantify how well a process meets specifications:

$$ C_p = \frac{USL - LSL}{6\sigma} $$

$$ C_{pk} = \min\left[\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right] $$

$$ C_{pm} = \frac{C_p}{\sqrt{1 + \left(\frac{\mu - T}{\sigma}\right)^2}} $$

Semiconductor fabs typically require $C_{pk} \geq 1.67$, corresponding to defect rates below ~1 ppm.

3. Multivariate Statistical Monitoring

Since process parameters are highly correlated, univariate methods miss interaction effects.

3.1 Principal Component Analysis (PCA)

Given data matrix $\mathbf{X}$ ($n$ samples Γ— $p$ variables), centered:

1. Compute covariance matrix:

$$ \mathbf{S} = \frac{1}{n-1}\mathbf{X}^T\mathbf{X} $$

2. Eigendecomposition:

$$ \mathbf{S} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^T $$

3. Project to principal components:

$$ \mathbf{T} = \mathbf{X}\mathbf{V} $$

3.2 Monitoring Statistics

Hotelling's $T^2$ Statistic

Captures variation within the PCA model:

$$ T^2 = \sum_{i=1}^{k} \frac{t_i^2}{\lambda_i} $$

where $k$ is the number of retained components. Under normal operation, $T^2$ follows a scaled F-distribution.

Q-Statistic (Squared Prediction Error)

Captures variation outside the model:

$$ Q = \sum_{j=1}^{p}(x_j - \hat{x}_j)^2 = \|\mathbf{x} - \mathbf{x}\mathbf{V}_k\mathbf{V}_k^T\|^2 $$

Often more sensitive to novel faults than $T^2$.

3.3 Partial Least Squares (PLS)

When relating process inputs $\mathbf{X}$ to quality outputs $\mathbf{Y}$:

$$ \mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E} $$

PLS finds latent variables that maximize covariance between $\mathbf{X}$ and $\mathbf{Y}$, providing both monitoring capability and a predictive model.

4. Virtual Metrology (VM) Models

Virtual metrology predicts physical measurement outcomes from process sensor data, enabling 100% wafer coverage without costly measurements.

4.1 Linear Models

For process parameters $\mathbf{x} \in \mathbb{R}^p$ and metrology target $y$:

$$ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} $$

$$ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y} $$

$$ \min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|^2 + \lambda\|\boldsymbol{\beta}\|_1 $$

4.2 Nonlinear Models

Gaussian Process Regression (GPR)

$$ y \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')) $$

Posterior predictive distribution:

$$ \mu_ = \mathbf{K}_^T(\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{y} $$

$$ \sigma_^2 = K_{**} - \mathbf{K}_^T(\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{K}_* $$

GPs provide uncertainty quantification β€” critical for knowing when to trigger actual metrology.

Support Vector Regression (SVR)

$$ \min \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i(\xi_i + \xi_i^*) $$

Subject to $\epsilon$-insensitive tube constraints. Kernel trick enables nonlinear modeling.

Neural Networks

5. Run-to-Run (R2R) Control

R2R control adjusts recipe setpoints between wafers/lots to compensate for drift and disturbances.

5.1 EWMA Controller

For a process with model $y = a_0 + a_1 u + \epsilon$:

Prediction update:

$$ \hat{y}_{k+1} = \lambda y_k + (1-\lambda)\hat{y}_k $$

Control action:

$$ u_{k+1} = \frac{T - \hat{y}_{k+1} + a_0}{a_1} $$

where:

5.2 Double EWMA (for Linear Drift)

When process drifts linearly:

$$ \hat{y}_{k+1} = a_k + b_k $$

$$ a_k = \lambda y_k + (1-\lambda)(a_{k-1} + b_{k-1}) $$

$$ b_k = \gamma(a_k - a_{k-1}) + (1-\gamma)b_{k-1} $$

5.3 State-Space Formulation

More general framework:

State equation:

$$ \mathbf{x}_{k+1} = \mathbf{A}\mathbf{x}_k + \mathbf{B}\mathbf{u}_k + \mathbf{w}_k $$

Observation equation:

$$ \mathbf{y}_k = \mathbf{C}\mathbf{x}_k + \mathbf{D}\mathbf{u}_k + \mathbf{v}_k $$

Use Kalman filtering for state estimation and LQR/MPC for optimal control.

5.4 Model Predictive Control (MPC)

Objective function:

$$ \min \sum_{i=1}^{N} \|\mathbf{y}_{k+i} - \mathbf{r}_{k+i}\|_\mathbf{Q}^2 + \sum_{j=0}^{N-1}\|\Delta\mathbf{u}_{k+j}\|_\mathbf{R}^2 $$

subject to process model and operational constraints.

MPC handles multivariable systems with constraints naturally.

6. Fault Detection and Classification (FDC)

6.1 Detection Methods

Mahalanobis Distance

$$ D^2 = (\mathbf{x} - \boldsymbol{\mu})^T\mathbf{S}^{-1}(\mathbf{x} - \boldsymbol{\mu}) $$

Follows $\chi^2$ distribution under multivariate normality.

Other Detection Methods

6.2 Classification Features

For trace data (time-series from sensors), extract features:

6.3 Classification Algorithms

7. Spatial Modeling (Within-Wafer Variation)

Systematic spatial patterns require explicit modeling.

7.1 Polynomial Basis Expansion

Zernike Polynomials (common in lithography)

$$ z(\rho, \theta) = \sum_{n,m} Z_n^m(\rho, \theta) $$

These form an orthogonal basis on the unit disk, capturing radial and azimuthal variation.

7.2 Gaussian Process Spatial Models

$$ y(\mathbf{s}) \sim \mathcal{GP}(\mu(\mathbf{s}), k(\mathbf{s}, \mathbf{s}')) $$

Common Covariance Kernels

$$ k(\mathbf{s}, \mathbf{s}') = \sigma^2 \exp\left(-\frac{\|\mathbf{s} - \mathbf{s}'\|^2}{2\ell^2}\right) $$

$$ k(r) = \sigma^2 \frac{2^{1- u}}{\Gamma( u)}\left(\frac{\sqrt{2 u}r}{\ell}\right)^ u K_ u\left(\frac{\sqrt{2 u}r}{\ell}\right) $$

where $K_ u$ is the modified Bessel function of the second kind.

8. Dynamic/Time-Series Modeling

For plasma processes, endpoint detection, and transient behavior.

8.1 Autoregressive Models

AR(p) model:

$$ x_t = \sum_{i=1}^{p} \phi_i x_{t-i} + \epsilon_t $$

ARIMA extends this to non-stationary series.

8.2 Dynamic PCA

Augment data with time-lagged values:

$$ \tilde{\mathbf{X}} = [\mathbf{X}(t), \mathbf{X}(t-1), \ldots, \mathbf{X}(t-l)] $$

Then apply standard PCA to capture temporal dynamics.

8.3 Deep Sequence Models

LSTM Networks

Gating mechanisms:

Cell state update:

$$ c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t $$

Hidden state:

$$ h_t = o_t \odot \tanh(c_t) $$

9. Model Maintenance and Adaptation

Semiconductor processes drift β€” models must adapt.

9.1 Drift Detection Methods

CUSUM (Cumulative Sum)

$$ S_k = \max(0, S_{k-1} + (x_k - \mu_0) - k) $$

Signal when $S_k$ exceeds threshold.

Page-Hinkley Test

$$ m_k = \sum_{i=1}^{k}(x_i - \bar{x}_k - \delta) $$

$$ M_k = \max_{i \leq k} m_i $$

Alarm when $M_k - m_k > \lambda$.

ADWIN (Adaptive Windowing)

Automatically detects distribution changes and adjusts window size.

9.2 Online Model Updating

Recursive Least Squares (RLS)

$$ \hat{\boldsymbol{\beta}}_k = \hat{\boldsymbol{\beta}}_{k-1} + \mathbf{K}_k(y_k - \mathbf{x}_k^T\hat{\boldsymbol{\beta}}_{k-1}) $$

where $\mathbf{K}_k$ is the gain matrix updated via the Riccati equation:

$$ \mathbf{K}_k = \frac{\mathbf{P}_{k-1}\mathbf{x}_k}{\lambda + \mathbf{x}_k^T\mathbf{P}_{k-1}\mathbf{x}_k} $$

$$ \mathbf{P}_k = \frac{1}{\lambda}(\mathbf{P}_{k-1} - \mathbf{K}_k\mathbf{x}_k^T\mathbf{P}_{k-1}) $$

Just-in-Time (JIT) Learning

Build local models around each new prediction point using nearest historical samples.

10. Integrated Framework

A complete monitoring system layers these methods:

LayerMethodsPurpose
PreprocessingCleaning, synchronization, normalizationData quality
Feature EngineeringDomain features, wavelets, PCADimensionality management
Monitoring$T^2$, Q-statistic, control chartsDetect out-of-control states
Virtual MetrologyPLS, GPR, neural networksPredict quality without measurement
FDCClassification modelsDiagnose fault root causes
ControlR2R, MPCCompensate for drift/disturbances
AdaptationOnline learning, drift detectionMaintain model validity

11. Key Mathematical Challenges

1. High dimensionality β€” hundreds of sensors, requiring regularization and dimension reduction 2. Collinearity β€” process variables are physically coupled 3. Non-stationarity β€” drift, maintenance events, recipe changes 4. Small sample sizes β€” new recipes have limited historical data (transfer learning, Bayesian methods help) 5. Real-time constraints β€” decisions needed in seconds 6. Rare events β€” faults are infrequent, creating class imbalance

12. Key Equations

Process Capability

$$ C_{pk} = \min\left[\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right] $$

Multivariate Monitoring

$$ T^2 = \sum_{i=1}^{k} \frac{t_i^2}{\lambda_i}, \quad Q = \|\mathbf{x} - \hat{\mathbf{x}}\|^2 $$

Virtual Metrology (Ridge Regression)

$$ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y} $$

EWMA Control

$$ \hat{y}_{k+1} = \lambda y_k + (1-\lambda)\hat{y}_k $$

Mahalanobis Distance

$$ D^2 = (\mathbf{x} - \boldsymbol{\mu})^T\mathbf{S}^{-1}(\mathbf{x} - \boldsymbol{\mu}) $$


Source: ChipFoundryServices β€” Search this topic β€” Ask CFSGPT

process monitoringsemiconductor process controlspcstatistical process controlsensor datafault detectionrun-to-run controlprocess optimization

Explore 500+ Semiconductor & AI Topics

From EUV lithography to CUDA optimization β€” search the full knowledge base or chat with our AI assistant.