process digital twin, digital manufacturing
Simulate process physics.
9,967 technical terms and definitions
Complete sequence of process steps to build a chip.
Individual chamber in multi-chamber or cluster tool.
Test structures tracking process.
Measure effective process corner.
# Semiconductor Manufacturing Process Parameters Monitoring: Mathematical Modeling

## 1. The Fundamental Challenge

Modern semiconductor fabrication involves 500–1000+ sequential process steps, each with dozens of parameters requiring nanometer-scale precision.

### Key Process Types and Parameters

- **Lithography**: exposure dose, focus, overlay alignment, resist thickness
- **Etching (dry/wet)**: etch rate, selectivity, uniformity, plasma parameters (power, pressure, gas flows)
- **Deposition (CVD, PVD, ALD)**: deposition rate, film thickness, uniformity, stress, composition
- **CMP (Chemical Mechanical Polishing)**: removal rate, within-wafer non-uniformity, dishing, erosion
- **Implantation**: dose, energy, angle, uniformity
- **Thermal processes**: temperature uniformity, ramp rates, time

## 2. Statistical Process Control (SPC): The Foundation

### 2.1 Univariate Control Charts

For a process parameter $X$ with samples $x_1, x_2, \ldots, x_n$:

**Sample Mean:**
$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
$$
**Sample Standard Deviation:**
$$
\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}
$$
**Control Limits (3-sigma):**
$$
\text{UCL} = \bar{x} + 3\sigma
$$
$$
\text{LCL} = \bar{x} - 3\sigma
$$

### 2.2 Process Capability Indices

These quantify how well a process meets specifications:

- **$C_p$ (Potential Capability):**
$$
C_p = \frac{USL - LSL}{6\sigma}
$$
- **$C_{pk}$ (Actual Capability)**, which accounts for centering:
$$
C_{pk} = \min\left[\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right]
$$
- **$C_{pm}$ (Taguchi Index)**, which penalizes deviation from target $T$:
$$
C_{pm} = \frac{C_p}{\sqrt{1 + \left(\frac{\mu - T}{\sigma}\right)^2}}
$$

Semiconductor fabs typically require $C_{pk} \geq 1.67$, corresponding to defect rates below ~1 ppm.

## 3. Multivariate Statistical Monitoring

Since process parameters are highly correlated, univariate methods miss interaction effects.
### 3.1 Principal Component Analysis (PCA)

Given data matrix $\mathbf{X}$ ($n$ samples × $p$ variables), centered:

1. **Compute covariance matrix:**
$$
\mathbf{S} = \frac{1}{n-1}\mathbf{X}^T\mathbf{X}
$$
2. **Eigendecomposition:**
$$
\mathbf{S} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^T
$$
3. **Project to principal components:**
$$
\mathbf{T} = \mathbf{X}\mathbf{V}
$$

### 3.2 Monitoring Statistics

#### Hotelling's $T^2$ Statistic

Captures variation **within** the PCA model:
$$
T^2 = \sum_{i=1}^{k} \frac{t_i^2}{\lambda_i}
$$
where $k$ is the number of retained components. Under normal operation, $T^2$ follows a scaled F-distribution.

#### Q-Statistic (Squared Prediction Error)

Captures variation **outside** the model:
$$
Q = \sum_{j=1}^{p}(x_j - \hat{x}_j)^2 = \|\mathbf{x} - \mathbf{x}\mathbf{V}_k\mathbf{V}_k^T\|^2
$$

> Often more sensitive to novel faults than $T^2$.

### 3.3 Partial Least Squares (PLS)

When relating process inputs $\mathbf{X}$ to quality outputs $\mathbf{Y}$:
$$
\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}
$$
PLS finds latent variables that maximize covariance between $\mathbf{X}$ and $\mathbf{Y}$, providing both monitoring capability and a predictive model.

## 4. Virtual Metrology (VM) Models

Virtual metrology predicts physical measurement outcomes from process sensor data, enabling 100% wafer coverage without costly measurements.
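To make the virtual metrology idea concrete, here is a minimal sketch that fits a closed-form ridge regression from sensor summaries to a metrology target. Everything in it is an assumption for illustration: the data are synthetic, the helper names `fit_ridge` and `predict` are hypothetical, and the penalty `lam=1.0` is arbitrary rather than tuned.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge fit on centered data; returns (beta, x_mean, y_mean)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    p = X.shape[1]
    # Solve (Xc^T Xc + lam*I) beta = Xc^T yc instead of forming an explicit inverse.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    return beta, x_mean, y_mean

def predict(X, beta, x_mean, y_mean):
    """Predict the metrology target for new sensor rows."""
    return (X - x_mean) @ beta + y_mean

# Synthetic data (assumption): 5 sensor summaries, two of them nearly collinear.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
X[:, 4] = X[:, 3] + 0.01 * rng.normal(size=n)
true_beta = np.array([2.0, -1.0, 0.5, 1.0, 1.0])
y = 100.0 + X @ true_beta + 0.1 * rng.normal(size=n)  # stand-in for film thickness

# Train on the first 150 "wafers", predict the remaining 50 without measuring them.
beta, x_mean, y_mean = fit_ridge(X[:150], y[:150], lam=1.0)
y_hat = predict(X[150:], beta, x_mean, y_mean)
rmse = float(np.sqrt(np.mean((y_hat - y[150:]) ** 2)))
print(f"hold-out RMSE: {rmse:.3f}")
```

The penalty keeps the two collinear coefficients stable; in practice `lam` would be chosen by cross-validation.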
### 4.1 Linear Models

For process parameters $\mathbf{x} \in \mathbb{R}^p$ and metrology target $y$:

- **Ordinary Least Squares (OLS):**
$$
\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}
$$
- **Ridge Regression** ($L_2$ regularization for collinearity):
$$
\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}
$$
- **LASSO** ($L_1$ regularization for sparsity/feature selection):
$$
\min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|^2 + \lambda\|\boldsymbol{\beta}\|_1
$$

### 4.2 Nonlinear Models

#### Gaussian Process Regression (GPR)

$$
y \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))
$$
**Posterior predictive distribution:**

- **Mean:**
$$
\mu_* = \mathbf{K}_*^T(\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{y}
$$
- **Variance:**
$$
\sigma_*^2 = K_{**} - \mathbf{K}_*^T(\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{K}_*
$$

GPs provide uncertainty quantification, which is critical for knowing when to trigger actual metrology.

#### Support Vector Regression (SVR)

$$
\min \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i(\xi_i + \xi_i^*)
$$
subject to $\epsilon$-insensitive tube constraints. The kernel trick enables nonlinear modeling.

#### Neural Networks

- **MLPs**: Multi-layer perceptrons for general function approximation
- **CNNs**: Convolutional neural networks for wafer map pattern recognition
- **LSTMs**: Long Short-Term Memory networks for time-series FDC traces

## 5. Run-to-Run (R2R) Control

R2R control adjusts recipe setpoints between wafers/lots to compensate for drift and disturbances.
### 5.1 EWMA Controller

For a process with model $y = a_0 + a_1 u + \epsilon$, where the gain $a_1$ is taken as known and the offset $a_0$ is re-estimated after every run:

**Offset estimate update** (EWMA of the observed offset $y_k - a_1 u_k$):
$$
\hat{a}_{0,k+1} = \lambda (y_k - a_1 u_k) + (1-\lambda)\hat{a}_{0,k}
$$
**Control action:**
$$
u_{k+1} = \frac{T - \hat{a}_{0,k+1}}{a_1}
$$
where:

- $T$ is the target
- $\lambda \in (0,1)$ is the smoothing weight

### 5.2 Double EWMA (for Linear Drift)

When the process drifts linearly:
$$
\hat{y}_{k+1} = a_k + b_k
$$
$$
a_k = \lambda y_k + (1-\lambda)(a_{k-1} + b_{k-1})
$$
$$
b_k = \gamma(a_k - a_{k-1}) + (1-\gamma)b_{k-1}
$$

### 5.3 State-Space Formulation

A more general framework:

**State equation:**
$$
\mathbf{x}_{k+1} = \mathbf{A}\mathbf{x}_k + \mathbf{B}\mathbf{u}_k + \mathbf{w}_k
$$
**Observation equation:**
$$
\mathbf{y}_k = \mathbf{C}\mathbf{x}_k + \mathbf{D}\mathbf{u}_k + \mathbf{v}_k
$$
Use **Kalman filtering** for state estimation and **LQR/MPC** for optimal control.

### 5.4 Model Predictive Control (MPC)

**Objective function:**
$$
\min \sum_{i=1}^{N} \|\mathbf{y}_{k+i} - \mathbf{r}_{k+i}\|_\mathbf{Q}^2 + \sum_{j=0}^{N-1}\|\Delta\mathbf{u}_{k+j}\|_\mathbf{R}^2
$$
subject to the process model and operational constraints.

> MPC handles multivariable systems with constraints naturally.

## 6. Fault Detection and Classification (FDC)

### 6.1 Detection Methods

#### Mahalanobis Distance

$$
D^2 = (\mathbf{x} - \boldsymbol{\mu})^T\mathbf{S}^{-1}(\mathbf{x} - \boldsymbol{\mu})
$$
Under multivariate normality with known $\boldsymbol{\mu}$ and $\mathbf{S}$, $D^2$ follows a $\chi^2$ distribution with $p$ degrees of freedom.
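As a rough illustration of the Mahalanobis detector, the sketch below estimates $\boldsymbol{\mu}$ and $\mathbf{S}$ from synthetic reference data for two correlated sensors and flags a point that stays inside both univariate 3-sigma limits but breaks the correlation. The data, the 99.73% coverage level, and the function name `mahalanobis_sq` are assumptions chosen for the example.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
# Reference ("normal operation") data: two strongly correlated sensors (synthetic).
cov_true = np.array([[1.0, 0.9], [0.9, 1.0]])
ref = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov_true, size=2000)

mu = ref.mean(axis=0)
S = np.cov(ref, rowvar=False)   # sample covariance (n-1 denominator)
S_inv = np.linalg.inv(S)

def mahalanobis_sq(x):
    """Squared Mahalanobis distance of x from the reference distribution."""
    d = np.asarray(x, dtype=float) - mu
    return float(d @ S_inv @ d)

# For p = 2 variables the chi-square quantile has a closed form:
# P(chi2_2 <= c) = 1 - exp(-c / 2)  =>  c = -2 * ln(1 - coverage).
coverage = 0.9973
threshold = -2.0 * math.log(1.0 - coverage)

consistent = [2.0, 2.0]     # large excursion that follows the correlation
inconsistent = [2.0, -2.0]  # each value within 3 sigma, but breaks the correlation

for name, x in [("consistent", consistent), ("inconsistent", inconsistent)]:
    d2 = mahalanobis_sq(x)
    print(f"{name}: D^2 = {d2:.1f}, alarm = {d2 > threshold}")
```

For more than two variables the threshold would come from a chi-square quantile function instead of the closed form used here.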
#### Other Detection Methods

- **One-Class SVM**: Learn boundary of normal operation
- **Autoencoders**: Detect anomalies via reconstruction error

### 6.2 Classification Features

For trace data (time-series from sensors), extract features:

- **Statistical moments**: mean, variance, skewness, kurtosis
- **Frequency domain**: FFT coefficients, spectral power
- **Wavelet coefficients**: Multi-resolution analysis
- **DTW distances**: Dynamic Time Warping to reference signatures

### 6.3 Classification Algorithms

- Support Vector Machines (SVM)
- Random Forest
- CNNs for pattern recognition on wafer maps
- Gradient Boosting (XGBoost, LightGBM)

## 7. Spatial Modeling (Within-Wafer Variation)

Systematic spatial patterns require explicit modeling.

### 7.1 Polynomial Basis Expansion

#### Zernike Polynomials (common in lithography)

$$
z(\rho, \theta) = \sum_{n,m} a_{nm} Z_n^m(\rho, \theta)
$$
where $a_{nm}$ are the fitted coefficients. The $Z_n^m$ form an orthogonal basis on the unit disk, capturing radial and azimuthal variation.

### 7.2 Gaussian Process Spatial Models

$$
y(\mathbf{s}) \sim \mathcal{GP}(\mu(\mathbf{s}), k(\mathbf{s}, \mathbf{s}'))
$$

#### Common Covariance Kernels

- **Squared Exponential (RBF):**
$$
k(\mathbf{s}, \mathbf{s}') = \sigma^2 \exp\left(-\frac{\|\mathbf{s} - \mathbf{s}'\|^2}{2\ell^2}\right)
$$
- **Matérn** (more flexible smoothness):
$$
k(r) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)}\left(\frac{\sqrt{2\nu}r}{\ell}\right)^\nu K_\nu\left(\frac{\sqrt{2\nu}r}{\ell}\right)
$$
where $K_\nu$ is the modified Bessel function of the second kind.

## 8. Dynamic/Time-Series Modeling

For plasma processes, endpoint detection, and transient behavior.

### 8.1 Autoregressive Models

**AR(p) model:**
$$
x_t = \sum_{i=1}^{p} \phi_i x_{t-i} + \epsilon_t
$$
ARIMA extends this to non-stationary series.

### 8.2 Dynamic PCA

Augment data with time-lagged values:
$$
\tilde{\mathbf{X}} = [\mathbf{X}(t), \mathbf{X}(t-1), \ldots, \mathbf{X}(t-l)]
$$
Then apply standard PCA to capture temporal dynamics.
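The lag augmentation can be sketched in a few lines. This is an illustrative example only: the two-sensor AR(1) trace is synthetic, `lagged_matrix` is a hypothetical helper name, and the choice of two lags is arbitrary.

```python
import numpy as np

def lagged_matrix(X, lags):
    """Stack X(t), X(t-1), ..., X(t-lags) column-wise.

    X has shape (n, p); the result has shape (n - lags, p * (lags + 1)).
    """
    n = X.shape[0]
    # Block j holds the rows delayed by j samples, aligned to t = lags .. n-1.
    blocks = [X[lags - j : n - j] for j in range(lags + 1)]
    return np.hstack(blocks)

# Synthetic AR(1) signal seen by two noisy sensors (assumption, for illustration).
rng = np.random.default_rng(2)
n = 500
z = np.zeros(n)
for t in range(1, n):
    z[t] = 0.9 * z[t - 1] + rng.normal(scale=0.1)
X = np.column_stack([z + 0.01 * rng.normal(size=n),
                     -z + 0.01 * rng.normal(size=n)])

X_aug = lagged_matrix(X, lags=2)
X_aug = X_aug - X_aug.mean(axis=0)

# Standard PCA on the augmented matrix, computed via SVD.
_, s, Vt = np.linalg.svd(X_aug, full_matrices=False)
explained = s**2 / np.sum(s**2)
print(X_aug.shape, np.round(explained[:3], 3))
```

The monitoring statistics ($T^2$, $Q$) are then computed on `X_aug` exactly as for static PCA.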
### 8.3 Deep Sequence Models

#### LSTM Networks

Gating mechanisms (here $\sigma$ denotes the logistic sigmoid):

- **Forget gate:** $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
- **Input gate:** $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
- **Output gate:** $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$

**Candidate cell state:**
$$
\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)
$$
**Cell state update:**
$$
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
$$
**Hidden state:**
$$
h_t = o_t \odot \tanh(c_t)
$$

## 9. Model Maintenance and Adaptation

Semiconductor processes drift, so models must adapt.

### 9.1 Drift Detection Methods

#### CUSUM (Cumulative Sum)

$$
S_k = \max(0, S_{k-1} + (x_k - \mu_0) - K)
$$
where $K$ is the allowance (slack) value. Signal when $S_k$ exceeds a threshold $H$.

#### Page-Hinkley Test

$$
m_k = \sum_{i=1}^{k}(x_i - \bar{x}_i - \delta)
$$
$$
M_k = \min_{i \leq k} m_i
$$
where $\bar{x}_i$ is the running mean of the first $i$ samples. Alarm when $m_k - M_k > \lambda$. This form detects an upward shift; mirror the signs to detect a downward shift.

#### ADWIN (Adaptive Windowing)

Automatically detects distribution changes and adjusts window size.

### 9.2 Online Model Updating

#### Recursive Least Squares (RLS)

$$
\hat{\boldsymbol{\beta}}_k = \hat{\boldsymbol{\beta}}_{k-1} + \mathbf{K}_k(y_k - \mathbf{x}_k^T\hat{\boldsymbol{\beta}}_{k-1})
$$
where $\mathbf{K}_k$ is the gain vector and $\mathbf{P}_k$ is updated via a Riccati-type recursion with forgetting factor $\lambda \in (0, 1]$:
$$
\mathbf{K}_k = \frac{\mathbf{P}_{k-1}\mathbf{x}_k}{\lambda + \mathbf{x}_k^T\mathbf{P}_{k-1}\mathbf{x}_k}
$$
$$
\mathbf{P}_k = \frac{1}{\lambda}(\mathbf{P}_{k-1} - \mathbf{K}_k\mathbf{x}_k^T\mathbf{P}_{k-1})
$$

#### Just-in-Time (JIT) Learning

Build local models around each new prediction point using nearest historical samples.

## 10. Integrated Framework

A complete monitoring system layers these methods:

| Layer | Methods | Purpose |
|-------|---------|---------|
| **Preprocessing** | Cleaning, synchronization, normalization | Data quality |
| **Feature Engineering** | Domain features, wavelets, PCA | Dimensionality management |
| **Monitoring** | $T^2$, Q-statistic, control charts | Detect out-of-control states |
| **Virtual Metrology** | PLS, GPR, neural networks | Predict quality without measurement |
| **FDC** | Classification models | Diagnose fault root causes |
| **Control** | R2R, MPC | Compensate for drift/disturbances |
| **Adaptation** | Online learning, drift detection | Maintain model validity |

## 11. Key Mathematical Challenges

1. **High dimensionality**: hundreds of sensors, requiring regularization and dimension reduction
2. **Collinearity**: process variables are physically coupled
3. **Non-stationarity**: drift, maintenance events, recipe changes
4. **Small sample sizes**: new recipes have limited historical data (transfer learning, Bayesian methods help)
5. **Real-time constraints**: decisions needed in seconds
6. **Rare events**: faults are infrequent, creating class imbalance

## 12. Key Equations

### Process Capability

$$
C_{pk} = \min\left[\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right]
$$

### Multivariate Monitoring

$$
T^2 = \sum_{i=1}^{k} \frac{t_i^2}{\lambda_i}, \quad Q = \|\mathbf{x} - \hat{\mathbf{x}}\|^2
$$

### Virtual Metrology (Ridge Regression)

$$
\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}
$$

### EWMA Control

$$
\hat{y}_{k+1} = \lambda y_k + (1-\lambda)\hat{y}_k
$$

### Mahalanobis Distance

$$
D^2 = (\mathbf{x} - \boldsymbol{\mu})^T\mathbf{S}^{-1}(\mathbf{x} - \boldsymbol{\mu})
$$
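To show how the EWMA recursion from the summary is used in run-to-run control, here is a minimal simulation sketch. It applies the EWMA filter to the estimated process offset $a_0$ of the model $y = a_0 + a_1 u + \epsilon$ and then picks the next setpoint so the model predicts the target. All numbers (gain, drift, noise, $\lambda$) and the function name `simulate_ewma_r2r` are assumptions for illustration.

```python
import random

def simulate_ewma_r2r(n_runs=50, lam=0.3, target=100.0, a1=2.0,
                      a0_true=10.0, drift_per_run=0.2, noise_sd=0.5, seed=0):
    """Simulate an EWMA run-to-run controller on a slowly drifting process."""
    rng = random.Random(seed)
    a0_hat = 0.0                  # initial offset estimate (deliberately wrong)
    u = (target - a0_hat) / a1    # first recipe setpoint
    outputs = []
    for k in range(n_runs):
        offset = a0_true + drift_per_run * k            # true offset drifts linearly
        y = offset + a1 * u + rng.gauss(0.0, noise_sd)  # measured output for this run
        outputs.append(y)
        # EWMA update of the offset estimate from the latest run.
        a0_hat = lam * (y - a1 * u) + (1.0 - lam) * a0_hat
        # Next setpoint: the value for which the model predicts the target.
        u = (target - a0_hat) / a1
    return outputs

ys = simulate_ewma_r2r()
early_err = abs(ys[0] - 100.0)
late_err = sum(abs(y - 100.0) for y in ys[-10:]) / 10
print(f"first-run error: {early_err:.2f}, mean |error| over last 10 runs: {late_err:.2f}")
```

With a linear drift a single EWMA leaves a small steady offset of about drift/$\lambda$ per run, which is the motivation for the double EWMA variant.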
A process node (7nm, 5nm, 3nm) names a technology generation and loosely indicates transistor density. Smaller nodes are generally faster and lower power, but more expensive to manufacture.
Semiconductor technology generation (7nm, 5nm, 3nm, etc.).
Process optimization reduces energy use by improving efficiency, cycle times, and yields.
# Optimization: Mathematical Modeling
## 1. Context
A recipe is a vector of controllable parameters:
$$
\mathbf{x} = \begin{bmatrix} T \\ P \\ Q_1 \\ Q_2 \\ \vdots \\ t \\ P_{\text{RF}} \end{bmatrix} \in \mathbb{R}^n
$$
Where:
- $T$ = Temperature (°C or K)
- $P$ = Pressure (mTorr or Pa)
- $Q_i$ = Gas flow rates (sccm)
- $t$ = Process time (seconds)
- $P_{\text{RF}}$ = RF power (Watts)
Goal: Find the optimal $\mathbf{x}$ such that output properties $\mathbf{y}$ meet specifications while accounting for variability.
## 2. Mathematical Modeling Approaches
### 2.1 Physics-Based (First-Principles) Models
#### Chemical Vapor Deposition (CVD) Example
Mass transport and reaction equation:
$$
\frac{\partial C}{\partial t} + \nabla \cdot (\mathbf{u}C) = D\nabla^2 C + R(C, T)
$$
Where:
- $C$ = Species concentration
- $\mathbf{u}$ = Velocity field
- $D$ = Diffusion coefficient
- $R(C, T)$ = Reaction rate
Surface reaction kinetics (Arrhenius form):
$$
k_s = A \exp\left(-\frac{E_a}{RT}\right)
$$
Where:
- $A$ = Pre-exponential factor
- $E_a$ = Activation energy
- $R$ = Gas constant
- $T$ = Temperature
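Deposition rate (surface reaction and gas-phase transport acting in series):
$$
r = \frac{k_s C_g}{1 + \frac{k_s}{h_g}}
$$
Where:
- $C_g$ = Bulk gas-phase concentration
- $h_g$ = Gas-phase mass transfer coefficient
This reduces to $r \approx k_s C_g$ when $k_s \ll h_g$ (reaction-limited regime) and to $r \approx h_g C_g$ when $k_s \gg h_g$ (transport-limited regime).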
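A short numerical sketch may help show how the Arrhenius constant and the rate expression interact across temperature. The values of `A`, `Ea`, `h_g`, and `conc` below are hypothetical and in arbitrary consistent units; only the gas constant is a physical constant.

```python
import math

R_GAS = 8.314  # gas constant, J/(mol*K)

def arrhenius(A, Ea, T):
    """Surface rate constant k_s = A * exp(-Ea / (R * T)), with T in kelvin."""
    return A * math.exp(-Ea / (R_GAS * T))

def deposition_rate(k_s, h_g, conc):
    """Rate expression from the text: k_s * conc / (1 + k_s / h_g)."""
    return k_s * conc / (1.0 + k_s / h_g)

# Hypothetical parameters, chosen only to make both regimes visible.
A = 1.0e7     # pre-exponential factor
Ea = 1.6e5    # activation energy, J/mol
h_g = 0.05    # gas-phase mass transfer coefficient
conc = 1.0    # concentration term in the rate expression

rates = {}
for T in (800.0, 900.0, 1000.0, 1100.0, 1200.0):
    k_s = arrhenius(A, Ea, T)
    rates[T] = deposition_rate(k_s, h_g, conc)
    print(f"T = {T:6.0f} K  k_s = {k_s:.3e}  rate = {rates[T]:.3e}")
```

At the low-temperature end the rate tracks `k_s * conc` and is strongly temperature dependent; at the high end it saturates near `h_g * conc`, where gas-phase transport dominates.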
Characteristics:
- Advantages: Extrapolates outside training data, physically interpretable
- Disadvantages: Computationally expensive, requires detailed mechanism knowledge
### 2.2 Empirical/Statistical Models (Response Surface Methodology)
Second-order polynomial model:
$$
y = \beta_0 + \sum_{i=1}^{n}\beta_i x_i + \sum_{i=1}^{n}\beta_{ii}x_i^2 + \sum_{i<j}\beta_{ij}x_i x_j + \epsilon
$$
Process performance indices use actual variation including assignable causes.
Long-term capability.
Duplicate process at new site.
Chain simulators for sequential steps.
Model how process steps affect device structure and properties.
Consistency over time.
Process variations arise from manufacturing tolerances affecting transistor parameters.
Determine usable focus-dose range.
Quantify robustness of process window.
Verify adequate process window.
Range where all specifications are met.
# Process Window
## 1. Fundamental
A process window is the region in parameter space where a manufacturing step yields acceptable results. Mathematically, for a response function $y(\mathbf{x})$ depending on parameter vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$:
$$
\text{Process Window} = \{\mathbf{x} : y_{\min} \leq y(\mathbf{x}) \leq y_{\max}\}
$$
## 2. Single-Parameter Statistics
For a single parameter with lower and upper specification limits (LSL, USL):
### Process Capability Indices
- $C_p$ (Process Capability): Measures window width relative to process variation
$$
C_p = \frac{USL - LSL}{6\sigma}
$$
- $C_{pk}$ (Process Capability Index): Accounts for process centering
$$
C_{pk} = \min\left[\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right]
$$
### Industry Standards
- $C_p \geq 1.0$: Process variation fits within specifications
- $C_{pk} \geq 1.33$: 4σ capability (standard requirement)
- $C_{pk} \geq 1.67$: 5σ capability (high-reliability applications)
- $C_{pk} \geq 2.0$: 6σ capability (Six Sigma standard)
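The indices and thresholds above can be checked with a few lines of code. This sketch assumes a normally distributed parameter; the spec limits and process statistics are hypothetical, and the function names are illustrative.

```python
import math

def cp(usl, lsl, sigma):
    """Potential capability: spec width over six sigma."""
    return (usl - lsl) / (6.0 * sigma)

def cpk(usl, lsl, mu, sigma):
    """Actual capability: distance to the nearer spec limit over three sigma."""
    return min((usl - mu) / (3.0 * sigma), (mu - lsl) / (3.0 * sigma))

def ppm_out_of_spec(usl, lsl, mu, sigma):
    """Expected out-of-spec fraction in ppm, assuming a normal distribution."""
    # Normal tail probability: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
    upper = 0.5 * math.erfc((usl - mu) / (sigma * math.sqrt(2.0)))
    lower = 0.5 * math.erfc((mu - lsl) / (sigma * math.sqrt(2.0)))
    return 1.0e6 * (upper + lower)

# Hypothetical CD spec: 50 nm +/- 3 nm, process slightly off-center.
usl, lsl = 53.0, 47.0
mu, sigma = 50.5, 0.5

print(f"Cp  = {cp(usl, lsl, sigma):.2f}")
print(f"Cpk = {cpk(usl, lsl, mu, sigma):.2f}")
print(f"out of spec = {ppm_out_of_spec(usl, lsl, mu, sigma):.3f} ppm")
```

The gap between `Cp` and `Cpk` in this example comes entirely from the 0.5 nm centering error.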
## 3. Lithography: Exposure-Defocus (E-D) Window
The exposure-defocus window is the most critical and most mathematically developed process window in semiconductor manufacturing.
### 3.1 Bossung Curve Model
Critical dimension (CD) as a function of exposure dose $E$ and defocus $F$:
$$
CD(E, F) = CD_0 + a_1 E + a_2 F + a_{11} E^2 + a_{22} F^2 + a_{12} EF + \ldots
$$
The process window boundary is defined by:
$$
|CD(E, F) - CD_{\text{target}}| = \Delta CD_{\text{tolerance}}
$$
### 3.2 Key Metrics
- Exposure Latitude (EL): Percentage dose range for acceptable CD
$$
EL = \frac{E_{\max} - E_{\min}}{E_{\text{nominal}}} \times 100\%
$$
- Depth of Focus (DOF): Focus range for acceptable CD (at given EL)
$$
DOF = F_{\max} - F_{\min}
$$
- Process Window Area: Total acceptable region
$$
A_{PW} = \iint_{\text{acceptable}} dE \, dF
$$
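These three metrics can be estimated numerically by scanning a fitted CD model over a dose-focus grid. The sketch below uses a made-up quadratic model (`cd_model`, with invented coefficients), a ±5 nm tolerance, dose in mJ/cm² and focus in µm; none of these come from real data. The area is computed only over the scanned grid.

```python
def cd_model(E, F):
    """Hypothetical fitted Bossung-type model: CD in nm, E in mJ/cm^2, F in um."""
    return 110.0 - 2.0 * E + 200.0 * F**2

CD_TARGET = 50.0   # nm
TOL = 5.0          # nm, allowed |CD - target|
E_NOM = 30.0       # nominal dose, mJ/cm^2
D_E, D_F = 0.05, 0.005  # grid steps

def in_spec(E, F):
    return abs(cd_model(E, F) - CD_TARGET) <= TOL

doses = [25.0 + D_E * i for i in range(201)]     # 25 .. 35 mJ/cm^2
focuses = [-0.30 + D_F * j for j in range(121)]  # -0.30 .. 0.30 um

# Exposure latitude at best focus (F = 0).
ok_doses = [E for E in doses if in_spec(E, 0.0)]
el = (max(ok_doses) - min(ok_doses)) / E_NOM * 100.0

# Depth of focus at nominal dose.
ok_focus = [F for F in focuses if in_spec(E_NOM, F)]
dof = max(ok_focus) - min(ok_focus)

# Process window area: count in-spec grid cells (Riemann-sum approximation).
area = sum(in_spec(E, F) for E in doses for F in focuses) * D_E * D_F

print(f"EL = {el:.1f} %, DOF = {dof:.3f} um, area = {area:.2f} (mJ/cm^2)*um")
```

A finer grid tightens the estimates; in practice the model coefficients would come from a focus-exposure matrix fit.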
### 3.3 Rayleigh Equations
Resolution and DOF scale with wavelength $\lambda$ and numerical aperture $NA$:
- Resolution (minimum feature size):
$$
R = k_1 \frac{\lambda}{NA}
$$
- Depth of Focus:
$$
DOF = \pm k_2 \frac{\lambda}{NA^2}
$$
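As a quick worked example of the two Rayleigh equations, the sketch below evaluates them for 193 nm immersion (NA 1.35) and 13.5 nm EUV (NA 0.33). The $k_1$ and $k_2$ values are assumptions picked for illustration, not measured process factors.

```python
def rayleigh_resolution(k1, wavelength_nm, na):
    """Minimum feature size R = k1 * lambda / NA, in nm."""
    return k1 * wavelength_nm / na

def rayleigh_dof(k2, wavelength_nm, na):
    """Depth of focus magnitude k2 * lambda / NA^2, in nm (applies as +/-)."""
    return k2 * wavelength_nm / na**2

# (label, wavelength in nm, NA, assumed k1, assumed k2)
systems = [
    ("ArF immersion", 193.0, 1.35, 0.30, 1.0),
    ("EUV", 13.5, 0.33, 0.40, 1.0),
]

results = {}
for label, lam, na, k1, k2 in systems:
    r = rayleigh_resolution(k1, lam, na)
    dof = rayleigh_dof(k2, lam, na)
    results[label] = (r, dof)
    print(f"{label:14s} R = {r:6.2f} nm, DOF = +/- {dof:6.1f} nm")
```

Because DOF falls with $NA^2$ while resolution improves only with $NA$, raising the numerical aperture costs focus margin faster than it gains resolution.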
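Critical insight: As $k_1$ decreases (smaller features), the usable DOF shrinks roughly as $(k_1)^2$, so process windows collapse rapidly at advanced nodes. The relative DOF values below equal $(k_1/0.6)^2$, normalized to the 180nm row.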
| Technology Node | $k_1$ Factor | Relative DOF |
| --| --| --|
| 180nm | 0.6 | 1.0 |
| 65nm | 0.4 | 0.44 |
| 14nm | 0.3 | 0.25 |
| 5nm (EUV) | 0.25 | 0.17 |
## 4. Image Quality Metrics
### 4.1 Normalized Image Log-Slope (NILS)
$$
NILS = w \cdot \frac{1}{I} \left|\frac{dI}{dx}\right|_{\text{edge}}
$$
Where:
- $w$ = feature width
- $I$ = aerial image intensity
- $\frac{dI}{dx}$ = intensity gradient at feature edge
For an imaging system with partial coherence factor $\sigma$:
$$
NILS \approx \pi \cdot \frac{w}{\lambda/NA} \cdot \text{(contrast factor)}
$$
Interpretation:
- Higher NILS → larger process window
- NILS > 2.0: Robust process
- NILS < 1.5: Marginal process window
- NILS < 1.0: Near resolution limit
### 4.2 Mask Error Enhancement Factor (MEEF)
$$
MEEF = \frac{\partial CD_{\text{wafer}}}{\partial CD_{\text{mask}}}
$$
Characteristics:
- MEEF = 1: Ideal (1:1 transfer from mask to wafer)
- MEEF > 1: Mask errors are amplified on wafer
- Near resolution limit: MEEF typically 3–4 or higher
- Impacts effective process window: mask CD tolerance = wafer CD tolerance / MEEF
## 5. Multi-Parameter Process Windows
### 5.1 Ellipsoid Model
For $n$ interacting parameters, the window is often an $n$-dimensional ellipsoid:
$$
(\mathbf{x} - \mathbf{x}_0)^T \mathbf{A} (\mathbf{x} - \mathbf{x}_0) \leq 1
$$
Where:
- $\mathbf{x}$ = parameter vector $(x_1, x_2, \ldots, x_n)$
- $\mathbf{x}_0$ = optimal operating point (center of ellipsoid)
- $\mathbf{A}$ = positive definite matrix encoding parameter correlations
Geometric interpretation:
- Eigenvalues of $\mathbf{A}$: $\lambda_1, \lambda_2, \ldots, \lambda_n$
- Principal axes lengths: $a_i = 1/\sqrt{\lambda_i}$
- Eigenvectors: orientation of principal axes
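The geometric interpretation can be reproduced directly with an eigendecomposition. In this sketch the matrix `A` and center `x0` are hypothetical, and `in_window` is an illustrative helper name.

```python
import numpy as np

# Hypothetical positive definite matrix and window center for two parameters.
A = np.array([[4.0, 1.0],
              [1.0, 2.0]])
x0 = np.array([30.0, 0.0])

# eigh is intended for symmetric matrices; eigenvalues come back in ascending order.
eigvals, eigvecs = np.linalg.eigh(A)
semi_axes = 1.0 / np.sqrt(eigvals)   # principal semi-axis lengths a_i = 1/sqrt(lambda_i)

def in_window(x):
    """True if x satisfies (x - x0)^T A (x - x0) <= 1."""
    d = np.asarray(x, dtype=float) - x0
    return bool(d @ A @ d <= 1.0)

for lam_i, a_i, v_i in zip(eigvals, semi_axes, eigvecs.T):
    print(f"lambda = {lam_i:.3f}, semi-axis = {a_i:.3f}, direction = {np.round(v_i, 3)}")

# Points just inside and just outside along the longest principal axis.
inside = x0 + 0.99 * semi_axes[0] * eigvecs[:, 0]
outside = x0 + 1.01 * semi_axes[0] * eigvecs[:, 0]
print(in_window(inside), in_window(outside))
```

The smallest eigenvalue gives the longest axis, i.e. the direction in which the process tolerates the most variation.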
### 5.2 Overlapping Windows
Real processes require multiple steps to work simultaneously:
$$
PW_{\text{total}} = \bigcap_{i=1}^{N} PW_i
$$
Example: Combined lithography + etch window
$$
PW_{\text{combined}} = PW_{\text{litho}}(E, F) \cap PW_{\text{etch}}(P, W, T)
$$
If individual windows are ellipsoids, their intersection is still convex but is generally not an ellipsoid, so it is often computed or approximated numerically via:
- Linear programming
- Convex hull algorithms
- Monte Carlo sampling
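Of the options listed, Monte Carlo sampling is the simplest to sketch. The example below estimates the area of two overlapping elliptical windows and of their intersection by uniform sampling over a bounding box. The two windows, the normalized (dimensionless) parameters, and the names `in_ellipse` and `mc_areas` are assumptions for illustration.

```python
import math
import random

def in_ellipse(x, y, cx, cy, ax, ay):
    """Axis-aligned ellipse test with center (cx, cy) and semi-axes (ax, ay)."""
    return ((x - cx) / ax) ** 2 + ((y - cy) / ay) ** 2 <= 1.0

def mc_areas(n=100_000, seed=0):
    """Monte Carlo estimates of (area of window 1, area of window 2, overlap area)."""
    rng = random.Random(seed)
    lo, hi = -1.5, 1.5                # bounding box that contains both windows
    box_area = (hi - lo) ** 2
    n1 = n2 = n_both = 0
    for _ in range(n):
        x = rng.uniform(lo, hi)
        y = rng.uniform(lo, hi)
        w1 = in_ellipse(x, y, 0.0, 0.0, 1.0, 0.5)   # hypothetical window of step 1
        w2 = in_ellipse(x, y, 0.4, 0.1, 0.6, 0.6)   # hypothetical window of step 2
        n1 += w1
        n2 += w2
        n_both += w1 and w2
    scale = box_area / n
    return n1 * scale, n2 * scale, n_both * scale

area1, area2, overlap = mc_areas()
print(f"window 1: {area1:.3f} (exact {math.pi * 0.5:.3f})")
print(f"window 2: {area2:.3f} (exact {math.pi * 0.36:.3f})")
print(f"overlap : {overlap:.3f}")
```

The same counting approach extends to more parameters and to non-elliptical windows, at the cost of more samples as the dimension grows.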
## 6. Response Surface Methodology (RSM)
### 6.1 Quadratic Model
$$
y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \sum_{i=1}^{n} \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j + \epsilon
$$
Process-induced stress from STI spacers and epitaxial layers modulates channel carrier mobility.
Variation from manufacturing.
Processes have isolated memory. Fork for parallelism. More overhead than threads.
Unnecessary process steps.
Prodigy is an active-learning annotation tool for efficient labeling, with spaCy integration.
Producer's risk is probability of rejecting good lots due to sampling variation.
Product audits inspect finished goods for specification compliance.
Product carbon footprint quantifies greenhouse gas emissions attributable to specific products.
Write descriptions for products.
Write product descriptions. Features, benefits.
AI-assisted product concepts.
Product lifetime spans from market introduction to discontinuation.
Handle multiple products.
Product quantization decomposes vectors into subvectors quantized independently for compression.
Product quantization compresses vectors into compact codes for efficient search.
Compress vectors for efficient search.
Tests matching actual devices.
Product stewardship extends manufacturer responsibility to the entire product lifecycle, including design, use, and end-of-life management.
AI features should solve real user problems. Avoid AI for AI's sake. Measure user value, not just tech metrics.
Production leveling distributes work evenly over time reducing peaks and enabling stable operations.
Plan manufacturing schedule.
Increase production volume.
Production scheduling sequences manufacturing operations optimizing throughput and resource utilization.
Time processing product wafers.
Test lab competence.
Monitor functional relationships.
GPU profilers (Nsight, rocprof) identify bottlenecks. Measure memory, compute, occupancy. Essential for optimization.
Analyze performance bottlenecks.