
AI Factory Glossary

758 technical terms and definitions


probe yield,production

Percentage passing wafer probe test.

probe,mechanistic,interpretability

Mechanistic interpretability reverse-engineers model internals: circuits, features, representations.

probing classifier, interpretability

Probing classifiers test whether representations encode specific linguistic or semantic properties.

probing classifiers, explainable ai

Train classifiers on representations.

probing,ai safety

Train classifiers on internal representations to see what information is encoded.

probing,representation,layer

Probing trains classifiers on hidden states to reveal what each layer encodes, supporting understanding of model internals.

problem escalation, quality & reliability

Problem escalation ensures unresolved issues reach appropriate decision makers.

problem notification, quality & reliability

Problem notification systems alert appropriate personnel of issues requiring attention.

procedural generation with ai,content creation

AI-assisted procedural content creation.

process audit, quality & reliability

Process audits examine activities for adherence to procedures.

process capability index,cpk index

Process capability indices compare process spread to specification width.

process capability ratio, spc

Compare capability to requirements.

process capability study, quality

Assess process ability to meet specs.

process capability study, quality & reliability

Process capability studies quantify the ability of stable processes to meet specifications.

process capability vs equipment capability, production

Distinct but complementary assessments: equipment capability isolates variation attributable to the tool itself, while process capability covers the full process, including materials and operating conditions.

process capability, cpk, cp, capability index, process capability index, six sigma, dpmo, defect rate, yield

# Semiconductor Manufacturing Process Capability Analysis

## Mathematical Framework for Statistical Process Control

## 1. Foundational Capability Indices

### 1.1 Basic Indices

**Process Capability ($C_p$)** — measures process spread relative to specifications:

$$ C_p = \frac{USL - LSL}{6\sigma} $$

Where:
- $USL$ = Upper Specification Limit
- $LSL$ = Lower Specification Limit
- $\sigma$ = Process standard deviation (within-subgroup)

**Centered Capability ($C_{pk}$)** — accounts for process centering:

$$ C_{pk} = \min\left(\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right) $$

Alternative formulation:

$$ C_{pk} = C_p(1 - k) $$

Where the centering factor $k$ is:

$$ k = \frac{|\text{Target} - \mu|}{(USL - LSL)/2} $$

### 1.2 Performance Indices

**Process Performance ($P_p$)** — uses overall standard deviation:

$$ P_p = \frac{USL - LSL}{6s_{overall}} $$

**Centered Performance ($P_{pk}$)**:

$$ P_{pk} = \min\left(\frac{USL - \mu}{3s}, \frac{\mu - LSL}{3s}\right) $$

Key distinction:
- $C_p$, $C_{pk}$ use **within-subgroup** variation ($\sigma$)
- $P_p$, $P_{pk}$ use **overall** variation ($s$), including between-subgroup effects

## 2. Semiconductor Industry Requirements

### 2.1 Capability Targets

Semiconductor manufacturing demands exceptional precision due to:
- Nanometer-scale feature sizes (3nm, 5nm, 7nm nodes)
- Hundreds of sequential process steps
- Extremely tight tolerances
- High cost of defects

| $C_{pk}$ Value | Sigma Level | DPMO | Typical Application |
|----------------|-------------|------|---------------------|
| 1.00 | $3\sigma$ | 2,700 | Unacceptable for production |
| 1.33 | $4\sigma$ | 63 | Minimum for established processes |
| 1.67 | $5\sigma$ | 0.57 | Critical parameters |
| 2.00 | $6\sigma$ | 0.002 | Most critical dimensions (CD, overlay) |

### 2.2 Defect Rate Calculation

Assuming normal distribution:

$$ \text{DPMO} = 10^6 \times 2\Phi(-3C_{pk}) $$

Where $\Phi$ is the standard normal cumulative distribution function.
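As a quick numeric check of the DPMO formula and the table above, here is a minimal Python sketch using the identity $\Phi(-z) = \tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})$ (a standard-library illustration, not part of the original entry):

```python
from math import erfc, sqrt

def dpmo_from_cpk(cpk: float) -> float:
    """Two-sided DPMO for a centered normal process: 10^6 * 2*Phi(-3*Cpk).

    Uses Phi(-z) = 0.5 * erfc(z / sqrt(2)), so 2*Phi(-z) = erfc(z / sqrt(2)).
    """
    return 1e6 * erfc(3 * cpk / sqrt(2))

# Reproduce the capability table (values are approximate):
for cpk in (1.00, 1.33, 1.67, 2.00):
    print(f"Cpk={cpk:.2f}  DPMO={dpmo_from_cpk(cpk):.3g}")
```

Note that the table's 63 DPMO row corresponds to exactly $4\sigma$; $C_{pk} = 1.33$ (i.e. $3.99\sigma$) evaluates to roughly 66 DPMO.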
For one-sided specifications:

$$ \text{DPMO}_{upper} = 10^6 \times \Phi\left(-\frac{USL - \mu}{\sigma}\right) $$

$$ \text{DPMO}_{lower} = 10^6 \times \Phi\left(-\frac{\mu - LSL}{\sigma}\right) $$

## 3. Variance Component Decomposition

### 3.1 Hierarchical Variation Model

Semiconductor processes exhibit hierarchical variation:

$$ \sigma^2_{total} = \sigma^2_{W} + \sigma^2_{W2W} + \sigma^2_{L2L} + \sigma^2_{T2T} $$

Where:
- $\sigma^2_{W}$ = Within-wafer variation
- $\sigma^2_{W2W}$ = Wafer-to-wafer variation
- $\sigma^2_{L2L}$ = Lot-to-lot variation
- $\sigma^2_{T2T}$ = Tool-to-tool variation

### 3.2 ANOVA-Based Estimation

For nested random effects model:

$$ x_{ijkl} = \mu + \alpha_i + \beta_{j(i)} + \gamma_{k(ij)} + \epsilon_{l(ijk)} $$

Variance component estimates:

$$ \hat{\sigma}^2_{between} = \frac{MS_{between} - MS_{within}}{n} $$

Expected Mean Squares:

$$ E[MS_{lots}] = \sigma^2_W + n_w \sigma^2_{W2W} + n_w n_{wafer} \sigma^2_{L2L} $$

$$ E[MS_{wafers}] = \sigma^2_W + n_w \sigma^2_{W2W} $$

$$ E[MS_{within}] = \sigma^2_W $$

### 3.3 Practical Implications

| Variation Source | Root Cause | Improvement Strategy |
|------------------|------------|---------------------|
| Within-wafer ($\sigma^2_W$) | Equipment uniformity | Hardware tuning, flow optimization |
| Wafer-to-wafer ($\sigma^2_{W2W}$) | Process stability | Run-to-run control, PM schedules |
| Lot-to-lot ($\sigma^2_{L2L}$) | Material variation | Incoming inspection, supplier control |
| Tool-to-tool ($\sigma^2_{T2T}$) | Equipment matching | Tool qualification, offset adjustment |

## 4. Non-Normal Distributions

### 4.1 Common Non-Normal Parameters

Many semiconductor parameters violate normality assumptions:

| Parameter | Typical Distribution | Characteristics |
|-----------|---------------------|-----------------|
| Particle counts | Poisson | Discrete, bounded below by zero |
| Contamination levels | Log-normal | Right-skewed, multiplicative effects |
| Defect density | Negative binomial | Overdispersed counts |
| Overlay errors | Potentially bimodal | Multiple systematic components |
| Line-edge roughness | Often skewed | Physical boundary constraints |

### 4.2 Box-Cox Transformation

$$ y^{(\lambda)} = \begin{cases} \frac{y^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0 \\[8pt] \ln(y) & \text{if } \lambda = 0 \end{cases} $$

Optimal $\lambda$ found by maximizing the log-likelihood:

$$ \ell(\lambda) = -\frac{n}{2}\ln\left(\frac{SS_E(\lambda)}{n}\right) + (\lambda - 1)\sum_{i=1}^n \ln(y_i) $$

Common transformations:
- $\lambda = 1$: No transformation
- $\lambda = 0.5$: Square root (count data)
- $\lambda = 0$: Natural logarithm (multiplicative)
- $\lambda = -1$: Reciprocal

### 4.3 Johnson Transformation System

Three families covering all continuous distributions:

**$S_B$ (Bounded):**

$$ z = \gamma + \delta \ln\left(\frac{x - \xi}{\lambda + \xi - x}\right) $$

**$S_L$ (Log-normal):**

$$ z = \gamma + \delta \ln(x - \xi) $$

**$S_U$ (Unbounded):**

$$ z = \gamma + \delta \sinh^{-1}\left(\frac{x - \xi}{\lambda}\right) $$

### 4.4 Percentile-Based Capability (Distribution-Free)

$$ C_{np} = \frac{USL - LSL}{X_{99.865} - X_{0.135}} $$

$$ C_{npk} = \min\left(\frac{USL - \tilde{x}}{X_{99.865} - \tilde{x}}, \frac{\tilde{x} - LSL}{\tilde{x} - X_{0.135}}\right) $$

Where $\tilde{x}$ is the median.

### 4.5 Clements' Method (Pearson Distributions)

$$ C_p = \frac{USL - LSL}{U_p - L_p} $$

$$ C_{pk} = \min\left(\frac{USL - M}{U_p - M}, \frac{M - LSL}{M - L_p}\right) $$

Where:
- $U_p$ = 99.865th percentile
- $L_p$ = 0.135th percentile
- $M$ = Median

## 5. Spatial Process Capability

### 5.1 Spatial Variation Models

Wafers exhibit systematic spatial patterns requiring decomposition:

**General Model:**

$$ x(r, \theta) = \mu + f(r, \theta) + \epsilon $$

**Zernike Polynomial Expansion:**

$$ x(r, \theta) = \mu + \sum_{n=0}^{N} \sum_{m=-n}^{n} a_{nm} Z_n^m(r, \theta) + \epsilon $$

Where $Z_n^m(r, \theta)$ are Zernike polynomials.

### 5.2 Practical Spatial Model

**Radial Model:**

$$ x_{ij} = \mu + \beta_1 r_i + \beta_2 r_i^2 + \epsilon_{ij} $$

**Radial + Angular Model:**

$$ x_{ij} = \mu + \beta_1 r_i + \beta_2 r_i^2 + \beta_3 \cos(\theta_j) + \beta_4 \sin(\theta_j) + \epsilon_{ij} $$

### 5.3 Spatial Capability Index

$$ C_{pk,spatial} = \min_{(r,\theta) \in \text{wafer}} \left[ \frac{USL - \hat{\mu}(r,\theta)}{3\hat{\sigma}(r,\theta)}, \frac{\hat{\mu}(r,\theta) - LSL}{3\hat{\sigma}(r,\theta)} \right] $$

### 5.4 Within-Wafer Non-Uniformity (WIWNU)

$$ \text{WIWNU} = \frac{\sigma_{within-wafer}}{\bar{x}} \times 100\% $$

Range-based uniformity:

$$ \text{Uniformity}_{\%} = \frac{x_{max} - x_{min}}{2 \bar{x}} \times 100\% $$

## 6. Multivariate Process Capability

### 6.1 Motivation

Critical for correlated parameters:
- CD (Critical Dimension) and sidewall angle
- Film thickness and uniformity
- Overlay X and Y components
- Etch depth and profile

### 6.2 Multivariate Capability Indices

For $p$-dimensional quality vector $\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$:

**Taam's Index ($MC_{pm}$):**

$$ MC_{pm} = \frac{C_p^*}{d(\boldsymbol{\mu}, \mathbf{T})} $$

Where $d$ is the Mahalanobis distance from process mean to target.
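Multivariate capability ultimately asks how much probability mass falls inside the joint spec region, which is also how $P(\mathbf{X} \in \text{Spec Region})$ in Section 7.3 can be estimated. A Monte Carlo sketch for a correlated bivariate normal (the means, covariance, and limits are illustrative, not from this entry):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical correlated pair, e.g. CD and sidewall angle (illustrative numbers).
mu = np.array([50.0, 88.0])
cov = np.array([[1.0, 0.6],
                [0.6, 1.0]])
lsl = np.array([46.0, 84.0])
usl = np.array([54.0, 92.0])

# Sample the fitted distribution and count the fraction inside the spec box.
x = rng.multivariate_normal(mu, cov, size=200_000)
in_spec = np.all((x >= lsl) & (x <= usl), axis=1)
yield_est = in_spec.mean()
print(f"Estimated P(X in spec) = {yield_est:.5f}")
```

For a rectangular spec region and a correlated distribution, this joint estimate is generally lower than the product of marginal yields would suggest only when the correlation pushes mass toward the corners; simulation sidesteps that case analysis entirely.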
### 6.3 Geometric Approach

$$ MC_p = \left[\frac{V(\text{Specification Region})}{V(\text{Process Region})}\right]^{1/p} $$

For ellipsoidal regions:

$$ MC_p = \frac{|\mathbf{T}|^{1/(2p)}}{|\boldsymbol{\Sigma}|^{1/(2p)} \cdot (\chi^2_{p, 0.9973})^{1/2}} $$

Where:
- $|\mathbf{T}|$ = Determinant of tolerance matrix
- $|\boldsymbol{\Sigma}|$ = Determinant of covariance matrix
- $\chi^2_{p, 0.9973}$ = Chi-squared critical value

### 6.4 Principal Component Analysis (PCA) Approach

Transform correlated variables to uncorrelated principal components:

$$ \mathbf{Z} = \mathbf{P}^T(\mathbf{X} - \boldsymbol{\mu}) $$

Where $\mathbf{P}$ contains eigenvectors of $\boldsymbol{\Sigma}$.

Individual PC capability:

$$ C_{pk,i} = \min\left(\frac{USL_{z_i} - 0}{3\sqrt{\lambda_i}}, \frac{0 - LSL_{z_i}}{3\sqrt{\lambda_i}}\right) $$

## 7. Yield Models Integration

### 7.1 Defect-Limited Yield Models

**Poisson Model:**

$$ Y = e^{-D_0 A} $$

**Murphy's Model (Clustered Defects):**

$$ Y = \left(\frac{1 - e^{-D_0 A}}{D_0 A}\right)^2 $$

**Seeds' Compound Poisson (Negative Binomial):**

$$ Y = \left(1 + \frac{D_0 A}{\alpha}\right)^{-\alpha} $$

Where:
- $D_0$ = Average defect density (defects/cm²)
- $A$ = Chip area (cm²)
- $\alpha$ = Clustering parameter

### 7.2 Parametric Yield

For Gaussian parameters:

$$ Y_{parametric} = \Phi\left(\frac{USL - \mu}{\sigma}\right) - \Phi\left(\frac{LSL - \mu}{\sigma}\right) $$

Relationship to $C_{pk}$ (centered process):

$$ Y_{parametric} = 2\Phi(3C_{pk}) - 1 $$

### 7.3 Combined Yield

For $n$ independent parameters:

$$ Y_{total} = Y_{defect} \times \prod_{i=1}^n Y_{parametric,i} $$

With correlation (multivariate normal):

$$ Y_{total} = Y_{defect} \times P(\mathbf{X} \in \text{Spec Region}) $$

## 8. Measurement System Analysis

### 8.1 Gauge R&R Components

$$ \sigma^2_{observed} = \sigma^2_{actual} + \sigma^2_{measurement} $$

$$ \sigma^2_{measurement} = \sigma^2_{repeatability} + \sigma^2_{reproducibility} $$

Expanded:

$$ \sigma^2_{reproducibility} = \sigma^2_{operator} + \sigma^2_{operator \times part} $$

### 8.2 Key Metrics

**Precision-to-Tolerance Ratio (P/T):**

$$ P/T = \frac{6\sigma_{measurement}}{USL - LSL} \times 100\% $$

Requirement: $P/T < 10\%$

**%GRR:**

$$ \%GRR = \frac{\sigma_{measurement}}{\sigma_{total}} \times 100\% $$

**Discrimination Ratio (DR):**

$$ DR = \frac{\sigma_{parts}}{\sigma_{gauge}} \times \sqrt{2} $$

Requirement: $DR \geq 4$ (can distinguish 4+ categories)

**Number of Distinct Categories (ndc):**

$$ ndc = 1.41 \times \frac{\sigma_{parts}}{\sigma_{gauge}} $$

Requirement: $ndc \geq 5$

### 8.3 True Process Capability

$$ \sigma^2_{actual} = \sigma^2_{observed} - \sigma^2_{measurement} $$

$$ C_{pk,true} = C_{pk,observed} \times \sqrt{\frac{\sigma^2_{observed}}{\sigma^2_{observed} - \sigma^2_{measurement}}} $$

## 9. Confidence Intervals for Capability Indices

### 9.1 Confidence Interval for $C_p$

$$ P\left(\hat{C}_p \sqrt{\frac{\chi^2_{n-1, \alpha/2}}{n-1}} \leq C_p \leq \hat{C}_p \sqrt{\frac{\chi^2_{n-1, 1-\alpha/2}}{n-1}}\right) = 1-\alpha $$

### 9.2 Confidence Interval for $C_{pk}$ (Approximate)

$$ \hat{C}_{pk} \pm z_{\alpha/2}\sqrt{\frac{1}{9n} + \frac{\hat{C}_{pk}^2}{2(n-1)}} $$

### 9.3 Sample Size Requirements

For desired relative precision $\epsilon$:

$$ n \approx \frac{z_{\alpha/2}^2}{2\epsilon^2} + 1 $$

Practical guidelines:
- 30 samples: Rough estimate
- 50 samples: Reasonable precision
- 100+ samples: Production qualification

### 9.4 Lower Confidence Bound

Often used for acceptance decisions:

$$ C_{pk,lower} = \hat{C}_{pk} - z_{\alpha}\sqrt{\frac{1}{9n} + \frac{\hat{C}_{pk}^2}{2(n-1)}} $$

## 10. Dynamic Process Capability

### 10.1 Time-Varying Process Model

Semiconductor processes drift due to:
- Chamber conditioning/seasoning
- Target erosion (PVD)
- Consumable wear
- Environmental drift

**Drift Model:**

$$ \mu(t) = \mu_0 + \delta t $$

**Periodic + Drift:**

$$ \mu(t) = \mu_0 + \delta t + \sum_{k=1}^{K} A_k \sin(2\pi f_k t + \phi_k) $$

### 10.2 EWMA-Based Monitoring

**Mean Estimate:**

$$ \hat{\mu}_t = \lambda x_t + (1-\lambda)\hat{\mu}_{t-1} $$

**Variance Estimate:**

$$ \hat{\sigma}^2_t = \lambda(x_t - \hat{\mu}_{t-1})^2 + (1-\lambda)\hat{\sigma}^2_{t-1} $$

Where $0 < \lambda \leq 1$ is the smoothing constant.

### 10.3 Dynamic Capability Index

$$ C_{pk}(t) = \min\left(\frac{USL - \hat{\mu}_t}{3\hat{\sigma}_t}, \frac{\hat{\mu}_t - LSL}{3\hat{\sigma}_t}\right) $$

### 10.4 Control Chart Integration

**EWMA Control Limits:**

$$ UCL = \mu_0 + L\sigma\sqrt{\frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2t}\right]} $$

$$ LCL = \mu_0 - L\sigma\sqrt{\frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2t}\right]} $$

Where $L$ is the control limit factor (typically 2.7-3.0).

## 11. Run-to-Run Control Integration

### 11.1 Basic EWMA Controller

$$ u_k = u_{k-1} + \frac{\eta}{\beta}(T - y_{k-1}) $$

Where:
- $u_k$ = Recipe setting at run $k$
- $T$ = Target value
- $\eta$ = Controller gain $(0 < \eta < 1)$
- $\beta$ = Process gain (sensitivity)

### 11.2 Double EWMA Controller

For processes with drift:

$$ \hat{a}_k = \lambda_1 y_k + (1-\lambda_1)(\hat{a}_{k-1} + \hat{b}_{k-1}) $$

$$ \hat{b}_k = \lambda_2(\hat{a}_k - \hat{a}_{k-1}) + (1-\lambda_2)\hat{b}_{k-1} $$

$$ u_k = \frac{T - \hat{a}_k - \hat{b}_k}{\beta} $$

### 11.3 Achieved Capability Under Control

**Variance of Controlled Output:**

$$ \sigma^2_{controlled} = \frac{\sigma^2_\epsilon}{2\eta - \eta^2} $$

**Optimal Gain (Minimum Variance):**

$$ \eta_{opt} = 1 \quad \text{(for i.i.d. disturbances)} $$

For autocorrelated disturbances, optimal gain depends on disturbance model.
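The Section 11.1 EWMA controller can be exercised against a drifting process in a few lines; a minimal simulation sketch (the gains, drift rate, and noise level are illustrative, not from this entry):

```python
import numpy as np

rng = np.random.default_rng(1)

beta_true, beta_model = 2.0, 2.0   # true process gain and (here, exact) model gain
target, eta = 100.0, 0.4           # target T and controller gain eta
drift = 0.05                       # mean drift per run (illustrative)
sigma_eps = 0.5                    # run-to-run noise

u = target / beta_true             # initial recipe setting
outputs = []
for k in range(200):
    # Process: gain * recipe + linear drift + noise
    y = beta_true * u + drift * k + rng.normal(0.0, sigma_eps)
    outputs.append(y)
    # EWMA R2R update: u_k = u_{k-1} + (eta / beta) * (T - y_{k-1})
    u = u + (eta / beta_model) * (target - y)

tail = np.array(outputs[-100:])
print(f"mean of last 100 runs = {tail.mean():.2f}, std = {tail.std():.2f}")
```

A single EWMA controller leaves a small steady-state offset of roughly $\delta/\eta$ under linear drift, which is exactly why Section 11.2 adds the second (trend-tracking) EWMA term.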
### 11.4 Capability with APC

$$ C_{pk,APC} = \min\left(\frac{USL - T}{3\sigma_{controlled}}, \frac{T - LSL}{3\sigma_{controlled}}\right) $$

## 12. Advanced Topics

### 12.1 Bayesian Capability Analysis

Useful for small sample sizes in development:

**Posterior Distribution:**

$$ P(C_{pk} | \text{data}) \propto L(\text{data} | C_{pk}) \cdot \pi(C_{pk}) $$

**With Non-informative Prior:**

$$ C_{pk} | \text{data} \sim \text{Scaled-}t \text{ distribution} $$

**Credible Interval:**

$$ P(C_{pk,L} < C_{pk} < C_{pk,U} | \text{data}) = 1 - \alpha $$

### 12.2 Process Capability for Attributes

**Equivalent Capability:**

$$ C_{pk,attribute} = \frac{-\ln(p)}{3} $$

Where $p$ is the proportion defective.

**For Defect Counts (Poisson):**

$$ C_{pk,Poisson} = \frac{-\ln(1 - P(\text{acceptable}))}{3} $$

### 12.3 Six Sigma and 1.5σ Shift

**Short-term vs. Long-term:**

$$ Z_{LT} = Z_{ST} - 1.5 $$

| Sigma Level | $Z_{ST}$ | $Z_{LT}$ | DPMO (Long-term) |
|-------------|----------|----------|------------------|
| 3σ | 3.0 | 1.5 | 66,807 |
| 4σ | 4.0 | 2.5 | 6,210 |
| 5σ | 5.0 | 3.5 | 233 |
| 6σ | 6.0 | 4.5 | 3.4 |

### 12.4 Cpm and Cpkm (Taguchi Indices)

**Cpm (accounts for deviation from target):**

$$ C_{pm} = \frac{USL - LSL}{6\sqrt{\sigma^2 + (\mu - T)^2}} $$

$$ C_{pm} = \frac{USL - LSL}{6\tau} $$

Where $\tau = \sqrt{\sigma^2 + (\mu - T)^2}$ is the Taguchi loss function parameter.

**Cpkm:**

$$ C_{pkm} = \frac{C_{pk}}{\sqrt{1 + \left(\frac{\mu - T}{\sigma}\right)^2}} $$

## 13. Practical Implementation Framework

### 13.1 Data Collection Strategy

**Minimum Samples:**
- Development: 30-50 wafers
- Qualification: 100+ wafers
- Monitoring: Per control chart rules

**Rational Subgrouping:**
- 5-25 wafers per lot
- 9-49 measurement sites per wafer
- Multiple lots across time windows

### 13.2 Capability Study Protocol

1. **Verify measurement system**
   - Complete Gauge R&R study
   - Requirement: P/T < 10%, ndc ≥ 5
2. **Collect data across variation sources**
   - Multiple lots
   - Multiple tools (if applicable)
   - Full wafer coverage
3. **Test for normality**
   - Shapiro-Wilk test
   - Anderson-Darling test
   - Visual: histogram, Q-Q plot
4. **Handle non-normality**
   - Transform (Box-Cox, Johnson)
   - Use percentile methods
   - Document approach
5. **Decompose variance components**
   - ANOVA or REML
   - Identify dominant sources
6. **Calculate indices with confidence intervals**
   - $C_p$, $C_{pk}$, $P_p$, $P_{pk}$
   - Lower confidence bounds
7. **Assess spatial patterns**
   - Wafer maps
   - Radial plots
   - Systematic vs. random
8. **Document and establish monitoring**
   - Control charts
   - Trending
   - Review frequency

### 13.3 Decision Thresholds

| $C_{pk}$ Range | Assessment | Required Action |
|----------------|------------|-----------------|
| < 1.0 | Not capable | Immediate improvement, 100% inspection |
| 1.0 – 1.33 | Marginal | Improvement plan, enhanced monitoring |
| 1.33 – 1.67 | Capable | Standard production controls |
| > 1.67 | Highly capable | Reduced sampling possible |

## 14. Key Formulas

### Basic Indices

$$ C_p = \frac{USL - LSL}{6\sigma} $$

$$ C_{pk} = \min\left(\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right) $$

### Variance Decomposition

$$ \sigma^2_{total} = \sigma^2_{within} + \sigma^2_{between} $$

### Yield Relationship

$$ Y = 2\Phi(3C_{pk}) - 1 $$

### Confidence Interval

$$ CI_{C_{pk}} = \hat{C}_{pk} \pm z_{\alpha/2}\sqrt{\frac{1}{9n} + \frac{\hat{C}_{pk}^2}{2(n-1)}} $$

### Measurement System

$$ \sigma^2_{observed} = \sigma^2_{actual} + \sigma^2_{measurement} $$
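The distribution-free indices of Section 4.4 need only empirical percentiles; a minimal numpy sketch on an illustrative right-skewed (log-normal) parameter, with made-up spec limits:

```python
import numpy as np

rng = np.random.default_rng(2)

# Right-skewed data, e.g. a contamination-like log-normal parameter (illustrative).
x = rng.lognormal(mean=0.0, sigma=0.35, size=50_000)
lsl, usl = 0.3, 3.0

# X_{0.135}, median, X_{99.865}: the 0.27% tails replace the +/-3 sigma points.
p_lo, med, p_hi = np.percentile(x, [0.135, 50.0, 99.865])

c_np = (usl - lsl) / (p_hi - p_lo)
c_npk = min((usl - med) / (p_hi - med), (med - lsl) / (med - p_lo))
print(f"C_np={c_np:.2f}  C_npk={c_npk:.2f}")
```

Because the extreme percentiles are estimated from very few tail points, these indices need large samples to be stable, which is one reason the protocol above pairs percentile methods with explicit transformations.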

process change control, production

Manage modifications to processes.

process compensation,design

Adjust voltage/bias to compensate for process variation.

process control loop, manufacturing operations

Process control loops continuously adjust parameters maintaining targets despite disturbances.

process control monitor, yield enhancement

Process control monitors are specialized structures measuring process parameters for SPC.

process cooling water (pcw),process cooling water,pcw,facility

Chilled water loop for cooling process tools and chambers.

process cooling, manufacturing equipment

Process cooling maintains precise temperatures for chemical reactions and depositions.

process defects, production

Quality issues requiring rework.

process development kit (pdk),process development kit,pdk,design

Files and models needed to design for specific foundry process.

process digital twin, digital manufacturing

Simulate process physics.

process flow,process

Complete sequence of process steps to build a chip.

process module,production

Individual chamber in multi-chamber or cluster tool.

process monitor structures, metrology

Test structures for tracking process parameters.

process monitor,design

Measure effective process corner.

process monitoring, semiconductor process control, spc, statistical process control, sensor data, fault detection, run-to-run control, process optimization

# Semiconductor Manufacturing Process Parameters Monitoring: Mathematical Modeling

## 1. The Fundamental Challenge

Modern semiconductor fabrication involves 500–1000+ sequential process steps, each with dozens of parameters requiring nanometer-scale precision.

### Key Process Types and Parameters

- **Lithography**: exposure dose, focus, overlay alignment, resist thickness
- **Etching (dry/wet)**: etch rate, selectivity, uniformity, plasma parameters (power, pressure, gas flows)
- **Deposition (CVD, PVD, ALD)**: deposition rate, film thickness, uniformity, stress, composition
- **CMP (Chemical Mechanical Polishing)**: removal rate, within-wafer non-uniformity, dishing, erosion
- **Implantation**: dose, energy, angle, uniformity
- **Thermal processes**: temperature uniformity, ramp rates, time

## 2. Statistical Process Control (SPC) — The Foundation

### 2.1 Univariate Control Charts

For a process parameter $X$ with samples $x_1, x_2, \ldots, x_n$:

**Sample Mean:**

$$ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i $$

**Sample Standard Deviation:**

$$ \sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} $$

**Control Limits (3-sigma):**

$$ \text{UCL} = \bar{x} + 3\sigma $$

$$ \text{LCL} = \bar{x} - 3\sigma $$

### 2.2 Process Capability Indices

These quantify how well a process meets specifications:

- **$C_p$ (Potential Capability):**

$$ C_p = \frac{USL - LSL}{6\sigma} $$

- **$C_{pk}$ (Actual Capability)** — accounts for centering:

$$ C_{pk} = \min\left[\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right] $$

- **$C_{pm}$ (Taguchi Index)** — penalizes deviation from target $T$:

$$ C_{pm} = \frac{C_p}{\sqrt{1 + \left(\frac{\mu - T}{\sigma}\right)^2}} $$

Semiconductor fabs typically require $C_{pk} \geq 1.67$, corresponding to defect rates below ~1 ppm.

## 3. Multivariate Statistical Monitoring

Since process parameters are highly correlated, univariate methods miss interaction effects.
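As a concrete check of the Section 2 formulas, here is a minimal numpy sketch (the synthetic data and the 10.5/9.5 spec limits are illustrative, not from this entry):

```python
import numpy as np

def spc_summary(x, usl, lsl):
    """Control limits and capability indices from a single sample (sketch).

    Uses the overall sample standard deviation, so the capability numbers
    are closer to Pp/Ppk than to subgroup-based Cp/Cpk.
    """
    xbar = x.mean()
    s = x.std(ddof=1)
    return {
        "UCL": xbar + 3 * s,
        "LCL": xbar - 3 * s,
        "Cp": (usl - lsl) / (6 * s),
        "Cpk": min((usl - xbar) / (3 * s), (xbar - lsl) / (3 * s)),
    }

rng = np.random.default_rng(3)
x = rng.normal(10.0, 0.1, size=500)   # synthetic CD-like measurements
print(spc_summary(x, usl=10.5, lsl=9.5))
```

With these numbers the spec width is ten sample sigmas, so $C_p$ comes out near the 1.67 floor quoted above.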
### 3.1 Principal Component Analysis (PCA)

Given data matrix $\mathbf{X}$ ($n$ samples × $p$ variables), centered:

1. **Compute covariance matrix:**

$$ \mathbf{S} = \frac{1}{n-1}\mathbf{X}^T\mathbf{X} $$

2. **Eigendecomposition:**

$$ \mathbf{S} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^T $$

3. **Project to principal components:**

$$ \mathbf{T} = \mathbf{X}\mathbf{V} $$

### 3.2 Monitoring Statistics

#### Hotelling's $T^2$ Statistic

Captures variation **within** the PCA model:

$$ T^2 = \sum_{i=1}^{k} \frac{t_i^2}{\lambda_i} $$

where $k$ is the number of retained components. Under normal operation, $T^2$ follows a scaled F-distribution.

#### Q-Statistic (Squared Prediction Error)

Captures variation **outside** the model:

$$ Q = \sum_{j=1}^{p}(x_j - \hat{x}_j)^2 = \|\mathbf{x} - \mathbf{x}\mathbf{V}_k\mathbf{V}_k^T\|^2 $$

> Often more sensitive to novel faults than $T^2$.

### 3.3 Partial Least Squares (PLS)

When relating process inputs $\mathbf{X}$ to quality outputs $\mathbf{Y}$:

$$ \mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E} $$

PLS finds latent variables that maximize covariance between $\mathbf{X}$ and $\mathbf{Y}$, providing both monitoring capability and a predictive model.

## 4. Virtual Metrology (VM) Models

Virtual metrology predicts physical measurement outcomes from process sensor data, enabling 100% wafer coverage without costly measurements.
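The $T^2$ and $Q$ statistics of Section 3.2 follow directly from an eigendecomposition; a minimal numpy sketch on synthetic correlated data (all dimensions and noise levels illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Correlated training data (normal operation), mean-centered.
n, p, k = 500, 5, 2
latent = rng.normal(size=(n, k))
mixing = rng.normal(size=(k, p))
X = latent @ mixing + 0.05 * rng.normal(size=(n, p))
X = X - X.mean(axis=0)

S = (X.T @ X) / (n - 1)                        # covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)           # returned in ascending order
order = np.argsort(eigvals)[::-1]
lam, V = eigvals[order][:k], eigvecs[:, order][:, :k]

def t2_and_q(x):
    t = V.T @ x                                # scores on retained components
    t2 = float(np.sum(t**2 / lam))             # Hotelling's T^2
    x_hat = V @ t                              # reconstruction from the model
    q = float(np.sum((x - x_hat) ** 2))        # squared prediction error
    return t2, q

t2, q = t2_and_q(X[0])
print(f"T2={t2:.2f}  Q={q:.4f}")
```

For an in-model sample, $Q$ stays at the residual-noise level, while a fault outside the learned correlation structure inflates it sharply, which is the behavior the quoted note about fault sensitivity refers to.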
### 4.1 Linear Models

For process parameters $\mathbf{x} \in \mathbb{R}^p$ and metrology target $y$:

- **Ordinary Least Squares (OLS):**

$$ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} $$

- **Ridge Regression** ($L_2$ regularization for collinearity):

$$ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y} $$

- **LASSO** ($L_1$ regularization for sparsity/feature selection):

$$ \min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|^2 + \lambda\|\boldsymbol{\beta}\|_1 $$

### 4.2 Nonlinear Models

#### Gaussian Process Regression (GPR)

$$ y \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')) $$

**Posterior predictive distribution:**

- **Mean:**

$$ \mu_* = \mathbf{K}_*^T(\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{y} $$

- **Variance:**

$$ \sigma_*^2 = K_{**} - \mathbf{K}_*^T(\mathbf{K} + \sigma_n^2\mathbf{I})^{-1}\mathbf{K}_* $$

GPs provide uncertainty quantification — critical for knowing when to trigger actual metrology.

#### Support Vector Regression (SVR)

$$ \min \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i(\xi_i + \xi_i^*) $$

Subject to $\epsilon$-insensitive tube constraints. Kernel trick enables nonlinear modeling.

#### Neural Networks

- **MLPs**: Multi-layer perceptrons for general function approximation
- **CNNs**: Convolutional neural networks for wafer map pattern recognition
- **LSTMs**: Long Short-Term Memory networks for time-series FDC traces

## 5. Run-to-Run (R2R) Control

R2R control adjusts recipe setpoints between wafers/lots to compensate for drift and disturbances.
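Of the Section 4.1 estimators, ridge has the most convenient closed form; a minimal sketch on synthetic data (the coefficients and penalty are illustrative):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(5)
n, p = 200, 4
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

beta_ols = ridge_fit(X, y, lam=0.0)     # lam=0 reduces to OLS
beta_ridge = ridge_fit(X, y, lam=10.0)  # shrinkage toward zero
print(beta_ols.round(2), beta_ridge.round(2))
```

The shrinkage is what stabilizes the inverse when FDC sensor channels are nearly collinear, which is the scenario the entry cites as ridge's motivation.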
### 5.1 EWMA Controller

For a process with model $y = a_0 + a_1 u + \epsilon$:

**Intercept (disturbance) update:**

$$ \hat{a}_{0,k} = \lambda (y_k - a_1 u_k) + (1-\lambda)\hat{a}_{0,k-1} $$

**Control action:**

$$ u_{k+1} = \frac{T - \hat{a}_{0,k}}{a_1} $$

where:
- $T$ is the target
- $\lambda \in (0,1)$ is the smoothing weight

### 5.2 Double EWMA (for Linear Drift)

When process drifts linearly:

$$ \hat{y}_{k+1} = a_k + b_k $$

$$ a_k = \lambda y_k + (1-\lambda)(a_{k-1} + b_{k-1}) $$

$$ b_k = \gamma(a_k - a_{k-1}) + (1-\gamma)b_{k-1} $$

### 5.3 State-Space Formulation

More general framework:

**State equation:**

$$ \mathbf{x}_{k+1} = \mathbf{A}\mathbf{x}_k + \mathbf{B}\mathbf{u}_k + \mathbf{w}_k $$

**Observation equation:**

$$ \mathbf{y}_k = \mathbf{C}\mathbf{x}_k + \mathbf{D}\mathbf{u}_k + \mathbf{v}_k $$

Use **Kalman filtering** for state estimation and **LQR/MPC** for optimal control.

### 5.4 Model Predictive Control (MPC)

**Objective function:**

$$ \min \sum_{i=1}^{N} \|\mathbf{y}_{k+i} - \mathbf{r}_{k+i}\|_\mathbf{Q}^2 + \sum_{j=0}^{N-1}\|\Delta\mathbf{u}_{k+j}\|_\mathbf{R}^2 $$

subject to process model and operational constraints.

> MPC handles multivariable systems with constraints naturally.

## 6. Fault Detection and Classification (FDC)

### 6.1 Detection Methods

#### Mahalanobis Distance

$$ D^2 = (\mathbf{x} - \boldsymbol{\mu})^T\mathbf{S}^{-1}(\mathbf{x} - \boldsymbol{\mu}) $$

Follows $\chi^2$ distribution under multivariate normality.
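Once $\boldsymbol{\mu}$ and $\mathbf{S}$ are estimated from normal-operation data, the Mahalanobis statistic above is a one-liner; a sketch with an illustrative correlated reference set (numbers not from this entry):

```python
import numpy as np

rng = np.random.default_rng(6)

# Reference data from normal operation (two correlated sensors).
X_ref = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=2_000)
mu = X_ref.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X_ref, rowvar=False))

def mahalanobis_sq(x):
    """Squared Mahalanobis distance D^2 to the reference distribution."""
    d = x - mu
    return float(d @ S_inv @ d)

# A point that breaks the correlation structure scores high even though
# each coordinate is individually unremarkable (about 2 sigma).
print(mahalanobis_sq(mu), mahalanobis_sq(np.array([2.0, -2.0])))
```

Thresholding $D^2$ at a $\chi^2_p$ quantile (e.g. the 99.73% point for $p$ sensors) gives the detection rule implied by the distributional note above.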
#### Other Detection Methods

- **One-Class SVM**: Learn boundary of normal operation
- **Autoencoders**: Detect anomalies via reconstruction error

### 6.2 Classification Features

For trace data (time-series from sensors), extract features:

- **Statistical moments**: mean, variance, skewness, kurtosis
- **Frequency domain**: FFT coefficients, spectral power
- **Wavelet coefficients**: Multi-resolution analysis
- **DTW distances**: Dynamic Time Warping to reference signatures

### 6.3 Classification Algorithms

- Support Vector Machines (SVM)
- Random Forest
- CNNs for pattern recognition on wafer maps
- Gradient Boosting (XGBoost, LightGBM)

## 7. Spatial Modeling (Within-Wafer Variation)

Systematic spatial patterns require explicit modeling.

### 7.1 Polynomial Basis Expansion

#### Zernike Polynomials (common in lithography)

$$ z(\rho, \theta) = \sum_{n,m} a_{nm} Z_n^m(\rho, \theta) $$

These form an orthogonal basis on the unit disk, capturing radial and azimuthal variation through the coefficients $a_{nm}$.

### 7.2 Gaussian Process Spatial Models

$$ y(\mathbf{s}) \sim \mathcal{GP}(\mu(\mathbf{s}), k(\mathbf{s}, \mathbf{s}')) $$

#### Common Covariance Kernels

- **Squared Exponential (RBF):**

$$ k(\mathbf{s}, \mathbf{s}') = \sigma^2 \exp\left(-\frac{\|\mathbf{s} - \mathbf{s}'\|^2}{2\ell^2}\right) $$

- **Matérn** (more flexible smoothness):

$$ k(r) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)}\left(\frac{\sqrt{2\nu}r}{\ell}\right)^\nu K_\nu\left(\frac{\sqrt{2\nu}r}{\ell}\right) $$

where $K_\nu$ is the modified Bessel function of the second kind.

## 8. Dynamic/Time-Series Modeling

For plasma processes, endpoint detection, and transient behavior.

### 8.1 Autoregressive Models

**AR(p) model:**

$$ x_t = \sum_{i=1}^{p} \phi_i x_{t-i} + \epsilon_t $$

ARIMA extends this to non-stationary series.

### 8.2 Dynamic PCA

Augment data with time-lagged values:

$$ \tilde{\mathbf{X}} = [\mathbf{X}(t), \mathbf{X}(t-1), \ldots, \mathbf{X}(t-l)] $$

Then apply standard PCA to capture temporal dynamics.
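The AR(1) special case of Section 8.1 can be fit by a lagged least-squares regression; a sketch with an illustrative true coefficient of 0.7:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate an AR(1) sensor trace: x_t = phi * x_{t-1} + eps_t.
phi_true, n = 0.7, 5_000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()

# Least-squares estimate of phi: regress x_t on x_{t-1}.
phi_hat = float(np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2))
print(f"phi_hat = {phi_hat:.3f}")
```

The same lagged-regression idea, applied to a whole sensor matrix at once, is exactly the augmentation step that dynamic PCA (Section 8.2) performs before the eigendecomposition.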
### 8.3 Deep Sequence Models

#### LSTM Networks

Gating mechanisms:

- **Forget gate:** $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
- **Input gate:** $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
- **Output gate:** $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$

**Cell state update:**

$$ c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t $$

**Hidden state:**

$$ h_t = o_t \odot \tanh(c_t) $$

## 9. Model Maintenance and Adaptation

Semiconductor processes drift — models must adapt.

### 9.1 Drift Detection Methods

#### CUSUM (Cumulative Sum)

$$ S_k = \max(0, S_{k-1} + (x_k - \mu_0) - K) $$

Signal when $S_k$ exceeds a decision threshold $h$; $K$ is the allowance (slack) constant that absorbs small in-control deviations.

#### Page-Hinkley Test

$$ m_k = \sum_{i=1}^{k}(x_i - \bar{x}_k - \delta) $$

$$ M_k = \max_{i \leq k} m_i $$

Alarm when $M_k - m_k > \lambda$.

#### ADWIN (Adaptive Windowing)

Automatically detects distribution changes and adjusts window size.

### 9.2 Online Model Updating

#### Recursive Least Squares (RLS)

$$ \hat{\boldsymbol{\beta}}_k = \hat{\boldsymbol{\beta}}_{k-1} + \mathbf{K}_k(y_k - \mathbf{x}_k^T\hat{\boldsymbol{\beta}}_{k-1}) $$

where $\mathbf{K}_k$ is the gain vector, updated together with the covariance matrix $\mathbf{P}_k$ under forgetting factor $\lambda$:

$$ \mathbf{K}_k = \frac{\mathbf{P}_{k-1}\mathbf{x}_k}{\lambda + \mathbf{x}_k^T\mathbf{P}_{k-1}\mathbf{x}_k} $$

$$ \mathbf{P}_k = \frac{1}{\lambda}(\mathbf{P}_{k-1} - \mathbf{K}_k\mathbf{x}_k^T\mathbf{P}_{k-1}) $$

#### Just-in-Time (JIT) Learning

Build local models around each new prediction point using nearest historical samples.

## 10. Integrated Framework

A complete monitoring system layers these methods:

| Layer | Methods | Purpose |
|-------|---------|---------|
| **Preprocessing** | Cleaning, synchronization, normalization | Data quality |
| **Feature Engineering** | Domain features, wavelets, PCA | Dimensionality management |
| **Monitoring** | $T^2$, Q-statistic, control charts | Detect out-of-control states |
| **Virtual Metrology** | PLS, GPR, neural networks | Predict quality without measurement |
| **FDC** | Classification models | Diagnose fault root causes |
| **Control** | R2R, MPC | Compensate for drift/disturbances |
| **Adaptation** | Online learning, drift detection | Maintain model validity |

## 11. Key Mathematical Challenges

1. **High dimensionality** — hundreds of sensors, requiring regularization and dimension reduction
2. **Collinearity** — process variables are physically coupled
3. **Non-stationarity** — drift, maintenance events, recipe changes
4. **Small sample sizes** — new recipes have limited historical data (transfer learning, Bayesian methods help)
5. **Real-time constraints** — decisions needed in seconds
6. **Rare events** — faults are infrequent, creating class imbalance

## 12. Key Equations

### Process Capability

$$ C_{pk} = \min\left[\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right] $$

### Multivariate Monitoring

$$ T^2 = \sum_{i=1}^{k} \frac{t_i^2}{\lambda_i}, \quad Q = \|\mathbf{x} - \hat{\mathbf{x}}\|^2 $$

### Virtual Metrology (Ridge Regression)

$$ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y} $$

### EWMA Control

$$ \hat{y}_{k+1} = \lambda y_k + (1-\lambda)\hat{y}_k $$

### Mahalanobis Distance

$$ D^2 = (\mathbf{x} - \boldsymbol{\mu})^T\mathbf{S}^{-1}(\mathbf{x} - \boldsymbol{\mu}) $$
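The RLS recursion of Section 9.2 translates almost line for line into code; a minimal sketch with an illustrative forgetting factor and true coefficient vector:

```python
import numpy as np

rng = np.random.default_rng(8)

p, lam = 3, 0.99                    # lam is the forgetting factor
beta = np.zeros(p)                  # initial coefficient estimate
P = np.eye(p) * 1e3                 # large initial covariance (uninformative)

beta_true = np.array([0.5, -1.0, 2.0])
for _ in range(2_000):
    x = rng.normal(size=p)
    y = float(x @ beta_true + 0.1 * rng.normal())
    # Gain vector: K = P x / (lam + x' P x)
    K = P @ x / (lam + x @ P @ x)
    # Coefficient update on the prediction error
    beta = beta + K * (y - x @ beta)
    # Covariance update with forgetting: P = (P - K x' P) / lam
    P = (P - np.outer(K, x @ P)) / lam

print(beta.round(3))
```

A forgetting factor just below 1 keeps the covariance from collapsing, so the estimator continues to track the slow recipe and chamber drift this section is concerned with.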

process node,nm,nanometer

Process node (7nm, 5nm, 3nm) indicates transistor density. Smaller = faster, lower power, more expensive.

process node,process

Semiconductor technology generation (7nm, 5nm, 3nm, etc.).

process optimization energy, environmental & sustainability

Process optimization reduces energy use by improving efficiency, cycle times, and yields.

process optimization,recipe optimization,response surface methodology,rsm,gaussian process,bayesian optimization,run to run control,r2r,robust optimization,multi-objective optimization

# Optimization: Mathematical Modeling

## 1. Context

A recipe is a vector of controllable parameters:

$$
\mathbf{x} = \begin{bmatrix} T \\ P \\ Q_1 \\ Q_2 \\ \vdots \\ t \\ P_{\text{RF}} \end{bmatrix} \in \mathbb{R}^n
$$

Where:

- $T$ = Temperature (°C or K)
- $P$ = Pressure (mTorr or Pa)
- $Q_i$ = Gas flow rates (sccm)
- $t$ = Process time (seconds)
- $P_{\text{RF}}$ = RF power (Watts)

**Goal**: Find optimal $\mathbf{x}$ such that output properties $\mathbf{y}$ meet specifications while accounting for variability.

## 2. Mathematical Modeling Approaches

### 2.1 Physics-Based (First-Principles) Models

**Chemical Vapor Deposition (CVD) Example**

Mass transport and reaction equation:

$$
\frac{\partial C}{\partial t} + \nabla \cdot (\mathbf{u}C) = D\nabla^2 C + R(C, T)
$$

Where:

- $C$ = Species concentration
- $\mathbf{u}$ = Velocity field
- $D$ = Diffusion coefficient
- $R(C, T)$ = Reaction rate

Surface reaction kinetics (Arrhenius form):

$$
k_s = A \exp\left(-\frac{E_a}{RT}\right)
$$

Where:

- $A$ = Pre-exponential factor
- $E_a$ = Activation energy
- $R$ = Gas constant
- $T$ = Temperature

Deposition rate (transport-limited regime):

$$
r = \frac{k_s C_s}{1 + \frac{k_s}{h_g}}
$$

Where:

- $C_s$ = Surface concentration
- $h_g$ = Gas-phase mass transfer coefficient

Characteristics:

- **Advantages**: Extrapolates outside training data, physically interpretable
- **Disadvantages**: Computationally expensive, requires detailed mechanism knowledge

### 2.2 Empirical/Statistical Models (Response Surface Methodology)

Second-order polynomial model:

$$
y = \beta_0 + \sum_{i=1}^{n}\beta_i x_i + \sum_{i=1}^{n}\beta_{ii}x_i^2 + \sum_{i<j}\beta_{ij}x_i x_j + \varepsilon
$$

| Challenge | Applicable Methods |
|-----------|--------------------|
| High dimensionality ($> 50$ parameters) | PCA, PLS, sparse regression (LASSO), feature selection |
| Small datasets (limited wafer runs) | Bayesian methods, transfer learning, multi-fidelity modeling |
| Nonlinearity | GPs, neural networks, tree ensembles (RF, XGBoost) |
| Equipment-to-equipment variation | Mixed-effects models, hierarchical Bayesian models |
| Drift over time | Adaptive/recursive estimation, change-point detection, Kalman filtering |
| Multiple correlated responses | Multi-task learning, co-kriging, multivariate GP |
| Missing data | EM algorithm, multiple imputation, probabilistic PCA |

## 6. Dimensionality Reduction

### 6.1 Principal Component Analysis (PCA)

Objective:

$$
\max_{\mathbf{w}} \quad \mathbf{w}^T\mathbf{S}\mathbf{w} \quad \text{s.t.} \quad \|\mathbf{w}\|_2 = 1
$$

Where $\mathbf{S}$ is the sample covariance matrix.

Solution: Eigenvectors of $\mathbf{S}$

$$
\mathbf{S} = \mathbf{W}\boldsymbol{\Lambda}\mathbf{W}^T
$$

Reduced representation:

$$
\mathbf{z} = \mathbf{W}_k^T(\mathbf{x} - \bar{\mathbf{x}})
$$

Where $\mathbf{W}_k$ contains the top $k$ eigenvectors.

### 6.2 Partial Least Squares (PLS)

Objective: Maximize covariance between $\mathbf{X}$ and $\mathbf{Y}$

$$
\max_{\mathbf{w}, \mathbf{c}} \quad \text{Cov}(\mathbf{Xw}, \mathbf{Yc}) \quad \text{s.t.} \quad \|\mathbf{w}\|=\|\mathbf{c}\|=1
$$

## 7. Multi-Fidelity Optimization

Combine cheap simulations with expensive experiments.

Auto-regressive model (Kennedy-O'Hagan):

$$
y_{\text{HF}}(\mathbf{x}) = \rho \cdot y_{\text{LF}}(\mathbf{x}) + \delta(\mathbf{x})
$$

Where:

- $y_{\text{HF}}$ = High-fidelity (experimental) response
- $y_{\text{LF}}$ = Low-fidelity (simulation) response
- $\rho$ = Scaling factor
- $\delta(\mathbf{x}) \sim \mathcal{GP}$ = Discrepancy function

Multi-fidelity GP:

$$
\begin{bmatrix} \mathbf{y}_{\text{LF}} \\ \mathbf{y}_{\text{HF}} \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} \mathbf{K}_{\text{LL}} & \rho\mathbf{K}_{\text{LH}} \\ \rho\mathbf{K}_{\text{HL}} & \rho^2\mathbf{K}_{\text{LL}} + \mathbf{K}_{\delta} \end{bmatrix}\right)
$$

## 8. Transfer Learning

Domain adaptation for tool-to-tool transfer:

$$
y_{\text{target}}(\mathbf{x}) = y_{\text{source}}(\mathbf{x}) + \Delta(\mathbf{x})
$$

Offset model (simple):

$$
\Delta(\mathbf{x}) = c_0 \quad \text{(constant offset)}
$$

Linear adaptation:

$$
\Delta(\mathbf{x}) = \mathbf{c}^T\mathbf{x} + c_0
$$

GP adaptation:

$$
\Delta(\mathbf{x}) \sim \mathcal{GP}(0, k_\Delta)
$$

## 9. Complete Optimization Framework

```
┌───────────────────────────────────────────────────────┐
│            RECIPE OPTIMIZATION FRAMEWORK              │
├───────────────────────────────────────────────────────┤
│                                                       │
│   INPUTS            MODEL            OUTPUTS          │
│   ──────            ─────            ───────          │
│                  ┌─────────┐                          │
│  x₁: Temp   ───► │         │ ───► y₁: Thickness       │
│  x₂: Press  ───► │ y=f(x;θ)│ ───► y₂: Uniformity      │
│  x₃: Flow1  ───► │         │ ───► y₃: CD              │
│  x₄: Flow2  ───► │   + ε   │ ───► y₄: Defects         │
│  x₅: Power  ───► │         │                          │
│  x₆: Time   ───► └─────────┘                          │
│                       ▲                               │
│                 Uncertainty ξ                         │
│                                                       │
├───────────────────────────────────────────────────────┤
│  OPTIMIZATION PROBLEM:                                │
│                                                       │
│   min  Σⱼ wⱼ(E[yⱼ] - yⱼ,target)² + λ·Var[y]           │
│    x                                                  │
│                                                       │
│  subject to:                                          │
│    y_L ≤ E[y] ≤ y_U        (spec limits)              │
│    Pr(y ∈ spec) ≥ 0.9973   (Cpk ≥ 1.0)                │
│    x_L ≤ x ≤ x_U           (equipment limits)         │
│    g(x) ≤ 0                (process constraints)      │
│                                                       │
└───────────────────────────────────────────────────────┘
```

## 10. Equations: Process Modeling

| Model Type | Equation |
|:-----------|:---------|
| Linear regression | $y = \mathbf{X}\boldsymbol{\beta} + \varepsilon$ |
| Quadratic RSM | $y = \beta_0 + \sum_i \beta_i x_i + \sum_i \beta_{ii}x_i^2 + \sum_{i<j}\beta_{ij}x_i x_j$ |
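As a minimal illustration of fitting the second-order RSM model above, the sketch below recovers the quadratic coefficients by ordinary least squares on synthetic two-factor data (numpy only; the data, coefficients, and `design_matrix` helper are illustrative, not from a real process):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" response: y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2
true_beta = np.array([2.0, 1.5, -0.8, 0.5, 0.3, -0.4])

def design_matrix(X):
    """Second-order RSM design matrix for 2 factors: [1, x1, x2, x1^2, x2^2, x1*x2]."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

# Simulated DOE: 50 runs over a coded [-1, 1]^2 region with small measurement noise
X = rng.uniform(-1, 1, size=(50, 2))
y = design_matrix(X) @ true_beta + rng.normal(0, 0.01, size=50)

# Ordinary least-squares estimate of the RSM coefficients beta
beta_hat, *_ = np.linalg.lstsq(design_matrix(X), y, rcond=None)
```

In practice the fitted surface would then feed a constrained optimizer; with this little noise the estimated coefficients land very close to the true ones.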

process performance, quality & reliability

Process performance indices use actual variation including assignable causes.
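A quick numeric sketch of this definition: $P_p$/$P_{pk}$ computed from the overall sample standard deviation (numpy; the data and spec limits are synthetic, and `pp_ppk` is an illustrative helper, not a standard API):

```python
import numpy as np

def pp_ppk(data, lsl, usl):
    """Process performance indices from overall (long-term) variation."""
    mu = data.mean()
    s = data.std(ddof=1)  # overall sample standard deviation, assignable causes included
    pp = (usl - lsl) / (6 * s)
    ppk = min((usl - mu) / (3 * s), (mu - lsl) / (3 * s))
    return pp, ppk

rng = np.random.default_rng(1)
data = rng.normal(loc=10.0, scale=0.5, size=500)  # simulated long-run measurements
pp, ppk = pp_ppk(data, lsl=8.0, usl=12.0)
```

Because the overall standard deviation absorbs between-subgroup shifts, $P_{pk}$ is typically at or below the short-term $C_{pk}$ for the same process.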

process performance, spc

Long-term capability.

process replication, production

Duplicate process at new site.

process simulation flow,simulation

Chain simulators for sequential steps.

process simulation,design

Model how process steps affect device structure and properties.

process stability, manufacturing

Consistency over time.

process variation, design & verification

Process variations arise from manufacturing tolerances affecting transistor parameters.

process window analysis, lithography

Determine usable focus-dose range.

process window index, pwi, process

Quantify robustness of process window.

process window qualification, pwq, lithography

Verify adequate process window.

process window, process

Range where all specifications are met.

process window,exposure-defocus,bossung,depth of focus,dof,exposure latitude,cpk,lithography window,semiconductor process window

# Process Window

## 1. Fundamental

A process window is the region in parameter space where a manufacturing step yields acceptable results. Mathematically, for a response function $y(\mathbf{x})$ depending on parameter vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$:

$$
\text{Process Window} = \{\mathbf{x} : y_{\min} \leq y(\mathbf{x}) \leq y_{\max}\}
$$

## 2. Single-Parameter Statistics

For a single parameter with lower and upper specification limits (LSL, USL):

**Process Capability Indices**

- $C_p$ (Process Capability): Measures window width relative to process variation

$$
C_p = \frac{USL - LSL}{6\sigma}
$$

- $C_{pk}$ (Process Capability Index): Accounts for process centering

$$
C_{pk} = \min\left[\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right]
$$

**Industry Standards**

- $C_p \geq 1.0$: Process variation fits within specifications
- $C_{pk} \geq 1.33$: 4σ capability (standard requirement)
- $C_{pk} \geq 1.67$: 5σ capability (high-reliability applications)
- $C_{pk} \geq 2.0$: 6σ capability (Six Sigma standard)

## 3. Lithography: Exposure-Defocus (E-D) Window

The most critical and mathematically developed process window in semiconductor manufacturing.
### 3.1 Bossung Curve Model

Critical dimension (CD) as a function of exposure dose $E$ and defocus $F$:

$$
CD(E, F) = CD_0 + a_1 E + a_2 F + a_{11} E^2 + a_{22} F^2 + a_{12} EF + \ldots
$$

The process window boundary is defined by:

$$
|CD(E, F) - CD_{\text{target}}| = \Delta CD_{\text{tolerance}}
$$

### 3.2 Key Metrics

- **Exposure Latitude (EL)**: Percentage dose range for acceptable CD

$$
EL = \frac{E_{\max} - E_{\min}}{E_{\text{nominal}}} \times 100\%
$$

- **Depth of Focus (DOF)**: Focus range for acceptable CD (at given EL)

$$
DOF = F_{\max} - F_{\min}
$$

- **Process Window Area**: Total acceptable region

$$
A_{PW} = \iint_{\text{acceptable}} dE \, dF
$$

### 3.3 Rayleigh Equations

Resolution and DOF scale with wavelength $\lambda$ and numerical aperture $NA$:

- Resolution (minimum feature size):

$$
R = k_1 \frac{\lambda}{NA}
$$

- Depth of Focus:

$$
DOF = \pm k_2 \frac{\lambda}{NA^2}
$$

Critical insight: As $k_1$ decreases (smaller features), DOF shrinks as $(k_1)^2$ — process windows collapse rapidly at advanced nodes.

| Technology Node | $k_1$ Factor | Relative DOF |
|-----------------|--------------|--------------|
| 180nm | 0.6 | 1.0 |
| 65nm | 0.4 | 0.44 |
| 14nm | 0.3 | 0.25 |
| 5nm (EUV) | 0.25 | 0.17 |

## 4. Image Quality Metrics

### 4.1 Normalized Image Log-Slope (NILS)

$$
NILS = w \cdot \frac{1}{I} \left|\frac{dI}{dx}\right|_{\text{edge}}
$$

Where:

- $w$ = feature width
- $I$ = aerial image intensity
- $\frac{dI}{dx}$ = intensity gradient at feature edge

For a coherent imaging system with partial coherence $\sigma$:

$$
NILS \approx \pi \cdot \frac{w}{\lambda/NA} \cdot \text{(contrast factor)}
$$

Interpretation:

- Higher NILS → larger process window
- NILS > 2.0: Robust process
- NILS < 1.5: Marginal process window
- NILS < 1.0: Near resolution limit

### 4.2 Mask Error Enhancement Factor (MEEF)

$$
MEEF = \frac{\partial CD_{\text{wafer}}}{\partial CD_{\text{mask}}}
$$

Characteristics:

- MEEF = 1: Ideal (1:1 transfer from mask to wafer)
- MEEF > 1: Mask errors are amplified on wafer
- Near resolution limit: MEEF typically 3–4 or higher
- Impacts effective process window: mask CD tolerance = wafer CD tolerance / MEEF

## 5. Multi-Parameter Process Windows

### 5.1 Ellipsoid Model

For $n$ interacting parameters, the window is often an $n$-dimensional ellipsoid:

$$
(\mathbf{x} - \mathbf{x}_0)^T \mathbf{A} (\mathbf{x} - \mathbf{x}_0) \leq 1
$$

Where:

- $\mathbf{x}$ = parameter vector $(x_1, x_2, \ldots, x_n)$
- $\mathbf{x}_0$ = optimal operating point (center of ellipsoid)
- $\mathbf{A}$ = positive definite matrix encoding parameter correlations

Geometric interpretation:

- Eigenvalues of $\mathbf{A}$: $\lambda_1, \lambda_2, \ldots, \lambda_n$
- Principal axes lengths: $a_i = 1/\sqrt{\lambda_i}$
- Eigenvectors: orientation of principal axes

### 5.2 Overlapping Windows

Real processes require multiple steps to simultaneously work:

$$
PW_{\text{total}} = \bigcap_{i=1}^{N} PW_i
$$

Example: Combined lithography + etch window

$$
PW_{\text{combined}} = PW_{\text{litho}}(E, F) \cap PW_{\text{etch}}(P, W, T)
$$

If individual windows are ellipsoids, their intersection is a more complex polytope — often computed numerically via:

- Linear programming
- Convex hull algorithms
- Monte Carlo sampling

## 6. Response Surface Methodology (RSM)

### 6.1 Quadratic Model

$$
y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \sum_{i=1}^{n} \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j + \varepsilon
$$

Typical etch selectivity targets:

- Selectivity > 3–5 (typical)
- Selectivity > 10 (high aspect ratio features)
- Selectivity > 50 (critical etch stop layers)

## 13. CMP Process Windows

### 13.1 Preston Equation

$$
RR = K_p \cdot P \cdot V
$$

Where:

- $RR$ = removal rate (nm/min or Å/min)
- $K_p$ = Preston coefficient (material/consumable dependent)
- $P$ = applied pressure (psi or kPa)
- $V$ = relative velocity (m/s)

### 13.2 Within-Wafer Non-Uniformity (WIWNU)

$$
WIWNU = \frac{\sigma_{RR}}{\mu_{RR}} \times 100\%
$$

Target: WIWNU < 3–5%

### 13.3 Dishing and Erosion

- **Dishing**: Excess removal at center of wide features

$$
\text{Dishing} = t_{\text{initial}} - t_{\text{center}}
$$

- **Erosion**: Thinning of dielectric between metal lines

$$
\text{Erosion} = t_{\text{field}} - t_{\text{local}}
$$

## 14. Key Equations Summary Table

| Metric | Formula | Significance |
|--------|---------|--------------|
| Resolution | $R = k_1 \frac{\lambda}{NA}$ | Minimum feature size |
| Depth of Focus | $DOF = \pm k_2 \frac{\lambda}{NA^2}$ | Focus tolerance |
| NILS | $NILS = \frac{w}{I} \left\|\frac{dI}{dx}\right\|$ | Image contrast at edge |
| MEEF | $MEEF = \frac{\partial CD_w}{\partial CD_m}$ | Mask error amplification |
| Process Capability | $C_{pk} = \frac{\min(USL-\mu,\ \mu-LSL)}{3\sigma}$ | Process capability |
| Exposure Latitude | $EL = \frac{E_{max} - E_{min}}{E_{nom}} \times 100\%$ | Dose tolerance |
| Stochastic LER | $LER \propto \frac{1}{\sqrt{Dose}}$ | Shot noise floor |
| Yield (Poisson) | $Y = e^{-DA}$ | Defect-limited yield |
| Preston Equation | $RR = K_p P V$ | CMP removal rate |

## 15. Modern Computational Approaches

### 15.1 Monte Carlo Simulation

```
Algorithm: Monte Carlo Yield Estimation
1. Define parameter distributions: x_i ~ N(μ_i, σ_i²)
2. For trial = 1 to N_trials:
   a. Sample x from joint distribution
   b. Evaluate y(x) for all responses
   c. Check if y ∈ [y_min, y_max] for all responses
   d. Record pass/fail
3. Yield = N_pass / N_trials
4. Confidence interval: Y ± z_α √(Y(1-Y)/N)
```

### 15.2 Machine Learning Classification

- **Support Vector Machine (SVM)**: Decision boundary defines process window
- **Neural Networks**: Complex, non-convex window shapes
- **Random Forest**: Ensemble method for robustness
- **Gaussian Process**: Probabilistic boundaries with uncertainty

### 15.3 Digital Twin Approach

$$
\hat{y}_{t+1} = f(y_t, \mathbf{x}_t, \boldsymbol{\theta})
$$

Where:

- $\hat{y}_{t+1}$ = predicted next-step output
- $y_t$ = current measured output
- $\mathbf{x}_t$ = current process parameters
- $\boldsymbol{\theta}$ = model parameters (updated via Bayesian inference)

## 16. Advanced Node Challenges

### 16.1 Process Window Shrinkage

At advanced nodes (sub-7nm), multiple factors compound:

$$
PW_{\text{effective}} = PW_{\text{optical}} \cap PW_{\text{stochastic}} \cap PW_{\text{overlay}} \cap PW_{\text{etch}}
$$

### 16.2 Multi-Patterning Complexity

For N-patterning (e.g., SAQP with N=4):

$$
\sigma_{\text{total}}^2 = \sum_{i=1}^{N} \sigma_{\text{step}_i}^2
$$

Error budget per step:

$$
\sigma_{\text{step}} = \frac{\sigma_{\text{target}}}{\sqrt{N}}
$$

### 16.3 Design-Technology Co-Optimization (DTCO)

$$
\text{Objective: } \max_{\text{design}, \text{process}} \left[ \text{Performance} \times Y(\text{design}, \text{process}) \right]
$$

Subject to:

- Design rules: $DR_i(\text{layout}) \geq 0$
- Process windows: $\mathbf{x} \in PW$
- Reliability: $MTTF \geq \text{target}$
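The Monte Carlo yield-estimation steps above can be sketched as follows (numpy only). The Bossung-style CD response, dose/defocus distributions, and tolerance numbers here are illustrative assumptions, not a real process model:

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials = 100_000

# Step 1: parameter distributions, x_i ~ N(mu_i, sigma_i^2)
# Dose E (mJ/cm^2) and defocus F (nm) -- illustrative values only
E = rng.normal(30.0, 0.5, size=n_trials)
F = rng.normal(0.0, 20.0, size=n_trials)

def cd_response(E, F):
    """Illustrative quadratic (Bossung-like) CD model, target CD = 45 nm."""
    return 45.0 + 0.8 * (E - 30.0) + 2e-3 * F**2

# Steps 2-3: evaluate the response, check specs, yield = N_pass / N_trials
cd = cd_response(E, F)
in_spec = np.abs(cd - 45.0) <= 2.0          # +/- 2 nm CD tolerance
yield_est = in_spec.mean()

# Step 4: normal-approximation confidence interval, Y +/- z * sqrt(Y(1-Y)/N)
z = 1.96                                    # ~95% two-sided
half_width = z * np.sqrt(yield_est * (1 - yield_est) / n_trials)
```

With vectorized sampling the trial loop disappears; the same pattern extends to multiple correlated responses by checking all spec limits per trial.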

process-induced stress, process integration

Process-induced stress from STI spacers and epitaxial layers modulates channel carrier mobility.

process-induced variation, manufacturing

Variation from manufacturing.

process,isolation,fork

Processes have isolated memory. Fork for parallelism. More overhead than threads.
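A minimal Python sketch of the isolation point, assuming a POSIX system (the `fork` start method is not available on Windows): the forked child gets its own copy of memory, so results must come back through explicit IPC, here a `multiprocessing.Value`:

```python
import multiprocessing as mp

ctx = mp.get_context("fork")  # POSIX fork: child starts as a copy of the parent
counter = 0                   # ordinary module-level state: private per process

def child_work(shared):
    global counter
    counter += 100            # mutates only the CHILD's copy of `counter`
    shared.value = counter    # explicit IPC channel back to the parent

shared = ctx.Value("i", 0)    # shared-memory int, visible to both processes
p = ctx.Process(target=child_work, args=(shared,))
p.start()
p.join()

parent_copy, child_result = counter, shared.value
# parent_copy stays 0: the child's write never touched the parent's memory
```

Threads, by contrast, would see the mutated `counter` directly, which is exactly the trade-off the definition describes.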

processing waste, production

Unnecessary process steps.