
AI Factory Glossary

758 technical terms and definitions


particulate contamination control, clean tech

Minimize particle contamination in cleanroom environments to prevent defects on wafers.

partnership,collaborate,partner

We partner with data providers, algorithm teams, compute platforms, and communication experts. Let's build the AI value chain together.

parts count method, business & standards

Parts count method predicts reliability by summing component failure rates.

parts inventory management, operations

Maintain a stock of spare parts so equipment repairs do not stall production.

pass@k, evaluation

Pass@k measures whether at least one correct solution appears among k generated samples.
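
A sketch of the standard unbiased pass@k estimator (n samples per problem, c of them correct), written in the numerically stable product form; the example numbers are illustrative.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # too few failures to fill all k slots: success guaranteed
    # Stable product form of 1 - C(n-c, k) / C(n, k)
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

print(pass_at_k(n=200, c=14, k=10))  # e.g., 200 samples, 14 correct
```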

passage retrieval, rag

Passage retrieval finds relevant text segments (document chunks) rather than whole documents.

passivation,etch

Polymer deposition on feature sidewalls during etch that protects them and maintains anisotropy.

patch dropout, computer vision

Randomly drop image patches during training to regularize vision transformers.

patch embedding, computer vision

Linear projection of image patches into token embeddings for a vision transformer.

patch merging in vit, computer vision

Reduce spatial resolution in hierarchical ViT.

patch merging, computer vision

Combine image patches hierarchically.

patchgan discriminator, generative models

Discriminator that classifies local image patches as real or fake rather than scoring the whole image.

patchify operation, computer vision

Convert an image into a sequence of patches for transformer input.
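
A minimal numpy sketch of the patchify operation for a ViT-style pipeline; the 224×224 image and 16-pixel patch size are illustrative assumptions.

```python
import numpy as np

def patchify(img: np.ndarray, p: int) -> np.ndarray:
    """(H, W, C) image -> (num_patches, p*p*C) sequence of flattened patches."""
    h, w, c = img.shape
    assert h % p == 0 and w % p == 0, "image dims must divide patch size"
    x = img.reshape(h // p, p, w // p, p, c)  # split rows and cols into patches
    x = x.transpose(0, 2, 1, 3, 4)            # group the patch-grid dims together
    return x.reshape(-1, p * p * c)           # flatten each patch to a token

tokens = patchify(np.zeros((224, 224, 3)), p=16)
print(tokens.shape)  # (196, 768): a 14x14 grid of patches, each 16*16*3 long
```

A learned linear projection of these flattened vectors is exactly the patch embedding defined in the entry above.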

patchtst, time series models

Patch Time Series Transformer divides a series into patches, reducing token count and improving long-horizon forecasting efficiency.

patent analysis, legal ai

Analyze patent documents with NLP.

patent classification, legal ai

Categorize patents into classification schemes such as CPC or IPC.

patent drafting assistance,legal ai

Help write patent applications.

patent infringement, legal

Unauthorized use of patented technology.

patent litigation, legal

Legal disputes over patents.

patent portfolio, business

Collection of patents.

patent similarity, legal ai

Find patents similar to a given patent, e.g., for prior-art search.

path delay fault, advanced test & probe

Path delay faults represent cumulative delays along specific logic paths exceeding timing requirements.

path encoding nas, neural architecture search

Path encoding represents architectures through enumeration of computational paths from input to output.

path patching, interpretability

Path patching intervenes on specific computational paths, ablating information flow through them to test their necessity.

pathology image analysis,healthcare ai

Analyze digitized tissue slides (whole-slide images) with computer vision.

patience,iteration,long game

AI progress takes time. Iterate patiently. Compound improvements. Play the long game.

patient risk stratification,healthcare ai

Assess patient risk levels.

pattern fidelity, advanced test & probe

Pattern fidelity in test programs ensures generated stimulus signals accurately represent intended waveforms for device characterization.

pattern generation,content creation

Create repeating patterns and tiles.

pattern placement,overlay,registration,alignment,wafer alignment,die placement,pattern transfer,lithography alignment,overlay error,placement accuracy

# Pattern Placement

## 1. The Core Problem

In semiconductor manufacturing, we must transfer nanoscale patterns from a mask to a silicon wafer with sub-nanometer precision across billions of features. The mathematical challenge is threefold:

- **Forward modeling**: Predicting what pattern will actually print given a mask design
- **Inverse problem**: Determining what mask to use to achieve a desired pattern
- **Optimization under uncertainty**: Ensuring robust manufacturing despite process variations

## 2. Optical Lithography Mathematics

### 2.1 Aerial Image Formation (Hopkins Formulation)

The intensity distribution at the wafer plane is governed by partially coherent imaging theory:

$$ I(x,y) = \iint\!\!\iint TCC(f_1,g_1,f_2,g_2) \cdot M(f_1,g_1) \cdot M^*(f_2,g_2) \cdot e^{2\pi i[(f_1-f_2)x + (g_1-g_2)y]} \, df_1\,dg_1\,df_2\,dg_2 $$

Where:
- $TCC$ (Transmission Cross-Coefficient) encodes the optical system
- $M(f,g)$ is the Fourier transform of the mask transmission function
- The double integral reflects the coherent superposition from different source points

### 2.2 Resolution Limits

The Rayleigh criterion establishes fundamental constraints:

$$ R_{min} = k_1 \cdot \frac{\lambda}{NA} $$

$$ DOF = k_2 \cdot \frac{\lambda}{NA^2} $$

Parameters:

| Parameter | DUV (ArF) | EUV |
|-----------|-----------|-----|
| Wavelength $\lambda$ | 193 nm | 13.5 nm |
| Typical NA | 1.35 | 0.33 (High-NA: 0.55) |
| Min. pitch | ~36 nm | ~24 nm |

The $k_1$ factor (process-dependent, typically 0.25–0.4) is where most of the mathematical innovation occurs.

### 2.3 Image Log-Slope (ILS)

The image log-slope is a critical metric for pattern fidelity:

$$ ILS = \frac{1}{I} \left| \frac{dI}{dx} \right|_{edge} $$

Higher ILS values indicate better edge definition and process margin.

### 2.4 Modulation Transfer Function (MTF)

The optical system's ability to transfer contrast is characterized by:

$$ MTF(f) = \frac{I_{max}(f) - I_{min}(f)}{I_{max}(f) + I_{min}(f)} $$

## 3. Photoresist Modeling

The resist transforms the aerial image into a physical pattern through coupled partial differential equations.

### 3.1 Exposure Kinetics (Dill Model)

Light absorption in resist:

$$ \frac{\partial I}{\partial z} = -\alpha(M) \cdot I $$

Absorption coefficient:

$$ \alpha = A \cdot M + B $$

Photoactive compound decomposition:

$$ \frac{\partial M}{\partial t} = -C \cdot I \cdot M $$

Where:
- $A$ = bleachable absorption coefficient (μm⁻¹)
- $B$ = non-bleachable absorption coefficient (μm⁻¹)
- $C$ = exposure rate constant (cm²/mJ)
- $M$ = relative PAC concentration (0 to 1)

### 3.2 Chemically Amplified Resist (Diffusion-Reaction)

For modern resists, photoacid generation and diffusion govern pattern formation:

$$ \frac{\partial [H^+]}{\partial t} = D\nabla^2[H^+] - k_{quench}[H^+][Q] - k_{react}[H^+][Polymer] $$

Components:
- $D$ = diffusion coefficient of photoacid
- $k_{quench}$ = quencher reaction rate
- $k_{react}$ = deprotection reaction rate
- $[Q]$ = quencher concentration

### 3.3 Development Rate Models

The Mack model relates local chemistry to dissolution:

$$ R(m) = R_{max} \cdot \frac{(a+1)(1-m)^n}{a + (1-m)^n} + R_{min} $$

Where:
- $m$ = normalized inhibitor concentration
- $n$ = development selectivity parameter
- $a$ = threshold parameter
- $R_{max}$, $R_{min}$ = maximum and minimum development rates

### 3.4 Resist Profile Evolution

The resist surface evolves according to:

$$ \frac{\partial z}{\partial t} = -R(m(x,y,z)) \cdot \hat{n} $$

Where $\hat{n}$ is the surface normal vector.
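
A minimal numeric sketch of the Dill exposure model in 3.1, integrating the coupled intensity and PAC equations with explicit stepping; the parameter values and dose are illustrative placeholders, not calibrated resist data.

```python
import numpy as np

# Illustrative Dill parameters (placeholders, not a calibrated resist)
A, B, C = 0.8, 0.05, 0.05   # A, B in 1/um; C in cm^2/mJ
I0 = 10.0                   # incident intensity, mW/cm^2
depth, nz = 1.0, 200        # 1 um resist film, 200 depth cells
t_end, nt = 10.0, 1000      # 10 s exposure, 1000 time steps

z = np.linspace(0, depth, nz)
dz = z[1] - z[0]
dt = t_end / nt
M = np.ones(nz)             # PAC concentration starts at 1 everywhere

for _ in range(nt):
    # dI/dz = -(A*M + B) * I : attenuation down through the film
    alpha = A * M + B
    I = I0 * np.exp(-np.cumsum(alpha) * dz)
    # dM/dt = -C * I * M : bleach the photoactive compound
    M -= dt * C * I * M

print(f"PAC remaining: top={M[0]:.3f}, bottom={M[-1]:.3f}")
```

Bleaching (falling M) makes the film more transparent as exposure proceeds, so the bottom of the resist catches up; full simulators couple this with the diffusion and development models of 3.2–3.4.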
## 4. Pattern Placement and Overlay Mathematics

### 4.1 Overlay Error Decomposition

Total placement error is modeled as a polynomial field:

$$ \delta x(X,Y) = a_0 + a_1 X + a_2 Y + a_3 XY + a_4 X^2 + a_5 Y^2 + \ldots $$

$$ \delta y(X,Y) = b_0 + b_1 X + b_2 Y + b_3 XY + b_4 X^2 + b_5 Y^2 + \ldots $$

Physical interpretation of coefficients:

| Term | Coefficient | Physical Meaning |
|------|-------------|------------------|
| Translation | $a_0, b_0$ | Rigid shift in x, y |
| Magnification | $a_1, b_2$ | Isotropic scaling |
| Rotation | $a_2, -b_1$ | In-plane rotation |
| Asymmetric Mag | $a_1 - b_2$ | Anisotropic scaling |
| Trapezoid | $a_3, b_3$ | Keystone distortion |
| Higher order | $a_4, a_5, \ldots$ | Lens aberrations, wafer distortion |

### 4.2 Edge Placement Error (EPE) Budget

$$ EPE_{total}^2 = EPE_{overlay}^2 + EPE_{CD}^2 + EPE_{LER}^2 + EPE_{stochastic}^2 $$

Error budget at the 3 nm node:
- Total EPE budget: ~1–2 nm
- Each component must be controlled to sub-nanometer precision

### 4.3 Overlay Correction Model

The correction applied to the scanner is:

$$ \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} = \begin{pmatrix} 1 + M_x & R + O_x \\ -R + O_y & 1 + M_y \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix} + \begin{pmatrix} T_x \\ T_y \end{pmatrix} $$

Where:
- $T_x, T_y$ = translation corrections
- $M_x, M_y$ = magnification corrections
- $R$ = rotation correction
- $O_x, O_y$ = orthogonality corrections

### 4.4 Wafer Distortion Modeling

Wafer-level distortion is often modeled using Zernike polynomials:

$$ W(r, \theta) = \sum_{n,m} Z_n^m \cdot R_n^m(r) \cdot \cos(m\theta) $$

## 5. Computational Lithography: The Inverse Problem

### 5.1 Optical Proximity Correction (OPC)

Given target pattern $P_{target}$, find mask $M$ such that:

$$ \min_M \|Litho(M) - P_{target}\|^2 + \lambda \cdot \mathcal{R}(M) $$

Where:
- $Litho(\cdot)$ is the forward lithography model
- $\mathcal{R}(M)$ enforces mask manufacturability constraints
- $\lambda$ is the regularization weight

### 5.2 Gradient-Based Optimization

Using the chain rule through the forward model:

$$ \frac{\partial L}{\partial M} = \frac{\partial L}{\partial I} \cdot \frac{\partial I}{\partial M} $$

The aerial image gradient $\frac{\partial I}{\partial M}$ can be computed efficiently via:

$$ \frac{\partial I}{\partial M}(x,y) = 2 \cdot \text{Re}\left[\iint TCC \cdot \frac{\partial M}{\partial M_{pixel}} \cdot M^* \cdot e^{i\phi} \, df\,dg\right] $$

### 5.3 Inverse Lithography Technology (ILT)

For curvilinear masks, the level-set method parametrizes the mask boundary:

$$ \frac{\partial \phi}{\partial t} + F|\nabla\phi| = 0 $$

Where:
- $\phi$ is the signed distance function
- $F$ is the speed function derived from the cost gradient:

$$ F = -\frac{\partial L}{\partial \phi} $$

### 5.4 Source-Mask Optimization (SMO)

Joint optimization over source shape $S$ and mask $M$:

$$ \min_{S,M} \mathcal{L}(S,M) = \|I(S,M) - I_{target}\|^2 + \alpha \mathcal{R}_S(S) + \beta \mathcal{R}_M(M) $$

Optimization approach:
1. Fix $S$, optimize $M$ (mask optimization)
2. Fix $M$, optimize $S$ (source optimization)
3. Iterate until convergence

### 5.5 Process Window Optimization

Maximize the overlapping process window:

$$ \max_{M} \left[ \min_{(dose, focus) \in PW} \left( CD_{target} - |CD(dose, focus) - CD_{target}| \right) \right] $$
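
As a concrete instance of the overlay decomposition in 4.1, a least-squares sketch fitting the six lowest-order $\delta x$ coefficients from mark measurements; the measurement data here are synthetic placeholders.

```python
import numpy as np

# Placeholder overlay measurements: field positions X, Y (mm) and dx (nm)
rng = np.random.default_rng(0)
X = rng.uniform(-150, 150, 50)   # positions across a 300 mm wafer
Y = rng.uniform(-150, 150, 50)
dx = 1.0 + 0.01 * X - 0.005 * Y + rng.normal(0, 0.3, 50)  # synthetic dx, nm

# Design matrix for dx = a0 + a1*X + a2*Y + a3*XY + a4*X^2 + a5*Y^2
A = np.column_stack([np.ones_like(X), X, Y, X * Y, X**2, Y**2])
coef, *_ = np.linalg.lstsq(A, dx, rcond=None)
resid = dx - A @ coef

print("a0..a5:", np.round(coef, 4))
print("residual 3-sigma (nm):", round(3 * resid.std(), 3))
```

The fitted $a_0, a_1, a_2$ map onto the scanner's translation, magnification, and rotation knobs from 4.3; the residual 3σ is the uncorrectable part that must fit inside the EPE budget of 4.2.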
## 6. Multi-Patterning Mathematics

Below ~40 nm pitch with 193 nm lithography, single exposure cannot resolve features.

### 6.1 Graph Coloring Formulation

Problem: Assign features to masks such that no two features on the same mask violate minimum spacing.

Graph representation:
- Nodes = pattern features
- Edges = spacing conflicts (features too close for single exposure)
- Colors = mask assignments

For double patterning (LELE), this becomes graph 2-coloring.

### 6.2 Integer Linear Programming Formulation

Objective: Minimize stitches (pattern splits)

$$ \min \sum_i c_i \cdot s_i $$

Subject to:

$$ x_i + x_j \geq 1 \quad \forall (i,j) \in \text{Conflicts} $$

$$ x_i \in \{0,1\} $$

### 6.3 Conflict Graph Analysis

The chromatic number $\chi(G)$ determines the minimum number of masks needed:
- $\chi(G) = 2$ → Double patterning feasible
- $\chi(G) = 3$ → Triple patterning required
- $\chi(G) > 3$ → Layout modification needed

Odd cycle detection:

$$ \text{Conflict if } \exists \text{ cycle of odd length in conflict graph} $$

### 6.4 Self-Aligned Patterning (SADP/SAQP)

Spacer-based approaches achieve pitch multiplication:

$$ Pitch_{final} = \frac{Pitch_{mandrel}}{2^n} $$

Where $n$ is the number of spacer iterations.

SADP constraints:
- All lines have the same width (spacer width)
- Only certain topologies are achievable
- Tip-to-tip spacing constraints

## 7. Stochastic Effects (Critical for EUV)

At EUV wavelengths, photon shot noise becomes significant.

### 7.1 Photon Statistics

Photon count follows Poisson statistics:

$$ P(n) = \frac{\lambda^n e^{-\lambda}}{n!} $$

Where:
- $n$ = number of photons
- $\lambda$ = expected photon count

The resulting dose variation:

$$ \frac{\sigma_{dose}}{dose} = \frac{1}{\sqrt{N_{photons}}} $$

### 7.2 Photon Count Estimation

Number of photons per pixel:

$$ N_{photons} = \frac{Dose \cdot A_{pixel}}{E_{photon}} = \frac{Dose \cdot A_{pixel} \cdot \lambda}{hc} $$

For EUV (λ = 13.5 nm):

$$ E_{photon} = \frac{hc}{\lambda} \approx 92 \text{ eV} $$

### 7.3 Stochastic Edge Placement Error

$$ \sigma_{SEPE} \propto \frac{1}{\sqrt{Dose \cdot ILS}} $$

The stochastic EPE relationship:

$$ \sigma_{EPE,stoch} = \frac{\sigma_{dose,local}}{ILS_{resist}} \approx \sqrt{\frac{2}{\pi}} \cdot \frac{1}{ILS \cdot \sqrt{n_{eff}}} $$

Where $n_{eff}$ is the effective number of photons contributing to the edge.

### 7.4 Line Edge Roughness (LER)

Power spectral density of edge roughness:

$$ PSD(f) = \frac{2\sigma^2 \xi}{1 + (2\pi f \xi)^{2\alpha}} $$

Where:
- $\sigma$ = RMS roughness amplitude
- $\xi$ = correlation length
- $\alpha$ = roughness exponent (Hurst parameter)

### 7.5 Defect Probability

The probability of a stochastic failure:

$$ P_{fail} = 1 - \text{erf}\left(\frac{CD/2 - \mu_{edge}}{\sqrt{2}\sigma_{edge}}\right) $$
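
A quick numeric check of the photon statistics in 7.1–7.2, assuming an illustrative 30 mJ/cm² EUV dose and a 10 nm × 10 nm pixel.

```python
import numpy as np

H = 6.626e-34              # Planck constant, J*s
C = 3.0e8                  # speed of light, m/s
WAVELENGTH = 13.5e-9       # EUV wavelength, m

dose = 30e-3 * 1e4         # 30 mJ/cm^2 -> J/m^2
pixel_area = (10e-9) ** 2  # 10 nm x 10 nm pixel, m^2

e_photon = H * C / WAVELENGTH             # ~1.47e-17 J (~92 eV)
n_photons = dose * pixel_area / e_photon  # expected photons per pixel
sigma_rel = 1.0 / np.sqrt(n_photons)      # Poisson noise: sigma/dose

print(f"E_photon = {e_photon / 1.602e-19:.1f} eV")
print(f"N_photons per 100 nm^2 pixel = {n_photons:.0f}")
print(f"relative dose noise = {100 * sigma_rel:.1f}%")
```

At this dose only on the order of 20 incident photons land per nm², which is why shot noise, rather than optics, often dominates the stochastic EPE term in 7.3.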
## 8. Physical Design Placement Optimization

At the design level, cell placement is a large-scale optimization problem.

### 8.1 Quadratic Placement

Minimize the half-perimeter wirelength approximation:

$$ W = \sum_{(i,j) \in E} w_{ij} \left[(x_i - x_j)^2 + (y_i - y_j)^2\right] $$

This yields a sparse linear system:

$$ Qx = b_x, \quad Qy = b_y $$

Where $Q$ is the weighted graph Laplacian:

$$ Q_{ii} = \sum_{j \neq i} w_{ij}, \quad Q_{ij} = -w_{ij} $$

### 8.2 Half-Perimeter Wirelength (HPWL)

For a net with pins at positions $\{(x_i, y_i)\}$:

$$ HPWL = \left(\max_i x_i - \min_i x_i\right) + \left(\max_i y_i - \min_i y_i\right) $$

### 8.3 Density-Aware Placement

To prevent overlap, add density constraints:

$$ \sum_{c \in bin(k)} A_c \leq D_{max} \cdot A_{bin} \quad \forall k $$

Solved via augmented Lagrangian:

$$ \mathcal{L}(x, \lambda) = W(x) + \sum_k \lambda_k \left(\sum_{c \in bin(k)} A_c - D_{max} \cdot A_{bin}\right) $$

### 8.4 Timing-Driven Placement

With timing criticality weights $w_i$:

$$ \min \sum_i w_i \cdot d_i(placement) $$

Delay model (Elmore delay):

$$ \tau_{Elmore} = \sum_{i} R_i \cdot C_{downstream,i} $$

### 8.5 Electromigration-Aware Placement

Current density constraint:

$$ J = \frac{I}{A_{wire}} \leq J_{max} $$

$$ MTTF = A \cdot J^{-n} \cdot e^{\frac{E_a}{kT}} $$

## 9. Process Control Mathematics

### 9.1 Run-to-Run Control

EWMA (Exponentially Weighted Moving Average):

$$ Target_{n+1} = \lambda \cdot Measurement_n + (1-\lambda) \cdot Target_n $$

Where:
- $\lambda$ = smoothing factor (0 < λ ≤ 1)
- Smaller $\lambda$ → more smoothing, slower response
- Larger $\lambda$ → less smoothing, faster response

### 9.2 State-Space Model

Process dynamics:

$$ x_{k+1} = Ax_k + Bu_k + w_k $$

$$ y_k = Cx_k + v_k $$

Where:
- $x_k$ = state vector (e.g., tool drift)
- $u_k$ = control input (recipe adjustments)
- $y_k$ = measurement output
- $w_k, v_k$ = process and measurement noise

### 9.3 Kalman Filter

Prediction step:

$$ \hat{x}_{k|k-1} = A\hat{x}_{k-1|k-1} + Bu_k $$

$$ P_{k|k-1} = AP_{k-1|k-1}A^T + Q $$

Update step:

$$ K_k = P_{k|k-1}C^T(CP_{k|k-1}C^T + R)^{-1} $$

$$ \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k(y_k - C\hat{x}_{k|k-1}) $$

### 9.4 Model Predictive Control (MPC)

Optimize over prediction horizon $N$:

$$ \min_{u_0, \ldots, u_{N-1}} \sum_{k=0}^{N-1} \left[ (y_k - y_{ref})^T Q (y_k - y_{ref}) + u_k^T R u_k \right] $$

Subject to:
- State dynamics
- Input constraints: $u_{min} \leq u_k \leq u_{max}$
- Output constraints: $y_{min} \leq y_k \leq y_{max}$

### 9.5 Virtual Metrology

Predict wafer quality from equipment sensor data:

$$ \hat{y} = f(\mathbf{s}; \theta) = \mathbf{s}^T \mathbf{w} + b $$

For PLS (Partial Least Squares):

$$ \mathbf{X} = \mathbf{T}\mathbf{P}^T + \mathbf{E} $$

$$ \mathbf{y} = \mathbf{T}\mathbf{q} + \mathbf{f} $$
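
A minimal sketch of the EWMA run-to-run controller from 9.1 compensating a slowly drifting tool; the drift rate, noise level, and λ = 0.3 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 0.3           # EWMA smoothing factor
target_cd = 50.0    # desired output (e.g., thickness in nm)
offset_est = 0.0    # EWMA estimate of the tool's offset

history = []
true_offset = 0.0
for run in range(50):
    true_offset += 0.05                      # slow tool drift per run
    recipe = target_cd - offset_est          # compensate estimated offset
    measured = recipe + true_offset + rng.normal(0, 0.2)
    # EWMA update: blend the latest observed error with the old estimate
    offset_est = lam * (measured - recipe) + (1 - lam) * offset_est
    history.append(measured)

print(f"mean of last 10 runs: {np.mean(history[-10:]):.2f} (target {target_cd})")
```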
## 10. Machine Learning Integration

Modern fabs increasingly use ML alongside physics-based models.

### 10.1 Hotspot Detection

Classification problem:

$$ P(hotspot | pattern) = \sigma\left(\mathbf{W}^T \cdot CNN(pattern) + b\right) $$

Where:
- $\sigma$ = sigmoid function
- $CNN$ = convolutional neural network feature extractor

Input representations:
- Rasterized pattern images
- Graph neural networks on layout topology

### 10.2 Accelerated OPC

Neural networks predict corrections:

$$ \Delta_{OPC} = NN(P_{local}, context) $$

Benefits:
- Reduce iterations from ~20 to ~3–5
- Enable curvilinear OPC at practical runtime

### 10.3 Etch Modeling with ML

Hybrid physics-ML approach:

$$ CD_{final} = CD_{resist} + \Delta_{etch}(params) $$

$$ \Delta_{etch} = f_{physics}(params) + NN_{correction}(params, pattern) $$

### 10.4 Physics-Informed Neural Networks (PINNs)

Combine data with physics constraints:

$$ \mathcal{L} = \mathcal{L}_{data} + \lambda \cdot \mathcal{L}_{physics} $$

Physics loss example (diffusion equation):

$$ \mathcal{L}_{physics} = \left\| \frac{\partial u}{\partial t} - D\nabla^2 u \right\|^2 $$

### 10.5 Yield Prediction

Random Forest / Gradient Boosting:

$$ \hat{Y} = \sum_{m=1}^{M} \gamma_m h_m(\mathbf{x}) $$

Where:
- $h_m$ = weak learners (decision trees)
- $\gamma_m$ = weights

## 11. Design-Technology Co-Optimization (DTCO)

At advanced nodes, design and process must be optimized jointly.

### 11.1 Multi-Objective Formulation

$$ \min \left[ f_{performance}(x), f_{power}(x), f_{area}(x), f_{yield}(x) \right] $$

Subject to:
- Design rule constraints: $g_{DR}(x) \leq 0$
- Process capability constraints: $g_{process}(x) \leq 0$
- Reliability constraints: $g_{reliability}(x) \leq 0$

### 11.2 Pareto Optimality

A solution $x^*$ is Pareto optimal if:

$$ \nexists x : f_i(x) \leq f_i(x^*) \; \forall i \text{ and } f_j(x) < f_j(x^*) \text{ for some } j $$

### 11.3 Design Rule Optimization

Minimize total cost:

$$ \min_{DR} \left[ C_{area}(DR) + C_{yield}(DR) + C_{performance}(DR) \right] $$

Trade-off relationships:
- Tighter metal pitch → smaller area, lower yield
- Larger via size → better reliability, larger area
- More routing layers → better routability, higher cost

### 11.4 Standard Cell Optimization

Cell height optimization:

$$ H_{cell} = n \cdot CPP \cdot k $$

Where:
- $CPP$ = contacted poly pitch
- $n$ = number of tracks
- $k$ = scaling factor

### 11.5 Interconnect RC Optimization

Resistance:

$$ R = \rho \cdot \frac{L}{W \cdot H} $$

Capacitance (parallel plate approximation):

$$ C = \epsilon \cdot \frac{A}{d} $$

RC delay:

$$ \tau_{RC} = R \cdot C \propto \frac{\rho \epsilon L^2}{W H d} $$
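
A small sketch of the Pareto-optimality test in 11.2 for minimization objectives; the candidate design points are synthetic placeholders.

```python
import numpy as np

def pareto_front(points: np.ndarray) -> np.ndarray:
    """Return a boolean mask of non-dominated rows (all objectives minimized)."""
    n = len(points)
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        if not keep[i]:
            continue
        # j dominates i if j is <= i everywhere and < i somewhere
        dominated = (np.all(points <= points[i], axis=1)
                     & np.any(points < points[i], axis=1))
        if dominated.any():
            keep[i] = False
    return keep

# Synthetic (power, area, delay) triples for candidate design-rule sets
pts = np.array([[1.0, 2.0, 3.0],
                [0.9, 2.5, 2.8],
                [1.1, 2.1, 3.1],   # dominated by the first point
                [0.8, 3.0, 3.5]])
print(pts[pareto_front(pts)])
```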
## 12. Mathematical Stack

| Level | Mathematics | Key Challenge |
|-------|-------------|---------------|
| Optics | Fourier optics, Maxwell equations | Partially coherent imaging |
| Resist | Diffusion-reaction PDEs | Nonlinear kinetics |
| Pattern Transfer | Etch modeling, surface evolution | Multiphysics coupling |
| Placement | Graph theory, ILP, quadratic programming | NP-hard decomposition |
| Overlay | Polynomial field fitting | Sub-nm registration |
| OPC/ILT | Nonlinear inverse problems | Non-convex optimization |
| Stochastics | Poisson processes, Monte Carlo | Low-photon regimes |
| Control | State-space, Kalman filtering | Real-time adaptation |
| ML | CNNs, GNNs, PINNs | Generalization, interpretability |

### Equations

**Fundamental Lithography**

$$ R_{min} = k_1 \cdot \frac{\lambda}{NA} \quad \text{(Resolution)} $$

$$ DOF = k_2 \cdot \frac{\lambda}{NA^2} \quad \text{(Depth of Focus)} $$

**Edge Placement**

$$ EPE_{total} = \sqrt{EPE_{overlay}^2 + EPE_{CD}^2 + EPE_{LER}^2 + EPE_{stoch}^2} $$

**Stochastic Limits (EUV)**

$$ \sigma_{EPE,stoch} \propto \frac{1}{\sqrt{Dose \cdot ILS}} $$

**OPC Optimization**

$$ \min_M \|Litho(M) - P_{target}\|^2 + \lambda \mathcal{R}(M) $$
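
To make the recapped OPC objective concrete, a toy gradient-descent sketch that substitutes a Gaussian blur for the forward model $Litho(\cdot)$; a real implementation would use the Hopkins imaging model of Section 2, and all sizes and step sizes here are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Target pattern: a 20-pixel-wide line on a 64x64 grid
target = np.zeros((64, 64))
target[:, 22:42] = 1.0

litho = lambda m: gaussian_filter(m, sigma=3.0)  # toy stand-in forward model

mask = np.full_like(target, 0.5)  # start from a neutral gray mask
for _ in range(300):
    error = litho(mask) - target
    grad = litho(error)           # the blur is (approximately) self-adjoint
    mask = np.clip(mask - 1.5 * grad, 0.0, 1.0)  # keep transmission in [0, 1]

print("no OPC  :", np.mean((litho(target) - target) ** 2))
print("with OPC:", np.mean((litho(mask) - target) ** 2))
```

The optimizer learns to bias the mask edges outward to pre-compensate the blur, a toy analogue of the edge biases and assist features production OPC adds.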

pattern recognition yield, yield enhancement

Pattern recognition identifies recurring failure signatures on wafer maps, suggesting specific defect mechanisms.

patterned wafer inspection, metrology

Inspect wafers for defects after patterning steps.

payback period, business & strategy

Payback period measures time to recover initial investment from cash flows.

pbm, pbm, recommendation systems

Position-Based Model separately estimates attractiveness and examination probability for each position.

pbs, pbs, infrastructure

Portable Batch System, a job scheduler for HPC clusters.

pbti modeling, pbti, reliability

Model positive bias temperature instability (PBTI) degradation, primarily affecting NMOS devices with high-k gate dielectrics.

pc algorithm, pc, time series models

PC algorithm learns directed acyclic graphs from data through conditional independence tests.

pc-darts, pc-darts, neural architecture search

Partially Connected DARTS reduces memory costs by searching over subsets of edges rather than the entire architecture graph.

pca,principal component analysis,dimensionality reduction,eigenvalue,eigendecomposition,variance,semiconductor pca,fdc

# Principal Component Analysis (PCA) in Semiconductor Manufacturing: Mathematical Foundations

## 1. Introduction and Motivation

Semiconductor manufacturing is one of the most complex industrial processes, involving hundreds to thousands of process variables across fabrication steps like lithography, etching, chemical vapor deposition (CVD), ion implantation, and chemical mechanical polishing (CMP). A single wafer fab might monitor 2,000–10,000 sensor readings and process parameters simultaneously.

PCA addresses a fundamental challenge: how do you extract meaningful patterns from massively high-dimensional data while separating true process variation from noise?

## 2. The Mathematical Framework of PCA

### 2.1 Problem Setup

Let X be an n × p data matrix where:
• n = number of observations (wafers, lots, or time points)
• p = number of variables (sensor readings, metrology measurements)

In semiconductor contexts, p is often very large (hundreds or thousands), while n might be comparable or even smaller.

### 2.2 Centering and Standardization

Step 1: Center the data

For each variable j, compute the mean:
• x̄ⱼ = (1/n) Σᵢ xᵢⱼ

Create the centered matrix X̃ where:
• x̃ᵢⱼ = xᵢⱼ - x̄ⱼ

Step 2: Standardize (optional but common)

In semiconductor manufacturing, variables have vastly different scales (temperature in °C, pressure in mTorr, RF power in watts, thickness in angstroms). Standardization is typically essential:
• zᵢⱼ = (xᵢⱼ - x̄ⱼ) / sⱼ

where:
• sⱼ = √[(1/(n-1)) Σᵢ(xᵢⱼ - x̄ⱼ)²]

This gives the standardized matrix Z.

### 2.3 The Covariance and Correlation Matrices

The sample covariance matrix of centered data:
• S = (1/(n-1)) X̃ᵀX̃

The correlation matrix (when using standardized data):
• R = (1/(n-1)) ZᵀZ

Both are p × p symmetric positive semi-definite matrices.

## 3. The Eigenvalue Problem: Core of PCA

### 3.1 Eigendecomposition

PCA seeks to find orthogonal directions that maximize variance. This leads to the eigenvalue problem:
• Svₖ = λₖvₖ

Where:
• λₖ = k-th eigenvalue (variance captured by PCₖ)
• vₖ = k-th eigenvector (loadings defining PCₖ)

Properties:
• Eigenvalues are non-negative: λ₁ ≥ λ₂ ≥ ⋯ ≥ λₚ ≥ 0
• Eigenvectors are orthonormal: vᵢᵀvⱼ = δᵢⱼ
• Total variance: Σₖλₖ = trace(S) = Σⱼsⱼ²

### 3.2 Derivation via Variance Maximization

The first principal component is the unit vector w that maximizes the variance of the projected data:
• max_w Var(X̃w) = max_w wᵀSw subject to ‖w‖ = 1

Using Lagrange multipliers:
• L = wᵀSw - λ(wᵀw - 1)

Taking the gradient and setting it to zero:
• ∂L/∂w = 2Sw - 2λw = 0
• Sw = λw

This proves that the variance-maximizing direction is an eigenvector, and the variance along that direction equals the eigenvalue.

### 3.3 Singular Value Decomposition (SVD) Approach

Computationally, PCA is typically performed via SVD of the centered data matrix:
• X̃ = UΣVᵀ

Where:
• U is n × n orthogonal (left singular vectors)
• Σ is n × p diagonal with singular values σ₁ ≥ σ₂ ≥ ⋯
• V is p × p orthogonal (right singular vectors = principal component loadings)

The relationship to eigenvalues:
• λₖ = σₖ² / (n-1)

Why SVD?
• Numerically more stable than directly computing S and its eigendecomposition
• Works even when p > n (common in semiconductor metrology)
• Avoids forming the potentially huge p × p covariance matrix
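
A minimal numpy sketch of the SVD route in 3.3: standardize, decompose, and recover eigenvalues, loadings, and scores; the sensor matrix is a synthetic placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))          # placeholder: 200 wafers x 50 sensors
X[:, :10] += rng.normal(size=(200, 1))  # inject a shared "drift" mode

# Standardize (variables have different units in a real fab)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via SVD: Z = U S V^T
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
eigvals = s**2 / (len(Z) - 1)   # lambda_k = sigma_k^2 / (n - 1)
loadings = Vt.T                 # columns are the eigenvectors v_k
scores = U * s                  # T = U Sigma = Z V

pve = eigvals / eigvals.sum()
print("variance explained by PC1-3:", np.round(pve[:3], 3))
```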
## 4. PCA Components and Interpretation

### 4.1 Loadings (Eigenvectors)

The loadings matrix V = [v₁ | v₂ | ⋯ | vₚ] contains the "recipes" for each principal component:
• PCₖ = v₁ₖ·(variable 1) + v₂ₖ·(variable 2) + ⋯ + vₚₖ·(variable p)

Semiconductor interpretation: If PC₁ has large positive loadings on chamber temperature, chuck temperature, and wall temperature, but small loadings on gas flow rates, then PC₁ represents a "thermal mode" of process variation.

### 4.2 Scores (Projections)

The scores matrix gives each observation's position in the reduced PC space:
• T = X̃V

or equivalently, using SVD: T = UΣ

Each row of T represents a wafer's "coordinates" in the principal component space.

### 4.3 Variance Explained

The proportion of variance explained by the k-th component:
• PVEₖ = λₖ / Σⱼλⱼ

Cumulative variance explained:
• CPVEₖ = Σⱼ₌₁ᵏ PVEⱼ

Example: In a 500-variable semiconductor dataset, you might find:
• PC1: 35% variance (overall thermal drift)
• PC2: 18% variance (pressure/flow mode)
• PC3: 8% variance (RF power variation)
• First 10 PCs: 85% cumulative variance

## 5. Dimensionality Reduction and Reconstruction

### 5.1 Reduced Representation

Keeping only the first q principal components (where q ≪ p):
• Tᵧ = X̃Vᵧ

where Vᵧ is p × q (the first q columns of V). This compresses the data from p dimensions to q dimensions while preserving the most important variation.

### 5.2 Reconstruction

Approximate reconstruction of original data:
• X̂ = TᵧVᵧᵀ + 1·x̄ᵀ

The reconstruction error (residuals):
• E = X̃ - TᵧVᵧᵀ = X̃(I - VᵧVᵧᵀ)

## 6. Statistical Monitoring Using PCA

### 6.1 Hotelling's T² Statistic

Measures how far a new observation is from the center within the PC model:
• T² = Σₖ(tₖ²/λₖ) = tᵀΛᵧ⁻¹t

This is a Mahalanobis distance in the reduced space.

Control limit (under normality assumption):
• T²_α = [q(n²-1) / n(n-q)] × F_α(q, n-q)

Semiconductor use: High T² indicates the wafer is "unusual but explained by the model"—variation is in known directions but extreme in magnitude.

### 6.2 Q-Statistic (Squared Prediction Error)

Measures variation outside the model (in the residual space):
• Q = eᵀe = ‖x̃ - Vᵧt‖² = Σₖ₌ᵧ₊₁ᵖ tₖ²

Approximate control limit (Jackson-Mudholkar):
• Q_α = θ₁ × [c_α√(2θ₂h₀²)/θ₁ + 1 + θ₂h₀(h₀-1)/θ₁²]^(1/h₀)

where θᵢ = Σₖ₌ᵧ₊₁ᵖ λₖⁱ and h₀ = 1 - 2θ₁θ₃/(3θ₂²)

Semiconductor use: High Q indicates a new type of variation not seen in the training data—potentially a novel fault condition.

### 6.3 Combined Monitoring Logic

• T² Normal + Q Normal → Process in control
• T² High + Q Normal → Known variation, extreme magnitude
• T² Normal + Q High → New variation pattern
• T² High + Q High → Severe, possibly mixed fault

## 7. Variable Contribution Analysis

When T² or Q exceeds limits, identify which variables are responsible.

### 7.1 Contributions to T²

For an observation with score vector t:
• Cont_T²(j) = Σₖ(vⱼₖtₖ/√λₖ) × x̃ⱼ

Variables with large contributions are driving the out-of-control signal.

### 7.2 Contributions to Q

• Cont_Q(j) = eⱼ² = (x̃ⱼ - Σₖvⱼₖtₖ)²
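
A sketch of the T² and Q statistics from Section 6 for a new observation; the training data are synthetic, q = 3 retained components is an illustrative choice, and the formal control limits given above (F-distribution, Jackson-Mudholkar) are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 20))            # placeholder training sensors
Z = (train - train.mean(0)) / train.std(0, ddof=1)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
eigvals, V = s**2 / (len(Z) - 1), Vt.T
q = 3                                         # retained components

def t2_and_q(x):
    """T^2 (within-model distance) and Q/SPE (residual) for one observation."""
    t = x @ V[:, :q]                  # scores in the retained subspace
    t2 = np.sum(t**2 / eigvals[:q])   # Mahalanobis distance in PC space
    e = x - V[:, :q] @ t              # residual outside the model
    return t2, e @ e

normal_wafer = rng.normal(size=20)
faulty_wafer = normal_wafer + 4 * V[:, -1]    # inject off-model variation
print("normal:", np.round(t2_and_q(normal_wafer), 2))
print("faulty:", np.round(t2_and_q(faulty_wafer), 2))  # Q should jump
```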
## 8. Semiconductor Manufacturing Applications

### 8.1 Fault Detection and Classification (FDC)

Example setup:
• 800 sensors on a plasma etch chamber
• PCA model built on 2,000 "golden" wafers
• Real-time monitoring: compute T² and Q for each new wafer
• If limits exceeded: alarm, contribution analysis, automated disposition

Typical faults detected:
• RF matching network drift (shows in RF-related loadings)
• Throttle valve degradation (pressure control variables)
• Gas line contamination (specific gas flow signatures)
• Chamber seasoning effects (gradual drift in PC scores)

### 8.2 Virtual Metrology

Use PCA to predict expensive metrology from cheap sensor data:
• Build PCA model on sensor data X
• Relate PC scores to metrology y (e.g., film thickness, CD) via regression:
• ŷ = β₀ + βᵀt

This is Principal Component Regression (PCR).

Advantage: Reduces the p >> n problem; regularizes against overfitting.

### 8.3 Run-to-Run Control

Incorporate PC scores into feedback control loops:
• Recipe adjustment = K·(T_target - T_actual)

where T is the score vector, enabling multivariate feedback control.

## 9. Practical Considerations in Semiconductor Fabs

### 9.1 Choosing the Number of Components (q)

Common methods:
• Scree plot: Look for an "elbow" in the eigenvalue plot
• Cumulative variance: Choose q such that CPVE ≥ threshold (e.g., 90%)
• Cross-validation: Minimize prediction error on held-out data
• Parallel analysis: Compare eigenvalues to those from random data

In semiconductor FDC, typically q = 5–20 for a 500–1000 variable model.

### 9.2 Handling Missing Data

Common in semiconductor metrology (tool downtime, sampling strategies):
• Simple: Impute with the variable mean
• Iterative PCA: Impute, build PCA, predict missing values, iterate
• NIPALS algorithm: Handles missing data natively

### 9.3 Non-Stationarity and Model Updating

Semiconductor processes drift over time (chamber conditioning, consumable wear). Approaches:
• Moving window PCA: Rebuild the model on the most recent n observations
• Recursive PCA: Update the eigendecomposition incrementally
• Adaptive thresholds: Adjust control limits based on recent performance

### 9.4 Nonlinear Extensions

When linear PCA is insufficient:
• Kernel PCA: Map data to a higher-dimensional space via a kernel function
• Neural network autoencoders: Nonlinear compression/reconstruction
• Multiway PCA: For batch processes (unfold the 3D array to 2D)

## 10. Mathematical Example: A Simplified Illustration

Consider a toy example with 3 sensors on an etch chamber:
• Wafer 1: Temp = 100°C | Pressure = 50 mTorr | RF Power = 3.0 kW
• Wafer 2: Temp = 102°C | Pressure = 51 mTorr | RF Power = 3.1 kW
• Wafer 3: Temp = 98°C | Pressure = 49 mTorr | RF Power = 2.9 kW
• Wafer 4: Temp = 105°C | Pressure = 52 mTorr | RF Power = 3.2 kW
• Wafer 5: Temp = 97°C | Pressure = 48 mTorr | RF Power = 2.8 kW

Step 1: Standardize (since units differ)

After standardization, compute the correlation matrix R.

Step 2: Eigendecomposition of R
• R ≈ [1.0, 0.98, 0.99; 0.98, 1.0, 0.97; 0.99, 0.97, 1.0]

Eigenvalues: λ₁ = 2.94, λ₂ = 0.04, λ₃ = 0.02

Step 3: Interpretation
• PC1 captures 98% of variance with loadings ≈ [0.58, 0.57, 0.58]
• This means all three variables move together (correlated drift)
• A single score value summarizes the "overall process state"
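
The toy example in Section 10 can be checked in a few lines; this sketch builds the 5×3 data matrix from the table, standardizes it, and eigendecomposes the correlation matrix.

```python
import numpy as np

# Temp (C), Pressure (mTorr), RF power (kW) for wafers 1-5
X = np.array([[100, 50, 3.0],
              [102, 51, 3.1],
              [ 98, 49, 2.9],
              [105, 52, 3.2],
              [ 97, 48, 2.8]], dtype=float)

Z = (X - X.mean(0)) / X.std(0, ddof=1)  # standardize each sensor
R = Z.T @ Z / (len(Z) - 1)              # correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)    # returned in ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

print("eigenvalues:", np.round(eigvals, 2))         # dominant first eigenvalue
print("PC1 loadings:", np.round(eigvecs[:, 0], 2))  # magnitudes ~0.58, same sign
```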
## 11. Summary

PCA provides the semiconductor industry with a mathematically rigorous framework for:
• Dimensionality reduction: Compress thousands of variables to a manageable number of interpretable components
• Fault detection: Monitor T² and Q statistics against control limits
• Root cause analysis: Contribution plots identify which sensors/variables are responsible for alarms
• Virtual metrology: Predict quality metrics from process data
• Process understanding: Eigenvectors reveal the underlying modes of process variation

The core mathematics—eigendecomposition, variance maximization, and orthogonal projection—remain the same whether you're analyzing 3 variables or 3,000. The elegance of PCA lies in this scalability, making it indispensable for modern semiconductor manufacturing where data volumes continue to grow.

Further Research:
• Advanced PCA Methods: Explore kernel PCA for nonlinear dimensionality reduction, sparse PCA for interpretable loadings, and robust PCA for outlier resistance.
• Multiway PCA: For batch semiconductor processes, multiway PCA unfolds 3D data arrays (wafers × variables × time) into 2D matrices for analysis.
• Dynamic PCA: Incorporates time-lagged variables to capture process dynamics and autocorrelation in time-series sensor data.
• Partial Least Squares (PLS): When the goal is prediction rather than compression, PLS finds latent variables that maximize covariance with the response variable.
• Independent Component Analysis (ICA): Finds statistically independent components rather than uncorrelated components, useful for separating mixed fault signatures.
• Real-Time Implementation: Industrial PCA systems process thousands of variables per wafer in milliseconds, requiring efficient algorithms and hardware acceleration.
• Integration with Machine Learning: Modern fault detection systems combine PCA-based monitoring with neural networks and ensemble methods for improved classification accuracy.

pcgrad, reinforcement learning advanced

PCGrad resolves gradient conflicts in multi-task learning by projecting conflicting gradients to avoid negative transfer.
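
A minimal sketch of the PCGrad projection described above, assuming task gradients as flat numpy vectors and omitting the random task ordering of the full method; when two gradients conflict (negative dot product), each is projected onto the normal plane of the other.

```python
import numpy as np

def pcgrad(grads):
    """Project each task gradient away from the ones it conflicts with."""
    projected = [g.copy() for g in grads]
    for i, g_i in enumerate(projected):
        for j, g_j in enumerate(grads):
            if i == j:
                continue
            dot = g_i @ g_j
            if dot < 0:  # conflict: remove the component of g_i along g_j
                g_i -= dot / (g_j @ g_j) * g_j
    # Combine projected gradients into one update direction
    return np.sum(projected, axis=0)

# Two conflicting task gradients
g1 = np.array([1.0, 2.0])
g2 = np.array([1.0, -1.5])
print(pcgrad([g1, g2]))
```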

pcm (process control monitor),pcm,process control monitor,metrology

Dedicated test wafers or on-wafer sites used to monitor the process.

pcmci plus, pcmci, time series models

PCMCI+ extends PCMCI with improved conditional independence testing for high-dimensional time series.

pcmci, pcmci, time series models

PCMCI combines PC algorithm with momentary conditional independence testing for causal discovery in multivariate time series.

pcpo, pcpo, reinforcement learning advanced

Projection-Based Constrained Policy Optimization improves sample efficiency in safe RL through better constraint handling.

pd-soi (partially depleted soi),pd-soi,partially depleted soi,technology

SOI transistor with a thicker silicon body that is not fully depleted during operation.

pdca cycle, pdca, quality

Plan-Do-Check-Act continuous improvement cycle.