AI Factory Glossary

9,967 technical terms and definitions

lot merging, operations

Combine multiple wafer lots into a single lot for downstream processing or shipment.

lot number, manufacturing operations

Lot numbers identify groups of wafers processed together for traceability.

lot number, traceability

Identifier assigned to a production batch for traceability.

lot sizing, supply chain & logistics

Lot sizing determines optimal production or order quantities balancing setup costs and inventory.

lot splitting, operations

Divide lots for parallel processing.

lot tracking, operations

Monitor the location of wafer lots as they move through the fab.

lot,production

Group of wafers processed together as a batch.

lottery ticket hypothesis, model optimization

Lottery ticket hypothesis posits that dense networks contain sparse subnetworks trainable to full accuracy.

lottery ticket hypothesis,model training

Sparse subnetworks that train from scratch.

lottery ticket,sparse,init

Lottery ticket hypothesis: sparse subnetworks train to full accuracy. Find winning ticket through pruning.

louvain algorithm, graph algorithms

Greedy modularity-optimization method for fast community detection in large graphs.

low energy electron diffraction (leed),low energy electron diffraction,leed,metrology

Diffraction of low-energy electrons from a surface, used to determine surface crystal structure.

low temperature, text generation

Sampling temperature below 1 sharpens the token distribution, making generation more deterministic.

low-angle grain boundary, defects

Small misorientation between grains.

low-k dielectric, process integration

Low-k dielectrics reduce inter-metal capacitance and RC delay by using materials with lower permittivity than silicon dioxide.

low-k dielectric,beol

Insulator with low dielectric constant to reduce capacitance and RC delay.

low-loop vs high-loop, packaging

Wire-bond loop height profiles; low loops enable thinner packages, while higher loops give clearance over the die edge.

low-precision training, optimization

Use FP16 or BF16 for training.

low-profile qfp, lqfp, packaging

Quad flat package variant with reduced body thickness.

low-rank factorization, model optimization

Low-rank factorization decomposes weight matrices into products of smaller matrices.

low-rank tensor fusion, multimodal ai

Multimodal fusion that approximates full outer-product tensor fusion with low-rank factors to reduce computation.

low-resource translation, nlp

Machine translation for language pairs with little parallel training data.

low-temperature bake, packaging

Bake at reduced temperature to remove absorbed moisture without stressing components.

lower control limit, lcl, spc

Lower boundary for normal variation.

lower specification limit, lsl, spc

Minimum acceptable value.

lowercasing, nlp

Convert to lowercase.

lp norm constraints, ai safety

Bound perturbations in Lp norm.

lpcnet, audio & speech

LPCNet combines linear prediction with neural vocoding for efficient high-quality speech generation.

lpcvd (low-pressure cvd),lpcvd,low-pressure cvd,cvd

CVD at reduced pressure for better uniformity and step coverage.

lpips, lpips, evaluation

Learned Perceptual Image Patch Similarity; a deep-feature distance metric that correlates with human perceptual judgments.

lru cache (least recently used),lru cache,least recently used,optimization

Evict least recently used items when full.

lru cache, lru, llm optimization

Least Recently Used cache evicts the least recently accessed entries, retaining frequently used items.

lsh, lsh, rag

Locality Sensitive Hashing creates hash functions preserving similarity for approximate search.

lstm anomaly, lstm, time series models

LSTM-based anomaly detection flags time steps with high prediction error or unusual hidden states.

lstm-vae anomaly, lstm-vae, time series models

LSTM-VAE combines variational autoencoders with LSTM networks to detect anomalies in sequential data through reconstruction probability thresholds.

lstnet, time series models

Long- and Short-term Time-series Network combines CNNs and RNNs with skip connections for multivariate forecasting.

lsuv, lsuv, optimization

Initialize to unit variance per layer.

ltpd, ltpd, quality & reliability

Lot Tolerance Percent Defective is the worst quality level acceptable to the consumer.

lvcnet, audio & speech

Light-weight and Variable-Context Network for efficient high-quality neural vocoding on mobile devices.

lvi, lvi, failure analysis advanced

Laser Voltage Imaging spatially maps voltage distributions across die surface revealing shorts or voltage drops.

lvs (layout versus schematic),lvs,layout versus schematic,design

Verify layout matches circuit schematic.

lyapunov functions rl, reinforcement learning advanced

Lyapunov functions provide certificates of stability and safety for learned control policies.

mac efficiency, mac, model optimization

Multiply-accumulate efficiency quantifies utilization of hardware MAC units during inference.

maccs keys, maccs, chemistry ai

Predefined structural key fingerprints (commonly the 166-key public set) used as molecular descriptors.

mace, mace, chemistry ai

Equivariant message-passing neural network potential for high-accuracy molecular modeling.

machine capability, spc

Inherent short-term capability of the equipment itself, excluding other sources of variation.

machine learning applications, ML semiconductor, AI semiconductor manufacturing, virtual metrology, deep learning fab, neural network semiconductor, predictive maintenance fab, yield prediction ML, defect detection AI, process optimization ML

# Semiconductor Manufacturing Process: Machine Learning Applications & Mathematical Modeling

A comprehensive exploration of the intersection of advanced mathematics, statistical learning, and semiconductor physics.

## 1. The Problem Landscape

Semiconductor manufacturing is arguably the most complex manufacturing process ever devised:

- **500+ sequential process steps** for advanced chips
- **Thousands of control parameters** per tool
- **Sub-nanometer precision** requirements (modern nodes at 3nm, moving to 2nm)
- **Billions of transistors** per chip
- **Yield sensitivity** — a single defect can destroy a \$10,000+ chip

This creates an ideal environment for ML:

- High dimensionality
- Massive data generation
- Complex nonlinear physics
- Enormous economic stakes

### Key Manufacturing Stages

1. **Front-end processing (wafer fabrication)**
   - Photolithography
   - Etching (wet and dry)
   - Deposition (CVD, PVD, ALD)
   - Ion implantation
   - Chemical mechanical planarization (CMP)
   - Oxidation
   - Metallization
2. **Back-end processing**
   - Wafer testing
   - Dicing
   - Packaging
   - Final testing

## 2. Core Mathematical Frameworks

### 2.1 Virtual Metrology (VM)

**Problem**: Physical metrology is slow and expensive. Predict metrology outcomes from in-situ sensor data.

**Mathematical formulation**: Given process sensor data $\mathbf{X} \in \mathbb{R}^{n \times p}$ and sparse metrology measurements $\mathbf{y} \in \mathbb{R}^n$, learn:

$$
\hat{y} = f(\mathbf{x}; \theta)
$$

**Key approaches**:

| Method | Mathematical Form | Strengths |
|--------|-------------------|-----------|
| Partial Least Squares (PLS) | Maximize $\text{Cov}(\mathbf{Xw}, \mathbf{Yc})$ | Handles multicollinearity |
| Gaussian Process Regression | $f(x) \sim \mathcal{GP}(m(x), k(x,x'))$ | Uncertainty quantification |
| Neural Networks | Compositional nonlinear mappings | Captures complex interactions |
| Ensemble Methods | Aggregation of weak learners | Robustness |

**Critical mathematical consideration — Regularization**:

$$
L(\theta) = \|\mathbf{y} - f(\mathbf{X};\theta)\|^2 + \lambda_1\|\theta\|_1 + \lambda_2\|\theta\|_2^2
$$

The **elastic net penalty** is essential because semiconductor data has:

- High collinearity among sensors
- Far more features than samples for new processes
- Need for interpretable sparse solutions

### 2.2 Fault Detection and Classification (FDC)

**Mathematical framework for detection**: Define normal operating region $\Omega$ from training data. For new observation $\mathbf{x}$, compute:

$$
d(\mathbf{x}, \Omega) = \text{anomaly score}
$$

#### PCA-based Approach (Industry Workhorse)

Project data onto principal components. Compute:

- **$T^2$ statistic** (variation within model):
  $$
  T^2 = \sum_{i=1}^{k} \frac{t_i^2}{\lambda_i}
  $$
- **$Q$ statistic / SPE** (variation outside model):
  $$
  Q = \|\mathbf{x} - \hat{\mathbf{x}}\|^2 = \|(I - PP^T)\mathbf{x}\|^2
  $$

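To make the $T^2$/$Q$ monitoring above concrete, here is a minimal Python sketch using NumPy and scikit-learn. Everything specific in it is an assumption for illustration: each run is assumed to already be summarized as a fixed-length feature vector, the data is synthetic, the number of retained components `k` is arbitrary, and the 99th-percentile control limits stand in for the usual theoretical F and chi-squared approximations.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative only: rows = wafer runs, columns = summarized sensor features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 40))          # "normal" historical runs
X_new = rng.normal(size=(10, 40)) + 0.5       # new runs to monitor

scaler = StandardScaler().fit(X_train)
Z_train = scaler.transform(X_train)

k = 5                                         # retained principal components (assumed)
pca = PCA(n_components=k).fit(Z_train)

def t2_and_spe(X):
    """Hotelling T^2 (variation inside the PCA model) and Q/SPE (outside it)."""
    Z = scaler.transform(X)
    T = pca.transform(Z)                      # scores t_i
    t2 = np.sum(T**2 / pca.explained_variance_, axis=1)
    Z_hat = pca.inverse_transform(T)          # reconstruction of Z from the model
    spe = np.sum((Z - Z_hat) ** 2, axis=1)
    return t2, spe

# Empirical control limits from training data (a simple percentile stand-in).
t2_train, spe_train = t2_and_spe(X_train)
t2_limit = np.percentile(t2_train, 99)
spe_limit = np.percentile(spe_train, 99)

t2_new, spe_new = t2_and_spe(X_new)
flags = (t2_new > t2_limit) | (spe_new > spe_limit)
print("flagged runs:", np.where(flags)[0])
```

A run is flagged if either statistic exceeds its limit: $T^2$ catches unusual combinations of the modeled variation, while SPE catches behavior the PCA model cannot represent at all.
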
#### Deep Learning Extensions

- **Autoencoders**: Reconstruction error as anomaly score
- **Variational Autoencoders**: Probabilistic anomaly detection via ELBO
- **One-class Neural Networks**: Learn decision boundary around normal data

#### Fault Classification

Given fault signatures, this becomes multi-class classification. The mathematical challenge is **class imbalance** — faults are rare.

**Solutions**:

- SMOTE and variants for synthetic oversampling
- Cost-sensitive learning
- **Focal loss**:
  $$
  FL(p) = -\alpha(1-p)^\gamma \log(p)
  $$

### 2.3 Run-to-Run (R2R) Process Control

**The control problem**: Processes drift due to chamber conditioning, consumable wear, and environmental variation. Adjust recipe parameters between wafer runs to maintain targets.

#### EWMA Controller (Simplest Form)

$$
u_{k+1} = u_k + \lambda \cdot G^{-1}(y_{\text{target}} - y_k)
$$

where $G$ is the process gain matrix $\left(\frac{\partial y}{\partial u}\right)$.

#### Model Predictive Control Formulation

$$
\min_{u_k} J = (y_{\text{target}} - \hat{y}_k)^T Q (y_{\text{target}} - \hat{y}_k) + \Delta u_k^T R \, \Delta u_k
$$

**Subject to**:

- Process model: $\hat{y} = f(u, \text{state})$
- Constraints: $u_{\min} \leq u \leq u_{\max}$

#### Adaptive/Learning R2R

The process model drifts. Use recursive estimation:

$$
\hat{\theta}_{k+1} = \hat{\theta}_k + K_k(y_k - \hat{y}_k)
$$

where $K$ is the **Kalman gain**, or use online gradient descent for neural network models.

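The EWMA update at the start of this section is easy to see in simulation. The sketch below applies $u_{k+1} = u_k + \lambda \, G^{-1}(y_{\text{target}} - y_k)$ to a single-input, single-output process; the gain `G`, drift rate, noise level, and number of runs are all invented for illustration rather than taken from a real tool.

```python
import numpy as np

# Illustrative single-input/single-output run-to-run loop.
rng = np.random.default_rng(1)

G = 2.0            # assumed process gain dy/du
lam = 0.3          # EWMA smoothing / controller gain
y_target = 100.0
u = 50.0           # initial recipe setting
drift = 0.0

for k in range(20):
    drift += 0.4                                   # slow chamber drift per run
    y = G * u + drift + rng.normal(scale=0.5)      # measured result for run k
    error = y_target - y
    u = u + lam * error / G                        # EWMA-style recipe adjustment
    print(f"run {k:2d}: y = {y:7.2f}, next u = {u:6.2f}")
```

Because only a fraction `lam` of each error is corrected, the controller filters measurement noise while still tracking the slow drift term.
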
### 2.4 Yield Modeling and Optimization

#### Classical Defect-Limited Yield

**Poisson model**:

$$
Y = e^{-AD}
$$

where $A$ = chip area, $D$ = defect density.

**Negative binomial** (accounts for clustering):

$$
Y = \left(1 + \frac{AD}{\alpha}\right)^{-\alpha}
$$

#### ML-based Yield Prediction

The yield is a complex function of hundreds of process parameters across all steps. This is a high-dimensional regression problem with:

- Interactions between distant process steps
- Nonlinear effects
- Spatial patterns on wafer

**Gradient boosted trees** (XGBoost, LightGBM) excel here due to:

- Automatic feature selection
- Interaction detection
- Robustness to outliers

#### Spatial Yield Modeling

Uses Gaussian processes with spatial kernels:

$$
k(x_i, x_j) = \sigma^2 \exp\left(-\frac{\|x_i - x_j\|^2}{2\ell^2}\right)
$$

to capture systematic wafer-level patterns.

## 3. Physics-Informed Machine Learning

### 3.1 The Hybrid Paradigm

Pure data-driven models struggle with:

- Extrapolation beyond training distribution
- Limited data for new processes
- Physical implausibility of predictions

#### Physics-Informed Neural Networks (PINNs)

$$
L = L_{\text{data}} + \lambda_{\text{physics}} L_{\text{physics}}
$$

where $L_{\text{physics}}$ enforces physical laws.

**Examples in semiconductor context**:

| Process | Governing Physics | PDE Constraint |
|---------|-------------------|----------------|
| Thermal processing | Heat equation | $\frac{\partial T}{\partial t} = \alpha \nabla^2 T$ |
| Diffusion/implant | Fick's law | $\frac{\partial C}{\partial t} = D \nabla^2 C$ |
| Plasma etch | Boltzmann + fluid | Complex coupled system |
| CMP | Preston equation | $\frac{dh}{dt} = k_p \cdot P \cdot V$ |

### 3.2 Computational Lithography

#### The Forward Problem

Mask pattern $M(\mathbf{r})$ → Optical system $H(\mathbf{k})$ → Aerial image → Resist chemistry → Final pattern

$$
I(\mathbf{r}) = \left|\mathcal{F}^{-1}\{H(\mathbf{k}) \cdot \mathcal{F}\{M(\mathbf{r})\}\}\right|^2
$$

#### Inverse Lithography / OPC

Given target pattern, find mask that produces it. This is a **non-convex optimization**:

$$
\min_M \|P_{\text{target}} - P(M)\|^2 + R(M)
$$

#### ML Acceleration

- **CNNs** learn the forward mapping (1000× faster than rigorous simulation)
- **GANs** for mask synthesis
- **Differentiable lithography simulators** for end-to-end optimization

## 4. Time Series and Sequence Modeling

### 4.1 Equipment Health Monitoring

#### Remaining Useful Life (RUL) Prediction

Model equipment degradation as a stochastic process:

$$
S(t) = S_0 + \int_0^t g(S(\tau), u(\tau)) \, d\tau + \sigma W(t)
$$

#### Deep Learning Approaches

- **LSTM/GRU**: Capture long-range temporal dependencies in sensor streams
- **Temporal Convolutional Networks**: Dilated convolutions for efficient long sequences
- **Transformers**: Attention over maintenance history and operating conditions

### 4.2 Trace Data Analysis

Each wafer run produces high-frequency sensor traces (temperature, pressure, RF power, etc.).

#### Feature Extraction Approaches

- Statistical moments (mean, variance, skewness)
- Frequency domain (FFT coefficients)
- Wavelet decomposition
- Learned features via 1D CNNs or autoencoders

#### Dynamic Time Warping (DTW)

For trace comparison:

$$
DTW(X, Y) = \min_{\pi} \sum_{(i,j) \in \pi} d(x_i, y_j)
$$

## 5. Bayesian Optimization for Process Development

### 5.1 The Experimental Challenge

New process development requires finding optimal recipe settings with minimal experiments (each wafer costs \$1000+, time is critical).

#### Bayesian Optimization Framework

1. Fit Gaussian Process surrogate to observations
2. Compute acquisition function
3. Query next point: $x_{\text{next}} = \arg\max_x \alpha(x)$
4. Repeat

#### Acquisition Functions

- **Expected Improvement**:
  $$
  EI(x) = \mathbb{E}[\max(f(x) - f^*, 0)]
  $$
- **Knowledge Gradient**: Value of information from observing at $x$
- **Upper Confidence Bound**:
  $$
  UCB(x) = \mu(x) + \kappa\sigma(x)
  $$

### 5.2 High-Dimensional Extensions

Standard BO struggles beyond ~20 dimensions. Semiconductor recipes have 50-200 parameters.

**Solutions**:

- **Random embeddings** (REMBO)
- **Additive structure**: $f(\mathbf{x}) = \sum_i f_i(x_i)$
- **Trust region methods** (TuRBO)
- **Neural network surrogates**

## 6. Causal Inference for Root Cause Analysis

### 6.1 The Problem

**Correlation ≠ Causation**. When yield drops, engineers need to find the *cause*, not just correlated variables.

#### Granger Causality (Time Series)

$X$ Granger-causes $Y$ if past $X$ improves prediction of $Y$ beyond past $Y$ alone:

$$
\sigma^2(Y_t \mid Y_{<t}, X_{<t}) < \sigma^2(Y_t \mid Y_{<t})
$$

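A bare-bones version of this comparison can be written as two least-squares fits: predict $Y$ from its own lags (restricted), then from lags of both $Y$ and $X$ (unrestricted), and compare residual variances. In the sketch below, the lag order `p`, the simulated series, and the direct variance comparison (rather than a formal F-test) are simplifications chosen for illustration.

```python
import numpy as np

def lagged_design(series_list, p):
    """Stack lags 1..p of each series into a regression design matrix."""
    n = len(series_list[0])
    cols = [np.ones(n - p)]                      # intercept
    for s in series_list:
        for lag in range(1, p + 1):
            cols.append(s[p - lag:n - lag])
    return np.column_stack(cols)

def residual_variance(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid.var()

# Illustrative data: x drives y with a one-step delay.
rng = np.random.default_rng(2)
n, p = 500, 2
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal(scale=0.1)

target = y[p:]
var_restricted = residual_variance(lagged_design([y], p), target)
var_unrestricted = residual_variance(lagged_design([y, x], p), target)

print(f"sigma^2(Y | past Y)         = {var_restricted:.4f}")
print(f"sigma^2(Y | past Y, past X) = {var_unrestricted:.4f}")
print("X Granger-causes Y?", var_unrestricted < var_restricted)
```

In practice the variance reduction would be judged with an F-test over the added lag coefficients, but the restricted-versus-unrestricted structure is the same.
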

machine learning for fab,production

Apply ML to optimize recipes, predict defects, or improve yield.

machine learning force fields, chemistry ai

Learn forces from quantum calculations.