AI Factory Glossary

3,145 technical terms and definitions


lru cache, lru, llm optimization

A Least Recently Used (LRU) cache evicts the entry that has gone longest without being accessed, so recently used items stay available.
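
A minimal sketch of the eviction rule, assuming a small in-memory cache built on Python's `collections.OrderedDict`. The class name `LRUCache` and its `get`/`put` methods are illustrative choices, not part of any particular LLM serving stack.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest access first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used key
cache.put("c", 3)  # capacity exceeded: "b" is evicted
```

For memoizing pure functions, the standard library's `functools.lru_cache` decorator applies the same policy without a hand-written class.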

lstm anomaly, lstm, time series models

LSTM-based anomaly detection flags time steps with high prediction error or unusual hidden states.

lstm-vae anomaly, lstm-vae, time series models

LSTM-VAE combines variational autoencoders with LSTM networks to detect anomalies in sequential data through reconstruction probability thresholds.

lstnet, time series models

Long- and Short-term Time-series Network combines CNNs and RNNs with skip connections for multivariate forecasting.

lvi, lvi, failure analysis advanced

Laser Voltage Imaging spatially maps voltage distributions across the die surface, revealing shorts or voltage drops.

mac efficiency, mac, model optimization

Multiply-accumulate efficiency quantifies utilization of hardware MAC units during inference.

maccs keys, maccs, chemistry ai

Structural key descriptors.

mace, mace, chemistry ai

High-accuracy molecular modeling.

machine learning applications, ML semiconductor, AI semiconductor manufacturing, virtual metrology, deep learning fab, neural network semiconductor, predictive maintenance fab, yield prediction ML, defect detection AI, process optimization ML

# Semiconductor Manufacturing Process: Machine Learning Applications & Mathematical Modeling

A comprehensive exploration of the intersection of advanced mathematics, statistical learning, and semiconductor physics.

## 1. The Problem Landscape

Semiconductor manufacturing is arguably the most complex manufacturing process ever devised:

- **500+ sequential process steps** for advanced chips
- **Thousands of control parameters** per tool
- **Sub-nanometer precision** requirements (modern nodes at 3nm, moving to 2nm)
- **Billions of transistors** per chip
- **Yield sensitivity**: a single defect can destroy a \$10,000+ chip

This creates an ideal environment for ML:

- High dimensionality
- Massive data generation
- Complex nonlinear physics
- Enormous economic stakes

### Key Manufacturing Stages

1. **Front-end processing (wafer fabrication)**
   - Photolithography
   - Etching (wet and dry)
   - Deposition (CVD, PVD, ALD)
   - Ion implantation
   - Chemical mechanical planarization (CMP)
   - Oxidation
   - Metallization
2. **Back-end processing**
   - Wafer testing
   - Dicing
   - Packaging
   - Final testing

## 2. Core Mathematical Frameworks

### 2.1 Virtual Metrology (VM)

**Problem**: Physical metrology is slow and expensive. Predict metrology outcomes from in-situ sensor data.

**Mathematical formulation**: Given process sensor data $\mathbf{X} \in \mathbb{R}^{n \times p}$ and sparse metrology measurements $\mathbf{y} \in \mathbb{R}^n$, learn:

$$ \hat{y} = f(\mathbf{x}; \theta) $$

**Key approaches**:

| Method | Mathematical Form | Strengths |
|--------|-------------------|-----------|
| Partial Least Squares (PLS) | Maximize $\text{Cov}(\mathbf{Xw}, \mathbf{Yc})$ | Handles multicollinearity |
| Gaussian Process Regression | $f(x) \sim \mathcal{GP}(m(x), k(x,x'))$ | Uncertainty quantification |
| Neural Networks | Compositional nonlinear mappings | Captures complex interactions |
| Ensemble Methods | Aggregation of weak learners | Robustness |

**Critical mathematical consideration: regularization**

$$ L(\theta) = \|\mathbf{y} - f(\mathbf{X};\theta)\|^2 + \lambda_1\|\theta\|_1 + \lambda_2\|\theta\|_2^2 $$

The **elastic net penalty** is essential because semiconductor data has:

- High collinearity among sensors
- Far more features than samples for new processes
- Need for interpretable sparse solutions

### 2.2 Fault Detection and Classification (FDC)

**Mathematical framework for detection**: Define normal operating region $\Omega$ from training data. For new observation $\mathbf{x}$, compute:

$$ d(\mathbf{x}, \Omega) = \text{anomaly score} $$

#### PCA-based Approach (Industry Workhorse)

Project data onto principal components. Compute:

- **$T^2$ statistic** (variation within model):

  $$ T^2 = \sum_{i=1}^{k} \frac{t_i^2}{\lambda_i} $$

- **$Q$ statistic / SPE** (variation outside model):

  $$ Q = \|\mathbf{x} - \hat{\mathbf{x}}\|^2 = \|(I - PP^T)\mathbf{x}\|^2 $$

#### Deep Learning Extensions

- **Autoencoders**: Reconstruction error as anomaly score
- **Variational Autoencoders**: Probabilistic anomaly detection via ELBO
- **One-class Neural Networks**: Learn decision boundary around normal data

#### Fault Classification

Given fault signatures, this becomes multi-class classification.
The mathematical challenge is **class imbalance**: faults are rare.

**Solutions**:

- SMOTE and variants for synthetic oversampling
- Cost-sensitive learning
- **Focal loss**:

  $$ FL(p) = -\alpha(1-p)^\gamma \log(p) $$

### 2.3 Run-to-Run (R2R) Process Control

**The control problem**: Processes drift due to chamber conditioning, consumable wear, and environmental variation. Adjust recipe parameters between wafer runs to maintain targets.

#### EWMA Controller (Simplest Form)

$$ u_{k+1} = u_k + \lambda \cdot G^{-1}(y_{\text{target}} - y_k) $$

where $G$ is the process gain matrix $\left(\frac{\partial y}{\partial u}\right)$.

#### Model Predictive Control Formulation

$$ \min_{u_k} J = (y_{\text{target}} - \hat{y}_k)^T Q (y_{\text{target}} - \hat{y}_k) + \Delta u_k^T R \, \Delta u_k $$

**Subject to**:

- Process model: $\hat{y} = f(u, \text{state})$
- Constraints: $u_{\min} \leq u \leq u_{\max}$

#### Adaptive/Learning R2R

The process model drifts. Use recursive estimation:

$$ \hat{\theta}_{k+1} = \hat{\theta}_k + K_k(y_k - \hat{y}_k) $$

where $K_k$ is the **Kalman gain**, or use online gradient descent for neural network models.

### 2.4 Yield Modeling and Optimization

#### Classical Defect-Limited Yield

**Poisson model**:

$$ Y = e^{-AD} $$

where $A$ = chip area, $D$ = defect density.

**Negative binomial** (accounts for clustering):

$$ Y = \left(1 + \frac{AD}{\alpha}\right)^{-\alpha} $$

#### ML-based Yield Prediction

The yield is a complex function of hundreds of process parameters across all steps.
This is a high-dimensional regression problem with:

- Interactions between distant process steps
- Nonlinear effects
- Spatial patterns on wafer

**Gradient boosted trees** (XGBoost, LightGBM) excel here due to:

- Automatic feature selection
- Interaction detection
- Robustness to outliers

#### Spatial Yield Modeling

Uses Gaussian processes with spatial kernels:

$$ k(x_i, x_j) = \sigma^2 \exp\left(-\frac{\|x_i - x_j\|^2}{2\ell^2}\right) $$

to capture systematic wafer-level patterns.

## 3. Physics-Informed Machine Learning

### 3.1 The Hybrid Paradigm

Pure data-driven models struggle with:

- Extrapolation beyond training distribution
- Limited data for new processes
- Physical implausibility of predictions

#### Physics-Informed Neural Networks (PINNs)

$$ L = L_{\text{data}} + \lambda_{\text{physics}} L_{\text{physics}} $$

where $L_{\text{physics}}$ enforces physical laws.

**Examples in semiconductor context**:

| Process | Governing Physics | PDE Constraint |
|---------|-------------------|----------------|
| Thermal processing | Heat equation | $\frac{\partial T}{\partial t} = \alpha \nabla^2 T$ |
| Diffusion/implant | Fick's law | $\frac{\partial C}{\partial t} = D \nabla^2 C$ |
| Plasma etch | Boltzmann + fluid | Complex coupled system |
| CMP | Preston equation | $\frac{dh}{dt} = k_p \cdot P \cdot V$ |

### 3.2 Computational Lithography

#### The Forward Problem

Mask pattern $M(\mathbf{r})$ → Optical system $H(\mathbf{k})$ → Aerial image → Resist chemistry → Final pattern

$$ I(\mathbf{r}) = \left|\mathcal{F}^{-1}\{H(\mathbf{k}) \cdot \mathcal{F}\{M(\mathbf{r})\}\}\right|^2 $$

#### Inverse Lithography / OPC

Given target pattern, find mask that produces it. This is a **non-convex optimization**:

$$ \min_M \|P_{\text{target}} - P(M)\|^2 + R(M) $$

#### ML Acceleration

- **CNNs** learn the forward mapping (1000× faster than rigorous simulation)
- **GANs** for mask synthesis
- **Differentiable lithography simulators** for end-to-end optimization

## 4. Time Series and Sequence Modeling

### 4.1 Equipment Health Monitoring

#### Remaining Useful Life (RUL) Prediction

Model equipment degradation as a stochastic process:

$$ S(t) = S_0 + \int_0^t g(S(\tau), u(\tau)) \, d\tau + \sigma W(t) $$

#### Deep Learning Approaches

- **LSTM/GRU**: Capture long-range temporal dependencies in sensor streams
- **Temporal Convolutional Networks**: Dilated convolutions for efficient long sequences
- **Transformers**: Attention over maintenance history and operating conditions

### 4.2 Trace Data Analysis

Each wafer run produces high-frequency sensor traces (temperature, pressure, RF power, etc.).

#### Feature Extraction Approaches

- Statistical moments (mean, variance, skewness)
- Frequency domain (FFT coefficients)
- Wavelet decomposition
- Learned features via 1D CNNs or autoencoders

#### Dynamic Time Warping (DTW)

For trace comparison:

$$ DTW(X, Y) = \min_{\pi} \sum_{(i,j) \in \pi} d(x_i, y_j) $$

## 5. Bayesian Optimization for Process Development

### 5.1 The Experimental Challenge

New process development requires finding optimal recipe settings with minimal experiments (each wafer costs \$1000+, time is critical).

#### Bayesian Optimization Framework

1. Fit Gaussian Process surrogate to observations
2. Compute acquisition function
3. Query next point: $x_{\text{next}} = \arg\max_x \alpha(x)$
4. Repeat

#### Acquisition Functions

- **Expected Improvement**:

  $$ EI(x) = \mathbb{E}[\max(f(x) - f^*, 0)] $$

- **Knowledge Gradient**: Value of information from observing at $x$
- **Upper Confidence Bound**:

  $$ UCB(x) = \mu(x) + \kappa\sigma(x) $$

### 5.2 High-Dimensional Extensions

Standard BO struggles beyond ~20 dimensions. Semiconductor recipes have 50-200 parameters.

**Solutions**:

- **Random embeddings** (REMBO)
- **Additive structure**: $f(\mathbf{x}) = \sum_i f_i(x_i)$
- **Trust region methods** (TuRBO)
- **Neural network surrogates**

## 6. Causal Inference for Root Cause Analysis

### 6.1 The Problem

**Correlation ≠ Causation**.
When yield drops, engineers need to find the *cause*, not just correlated variables.

#### Granger Causality (Time Series)

$X$ Granger-causes $Y$ if past $X$ improves prediction of $Y$ beyond past $Y$ alone:

$$ \sigma^2(Y_t \mid Y_{<t}, X_{<t}) < \sigma^2(Y_t \mid Y_{<t}) $$
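
The Granger criterion above can be illustrated by comparing one-step prediction error with and without past $X$. The sketch below is a simplified illustration rather than a full statistical test: it assumes a single lag, no intercept, and synthetic data whose coefficients (0.5 and 0.8) are made up for the example. The helper name `residual_variance` is hypothetical.

```python
import random


def residual_variance(target, predictors):
    """Mean squared residual of a least-squares fit (no intercept) of
    `target` on one or two predictor series."""
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    if len(predictors) == 1:
        (f,) = predictors
        coefs = [dot(f, target) / dot(f, f)]
    else:
        f, g = predictors
        sff, sgg, sfg = dot(f, f), dot(g, g), dot(f, g)
        sft, sgt = dot(f, target), dot(g, target)
        det = sff * sgg - sfg * sfg  # determinant of the 2x2 normal equations
        coefs = [(sft * sgg - sfg * sgt) / det, (sff * sgt - sfg * sft) / det]

    fitted = [sum(c * col[i] for c, col in zip(coefs, predictors))
              for i in range(len(target))]
    return sum((t - h) ** 2 for t, h in zip(target, fitted)) / len(target)


# Synthetic series in which past x drives y (assumed coefficients).
rng = random.Random(0)
n = 500
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0.0, 0.1))

y_now, y_past, x_past = y[1:], y[:-1], x[:-1]
var_restricted = residual_variance(y_now, [y_past])    # past Y only
var_full = residual_variance(y_now, [y_past, x_past])  # past Y and past X
```

Here `var_full` comes out far below `var_restricted`, which is the pattern the definition describes. In-sample, the larger model can never fit worse, so a real analysis would compare the two with an F-test or out-of-sample error. Granger causality also remains a statement about predictability, not about intervention.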

machine learning for fab,production

Apply ML to optimize recipes, predict defects, or improve yield.

machine learning force fields, chemistry ai

Learn forces from quantum calculations.

machine learning ocd, metrology

Use ML to interpret optical spectra.

machine learning ocd, ml-ocd, metrology

Use neural networks to interpret scatterometry.

machine model (mm),machine model,mm,reliability

ESD from charged machine.

macro search space, neural architecture search

Macro search spaces in NAS define entire network topologies rather than repeatable cells or modules.

mae pre-training, mae, computer vision

Mask large portions and reconstruct.

magic number detection, code ai

Find unexplained constants.
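
One way such a check might look, sketched with Python's standard `ast` module. The whitelist `ALLOWED` and the rule that an all-uppercase assignment counts as a named constant are assumptions made for this example, not an established standard.

```python
import ast

ALLOWED = {0, 1}  # hypothetical whitelist of literals that need no name


def find_magic_numbers(source):
    """Return sorted (line, value) pairs for numeric literals that are not
    whitelisted and not the value of an UPPER_CASE constant assignment."""
    tree = ast.parse(source)

    # Literals that are directly assigned to an all-uppercase name are "explained".
    named = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            if all(isinstance(t, ast.Name) and t.id.isupper() for t in node.targets):
                named.add(id(node.value))

    hits = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)
                and node.value not in ALLOWED
                and id(node) not in named):
            hits.append((node.lineno, node.value))
    return sorted(hits)


sample = (
    "TIMEOUT_S = 30\n"
    "\n"
    "def area(r):\n"
    "    return 3.14159 * r * r\n"
    "\n"
    "def retry(n):\n"
    "    return n * 7 + 1\n"
)
magic = find_magic_numbers(sample)  # flags 3.14159 (line 4) and 7 (line 7)
```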

magnetic field imaging, failure analysis advanced

Magnetic field imaging detects current flow through inductively coupled sensors revealing shorts and current paths.

magnitude pruning, model optimization

Magnitude pruning removes the weights with the smallest absolute values, using magnitude as the importance measure and a threshold to decide what is cut.
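
A minimal sketch on a flat list of weights, assuming the threshold is set implicitly by a target sparsity (the fraction of weights to zero out). The function name `magnitude_prune` and the example values are illustrative.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest |w|."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    k = int(len(weights) * sparsity)  # number of weights to remove
    # Indices of the k smallest-magnitude weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    prune_idx = set(order[:k])
    return [0.0 if i in prune_idx else w for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
# pruned == [0.8, 0.0, 0.3, -0.9, 0.0, 0.0]
```

Frameworks typically store the result as a binary mask over the weight tensor so the pruned positions stay at zero during fine-tuning.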

magnitude pruning,model optimization

Remove weights with smallest magnitudes.

magnn, magnn, graph neural networks

Metapath Aggregated Graph Neural Network learns from heterogeneous graphs using metapath-based neighbor encoding.

maieutic prompting,reasoning

Use model-generated explanations recursively.

main effect, quality & reliability

A main effect is the average change in the response when one factor changes level, averaged over the levels of the other factors.
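
As a worked example, assume a hypothetical two-factor, two-level experiment with coded levels -1 and +1; the response values are made up. The main effect of a factor is then the mean response at its high level minus the mean response at its low level.

```python
# Hypothetical 2x2 full-factorial runs with coded factor levels.
runs = [
    {"A": -1, "B": -1, "y": 20.0},
    {"A": +1, "B": -1, "y": 30.0},
    {"A": -1, "B": +1, "y": 24.0},
    {"A": +1, "B": +1, "y": 38.0},
]


def main_effect(runs, factor, response="y"):
    """Mean response at the high level minus mean response at the low level."""
    high = [r[response] for r in runs if r[factor] > 0]
    low = [r[response] for r in runs if r[factor] < 0]
    return sum(high) / len(high) - sum(low) / len(low)


effect_a = main_effect(runs, "A")  # (30 + 38)/2 - (20 + 24)/2 = 12.0
effect_b = main_effect(runs, "B")  # (24 + 38)/2 - (20 + 30)/2 = 6.0
```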

main effect,doe

Impact of single factor on response.

main etch,etch

Bulk material removal step.

mainframe,production

Main body of cluster tool housing transfer chamber and modules.

maintainability index, code ai

Score code maintainability.

maintainability, manufacturing operations

Maintainability is ease and speed of performing maintenance activities.

maintenance prevention, manufacturing operations

Maintenance prevention designs equipment for reliability and ease of maintenance.

maintenance time tracking, production

Record time spent on maintenance.

maintenance window, manufacturing operations

Maintenance windows schedule downtime minimizing production impact.

make-a-video, multimodal ai

Make-A-Video generates videos from text using spatiotemporal diffusion models.

mamba architecture, llm architecture

Mamba uses selective state space models for efficient sequence modeling without attention.

mamba,foundation model

State-space model architecture efficient for long sequences.

mamba,s4,state space model,ssm

Mamba/S4 are state-space models that replace full attention with more efficient recurrence-style updates, aiming for faster long-sequence processing.

maml (model-agnostic meta-learning),maml,model-agnostic meta-learning,few-shot learning

Meta-learning method that finds good initialization for fast adaptation.

mapping network, generative models

Transform latent code to style.

marching cubes, multimodal ai

Marching cubes extracts mesh surfaces from volumetric data or implicit functions.

marked point process, time series models

Marked point processes attach additional information marks to events capturing both occurrence times and event attributes.

markov chain monte carlo (mcmc),markov chain monte carlo,mcmc,statistics

Sample from posterior distributions.
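
A minimal sketch of one MCMC method, random-walk Metropolis, for a one-dimensional target. The target used here (an unnormalized Normal with mean 2 and standard deviation 1), the step size, and the burn-in length are assumptions chosen for illustration.

```python
import math
import random


def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log density."""
    rng = random.Random(seed)
    x, logp = start, log_density(start)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric random-walk proposal
        logp_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        samples.append(x)  # a rejected move repeats the current state
    return samples


def log_posterior(theta):
    # Hypothetical target: Normal(mean=2, sd=1), up to an additive constant.
    return -0.5 * (theta - 2.0) ** 2


draws = metropolis(log_posterior, start=0.0, n_samples=20000)
kept = draws[2000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
```

Only ratios of the density enter the acceptance step, which is why the normalizing constant of the posterior is never needed.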

markov model for reliability, reliability

State-based reliability model.

mart, mart, ai safety

Misclassification Aware adveRsarial Training (MART): adversarial training that puts extra weight on misclassified examples.

marvin,ai functions,python

Marvin provides AI functions in Python for turning natural language into structured data.

mask blur, generative models

Soften mask edges.

mask repair, lithography

Fix defects on photomasks.

masked image modeling, mim, computer vision

Predict masked patches.

masked language model,mlm,bert

MLM pretraining masks random tokens and trains the model to predict them from context on both sides (BERT-style), which yields bidirectional understanding.
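
A minimal sketch of the masking step on a whitespace-tokenized sentence. It assumes every selected token is replaced by `[MASK]`; BERT's original recipe selects about 15% of tokens and leaves some of them unchanged or swaps in random tokens, which is omitted here. The function name `mask_tokens` and the example sentence are illustrative.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (inputs, labels): masked inputs plus per-position targets.
    labels[i] is the original token where masked, else None (ignored by the loss)."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)   # the model must predict the original token here
        else:
            inputs.append(tok)
            labels.append(None)  # unmasked position contributes no loss
    return inputs, labels


tokens = "the wafer moves to the etch chamber".split()
inputs, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
```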

masked language modeling (vision),masked language modeling,vision,multimodal ai

Predict masked words given image.

masked language modeling with vision, multimodal ai

MLM conditioned on images.

masked language modeling, mlm, foundation model

BERT-style masked token prediction.