
AI Factory Glossary

64 technical terms and definitions


obfuscated gradients, ai safety

Defenses that mask or obscure gradients to blunt gradient-based attacks.

obfuscation attacks, ai safety

Hide malicious intent through encoding.

obirch (optical beam induced resistance change),obirch,optical beam induced resistance change,failure analysis

Laser-based defect localization.

obirch, obirch, failure analysis advanced

Optical Beam Induced Resistance Change detects resistive defects by monitoring device resistance while scanning with a laser beam.

object-centric nerf, multimodal ai

Object-centric NeRF decomposes scenes into separate object representations.

observation space, ai agents

Observation space defines possible states or information agents can perceive from environments.

occupancy network, multimodal ai

Occupancy networks predict whether 3D points are inside or outside objects.

ocr,document ai,pdf

OCR extracts text from images/PDFs. Combine with an LLM for document understanding and Q&A over scanned docs.

ode-rnn, ode-rnn, neural architecture

Combine neural ODEs with RNNs for continuous-time modeling of irregularly sampled sequences.

ofa elastic, ofa, neural architecture search

Once-for-All elastic kernels support variable kernel sizes within a single supernet, enabling fine-grained architecture specialization.

ohem, ohem, advanced training

Online Hard Example Mining selects high-loss examples within mini-batches for gradient computation, improving training efficiency.

on-device ai,edge ai

Run models locally on user devices.

on-device model, llm architecture

On-device models run locally on user hardware without cloud connectivity.

on-device training, edge ai

Train models directly on edge devices.

on-site solar, environmental & sustainability

On-site solar photovoltaic systems generate renewable electricity directly at manufacturing facilities.

once-for-all networks, neural architecture

Train the supernet once; search many times.

once-for-all, neural architecture search

Once-for-all networks train a single supernet that supports diverse architectural configurations, enabling efficient deployment-specific specialization.

one-class svm ts, time series models

One-class SVM for time series learns the boundary of normal behavior, detecting anomalies as deviations from the learned support.

one-shot nas, neural architecture

A single supernet training run evaluates many candidate architectures.

one-shot pruning, model optimization

One-shot pruning removes all unnecessary parameters in a single step before retraining.

one-shot pruning,model optimization

Prune in a single pass, without iterative prune-and-retrain cycles.

one-shot weight sharing, neural architecture search

One-shot weight sharing trains a single supernet whose subnetworks share weights, enabling efficient architecture evaluation.

online distillation, model compression

Distill while the teacher is itself still training, rather than from a fixed pretrained teacher.

online learning,machine learning

Update model as new data arrives.

onnx (open neural network exchange),onnx,open neural network exchange,deployment

Format for exchanging models between frameworks.

onnx format, onnx, model optimization

ONNX provides an open standard for representing deep learning models, enabling framework interoperability.

onnx runtime, onnx, model optimization

ONNX Runtime provides cross-platform inference optimization for ONNX format models.
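A minimal usage sketch with the `onnxruntime` Python package; the model file name and input shape are placeholders:

```python
import numpy as np
import onnxruntime as ort

# Load a serialized ONNX model and run one inference pass.
session = ort.InferenceSession("model.onnx")   # placeholder model file
input_name = session.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # shape depends on the model
outputs = session.run(None, {input_name: x})
print(outputs[0].shape)
```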

opc model calibration, opc, lithography

Fit OPC model to test patterns.

opc model validation, opc, lithography

Verify OPC model on new patterns.

opc, optical proximity correction, opc modeling, lithography opc, mask correction, proximity effects, opc optimization, rule-based opc, model-based opc

# Optical Proximity Correction (OPC): Mathematical Modeling

## 1. The Physical Problem

When projecting mask patterns onto a silicon wafer using light (typically 193 nm DUV or 13.5 nm EUV), several phenomena distort the image:

- **Diffraction**: Light bending around features near or below the wavelength
- **Interference**: Constructive/destructive wave interactions
- **Optical aberrations**: Lens imperfections
- **Resist effects**: Photochemical behavior during exposure and development
- **Etch loading**: Pattern-density-dependent etch rates

**OPC pre-distorts the mask** so that after all these effects, the printed pattern matches the design intent.

### Key Parameters

| Parameter | Typical Value | Description |
|-----------|---------------|-------------|
| $\lambda$ | 193 nm (DUV), 13.5 nm (EUV) | Exposure wavelength |
| $NA$ | 0.33–1.35 | Numerical aperture |
| $k_1$ | 0.25–0.40 | Process factor |
| Resolution | $\frac{k_1 \lambda}{NA}$ | Minimum feature size |

## 2. Hopkins Imaging Model

The foundational mathematical framework for **partially coherent lithographic imaging** comes from Hopkins' theory (1953).

### Aerial Image Intensity

The aerial image intensity at position $\mathbf{r} = (x, y)$ is given by:

$$ I(\mathbf{r}) = \iint\!\!\!\iint TCC(\mathbf{f}_1, \mathbf{f}_2) \cdot M(\mathbf{f}_1) \cdot M^*(\mathbf{f}_2) \cdot e^{2\pi i (\mathbf{f}_1 - \mathbf{f}_2) \cdot \mathbf{r}} \, d\mathbf{f}_1 \, d\mathbf{f}_2 $$

Where:

- $M(\mathbf{f})$ — Fourier transform of the mask transmission function
- $M^*(\mathbf{f})$ — Complex conjugate of $M(\mathbf{f})$
- $TCC$ — Transmission Cross Coefficient
- $\mathbf{f} = (f_x, f_y)$ — Spatial frequency coordinates

### Transmission Cross Coefficient (TCC)

The TCC encodes the optical system characteristics:

$$ TCC(\mathbf{f}_1, \mathbf{f}_2) = \iint J(\mathbf{f}) \cdot H(\mathbf{f} + \mathbf{f}_1) \cdot H^*(\mathbf{f} + \mathbf{f}_2) \, d\mathbf{f} $$

Where:

- $J(\mathbf{f})$ — Source (illumination) intensity distribution (mutual intensity at mask)
- $H(\mathbf{f})$ — Pupil function of the projection lens
- $H^*(\mathbf{f})$ — Complex conjugate of pupil function

### Pupil Function

For an ideal circular aperture:

$$ H(\mathbf{f}) = \begin{cases} 1 & \text{if } |\mathbf{f}| \leq \frac{NA}{\lambda} \\ 0 & \text{otherwise} \end{cases} $$

With aberrations included:

$$ H(\mathbf{f}) = P(\mathbf{f}) \cdot e^{i \cdot W(\mathbf{f})} $$

Where $W(\mathbf{f})$ is the wavefront aberration function (Zernike polynomial expansion).
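As a toy illustration of these building blocks, the sketch below forms a purely coherent aerial image by low-pass filtering a binary mask with the ideal circular pupil; it stands in for a single term of the partially coherent TCC sum, and the grid, pitch, wavelength, and NA values are illustrative:

```python
import numpy as np

# Toy coherent imaging: I = |F^{-1}{ H(f) * M(f) }|^2 with an ideal circular pupil.
n, pitch = 256, 4.0        # grid points, nm per pixel (illustrative)
lam, NA = 193.0, 0.9       # DUV wavelength (nm), numerical aperture (illustrative)

mask = np.zeros((n, n))
mask[:, 120:136] = 1.0     # a single 64 nm line (16 px * 4 nm)

f = np.fft.fftfreq(n, d=pitch)             # spatial frequencies (1/nm)
fx, fy = np.meshgrid(f, f)
H = np.sqrt(fx**2 + fy**2) <= NA / lam     # ideal pupil: pass |f| <= NA/lambda

M = np.fft.fft2(mask)                      # mask spectrum
field = np.fft.ifft2(H * M)                # coherent image amplitude
intensity = np.abs(field)**2               # aerial image intensity
```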
## 3. SOCS Decomposition

### Sum of Coherent Systems

To make computation tractable, the TCC (a Hermitian matrix when discretized) is decomposed via **eigenvalue decomposition**:

$$ TCC(\mathbf{f}_1, \mathbf{f}_2) = \sum_{n=1}^{N} \lambda_n \cdot \phi_n(\mathbf{f}_1) \cdot \phi_n^*(\mathbf{f}_2) $$

Where:

- $\lambda_n$ — Eigenvalues (sorted in descending order)
- $\phi_n(\mathbf{f})$ — Eigenvectors (orthonormal kernels)

### Image Computation

This allows the image to be computed as a **sum of coherent images**:

$$ I(\mathbf{r}) = \sum_{n=1}^{N} \lambda_n \left| \mathcal{F}^{-1}\{\phi_n \cdot M\} \right|^2 $$

Or equivalently:

$$ I(\mathbf{r}) = \sum_{n=1}^{N} \lambda_n \left| I_n(\mathbf{r}) \right|^2 $$

Where each coherent image is:

$$ I_n(\mathbf{r}) = \mathcal{F}^{-1}\{\phi_n(\mathbf{f}) \cdot M(\mathbf{f})\} $$

### Practical Considerations

- **Eigenvalue decay**: $\lambda_n$ decay rapidly; typically only 10–50 terms needed
- **Speedup**: Converts one $O(N^4)$ partially coherent calculation into $\sim$20 $O(N^2 \log N)$ FFT operations
- **Accuracy**: Trade-off between number of terms and simulation accuracy

## 4. OPC Problem Formulation

### Forward Problem

Given mask $M(\mathbf{r})$, predict wafer pattern $W(\mathbf{r})$:

$$ M \xrightarrow{\text{optics}} I(\mathbf{r}) \xrightarrow{\text{resist}} R(\mathbf{r}) \xrightarrow{\text{etch}} W(\mathbf{r}) $$

**Mathematical chain:**

1. **Optical Model**: $I = \mathcal{O}(M)$ — Hopkins/SOCS imaging
2. **Resist Model**: $R = \mathcal{R}(I)$ — Threshold or convolution model
3. **Etch Model**: $W = \mathcal{E}(R)$ — Etch bias and loading

### Inverse Problem (OPC)

Given target pattern $T(\mathbf{r})$, find mask $M(\mathbf{r})$ such that:

$$ W(M) \approx T $$

**This is fundamentally ill-posed:**

- Non-unique: Many masks could produce similar results
- Nonlinear: The imaging equation is quadratic in mask transmission
- Constrained: Mask must be manufacturable

## 5. Edge Placement Error Minimization

### Objective Function

The standard OPC objective minimizes **Edge Placement Error (EPE)**:

$$ \min_M \mathcal{L}(M) = \sum_{i=1}^{N_{\text{edges}}} w_i \cdot \text{EPE}_i^2 $$

Where:

$$ \text{EPE}_i = x_i^{\text{printed}} - x_i^{\text{target}} $$

- $x_i^{\text{printed}}$ — Actual edge position after lithography
- $x_i^{\text{target}}$ — Desired edge position from design
- $w_i$ — Weight for edge $i$ (can prioritize critical features)

### Constraints

Subject to mask manufacturability:

- **Minimum feature size**: $\text{CD}_{\text{mask}} \geq \text{CD}_{\min}$
- **Minimum spacing**: $\text{Space}_{\text{mask}} \geq \text{Space}_{\min}$
- **Maximum jog**: Limit on edge fragmentation complexity
- **MEEF constraint**: Mask Error Enhancement Factor within spec

### Iterative Edge-Based OPC Algorithm

The classic algorithm moves mask edges iteratively:

$$ \Delta x^{(n+1)} = \Delta x^{(n)} - \alpha \cdot \text{EPE}^{(n)} $$

Where:

- $\Delta x$ — Edge movement from original position
- $\alpha$ — Damping factor (typically 0.3–0.8)
- $n$ — Iteration number

**Convergence criterion:**

$$ \max_i |\text{EPE}_i| < \epsilon \quad \text{or} \quad n > n_{\max} $$

### Gradient Computation

Using the chain rule:

$$ \frac{\partial \text{EPE}}{\partial m} = \frac{\partial \text{EPE}}{\partial I} \cdot \frac{\partial I}{\partial m} $$

Where $m$ represents mask parameters (edge positions, segment lengths). At a contour position where $I = I_{th}$:

$$ \frac{\partial x_{\text{edge}}}{\partial m} = -\frac{1}{|\nabla I|} \cdot \frac{\partial I}{\partial m} $$

The **image log-slope (ILS)** is a key metric:

$$ \text{ILS} = \frac{1}{I} \left| \frac{\partial I}{\partial x} \right|_{I = I_{th}} $$

Higher ILS → better process latitude, lower EPE sensitivity.
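A schematic 1-D sketch of the damped edge-move loop above; the linear `printed_edge` model is a stand-in for a real lithography simulator, and all coefficients are illustrative:

```python
def printed_edge(mask_edge):
    # Stand-in forward model: a fixed bias plus a proximity-like scaling.
    # A real flow would run a Hopkins/SOCS simulation and contour it
    # at the resist threshold.
    return 0.92 * mask_edge + 3.0   # illustrative coefficients (nm)

target = 45.0        # desired printed edge position (nm)
mask_edge = target   # start from the design
alpha = 0.5          # damping factor

for it in range(50):
    epe = printed_edge(mask_edge) - target
    if abs(epe) < 0.1:             # convergence criterion (nm)
        break
    mask_edge -= alpha * epe       # damped edge move

print(f"converged in {it} iterations, mask edge = {mask_edge:.2f} nm")
```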
## 6. Resist Modeling

### Threshold Model (Simplest)

The resist develops where intensity exceeds threshold:

$$ R(\mathbf{r}) = \begin{cases} 1 & \text{if } I(\mathbf{r}) > I_{th} \\ 0 & \text{otherwise} \end{cases} $$

The printed contour is the $I_{th}$ isoline.

### Variable Threshold Resist (VTR)

The threshold varies with local context:

$$ I_{th}(\mathbf{r}) = I_{th,0} + \beta_1 \cdot \bar{I}_{\text{local}} + \beta_2 \cdot \nabla^2 I + \beta_3 \cdot (\nabla I)^2 + \ldots $$

Where:

- $I_{th,0}$ — Base threshold
- $\bar{I}_{\text{local}}$ — Local average intensity (density effect)
- $\nabla^2 I$ — Laplacian (curvature effect)
- $\beta_i$ — Fitted coefficients

### Compact Phenomenological Models

For OPC speed, empirical models are used instead of physics-based resist simulation:

$$ R(\mathbf{r}) = \sum_{j=1}^{N_k} w_j \cdot \left( K_j \otimes g_j(I) \right) $$

Where:

- $K_j$ — Convolution kernels (typically Gaussians): $K_j(\mathbf{r}) = \frac{1}{2\pi\sigma_j^2} \exp\left( -\frac{|\mathbf{r}|^2}{2\sigma_j^2} \right)$
- $g_j(I)$ — Nonlinear functions: $I$, $I^2$, $\log(I)$, $\sqrt{I}$, etc.
- $w_j$ — Fitted weights
- $\otimes$ — Convolution operator

### Physical Interpretation

| Kernel Width | Physical Effect |
|--------------|-----------------|
| Small $\sigma$ | Optical proximity effects |
| Medium $\sigma$ | Acid/base diffusion in resist |
| Large $\sigma$ | Long-range loading effects |

### Model Calibration

Parameters are fitted to wafer measurements:

$$ \min_{\theta} \sum_{k=1}^{N_{\text{test}}} \left( \text{CD}_k^{\text{measured}} - \text{CD}_k^{\text{model}}(\theta) \right)^2 + \lambda \|\theta\|^2 $$

Where:

- $\theta = \{w_j, \sigma_j, \beta_i, \ldots\}$ — Model parameters
- $\lambda \|\theta\|^2$ — Regularization term
- Test structures: Lines, spaces, contacts, line-ends at various pitches/densities
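A toy sketch of this kernel-convolution form, with identity nonlinearities and illustrative kernel widths and weights (`scipy.ndimage.gaussian_filter` supplies the Gaussian convolution):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def compact_resist_model(intensity, sigmas, weights):
    """Toy compact model: R = sum_j w_j * (G_sigma_j (*) g_j(I)).

    Every g_j here is the identity; a calibrated model would mix
    nonlinearities (I, I^2, log I, ...) and fit w_j, sigma_j to CD data.
    """
    response = np.zeros_like(intensity)
    for sigma, w in zip(sigmas, weights):
        response += w * gaussian_filter(intensity, sigma)
    return response

# Illustrative three-kernel model: optical, diffusion, loading length scales.
I = np.random.rand(128, 128)   # placeholder aerial image
R = compact_resist_model(I, sigmas=(1.0, 4.0, 16.0), weights=(0.6, 0.3, 0.1))
printed = R > 0.5              # threshold the resist response
```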
## 7. Inverse Lithography Technology

### Full Optimization Formulation

ILT treats the mask as a continuous optimization variable (pixelated):

$$ \min_{M} \mathcal{L}(M) = \| W(M) - T \|^2 + \lambda \cdot \mathcal{R}(M) $$

Where:

- $W(M)$ — Predicted wafer pattern
- $T$ — Target pattern
- $\mathcal{R}(M)$ — Regularization for manufacturability
- $\lambda$ — Regularization weight

### Cost Function Components

**Pattern Fidelity Term:**

$$ \mathcal{L}_{\text{fidelity}} = \int \left( W(\mathbf{r}) - T(\mathbf{r}) \right)^2 d\mathbf{r} $$

Or in discrete form:

$$ \mathcal{L}_{\text{fidelity}} = \sum_{\mathbf{r} \in \text{grid}} \left( W(\mathbf{r}) - T(\mathbf{r}) \right)^2 $$

### Regularization Terms

**Total Variation** (promotes piecewise constant, sharp edges):

$$ \mathcal{R}_{TV}(M) = \int |\nabla M| \, d\mathbf{r} = \int \sqrt{\left(\frac{\partial M}{\partial x}\right)^2 + \left(\frac{\partial M}{\partial y}\right)^2} \, d\mathbf{r} $$

**Curvature Penalty** (promotes smooth contours):

$$ \mathcal{R}_{\kappa}(M) = \oint_{\partial M} \kappa^2 \, ds $$

Where $\kappa$ is the local curvature of the mask boundary.

**Minimum Feature Size** (MRC - Mask Rule Check):

$$ \mathcal{R}_{MRC}(M) = \sum_{\text{violations}} \text{penalty}(\text{violation severity}) $$

**Sigmoid Regularization** (push mask toward binary):

$$ \mathcal{R}_{\text{binary}}(M) = \int M(1-M) \, d\mathbf{r} $$

### Level Set Formulation

Represent the mask boundary implicitly via level set function $\phi(\mathbf{r})$:

- Inside chrome: $\phi(\mathbf{r}) < 0$
- Outside chrome: $\phi(\mathbf{r}) > 0$
- Boundary: $\phi(\mathbf{r}) = 0$

**Evolution equation:**

$$ \frac{\partial \phi}{\partial t} = -v \cdot |\nabla \phi| $$

Where velocity $v$ is derived from the cost function gradient:

$$ v = -\frac{\delta \mathcal{L}}{\delta \phi} $$

**Advantages:**

- Naturally handles topological changes (features splitting/merging)
- Implicit curvature regularization available
- Well-studied numerical methods

### Optimization Algorithms

Since the problem is **non-convex**, various methods are used:

1. **Gradient Descent with Momentum:** $$ M^{(n+1)} = M^{(n)} - \eta \nabla_M \mathcal{L} + \mu \left( M^{(n)} - M^{(n-1)} \right) $$
2. **Conjugate Gradient:** $$ d^{(n+1)} = -\nabla \mathcal{L}^{(n+1)} + \beta^{(n)} d^{(n)} $$
3. **Adam Optimizer:** $$ m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t, \quad v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2, \quad M_{t+1} = M_t - \eta \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} $$ with bias-corrected moments $\hat{m}_t = m_t / (1 - \beta_1^t)$ and $\hat{v}_t = v_t / (1 - \beta_2^t)$
4. **Genetic Algorithms** (for discrete/combinatorial aspects)
5. **Simulated Annealing** (for escaping local minima)

## 8. Source-Mask Optimization

### Joint Optimization

SMO optimizes both illumination source $S$ and mask $M$ simultaneously:

$$ \min_{S, M} \sum_{j \in \text{PW}} w_j \cdot \| W(S, M, \text{condition}_j) - T \|^2 $$

### Source Parameterization

**Pixelated Source:**

$$ S = \{s_{ij}\} \quad \text{where } s_{ij} \in [0, 1] $$

Each pixel in the pupil plane is a free variable.

**Parametric Source:**

- Annular: $(R_{\text{inner}}, R_{\text{outer}})$
- Quadrupole: $(R, \theta, \sigma)$
- Freeform: Spline or Zernike coefficients

### Alternating Optimization

**Algorithm:**

```
Initialize: S⁰, M⁰
for k = 1 to max_iter:
    # Step 1: Fix S, optimize M (standard OPC)
    M^k = argmin_M L(S^(k-1), M)
    # Step 2: Fix M, optimize S
    S^k = argmin_S L(S, M^k)
    # Check convergence
    if |L^k - L^(k-1)| < tolerance:
        break
```

**Note:** Step 2 is often convex in $S$ when $M$ is fixed (linear in source pixels for intensity-based metrics).

### Mathematical Form for Source Optimization

When mask is fixed, the image is linear in source:

$$ I(\mathbf{r}; S) = \sum_{ij} s_{ij} \cdot I_{ij}(\mathbf{r}) $$

Where $I_{ij}$ is the image contribution from source pixel $(i,j)$. This makes source optimization a **quadratic program** (convex if cost is convex in $I$).
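Because the image is linear in the source pixels once the mask is fixed, the inner source step can be posed as bounded least squares; a sketch in which `A` and `b` are random stand-ins for the simulated contribution maps $I_{ij}$ and the target intensity:

```python
import numpy as np
from scipy.optimize import lsq_linear

# Each column of A is one source pixel's image contribution I_ij,
# flattened over the evaluation grid; b is the target intensity.
n_eval, n_src = 400, 25
A = np.random.rand(n_eval, n_src)   # placeholder for simulated I_ij maps
b = np.random.rand(n_eval)          # placeholder target intensity

# Solve min ||A s - b||^2 subject to 0 <= s_ij <= 1 (pixelated source).
result = lsq_linear(A, b, bounds=(0.0, 1.0))
source = result.x.reshape(5, 5)     # back onto the pupil-plane grid
```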
## 9. Process Window Optimization

### Multi-Condition Optimization

Real manufacturing has variations. Robust OPC optimizes across a **process window (PW)**:

$$ \min_M \sum_{j \in \text{PW}} w_j \cdot \mathcal{L}(M, \text{condition}_j) $$

### Process Window Dimensions

| Dimension | Typical Range | Effect |
|-----------|---------------|--------|
| Focus | $\pm 50$ nm | Defocus blur |
| Dose | $\pm 3\%$ | Threshold shift |
| Mask CD | $\pm 2$ nm | Feature size bias |
| Aberrations | Per-lens | Pattern distortion |

### Worst-Case (Minimax) Formulation

$$ \min_M \max_{j \in \text{PW}} \text{EPE}_j(M) $$

This is more conservative but ensures robustness.

### Soft Constraints via Barrier Functions

$$ \mathcal{L}_{PW}(M) = \sum_j w_j \cdot \text{EPE}_j^2 + \mu \sum_j \sum_i \max(0, |\text{EPE}_{ij}| - \text{spec})^2 $$

### Process Window Metrics

**Common Process Window (CPW):**

$$ \text{CPW} = \text{Focus Range} \times \text{Dose Range} $$

Where all specs are simultaneously met.

**Exposure Latitude (EL):**

$$ \text{EL} = \frac{\Delta \text{Dose}}{\text{Dose}_{\text{nom}}} \times 100\% $$

**Depth of Focus (DOF):**

$$ \text{DOF} = \text{Focus range where } |\text{EPE}| < \text{spec} $$

## 10. Stochastic Effects (EUV)

At EUV wavelengths (13.5 nm), **photon counts are low** and shot noise becomes significant.

### Photon Statistics

The number of photons per pixel follows a **Poisson distribution**:

$$ P(n | \bar{n}) = \frac{\bar{n}^n e^{-\bar{n}}}{n!} $$

Where:

$$ \bar{n} = \frac{E \cdot A \cdot \eta}{hc/\lambda} $$

- $E$ — Exposure dose (mJ/cm²)
- $A$ — Pixel area
- $\eta$ — Quantum efficiency
- $hc/\lambda$ — Photon energy

### Signal-to-Noise Ratio

$$ \text{SNR} = \frac{\bar{n}}{\sqrt{\bar{n}}} = \sqrt{\bar{n}} $$

For reliable imaging, need $\text{SNR} > 5$, requiring $\bar{n} > 25$ photons/pixel.

### Line Edge Roughness (LER)

Random edge fluctuations characterized by:

- **3σ LER**: $3 \times$ standard deviation of edge position
- **Correlation length** $\xi$: Spatial extent of roughness

**Power Spectral Density:**

$$ \text{PSD}(f) = \frac{2\sigma^2 \xi}{1 + (2\pi f \xi)^{2\alpha}} $$

Where $\alpha$ is the roughness exponent (typically 0.5–1.0).

### Stochastic Defect Probability

Probability of a stochastic failure (missing contact, bridging):

$$ P_{\text{fail}} = 1 - \prod_{\text{features}} (1 - p_i) $$

For rare events, approximately:

$$ P_{\text{fail}} \approx \sum_i p_i $$

### Stochastic-Aware OPC Objective

$$ \min_M \mathbb{E}[\text{EPE}^2] + \lambda_1 \cdot \text{Var}(\text{EPE}) + \lambda_2 \cdot P_{\text{fail}} $$

### Monte Carlo Simulation

For stochastic modeling:

1. Sample photon arrival: $n_{ij} \sim \text{Poisson}(\bar{n}_{ij})$
2. Simulate acid generation: Proportional to absorbed photons
3. Simulate diffusion: Random walk or stochastic PDE
4. Simulate development: Threshold with noise
5. Repeat $N$ times, compute statistics
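A minimal Monte Carlo sketch of steps 1 and 4 on a 1-D threshold resist, estimating the edge-position spread; the dose profile and threshold are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Mean photon counts per pixel for a blurred line edge (illustrative).
x = np.linspace(-50, 50, 101)                   # position (nm)
mean_photons = 40.0 / (1.0 + np.exp(x / 8.0))   # ~40 photons in the bright region

trials = 2000
edges = np.empty(trials)
for t in range(trials):
    n = rng.poisson(mean_photons)               # Poisson shot noise per pixel
    developed = n > 20                          # toy threshold "resist"
    edges[t] = x[np.argmin(developed)]          # first undeveloped pixel = edge

print(f"edge sigma = {edges.std():.2f} nm -> 3-sigma LER ~ {3*edges.std():.2f} nm")
```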
## 11. Machine Learning Approaches

### Neural Network Forward Models

Train networks to approximate expensive simulations:

$$ \hat{I} = f_\theta(M) \approx I_{\text{optical}}(M) $$

**Architectures:**

- **CNN**: Convolutional neural networks for local pattern effects
- **U-Net**: Encoder-decoder for image-to-image translation
- **GAN**: Generative adversarial networks for realistic image generation

**Training:**

$$ \min_\theta \sum_{k} \| f_\theta(M_k) - I_k^{\text{simulation}} \|^2 $$

### End-to-End ILT with Deep Learning

Directly predict corrected masks:

$$ \hat{M}_{\text{OPC}} = G_\theta(T) $$

**Training data:** Pairs $(T, M_{\text{optimal}})$ from conventional ILT.

**Loss function:**

$$ \mathcal{L} = \| W(G_\theta(T)) - T \|^2 + \lambda \| G_\theta(T) - M_{\text{ref}} \|^2 $$

### Hybrid Approaches

Combine ML speed with physics accuracy:

1. **ML Initialization**: $M^{(0)} = G_\theta(T)$
2. **Physics Refinement**: Run conventional OPC starting from $M^{(0)}$

**Benefits:**

- Faster convergence (good starting point)
- Physics ensures accuracy
- ML handles global pattern context

### Neural Network Architectures for OPC

| Architecture | Use Case | Advantages |
|--------------|----------|------------|
| CNN | Local correction prediction | Fast inference |
| U-Net | Full mask prediction | Multi-scale features |
| GAN | Realistic mask generation | Sharp boundaries |
| Transformer | Global context | Long-range dependencies |
| Physics-Informed NN | Constrained prediction | Respects physics |

## 12. Computational Complexity

### Scale of Full-Chip OPC

- **Features per chip**: $10^9 - 10^{10}$
- **Evaluation points**: $\sim 10^{12}$ (multiple points per feature)
- **Iterations**: 10–50 per feature
- **Optical simulations**: $O(N \log N)$ per FFT

### Complexity Analysis

**Single feature OPC:**

$$ T_{\text{feature}} = O(N_{\text{iter}} \times N_{\text{SOCS}} \times N_{\text{grid}} \log N_{\text{grid}}) $$

**Full chip:**

$$ T_{\text{chip}} = O(N_{\text{features}} \times T_{\text{feature}}) $$

**Result:** Hours to days on large compute clusters.

### Acceleration Strategies

**Hierarchical Processing:**

- Identify repeated cells (memory arrays, standard cells)
- Compute OPC once, reuse for identical instances
- Speedup: $10\times$ – $100\times$ for regular designs

**GPU Parallelization:**

- FFTs parallelize well on GPUs
- Convolutions map to tensor operations
- Multiple features processed simultaneously
- Speedup: $10\times$ – $50\times$

**Approximate Models:**

- **Kernel-based**: Pre-compute influence functions
- **Variable resolution**: Fine grid only near edges
- **Neural surrogates**: Replace simulation with inference

**Domain Decomposition:**

- Divide chip into tiles
- Process tiles in parallel
- Handle tile boundaries with overlap or iteration

## 13. Mathematical Toolkit Summary

| Domain | Techniques |
|--------|-----------|
| **Optics** | Fourier transforms, Hopkins theory, SOCS decomposition, Abbe imaging |
| **Optimization** | Gradient descent, conjugate gradient, level sets, genetic algorithms, simulated annealing |
| **Linear Algebra** | Eigendecomposition (TCC), sparse matrices, SVD, matrix factorization |
| **PDEs** | Diffusion equations (resist), level set evolution, Hamilton-Jacobi |
| **Statistics** | Poisson processes, Monte Carlo, stochastic simulation, Bayesian inference |
| **Machine Learning** | CNNs, GANs, U-Net, transformers, physics-informed neural networks |
| **Computational Geometry** | Polygon operations, fragmentation, contour extraction, Boolean operations |
| **Numerical Methods** | FFT, finite differences, quadrature, interpolation |

## Equations Quick Reference

### Hopkins Imaging

$$ I(\mathbf{r}) = \iint\!\!\!\iint TCC(\mathbf{f}_1, \mathbf{f}_2) \cdot M(\mathbf{f}_1) \cdot M^*(\mathbf{f}_2) \cdot e^{2\pi i (\mathbf{f}_1 - \mathbf{f}_2) \cdot \mathbf{r}} \, d\mathbf{f}_1 \, d\mathbf{f}_2 $$

### SOCS Image

$$ I(\mathbf{r}) = \sum_{n=1}^{N} \lambda_n \left| \mathcal{F}^{-1}\{\phi_n \cdot M\} \right|^2 $$

### EPE Minimization

$$ \min_M \sum_{i} w_i \left( x_i^{\text{printed}} - x_i^{\text{target}} \right)^2 $$

### ILT Cost Function

$$ \min_{M} \| W(M) - T \|^2 + \lambda \cdot \mathcal{R}(M) $$

### Level Set Evolution

$$ \frac{\partial \phi}{\partial t} = -v \cdot |\nabla \phi| $$

### Poisson Photon Statistics

$$ P(n | \bar{n}) = \frac{\bar{n}^n e^{-\bar{n}}}{n!} $$

open source,oss,local model,llama

Open-source LLMs (e.g. Llama-style) let you run models locally or on your own servers, customize them via fine-tuning, and control data privacy.

open-domain dialogue, dialogue

Free-form conversations.

open-set domain adaptation, domain adaptation

The target domain contains classes not seen in the source domain.

open-source model, llm architecture

Open-source models have publicly available weights and training details.

openai embedding,ada,text

OpenAI's text-embedding-ada-002 embedding model. Easy to use; decent quality. The newer text-embedding-3 models supersede it.

openai sdk,python,typescript

OpenAI SDK available in Python and TypeScript. Official client. Streaming, tools.
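A minimal streaming sketch with the official Python client (the model name is illustrative; `OPENAI_API_KEY` must be set in the environment):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stream a chat completion chunk by chunk; the model name is illustrative.
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize OPC in one sentence."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")
```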

openvino, model optimization

OpenVINO optimizes models for Intel hardware using graph optimization and kernel libraries.

operation primitives, neural architecture search

Operation primitives are basic computational blocks, such as convolutions, pooling, and skip connections, used in NAS search spaces.

operational carbon, environmental & sustainability

Operational carbon emissions occur during a product's use phase, from energy consumption.

operator fusion, model optimization

Operator fusion combines multiple operations into a single kernel, reducing memory traffic and latency.

optical emission fa, failure analysis advanced

Optical emission failure analysis detects photon emission from hot carriers or breakdown events, localizing defects.

optical flow estimation, multimodal ai

Optical flow estimation computes pixel motion between frames for interpolation and stabilization.

optimization and computational methods, computational lithography, inverse lithography, ilt, opc optimization, source mask optimization, smo, gradient descent, adjoint method, machine learning lithography

# Semiconductor Manufacturing Process Optimization and Computational Mathematical Modeling

## 1. The Fundamental Challenge

Modern semiconductor manufacturing involves **500–1000+ sequential process steps** to produce chips with billions of transistors at nanometer scales. Each step has dozens of tunable parameters, creating an optimization challenge that is:

- **Extraordinarily high-dimensional** — hundreds to thousands of parameters
- **Highly nonlinear** — complex interactions between process variables
- **Expensive to explore experimentally** — each wafer costs thousands of dollars
- **Multi-objective** — balancing yield, throughput, cost, and performance

**Key Manufacturing Processes:**

1. **Lithography** — Pattern transfer using light/EUV exposure
2. **Etching** — Material removal (wet/dry plasma etching)
3. **Deposition** — Material addition (CVD, PVD, ALD)
4. **Ion Implantation** — Dopant introduction
5. **Thermal Processing** — Diffusion, annealing, oxidation
6. **Chemical-Mechanical Planarization (CMP)** — Surface planarization

## 2. The Mathematical Foundation

### 2.1 Governing Physics: Partial Differential Equations

Nearly all semiconductor processes are governed by systems of coupled PDEs.

#### Heat Transfer (Thermal Processing, Laser Annealing)

$$ \rho c_p \frac{\partial T}{\partial t} = \nabla \cdot (k \nabla T) + Q $$

Where:

- $\rho$ — density ($\text{kg/m}^3$)
- $c_p$ — specific heat capacity ($\text{J/(kg}\cdot\text{K)}$)
- $T$ — temperature ($\text{K}$)
- $k$ — thermal conductivity ($\text{W/(m}\cdot\text{K)}$)
- $Q$ — volumetric heat source ($\text{W/m}^3$)

#### Mass Diffusion (Dopant Redistribution, Oxidation)

$$ \frac{\partial C}{\partial t} = \nabla \cdot \left( D(C, T) \nabla C \right) + R(C) $$

Where:

- $C$ — concentration ($\text{atoms/cm}^3$)
- $D(C, T)$ — diffusion coefficient (concentration and temperature dependent)
- $R(C)$ — reaction/generation term

**Common Diffusion Models:**

- **Constant source diffusion:** $$ C(x, t) = C_s \cdot \text{erfc}\left( \frac{x}{2\sqrt{Dt}} \right) $$
- **Limited source diffusion:** $$ C(x, t) = \frac{Q}{\sqrt{\pi D t}} \exp\left( -\frac{x^2}{4Dt} \right) $$

#### Fluid Dynamics (CVD, Etching Reactors)

**Navier-Stokes Equations:**

$$ \rho \left( \frac{\partial \mathbf{v}}{\partial t} + \mathbf{v} \cdot \nabla \mathbf{v} \right) = -\nabla p + \mu \nabla^2 \mathbf{v} + \mathbf{f} $$

**Continuity Equation:**

$$ \frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{v}) = 0 $$

**Species Transport:**

$$ \frac{\partial c_i}{\partial t} + \mathbf{v} \cdot \nabla c_i = D_i \nabla^2 c_i + \sum_j R_{ij} $$

Where:

- $\mathbf{v}$ — velocity field ($\text{m/s}$)
- $p$ — pressure ($\text{Pa}$)
- $\mu$ — dynamic viscosity ($\text{Pa}\cdot\text{s}$)
- $c_i$ — species concentration
- $R_{ij}$ — reaction rates between species

#### Electromagnetics (Lithography, Plasma Physics)

**Maxwell's Equations:**

$$ \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} $$

$$ \nabla \times \mathbf{H} = \mathbf{J} + \frac{\partial \mathbf{D}}{\partial t} $$

**Hopkins Formulation for Partially Coherent Imaging:**

$$ I(\mathbf{x}) = \iint J(\mathbf{f}_1, \mathbf{f}_2) \tilde{O}(\mathbf{f}_1) \tilde{O}^*(\mathbf{f}_2) e^{2\pi i (\mathbf{f}_1 - \mathbf{f}_2) \cdot \mathbf{x}} \, d\mathbf{f}_1 \, d\mathbf{f}_2 $$

Where:

- $J(\mathbf{f}_1, \mathbf{f}_2)$ — mutual intensity (transmission cross-coefficient)
- $\tilde{O}(\mathbf{f})$ — Fourier transform of mask transmission function
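A worked instance of the constant-source diffusion profile above, assuming illustrative (uncalibrated) values for $C_s$, $D$, and $t$:

```python
import numpy as np
from scipy.special import erfc

# Constant-source profile: C(x,t) = Cs * erfc(x / (2*sqrt(D*t))).
Cs = 1e20        # surface concentration (atoms/cm^3), illustrative
D = 1e-13        # diffusivity at process temperature (cm^2/s), illustrative
t = 3600.0       # 1 hour drive-in (s)

x = np.linspace(0, 2e-4, 400)   # depth, 0..2 um (cm)
C = Cs * erfc(x / (2.0 * np.sqrt(D * t)))

# Estimate the junction depth where C drops to a 1e16 background level.
xj = x[np.searchsorted(-C, -1e16)]
print(f"junction depth ~ {xj * 1e4:.2f} um")
```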
### 2.2 Surface Evolution and Topography

Etching and deposition cause surfaces to evolve over time. The **Level Set Method** elegantly handles this:

$$ \frac{\partial \phi}{\partial t} + V_n |\nabla \phi| = 0 $$

Where:

- $\phi$ — level set function (surface defined by $\phi = 0$)
- $V_n$ — normal velocity determined by local etch/deposition rates

**Advantages:**

- Naturally handles topological changes (void formation, surface merging)
- No need for explicit surface tracking
- Handles complex geometries

**Etch Rate Models:**

- **Ion-enhanced etching:** $$ V_n = k_0 + k_1 \Gamma_{\text{ion}} + k_2 \Gamma_{\text{neutral}} $$
- **Visibility-dependent deposition:** $$ V_n = V_0 \cdot \Omega(\mathbf{x}) $$ where $\Omega(\mathbf{x})$ is the solid angle visible from point $\mathbf{x}$

## 3. Computational Methods

### 3.1 Discretization Approaches

#### Finite Element Methods (FEM)

FEM dominates stress/strain analysis, thermal modeling, and electromagnetic simulation. The **weak formulation** transforms strong-form PDEs into integral equations. For the heat equation $-\nabla \cdot (k \nabla T) = Q$:

$$ \int_\Omega \nabla w \cdot (k \nabla T) \, d\Omega = \int_\Omega w Q \, d\Omega + \int_{\Gamma_N} w q \, dS $$

Where:

- $w$ — test/weight function
- $\Omega$ — domain
- $\Gamma_N$ — Neumann boundary

**Galerkin Approximation:**

$$ T(\mathbf{x}) \approx \sum_{i=1}^{N} T_i N_i(\mathbf{x}) $$

Where $N_i(\mathbf{x})$ are shape functions and $T_i$ are nodal values.

#### Finite Difference Methods (FDM)

Efficient for regular geometries and time-dependent problems.

**Explicit Scheme (Forward Euler):**

$$ \frac{T_i^{n+1} - T_i^n}{\Delta t} = \alpha \frac{T_{i+1}^n - 2T_i^n + T_{i-1}^n}{\Delta x^2} $$

**Stability Condition (CFL):**

$$ \Delta t \leq \frac{\Delta x^2}{2\alpha} $$

**Implicit Scheme (Backward Euler):**

$$ \frac{T_i^{n+1} - T_i^n}{\Delta t} = \alpha \frac{T_{i+1}^{n+1} - 2T_i^{n+1} + T_{i-1}^{n+1}}{\Delta x^2} $$

- Unconditionally stable but requires solving linear systems
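A minimal sketch of the explicit scheme with a CFL-safe time step; the material values, grid, and boundary temperatures are illustrative:

```python
import numpy as np

# Explicit (forward-Euler) 1-D heat equation on a uniform grid.
alpha = 1e-5                  # thermal diffusivity (m^2/s), illustrative
L, nx = 0.01, 101             # 1 cm domain
dx = L / (nx - 1)
dt = 0.4 * dx**2 / alpha      # safely below the CFL bound dx^2 / (2*alpha)

T = np.full(nx, 300.0)        # initial temperature (K)
T[0] = T[-1] = 1000.0         # fixed hot boundaries (Dirichlet)

for _ in range(5000):
    # Second-difference Laplacian applied to interior points only.
    T[1:-1] += alpha * dt / dx**2 * (T[2:] - 2.0 * T[1:-1] + T[:-2])
```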
#### Monte Carlo Methods

Essential for stochastic processes, particularly **ion implantation**.

**Binary Collision Approximation (BCA):**

1. Sample impact parameter from screened Coulomb potential
2. Calculate scattering angle using: $$ \theta = \pi - 2 \int_{r_{\min}}^{\infty} \frac{b \, dr}{r^2 \sqrt{1 - \frac{V(r)}{E_{\text{CM}}} - \frac{b^2}{r^2}}} $$
3. Compute energy transfer: $$ T = \frac{4 M_1 M_2}{(M_1 + M_2)^2} E \sin^2\left(\frac{\theta}{2}\right) $$
4. Track recoils, vacancies, and interstitials
5. Accumulate statistics over $10^4 - 10^6$ ions

### 3.2 Multi-Scale Modeling

| Scale | Length | Time | Methods |
|:------|:-------|:-----|:--------|
| Quantum | 0.1–1 nm | fs | DFT, ab initio MD |
| Atomistic | 1–100 nm | ps–ns | Classical MD, Kinetic MC |
| Mesoscale | 100 nm–10 μm | μs–ms | Phase field, Continuum MC |
| Continuum | μm–mm | ms–hours | FEM, FDM, FVM |
| Equipment | cm–m | seconds–hours | CFD, Thermal/Mechanical |

**Information Flow Between Scales:**

- **Upscaling:** Parameters computed at lower scales inform higher-scale models
  - Reaction barriers from DFT → Kinetic Monte Carlo rates
  - Surface mobilities from MD → Continuum deposition models
- **Downscaling:** Boundary conditions and fields from higher scales
  - Temperature fields → Local reaction rates
  - Stress fields → Defect migration barriers

## 4. Optimization Frameworks

### 4.1 The General Problem Structure

Semiconductor process optimization typically takes the form:

$$ \min_{\mathbf{x} \in \mathcal{X}} f(\mathbf{x}) \quad \text{subject to} \quad g_i(\mathbf{x}) \leq 0, \quad h_j(\mathbf{x}) = 0 $$

Where:

- $\mathbf{x} \in \mathbb{R}^n$ — process parameters (temperatures, pressures, times, flows, powers)
- $f(\mathbf{x})$ — objective function (often negative yield or weighted combination)
- $g_i(\mathbf{x}) \leq 0$ — inequality constraints (equipment limits, process windows)
- $h_j(\mathbf{x}) = 0$ — equality constraints (design requirements)

**Typical Parameter Vector:**

$$ \mathbf{x} = \begin{bmatrix} T_1 \\ T_2 \\ P_{\text{chamber}} \\ t_{\text{process}} \\ \text{Flow}_{\text{gas1}} \\ \text{Flow}_{\text{gas2}} \\ \text{RF Power} \\ \vdots \end{bmatrix} $$

### 4.2 Response Surface Methodology (RSM)

Classical RSM builds polynomial surrogate models from designed experiments.

**Second-Order Model:**

$$ y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \sum_{j>i}^{k} \beta_{ij} x_i x_j + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \epsilon $$

**Matrix Form (fitted response):**

$$ \hat{y} = \beta_0 + \mathbf{x}^T \mathbf{b} + \mathbf{x}^T \mathbf{B} \mathbf{x} $$

Where:

- $\mathbf{b}$ — vector of linear coefficients
- $\mathbf{B}$ — matrix of quadratic and interaction coefficients

**Design of Experiments (DOE) Types:**

| Design Type | Runs for k Factors | Best For |
|:------------|:-------------------|:---------|
| Full Factorial | $2^k$ | Small k, all interactions |
| Fractional Factorial | $2^{k-p}$ | Screening, main effects |
| Central Composite | $2^k + 2k + n_c$ | Response surfaces |
| Box-Behnken | Varies | Quadratic models, efficient |

**Optimal Point (for quadratic model):**

$$ \mathbf{x}^* = -\frac{1}{2} \mathbf{B}^{-1} \mathbf{b} $$

### 4.3 Bayesian Optimization

For expensive black-box functions, Bayesian optimization is remarkably efficient.

**Gaussian Process Prior:**

$$ f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')) $$

**Common Kernels:**

- **Squared Exponential (RBF):** $$ k(\mathbf{x}, \mathbf{x}') = \sigma^2 \exp\left( -\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\ell^2} \right) $$
- **Matérn 5/2:** $$ k(\mathbf{x}, \mathbf{x}') = \sigma^2 \left(1 + \frac{\sqrt{5}r}{\ell} + \frac{5r^2}{3\ell^2}\right) \exp\left(-\frac{\sqrt{5}r}{\ell}\right) $$ where $r = \|\mathbf{x} - \mathbf{x}'\|$

**Posterior Distribution:**

Given observations $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$:

$$ \mu(\mathbf{x}^*) = \mathbf{k}_*^T (\mathbf{K} + \sigma_n^2 \mathbf{I})^{-1} \mathbf{y} $$

$$ \sigma^2(\mathbf{x}^*) = k(\mathbf{x}^*, \mathbf{x}^*) - \mathbf{k}_*^T (\mathbf{K} + \sigma_n^2 \mathbf{I})^{-1} \mathbf{k}_* $$

**Acquisition Functions:**

- **Expected Improvement (EI):** $$ \text{EI}(\mathbf{x}) = \mathbb{E}\left[\max(f(\mathbf{x}) - f^+, 0)\right] $$ Closed form: $$ \text{EI}(\mathbf{x}) = (\mu(\mathbf{x}) - f^+ - \xi) \Phi(Z) + \sigma(\mathbf{x}) \phi(Z) $$ where $Z = \frac{\mu(\mathbf{x}) - f^+ - \xi}{\sigma(\mathbf{x})}$
- **Upper Confidence Bound (UCB):** $$ \text{UCB}(\mathbf{x}) = \mu(\mathbf{x}) + \kappa \sigma(\mathbf{x}) $$
- **Probability of Improvement (PI):** $$ \text{PI}(\mathbf{x}) = \Phi\left(\frac{\mu(\mathbf{x}) - f^+ - \xi}{\sigma(\mathbf{x})}\right) $$
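A small sketch of the closed-form EI above, given GP posterior means and standard deviations (all values illustrative):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """Closed-form EI for maximization, from GP posterior mean/std."""
    sigma = np.maximum(sigma, 1e-12)   # guard against zero variance
    z = (mu - f_best - xi) / sigma
    return (mu - f_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# Rank a batch of candidate recipes by EI and pick the next experiment.
mu = np.array([0.91, 0.88, 0.93])      # posterior mean yields (illustrative)
sigma = np.array([0.01, 0.05, 0.02])   # posterior std devs (illustrative)
next_idx = np.argmax(expected_improvement(mu, sigma, f_best=0.92))
```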
### 4.4 Metaheuristic Methods

For highly non-convex, multimodal optimization landscapes.

#### Genetic Algorithms (GA)

**Algorithmic Steps:**

1. **Initialize** population of $N$ candidate solutions
2. **Evaluate** fitness $f(\mathbf{x}_i)$ for each individual
3. **Select** parents using tournament/roulette wheel selection
4. **Crossover** to create offspring:
   - Single-point: $\mathbf{x}_{\text{child}} = [\mathbf{x}_1(1:c), \mathbf{x}_2(c+1:n)]$
   - Blend: $\mathbf{x}_{\text{child}} = \alpha \mathbf{x}_1 + (1-\alpha) \mathbf{x}_2$
5. **Mutate** with probability $p_m$: $x_i' = x_i + \mathcal{N}(0, \sigma^2)$
6. **Replace** population and repeat

#### Particle Swarm Optimization (PSO)

**Update Equations:**

$$ \mathbf{v}_i^{t+1} = \omega \mathbf{v}_i^t + c_1 r_1 (\mathbf{p}_i - \mathbf{x}_i^t) + c_2 r_2 (\mathbf{g} - \mathbf{x}_i^t) $$

$$ \mathbf{x}_i^{t+1} = \mathbf{x}_i^t + \mathbf{v}_i^{t+1} $$

Where:

- $\omega$ — inertia weight (typically 0.4–0.9)
- $c_1, c_2$ — cognitive and social parameters (typically ~2.0)
- $\mathbf{p}_i$ — personal best position
- $\mathbf{g}$ — global best position
- $r_1, r_2$ — random numbers in $[0, 1]$

#### Simulated Annealing (SA)

**Acceptance Probability:**

$$ P(\text{accept}) = \begin{cases} 1 & \text{if } \Delta E < 0 \\ \exp\left(-\frac{\Delta E}{k_B T}\right) & \text{if } \Delta E \geq 0 \end{cases} $$

**Cooling Schedule:**

$$ T_{k+1} = \alpha T_k \quad \text{(geometric, } \alpha \approx 0.95\text{)} $$
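A compact PSO sketch over a normalized recipe space implementing the update equations above; the objective and all coefficients are illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)

def objective(x):                      # placeholder process-response surface
    return np.sum((x - 0.3)**2, axis=-1)

n_particles, dim = 20, 4
w, c1, c2 = 0.7, 1.5, 1.5              # inertia, cognitive, social weights

x = rng.random((n_particles, dim))     # positions in normalized recipe space
v = np.zeros_like(x)
pbest, pbest_f = x.copy(), objective(x)
gbest = pbest[np.argmin(pbest_f)]

for _ in range(100):
    r1, r2 = rng.random((2, n_particles, dim))
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    x = np.clip(x + v, 0.0, 1.0)       # keep within parameter bounds
    f = objective(x)
    better = f < pbest_f
    pbest[better], pbest_f[better] = x[better], f[better]
    gbest = pbest[np.argmin(pbest_f)]
```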
### 4.5 Multi-Objective Optimization

Real optimization involves trade-offs between competing objectives.

**Multi-Objective Problem:**

$$ \min_{\mathbf{x}} \mathbf{F}(\mathbf{x}) = \begin{bmatrix} f_1(\mathbf{x}) \\ f_2(\mathbf{x}) \\ \vdots \\ f_m(\mathbf{x}) \end{bmatrix} $$

**Pareto Dominance:**

Solution $\mathbf{x}_1$ dominates $\mathbf{x}_2$ (written $\mathbf{x}_1 \prec \mathbf{x}_2$) if:

- $f_i(\mathbf{x}_1) \leq f_i(\mathbf{x}_2)$ for all $i$
- $f_j(\mathbf{x}_1) < f_j(\mathbf{x}_2)$ for at least one $j$

**NSGA-II Algorithm:**

1. Non-dominated sorting to assign ranks
2. Crowding distance calculation: $$ d_i = \sum_{m=1}^{M} \frac{f_m^{i+1} - f_m^{i-1}}{f_m^{\max} - f_m^{\min}} $$
3. Selection based on rank and crowding distance
4. Standard crossover and mutation

### 4.6 Robust Optimization

Manufacturing variability is inevitable. Robust optimization explicitly accounts for it.

**Mean-Variance Formulation:**

$$ \min_{\mathbf{x}} \mathbb{E}_\xi[f(\mathbf{x}, \xi)] + \lambda \cdot \text{Var}_\xi[f(\mathbf{x}, \xi)] $$

**Minimax (Worst-Case) Formulation:**

$$ \min_{\mathbf{x}} \max_{\xi \in \mathcal{U}} f(\mathbf{x}, \xi) $$

**Chance-Constrained Formulation:**

$$ \min_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad P(g(\mathbf{x}, \xi) \leq 0) \geq 1 - \alpha $$

**Taguchi Signal-to-Noise Ratios:**

- **Smaller-is-better:** $\text{SNR} = -10 \log_{10}\left(\frac{1}{n}\sum_{i=1}^{n} y_i^2\right)$
- **Larger-is-better:** $\text{SNR} = -10 \log_{10}\left(\frac{1}{n}\sum_{i=1}^{n} \frac{1}{y_i^2}\right)$
- **Nominal-is-best:** $\text{SNR} = 10 \log_{10}\left(\frac{\bar{y}^2}{s^2}\right)$

## 5. Advanced Topics and Modern Approaches

### 5.1 Physics-Informed Neural Networks (PINNs)

PINNs embed physical laws directly into neural network training.

**Loss Function:**

$$ \mathcal{L} = \mathcal{L}_{\text{data}} + \lambda \mathcal{L}_{\text{physics}} + \gamma \mathcal{L}_{\text{BC}} $$

Where:

$$ \mathcal{L}_{\text{data}} = \frac{1}{N_d} \sum_{i=1}^{N_d} |u_\theta(\mathbf{x}_i) - u_i|^2 $$

$$ \mathcal{L}_{\text{physics}} = \frac{1}{N_p} \sum_{j=1}^{N_p} |\mathcal{N}[u_\theta(\mathbf{x}_j)]|^2 $$

$$ \mathcal{L}_{\text{BC}} = \frac{1}{N_b} \sum_{k=1}^{N_b} |\mathcal{B}[u_\theta(\mathbf{x}_k)] - g_k|^2 $$

**Example: Heat Equation PINN**

For $\frac{\partial T}{\partial t} = \alpha \nabla^2 T$:

$$ \mathcal{L}_{\text{physics}} = \frac{1}{N_p} \sum_{j=1}^{N_p} \left| \frac{\partial T_\theta}{\partial t} - \alpha \nabla^2 T_\theta \right|^2_{\mathbf{x}_j, t_j} $$

**Advantages:**

- Dramatically reduced data requirements
- Physical consistency guaranteed
- Effective for inverse problems

### 5.2 Digital Twins and Real-Time Optimization

A digital twin is a continuously updated simulation model of the physical process.

**Kalman Filter for State Estimation:**

**Prediction Step:**

$$ \hat{\mathbf{x}}_{k|k-1} = \mathbf{F}_k \hat{\mathbf{x}}_{k-1|k-1} + \mathbf{B}_k \mathbf{u}_k $$

$$ \mathbf{P}_{k|k-1} = \mathbf{F}_k \mathbf{P}_{k-1|k-1} \mathbf{F}_k^T + \mathbf{Q}_k $$

**Update Step:**

$$ \mathbf{K}_k = \mathbf{P}_{k|k-1} \mathbf{H}_k^T (\mathbf{H}_k \mathbf{P}_{k|k-1} \mathbf{H}_k^T + \mathbf{R}_k)^{-1} $$

$$ \hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k (\mathbf{z}_k - \mathbf{H}_k \hat{\mathbf{x}}_{k|k-1}) $$

$$ \mathbf{P}_{k|k} = (\mathbf{I} - \mathbf{K}_k \mathbf{H}_k) \mathbf{P}_{k|k-1} $$

**Run-to-Run Control:**

$$ \mathbf{u}_{k+1} = \mathbf{u}_k + \mathbf{G} (\mathbf{y}_{\text{target}} - \hat{\mathbf{y}}_k) $$

Where $\mathbf{G}$ is the controller gain matrix.
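A minimal NumPy sketch of one predict/update cycle of the filter above (the control-input term is omitted, and the toy drift-tracking values are illustrative):

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of the linear Kalman filter (no control input)."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)              # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Toy example: track a slowly drifting scalar chamber state from noisy readings.
F = H = np.eye(1)
Q, R = np.array([[1e-4]]), np.array([[1e-2]])   # process / measurement noise
x, P = np.zeros(1), np.eye(1)
for z in [0.10, 0.12, 0.15, 0.11]:
    x, P = kalman_step(x, P, np.atleast_1d(z), F, H, Q, R)
```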
### 5.3 Machine Learning for Virtual Metrology

**Virtual Metrology Model:**

$$ \hat{y} = f_{\text{ML}}(\mathbf{x}_{\text{sensor}}, \mathbf{x}_{\text{recipe}}, \mathbf{x}_{\text{context}}) $$

Where:

- $\mathbf{x}_{\text{sensor}}$ — in-situ sensor data (OES, RF impedance, etc.)
- $\mathbf{x}_{\text{recipe}}$ — process recipe parameters
- $\mathbf{x}_{\text{context}}$ — chamber state, maintenance history

**Domain Adaptation Challenge:**

$$ \mathcal{L}_{\text{total}} = \mathcal{L}_{\text{task}} + \lambda \mathcal{L}_{\text{domain}} $$

Using adversarial training to minimize distribution shift between chambers.

### 5.4 Reinforcement Learning for Sequential Decisions

**Markov Decision Process (MDP) Formulation:**

- **State** $s$: Current wafer/chamber conditions
- **Action** $a$: Recipe adjustments
- **Reward** $r$: Yield, throughput, quality metrics
- **Transition** $P(s'|s, a)$: Process dynamics

**Policy Gradient (REINFORCE):**

$$ \nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta} \left[ \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t|s_t) \cdot G_t \right] $$

Where $G_t = \sum_{k=t}^{T} \gamma^{k-t} r_k$ is the return.

## 6. Specific Process Case Studies

### 6.1 Lithography: Computational Imaging and OPC

**Optical Proximity Correction Optimization:**

$$ \mathbf{m}^* = \arg\min_{\mathbf{m}} \|\mathbf{T}_{\text{target}} - \mathbf{I}(\mathbf{m})\|^2 + R(\mathbf{m}) $$

Where:

- $\mathbf{m}$ — mask transmission function
- $\mathbf{I}(\mathbf{m})$ — forward imaging model
- $R(\mathbf{m})$ — regularization (manufacturability, minimum features)

**Aerial Image Formation (Scalar Model):**

$$ I(x, y) = \left| \int_{-\text{NA}}^{\text{NA}} \tilde{M}(f_x) H(f_x) e^{2\pi i f_x x} df_x \right|^2 $$

**Source-Mask Optimization (SMO):**

$$ \min_{\mathbf{m}, \mathbf{s}} \sum_{p} \|I_p(\mathbf{m}, \mathbf{s}) - T_p\|^2 + \lambda_m R_m(\mathbf{m}) + \lambda_s R_s(\mathbf{s}) $$

Jointly optimizing mask pattern and illumination source.

### 6.2 CMP: Pattern-Dependent Modeling

**Preston Equation:**

$$ \frac{dz}{dt} = K_p \cdot p \cdot V $$

Where:

- $K_p$ — Preston coefficient (material-dependent)
- $p$ — local pressure
- $V$ — relative velocity

**Pattern-Dependent Pressure Model:**

$$ p_{\text{eff}}(x, y) = p_{\text{applied}} \cdot \frac{1}{\rho(x, y) * K(x, y)} $$

Where $\rho(x, y)$ is the local pattern density and $*$ denotes convolution with a planarization kernel $K$.

**Step Height Evolution:**

$$ \frac{d(\Delta z)}{dt} = K_p V (p_{\text{high}} - p_{\text{low}}) $$

### 6.3 Plasma Etching: Plasma-Surface Interactions

**Species Balance in Plasma:**

$$ \frac{dn_i}{dt} = \sum_j k_{ji} n_j n_e - \sum_k k_{ik} n_i n_e - \frac{n_i}{\tau_{\text{res}}} + S_i $$

Where:

- $n_i$ — density of species $i$
- $k_{ji}$ — rate coefficients (Arrhenius form)
- $\tau_{\text{res}}$ — residence time
- $S_i$ — source terms

**Ion Energy Distribution Function:**

$$ f(E) = \frac{1}{\sqrt{2\pi}\sigma_E} \exp\left(-\frac{(E - \bar{E})^2}{2\sigma_E^2}\right) $$

**Etch Yield:**

$$ Y(E, \theta) = Y_0 \cdot \sqrt{E - E_{\text{th}}} \cdot f(\theta) $$

Where $f(\theta)$ is the angular dependence.

## 7. The Mathematics of Yield

**Poisson Defect Model:**

$$ Y = e^{-D \cdot A} $$

Where:

- $D$ — defect density ($\text{defects/cm}^2$)
- $A$ — chip area ($\text{cm}^2$)

**Negative Binomial (Clustered Defects):**

$$ Y = \left(1 + \frac{DA}{\alpha}\right)^{-\alpha} $$

Where $\alpha$ is the clustering parameter (smaller = more clustered).

**Parametric Yield:**

For a parameter with distribution $p(\theta)$ and specification $[\theta_{\min}, \theta_{\max}]$:

$$ Y_{\text{param}} = \int_{\theta_{\min}}^{\theta_{\max}} p(\theta) \, d\theta $$

For Gaussian distribution:

$$ Y_{\text{param}} = \Phi\left(\frac{\theta_{\max} - \mu}{\sigma}\right) - \Phi\left(\frac{\theta_{\min} - \mu}{\sigma}\right) $$

**Process Capability Index:**

$$ C_{pk} = \min\left(\frac{\mu - \text{LSL}}{3\sigma}, \frac{\text{USL} - \mu}{3\sigma}\right) $$

**Total Yield:**

$$ Y_{\text{total}} = Y_{\text{defect}} \times Y_{\text{parametric}} \times Y_{\text{test}} $$
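A small numeric sketch of these yield formulas; the defect density, die area, and spec limits are illustrative:

```python
import numpy as np
from scipy.stats import norm

D, A = 0.1, 1.2     # defects/cm^2 and die area (cm^2), illustrative

y_poisson = np.exp(-D * A)                     # Poisson defect yield
alpha = 2.0                                    # clustering parameter
y_negbin = (1.0 + D * A / alpha) ** (-alpha)   # clustered-defect yield

# Parametric yield for a Gaussian parameter against spec limits.
mu, sigma, lsl, usl = 50.0, 1.5, 46.0, 54.0
y_param = norm.cdf((usl - mu) / sigma) - norm.cdf((lsl - mu) / sigma)
cpk = min(mu - lsl, usl - mu) / (3.0 * sigma)

print(f"Poisson {y_poisson:.3f}, NB {y_negbin:.3f}, "
      f"parametric {y_param:.4f}, Cpk {cpk:.2f}")
```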
## 8. Open Challenges

1. **High-Dimensional Optimization**
   - Hundreds to thousands of interacting parameters
   - Curse of dimensionality in sampling-based methods
   - Need for effective dimensionality reduction
2. **Uncertainty Quantification**
   - Error propagation across model hierarchies
   - Aleatory vs. epistemic uncertainty separation
   - Confidence bounds on predictions
3. **Data Scarcity**
   - Each experimental data point costs \$1000+
   - Models must learn from small datasets
   - Transfer learning between processes/tools
4. **Interpretability**
   - Black-box models limit root cause analysis
   - Need for physics-informed feature engineering
   - Explainable AI for process engineering
5. **Real-Time Constraints**
   - Run-to-run control requires millisecond decisions
   - Reduced-order models needed
   - Edge computing for in-situ optimization
6. **Integration Complexity**
   - Multiple physics domains coupled
   - Full-flow optimization across 500+ steps
   - Design-technology co-optimization

## 9. Optimization Summary

Semiconductor manufacturing process optimization represents one of the most sophisticated applications of computational mathematics in industry. It integrates:

- **Classical numerical methods** (FEM, FDM, Monte Carlo)
- **Statistical modeling** (DOE, RSM, uncertainty quantification)
- **Optimization theory** (convex/non-convex, single/multi-objective, deterministic/robust)
- **Machine learning** (neural networks, Gaussian processes, reinforcement learning)
- **Control theory** (Kalman filtering, run-to-run control, MPC)

The field continues to evolve as feature sizes shrink toward atomic scales, process complexity grows, and computational capabilities expand. Success requires not just mathematical sophistication but deep physical intuition about the processes being modeled—the best work reflects genuine synthesis across disciplines.

optimization inversion, multimodal ai

Optimization-based inversion iteratively refines latent codes to minimize reconstruction error.

optimization under uncertainty, digital manufacturing

Optimize while accounting for process variability.

optimization-based inversion, generative models

Optimize a latent code so its decoded output matches a given image.

orchestrator,router,multi-model

An orchestrator routes each request to the best model or tool (cheap vs. expensive, code vs. chat) and can chain multiple steps into a workflow.
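A toy routing sketch; the model names and heuristics are purely illustrative:

```python
def route(request: str) -> str:
    """Toy router: pick a model tier from simple request features."""
    if "```" in request or "def " in request:
        return "code-model"     # code-heavy request -> specialized model
    if len(request) > 2000:
        return "large-model"    # long context -> expensive tier
    return "small-model"        # default cheap tier
```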

orthogonal convolutions, ai safety

Constrain convolutions to be orthogonal.

otter,multimodal ai

Multi-modal instruction tuning.

out-of-distribution, ai safety

Out-of-distribution inputs differ from the training data, potentially causing model failures.