metaformer,llm architecture
Abstract framework for transformer-like architectures.
288 technical terms and definitions
Learn good initialization via meta-learning.
# Mathematical Modeling of Metal Deposition in Semiconductor Manufacturing
## 1. Overview: Metal Deposition Processes
Metal deposition is a critical step in semiconductor fabrication, creating interconnects, contacts, barrier layers, and various metallic structures. The primary deposition methods require distinct mathematical treatments:
| Process | Physics Domain | Key Mathematics |
|---------|----------------|-----------------|
| **PVD (Sputtering)** | Ballistic transport, plasma physics | Boltzmann transport, Monte Carlo |
| **CVD/PECVD** | Gas-phase transport, surface reactions | Navier-Stokes, reaction-diffusion |
| **ALD** | Self-limiting surface chemistry | Site-balance kinetics |
| **Electroplating (ECD)** | Electrochemistry, mass transport | Butler-Volmer, Nernst-Planck |
## 2. Transport Phenomena Models
### 2.1 Gas-Phase Transport (CVD/PECVD)
The precursor concentration field follows the **convection-diffusion-reaction equation**:
$$
\frac{\partial C}{\partial t} + \mathbf{v} \cdot \nabla C = D \nabla^2 C + R_{gas}
$$
Where:
- $C$ — precursor concentration (mol/m³)
- $\mathbf{v}$ — velocity field vector (m/s)
- $D$ — diffusion coefficient (m²/s)
- $R_{gas}$ — gas-phase reaction source term (mol/m³·s)
### 2.2 Flow Field Equations
The **incompressible Navier-Stokes equations** govern the velocity field:
$$
\rho \left( \frac{\partial \mathbf{v}}{\partial t} + \mathbf{v} \cdot \nabla \mathbf{v} \right) = -\nabla p + \mu \nabla^2 \mathbf{v}
$$
With continuity equation:
$$
\nabla \cdot \mathbf{v} = 0
$$
Where:
- $\rho$ — gas density (kg/m³)
- $p$ — pressure (Pa)
- $\mu$ — dynamic viscosity (Pa·s)
### 2.3 Knudsen Number and Transport Regimes
At low pressures, the **Knudsen number** determines the transport regime:
$$
Kn = \frac{\lambda}{L} = \frac{k_B T}{\sqrt{2} \pi d^2 p L}
$$
Where:
- $\lambda$ — mean free path (m)
- $L$ — characteristic length (m)
- $k_B$ — Boltzmann constant ($1.38 \times 10^{-23}$ J/K)
- $T$ — temperature (K)
- $d$ — molecular diameter (m)
- $p$ — pressure (Pa)
**Transport regime classification:**
- $Kn < 0.01$ — **Continuum regime** → Navier-Stokes CFD
- $0.01 < Kn < 0.1$ — **Slip flow regime** → Modified NS with slip boundary conditions
- $0.1 < Kn < 10$ — **Transitional regime** → DSMC, Boltzmann equation
- $Kn > 10$ — **Free molecular regime** → Ballistic/Monte Carlo methods
## 3. Surface Reaction Kinetics
### 3.1 Langmuir-Hinshelwood Mechanism
For bimolecular surface reactions (common in CVD):
$$
r = \frac{k \cdot K_A K_B \cdot p_A p_B}{(1 + K_A p_A + K_B p_B)^2}
$$
Where:
- $r$ — reaction rate (mol/m²·s)
- $k$ — surface reaction rate constant (mol/m²·s)
- $K_A, K_B$ — adsorption equilibrium constants (Pa⁻¹)
- $p_A, p_B$ — partial pressures of reactants A and B (Pa)
### 3.2 Sticking Coefficient Model
The probability that an impinging molecule adsorbs on the surface:
$$
S = S_0 \exp\left( -\frac{E_a}{k_B T} \right) \cdot f(\theta)
$$
Where:
- $S$ — sticking coefficient (dimensionless)
- $S_0$ — pre-exponential sticking factor
- $E_a$ — activation energy (J)
- $f(\theta) = (1 - \theta)^n$ — site blocking function
- $\theta$ — surface coverage (dimensionless, 0 to 1)
- $n$ — order of site blocking
### 3.3 Arrhenius Temperature Dependence
$$
k(T) = A \exp\left( -\frac{E_a}{RT} \right)
$$
Where:
- $A$ — pre-exponential factor (frequency factor)
- $E_a$ — activation energy (J/mol)
- $R$ — universal gas constant (8.314 J/mol·K)
- $T$ — absolute temperature (K)
## 4. Film Growth Models
### 4.1 Continuum Surface Evolution
#### Edwards-Wilkinson Equation (Linear Growth)
$$
\frac{\partial h}{\partial t} = \nu \nabla^2 h + F + \eta(\mathbf{x}, t)
$$
#### Kardar-Parisi-Zhang (KPZ) Equation (Nonlinear Growth)
$$
\frac{\partial h}{\partial t} = \nu \nabla^2 h + \frac{\lambda}{2} |\nabla h|^2 + F + \eta
$$
Where:
- $h(\mathbf{x}, t)$ — surface height at position $\mathbf{x}$ and time $t$
- $\nu$ — surface diffusion coefficient (m²/s)
- $\lambda$ — nonlinear growth parameter
- $F$ — mean deposition flux (m/s)
- $\eta$ — stochastic noise term (Gaussian white noise)
### 4.2 Scaling Relations
Surface roughness evolves according to:
$$
W(L, t) = L^\alpha f\left( \frac{t}{L^z} \right)
$$
Where:
- $W$ — interface width (roughness)
- $L$ — system size
- $\alpha$ — roughness exponent
- $z$ — dynamic exponent
- $f$ — scaling function
## 5. Step Coverage and Conformality
### 5.1 Thiele Modulus
For high-aspect-ratio features, the **Thiele modulus** determines conformality:
$$
\phi = L \sqrt{\frac{2 k_s}{w D_{eff}}}
$$
Where:
- $\phi$ — Thiele modulus (dimensionless)
- $L$ — feature depth (m)
- $w$ — feature width (m)
- $k_s$ — surface reaction rate constant (m/s)
- $D_{eff}$ — effective diffusivity (m²/s)
**Step coverage regimes:**
- $\phi \ll 1$ — **Reaction-limited** → Excellent conformality
- $\phi \gg 1$ — **Transport-limited** → Poor step coverage (bread-loafing)
### 5.2 Knudsen Diffusion in Trenches
$$
D_K = \frac{w}{3} \sqrt{\frac{8 R T}{\pi M}}
$$
Where:
- $D_K$ — Knudsen diffusion coefficient (m²/s)
- $w$ — trench width (m)
- $R$ — universal gas constant (J/mol·K)
- $T$ — temperature (K)
- $M$ — molecular weight (kg/mol)
### 5.3 Feature-Scale Concentration Profile
Solving for concentration in a trench with reactive walls:
$$
D_{eff} \frac{d^2 C}{dy^2} = \frac{2 k_s C}{w}
$$
Solution for a fixed concentration $C_0$ at the trench opening ($y = 0$) and zero flux at the trench bottom ($y = L$):
$$
C(y) = C_0 \frac{\cosh\left( \phi \frac{L - y}{L} \right)}{\cosh(\phi)}
$$
## 6. Atomic Layer Deposition (ALD) Models
### 6.1 Self-Limiting Surface Kinetics
Surface site balance equation:
$$
\frac{d\theta}{dt} = k_a C (1 - \theta) - k_d \theta
$$
Where:
- $\theta$ — fractional surface coverage
- $k_a$ — adsorption rate constant (m³/mol·s)
- $k_d$ — desorption rate constant (s⁻¹)
- $C$ — gas-phase precursor concentration (mol/m³)
At equilibrium saturation:
$$
\theta_{eq} = \frac{k_a C}{k_a C + k_d} \approx 1 \quad \text{(for strong chemisorption)}
$$
### 6.2 Growth Per Cycle (GPC)
$$
\text{GPC} = \Gamma_0 \cdot \Omega \cdot \eta
$$
Where:
- $\Gamma_0$ — surface site density (sites/m²)
- $\Omega$ — volume per deposited atom (m³)
- $\eta$ — reaction efficiency (dimensionless)
### 6.3 Saturation Dose-Time Relationship
$$
\theta(t) = 1 - \exp\left( -\frac{S \cdot \Phi \cdot t}{\Gamma_0} \right)
$$
**Impingement flux** from kinetic theory:
$$
\Phi = \frac{p}{\sqrt{2 \pi m k_B T}}
$$
Where:
- $\Phi$ — molecular impingement flux (molecules/m²·s)
- $p$ — precursor partial pressure (Pa)
- $m$ — molecular mass (kg)
## 7. Plasma Modeling (PVD/PECVD)
### 7.1 Plasma Sheath Physics
**Child-Langmuir law** for ion current density:
$$
J_{ion} = \frac{4 \varepsilon_0}{9} \sqrt{\frac{2e}{M_i}} \frac{V_s^{3/2}}{d_s^2}
$$
Where:
- $J_{ion}$ — ion current density (A/m²)
- $\varepsilon_0$ — vacuum permittivity ($8.85 \times 10^{-12}$ F/m)
- $e$ — elementary charge ($1.6 \times 10^{-19}$ C)
- $M_i$ — ion mass (kg)
- $V_s$ — sheath voltage (V)
- $d_s$ — sheath thickness (m)
### 7.2 Ion Energy at Substrate
$$
\varepsilon_{ion} \approx e V_s + \frac{1}{2} M_i v_{Bohm}^2
$$
**Bohm velocity:**
$$
v_{Bohm} = \sqrt{\frac{k_B T_e}{M_i}}
$$
Where:
- $T_e$ — electron temperature (K or eV)
### 7.3 Sputtering Yield (Sigmund Formula)
$$
Y(E) = \frac{3 \alpha}{4 \pi^2} \cdot \frac{4 M_1 M_2}{(M_1 + M_2)^2} \cdot \frac{E}{U_0}
$$
Where:
- $Y$ — sputtering yield (atoms/ion)
- $\alpha$ — dimensionless factor (~0.2–0.4)
- $M_1$ — incident ion mass
- $M_2$ — target atom mass
- $E$ — incident ion energy (eV)
- $U_0$ — surface binding energy (eV)
### 7.4 Electron Energy Distribution Function (EEDF)
The EEDF follows from the electron Boltzmann equation:
$$
\frac{\partial f}{\partial t} + \mathbf{v} \cdot \nabla f - \frac{e \mathbf{E}}{m_e} \cdot \nabla_v f = C[f]
$$
Where:
- $f$ — electron distribution function
- $\mathbf{E}$ — electric field
- $m_e$ — electron mass
- $C[f]$ — collision integral
## 8. MDP: Markov Decision Process for Process Control
### 8.1 MDP Formulation
A Markov Decision Process is defined by the tuple:
$$
\mathcal{M} = (S, A, P, R, \gamma)
$$
**Components in semiconductor context:**
- **State space $S$**: Film thickness, resistivity, uniformity, equipment state, wafer position
- **Action space $A$**: Temperature, pressure, flow rates, RF power, deposition time
- **Transition probability $P(s' | s, a)$**: Stochastic process model
- **Reward function $R(s, a)$**: Yield, uniformity, throughput, quality metrics
- **Discount factor $\gamma$**: Time preference (typically 0.9–0.99)
### 8.2 Bellman Optimality Equation
$$
V^*(s) = \max_{a \in A} \left[ R(s, a) + \gamma \sum_{s'} P(s' | s, a) V^*(s') \right]
$$
**Q-function formulation:**
$$
Q^*(s, a) = R(s, a) + \gamma \sum_{s'} P(s' | s, a) \max_{a'} Q^*(s', a')
$$
### 8.3 Run-to-Run (R2R) Control
Optimal recipe adjustment after each wafer:
$$
\mathbf{u}_{k+1} = \mathbf{u}_k + \mathbf{K} (\mathbf{y}_{target} - \mathbf{y}_k)
$$
Where:
- $\mathbf{u}_k$ — process recipe parameters at run $k$
- $\mathbf{y}_k$ — measured output at run $k$
- $\mathbf{K}$ — controller gain matrix (from MDP policy optimization)
### 8.4 Reinforcement Learning Approaches
| Method | Application | Characteristics |
|--------|-------------|-----------------|
| **Q-Learning** | Discrete parameter optimization | Model-free, tabular |
| **Deep Q-Network (DQN)** | High-dimensional state spaces | Neural network approximation |
| **Policy Gradient** | Continuous process control | Direct policy optimization |
| **Actor-Critic (A2C/PPO)** | Complex control tasks | Combined value and policy |
| **Model-Based RL** | Physics-informed control | Sample efficient |
## 9. Electrochemical Deposition (Copper Damascene)
### 9.1 Butler-Volmer Equation
$$
i = i_0 \left[ \exp\left( \frac{\alpha_a F \eta}{RT} \right) - \exp\left( -\frac{\alpha_c F \eta}{RT} \right) \right]
$$
Where:
- $i$ — current density (A/m²)
- $i_0$ — exchange current density (A/m²)
- $\alpha_a, \alpha_c$ — anodic and cathodic transfer coefficients
- $F$ — Faraday constant (96,485 C/mol)
- $\eta = E - E_{eq}$ — overpotential (V)
- $R$ — gas constant (J/mol·K)
- $T$ — temperature (K)
### 9.2 Mass Transport Limited Current
$$
i_L = \frac{n F D C_b}{\delta}
$$
Where:
- $i_L$ — limiting current density (A/m²)
- $n$ — number of electrons transferred
- $D$ — diffusion coefficient of Cu²⁺ (m²/s)
- $C_b$ — bulk concentration (mol/m³)
- $\delta$ — diffusion layer thickness (m)
### 9.3 Nernst-Planck Equation
$$
\mathbf{J}_i = -D_i \nabla C_i - \frac{z_i F D_i}{RT} C_i \nabla \phi + C_i \mathbf{v}
$$
Where:
- $\mathbf{J}_i$ — flux of species $i$
- $z_i$ — charge number
- $\phi$ — electric potential
### 9.4 Superfilling (Bottom-Up Fill)
The curvature-enhanced accelerator mechanism:
$$
v_n = v_0 (1 + \kappa \cdot \Gamma_{acc})
$$
Where:
- $v_n$ — local growth velocity normal to surface
- $v_0$ — baseline growth velocity
- $\kappa$ — local surface curvature (1/m)
- $\Gamma_{acc}$ — accelerator surface concentration
## 10. Multiscale Modeling Framework
### 10.1 Hierarchical Scale Integration
```
┌──────────────────────────────────────────────────────────────┐
│                        REACTOR SCALE                         │
│            CFD: Flow, temperature, concentration             │
│                  Time: seconds | Length: cm                  │
└─────────────────────────┬────────────────────────────────────┘
                          │ Boundary fluxes
                          ▼
┌──────────────────────────────────────────────────────────────┐
│                        FEATURE SCALE                         │
│       Level-set / String method for surface evolution        │
│                  Time: seconds | Length: μm                  │
└─────────────────────────┬────────────────────────────────────┘
                          │ Local rates
                          ▼
┌──────────────────────────────────────────────────────────────┐
│                       MESOSCALE (kMC)                        │
│        Kinetic Monte Carlo: nucleation, island growth        │
│                    Time: ms | Length: nm                     │
└─────────────────────────┬────────────────────────────────────┘
                          │ Rate parameters
                          ▼
┌──────────────────────────────────────────────────────────────┐
│                      ATOMISTIC (MD/DFT)                      │
│       Molecular dynamics, ab initio: binding energies,       │
│              diffusion barriers, reaction paths              │
│                     Time: ps | Length: Å                     │
└──────────────────────────────────────────────────────────────┘
```
### 10.2 Kinetic Monte Carlo (kMC)
Event rate from transition state theory:
$$
k_i = \nu_0 \exp\left( -\frac{E_{a,i}}{k_B T} \right)
$$
Total rate and time step:
$$
k_{total} = \sum_i k_i, \quad \Delta t = -\frac{\ln(r)}{k_{total}}
$$
Where $r \in (0, 1]$ is a uniform random number.
### 10.3 Molecular Dynamics
Newton's equations of motion:
$$
m_i \frac{d^2 \mathbf{r}_i}{dt^2} = -\nabla_i U(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N)
$$
**Lennard-Jones potential:**
$$
U_{LJ}(r) = 4\varepsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^6 \right]
$$
**Embedded Atom Method (EAM) for metals:**
$$
U = \sum_i F_i(\rho_i) + \frac{1}{2} \sum_{i \neq j} \phi_{ij}(r_{ij})
$$
Where $\rho_i = \sum_{j \neq i} f_j(r_{ij})$ is the electron density at atom $i$.
## 11. Uniformity Modeling
### 11.1 Wafer-Scale Thickness Distribution (Sputtering)
For a circular magnetron target:
$$
t(r) = \int_{target} \frac{Y \cdot J_{ion} \cdot \cos\theta_t \cdot \cos\theta_w}{\pi R^2} \, dA
$$
Where:
- $t(r)$ — thickness at radial position $r$
- $\theta_t$ — emission angle from target
- $\theta_w$ — incidence angle at wafer
- $R$ — distance from the target element $dA$ to the wafer point
### 11.2 Uniformity Metrics
**Within-Wafer Uniformity (WIW):**
$$
\sigma_{WIW} = \frac{1}{\bar{t}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} (t_i - \bar{t})^2} \times 100\%
$$
**Wafer-to-Wafer Uniformity (WTW):**
$$
\sigma_{WTW} = \frac{1}{\bar{t}_{avg}} \sqrt{\frac{1}{M} \sum_{j=1}^{M} (\bar{t}_j - \bar{t}_{avg})^2} \times 100\%
$$
**Target specifications:**
- $\sigma_{WIW} < 1\%$ for advanced nodes (≤7 nm)
- $\sigma_{WTW} < 0.5\%$ for high-volume manufacturing
## 12. Virtual Metrology and Statistical Models
### 12.1 Gaussian Process Regression (GPR)
$$
f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))
$$
**Squared exponential (RBF) kernel:**
$$
k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\left( -\frac{|\mathbf{x} - \mathbf{x}'|^2}{2\ell^2} \right)
$$
**Predictive distribution:**
$$
f_* | \mathbf{X}, \mathbf{y}, \mathbf{x}_* \sim \mathcal{N}(\bar{f}_*, \text{var}(f_*))
$$
### 12.2 Partial Least Squares (PLS)
$$
\mathbf{Y} = \mathbf{X} \mathbf{B} + \mathbf{E}
$$
Where:
- $\mathbf{X}$ — process parameter matrix
- $\mathbf{Y}$ — quality outcome matrix
- $\mathbf{B}$ — regression coefficient matrix
- $\mathbf{E}$ — residual matrix
### 12.3 Principal Component Analysis (PCA)
$$
\mathbf{X} = \mathbf{T} \mathbf{P}^T + \mathbf{E}
$$
**Hotelling's $T^2$ statistic for fault detection:**
$$
T^2 = \sum_{i=1}^{k} \frac{t_i^2}{\lambda_i}
$$
## 13. Process Optimization
### 13.1 Response Surface Methodology (RSM)
**Second-order polynomial model:**
$$
y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \sum_{i < j} \beta_{ij} x_i x_j + \varepsilon
$$
### 13.2 Constrained Optimization
$$
\min_{\mathbf{x}} f(\mathbf{x}) \quad \text{subject to} \quad g_i(\mathbf{x}) \leq 0, \quad h_j(\mathbf{x}) = 0
$$
**Example constraints:**
- $g_1$: Non-uniformity ≤ 3%
- $g_2$: Resistivity within spec
- $g_3$: Throughput ≥ target
- $h_1$: Total film thickness = target
### 13.3 Pareto Multi-Objective Optimization
$$
\min_{\mathbf{x}} \left[ f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_m(\mathbf{x}) \right]
$$
Common trade-offs:
- Uniformity vs. throughput
- Film quality vs. cost
- Conformality vs. deposition rate
## 14. Mathematical Toolkit Reference
| Domain | Key Equations | Application |
|--------|---------------|-------------|
| **Transport** | Navier-Stokes, Convection-Diffusion | Gas flow, precursor delivery |
| **Kinetics** | Arrhenius, Langmuir-Hinshelwood | Reaction rates |
| **Surface Evolution** | KPZ, Level-set, Edwards-Wilkinson | Film morphology |
| **Plasma** | Boltzmann, Child-Langmuir | Ion/electron dynamics |
| **Electrochemistry** | Butler-Volmer, Nernst-Planck | Copper plating |
| **Control** | Bellman, MDP, RL algorithms | Recipe optimization |
| **Statistics** | GPR, PLS, PCA | Virtual metrology |
| **Multiscale** | MD, kMC, Continuum | Integrated simulation |
## 15. Physical Constants
| Constant | Symbol | Value | Units |
|----------|--------|-------|-------|
| Boltzmann constant | $k_B$ | $1.38 \times 10^{-23}$ | J/K |
| Gas constant | $R$ | $8.314$ | J/(mol·K) |
| Faraday constant | $F$ | $96{,}485$ | C/mol |
| Elementary charge | $e$ | $1.60 \times 10^{-19}$ | C |
| Vacuum permittivity | $\varepsilon_0$ | $8.85 \times 10^{-12}$ | F/m |
| Avogadro's number | $N_A$ | $6.02 \times 10^{23}$ | mol⁻¹ |
| Electron mass | $m_e$ | $9.11 \times 10^{-31}$ | kg |
Metapaths are composite relations that connect nodes through sequences of edge types in heterogeneous graphs; they are used for similarity search and embedding.
Heterogeneous graph embeddings.
Metapath2vec learns embeddings in heterogeneous graphs through metapath-guided random walks.
Meta Q-Network applies Q-learning to neural architecture search representing architectures as state sequences for discrete action spaces.
Suggest method names from implementation.
# Semiconductor Manufacturing Process Metrology: Mathematical Modeling
## 1. The Core Problem Structure
Semiconductor metrology faces a fundamental **inverse problem**: we make indirect measurements (optical spectra, scattered X-rays, electron signals) and must infer physical quantities (dimensions, compositions, defect states) that we cannot directly observe at the nanoscale.
### 1.1 Mathematical Formulation
The general measurement model:
$$
\mathbf{y} = \mathcal{F}(\mathbf{p}) + \boldsymbol{\epsilon}
$$
**Variable Definitions:**
- $\mathbf{y}$ — measured signal vector (spectrum, image intensity, scattered amplitude)
- $\mathbf{p}$ — physical parameters of interest (CD, thickness, sidewall angle, composition)
- $\mathcal{F}$ — forward model operator (physics of measurement process)
- $\boldsymbol{\epsilon}$ — noise/uncertainty term
### 1.2 Key Mathematical Challenges
- **Nonlinearity:** $\mathcal{F}$ is typically highly nonlinear
- **Computational cost:** Forward model evaluation is expensive
- **Ill-posedness:** Inverse may be non-unique or unstable
- **High dimensionality:** Many parameters from limited measurements
## 2. Optical Critical Dimension (OCD) / Scatterometry
This is one of the most mathematically intensive metrology techniques used in high-volume manufacturing.
### 2.1 Forward Problem: Electromagnetic Scattering
For periodic structures (gratings, arrays), solve Maxwell's equations with Floquet-Bloch boundary conditions.
#### 2.1.1 Maxwell's Equations
$$
\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}
$$
$$
\nabla \times \mathbf{H} = \mathbf{J} + \frac{\partial \mathbf{D}}{\partial t}
$$
#### 2.1.2 Rigorous Coupled Wave Analysis (RCWA)
**Field Expansion in Fourier Series:**
The electric field in layer $j$ with grating vector $\mathbf{K}$:
$$
\mathbf{E}(\mathbf{r}) = \sum_{n=-N}^{N} \mathbf{E}_n^{(j)} \exp\left(i(\mathbf{k}_n \cdot \mathbf{r})\right)
$$
where the diffraction wave vectors are:
$$
\mathbf{k}_n = \mathbf{k}_0 + n\mathbf{K}
$$
**Key Properties:**
- Converts PDEs to eigenvalue problem
- Matches boundary conditions at layer interfaces
- Computational complexity: $O(N^3)$ where $N$ = number of Fourier orders
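As a small illustration of the relation $\mathbf{k}_n = \mathbf{k}_0 + n\mathbf{K}$, the sketch below lists which diffraction orders of a 1D grating propagate in air. It assumes a single in-plane direction and the standard condition that an order propagates when its in-plane wave vector is smaller in magnitude than $k_0$; the function name and the numerical values are illustrative only.

```python
import math

def propagating_orders(wavelength, pitch, theta_deg=0.0, n_max=10):
    """Return orders n with |k0*sin(theta) + n*K| < k0 (propagating in air)."""
    k0 = 2 * math.pi / wavelength              # free-space wave number
    K = 2 * math.pi / pitch                    # grating vector magnitude
    kx0 = k0 * math.sin(math.radians(theta_deg))
    return [n for n in range(-n_max, n_max + 1) if abs(kx0 + n * K) < k0]

# Illustrative values: 500 nm light on a 1200 nm pitch grating, normal incidence
orders = propagating_orders(wavelength=500.0, pitch=1200.0)
print(orders)
```

Orders outside this list are evanescent, but an RCWA solver would still retain them in the truncated Fourier expansion because they affect the near field.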
### 2.2 Inverse Problem: Parameter Extraction
Given measured spectra $R(\lambda, \theta)$, find best-fit parameters $\mathbf{p}$.
#### 2.2.1 Optimization Formulation
$$
\hat{\mathbf{p}} = \arg\min_{\mathbf{p}} \left\| \mathbf{y}_{\text{meas}} - \mathcal{F}(\mathbf{p}) \right\|^2 + \lambda R(\mathbf{p})
$$
**Regularization Options:**
- **Tikhonov regularization:**
$$
R(\mathbf{p}) = \left\| \mathbf{p} - \mathbf{p}_0 \right\|^2
$$
- **Sparsity-promoting (L1):**
$$
R(\mathbf{p}) = \left\| \mathbf{p} \right\|_1
$$
- **Total variation:**
$$
R(\mathbf{p}) = \int |\nabla \mathbf{p}| \, d\mathbf{x}
$$
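For intuition, the Tikhonov case has a closed-form solution when the forward model is linear, $\mathcal{F}(\mathbf{p}) = \mathbf{A}\mathbf{p}$. The sketch below assumes that linearization (real scatterometry models are nonlinear and would need an iterative solver); the matrix `A` and the parameter values are made up for the example.

```python
import numpy as np

def tikhonov_solve(A, y, p0, lam):
    """Minimize ||y - A p||^2 + lam * ||p - p0||^2 for a linear model A."""
    n = A.shape[1]
    lhs = A.T @ A + lam * np.eye(n)      # normal equations plus ridge term
    rhs = A.T @ y + lam * p0             # pulls the solution toward the prior p0
    return np.linalg.solve(lhs, rhs)

A = np.array([[1.0, 0.5], [0.2, 1.0], [0.7, 0.3]])   # hypothetical sensitivities
p_true = np.array([2.0, -1.0])
p0 = np.zeros(2)                                     # prior / nominal parameters
y = A @ p_true                                       # noise-free synthetic data

p_unreg = tikhonov_solve(A, y, p0, lam=0.0)   # plain least squares
p_heavy = tikhonov_solve(A, y, p0, lam=1e6)   # dominated by the prior
```

With `lam=0` the fit reproduces the true parameters; as `lam` grows the estimate is drawn toward `p0`, which is the bias-for-stability trade that regularization makes.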
#### 2.2.2 Library-Based Approach
1. **Precomputation:** Generate forward model on dense parameter grid
2. **Storage:** Build library with millions of entries
3. **Search:** Find best match using regression methods
**Regression Methods:**
- Polynomial regression — fast but limited accuracy
- Neural networks — handle nonlinearity well
- Gaussian process regression — provides uncertainty estimates
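The three library steps can be sketched with a brute-force nearest-match search in place of a fitted regression model. Everything here is a stand-in: `forward_model` is a hypothetical toy function rather than a rigorous solver, and the grid is far smaller than a production library.

```python
import itertools

def forward_model(cd, height):
    """Hypothetical toy model mapping (CD, height) in nm to a 3-point 'spectrum'."""
    return (cd + height, cd * height / 100.0, cd - 0.5 * height)

# 1. Precomputation on a parameter grid
cd_grid = [40 + i for i in range(21)]        # 40..60 nm
h_grid = [80 + 2 * i for i in range(21)]     # 80..120 nm

# 2. Storage of (parameters, spectrum) pairs
library = [((cd, h), forward_model(cd, h))
           for cd, h in itertools.product(cd_grid, h_grid)]

def lookup(measured):
    """3. Search: return the grid parameters with the smallest squared error."""
    def sq_err(entry):
        return sum((m - s) ** 2 for m, s in zip(measured, entry[1]))
    return min(library, key=sq_err)[0]

best = lookup(forward_model(50.3, 99.2))     # off-grid "measurement"
print(best)
```

The result is limited to grid resolution, which is one reason practical systems interpolate or fit a regression model over the library instead of returning the nearest entry.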
### 2.3 Parameter Correlations and Uncertainty
#### 2.3.1 Fisher Information Matrix
$$
[\mathbf{I}(\mathbf{p})]_{ij} = \mathbb{E}\left[\frac{\partial \ln L}{\partial p_i}\frac{\partial \ln L}{\partial p_j}\right]
$$
#### 2.3.2 Cramér-Rao Lower Bound
$$
\text{Var}(\hat{p}_i) \geq \left[\mathbf{I}^{-1}\right]_{ii}
$$
**Physical Interpretation:** Strong correlations (e.g., height vs. sidewall angle) manifest as near-singular information matrices—a fundamental limit on independent resolution.
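A short numerical sketch of this effect, assuming i.i.d. Gaussian noise with standard deviation $\sigma$, in which case the Fisher information reduces to $\mathbf{J}^T\mathbf{J}/\sigma^2$ with $\mathbf{J} = \partial\mathcal{F}/\partial\mathbf{p}$. The two Jacobians are invented to contrast independent and nearly parallel sensitivities.

```python
import numpy as np

def crlb(J, sigma):
    """Cramér-Rao variances for i.i.d. Gaussian noise: diag((J^T J / sigma^2)^-1)."""
    fisher = J.T @ J / sigma**2
    return np.diag(np.linalg.inv(fisher))

sigma = 0.1
J_independent = np.array([[1.0, 0.0], [0.0, 1.0]])     # each signal sees one parameter
J_correlated = np.array([[1.0, 0.99], [1.0, 1.01]])    # nearly parallel sensitivities

var_independent = crlb(J_independent, sigma)
var_correlated = crlb(J_correlated, sigma)
print(var_independent, var_correlated)
```

Both cases have the same noise level, yet the correlated case yields variance bounds several thousand times larger because the information matrix is close to singular.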
## 3. Thin Film Metrology: Ellipsometry
### 3.1 Physical Model
Ellipsometry measures polarization state change upon reflection:
$$
\rho = \frac{r_p}{r_s} = \tan(\Psi)\exp(i\Delta)
$$
**Variables:**
- $r_p$ — p-polarized reflection coefficient
- $r_s$ — s-polarized reflection coefficient
- $\Psi$ — amplitude ratio angle
- $\Delta$ — phase difference
### 3.2 Transfer Matrix Formalism
For multilayer stacks:
$$
\mathbf{M} = \prod_{j=1}^{N} \mathbf{M}_j = \prod_{j=1}^{N} \begin{pmatrix} \cos\delta_j & \dfrac{i\sin\delta_j}{\eta_j} \\[10pt] i\eta_j\sin\delta_j & \cos\delta_j \end{pmatrix}
$$
where the phase thickness is:
$$
\delta_j = \frac{2\pi}{\lambda} n_j d_j \cos(\theta_j)
$$
**Parameters:**
- $n_j$ — refractive index of layer $j$
- $d_j$ — thickness of layer $j$
- $\theta_j$ — angle of propagation in layer $j$
- $\eta_j$ — optical admittance
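A minimal sketch of the matrix product at normal incidence, where $\theta_j = 0$ and the admittance can be taken as $\eta_j = n_j$ (in free-space units). The reflection coefficient uses the standard characteristic-matrix result $r = (\eta_0 B - C)/(\eta_0 B + C)$ with $(B, C)^T = \mathbf{M}\,(1, \eta_s)^T$, which is not stated in the text above and is an assumption of this example, as are the function names and material values.

```python
import cmath
import math

def layer_matrix(n, d, wavelength):
    """Characteristic matrix of one layer at normal incidence (eta_j = n_j)."""
    delta = 2 * math.pi * n * d / wavelength          # phase thickness
    c, s = cmath.cos(delta), cmath.sin(delta)
    return ((c, 1j * s / n), (1j * n * s, c))

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def reflectance(layers, n_ambient, n_substrate, wavelength):
    """layers: list of (n_j, d_j) ordered from the ambient side to the substrate."""
    m = ((1, 0), (0, 1))
    for n, d in layers:
        m = matmul2(m, layer_matrix(n, d, wavelength))
    b = m[0][0] + m[0][1] * n_substrate
    c = m[1][0] + m[1][1] * n_substrate
    r = (n_ambient * b - c) / (n_ambient * b + c)
    return abs(r) ** 2

# Quarter-wave film with n = sqrt(n_substrate): an ideal antireflection coating
R_bare = reflectance([], 1.0, 2.25, 600.0)
R_coated = reflectance([(1.5, 100.0)], 1.0, 2.25, 600.0)
```

An ellipsometry model would repeat this for p and s polarizations at oblique incidence, using the polarization-dependent admittances, and form $\rho = r_p / r_s$.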
### 3.3 Dispersion Models
#### 3.3.1 Cauchy Model (Transparent Materials)
$$
n(\lambda) = A + \frac{B}{\lambda^2} + \frac{C}{\lambda^4}
$$
#### 3.3.2 Sellmeier Equation
$$
n^2(\lambda) = 1 + \sum_{i} \frac{B_i \lambda^2}{\lambda^2 - C_i}
$$
#### 3.3.3 Tauc-Lorentz Model (Amorphous Semiconductors)
$$
\varepsilon_2(E) = \begin{cases}
\dfrac{A E_0 C (E - E_g)^2}{(E^2 - E_0^2)^2 + C^2 E^2} \cdot \dfrac{1}{E} & E > E_g \\[10pt]
0 & E \leq E_g
\end{cases}
$$
with $\varepsilon_1$ derived via Kramers-Kronig relations:
$$
\varepsilon_1(E) = \varepsilon_{1\infty} + \frac{2}{\pi} \mathcal{P} \int_0^\infty \frac{\xi \varepsilon_2(\xi)}{\xi^2 - E^2} d\xi
$$
#### 3.3.4 Drude Model (Metals/Conductors)
$$
\varepsilon(\omega) = \varepsilon_\infty - \frac{\omega_p^2}{\omega^2 + i\gamma\omega}
$$
**Parameters:**
- $\omega_p$ — plasma frequency
- $\gamma$ — damping coefficient
- $\varepsilon_\infty$ — high-frequency dielectric constant
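The Drude expression is simple to evaluate directly. The sketch below uses illustrative parameter values (not fitted data for any particular metal) and Python's built-in complex arithmetic.

```python
def drude_epsilon(omega, omega_p, gamma, eps_inf=1.0):
    """Complex Drude permittivity; all frequencies in the same units (e.g. rad/s)."""
    return eps_inf - omega_p**2 / (omega**2 + 1j * gamma * omega)

# Illustrative values only
eps_visible = drude_epsilon(omega=3.0e15, omega_p=1.37e16, gamma=1.0e14)
eps_at_plasma = drude_epsilon(omega=1.0e16, omega_p=1.0e16, gamma=0.0)
print(eps_visible, eps_at_plasma)
```

Below the plasma frequency the real part is negative, which is the metallic, highly reflecting regime; without damping the permittivity crosses zero at $\omega = \omega_p$ when $\varepsilon_\infty = 1$.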
## 4. X-ray Metrology Mathematics
### 4.1 X-ray Reflectivity (XRR)
#### 4.1.1 Parratt Recursion Formula
For specular reflection at grazing incidence:
$$
R_j = \frac{r_{j,j+1} + R_{j+1}\exp(2ik_{z,j+1}d_{j+1})}{1 + r_{j,j+1}R_{j+1}\exp(2ik_{z,j+1}d_{j+1})}
$$
where $r_{j,j+1}$ is the Fresnel coefficient at interface $j$.
#### 4.1.2 Roughness Correction (Névot-Croce Factor)
$$
r'_{j,j+1} = r_{j,j+1} \exp\left(-2k_{z,j}k_{z,j+1}\sigma_j^2\right)
$$
**Parameters:**
- $k_{z,j}$ — perpendicular wave vector component in layer $j$
- $\sigma_j$ — RMS roughness at interface $j$
### 4.2 CD-SAXS (Critical Dimension Small Angle X-ray Scattering)
#### 4.2.1 Scattering Intensity
For transmission scattering from 3D nanostructures:
$$
I(\mathbf{q}) = \left|\tilde{\rho}(\mathbf{q})\right|^2 = \left|\int \Delta\rho(\mathbf{r})\exp(-i\mathbf{q}\cdot\mathbf{r})d^3\mathbf{r}\right|^2
$$
#### 4.2.2 Form Factor for Simple Shapes
**Rectangular parallelepiped:**
$$
F(\mathbf{q}) = V \cdot \text{sinc}\left(\frac{q_x a}{2}\right) \cdot \text{sinc}\left(\frac{q_y b}{2}\right) \cdot \text{sinc}\left(\frac{q_z c}{2}\right)
$$
**Cylinder:**
$$
F(\mathbf{q}) = 2\pi R^2 L \cdot \frac{J_1(q_\perp R)}{q_\perp R} \cdot \text{sinc}\left(\frac{q_z L}{2}\right)
$$
where $J_1$ is the Bessel function of the first kind of order one and $\text{sinc}(x) = \sin(x)/x$.
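The rectangular form factor can be checked numerically. This sketch assumes the unnormalized convention $\text{sinc}(x) = \sin(x)/x$ and uses arbitrary example dimensions.

```python
import math

def sinc(x):
    """Unnormalized sinc, sin(x)/x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x

def box_form_factor(qx, qy, qz, a, b, c):
    """Form factor of an a x b x c rectangular parallelepiped centered at the origin."""
    volume = a * b * c
    return volume * sinc(qx * a / 2) * sinc(qy * b / 2) * sinc(qz * c / 2)

a, b, c = 20.0, 30.0, 50.0                      # example dimensions in nm
F_forward = box_form_factor(0.0, 0.0, 0.0, a, b, c)
F_first_zero = box_form_factor(2 * math.pi / a, 0.0, 0.0, a, b, c)
intensity = F_forward ** 2                      # I(q) is proportional to |F(q)|^2
```

The forward value equals the particle volume, and the first zero along $q_x$ sits at $q_x = 2\pi/a$, which is how fringe spacing in the scattering pattern encodes the feature width.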
## 5. Statistical Process Control Mathematics
### 5.1 Virtual Metrology
Predict wafer properties from tool sensor data without direct measurement:
$$
y = f(\mathbf{x}) + \varepsilon
$$
#### 5.1.1 Partial Least Squares (PLS)
Handles high-dimensional, correlated inputs:
1. Find latent variables: $\mathbf{T} = \mathbf{X}\mathbf{W}$
2. Maximize covariance with $y$
3. Model: $y = \mathbf{T}\mathbf{Q} + e$
**Optimization objective:**
$$
\max_{\mathbf{w}} \text{Cov}(\mathbf{X}\mathbf{w}, y)^2 \quad \text{subject to} \quad \|\mathbf{w}\| = 1
$$
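For a single response and mean-centered data, the objective above is maximized by $\mathbf{w} \propto \mathbf{X}^T y$. The sketch below computes only this first weight vector and its score; a full PLS fit would deflate $\mathbf{X}$ and repeat. The data are contrived so that only the first column carries information about $y$.

```python
import numpy as np

def first_pls_weight(X, y):
    """First PLS weight vector: unit vector along X_c^T y_c (centered data)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)

X = np.array([[-1.0, 1.0], [0.0, -2.0], [1.0, 1.0]])   # column 2 is orthogonal to y
y = np.array([-1.0, 0.0, 1.0])

w = first_pls_weight(X, y)
t = (X - X.mean(axis=0)) @ w        # first latent variable (score)
print(w, t)
```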
#### 5.1.2 Gaussian Process Regression
$$
y(\mathbf{x}) \sim \mathcal{GP}\left(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')\right)
$$
**Common Kernel Functions:**
- **Squared Exponential (RBF):**
$$
k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\ell^2}\right)
$$
- **Matérn 5/2:**
$$
k(r) = \sigma_f^2 \left(1 + \frac{\sqrt{5}r}{\ell} + \frac{5r^2}{3\ell^2}\right) \exp\left(-\frac{\sqrt{5}r}{\ell}\right)
$$
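Both kernels translate directly into code. This is a plain-Python sketch for single pairs of points; the function and argument names are chosen for the example.

```python
import math

def rbf_kernel(x1, x2, sigma_f=1.0, length=1.0):
    """Squared exponential kernel between two points given as sequences."""
    r2 = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return sigma_f**2 * math.exp(-r2 / (2 * length**2))

def matern52_kernel(x1, x2, sigma_f=1.0, length=1.0):
    """Matern 5/2 kernel; s = sqrt(5) * r / length, so s^2 / 3 = 5 r^2 / (3 l^2)."""
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))
    s = math.sqrt(5) * r / length
    return sigma_f**2 * (1 + s + s**2 / 3) * math.exp(-s)

k_rbf = rbf_kernel((0.0,), (1.0,))
k_mat = matern52_kernel((0.0,), (1.0,))
print(k_rbf, k_mat)
```

Both return $\sigma_f^2$ at zero separation; the Matérn 5/2 kernel produces rougher sample paths than the RBF kernel, which can suit sensor data that is not infinitely smooth.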
### 5.2 Run-to-Run Control
#### 5.2.1 EWMA Controller
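$$
\hat{d}_t = \lambda \left( y_{t-1} - \hat{\beta}\,(x_{t-1} - x_{\text{nom}}) \right) + (1-\lambda)\hat{d}_{t-1}
$$
$$
x_t = x_{\text{nom}} - \frac{\hat{d}_t}{\hat{\beta}}
$$
Here $y_{t-1}$ is the previous run's output expressed as a deviation from target, so the term in parentheses is the part of that deviation not explained by the recipe change, and $\hat{d}_t$ is its exponentially weighted average.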
**Parameters:**
- $\lambda$ — smoothing factor (typically 0.2–0.4)
- $\hat{\beta}$ — estimated process gain
- $x_{\text{nom}}$ — nominal recipe setting
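A minimal simulation of an EWMA run-to-run loop, assuming a linear process $y = \beta (x - x_{\text{nom}}) + d$ with $y$ measured as deviation from target and the disturbance estimate updated from the model residual $y - \hat{\beta}(x - x_{\text{nom}})$. The process model, the constant disturbance, and all names are assumptions made for the example.

```python
def ewma_run_to_run(beta_true, beta_hat, disturbance,
                    lam=0.3, x_nom=0.0, n_runs=50):
    """Simulate EWMA control and return the output deviation of each run."""
    d_hat, x = 0.0, x_nom
    history = []
    for _ in range(n_runs):
        y = beta_true * (x - x_nom) + disturbance     # measured deviation from target
        history.append(y)
        residual = y - beta_hat * (x - x_nom)         # part not explained by the model
        d_hat = lam * residual + (1 - lam) * d_hat    # EWMA disturbance estimate
        x = x_nom - d_hat / beta_hat                  # recipe for the next run
    return history

errors = ewma_run_to_run(beta_true=1.0, beta_hat=1.0, disturbance=2.0)
print(errors[0], errors[-1])
```

With a perfect gain estimate the deviation shrinks by a factor $(1 - \lambda)$ per run, so a larger $\lambda$ rejects a step disturbance faster but passes more measurement noise into the recipe.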
#### 5.2.2 Model Predictive Control (MPC)
$$
\min_{\mathbf{u}} \sum_{k=0}^{N} \left\| y_{t+k} - y_{\text{target}} \right\|_Q^2 + \left\| \Delta u_{t+k} \right\|_R^2
$$
subject to:
- Process dynamics: $\mathbf{x}_{t+1} = \mathbf{A}\mathbf{x}_t + \mathbf{B}\mathbf{u}_t$
- Output equation: $y_t = \mathbf{C}\mathbf{x}_t$
- Constraints: $\mathbf{u}_{\min} \leq \mathbf{u}_t \leq \mathbf{u}_{\max}$
### 5.3 Wafer-Level Spatial Modeling
#### 5.3.1 Zernike Polynomial Decomposition
$$
W(r,\theta) = \sum_{n=0}^{N} \sum_{m=-n}^{n} a_{nm} Z_n^m(r,\theta)
$$
**First few Zernike polynomials:**
| Index | Name | Formula |
|-------|------|---------|
| $Z_0^0$ | Piston | $1$ |
| $Z_1^{-1}$ | Tilt Y | $2r\sin\theta$ |
| $Z_1^1$ | Tilt X | $2r\cos\theta$ |
| $Z_2^0$ | Defocus | $\sqrt{3}(2r^2-1)$ |
| $Z_2^{-2}$ | Astigmatism | $\sqrt{6}r^2\sin2\theta$ |
| $Z_2^2$ | Astigmatism | $\sqrt{6}r^2\cos2\theta$ |
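The tabulated terms can be evaluated directly to build a wafer map from a set of coefficients. The sketch below hard-codes only the six polynomials in the table and uses made-up coefficients; $r$ is the radius normalized to the wafer edge.

```python
import math

# (n, m) -> Z_n^m(r, theta) for the terms listed in the table
ZERNIKE = {
    (0, 0): lambda r, t: 1.0,
    (1, -1): lambda r, t: 2 * r * math.sin(t),
    (1, 1): lambda r, t: 2 * r * math.cos(t),
    (2, 0): lambda r, t: math.sqrt(3) * (2 * r**2 - 1),
    (2, -2): lambda r, t: math.sqrt(6) * r**2 * math.sin(2 * t),
    (2, 2): lambda r, t: math.sqrt(6) * r**2 * math.cos(2 * t),
}

def wafer_map(coeffs, r, theta):
    """Evaluate W(r, theta) = sum of a_nm * Z_n^m(r, theta)."""
    return sum(a * ZERNIKE[nm](r, theta) for nm, a in coeffs.items())

coeffs = {(0, 0): 5.0, (2, 0): 1.0}       # hypothetical: mean offset plus bowl shape
center = wafer_map(coeffs, 0.0, 0.0)
edge = wafer_map(coeffs, 1.0, 0.0)
```

A positive defocus coefficient gives a center-low, edge-high profile, which is a common radial signature in deposition and etch.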
#### 5.3.2 Gaussian Random Fields
For spatially correlated residuals:
$$
\text{Cov}\left(W(\mathbf{s}_1), W(\mathbf{s}_2)\right) = \sigma^2 \rho\left(\|\mathbf{s}_1 - \mathbf{s}_2\|; \phi\right)
$$
**Common correlation functions:**
- **Exponential:**
$$
\rho(h) = \exp\left(-\frac{h}{\phi}\right)
$$
- **Gaussian:**
$$
\rho(h) = \exp\left(-\frac{h^2}{\phi^2}\right)
$$
## 6. Overlay Metrology Mathematics
### 6.1 Higher-Order Correction Models
Overlay error as polynomial expansion:
$$
\delta x = T_x + M_x \cdot x + R_x \cdot y + \sum_{i+j \leq n} c_{ij}^x x^i y^j
$$
$$
\delta y = T_y + M_y \cdot y + R_y \cdot x + \sum_{i+j \leq n} c_{ij}^y x^i y^j
$$
**Physical interpretation of linear terms:**
- $T_x, T_y$ — Translation
- $M_x, M_y$ — Magnification
- $R_x, R_y$ — Rotation
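The linear terms can be estimated from measured overlay errors by ordinary least squares. This sketch fits only $T$, $M$, and $R$ for each axis and omits the higher-order $c_{ij}$ terms; the site coordinates and coefficients are synthetic.

```python
import numpy as np

def fit_linear_overlay(x, y, dx, dy):
    """Least-squares fit of (T, M, R) for each axis from overlay errors at (x, y)."""
    Ax = np.column_stack([np.ones_like(x), x, y])    # columns: T_x, M_x, R_x
    Ay = np.column_stack([np.ones_like(y), y, x])    # columns: T_y, M_y, R_y
    px, *_ = np.linalg.lstsq(Ax, dx, rcond=None)
    py, *_ = np.linalg.lstsq(Ay, dy, rcond=None)
    return px, py

# Synthetic measurement sites and errors (arbitrary units)
x = np.array([-1.0, 1.0, -1.0, 1.0, 0.0])
y = np.array([-1.0, -1.0, 1.0, 1.0, 0.0])
dx = 1.0 + 0.02 * x - 0.01 * y
dy = -0.5 + 0.03 * y + 0.01 * x

px, py = fit_linear_overlay(x, y, dx, dy)
print(px, py)
```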
### 6.2 Sampling Strategy Optimization
#### 6.2.1 D-Optimal Design
$$
\mathbf{s}^* = \arg\max_{\mathbf{s}} \det\left(\mathbf{X}_s^T \mathbf{X}_s\right)
$$
Maximizing this determinant minimizes the volume of the confidence ellipsoid for the parameter estimates.
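For a handful of candidate sites the D-optimal subset can be found by exhaustive search. The sketch below assumes a first-order model with regressors $(1, x, y)$ for one overlay axis and invented candidate locations; realistic site counts would call for an exchange-type algorithm instead of brute force.

```python
import itertools
import numpy as np

def d_optimal_subset(candidates, k):
    """Return the k sites maximizing det(X^T X) for the model row [1, x, y]."""
    best_sites, best_det = None, -1.0
    for sites in itertools.combinations(candidates, k):
        X = np.array([[1.0, x, y] for x, y in sites])   # design matrix for this subset
        det = np.linalg.det(X.T @ X)
        if det > best_det:
            best_sites, best_det = sites, det
    return best_sites, best_det

candidates = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.2, 0.1)]
sites, det = d_optimal_subset(candidates, k=3)
print(sites, det)
```

The search picks the three most widely spread points, reflecting the general tendency of D-optimal designs to push samples toward the edges of the region.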
#### 6.2.2 Information-Theoretic Approach
Maximize expected information gain:
$$
I(\mathbf{s}) = H(\mathbf{p}) - \mathbb{E}_{\mathbf{y}}\left[H(\mathbf{p}|\mathbf{y})\right]
$$
## 7. Machine Learning Integration
### 7.1 Physics-Informed Neural Networks (PINNs)
Combine data fitting with physical constraints:
$$
\mathcal{L} = \mathcal{L}_{\text{data}} + \lambda \mathcal{L}_{\text{physics}}
$$
**Components:**
- **Data loss:**
$$
\mathcal{L}_{\text{data}} = \frac{1}{N} \sum_{i=1}^{N} \left\| y_i - f_\theta(\mathbf{x}_i) \right\|^2
$$
- **Physics loss (example: Maxwell residual):**
$$
\mathcal{L}_{\text{physics}} = \frac{1}{M} \sum_{j=1}^{M} \left\| \nabla \times \mathbf{E}_\theta - i\omega\mu\mathbf{H}_\theta \right\|^2
$$
### 7.2 Neural Network Surrogates
**Architecture for forward model approximation:**
- **Input:** Geometric parameters $\mathbf{p} \in \mathbb{R}^d$
- **Hidden layers:** Multiple fully-connected layers with ReLU/GELU activation
- **Output:** Simulated spectrum $\mathbf{y} \in \mathbb{R}^m$
**Speedup:** $10^4$ – $10^6\times$ over rigorous simulation
### 7.3 Deep Learning for Defect Detection
**Methods:**
- **CNNs** — Classification and localization
- **Autoencoders** — Anomaly detection via reconstruction error:
$$
\text{Score}(\mathbf{x}) = \left\| \mathbf{x} - D(E(\mathbf{x})) \right\|^2
$$
- **Instance segmentation** — Precise defect boundary delineation
## 8. Uncertainty Quantification
### 8.1 GUM Framework (Guide to the Expression of Uncertainty in Measurement)
Combined standard uncertainty:
$$
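u_c^2(y) = \sum_{i} \left(\frac{\partial f}{\partial x_i}\right)^2 u^2(x_i) + 2\sum_{i<j} \frac{\partial f}{\partial x_i} \frac{\partial f}{\partial x_j} u(x_i, x_j)
$$
where $u(x_i, x_j)$ is the estimated covariance between inputs $x_i$ and $x_j$.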
Micro search spaces focus on small components like operations within cells enabling efficient architecture optimization.
Small batch processed at once.
Micro-computed tomography creates 3D reconstructions of package internals with micron-scale resolution.
Competition for efficient models.
Class delegating everything.
Midjourney generates artistic images from text prompts using proprietary diffusion-based models.
Milk run logistics uses regular routes collecting materials from multiple suppliers reducing transportation costs.
Ultra-fast anneal using lasers or flash lamps.
Min tokens ensures generation continues until minimum length.
Min-p sampling sets minimum probability relative to top token.
MinCut pooling learns cluster assignments by minimizing normalized cut objectives creating coarsened graphs with balanced communities.
Update with small batches of streaming data.
Vision-language model aligned with GPT-4.
Mip-NeRF anti-aliases NeRF by integrating over conical frustums rather than points.
Mirostat dynamically adjusts temperature maintaining target perplexity.
Smooth activation x*tanh(softplus(x)).
Handle incomplete multimodal data.
Efficient open-source language model with sliding window attention.
Encode network as MILP for verification.
Mixed model production manufactures multiple products on the same line, enabling variety without dedicated resources.
Use lower precision (FP16) for some operations to speed up and save memory.
Mixed-precision training uses different numeric precisions for different operations balancing speed and accuracy.
MixMatch unifies consistency regularization, entropy minimization, and MixUp for semi-supervised learning with unlabeled data.
Mixture of Experts version of Mistral.
Dynamic computation allocation across layers based on input complexity.
Dynamically allocate computation across transformer layers based on token importance.
Mixture of depths dynamically allocates computation across layers per token.
Route each input to a few specialized expert networks instead of all parameters.
MixUp for text combines embeddings of two examples and interpolates their labels for training with continuous semantic augmentation.
MLC LLM provides universal LLM deployment. Compile to any device.
MLOps flows cover model versioning, registries, canary deploys, rollback, and monitoring for drift.
MnasNet performs mobile neural architecture search optimizing accuracy and latency on target devices using reinforcement learning.
MobileNet architecture uses depthwise separable convolutions for efficient mobile deployment.
MobileNetV2 adds inverted residuals and linear bottlenecks improving efficiency and accuracy.
MobileNetV3 uses NAS-discovered architectures with squeeze-excitation and h-swish activation.
Simulate carrier mobility.
Generate mock objects for testing.
Randomly drop modalities during training.
Generate missing modalities.
Blend different optima.
Restrict who can use or modify models.