editing models via task vectors, model merging
Modify capabilities using task directions.
3,145 technical terms and definitions
Modify capabilities using task directions.
Modify real images via inversion.
Analyze brain wave patterns.
Attention mechanisms faster than O(n²).
Share weights across architectures during search.
EfficientNet architecture discovered through NAS balances depth width and resolution using compound scaling.
EfficientNet scaling jointly optimizes depth width and resolution using compound coefficient.
E(n) Equivariant Graph Neural Networks maintain equivariance to Euclidean transformations while learning node and edge features in geometric graphs.
Use principal components for CAM.
L1+L2 regularized attack.
Regularization technique to prevent catastrophic forgetting by protecting important weights.
Train discriminator to detect replaced tokens.
Efficient pre-training using discriminative task instead of masking.
Electrodeionization combines ion exchange and electrodialysis for continuous ultra-pure water production.
# Electromagnetism Mathematics Modeling A comprehensive guide to the mathematical frameworks used in semiconductor device simulation, covering electromagnetic theory, carrier transport, and quantum effects. 1. The Core Problem Semiconductor device modeling requires solving coupled systems that describe: - How electromagnetic fields propagate in and interact with semiconductor materials - How charge carriers (electrons and holes) move in response to fields - How quantum effects modify classical behavior at nanoscales Key Variables: | Symbol | Description | Units | |--------|-------------|-------| | $\phi$ | Electrostatic potential | V | | $n$ | Electron concentration | cm⁻³ | | $p$ | Hole concentration | cm⁻³ | | $\mathbf{E}$ | Electric field | V/cm | | $\mathbf{J}_n, \mathbf{J}_p$ | Current densities | A/cm² | 2. Fundamental Mathematical Frameworks 2.1 Drift-Diffusion System The workhorse of semiconductor device simulation couples three fundamental equations. 2.1.1 Poisson's Equation (Electrostatics) $$ \nabla \cdot (\varepsilon \nabla \phi) = -q(p - n + N_D^+ - N_A^-) $$ Where: - $\varepsilon$ — Permittivity of the semiconductor - $\phi$ — Electrostatic potential - $q$ — Elementary charge ($1.602 \times 10^{-19}$ C) - $n, p$ — Electron and hole concentrations - $N_D^+$ — Ionized donor concentration - $N_A^-$ — Ionized acceptor concentration 2.1.2 Continuity Equations (Carrier Conservation) For electrons: $$ \frac{\partial n}{\partial t} = \frac{1}{q}\nabla \cdot \mathbf{J}_n - R + G $$ For holes: $$ \frac{\partial p}{\partial t} = -\frac{1}{q}\nabla \cdot \mathbf{J}_p - R + G $$ Where: - $R$ — Recombination rate (cm⁻³s⁻¹) - $G$ — Generation rate (cm⁻³s⁻¹) 2.1.3 Current Density Relations Electron current (drift + diffusion): $$ \mathbf{J}_n = q\mu_n n \mathbf{E} + qD_n \nabla n $$ Hole current (drift + diffusion): $$ \mathbf{J}_p = q\mu_p p \mathbf{E} - qD_p \nabla p $$ Einstein Relations: $$ D_n = \frac{k_B T}{q} \mu_n \quad \text{and} \quad D_p = \frac{k_B T}{q} \mu_p $$ 2.1.4 Recombination Models - Shockley-Read-Hall (SRH): $$ R_{SRH} = \frac{np - n_i^2}{\tau_p(n + n_1) + \tau_n(p + p_1)} $$ - Auger Recombination: $$ R_{Auger} = (C_n n + C_p p)(np - n_i^2) $$ - Radiative Recombination: $$ R_{rad} = B(np - n_i^2) $$ 2.2 Maxwell's Equations in Semiconductors For optoelectronics and high-frequency devices, the full electromagnetic treatment is necessary. 2.2.1 Maxwell's Equations $$ \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} $$ $$ \nabla \times \mathbf{H} = \mathbf{J} + \frac{\partial \mathbf{D}}{\partial t} $$ $$ \nabla \cdot \mathbf{D} = \rho $$ $$ \nabla \cdot \mathbf{B} = 0 $$ 2.2.2 Constitutive Relations Displacement field: $$ \mathbf{D} = \varepsilon_0 \varepsilon_r(\omega) \mathbf{E} $$ Current density: $$ \mathbf{J} = \sigma(\omega) \mathbf{E} $$ 2.2.3 Frequency-Dependent Dielectric Function $$ \varepsilon(\omega) = \varepsilon_\infty - \frac{\omega_p^2}{\omega^2 + i\gamma\omega} + \sum_j \frac{f_j}{\omega_j^2 - \omega^2 - i\Gamma_j\omega} $$ Components: - First term ($\varepsilon_\infty$): High-frequency (background) permittivity - Second term (Drude): Free carrier response - $\omega_p = \sqrt{\frac{nq^2}{\varepsilon_0 m^*}}$ — Plasma frequency - $\gamma$ — Damping rate - Third term (Lorentz oscillators): Interband transitions - $\omega_j$ — Resonance frequencies - $\Gamma_j$ — Linewidths - $f_j$ — Oscillator strengths 2.2.4 Complex Refractive Index $$ \tilde{n}(\omega) = n(\omega) + i\kappa(\omega) = \sqrt{\varepsilon(\omega)} $$ Optical properties: - Refractive index: $n = \text{Re}(\tilde{n})$ - Extinction coefficient: $\kappa = \text{Im}(\tilde{n})$ - Absorption coefficient: $\alpha = \frac{2\omega\kappa}{c} = \frac{4\pi\kappa}{\lambda}$ 2.3 Boltzmann Transport Equation When drift-diffusion is insufficient (hot carriers, high fields, ultrafast phenomena): $$ \frac{\partial f}{\partial t} + \mathbf{v} \cdot \nabla_\mathbf{r} f + \frac{\mathbf{F}}{\hbar} \cdot \nabla_\mathbf{k} f = \left(\frac{\partial f}{\partial t}\right)_{\text{coll}} $$ Where: - $f(\mathbf{r}, \mathbf{k}, t)$ — Distribution function in 6D phase space - $\mathbf{v} = \frac{1}{\hbar}\nabla_\mathbf{k} E(\mathbf{k})$ — Group velocity - $\mathbf{F}$ — External force (e.g., $q\mathbf{E}$) 2.3.1 Collision Integral (Relaxation Time Approximation) $$ \left(\frac{\partial f}{\partial t}\right)_{\text{coll}} \approx -\frac{f - f_0}{\tau} $$ 2.3.2 Scattering Mechanisms - Acoustic phonon scattering: $$ \frac{1}{\tau_{ac}} \propto T \cdot E^{1/2} $$ - Optical phonon scattering: $$ \frac{1}{\tau_{op}} \propto \left(N_{op} + \frac{1}{2} \mp \frac{1}{2}\right) $$ - Ionized impurity scattering (Brooks-Herring): $$ \frac{1}{\tau_{ii}} \propto \frac{N_I}{E^{3/2}} $$ 2.3.3 Solution Approaches - Monte Carlo methods: Stochastically simulate individual carrier trajectories - Moment expansions: Derive hydrodynamic equations from velocity moments - Spherical harmonic expansion: Expand angular dependence in k-space 2.4 Quantum Transport For nanoscale devices where quantum effects dominate. 2.4.1 Schrödinger Equation (Effective Mass Approximation) $$ \left[-\frac{\hbar^2}{2m^*}\nabla^2 + V(\mathbf{r})\right]\psi = E\psi $$ 2.4.2 Schrödinger-Poisson Self-Consistent Loop ┌─────────────────────────────────────────────────┐ │ │ │ Initial guess: V(r) │ │ │ │ │ ▼ │ │ Solve Schrodinger: H*psi = E*psi │ │ │ │ │ ▼ │ │ Calculate charge density: │ │ rho(r) = q * sum |psi_i(r)|^2 * f(E_i) │ │ │ │ │ ▼ │ │ Solve Poisson: div(grad V) = -rho/eps │ │ │ │ │ ▼ │ │ Check convergence ──► If not, iterate │ │ │ └─────────────────────────────────────────────────┘ 2.4.3 Non-Equilibrium Green's Function (NEGF) Retarded Green's function: $$ [EI - H - \Sigma^R]G^R = I $$ Lesser Green's function (for electron density): $$ G^< = G^R \Sigma^< G^A $$ Current formula (Landauer-Büttiker type): $$ I = \frac{2q}{h}\int \text{Tr}\left[\Sigma^< G^> - \Sigma^> G^<\right] dE $$ Transmission function: $$ T(E) = \text{Tr}\left[\Gamma_L G^R \Gamma_R G^A\right] $$ where $\Gamma_{L,R} = i(\Sigma_{L,R}^R - \Sigma_{L,R}^A)$ are the broadening matrices. 2.4.4 Wigner Function Formalism Quantum analog of the Boltzmann distribution: $$ f_W(\mathbf{r}, \mathbf{p}, t) = \frac{1}{(\pi\hbar)^3}\int \psi^*\left(\mathbf{r}+\mathbf{s}\right)\psi\left(\mathbf{r}-\mathbf{s}\right) e^{2i\mathbf{p}\cdot\mathbf{s}/\hbar} d^3s $$ 3. Coupled Optoelectronic Modeling For solar cells, LEDs, and lasers, optical and electrical physics must be solved self-consistently. 3.1 Self-Consistent Loop ┌─────────────────────────────────────────────────────────────┐ │ │ │ Maxwell's Equations ──────► Optical field E(r,w) │ │ │ │ │ ▼ │ │ Generation rate: G(r) = alpha*|E|^2/(hbar*w) │ │ │ │ │ ▼ │ │ Drift-Diffusion ──────► Carrier densities n(r), p(r) │ │ │ │ │ ▼ │ │ Update eps(w,n,p) ──────► Free carrier absorption, │ │ │ plasma effects, band filling │ │ │ │ │ └──────────────── iterate ────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────┘ 3.2 Key Coupling Equations Optical generation rate: $$ G(\mathbf{r}) = \frac{\alpha(\mathbf{r})|\mathbf{E}(\mathbf{r})|^2}{2\hbar\omega} $$ Free carrier absorption (modifies permittivity): $$ \Delta\alpha_{fc} = \sigma_n n + \sigma_p p $$ Band gap narrowing (high injection): $$ \Delta E_g = -A\left(\ln\frac{n}{n_0} + \ln\frac{p}{p_0}\right) $$ 3.3 Laser Rate Equations Carrier density: $$ \frac{dn}{dt} = \frac{\eta I}{qV} - \frac{n}{\tau} - g(n)S $$ Photon density: $$ \frac{dS}{dt} = \Gamma g(n)S - \frac{S}{\tau_p} + \Gamma\beta\frac{n}{\tau} $$ Gain function (linear approximation): $$ g(n) = g_0(n - n_{tr}) $$ 4. Numerical Methods 4.1 Method Comparison | Method | Best For | Key Features | Computational Cost | |--------|----------|--------------|-------------------| | Finite Element (FEM) | Complex geometries | Adaptive meshing, handles interfaces | Medium-High | | Finite Difference (FDM) | Regular grids | Simpler implementation | Low-Medium | | FDTD | Time-domain EM | Explicit time stepping, broadband | High | | Transfer Matrix (TMM) | Multilayer thin films | Analytical for 1D, very fast | Very Low | | RCWA | Periodic structures | Fourier expansion | Medium | | Monte Carlo | High-field transport | Stochastic, parallelizable | Very High | 4.2 Scharfetter-Gummel Discretization Essential for numerical stability in drift-diffusion. For electron current between nodes $i$ and $i+1$: $$ J_{n,i+1/2} = \frac{qD_n}{h}\left[n_i B\left(\frac{\phi_i - \phi_{i+1}}{V_T}\right) - n_{i+1} B\left(\frac{\phi_{i+1} - \phi_i}{V_T}\right)\right] $$ Bernoulli function: $$ B(x) = \frac{x}{e^x - 1} $$ 4.3 FDTD Yee Grid Update equations (1D example): $$ E_x^{n+1}(k) = E_x^n(k) + \frac{\Delta t}{\varepsilon \Delta z}\left[H_y^{n+1/2}(k+1/2) - H_y^{n+1/2}(k-1/2)\right] $$ $$ H_y^{n+1/2}(k+1/2) = H_y^{n-1/2}(k+1/2) + \frac{\Delta t}{\mu \Delta z}\left[E_x^n(k+1) - E_x^n(k)\right] $$ Courant stability condition: $$ \Delta t \leq \frac{\Delta x}{c\sqrt{d}} $$ where $d$ is the number of spatial dimensions. 4.4 Newton-Raphson for Coupled System For the coupled Poisson-continuity system, solve: $$ \begin{pmatrix} \frac{\partial F_\phi}{\partial \phi} & \frac{\partial F_\phi}{\partial n} & \frac{\partial F_\phi}{\partial p} \\ \frac{\partial F_n}{\partial \phi} & \frac{\partial F_n}{\partial n} & \frac{\partial F_n}{\partial p} \\ \frac{\partial F_p}{\partial \phi} & \frac{\partial F_p}{\partial n} & \frac{\partial F_p}{\partial p} \end{pmatrix} \begin{pmatrix} \delta\phi \\ \delta n \\ \delta p \end{pmatrix} = - \begin{pmatrix} F_\phi \\ F_n \\ F_p \end{pmatrix} $$ 5. Multiscale Challenge 5.1 Hierarchy of Scales | Scale | Size | Method | Physics Captured | |-------|------|--------|------------------| | Atomic | 0.1–1 nm | DFT, tight-binding | Band structure, material parameters | | Quantum | 1–100 nm | NEGF, Wigner function | Tunneling, confinement | | Mesoscale | 10–1000 nm | Boltzmann, Monte Carlo | Hot carriers, non-equilibrium | | Device | 100 nm–μm | Drift-diffusion | Classical transport | | Circuit | μm–mm | Compact models (SPICE) | Lumped elements | 5.2 Scale-Bridging Techniques - Parameter extraction: DFT → effective masses, band gaps → drift-diffusion parameters - Quantum corrections to drift-diffusion: $$ n = N_c F_{1/2}\left(\frac{E_F - E_c - \Lambda_n}{k_B T}\right) $$ where $\Lambda_n$ is the quantum potential from density-gradient theory: $$ \Lambda_n = -\frac{\hbar^2}{12m^*}\frac{\nabla^2 \sqrt{n}}{\sqrt{n}} $$ - Machine learning surrogates: Train neural networks on expensive quantum simulations 6. Key Mathematical Difficulties 6.1 Extreme Nonlinearity Carrier concentrations depend exponentially on potential: $$ n = n_i \exp\left(\frac{E_F - E_i}{k_B T}\right) = n_i \exp\left(\frac{q\phi}{k_B T}\right) $$ At room temperature, $k_B T/q \approx 26$ mV, so small potential changes cause huge concentration swings. Solutions: - Gummel iteration (decouple and solve sequentially) - Newton-Raphson with damping - Continuation methods 6.2 Numerical Stiffness - Doping varies by $10^{10}$ or more (from intrinsic to heavily doped) - Depletion regions: nm-scale features in μm-scale devices - Time scales: fs (optical) to ms (thermal) Solutions: - Adaptive mesh refinement - Implicit time stepping - Logarithmic variable transformations: $u = \ln(n/n_i)$ 6.3 High Dimensionality - Full Boltzmann: 7D (3 position + 3 momentum + time) - NEGF: Large matrix inversions per energy point Solutions: - Mode-space approximation - Hierarchical matrix methods - GPU acceleration 6.4 Multiphysics Coupling Interacting effects: - Electro-thermal: $\mu(T)$, $\kappa(T)$, Joule heating - Opto-electrical: Generation, free-carrier absorption - Electro-mechanical: Piezoelectric effects, strain-modified bands 7. Emerging Frontiers 7.1 Topological Effects Berry curvature: $$ \mathbf{\Omega}_n(\mathbf{k}) = i\langle\nabla_\mathbf{k} u_n| \times |\nabla_\mathbf{k} u_n\rangle $$ Anomalous velocity contribution: $$ \dot{\mathbf{r}} = \frac{1}{\hbar}\nabla_\mathbf{k} E_n - \dot{\mathbf{k}} \times \mathbf{\Omega}_n $$ Applications: Topological insulators, quantum Hall effect, valley-selective transport 7.2 2D Materials Graphene (Dirac equation): $$ H = v_F \begin{pmatrix} 0 & p_x - ip_y \\ p_x + ip_y & 0 \end{pmatrix} = v_F \boldsymbol{\sigma} \cdot \mathbf{p} $$ Linear dispersion: $$ E = \pm \hbar v_F |\mathbf{k}| $$ TMDCs (valley physics): $$ H = at(\tau k_x \sigma_x + k_y \sigma_y) + \frac{\Delta}{2}\sigma_z + \lambda\tau\frac{\sigma_z - 1}{2}s_z $$ 7.3 Spintronics Spin drift-diffusion: $$ \frac{\partial \mathbf{s}}{\partial t} = D_s \nabla^2 \mathbf{s} - \frac{\mathbf{s}}{\tau_s} + \mathbf{s} \times \boldsymbol{\omega} $$ Landau-Lifshitz-Gilbert (magnetization dynamics): $$ \frac{d\mathbf{M}}{dt} = -\gamma \mathbf{M} \times \mathbf{H}_{eff} + \frac{\alpha}{M_s}\mathbf{M} \times \frac{d\mathbf{M}}{dt} $$ 7.4 Plasmonics in Semiconductors Nonlocal dielectric response: $$ \varepsilon(\omega, \mathbf{k}) = \varepsilon_\infty - \frac{\omega_p^2}{\omega^2 + i\gamma\omega - \beta^2 k^2} $$ where $\beta^2 = \frac{3}{5}v_F^2$ accounts for spatial dispersion. Quantum corrections (Feibelman parameters): $$ d_\perp(\omega) = \frac{\int z \delta n(z) dz}{\int \delta n(z) dz} $$ Constants: | Constant | Symbol | Value | |----------|--------|-------| | Elementary charge | $q$ | $1.602 \times 10^{-19}$ C | | Planck's constant | $h$ | $6.626 \times 10^{-34}$ J·s | | Reduced Planck's constant | $\hbar$ | $1.055 \times 10^{-34}$ J·s | | Boltzmann constant | $k_B$ | $1.381 \times 10^{-23}$ J/K | | Vacuum permittivity | $\varepsilon_0$ | $8.854 \times 10^{-12}$ F/m | | Electron mass | $m_0$ | $9.109 \times 10^{-31}$ kg | | Speed of light | $c$ | $2.998 \times 10^{8}$ m/s | Material Parameters (Silicon @ 300K): | Parameter | Symbol | Value | |-----------|--------|-------| | Band gap | $E_g$ | 1.12 eV | | Intrinsic carrier concentration | $n_i$ | $1.0 \times 10^{10}$ cm⁻³ | | Electron mobility | $\mu_n$ | 1400 cm²/V·s | | Hole mobility | $\mu_p$ | 450 cm²/V·s | | Relative permittivity | $\varepsilon_r$ | 11.7 | | Electron effective mass | $m_n^*/m_0$ | 0.26 | | Hole effective mass | $m_p^*/m_0$ | 0.39 |
Model metal migration.
Rate model quality using ELO system from pairwise comparisons.
Elo ratings rank model performance through win-loss records against opponents.
Smooth negative part.
Draft emails automatically.
AI composes emails. Professional tone, context-aware.
Embedded carbon represents emissions from material production and manufacturing embodied in products.
ML on embedded systems.
SiGe regions creating compressive stress.
Embedding models encode text into dense vector representations.
E5 and BGE are strong open embedding models. MTEB benchmarks.
Learned concept representations.
AI agents that interact with physical world.
Urgent unplanned repairs.
Capabilities appearing at scale.
Capabilities that appear in large models but not small ones (reasoning multi-step problems).
# Semiconductor Manufacturing Process: Emerging Mathematical Frontiers ## 1. Computational Lithography and Inverse Problems ### 1.1 Inverse Lithography Technology (ILT) The fundamental problem: Given a desired wafer pattern $I_{\text{target}}(x,y)$, find the optimal mask pattern $M(x',y')$. **Core Mathematical Formulation:** $$ \min_{M} \mathcal{L}(M) = \int \left| I(x,y; M) - I_{\text{target}}(x,y) \right|^2 \, dx \, dy + \lambda \mathcal{R}(M) $$ Where: - $I(x,y; M)$ = Aerial image intensity on wafer - $I_{\text{target}}(x,y)$ = Desired pattern intensity - $\mathcal{R}(M)$ = Regularization term (mask manufacturability) - $\lambda$ = Regularization parameter **Key Challenges:** - **Dimensionality:** Full-chip optimization involves $N \sim 10^9$ to $10^{12}$ variables - **Non-convexity:** The forward model $I(x,y; M)$ is highly nonlinear - **Ill-posedness:** Multiple masks can produce similar images **Hopkins Imaging Model:** $$ I(x,y) = \sum_{k} \left| \int \int H_k(f_x, f_y) \cdot \tilde{M}(f_x, f_y) \cdot e^{2\pi i (f_x x + f_y y)} \, df_x \, df_y \right|^2 $$ Where: - $H_k(f_x, f_y)$ = Transmission cross-coefficient (TCC) eigenfunctions - $\tilde{M}(f_x, f_y)$ = Fourier transform of mask transmission ### 1.2 Source-Mask Optimization (SMO) **Bilinear Optimization Problem:** $$ \min_{S, M} \mathcal{L}(S, M) = \| I(S, M) - I_{\text{target}} \|^2 + \alpha \mathcal{R}_S(S) + \beta \mathcal{R}_M(M) $$ Where: - $S$ = Source intensity distribution (illumination pupil) - $M$ = Mask transmission function - $\mathcal{R}_S$, $\mathcal{R}_M$ = Source and mask regularizers **Alternating Minimization Approach:** 1. Fix $S^{(k)}$, solve: $M^{(k+1)} = \arg\min_M \mathcal{L}(S^{(k)}, M)$ 2. Fix $M^{(k+1)}$, solve: $S^{(k+1)} = \arg\min_S \mathcal{L}(S, M^{(k+1)})$ 3. Repeat until convergence ### 1.3 Stochastic Lithography Effects At EUV wavelengths ($\lambda = 13.5$ nm), photon shot noise becomes critical. **Photon Statistics:** $$ N_{\text{photons}} \sim \text{Poisson}\left( \frac{E \cdot A}{h\nu} \right) $$ Where: - $E$ = Exposure dose (mJ/cm²) - $A$ = Pixel area - $h\nu$ = Photon energy ($\approx 92$ eV for EUV) **Line Edge Roughness (LER) Model:** $$ \text{LER} = \sqrt{\sigma_{\text{shot}}^2 + \sigma_{\text{resist}}^2 + \sigma_{\text{acid}}^2} $$ **Stochastic Resist Development (Stochastic PDE):** $$ \frac{\partial h}{\partial t} = -R(M, I, \xi) + \eta(x, y, t) $$ Where: - $h(x,y,t)$ = Resist height - $R$ = Development rate (depends on local deprotection $M$, inhibitor $I$) - $\eta$ = Spatiotemporal noise term - $\xi$ = Quenched disorder from shot noise ## 2. Physics-Informed Machine Learning ### 2.1 Physics-Informed Neural Networks (PINNs) **Standard PINN Loss Function:** $$ \mathcal{L}_{\text{PINN}} = \mathcal{L}_{\text{data}} + \lambda_{\text{PDE}} \mathcal{L}_{\text{PDE}} + \lambda_{\text{BC}} \mathcal{L}_{\text{BC}} $$ Where: - $\mathcal{L}_{\text{data}} = \frac{1}{N_d} \sum_{i=1}^{N_d} |u_\theta(x_i) - u_i^{\text{obs}}|^2$ - $\mathcal{L}_{\text{PDE}} = \frac{1}{N_r} \sum_{j=1}^{N_r} |\mathcal{N}[u_\theta](x_j)|^2$ - $\mathcal{L}_{\text{BC}} = \frac{1}{N_b} \sum_{k=1}^{N_b} |\mathcal{B}[u_\theta](x_k) - g_k|^2$ **Key Mathematical Questions:** - **Approximation Theory:** What function classes can $u_\theta$ represent under PDE constraints? - **Generalization Bounds:** How does enforcing physics improve out-of-distribution performance? ### 2.2 Neural Operators **Fourier Neural Operator (FNO):** $$ v_{l+1}(x) = \sigma \left( W_l v_l(x) + \mathcal{F}^{-1}\left( R_l \cdot \mathcal{F}(v_l) \right)(x) \right) $$ Where: - $\mathcal{F}$, $\mathcal{F}^{-1}$ = Fourier and inverse Fourier transforms - $R_l$ = Learnable spectral weights - $W_l$ = Local linear transformation - $\sigma$ = Activation function **DeepONet Architecture:** $$ G_\theta(u)(y) = \sum_{k=1}^{p} b_k(u; \theta_b) \cdot t_k(y; \theta_t) $$ Where: - $b_k$ = Branch network outputs (encode input function $u$) - $t_k$ = Trunk network outputs (encode query location $y$) ### 2.3 Hybrid Physics-ML Architectures **Residual Learning Framework:** $$ u_{\text{full}}(x) = u_{\text{physics}}(x) + u_{\text{NN}}(x; \theta) $$ Where the neural network learns the "correction" to the physics model: $$ u_{\text{NN}} \approx u_{\text{true}} - u_{\text{physics}} $$ **Constraint: Physics Consistency** $$ \| \mathcal{N}[u_{\text{full}}] \|_2 \leq \epsilon $$ ## 3. High-Dimensional Uncertainty Quantification ### 3.1 Polynomial Chaos Expansions (PCE) **Generalized PCE Representation:** $$ u(\mathbf{x}, \boldsymbol{\xi}) = \sum_{\boldsymbol{\alpha} \in \mathcal{A}} c_{\boldsymbol{\alpha}}(\mathbf{x}) \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}) $$ Where: - $\boldsymbol{\xi} = (\xi_1, \ldots, \xi_d)$ = Random variables (process variations) - $\Psi_{\boldsymbol{\alpha}}$ = Multivariate orthogonal polynomials - $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_d)$ = Multi-index - $\mathcal{A}$ = Index set (truncated) **Orthogonality Condition:** $$ \mathbb{E}[\Psi_{\boldsymbol{\alpha}} \Psi_{\boldsymbol{\beta}}] = \int \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}) \Psi_{\boldsymbol{\beta}}(\boldsymbol{\xi}) \rho(\boldsymbol{\xi}) \, d\boldsymbol{\xi} = \delta_{\boldsymbol{\alpha}\boldsymbol{\beta}} $$ **Curse of Dimensionality:** - Full tensor product: $|\mathcal{A}| = \binom{d + p}{p} \sim \frac{d^p}{p!}$ - Sparse grids: $|\mathcal{A}| \sim \mathcal{O}(d \cdot (\log d)^{d-1})$ ### 3.2 Rare Event Simulation **Importance Sampling:** $$ P(Y > \gamma) = \mathbb{E}_P[\mathbf{1}_{Y > \gamma}] = \mathbb{E}_Q\left[ \mathbf{1}_{Y > \gamma} \cdot \frac{dP}{dQ} \right] $$ **Optimal Tilting Measure:** $$ Q^*(\xi) \propto \mathbf{1}_{Y(\xi) > \gamma} \cdot P(\xi) $$ **Large Deviation Principle:** $$ \lim_{n \to \infty} \frac{1}{n} \log P(S_n / n \in A) = -\inf_{x \in A} I(x) $$ Where $I(x)$ is the rate function (Legendre transform of cumulant generating function). ### 3.3 Distributionally Robust Optimization **Wasserstein Ambiguity Set:** $$ \mathcal{P} = \left\{ Q : W_p(Q, \hat{P}_n) \leq \epsilon \right\} $$ **DRO Formulation:** $$ \min_{x} \sup_{Q \in \mathcal{P}} \mathbb{E}_Q[f(x, \xi)] $$ **Tractable Reformulation (for linear $f$):** $$ \min_{x} \left\{ \frac{1}{n} \sum_{i=1}^{n} f(x, \hat{\xi}_i) + \epsilon \cdot \| \nabla_\xi f \|_* \right\} $$ ## 4. Multiscale Mathematics ### 4.1 Scale Hierarchy in Semiconductor Manufacturing | Scale | Size Range | Phenomena | Mathematical Tools | |-------|------------|-----------|---------------------| | Atomic | 0.1 - 1 nm | Dopant atoms, ALD | DFT, MD, KMC | | Mesoscale | 1 - 10 nm | LER, grain structure | Phase field, SDE | | Feature | 10 - 100 nm | Transistors, vias | Continuum PDEs | | Die | 1 - 10 mm | Pattern loading | Effective medium | | Wafer | 300 mm | Uniformity | Process models | ### 4.2 Homogenization Theory **Two-Scale Expansion:** $$ u^\epsilon(x) = u_0(x, x/\epsilon) + \epsilon u_1(x, x/\epsilon) + \epsilon^2 u_2(x, x/\epsilon) + \ldots $$ Where $y = x/\epsilon$ is the fast variable. **Cell Problem:** $$ -\nabla_y \cdot \left( A(y) \left( \nabla_y \chi^j + \mathbf{e}_j \right) \right) = 0 \quad \text{in } Y $$ **Effective (Homogenized) Coefficient:** $$ A^*_{ij} = \frac{1}{|Y|} \int_Y A(y) \left( \mathbf{e}_i + \nabla_y \chi^i \right) \cdot \left( \mathbf{e}_j + \nabla_y \chi^j \right) \, dy $$ ### 4.3 Phase Field Methods **Allen-Cahn Equation (Interface Evolution):** $$ \frac{\partial \phi}{\partial t} = -M \frac{\delta \mathcal{F}}{\delta \phi} = M \left( \epsilon^2 \nabla^2 \phi - f'(\phi) \right) $$ **Cahn-Hilliard Equation (Conserved Order Parameter):** $$ \frac{\partial c}{\partial t} = \nabla \cdot \left( M \nabla \frac{\delta \mathcal{F}}{\delta c} \right) $$ **Free Energy Functional:** $$ \mathcal{F}[\phi] = \int \left( \frac{\epsilon^2}{2} |\nabla \phi|^2 + f(\phi) \right) dV $$ Where $f(\phi) = \frac{1}{4}(\phi^2 - 1)^2$ (double-well potential). ### 4.4 Kinetic Monte Carlo (KMC) **Master Equation:** $$ \frac{dP(\sigma, t)}{dt} = \sum_{\sigma'} \left[ W(\sigma' \to \sigma) P(\sigma', t) - W(\sigma \to \sigma') P(\sigma, t) \right] $$ **Transition Rates (Arrhenius Form):** $$ W_i = \nu_0 \exp\left( -\frac{E_a^{(i)}}{k_B T} \right) $$ **BKL Algorithm:** 1. Calculate total rate: $R_{\text{tot}} = \sum_i W_i$ 2. Select event $i$ with probability: $p_i = W_i / R_{\text{tot}}$ 3. Advance time: $\Delta t = -\frac{\ln(r)}{R_{\text{tot}}}$, where $r \sim U(0,1)$ ## 5. Optimization at Unprecedented Scale ### 5.1 Bayesian Optimization **Gaussian Process Prior:** $$ f(\mathbf{x}) \sim \mathcal{GP}\left( m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}') \right) $$ **Posterior Mean and Variance:** $$ \mu_n(\mathbf{x}) = \mathbf{k}_n(\mathbf{x})^T \mathbf{K}_n^{-1} \mathbf{y}_n $$ $$ \sigma_n^2(\mathbf{x}) = k(\mathbf{x}, \mathbf{x}) - \mathbf{k}_n(\mathbf{x})^T \mathbf{K}_n^{-1} \mathbf{k}_n(\mathbf{x}) $$ **Expected Improvement (EI):** $$ \text{EI}(\mathbf{x}) = \mathbb{E}\left[ \max(0, f(\mathbf{x}) - f_{\text{best}}) \right] $$ $$ = \sigma_n(\mathbf{x}) \left[ z \Phi(z) + \phi(z) \right], \quad z = \frac{\mu_n(\mathbf{x}) - f_{\text{best}}}{\sigma_n(\mathbf{x})} $$ ### 5.2 High-Dimensional Extensions **Random Embeddings:** $$ f(\mathbf{x}) \approx g(\mathbf{A}\mathbf{x}), \quad \mathbf{A} \in \mathbb{R}^{d_e \times D}, \quad d_e \ll D $$ **Additive Structure:** $$ f(\mathbf{x}) = \sum_{j=1}^{J} f_j(\mathbf{x}_{S_j}) $$ Where $S_j \subset \{1, \ldots, D\}$ are (possibly overlapping) subsets. **Trust Region Bayesian Optimization (TuRBO):** - Maintain local GP models within trust regions - Expand/contract regions based on success/failure - Multiple trust regions for multimodal landscapes ### 5.3 Multi-Objective Optimization **Pareto Optimality:** $\mathbf{x}^*$ is Pareto optimal if $\nexists \mathbf{x}$ such that: $$ f_i(\mathbf{x}) \leq f_i(\mathbf{x}^*) \; \forall i \quad \text{and} \quad f_j(\mathbf{x}) < f_j(\mathbf{x}^*) \; \text{for some } j $$ **Expected Hypervolume Improvement (EHVI):** $$ \text{EHVI}(\mathbf{x}) = \mathbb{E}\left[ \text{HV}(\mathcal{P} \cup \{f(\mathbf{x})\}) - \text{HV}(\mathcal{P}) \right] $$ Where $\mathcal{P}$ is the current Pareto front and HV is the hypervolume indicator. ## 6. Topological and Geometric Methods ### 6.1 Persistent Homology **Simplicial Complex Filtration:** $$ \emptyset = K_0 \subseteq K_1 \subseteq K_2 \subseteq \cdots \subseteq K_n = K $$ **Persistence Pairs:** For each topological feature (connected component, loop, void): - **Birth time:** $b_i$ = scale at which feature appears - **Death time:** $d_i$ = scale at which feature disappears - **Persistence:** $\text{pers}_i = d_i - b_i$ **Persistence Diagram:** $$ \text{Dgm}(K) = \{(b_i, d_i)\}_{i=1}^{N} \subset \mathbb{R}^2 $$ **Stability Theorem:** $$ d_B(\text{Dgm}(K), \text{Dgm}(K')) \leq \| f - f' \|_\infty $$ Where $d_B$ is the bottleneck distance. ### 6.2 Optimal Transport **Monge Problem:** $$ \min_{T: T_\# \mu = \nu} \int c(x, T(x)) \, d\mu(x) $$ **Kantorovich (Relaxed) Formulation:** $$ W_p(\mu, \nu) = \left( \inf_{\gamma \in \Gamma(\mu, \nu)} \int |x - y|^p \, d\gamma(x, y) \right)^{1/p} $$ **Applications in Semiconductor:** - Comparing wafer defect maps - Loss functions for lithography optimization - Generative models for realistic defect distributions ### 6.3 Curvature-Driven Flows **Mean Curvature Flow:** $$ \frac{\partial \Gamma}{\partial t} = \kappa \mathbf{n} $$ Where $\kappa$ is the mean curvature and $\mathbf{n}$ is the unit normal. **Level Set Formulation:** $$ \frac{\partial \phi}{\partial t} + v_n |\nabla \phi| = 0 $$ With $v_n = \kappa = \nabla \cdot \left( \frac{\nabla \phi}{|\nabla \phi|} \right)$. **Surface Diffusion (4th Order):** $$ \frac{\partial \Gamma}{\partial t} = -\Delta_s \kappa \cdot \mathbf{n} $$ Where $\Delta_s$ is the surface Laplacian. ## 7. Control Theory and Real-Time Optimization ### 7.1 Run-to-Run Control **State-Space Model:** $$ \mathbf{x}_{k+1} = \mathbf{A} \mathbf{x}_k + \mathbf{B} \mathbf{u}_k + \mathbf{w}_k $$ $$ \mathbf{y}_k = \mathbf{C} \mathbf{x}_k + \mathbf{v}_k $$ **EWMA (Exponentially Weighted Moving Average) Controller:** $$ \hat{y}_{k+1} = \lambda y_k + (1 - \lambda) \hat{y}_k $$ $$ u_{k+1} = u_k + \frac{T - \hat{y}_{k+1}}{\beta} $$ Where: - $T$ = Target value - $\lambda$ = EWMA weight (0 < λ ≤ 1) - $\beta$ = Process gain ### 7.2 Model Predictive Control (MPC) **Optimization Problem at Each Step:** $$ \min_{\mathbf{u}_{0:N-1}} \sum_{k=0}^{N-1} \left[ \| \mathbf{x}_k - \mathbf{x}_{\text{ref}} \|_Q^2 + \| \mathbf{u}_k \|_R^2 \right] + \| \mathbf{x}_N \|_P^2 $$ Subject to: $$ \mathbf{x}_{k+1} = f(\mathbf{x}_k, \mathbf{u}_k) $$ $$ \mathbf{x}_k \in \mathcal{X}, \quad \mathbf{u}_k \in \mathcal{U} $$ **Robust MPC (Tube-Based):** $$ \mathbf{x}_k = \bar{\mathbf{x}}_k + \mathbf{e}_k, \quad \mathbf{e}_k \in \mathcal{E} $$ Where $\bar{\mathbf{x}}_k$ is the nominal trajectory and $\mathcal{E}$ is the robust positively invariant set. ### 7.3 Kalman Filter **Prediction Step:** $$ \hat{\mathbf{x}}_{k|k-1} = \mathbf{A} \hat{\mathbf{x}}_{k-1|k-1} + \mathbf{B} \mathbf{u}_{k-1} $$ $$ \mathbf{P}_{k|k-1} = \mathbf{A} \mathbf{P}_{k-1|k-1} \mathbf{A}^T + \mathbf{Q} $$ **Update Step:** $$ \mathbf{K}_k = \mathbf{P}_{k|k-1} \mathbf{C}^T \left( \mathbf{C} \mathbf{P}_{k|k-1} \mathbf{C}^T + \mathbf{R} \right)^{-1} $$ $$ \hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k \left( \mathbf{y}_k - \mathbf{C} \hat{\mathbf{x}}_{k|k-1} \right) $$ $$ \mathbf{P}_{k|k} = \left( \mathbf{I} - \mathbf{K}_k \mathbf{C} \right) \mathbf{P}_{k|k-1} $$ ## 8. Metrology Inverse Problems ### 8.1 Scatterometry (Optical CD) **Forward Problem (RCWA):** $$ \frac{\partial}{\partial z} \begin{pmatrix} \mathbf{E}_\perp \\ \mathbf{H}_\perp \end{pmatrix} = \mathbf{M}(z) \begin{pmatrix} \mathbf{E}_\perp \\ \mathbf{H}_\perp \end{pmatrix} $$ **Inverse Problem:** $$ \min_{\mathbf{p}} \| \mathbf{S}(\mathbf{p}) - \mathbf{S}_{\text{meas}} \|^2 + \lambda \mathcal{R}(\mathbf{p}) $$ Where: - $\mathbf{p}$ = Geometric parameters (CD, height, sidewall angle) - $\mathbf{S}$ = Mueller matrix elements - $\mathcal{R}$ = Regularizer (e.g., Tikhonov, total variation) ### 8.2 Phase Retrieval **Measurement Model:** $$ I_m = |\mathcal{A}_m x|^2, \quad m = 1, \ldots, M $$ **Wirtinger Flow:** $$ x^{(k+1)} = x^{(k)} - \frac{\mu_k}{M} \sum_{m=1}^{M} \left( |a_m^H x^{(k)}|^2 - I_m \right) a_m a_m^H x^{(k)} $$ **Uniqueness Conditions:** For $x \in \mathbb{C}^n$, uniqueness (up to global phase) requires $M \geq 4n - 4$ generic measurements. ### 8.3 Information-Theoretic Limits **Cramér-Rao Lower Bound:** $$ \text{Var}(\hat{\theta}_i) \geq \left[ \mathbf{I}(\boldsymbol{\theta})^{-1} \right]_{ii} $$ **Fisher Information Matrix:** $$ [\mathbf{I}(\boldsymbol{\theta})]_{ij} = -\mathbb{E}\left[ \frac{\partial^2 \log p(y | \boldsymbol{\theta})}{\partial \theta_i \partial \theta_j} \right] $$ **Optimal Experimental Design:** $$ \max_{\xi} \Phi(\mathbf{I}(\boldsymbol{\theta}; \xi)) $$ Where $\xi$ = experimental design, $\Phi$ = optimality criterion (D-optimal: $\det(\mathbf{I})$, A-optimal: $\text{tr}(\mathbf{I}^{-1})$) ## 9. Quantum-Classical Boundaries ### 9.1 Non-Equilibrium Green's Functions (NEGF) **Dyson Equation:** $$ G^R(E) = \left[ (E + i\eta)I - H - \Sigma^R(E) \right]^{-1} $$ **Current Calculation:** $$ I = \frac{2e}{h} \int_{-\infty}^{\infty} T(E) \left[ f_L(E) - f_R(E) \right] dE $$ **Transmission Function:** $$ T(E) = \text{Tr}\left[ \Gamma_L G^R \Gamma_R G^A \right] $$ Where $\Gamma_{L,R} = i(\Sigma_{L,R}^R - \Sigma_{L,R}^A)$. ### 9.2 Density Functional Theory (DFT) **Kohn-Sham Equations:** $$ \left[ -\frac{\hbar^2}{2m} \nabla^2 + V_{\text{eff}}(\mathbf{r}) \right] \psi_i(\mathbf{r}) = \epsilon_i \psi_i(\mathbf{r}) $$ **Effective Potential:** $$ V_{\text{eff}}(\mathbf{r}) = V_{\text{ext}}(\mathbf{r}) + V_H(\mathbf{r}) + V_{xc}(\mathbf{r}) $$ Where: - $V_{\text{ext}}$ = External (ionic) potential - $V_H = \int \frac{n(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} d\mathbf{r}'$ = Hartree potential - $V_{xc} = \frac{\delta E_{xc}[n]}{\delta n}$ = Exchange-correlation potential ### 9.3 Semiclassical Approximations **WKB Approximation:** $$ \psi(x) \approx \frac{C}{\sqrt{p(x)}} \exp\left( \pm \frac{i}{\hbar} \int^x p(x') \, dx' \right) $$ Where $p(x) = \sqrt{2m(E - V(x))}$. **Validity Criterion:** $$ \left| \frac{d\lambda}{dx} \right| \ll 1, \quad \text{where } \lambda = \frac{h}{p} $$ **Tunneling Probability (WKB):** $$ T \approx \exp\left( -\frac{2}{\hbar} \int_{x_1}^{x_2} |p(x)| \, dx \right) $$ ## 10. Graph and Combinatorial Methods ### 10.1 Design Rule Checking (DRC) **Constraint Satisfaction Problem (CSP):** $$ \forall (i,j) \in E: \; d(p_i, p_j) \geq d_{\min}(t_i, t_j) $$ Where: - $p_i, p_j$ = Polygon features - $d$ = Distance function (min spacing, enclosure, etc.) - $t_i, t_j$ = Layer/feature types **SAT/SMT Encoding:** $$ \bigwedge_{r \in \text{Rules}} \bigwedge_{(i,j) \in \text{Violations}(r)} \neg(x_i \land x_j) $$ ### 10.2 Graph Neural Networks for Layout **Message Passing Framework:** $$ \mathbf{h}_v^{(k+1)} = \text{UPDATE}^{(k)} \left( \mathbf{h}_v^{(k)}, \text{AGGREGATE}^{(k)} \left( \left\{ \mathbf{h}_u^{(k)} : u \in \mathcal{N}(v) \right\} \right) \right) $$ **Graph Attention:** $$ \alpha_{vu} = \frac{\exp\left( \text{LeakyReLU}(\mathbf{a}^T [\mathbf{W}\mathbf{h}_v \| \mathbf{W}\mathbf{h}_u]) \right)}{\sum_{w \in \mathcal{N}(v)} \exp\left( \text{LeakyReLU}(\mathbf{a}^T [\mathbf{W}\mathbf{h}_v \| \mathbf{W}\mathbf{h}_w]) \right)} $$ $$ \mathbf{h}_v' = \sigma\left( \sum_{u \in \mathcal{N}(v)} \alpha_{vu} \mathbf{W} \mathbf{h}_u \right) $$ ### 10.3 Hypergraph Partitioning **Min-Cut Objective:** $$ \min_{\pi: V \to \{1, \ldots, k\}} \sum_{e \in E} w_e \cdot \mathbf{1}[\text{cut}(e, \pi)] $$ Subject to balance constraints: $$ \left| |\pi^{-1}(i)| - \frac{|V|}{k} \right| \leq \epsilon \frac{|V|}{k} $$ ## Cross-Cutting Mathematical Themes ### Theme 1: Curse of Dimensionality **Tensor Train Decomposition:** $$ \mathcal{T}(i_1, \ldots, i_d) = G_1(i_1) \cdot G_2(i_2) \cdots G_d(i_d) $$ - Storage: $\mathcal{O}(dnr^2)$ vs. $\mathcal{O}(n^d)$ - Where $r$ = TT-rank ### Theme 2: Inverse Problems Framework $$ \mathbf{y} = \mathcal{A}(\mathbf{x}) + \boldsymbol{\eta} $$ **Regularized Solution:** $$ \hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \| \mathbf{y} - \mathcal{A}(\mathbf{x}) \|^2 + \lambda \mathcal{R}(\mathbf{x}) $$ Common regularizers: - Tikhonov: $\mathcal{R}(\mathbf{x}) = \|\mathbf{x}\|_2^2$ - Total Variation: $\mathcal{R}(\mathbf{x}) = \|\nabla \mathbf{x}\|_1$ - Sparsity: $\mathcal{R}(\mathbf{x}) = \|\mathbf{x}\|_1$ ### Theme 3: Certification and Trust **PAC-Bayes Bound:** $$ \mathbb{E}_{h \sim Q}[L(h)] \leq \mathbb{E}_{h \sim Q}[\hat{L}(h)] + \sqrt{\frac{\text{KL}(Q \| P) + \ln(2\sqrt{n}/\delta)}{2n}} $$ **Conformal Prediction:** $$ C(x_{\text{new}}) = \{y : s(x_{\text{new}}, y) \leq \hat{q}\} $$ Where $\hat{q}$ = $(1-\alpha)$-quantile of calibration scores. ## Key Notation Summary | Symbol | Meaning | |--------|---------| | $M(x,y)$ | Mask transmission function | | $I(x,y)$ | Aerial image intensity | | $\mathcal{F}$ | Fourier transform | | $\nabla$ | Gradient operator | | $\nabla^2$, $\Delta$ | Laplacian | | $\mathbb{E}[\cdot]$ | Expectation | | $\mathcal{GP}(m, k)$ | Gaussian process with mean $m$, covariance $k$ | | $\mathcal{N}(\mu, \sigma^2)$ | Normal distribution | | $W_p(\mu, \nu)$ | $p$-Wasserstein distance | | $\text{Tr}(\cdot)$ | Matrix trace | | $\|\cdot\|_p$ | $L^p$ norm | | $\delta_{ij}$ | Kronecker delta | | $\mathbf{1}_{A}$ | Indicator function of set $A$ |
Detect light from active devices.
Efficient Neural Architecture Search reduces computational cost by sharing weights among child models through a single directed acyclic graph.
Encoder-based inversion directly maps images to latent codes through learned networks.
Use encoder to find latent.
Full Transformer with both parts (T5 BART for seq2seq tasks).
Architecture with only encoder blocks (BERT for classification/understanding).
Wearout failures.
Energy efficiency in fabs focuses on reducing power consumption per wafer through equipment optimization process improvements and facility design.
Energy-aware NAS optimizes architectures for minimal power consumption during inference.
Energy-based models assign scalar energy values to input-output configurations and make predictions by minimizing energy functions.
Models that assign energy to configurations.
Improved decoding for masked positions.
Improve MD sampling efficiency.
Ensemble Kalman Filter uses Monte Carlo samples to represent state distributions in high-dimensional systems.
Combine multiple models for better predictions.
Ensembling combines multiple models. Bagging, boosting, stacking.
Enthalpy wheels transfer both sensible and latent heat recovering moisture and thermal energy.
Encourage diverse predictions.