embodied ai,robotics
AI agents that interact with physical world.
9,967 technical terms and definitions
AI agents that interact with physical world.
Answer questions by exploring environment.
Urgent unplanned repairs.
Capabilities appearing at scale.
Capabilities that appear in large models but not small ones (reasoning multi-step problems).
Emergent capabilities appear at scale: reasoning, code, math. Unpredictable from smaller models. More scale = new abilities.
# Semiconductor Manufacturing Process: Emerging Mathematical Frontiers ## 1. Computational Lithography and Inverse Problems ### 1.1 Inverse Lithography Technology (ILT) The fundamental problem: Given a desired wafer pattern $I_{\text{target}}(x,y)$, find the optimal mask pattern $M(x',y')$. **Core Mathematical Formulation:** $$ \min_{M} \mathcal{L}(M) = \int \left| I(x,y; M) - I_{\text{target}}(x,y) \right|^2 \, dx \, dy + \lambda \mathcal{R}(M) $$ Where: - $I(x,y; M)$ = Aerial image intensity on wafer - $I_{\text{target}}(x,y)$ = Desired pattern intensity - $\mathcal{R}(M)$ = Regularization term (mask manufacturability) - $\lambda$ = Regularization parameter **Key Challenges:** - **Dimensionality:** Full-chip optimization involves $N \sim 10^9$ to $10^{12}$ variables - **Non-convexity:** The forward model $I(x,y; M)$ is highly nonlinear - **Ill-posedness:** Multiple masks can produce similar images **Hopkins Imaging Model:** $$ I(x,y) = \sum_{k} \left| \int \int H_k(f_x, f_y) \cdot \tilde{M}(f_x, f_y) \cdot e^{2\pi i (f_x x + f_y y)} \, df_x \, df_y \right|^2 $$ Where: - $H_k(f_x, f_y)$ = Transmission cross-coefficient (TCC) eigenfunctions - $\tilde{M}(f_x, f_y)$ = Fourier transform of mask transmission ### 1.2 Source-Mask Optimization (SMO) **Bilinear Optimization Problem:** $$ \min_{S, M} \mathcal{L}(S, M) = \| I(S, M) - I_{\text{target}} \|^2 + \alpha \mathcal{R}_S(S) + \beta \mathcal{R}_M(M) $$ Where: - $S$ = Source intensity distribution (illumination pupil) - $M$ = Mask transmission function - $\mathcal{R}_S$, $\mathcal{R}_M$ = Source and mask regularizers **Alternating Minimization Approach:** 1. Fix $S^{(k)}$, solve: $M^{(k+1)} = \arg\min_M \mathcal{L}(S^{(k)}, M)$ 2. Fix $M^{(k+1)}$, solve: $S^{(k+1)} = \arg\min_S \mathcal{L}(S, M^{(k+1)})$ 3. Repeat until convergence ### 1.3 Stochastic Lithography Effects At EUV wavelengths ($\lambda = 13.5$ nm), photon shot noise becomes critical. **Photon Statistics:** $$ N_{\text{photons}} \sim \text{Poisson}\left( \frac{E \cdot A}{h\nu} \right) $$ Where: - $E$ = Exposure dose (mJ/cm²) - $A$ = Pixel area - $h\nu$ = Photon energy ($\approx 92$ eV for EUV) **Line Edge Roughness (LER) Model:** $$ \text{LER} = \sqrt{\sigma_{\text{shot}}^2 + \sigma_{\text{resist}}^2 + \sigma_{\text{acid}}^2} $$ **Stochastic Resist Development (Stochastic PDE):** $$ \frac{\partial h}{\partial t} = -R(M, I, \xi) + \eta(x, y, t) $$ Where: - $h(x,y,t)$ = Resist height - $R$ = Development rate (depends on local deprotection $M$, inhibitor $I$) - $\eta$ = Spatiotemporal noise term - $\xi$ = Quenched disorder from shot noise ## 2. Physics-Informed Machine Learning ### 2.1 Physics-Informed Neural Networks (PINNs) **Standard PINN Loss Function:** $$ \mathcal{L}_{\text{PINN}} = \mathcal{L}_{\text{data}} + \lambda_{\text{PDE}} \mathcal{L}_{\text{PDE}} + \lambda_{\text{BC}} \mathcal{L}_{\text{BC}} $$ Where: - $\mathcal{L}_{\text{data}} = \frac{1}{N_d} \sum_{i=1}^{N_d} |u_\theta(x_i) - u_i^{\text{obs}}|^2$ - $\mathcal{L}_{\text{PDE}} = \frac{1}{N_r} \sum_{j=1}^{N_r} |\mathcal{N}[u_\theta](x_j)|^2$ - $\mathcal{L}_{\text{BC}} = \frac{1}{N_b} \sum_{k=1}^{N_b} |\mathcal{B}[u_\theta](x_k) - g_k|^2$ **Key Mathematical Questions:** - **Approximation Theory:** What function classes can $u_\theta$ represent under PDE constraints? - **Generalization Bounds:** How does enforcing physics improve out-of-distribution performance? ### 2.2 Neural Operators **Fourier Neural Operator (FNO):** $$ v_{l+1}(x) = \sigma \left( W_l v_l(x) + \mathcal{F}^{-1}\left( R_l \cdot \mathcal{F}(v_l) \right)(x) \right) $$ Where: - $\mathcal{F}$, $\mathcal{F}^{-1}$ = Fourier and inverse Fourier transforms - $R_l$ = Learnable spectral weights - $W_l$ = Local linear transformation - $\sigma$ = Activation function **DeepONet Architecture:** $$ G_\theta(u)(y) = \sum_{k=1}^{p} b_k(u; \theta_b) \cdot t_k(y; \theta_t) $$ Where: - $b_k$ = Branch network outputs (encode input function $u$) - $t_k$ = Trunk network outputs (encode query location $y$) ### 2.3 Hybrid Physics-ML Architectures **Residual Learning Framework:** $$ u_{\text{full}}(x) = u_{\text{physics}}(x) + u_{\text{NN}}(x; \theta) $$ Where the neural network learns the "correction" to the physics model: $$ u_{\text{NN}} \approx u_{\text{true}} - u_{\text{physics}} $$ **Constraint: Physics Consistency** $$ \| \mathcal{N}[u_{\text{full}}] \|_2 \leq \epsilon $$ ## 3. High-Dimensional Uncertainty Quantification ### 3.1 Polynomial Chaos Expansions (PCE) **Generalized PCE Representation:** $$ u(\mathbf{x}, \boldsymbol{\xi}) = \sum_{\boldsymbol{\alpha} \in \mathcal{A}} c_{\boldsymbol{\alpha}}(\mathbf{x}) \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}) $$ Where: - $\boldsymbol{\xi} = (\xi_1, \ldots, \xi_d)$ = Random variables (process variations) - $\Psi_{\boldsymbol{\alpha}}$ = Multivariate orthogonal polynomials - $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_d)$ = Multi-index - $\mathcal{A}$ = Index set (truncated) **Orthogonality Condition:** $$ \mathbb{E}[\Psi_{\boldsymbol{\alpha}} \Psi_{\boldsymbol{\beta}}] = \int \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}) \Psi_{\boldsymbol{\beta}}(\boldsymbol{\xi}) \rho(\boldsymbol{\xi}) \, d\boldsymbol{\xi} = \delta_{\boldsymbol{\alpha}\boldsymbol{\beta}} $$ **Curse of Dimensionality:** - Full tensor product: $|\mathcal{A}| = \binom{d + p}{p} \sim \frac{d^p}{p!}$ - Sparse grids: $|\mathcal{A}| \sim \mathcal{O}(d \cdot (\log d)^{d-1})$ ### 3.2 Rare Event Simulation **Importance Sampling:** $$ P(Y > \gamma) = \mathbb{E}_P[\mathbf{1}_{Y > \gamma}] = \mathbb{E}_Q\left[ \mathbf{1}_{Y > \gamma} \cdot \frac{dP}{dQ} \right] $$ **Optimal Tilting Measure:** $$ Q^*(\xi) \propto \mathbf{1}_{Y(\xi) > \gamma} \cdot P(\xi) $$ **Large Deviation Principle:** $$ \lim_{n \to \infty} \frac{1}{n} \log P(S_n / n \in A) = -\inf_{x \in A} I(x) $$ Where $I(x)$ is the rate function (Legendre transform of cumulant generating function). ### 3.3 Distributionally Robust Optimization **Wasserstein Ambiguity Set:** $$ \mathcal{P} = \left\{ Q : W_p(Q, \hat{P}_n) \leq \epsilon \right\} $$ **DRO Formulation:** $$ \min_{x} \sup_{Q \in \mathcal{P}} \mathbb{E}_Q[f(x, \xi)] $$ **Tractable Reformulation (for linear $f$):** $$ \min_{x} \left\{ \frac{1}{n} \sum_{i=1}^{n} f(x, \hat{\xi}_i) + \epsilon \cdot \| \nabla_\xi f \|_* \right\} $$ ## 4. Multiscale Mathematics ### 4.1 Scale Hierarchy in Semiconductor Manufacturing | Scale | Size Range | Phenomena | Mathematical Tools | |-------|------------|-----------|---------------------| | Atomic | 0.1 - 1 nm | Dopant atoms, ALD | DFT, MD, KMC | | Mesoscale | 1 - 10 nm | LER, grain structure | Phase field, SDE | | Feature | 10 - 100 nm | Transistors, vias | Continuum PDEs | | Die | 1 - 10 mm | Pattern loading | Effective medium | | Wafer | 300 mm | Uniformity | Process models | ### 4.2 Homogenization Theory **Two-Scale Expansion:** $$ u^\epsilon(x) = u_0(x, x/\epsilon) + \epsilon u_1(x, x/\epsilon) + \epsilon^2 u_2(x, x/\epsilon) + \ldots $$ Where $y = x/\epsilon$ is the fast variable. **Cell Problem:** $$ -\nabla_y \cdot \left( A(y) \left( \nabla_y \chi^j + \mathbf{e}_j \right) \right) = 0 \quad \text{in } Y $$ **Effective (Homogenized) Coefficient:** $$ A^*_{ij} = \frac{1}{|Y|} \int_Y A(y) \left( \mathbf{e}_i + \nabla_y \chi^i \right) \cdot \left( \mathbf{e}_j + \nabla_y \chi^j \right) \, dy $$ ### 4.3 Phase Field Methods **Allen-Cahn Equation (Interface Evolution):** $$ \frac{\partial \phi}{\partial t} = -M \frac{\delta \mathcal{F}}{\delta \phi} = M \left( \epsilon^2 \nabla^2 \phi - f'(\phi) \right) $$ **Cahn-Hilliard Equation (Conserved Order Parameter):** $$ \frac{\partial c}{\partial t} = \nabla \cdot \left( M \nabla \frac{\delta \mathcal{F}}{\delta c} \right) $$ **Free Energy Functional:** $$ \mathcal{F}[\phi] = \int \left( \frac{\epsilon^2}{2} |\nabla \phi|^2 + f(\phi) \right) dV $$ Where $f(\phi) = \frac{1}{4}(\phi^2 - 1)^2$ (double-well potential). ### 4.4 Kinetic Monte Carlo (KMC) **Master Equation:** $$ \frac{dP(\sigma, t)}{dt} = \sum_{\sigma'} \left[ W(\sigma' \to \sigma) P(\sigma', t) - W(\sigma \to \sigma') P(\sigma, t) \right] $$ **Transition Rates (Arrhenius Form):** $$ W_i = \nu_0 \exp\left( -\frac{E_a^{(i)}}{k_B T} \right) $$ **BKL Algorithm:** 1. Calculate total rate: $R_{\text{tot}} = \sum_i W_i$ 2. Select event $i$ with probability: $p_i = W_i / R_{\text{tot}}$ 3. Advance time: $\Delta t = -\frac{\ln(r)}{R_{\text{tot}}}$, where $r \sim U(0,1)$ ## 5. Optimization at Unprecedented Scale ### 5.1 Bayesian Optimization **Gaussian Process Prior:** $$ f(\mathbf{x}) \sim \mathcal{GP}\left( m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}') \right) $$ **Posterior Mean and Variance:** $$ \mu_n(\mathbf{x}) = \mathbf{k}_n(\mathbf{x})^T \mathbf{K}_n^{-1} \mathbf{y}_n $$ $$ \sigma_n^2(\mathbf{x}) = k(\mathbf{x}, \mathbf{x}) - \mathbf{k}_n(\mathbf{x})^T \mathbf{K}_n^{-1} \mathbf{k}_n(\mathbf{x}) $$ **Expected Improvement (EI):** $$ \text{EI}(\mathbf{x}) = \mathbb{E}\left[ \max(0, f(\mathbf{x}) - f_{\text{best}}) \right] $$ $$ = \sigma_n(\mathbf{x}) \left[ z \Phi(z) + \phi(z) \right], \quad z = \frac{\mu_n(\mathbf{x}) - f_{\text{best}}}{\sigma_n(\mathbf{x})} $$ ### 5.2 High-Dimensional Extensions **Random Embeddings:** $$ f(\mathbf{x}) \approx g(\mathbf{A}\mathbf{x}), \quad \mathbf{A} \in \mathbb{R}^{d_e \times D}, \quad d_e \ll D $$ **Additive Structure:** $$ f(\mathbf{x}) = \sum_{j=1}^{J} f_j(\mathbf{x}_{S_j}) $$ Where $S_j \subset \{1, \ldots, D\}$ are (possibly overlapping) subsets. **Trust Region Bayesian Optimization (TuRBO):** - Maintain local GP models within trust regions - Expand/contract regions based on success/failure - Multiple trust regions for multimodal landscapes ### 5.3 Multi-Objective Optimization **Pareto Optimality:** $\mathbf{x}^*$ is Pareto optimal if $\nexists \mathbf{x}$ such that: $$ f_i(\mathbf{x}) \leq f_i(\mathbf{x}^*) \; \forall i \quad \text{and} \quad f_j(\mathbf{x}) < f_j(\mathbf{x}^*) \; \text{for some } j $$ **Expected Hypervolume Improvement (EHVI):** $$ \text{EHVI}(\mathbf{x}) = \mathbb{E}\left[ \text{HV}(\mathcal{P} \cup \{f(\mathbf{x})\}) - \text{HV}(\mathcal{P}) \right] $$ Where $\mathcal{P}$ is the current Pareto front and HV is the hypervolume indicator. ## 6. Topological and Geometric Methods ### 6.1 Persistent Homology **Simplicial Complex Filtration:** $$ \emptyset = K_0 \subseteq K_1 \subseteq K_2 \subseteq \cdots \subseteq K_n = K $$ **Persistence Pairs:** For each topological feature (connected component, loop, void): - **Birth time:** $b_i$ = scale at which feature appears - **Death time:** $d_i$ = scale at which feature disappears - **Persistence:** $\text{pers}_i = d_i - b_i$ **Persistence Diagram:** $$ \text{Dgm}(K) = \{(b_i, d_i)\}_{i=1}^{N} \subset \mathbb{R}^2 $$ **Stability Theorem:** $$ d_B(\text{Dgm}(K), \text{Dgm}(K')) \leq \| f - f' \|_\infty $$ Where $d_B$ is the bottleneck distance. ### 6.2 Optimal Transport **Monge Problem:** $$ \min_{T: T_\# \mu = \nu} \int c(x, T(x)) \, d\mu(x) $$ **Kantorovich (Relaxed) Formulation:** $$ W_p(\mu, \nu) = \left( \inf_{\gamma \in \Gamma(\mu, \nu)} \int |x - y|^p \, d\gamma(x, y) \right)^{1/p} $$ **Applications in Semiconductor:** - Comparing wafer defect maps - Loss functions for lithography optimization - Generative models for realistic defect distributions ### 6.3 Curvature-Driven Flows **Mean Curvature Flow:** $$ \frac{\partial \Gamma}{\partial t} = \kappa \mathbf{n} $$ Where $\kappa$ is the mean curvature and $\mathbf{n}$ is the unit normal. **Level Set Formulation:** $$ \frac{\partial \phi}{\partial t} + v_n |\nabla \phi| = 0 $$ With $v_n = \kappa = \nabla \cdot \left( \frac{\nabla \phi}{|\nabla \phi|} \right)$. **Surface Diffusion (4th Order):** $$ \frac{\partial \Gamma}{\partial t} = -\Delta_s \kappa \cdot \mathbf{n} $$ Where $\Delta_s$ is the surface Laplacian. ## 7. Control Theory and Real-Time Optimization ### 7.1 Run-to-Run Control **State-Space Model:** $$ \mathbf{x}_{k+1} = \mathbf{A} \mathbf{x}_k + \mathbf{B} \mathbf{u}_k + \mathbf{w}_k $$ $$ \mathbf{y}_k = \mathbf{C} \mathbf{x}_k + \mathbf{v}_k $$ **EWMA (Exponentially Weighted Moving Average) Controller:** $$ \hat{y}_{k+1} = \lambda y_k + (1 - \lambda) \hat{y}_k $$ $$ u_{k+1} = u_k + \frac{T - \hat{y}_{k+1}}{\beta} $$ Where: - $T$ = Target value - $\lambda$ = EWMA weight (0 < λ ≤ 1) - $\beta$ = Process gain ### 7.2 Model Predictive Control (MPC) **Optimization Problem at Each Step:** $$ \min_{\mathbf{u}_{0:N-1}} \sum_{k=0}^{N-1} \left[ \| \mathbf{x}_k - \mathbf{x}_{\text{ref}} \|_Q^2 + \| \mathbf{u}_k \|_R^2 \right] + \| \mathbf{x}_N \|_P^2 $$ Subject to: $$ \mathbf{x}_{k+1} = f(\mathbf{x}_k, \mathbf{u}_k) $$ $$ \mathbf{x}_k \in \mathcal{X}, \quad \mathbf{u}_k \in \mathcal{U} $$ **Robust MPC (Tube-Based):** $$ \mathbf{x}_k = \bar{\mathbf{x}}_k + \mathbf{e}_k, \quad \mathbf{e}_k \in \mathcal{E} $$ Where $\bar{\mathbf{x}}_k$ is the nominal trajectory and $\mathcal{E}$ is the robust positively invariant set. ### 7.3 Kalman Filter **Prediction Step:** $$ \hat{\mathbf{x}}_{k|k-1} = \mathbf{A} \hat{\mathbf{x}}_{k-1|k-1} + \mathbf{B} \mathbf{u}_{k-1} $$ $$ \mathbf{P}_{k|k-1} = \mathbf{A} \mathbf{P}_{k-1|k-1} \mathbf{A}^T + \mathbf{Q} $$ **Update Step:** $$ \mathbf{K}_k = \mathbf{P}_{k|k-1} \mathbf{C}^T \left( \mathbf{C} \mathbf{P}_{k|k-1} \mathbf{C}^T + \mathbf{R} \right)^{-1} $$ $$ \hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k \left( \mathbf{y}_k - \mathbf{C} \hat{\mathbf{x}}_{k|k-1} \right) $$ $$ \mathbf{P}_{k|k} = \left( \mathbf{I} - \mathbf{K}_k \mathbf{C} \right) \mathbf{P}_{k|k-1} $$ ## 8. Metrology Inverse Problems ### 8.1 Scatterometry (Optical CD) **Forward Problem (RCWA):** $$ \frac{\partial}{\partial z} \begin{pmatrix} \mathbf{E}_\perp \\ \mathbf{H}_\perp \end{pmatrix} = \mathbf{M}(z) \begin{pmatrix} \mathbf{E}_\perp \\ \mathbf{H}_\perp \end{pmatrix} $$ **Inverse Problem:** $$ \min_{\mathbf{p}} \| \mathbf{S}(\mathbf{p}) - \mathbf{S}_{\text{meas}} \|^2 + \lambda \mathcal{R}(\mathbf{p}) $$ Where: - $\mathbf{p}$ = Geometric parameters (CD, height, sidewall angle) - $\mathbf{S}$ = Mueller matrix elements - $\mathcal{R}$ = Regularizer (e.g., Tikhonov, total variation) ### 8.2 Phase Retrieval **Measurement Model:** $$ I_m = |\mathcal{A}_m x|^2, \quad m = 1, \ldots, M $$ **Wirtinger Flow:** $$ x^{(k+1)} = x^{(k)} - \frac{\mu_k}{M} \sum_{m=1}^{M} \left( |a_m^H x^{(k)}|^2 - I_m \right) a_m a_m^H x^{(k)} $$ **Uniqueness Conditions:** For $x \in \mathbb{C}^n$, uniqueness (up to global phase) requires $M \geq 4n - 4$ generic measurements. ### 8.3 Information-Theoretic Limits **Cramér-Rao Lower Bound:** $$ \text{Var}(\hat{\theta}_i) \geq \left[ \mathbf{I}(\boldsymbol{\theta})^{-1} \right]_{ii} $$ **Fisher Information Matrix:** $$ [\mathbf{I}(\boldsymbol{\theta})]_{ij} = -\mathbb{E}\left[ \frac{\partial^2 \log p(y | \boldsymbol{\theta})}{\partial \theta_i \partial \theta_j} \right] $$ **Optimal Experimental Design:** $$ \max_{\xi} \Phi(\mathbf{I}(\boldsymbol{\theta}; \xi)) $$ Where $\xi$ = experimental design, $\Phi$ = optimality criterion (D-optimal: $\det(\mathbf{I})$, A-optimal: $\text{tr}(\mathbf{I}^{-1})$) ## 9. Quantum-Classical Boundaries ### 9.1 Non-Equilibrium Green's Functions (NEGF) **Dyson Equation:** $$ G^R(E) = \left[ (E + i\eta)I - H - \Sigma^R(E) \right]^{-1} $$ **Current Calculation:** $$ I = \frac{2e}{h} \int_{-\infty}^{\infty} T(E) \left[ f_L(E) - f_R(E) \right] dE $$ **Transmission Function:** $$ T(E) = \text{Tr}\left[ \Gamma_L G^R \Gamma_R G^A \right] $$ Where $\Gamma_{L,R} = i(\Sigma_{L,R}^R - \Sigma_{L,R}^A)$. ### 9.2 Density Functional Theory (DFT) **Kohn-Sham Equations:** $$ \left[ -\frac{\hbar^2}{2m} \nabla^2 + V_{\text{eff}}(\mathbf{r}) \right] \psi_i(\mathbf{r}) = \epsilon_i \psi_i(\mathbf{r}) $$ **Effective Potential:** $$ V_{\text{eff}}(\mathbf{r}) = V_{\text{ext}}(\mathbf{r}) + V_H(\mathbf{r}) + V_{xc}(\mathbf{r}) $$ Where: - $V_{\text{ext}}$ = External (ionic) potential - $V_H = \int \frac{n(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} d\mathbf{r}'$ = Hartree potential - $V_{xc} = \frac{\delta E_{xc}[n]}{\delta n}$ = Exchange-correlation potential ### 9.3 Semiclassical Approximations **WKB Approximation:** $$ \psi(x) \approx \frac{C}{\sqrt{p(x)}} \exp\left( \pm \frac{i}{\hbar} \int^x p(x') \, dx' \right) $$ Where $p(x) = \sqrt{2m(E - V(x))}$. **Validity Criterion:** $$ \left| \frac{d\lambda}{dx} \right| \ll 1, \quad \text{where } \lambda = \frac{h}{p} $$ **Tunneling Probability (WKB):** $$ T \approx \exp\left( -\frac{2}{\hbar} \int_{x_1}^{x_2} |p(x)| \, dx \right) $$ ## 10. Graph and Combinatorial Methods ### 10.1 Design Rule Checking (DRC) **Constraint Satisfaction Problem (CSP):** $$ \forall (i,j) \in E: \; d(p_i, p_j) \geq d_{\min}(t_i, t_j) $$ Where: - $p_i, p_j$ = Polygon features - $d$ = Distance function (min spacing, enclosure, etc.) - $t_i, t_j$ = Layer/feature types **SAT/SMT Encoding:** $$ \bigwedge_{r \in \text{Rules}} \bigwedge_{(i,j) \in \text{Violations}(r)} \neg(x_i \land x_j) $$ ### 10.2 Graph Neural Networks for Layout **Message Passing Framework:** $$ \mathbf{h}_v^{(k+1)} = \text{UPDATE}^{(k)} \left( \mathbf{h}_v^{(k)}, \text{AGGREGATE}^{(k)} \left( \left\{ \mathbf{h}_u^{(k)} : u \in \mathcal{N}(v) \right\} \right) \right) $$ **Graph Attention:** $$ \alpha_{vu} = \frac{\exp\left( \text{LeakyReLU}(\mathbf{a}^T [\mathbf{W}\mathbf{h}_v \| \mathbf{W}\mathbf{h}_u]) \right)}{\sum_{w \in \mathcal{N}(v)} \exp\left( \text{LeakyReLU}(\mathbf{a}^T [\mathbf{W}\mathbf{h}_v \| \mathbf{W}\mathbf{h}_w]) \right)} $$ $$ \mathbf{h}_v' = \sigma\left( \sum_{u \in \mathcal{N}(v)} \alpha_{vu} \mathbf{W} \mathbf{h}_u \right) $$ ### 10.3 Hypergraph Partitioning **Min-Cut Objective:** $$ \min_{\pi: V \to \{1, \ldots, k\}} \sum_{e \in E} w_e \cdot \mathbf{1}[\text{cut}(e, \pi)] $$ Subject to balance constraints: $$ \left| |\pi^{-1}(i)| - \frac{|V|}{k} \right| \leq \epsilon \frac{|V|}{k} $$ ## Cross-Cutting Mathematical Themes ### Theme 1: Curse of Dimensionality **Tensor Train Decomposition:** $$ \mathcal{T}(i_1, \ldots, i_d) = G_1(i_1) \cdot G_2(i_2) \cdots G_d(i_d) $$ - Storage: $\mathcal{O}(dnr^2)$ vs. $\mathcal{O}(n^d)$ - Where $r$ = TT-rank ### Theme 2: Inverse Problems Framework $$ \mathbf{y} = \mathcal{A}(\mathbf{x}) + \boldsymbol{\eta} $$ **Regularized Solution:** $$ \hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \| \mathbf{y} - \mathcal{A}(\mathbf{x}) \|^2 + \lambda \mathcal{R}(\mathbf{x}) $$ Common regularizers: - Tikhonov: $\mathcal{R}(\mathbf{x}) = \|\mathbf{x}\|_2^2$ - Total Variation: $\mathcal{R}(\mathbf{x}) = \|\nabla \mathbf{x}\|_1$ - Sparsity: $\mathcal{R}(\mathbf{x}) = \|\mathbf{x}\|_1$ ### Theme 3: Certification and Trust **PAC-Bayes Bound:** $$ \mathbb{E}_{h \sim Q}[L(h)] \leq \mathbb{E}_{h \sim Q}[\hat{L}(h)] + \sqrt{\frac{\text{KL}(Q \| P) + \ln(2\sqrt{n}/\delta)}{2n}} $$ **Conformal Prediction:** $$ C(x_{\text{new}}) = \{y : s(x_{\text{new}}, y) \leq \hat{q}\} $$ Where $\hat{q}$ = $(1-\alpha)$-quantile of calibration scores. ## Key Notation Summary | Symbol | Meaning | |--------|---------| | $M(x,y)$ | Mask transmission function | | $I(x,y)$ | Aerial image intensity | | $\mathcal{F}$ | Fourier transform | | $\nabla$ | Gradient operator | | $\nabla^2$, $\Delta$ | Laplacian | | $\mathbb{E}[\cdot]$ | Expectation | | $\mathcal{GP}(m, k)$ | Gaussian process with mean $m$, covariance $k$ | | $\mathcal{N}(\mu, \sigma^2)$ | Normal distribution | | $W_p(\mu, \nu)$ | $p$-Wasserstein distance | | $\text{Tr}(\cdot)$ | Matrix trace | | $\|\cdot\|_p$ | $L^p$ norm | | $\delta_{ij}$ | Kronecker delta | | $\mathbf{1}_{A}$ | Indicator function of set $A$ |
New device concepts.
Rigorous simulation of light through masks.
Systems to capture and neutralize hazardous emissions from tools.
Detect light from active devices.
Emissivity characterizes surface's effectiveness at emitting thermal radiation affecting radiative cooling.
Create custom emojis.
Detect emotions from text.
Detect emotions from text voice facial expressions.
Generate with emotional awareness.
Generate compassionate replies.
Generate responses that show understanding of emotions.
Maximize influence on future.
Empowerment is an intrinsic motivation measure quantifying an agent's ability to control its future states through mutual information maximization.
Efficient Neural Architecture Search reduces computational cost by sharing weights among child models through a single directed acyclic graph.
EnCodec combines convolutional encoder-decoder with residual VQ and adversarial training for audio compression.
Encoder-decoder models (T5) encode input then decode output. Good for translation, summarization.
Encoder-based inversion directly maps images to latent codes through learned networks.
Encoder-only models (BERT) see full context. Good for classification, embeddings. Not for generation.
Use encoder to find latent.
Full Transformer with both parts (T5 BART for seq2seq tasks).
Architecture with only encoder blocks (BERT for classification/understanding).
One-hot encode categorical features. Binary columns per category.
Robot blade or paddle that grips wafers.
Slowing scaling trends.
Wearout failures.
Defects at end of implant range.
Special token marking end.
End-to-end ASR directly maps audio to text without separate acoustic and language models.
Evaluate entire pipeline.
Learn entire SLAM pipeline.
# Semiconductor Manufacturing Etch Endpoint Process ## Overview In semiconductor fabrication, **etching** selectively removes material from wafers to create circuit patterns. The **endpoint detection problem** is determining precisely when to stop etching. $$ \text{Endpoint} = f(\text{target layer removal}, \text{underlayer preservation}) $$ ## The Core Challenge ### Why Endpoint Detection Matters - **Under-etching**: Leaves residual material → defects, shorts, incomplete patterns - **Over-etching**: Damages underlying layers → profile degradation, reliability issues At advanced nodes (3nm, 5nm), tolerances are measured in angstroms: $$ \Delta d_{\text{tolerance}} \approx 1-5 \text{ Å} $$ ## Primary Endpoint Detection Techniques ### 1. Optical Emission Spectroscopy (OES) The most widely used technique for plasma (dry) etching. #### Principle During plasma etching, reactive species and etch byproducts emit characteristic photons. The emission intensity $I(\lambda)$ at wavelength $\lambda$ follows: $$ I(\lambda) \propto n_{\text{species}} \cdot \sigma_{\text{emission}}(\lambda) \cdot E_{\text{plasma}} $$ Where: - $n_{\text{species}}$ = density of emitting species - $\sigma_{\text{emission}}$ = emission cross-section - $E_{\text{plasma}}$ = plasma excitation energy #### Key Wavelengths for Common Etch Chemistries | Species | Wavelength (nm) | Application | |---------|-----------------|-------------| | CO | 483.5, 519.8 | SiO₂ etch indicator | | F | 685.6, 703.7 | Fluorine radical monitoring | | Si | 288.2 | Silicon exposure detection | | Cl | 837.6 | Chlorine-based etch | | O | 777.4 | Oxygen monitoring | #### Signal Processing The endpoint is typically detected using derivative methods: $$ \frac{dI}{dt} = \lim_{\Delta t \to 0} \frac{I(t + \Delta t) - I(t)}{\Delta t} $$ Endpoint trigger condition: $$ \left| \frac{dI}{dt} \right| > \theta_{\text{threshold}} $$ #### Advantages - Non-contact, non-destructive measurement - Real-time monitoring capability - Works across entire wafer surface #### Limitations - Weak signals for very thin films ($d < 10$ nm) - Pattern density affects signal intensity - Requires optical access to plasma chamber ### 2. Laser Interferometry #### Principle A monochromatic laser beam reflects from the wafer surface. As etching progresses, film thickness changes alter the interference pattern. The reflected intensity follows: $$ I_{\text{reflected}} = I_1 + I_2 + 2\sqrt{I_1 I_2} \cos\left(\frac{4\pi n d}{\lambda} + \phi_0\right) $$ Where: - $I_1, I_2$ = intensities from top surface and interface reflections - $n$ = refractive index of the film - $d$ = film thickness - $\lambda$ = laser wavelength - $\phi_0$ = initial phase offset #### Fringe Analysis Each complete oscillation (fringe) corresponds to: $$ \Delta d_{\text{per fringe}} = \frac{\lambda}{2n} $$ **Example calculation** for SiO₂ with HeNe laser ($\lambda = 632.8$ nm): $$ \Delta d = \frac{632.8 \text{ nm}}{2 \times 1.46} \approx 216.7 \text{ nm/fringe} $$ #### Etch Rate Determination $$ \text{Etch Rate} = \frac{\lambda}{2n} \cdot \frac{1}{T_{\text{fringe}}} $$ Where $T_{\text{fringe}}$ is the period of one complete oscillation. #### Advantages - Quantitative thickness measurement - Real-time etch rate monitoring - High precision for transparent films #### Limitations - Requires optically transparent or semi-transparent films - Pattern density complicates signal interpretation - Multiple interfaces create complex interference ### 3. Residual Gas Analysis (Mass Spectrometry) #### Principle Analyze exhaust gas composition. Different materials produce different volatile byproducts: $$ \text{Material}_{\text{solid}} + \text{Etchant}_{\text{gas}} \rightarrow \text{Byproduct}_{\text{volatile}} $$ #### Example Reactions **Silicon etching with fluorine:** $$ \text{Si} + 4\text{F} \rightarrow \text{SiF}_4 \uparrow $$ **Oxide etching with fluorine:** $$ \text{SiO}_2 + 4\text{F} \rightarrow \text{SiF}_4 + \text{O}_2 \uparrow $$ **Aluminum etching with chlorine:** $$ \text{Al} + 3\text{Cl} \rightarrow \text{AlCl}_3 \uparrow $$ #### Mass-to-Charge Ratios | Byproduct | m/z | Parent Material | |-----------|-----|-----------------| | SiF₄ | 104 | Si, SiO₂ | | SiCl₄ | 170 | Si | | AlCl₃ | 133 | Al | | CO₂ | 44 | SiO₂, organics | | TiCl₄ | 190 | Ti, TiN | #### Advantages - Works regardless of optical properties - Chemically specific detection - Can detect multiple transitions #### Limitations - Response time limited by gas transport: $\tau \approx 0.5-2$ s - Requires differential pumping - Sensitivity issues at low etch rates ### 4. RF Impedance Monitoring #### Principle Plasma impedance changes when material composition changes. The plasma can be modeled as: $$ Z_{\text{plasma}} = R_{\text{plasma}} + j\omega L_{\text{plasma}} + \frac{1}{j\omega C_{\text{sheath}}} $$ #### Monitored Parameters - **Voltage**: $V_{\text{RF}}$ - **Current**: $I_{\text{RF}}$ - **Phase**: $\phi = \arctan\left(\frac{X}{R}\right)$ - **Impedance magnitude**: $|Z| = \sqrt{R^2 + X^2}$ #### Advantages - Uses existing RF infrastructure - No additional optical access needed - Sensitive to plasma chemistry changes #### Limitations - Subtle signal changes - Affected by many process parameters - Requires sophisticated signal processing ## Advanced Considerations ### Aspect Ratio Dependent Etching (ARDE) High aspect ratio (HAR) features etch slower due to transport limitations: $$ \text{Etch Rate}(AR) = \text{Etch Rate}_0 \cdot \exp\left(-\frac{AR}{AR_c}\right) $$ Where: - $AR = \frac{\text{depth}}{\text{width}}$ = aspect ratio - $AR_c$ = characteristic aspect ratio (process-dependent) **Consequence**: Dense arrays reach endpoint before isolated features. ### Pattern Loading Effect Local etch rate depends on pattern density $\rho$: $$ ER(\rho) = ER_{\text{open}} \cdot \frac{1}{1 + K \cdot \rho} $$ Where $K$ is the loading coefficient. ### Selectivity The selectivity $S$ between materials A and B: $$ S = \frac{ER_A}{ER_B} $$ **Higher selectivity allows more overetch margin:** $$ t_{\text{overetch,max}} = \frac{d_{\text{underlayer}} \cdot S}{ER_A} $$ ## Practical Endpoint Strategy ### Overetch Calculation Total etch time: $$ t_{\text{total}} = t_{\text{endpoint}} + t_{\text{overetch}} $$ Overetch percentage: $$ \text{Overetch \%} = \frac{t_{\text{overetch}}}{t_{\text{main}}} \times 100 $$ Typical values: 20-50% depending on uniformity and selectivity. ### Statistical Process Control Endpoint time follows a distribution: $$ t_{\text{EP}} \sim \mathcal{N}(\mu_{\text{EP}}, \sigma_{\text{EP}}^2) $$ Control limits: $$ \text{UCL} = \mu + 3\sigma, \quad \text{LCL} = \mu - 3\sigma $$ ## Multi-Sensor Fusion Modern systems combine multiple techniques: $$ \text{Endpoint}_{\text{final}} = \sum_{i} w_i \cdot \text{Signal}_i $$ Where weights $w_i$ are optimized by machine learning algorithms. ### Sensor Contributions | Sensor | Primary Detection | |--------|-------------------| | OES | Bulk composition change | | Interferometry | Precise thickness | | RF monitoring | Plasma state shifts | | Full-wafer imaging | Spatial uniformity | ## Key Equations Summary ### Interferometry $$ \boxed{\Delta d = \frac{\lambda}{2n}} $$ ### OES Endpoint Trigger $$ \boxed{\left| \frac{dI}{dt} \right| > \theta} $$ ### Selectivity $$ \boxed{S = \frac{ER_{\text{target}}}{ER_{\text{stop}}}} $$ ### ARDE Model $$ \boxed{ER(AR) = ER_0 \cdot e^{-AR/AR_c}} $$ ## Conclusion Etch endpoint detection is critical for: 1. **Yield**: Complete clearing without damage 2. **Uniformity**: Consistent results across wafer 3. **Reliability**: Device performance and longevity The combination of OES, interferometry, mass spectrometry, and RF monitoring—enhanced by machine learning—enables the precision required for sub-10nm semiconductor manufacturing.
Stop based on real-time detection.
Elemental analysis in SEM/TEM.
Energy efficiency in fabs focuses on reducing power consumption per wafer through equipment optimization process improvements and facility design.
Capture waste heat for reuse in other processes.
Energy-aware NAS optimizes architectures for minimal power consumption during inference.
Energy-based models assign scalar energy values to input-output configurations and make predictions by minimizing energy functions.
Models that assign energy to configurations.
Metric combining energy and speed.
Extended metric including area.
Generate interesting replies.
Control design modifications.
Document describing change.