Semiconductor Manufacturing Process: Emerging Mathematical Frontiers
1. Computational Lithography and Inverse Problems
1.1 Inverse Lithography Technology (ILT)
The fundamental problem: Given a desired wafer pattern $I_{\text{target}}(x,y)$, find the optimal mask pattern $M(x',y')$.
Core Mathematical Formulation:
$$
\min_{M} \mathcal{L}(M) = \int \left| I(x,y; M) - I_{\text{target}}(x,y) \right|^2 \, dx \, dy + \lambda \mathcal{R}(M)
$$
Where:
- $I(x,y; M)$ = Aerial image intensity on wafer
- $I_{\text{target}}(x,y)$ = Desired pattern intensity
- $\mathcal{R}(M)$ = Regularization term (mask manufacturability)
- $\lambda$ = Regularization parameter
Key Challenges:
- Dimensionality: Full-chip optimization involves $N \sim 10^9$ to $10^{12}$ variables
- Non-convexity: The forward model $I(x,y; M)$ is highly nonlinear
- Ill-posedness: Multiple masks can produce similar images
Hopkins Imaging Model:
$$
I(x,y) = \sum_{k} \left| \int \int H_k(f_x, f_y) \cdot \tilde{M}(f_x, f_y) \cdot e^{2\pi i (f_x x + f_y y)} \, df_x \, df_y \right|^2
$$
Where:
- $H_k(f_x, f_y)$ = Transmission cross-coefficient (TCC) eigenfunctions
- $\tilde{M}(f_x, f_y)$ = Fourier transform of mask transmission
1.2 Source-Mask Optimization (SMO)
Bilinear Optimization Problem:
$$
\min_{S, M} \mathcal{L}(S, M) = \| I(S, M) - I_{\text{target}} \|^2 + \alpha \mathcal{R}_S(S) + \beta \mathcal{R}_M(M)
$$
Where:
- $S$ = Source intensity distribution (illumination pupil)
- $M$ = Mask transmission function
- $\mathcal{R}_S$, $\mathcal{R}_M$ = Source and mask regularizers
Alternating Minimization Approach:
1. Fix $S^{(k)}$, solve: $M^{(k+1)} = \arg\min_M \mathcal{L}(S^{(k)}, M)$
2. Fix $M^{(k+1)}$, solve: $S^{(k+1)} = \arg\min_S \mathcal{L}(S, M^{(k+1)})$
3. Repeat until convergence
1.3 Stochastic Lithography Effects
At EUV wavelengths ($\lambda = 13.5$ nm), photon shot noise becomes critical.
Photon Statistics:
$$
N_{\text{photons}} \sim \text{Poisson}\left( \frac{E \cdot A}{h
u} \right)
$$
Where:
- $E$ = Exposure dose (mJ/cm²)
- $A$ = Pixel area
- $h
u$ = Photon energy ($\approx 92$ eV for EUV)
Line Edge Roughness (LER) Model:
$$
\text{LER} = \sqrt{\sigma_{\text{shot}}^2 + \sigma_{\text{resist}}^2 + \sigma_{\text{acid}}^2}
$$
Stochastic Resist Development (Stochastic PDE):
$$
\frac{\partial h}{\partial t} = -R(M, I, \xi) + \eta(x, y, t)
$$
Where:
- $h(x,y,t)$ = Resist height
- $R$ = Development rate (depends on local deprotection $M$, inhibitor $I$)
- $\eta$ = Spatiotemporal noise term
- $\xi$ = Quenched disorder from shot noise
2. Physics-Informed Machine Learning
2.1 Physics-Informed Neural Networks (PINNs)
Standard PINN Loss Function:
$$
\mathcal{L}_{\text{PINN}} = \mathcal{L}_{\text{data}} + \lambda_{\text{PDE}} \mathcal{L}_{\text{PDE}} + \lambda_{\text{BC}} \mathcal{L}_{\text{BC}}
$$
Where:
- $\mathcal{L}_{\text{data}} = \frac{1}{N_d} \sum_{i=1}^{N_d} |u_\theta(x_i) - u_i^{\text{obs}}|^2$
- $\mathcal{L}_{\text{PDE}} = \frac{1}{N_r} \sum_{j=1}^{N_r} |\mathcal{N}[u_\theta](x_j)|^2$
- $\mathcal{L}_{\text{BC}} = \frac{1}{N_b} \sum_{k=1}^{N_b} |\mathcal{B}[u_\theta](x_k) - g_k|^2$
Key Mathematical Questions:
- Approximation Theory: What function classes can $u_\theta$ represent under PDE constraints?
- Generalization Bounds: How does enforcing physics improve out-of-distribution performance?
2.2 Neural Operators
Fourier Neural Operator (FNO):
$$
v_{l+1}(x) = \sigma \left( W_l v_l(x) + \mathcal{F}^{-1}\left( R_l \cdot \mathcal{F}(v_l) \right)(x) \right)
$$
Where:
- $\mathcal{F}$, $\mathcal{F}^{-1}$ = Fourier and inverse Fourier transforms
- $R_l$ = Learnable spectral weights
- $W_l$ = Local linear transformation
- $\sigma$ = Activation function
DeepONet Architecture:
$$
G_\theta(u)(y) = \sum_{k=1}^{p} b_k(u; \theta_b) \cdot t_k(y; \theta_t)
$$
Where:
- $b_k$ = Branch network outputs (encode input function $u$)
- $t_k$ = Trunk network outputs (encode query location $y$)
2.3 Hybrid Physics-ML Architectures
Residual Learning Framework:
$$
u_{\text{full}}(x) = u_{\text{physics}}(x) + u_{\text{NN}}(x; \theta)
$$
Where the neural network learns the "correction" to the physics model:
$$
u_{\text{NN}} \approx u_{\text{true}} - u_{\text{physics}}
$$
Constraint: Physics Consistency
$$
\| \mathcal{N}[u_{\text{full}}] \|_2 \leq \epsilon
$$
3. High-Dimensional Uncertainty Quantification
3.1 Polynomial Chaos Expansions (PCE)
Generalized PCE Representation:
$$
u(\mathbf{x}, \boldsymbol{\xi}) = \sum_{\boldsymbol{\alpha} \in \mathcal{A}} c_{\boldsymbol{\alpha}}(\mathbf{x}) \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi})
$$
Where:
- $\boldsymbol{\xi} = (\xi_1, \ldots, \xi_d)$ = Random variables (process variations)
- $\Psi_{\boldsymbol{\alpha}}$ = Multivariate orthogonal polynomials
- $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_d)$ = Multi-index
- $\mathcal{A}$ = Index set (truncated)
Orthogonality Condition:
$$
\mathbb{E}[\Psi_{\boldsymbol{\alpha}} \Psi_{\boldsymbol{\beta}}] = \int \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}) \Psi_{\boldsymbol{\beta}}(\boldsymbol{\xi}) \rho(\boldsymbol{\xi}) \, d\boldsymbol{\xi} = \delta_{\boldsymbol{\alpha}\boldsymbol{\beta}}
$$
Curse of Dimensionality:
- Full tensor product: $|\mathcal{A}| = \binom{d + p}{p} \sim \frac{d^p}{p!}$
- Sparse grids: $|\mathcal{A}| \sim \mathcal{O}(d \cdot (\log d)^{d-1})$
3.2 Rare Event Simulation
Importance Sampling:
$$
P(Y > \gamma) = \mathbb{E}_P[\mathbf{1}_{Y > \gamma}] = \mathbb{E}_Q\left[ \mathbf{1}_{Y > \gamma} \cdot \frac{dP}{dQ} \right]
$$
Optimal Tilting Measure:
$$
Q^*(\xi) \propto \mathbf{1}_{Y(\xi) > \gamma} \cdot P(\xi)
$$
Large Deviation Principle:
$$
\lim_{n \to \infty} \frac{1}{n} \log P(S_n / n \in A) = -\inf_{x \in A} I(x)
$$
Where $I(x)$ is the rate function (Legendre transform of cumulant generating function).
3.3 Distributionally Robust Optimization
Wasserstein Ambiguity Set:
$$
\mathcal{P} = \left\{ Q : W_p(Q, \hat{P}_n) \leq \epsilon \right\}
$$
DRO Formulation:
$$
\min_{x} \sup_{Q \in \mathcal{P}} \mathbb{E}_Q[f(x, \xi)]
$$
Tractable Reformulation (for linear $f$):
$$
\min_{x} \left\{ \frac{1}{n} \sum_{i=1}^{n} f(x, \hat{\xi}_i) + \epsilon \cdot \|
abla_\xi f \|_* \right\}
$$
4. Multiscale Mathematics
4.1 Scale Hierarchy in Semiconductor Manufacturing
| Scale | Size Range | Phenomena | Mathematical Tools |
|-------|------------|-----------|---------------------|
| Atomic | 0.1 - 1 nm | Dopant atoms, ALD | DFT, MD, KMC |
| Mesoscale | 1 - 10 nm | LER, grain structure | Phase field, SDE |
| Feature | 10 - 100 nm | Transistors, vias | Continuum PDEs |
| Die | 1 - 10 mm | Pattern loading | Effective medium |
| Wafer | 300 mm | Uniformity | Process models |
4.2 Homogenization Theory
Two-Scale Expansion:
$$
u^\epsilon(x) = u_0(x, x/\epsilon) + \epsilon u_1(x, x/\epsilon) + \epsilon^2 u_2(x, x/\epsilon) + \ldots
$$
Where $y = x/\epsilon$ is the fast variable.
Cell Problem:
$$
-
abla_y \cdot \left( A(y) \left(
abla_y \chi^j + \mathbf{e}_j \right) \right) = 0 \quad \text{in } Y
$$
Effective (Homogenized) Coefficient:
$$
A^*_{ij} = \frac{1}{|Y|} \int_Y A(y) \left( \mathbf{e}_i +
abla_y \chi^i \right) \cdot \left( \mathbf{e}_j +
abla_y \chi^j \right) \, dy
$$
4.3 Phase Field Methods
Allen-Cahn Equation (Interface Evolution):
$$
\frac{\partial \phi}{\partial t} = -M \frac{\delta \mathcal{F}}{\delta \phi} = M \left( \epsilon^2
abla^2 \phi - f'(\phi) \right)
$$
Cahn-Hilliard Equation (Conserved Order Parameter):
$$
\frac{\partial c}{\partial t} =
abla \cdot \left( M
abla \frac{\delta \mathcal{F}}{\delta c} \right)
$$
Free Energy Functional:
$$
\mathcal{F}[\phi] = \int \left( \frac{\epsilon^2}{2} |
abla \phi|^2 + f(\phi) \right) dV
$$
Where $f(\phi) = \frac{1}{4}(\phi^2 - 1)^2$ (double-well potential).
4.4 Kinetic Monte Carlo (KMC)
Master Equation:
$$
\frac{dP(\sigma, t)}{dt} = \sum_{\sigma'} \left[ W(\sigma' \to \sigma) P(\sigma', t) - W(\sigma \to \sigma') P(\sigma, t) \right]
$$
Transition Rates (Arrhenius Form):
$$
W_i =
u_0 \exp\left( -\frac{E_a^{(i)}}{k_B T} \right)
$$
BKL Algorithm:
1. Calculate total rate: $R_{\text{tot}} = \sum_i W_i$
2. Select event $i$ with probability: $p_i = W_i / R_{\text{tot}}$
3. Advance time: $\Delta t = -\frac{\ln(r)}{R_{\text{tot}}}$, where $r \sim U(0,1)$
5. Optimization at Unprecedented Scale
5.1 Bayesian Optimization
Gaussian Process Prior:
$$
f(\mathbf{x}) \sim \mathcal{GP}\left( m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}') \right)
$$
Posterior Mean and Variance:
$$
\mu_n(\mathbf{x}) = \mathbf{k}_n(\mathbf{x})^T \mathbf{K}_n^{-1} \mathbf{y}_n
$$
$$
\sigma_n^2(\mathbf{x}) = k(\mathbf{x}, \mathbf{x}) - \mathbf{k}_n(\mathbf{x})^T \mathbf{K}_n^{-1} \mathbf{k}_n(\mathbf{x})
$$
Expected Improvement (EI):
$$
\text{EI}(\mathbf{x}) = \mathbb{E}\left[ \max(0, f(\mathbf{x}) - f_{\text{best}}) \right]
$$
$$
= \sigma_n(\mathbf{x}) \left[ z \Phi(z) + \phi(z) \right], \quad z = \frac{\mu_n(\mathbf{x}) - f_{\text{best}}}{\sigma_n(\mathbf{x})}
$$
5.2 High-Dimensional Extensions
Random Embeddings:
$$
f(\mathbf{x}) \approx g(\mathbf{A}\mathbf{x}), \quad \mathbf{A} \in \mathbb{R}^{d_e \times D}, \quad d_e \ll D
$$
Additive Structure:
$$
f(\mathbf{x}) = \sum_{j=1}^{J} f_j(\mathbf{x}_{S_j})
$$
Where $S_j \subset \{1, \ldots, D\}$ are (possibly overlapping) subsets.
Trust Region Bayesian Optimization (TuRBO):
- Maintain local GP models within trust regions
- Expand/contract regions based on success/failure
- Multiple trust regions for multimodal landscapes
5.3 Multi-Objective Optimization
Pareto Optimality:
$\mathbf{x}^*$ is Pareto optimal if $
exists \mathbf{x}$ such that:
$$
f_i(\mathbf{x}) \leq f_i(\mathbf{x}^) \; \forall i \quad \text{and} \quad f_j(\mathbf{x}) < f_j(\mathbf{x}^) \; \text{for some } j
$$
Expected Hypervolume Improvement (EHVI):
$$
\text{EHVI}(\mathbf{x}) = \mathbb{E}\left[ \text{HV}(\mathcal{P} \cup \{f(\mathbf{x})\}) - \text{HV}(\mathcal{P}) \right]
$$
Where $\mathcal{P}$ is the current Pareto front and HV is the hypervolume indicator.
6. Topological and Geometric Methods
6.1 Persistent Homology
Simplicial Complex Filtration:
$$
\emptyset = K_0 \subseteq K_1 \subseteq K_2 \subseteq \cdots \subseteq K_n = K
$$
Persistence Pairs:
For each topological feature (connected component, loop, void):
- Birth time: $b_i$ = scale at which feature appears
- Death time: $d_i$ = scale at which feature disappears
- Persistence: $\text{pers}_i = d_i - b_i$
Persistence Diagram:
$$
\text{Dgm}(K) = \{(b_i, d_i)\}_{i=1}^{N} \subset \mathbb{R}^2
$$
Stability Theorem:
$$
d_B(\text{Dgm}(K), \text{Dgm}(K')) \leq \| f - f' \|_\infty
$$
Where $d_B$ is the bottleneck distance.
6.2 Optimal Transport
Monge Problem:
$$
\min_{T: T_\# \mu =
u} \int c(x, T(x)) \, d\mu(x)
$$
Kantorovich (Relaxed) Formulation:
$$
W_p(\mu,
u) = \left( \inf_{\gamma \in \Gamma(\mu,
u)} \int |x - y|^p \, d\gamma(x, y) \right)^{1/p}
$$
Applications in Semiconductor:
- Comparing wafer defect maps
- Loss functions for lithography optimization
- Generative models for realistic defect distributions
6.3 Curvature-Driven Flows
Mean Curvature Flow:
$$
\frac{\partial \Gamma}{\partial t} = \kappa \mathbf{n}
$$
Where $\kappa$ is the mean curvature and $\mathbf{n}$ is the unit normal.
Level Set Formulation:
$$
\frac{\partial \phi}{\partial t} + v_n |
abla \phi| = 0
$$
With $v_n = \kappa =
abla \cdot \left( \frac{
abla \phi}{|
abla \phi|} \right)$.
Surface Diffusion (4th Order):
$$
\frac{\partial \Gamma}{\partial t} = -\Delta_s \kappa \cdot \mathbf{n}
$$
Where $\Delta_s$ is the surface Laplacian.
7. Control Theory and Real-Time Optimization
7.1 Run-to-Run Control
State-Space Model:
$$
\mathbf{x}_{k+1} = \mathbf{A} \mathbf{x}_k + \mathbf{B} \mathbf{u}_k + \mathbf{w}_k
$$
$$
\mathbf{y}_k = \mathbf{C} \mathbf{x}_k + \mathbf{v}_k
$$
EWMA (Exponentially Weighted Moving Average) Controller:
$$
\hat{y}_{k+1} = \lambda y_k + (1 - \lambda) \hat{y}_k
$$
$$
u_{k+1} = u_k + \frac{T - \hat{y}_{k+1}}{\beta}
$$
Where:
- $T$ = Target value
- $\lambda$ = EWMA weight (0 < λ ≤ 1)
- $\beta$ = Process gain
7.2 Model Predictive Control (MPC)
Optimization Problem at Each Step:
$$
\min_{\mathbf{u}_{0:N-1}} \sum_{k=0}^{N-1} \left[ \| \mathbf{x}_k - \mathbf{x}_{\text{ref}} \|_Q^2 + \| \mathbf{u}_k \|_R^2 \right] + \| \mathbf{x}_N \|_P^2
$$
Subject to:
$$
\mathbf{x}_{k+1} = f(\mathbf{x}_k, \mathbf{u}_k)
$$
$$
\mathbf{x}_k \in \mathcal{X}, \quad \mathbf{u}_k \in \mathcal{U}
$$
Robust MPC (Tube-Based):
$$
\mathbf{x}_k = \bar{\mathbf{x}}_k + \mathbf{e}_k, \quad \mathbf{e}_k \in \mathcal{E}
$$
Where $\bar{\mathbf{x}}_k$ is the nominal trajectory and $\mathcal{E}$ is the robust positively invariant set.
7.3 Kalman Filter
Prediction Step:
$$
\hat{\mathbf{x}}_{k|k-1} = \mathbf{A} \hat{\mathbf{x}}_{k-1|k-1} + \mathbf{B} \mathbf{u}_{k-1}
$$
$$
\mathbf{P}_{k|k-1} = \mathbf{A} \mathbf{P}_{k-1|k-1} \mathbf{A}^T + \mathbf{Q}
$$
Update Step:
$$
\mathbf{K}_k = \mathbf{P}_{k|k-1} \mathbf{C}^T \left( \mathbf{C} \mathbf{P}_{k|k-1} \mathbf{C}^T + \mathbf{R} \right)^{-1}
$$
$$
\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k \left( \mathbf{y}_k - \mathbf{C} \hat{\mathbf{x}}_{k|k-1} \right)
$$
$$
\mathbf{P}_{k|k} = \left( \mathbf{I} - \mathbf{K}_k \mathbf{C} \right) \mathbf{P}_{k|k-1}
$$
8. Metrology Inverse Problems
8.1 Scatterometry (Optical CD)
Forward Problem (RCWA):
$$
\frac{\partial}{\partial z} \begin{pmatrix} \mathbf{E}_\perp \\ \mathbf{H}_\perp \end{pmatrix} = \mathbf{M}(z) \begin{pmatrix} \mathbf{E}_\perp \\ \mathbf{H}_\perp \end{pmatrix}
$$
Inverse Problem:
$$
\min_{\mathbf{p}} \| \mathbf{S}(\mathbf{p}) - \mathbf{S}_{\text{meas}} \|^2 + \lambda \mathcal{R}(\mathbf{p})
$$
Where:
- $\mathbf{p}$ = Geometric parameters (CD, height, sidewall angle)
- $\mathbf{S}$ = Mueller matrix elements
- $\mathcal{R}$ = Regularizer (e.g., Tikhonov, total variation)
8.2 Phase Retrieval
Measurement Model:
$$
I_m = |\mathcal{A}_m x|^2, \quad m = 1, \ldots, M
$$
Wirtinger Flow:
$$
x^{(k+1)} = x^{(k)} - \frac{\mu_k}{M} \sum_{m=1}^{M} \left( |a_m^H x^{(k)}|^2 - I_m \right) a_m a_m^H x^{(k)}
$$
Uniqueness Conditions:
For $x \in \mathbb{C}^n$, uniqueness (up to global phase) requires $M \geq 4n - 4$ generic measurements.
8.3 Information-Theoretic Limits
Cramér-Rao Lower Bound:
$$
\text{Var}(\hat{\theta}_i) \geq \left[ \mathbf{I}(\boldsymbol{\theta})^{-1} \right]_{ii}
$$
Fisher Information Matrix:
$$
[\mathbf{I}(\boldsymbol{\theta})]_{ij} = -\mathbb{E}\left[ \frac{\partial^2 \log p(y | \boldsymbol{\theta})}{\partial \theta_i \partial \theta_j} \right]
$$
Optimal Experimental Design:
$$
\max_{\xi} \Phi(\mathbf{I}(\boldsymbol{\theta}; \xi))
$$
Where $\xi$ = experimental design, $\Phi$ = optimality criterion (D-optimal: $\det(\mathbf{I})$, A-optimal: $\text{tr}(\mathbf{I}^{-1})$)
9. Quantum-Classical Boundaries
9.1 Non-Equilibrium Green's Functions (NEGF)
Dyson Equation:
$$
G^R(E) = \left[ (E + i\eta)I - H - \Sigma^R(E) \right]^{-1}
$$
Current Calculation:
$$
I = \frac{2e}{h} \int_{-\infty}^{\infty} T(E) \left[ f_L(E) - f_R(E) \right] dE
$$
Transmission Function:
$$
T(E) = \text{Tr}\left[ \Gamma_L G^R \Gamma_R G^A \right]
$$
Where $\Gamma_{L,R} = i(\Sigma_{L,R}^R - \Sigma_{L,R}^A)$.
9.2 Density Functional Theory (DFT)
Kohn-Sham Equations:
$$
\left[ -\frac{\hbar^2}{2m}
abla^2 + V_{\text{eff}}(\mathbf{r}) \right] \psi_i(\mathbf{r}) = \epsilon_i \psi_i(\mathbf{r})
$$
Effective Potential:
$$
V_{\text{eff}}(\mathbf{r}) = V_{\text{ext}}(\mathbf{r}) + V_H(\mathbf{r}) + V_{xc}(\mathbf{r})
$$
Where:
- $V_{\text{ext}}$ = External (ionic) potential
- $V_H = \int \frac{n(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} d\mathbf{r}'$ = Hartree potential
- $V_{xc} = \frac{\delta E_{xc}[n]}{\delta n}$ = Exchange-correlation potential
9.3 Semiclassical Approximations
WKB Approximation:
$$
\psi(x) \approx \frac{C}{\sqrt{p(x)}} \exp\left( \pm \frac{i}{\hbar} \int^x p(x') \, dx' \right)
$$
Where $p(x) = \sqrt{2m(E - V(x))}$.
Validity Criterion:
$$
\left| \frac{d\lambda}{dx} \right| \ll 1, \quad \text{where } \lambda = \frac{h}{p}
$$
Tunneling Probability (WKB):
$$
T \approx \exp\left( -\frac{2}{\hbar} \int_{x_1}^{x_2} |p(x)| \, dx \right)
$$
10. Graph and Combinatorial Methods
10.1 Design Rule Checking (DRC)
Constraint Satisfaction Problem (CSP):
$$
\forall (i,j) \in E: \; d(p_i, p_j) \geq d_{\min}(t_i, t_j)
$$
Where:
- $p_i, p_j$ = Polygon features
- $d$ = Distance function (min spacing, enclosure, etc.)
- $t_i, t_j$ = Layer/feature types
SAT/SMT Encoding:
$$
\bigwedge_{r \in \text{Rules}} \bigwedge_{(i,j) \in \text{Violations}(r)}
eg(x_i \land x_j)
$$
10.2 Graph Neural Networks for Layout
Message Passing Framework:
$$
\mathbf{h}_v^{(k+1)} = \text{UPDATE}^{(k)} \left( \mathbf{h}_v^{(k)}, \text{AGGREGATE}^{(k)} \left( \left\{ \mathbf{h}_u^{(k)} : u \in \mathcal{N}(v) \right\} \right) \right)
$$
Graph Attention:
$$
\alpha_{vu} = \frac{\exp\left( \text{LeakyReLU}(\mathbf{a}^T [\mathbf{W}\mathbf{h}_v \| \mathbf{W}\mathbf{h}_u]) \right)}{\sum_{w \in \mathcal{N}(v)} \exp\left( \text{LeakyReLU}(\mathbf{a}^T [\mathbf{W}\mathbf{h}_v \| \mathbf{W}\mathbf{h}_w]) \right)}
$$
$$
\mathbf{h}_v' = \sigma\left( \sum_{u \in \mathcal{N}(v)} \alpha_{vu} \mathbf{W} \mathbf{h}_u \right)
$$
10.3 Hypergraph Partitioning
Min-Cut Objective:
$$
\min_{\pi: V \to \{1, \ldots, k\}} \sum_{e \in E} w_e \cdot \mathbf{1}[\text{cut}(e, \pi)]
$$
Subject to balance constraints:
$$
\left| |\pi^{-1}(i)| - \frac{|V|}{k} \right| \leq \epsilon \frac{|V|}{k}
$$
Cross-Cutting Mathematical Themes
Theme 1: Curse of Dimensionality
Tensor Train Decomposition:
$$
\mathcal{T}(i_1, \ldots, i_d) = G_1(i_1) \cdot G_2(i_2) \cdots G_d(i_d)
$$
- Storage: $\mathcal{O}(dnr^2)$ vs. $\mathcal{O}(n^d)$
- Where $r$ = TT-rank
Theme 2: Inverse Problems Framework
$$
\mathbf{y} = \mathcal{A}(\mathbf{x}) + \boldsymbol{\eta}
$$
Regularized Solution:
$$
\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \| \mathbf{y} - \mathcal{A}(\mathbf{x}) \|^2 + \lambda \mathcal{R}(\mathbf{x})
$$
Common regularizers:
- Tikhonov: $\mathcal{R}(\mathbf{x}) = \|\mathbf{x}\|_2^2$
- Total Variation: $\mathcal{R}(\mathbf{x}) = \|
abla \mathbf{x}\|_1$
- Sparsity: $\mathcal{R}(\mathbf{x}) = \|\mathbf{x}\|_1$
Theme 3: Certification and Trust
PAC-Bayes Bound:
$$
\mathbb{E}_{h \sim Q}[L(h)] \leq \mathbb{E}_{h \sim Q}[\hat{L}(h)] + \sqrt{\frac{\text{KL}(Q \| P) + \ln(2\sqrt{n}/\delta)}{2n}}
$$
Conformal Prediction:
$$
C(x_{\text{new}}) = \{y : s(x_{\text{new}}, y) \leq \hat{q}\}
$$
Where $\hat{q}$ = $(1-\alpha)$-quantile of calibration scores.
Key Notation Summary
| Symbol | Meaning |
|--------|---------|
| $M(x,y)$ | Mask transmission function |
| $I(x,y)$ | Aerial image intensity |
| $\mathcal{F}$ | Fourier transform |
| $
abla$ | Gradient operator |
| $
abla^2$, $\Delta$ | Laplacian |
| $\mathbb{E}[\cdot]$ | Expectation |
| $\mathcal{GP}(m, k)$ | Gaussian process with mean $m$, covariance $k$ |
| $\mathcal{N}(\mu, \sigma^2)$ | Normal distribution |
| $W_p(\mu,
u)$ | $p$-Wasserstein distance |
| $\text{Tr}(\cdot)$ | Matrix trace |
| $\|\cdot\|_p$ | $L^p$ norm |
| $\delta_{ij}$ | Kronecker delta |
| $\mathbf{1}_{A}$ | Indicator function of set $A$ |