link prediction, graph neural networks
Link prediction estimates likelihood of edges between node pairs for graph completion and recommendation.
Linear Upper Confidence Bound algorithm balances exploration and exploitation in contextual bandits for recommendation.
Ask Linux/Ubuntu questions and I will explain commands, file structure, permissions, and safe ways to modify your system.
Memory-efficient optimizer using sign of gradients.
Lip reading extracts linguistic information from visual mouth movements using spatiotemporal neural networks for silent speech recognition.
Lip sync generates video of a face speaking a given audio track. Wav2Lip, SadTalker. Virtual avatars.
Bound sensitivity to input changes.
Architectures with bounded Lipschitz constant.
Rinse wafer and analyze solution.
Direct liquid cooling.
Liquid cooling circulates coolant through cold plates or microchannels providing superior heat removal for high-power devices.
Use liquid crystals to find hot spots.
# Liquid Crystal Hot Spot Failure Analysis: Advanced Techniques ## 1. Introduction Liquid crystal thermography (LCT) is a **non-destructive failure analysis (FA)** technique used in semiconductor and electronics testing. It exploits the temperature-sensitive optical properties of **cholesteric (chiral nematic) liquid crystals**. ## 2. Fundamental Principles ### 2.1 Thermochromic Behavior Cholesteric liquid crystals selectively reflect light at wavelengths dependent on their helical pitch $p$, which changes with temperature $T$. The **Bragg reflection condition** for peak wavelength: $$ \lambda_{\text{max}} = n_{\text{avg}} \cdot p $$ Where: - $\lambda_{\text{max}}$ = peak reflected wavelength (nm) - $n_{\text{avg}}$ = average refractive index of the liquid crystal - $p$ = helical pitch (nm) The pitch-temperature relationship: $$ p(T) = p_0 \left[ 1 + \alpha (T - T_0) \right]^{-1} $$ Where: - $p_0$ = pitch at reference temperature $T_0$ - $\alpha$ = thermal expansion coefficient of the pitch ($\text{K}^{-1}$) ### 2.2 Joule Heating at Defect Sites Power dissipation at a defect location: $$ P = I^2 R = \frac{V^2}{R} $$ Temperature rise due to localized heating: $$ \Delta T = \frac{P}{G_{\text{th}}} = \frac{P \cdot R_{\text{th}}}{1} $$ Where: - $P$ = power dissipation (W) - $G_{\text{th}}$ = thermal conductance (W/K) - $R_{\text{th}}$ = thermal resistance (K/W) ### 2.3 Thermal Diffusion The **heat diffusion equation** governing temperature distribution: $$ \frac{\partial T}{\partial t} = \alpha_{\text{th}} \nabla^2 T + \frac{Q}{\rho c_p} $$ Where: - $\alpha_{\text{th}} = \frac{k}{\rho c_p}$ = thermal diffusivity ($\text{m}^2/\text{s}$) - k = thermal conductivity (W/m-K) - $\rho$ = density (kg/m³) - c_p = specific heat capacity (J/kg-K) - $Q$ = volumetric heat source (W/m³) **Thermal diffusion length** (for pulsed excitation at frequency $f$): $$ \mu = \sqrt{\frac{\alpha_{\text{th}}}{\pi f}} $$ ## 3. Spatial Resolution and Sensitivity ### 3.1 Resolution Limits The effective spatial resolution $\delta$ is limited by: $$ \delta = \sqrt{\delta_{\text{opt}}^2 + \delta_{\text{th}}^2} $$ Where: - $\delta_{\text{opt}}$ = optical resolution limit (diffraction-limited: $\delta_{\text{opt}} \approx \frac{\lambda}{2 \cdot \text{NA}}$) - $\delta_{\text{th}}$ = thermal spreading in the substrate ### 3.2 Minimum Detectable Power $$ P_{\text{min}} = \frac{\Delta T_{\text{min}} \cdot k \cdot A}{d} $$ Where: - $\Delta T_{\text{min}}$ = minimum detectable temperature change (~0.1°C) - $k$ = thermal conductivity of substrate - $A$ = defect area - $d$ = depth of defect below surface ## 4. Advanced Failure Modes Detectable ### 4.1 Electrical Defects - **Gate oxide shorts and leakage paths** - Current through defective oxide: $I_{\text{leak}} = \frac{V_{\text{ox}}}{R_{\text{defect}}}$ - Power: $P = I_{\text{leak}} \cdot V_{\text{ox}}$ - **Metal bridging and shorts** - Bridge resistance: $R_{\text{bridge}} = \frac{\rho L}{A}$ - Localized dissipation creates thermal signature - **Junction leakage and latch-up** - Parasitic thyristor current: $I_{\text{latch}} = \frac{V_{DD}}{R_{\text{well}} + R_{\text{sub}}}$ - **Electromigration damage** - Current density threshold (Black's equation): $$ \text{MTTF} = A \cdot J^{-n} \cdot \exp\left(\frac{E_a}{k_B T}\right) $$ ### 4.2 Thermal/Mechanical Defects - **Die-attach voids** - Effective thermal resistance with void fraction $\phi$: $$ R_{\text{th,eff}} = \frac{R_{\text{th,0}}}{1 - \phi} $$ - **Delamination** - Creates thermal barrier, increasing local $\Delta T$ ## 5. 
Advanced Methodologies ### 5.1 Backside Analysis For flip-chip or devices with opaque frontside metallization: - **Die thinning requirement**: Thickness $t \approx 50-100 \, \mu\text{m}$ - **Silicon transparency**: $\lambda > 1.1 \, \mu\text{m}$ (bandgap energy $E_g = 1.12 \, \text{eV}$) $$ E_g = \frac{hc}{\lambda_{\text{cutoff}}} \Rightarrow \lambda_{\text{cutoff}} = \frac{1.24 \, \mu\text{m} \cdot \text{eV}}{E_g} $$ ### 5.2 Lock-in Thermography Modulated power excitation with lock-in detection: $$ P(t) = P_0 \left[1 + \cos(2\pi f_{\text{mod}} t)\right] $$ **Temperature response (amplitude and phase)**: $$ T(x, t) = T_0 + \Delta T(x) \cos\left(2\pi f_{\text{mod}} t - \phi(x)\right) $$ Phase lag due to thermal diffusion: $$ \phi(x) = \frac{x}{\mu} = x \sqrt{\frac{\pi f_{\text{mod}}}{\alpha_{\text{th}}}} $$ **Signal-to-noise improvement**: $$ \text{SNR}_{\text{lock-in}} = \text{SNR}_{\text{DC}} \cdot \sqrt{N_{\text{cycles}}} $$ ### 5.3 Pulsed Excitation For transient thermal analysis: $$ \Delta T(t) = \frac{P}{G_{\text{th}}} \left(1 - e^{-t/\tau_{\text{th}}}\right) $$ Where thermal time constant: $$ \tau_{\text{th}} = R_{\text{th}} \cdot C_{\text{th}} = \frac{\rho c_p V}{k A / d} $$ ## 6. Comparison with Other Thermal Techniques | Technique | Resolution | Sensitivity | Speed | Equation Basis | |-----------|-----------|-------------|-------|----------------| | Liquid Crystal | $5-20 \, \mu\text{m}$ | $\sim 0.1°\text{C}$ | Moderate | Bragg: $\lambda = np$ | | IR Thermography | $3-5 \, \mu\text{m}$ | $\sim 10 \, \text{mK}$ | Fast | Stefan-Boltzmann: $P = \varepsilon \sigma T^4$ | | Thermoreflectance | $< 1 \, \mu\text{m}$ | $\sim 10 \, \text{mK}$ | Fast | $\frac{\Delta R}{R} = \kappa \Delta T$ | | Scanning Thermal | $< 100 \, \text{nm}$ | $\sim 1 \, \text{mK}$ | Slow | Fourier: $q = -k\nabla T$ | ## 7. Practical Workflow ### 7.1 Sample Preparation 1. **Decapsulation** - Chemical (fuming $\text{HNO}_3$, $\text{H}_2\text{SO}_4$) - Plasma etching - Mechanical (for ceramic packages) 2. **Surface cleaning** - Solvent rinse (acetone, IPA) - Plasma cleaning for organic residue 3. **Liquid crystal application** - Airbrush: layer thickness $\sim 10-50 \, \mu\text{m}$ - Spin coating: $\omega \sim 1000-3000 \, \text{rpm}$ ### 7.2 Bias Conditions - **DC bias**: $V_{\text{test}} = V_{\text{DD}} \times (1.0 - 1.2)$ - **Current limiting**: $I_{\text{max}}$ to prevent thermal runaway - **Power budget**: $$ P_{\text{total}} = P_{\text{quiescent}} + P_{\text{defect}} $$ ### 7.3 Temperature Control Stage temperature setpoint: $$ T_{\text{stage}} = T_{\text{LC,center}} - \Delta T_{\text{expected}} $$ Where $T_{\text{LC,center}}$ is the center of the liquid crystal's active color-play range. ## 8. Detection Limits ### 8.1 Minimum Detectable Power For a defect at depth d in silicon (k_Si = 148 W/m-K): $$ P_{\text{min}} \approx 4\pi k d \cdot \Delta T_{\text{min}} $$ **Example calculation**: - $d = 10 \, \mu\text{m} = 10 \times 10^{-6} \, \text{m}$ - $\Delta T_{\text{min}} = 0.1 \, \text{K}$ - k = 148 W/m-K $$ P_{\text{min}} = 4\pi \times 148 \times 10 \times 10^{-6} \times 0.1 \approx 1.86 \, \text{mW} $$ ### 8.2 Defect Size vs. Power Relationship Assuming hemispherical heat spreading: $$ \Delta T = \frac{P}{2\pi k r} $$ Solving for minimum detectable defect radius at given power: $$ r_{\text{min}} = \frac{P}{2\pi k \Delta T_{\text{min}}} $$ ## 9. Integration with Physical Failure Analysis ### 9.1 FIB Cross-Sectioning Workflow 1. 
**Coordinate transfer** - Optical microscope coordinates $\rightarrow$ FIB stage coordinates - Alignment markers for registration 2. **Protective deposition** - Pt or W layer: $\sim 1-2 \, \mu\text{m}$ thick 3. **Cross-section milling** - Rough cut: $30 \, \text{kV}$, high current ($\sim \text{nA}$) - Fine polish: $30 \, \text{kV}$, low current ($\sim \text{pA}$) ### 9.2 Failure Signature Correlation | Thermal Signature | Likely Physical Defect | |-------------------|------------------------| | Point source | Gate oxide pinhole, metal spike | | Linear | Metal bridge, crack | | Diffuse area | Junction leakage, ESD damage | | Periodic pattern | Systematic process defect | ## 10. Error Analysis ### 10.1 Temperature Measurement Uncertainty $$ \sigma_T = \sqrt{\sigma_{\text{LC}}^2 + \sigma_{\text{stage}}^2 + \sigma_{\text{optical}}^2} $$ ### 10.2 Position Uncertainty Due to thermal spreading: $$ \sigma_x \approx \mu = \sqrt{\frac{\alpha_{\text{th}} \cdot t_{\text{exposure}}}{\pi}} $$ ## 11. Equations | Parameter | Equation | |-----------|----------| | Bragg wavelength | $\lambda_{\text{max}} = n_{\text{avg}} \cdot p$ | | Power dissipation | $P = I^2 R = V^2/R$ | | Thermal diffusion length | $\mu = \sqrt{\alpha_{\text{th}} / \pi f}$ | | Temperature rise | $\Delta T = P \cdot R_{\text{th}}$ | | Lock-in phase | $\phi = x/\mu$ | | Minimum power | $P_{\text{min}} = 4\pi k d \cdot \Delta T_{\text{min}}$ | ## 12. Standards - **JEDEC JESD22-A** — Failure analysis procedures - **MIL-STD-883** — Test methods for microelectronics - **SEMI E10** — Equipment reliability metrics
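As a worked check of the detection-limit and lock-in relations above (Sections 2.3, 5.2, and 8.1), the following Python sketch evaluates the minimum detectable power and the thermal diffusion length for a silicon substrate. The density and specific heat are standard silicon values assumed here; the modulation frequency and lateral distance are illustrative.

```python
import math

# Substrate values from Section 8.1 (silicon); rho and c_p are standard values assumed here
k_si = 148.0                    # thermal conductivity, W/(m*K)
rho = 2330.0                    # density, kg/m^3
c_p = 700.0                     # specific heat, J/(kg*K)
alpha_th = k_si / (rho * c_p)   # thermal diffusivity, m^2/s

d = 10e-6                       # defect depth below surface, m
dT_min = 0.1                    # minimum detectable temperature change, K

# Minimum detectable power: P_min = 4*pi*k*d*dT_min  (Section 8.1)
P_min = 4 * math.pi * k_si * d * dT_min
print(f"P_min = {P_min * 1e3:.2f} mW")      # ~1.86 mW, matching the example calculation

# Thermal diffusion length and lock-in phase lag (Sections 2.3 and 5.2)
f_mod = 10.0                    # lock-in modulation frequency, Hz (illustrative)
mu = math.sqrt(alpha_th / (math.pi * f_mod))
x = 100e-6                      # lateral distance from the heat source, m (illustrative)
phi = x / mu                    # phase lag in radians
print(f"mu = {mu * 1e6:.0f} um, phase lag at 100 um = {phi:.3f} rad")
```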
Liquid crystal thermography uses temperature-sensitive color changes for visual thermal mapping.
Use liquid epoxy.
Liquid metal thermal interface materials provide extremely low thermal resistance through high conductivity and conformability.
Liquid neural networks adapt dynamics based on inputs enabling continual learning.
Dynamically adapting networks with time-varying parameters inspired by biological neurons.
Continuously adapting networks inspired by biological neurons.
Continuous-time RNNs with adaptive dynamics.
Listen Attend Spell is an attention-based encoder-decoder architecture for end-to-end speech recognition without phonetic representations.
ListNet uses listwise ranking approach with cross-entropy loss over permutation probabilities.
Listwise ranking optimizes entire ranked list quality considering inter-item dependencies.
Optimize entire ranked list.
LiteLLM provides a unified interface to many LLM providers; drop-in replacement for the OpenAI client format.
Advanced multi-patterning with spacer freeze.
Layouts optimized for lithography.
# Semiconductor Manufacturing: Optics and Lithography Mathematical Modeling A comprehensive guide to the mathematical foundations of semiconductor lithography, covering electromagnetic theory, Fourier optics, optimization mathematics, and stochastic processes. ## 1. Fundamental Imaging Theory ### 1.1 The Resolution Limits The Rayleigh equations define the physical limits of optical lithography: **Resolution:** $$ R = k_1 \cdot \frac{\lambda}{NA} $$ **Depth of Focus:** $$ DOF = k_2 \cdot \frac{\lambda}{NA^2} $$ **Parameter Definitions:** - $\lambda$ — Wavelength of light (193nm for ArF immersion, 13.5nm for EUV) - $NA = n \cdot \sin(\theta)$ — Numerical aperture - $n$ — Refractive index of immersion medium - $\theta$ — Half-angle of the lens collection cone - $k_1, k_2$ — Process-dependent factors (typically $k_1 \geq 0.25$ from Rayleigh criterion; modern processes achieve $k_1 \sim 0.3–0.4$) **Fundamental Tension:** - Improving resolution requires: - Increasing $NA$, OR - Decreasing $\lambda$ - Both degrade depth of focus **quadratically** ($\propto NA^{-2}$) ## 2. Fourier Optics Framework The projection lithography system is modeled as a **linear shift-invariant system** in the Fourier domain. ### 2.1 Coherent Imaging For a perfectly coherent source, the image field is given by convolution: $$ E_{image}(x,y) = E_{object}(x,y) \otimes h(x,y) $$ In frequency space (via Fourier transform): $$ \tilde{E}_{image}(f_x, f_y) = \tilde{E}_{object}(f_x, f_y) \cdot H(f_x, f_y) $$ **Key Components:** - $h(x,y)$ — Amplitude Point Spread Function (PSF) - $H(f_x, f_y)$ — Coherent Transfer Function (pupil function) - Typically a `circ` function for circular aperture - Cuts off spatial frequencies beyond $\frac{NA}{\lambda}$ ### 2.2 Partially Coherent Imaging — The Hopkins Formulation Real lithography systems operate in the **partially coherent regime**: $$ \sigma = 0.3 - 0.9 $$ where $\sigma$ is the ratio of condenser NA to objective NA. #### Transmission Cross Coefficient (TCC) Integral The aerial image intensity is: $$ I(x,y) = \int\!\!\!\int\!\!\!\int\!\!\!\int TCC(f_1,g_1,f_2,g_2) \cdot M(f_1,g_1) \cdot M^*(f_2,g_2) \cdot e^{2\pi i[(f_1-f_2)x + (g_1-g_2)y]} \, df_1 \, dg_1 \, df_2 \, dg_2 $$ The TCC itself is defined as: $$ TCC(f_1,g_1,f_2,g_2) = \int\!\!\!\int J(f,g) \cdot P(f+f_1, g+g_1) \cdot P^*(f+f_2, g+g_2) \, df \, dg $$ **Parameter Definitions:** - $J(f,g)$ — Source intensity distribution (conventional, annular, dipole, quadrupole, or freeform) - $P$ — Pupil function (including aberrations) - $M$ — Mask transmission/diffraction spectrum - $M^*$ — Complex conjugate of mask spectrum **Computational Note:** This is a 4D integral over frequency space for every image point — computationally expensive but essential for accuracy. ## 3. Computational Acceleration: SOCS Decomposition Direct TCC computation is prohibitive. The **Sum of Coherent Systems (SOCS)** method uses eigendecomposition: $$ TCC(f_1,g_1,f_2,g_2) \approx \sum_{i=1}^{N} \lambda_i \cdot \phi_i(f_1,g_1) \cdot \phi_i^*(f_2,g_2) $$ **Decomposition Components:** - $\lambda_i$ — Eigenvalues (sorted by magnitude) - $\phi_i$ — Eigenfunctions (kernels) The image becomes a sum of coherent images: $$ I(x,y) \approx \sum_{i=1}^{N} \lambda_i \cdot \left| m(x,y) \otimes \phi_i(x,y) \right|^2 $$ **Computational Properties:** - Typically $N = 10–50$ kernels capture $>99\%$ of imaging behavior - Each convolution computed via FFT - Complexity: $O(N \log N)$ per kernel ## 4. 
Vector Electromagnetic Effects at High NA When $NA > 0.7$ (immersion lithography reaches $NA \sim 1.35$), scalar diffraction theory fails. The **vector nature of light** must be modeled. ### 4.1 Richards-Wolf Vector Diffraction The electric field near focus: $$ \mathbf{E}(r,\psi,z) = -\frac{ikf}{2\pi} \int_0^{\theta_{max}} \int_0^{2\pi} \mathbf{A}(\theta,\phi) \cdot P(\theta,\phi) \cdot e^{ik[z\cos\theta + r\sin\theta\cos(\phi-\psi)]} \sin\theta \, d\theta \, d\phi $$ **Variables:** - $\mathbf{A}(\theta,\phi)$ — Polarization-dependent amplitude vector - $P(\theta,\phi)$ — Pupil function - $k = \frac{2\pi}{\lambda}$ — Wave number - $(r, \psi, z)$ — Cylindrical coordinates at image plane ### 4.2 Polarization Effects For high-NA imaging, polarization significantly affects image contrast: | Polarization | Description | Behavior | |:-------------|:------------|:---------| | **TE (s-polarization)** | Electric field ⊥ to plane of incidence | Interferes constructively | | **TM (p-polarization)** | Electric field ∥ to plane of incidence | Suffers contrast loss at high angles | **Consequences:** - Horizontal vs. vertical features print differently - Requires illumination polarization control: - Tangential polarization - Radial polarization - Optimized/freeform polarization ## 5. Aberration Modeling: Zernike Polynomials Wavefront aberrations are expanded in **Zernike polynomials** over the unit pupil: $$ W(\rho,\theta) = \sum_{n,m} Z_n^m \cdot R_n^{|m|}(\rho) \cdot \begin{cases} \cos(m\theta) & m \geq 0 \\ \sin(|m|\theta) & m < 0 \end{cases} $$ ### 5.1 Key Aberrations Affecting Lithography | Zernike Term | Aberration | Effect on Imaging | |:-------------|:-----------|:------------------| | $Z_4$ | Defocus | Pattern-dependent CD shift | | $Z_5, Z_6$ | Astigmatism | H/V feature difference | | $Z_7, Z_8$ | Coma | Pattern shift, asymmetric printing | | $Z_9$ | Spherical | Through-pitch CD variation | | $Z_{10}, Z_{11}$ | Trefoil | Three-fold symmetric distortion | ### 5.2 Aberrated Pupil Function The pupil function with aberrations: $$ P(\rho,\theta) = P_0(\rho,\theta) \cdot \exp\left[\frac{2\pi i}{\lambda} W(\rho,\theta)\right] $$ **Engineering Specifications:** - Modern scanners control Zernikes through adjustable lens elements - Typical specification: $< 0.5\text{nm}$ RMS wavefront error ## 6. Rigorous Mask Modeling ### 6.1 Thin Mask (Kirchhoff) Approximation Assumes the mask is infinitely thin: $$ M(x,y) = t(x,y) \cdot e^{i\phi(x,y)} $$ **Limitations:** - Fails for advanced nodes - Mask topography (absorber thickness $\sim 50–70\text{nm}$) affects diffraction ### 6.2 Rigorous Electromagnetic Field (EMF) Methods #### 6.2.1 Rigorous Coupled-Wave Analysis (RCWA) The mask is treated as a **periodic grating**. Fields are expanded in Fourier series: $$ E(x,z) = \sum_n E_n(z) \cdot e^{i(k_{x0} + nK)x} $$ **Parameters:** - $K = \frac{2\pi}{\text{pitch}}$ — Grating vector - $k_{x0}$ — Incident wave x-component Substituting into Maxwell's equations yields **coupled ODEs** solved as an eigenvalue problem in each z-layer. #### 6.2.2 FDTD (Finite-Difference Time-Domain) Directly discretizes Maxwell's curl equations on a **Yee grid**: $$ \frac{\partial \mathbf{E}}{\partial t} = \frac{1}{\epsilon} \nabla \times \mathbf{H} $$ $$ \frac{\partial \mathbf{H}}{\partial t} = -\frac{1}{\mu} \nabla \times \mathbf{E} $$ **Characteristics:** - Explicit time-stepping - Computationally intensive - Handles arbitrary geometries ## 7. 
Photoresist Modeling ### 7.1 Exposure: Dill ABC Model The photoactive compound (PAC) concentration $M$ evolves as: $$ \frac{\partial M}{\partial t} = -I(z,t) \cdot [A \cdot M + B] \cdot M $$ **Parameters:** - $A$ — Bleachable absorption coefficient - $B$ — Non-bleachable absorption coefficient - $I(z,t)$ — Intensity in the resist Light intensity in the resist follows Beer-Lambert: $$ \frac{\partial I}{\partial z} = -\alpha(M) \cdot I $$ where $\alpha = A \cdot M + B$. ### 7.2 Post-Exposure Bake: Reaction-Diffusion For **chemically amplified resists (CAR)**: $$ \frac{\partial m}{\partial t} = D\nabla^2 m - k_{amp} \cdot m \cdot [H^+] $$ **Variables:** - $m$ — Blocking group concentration - $D$ — Diffusivity (temperature-dependent, Arrhenius behavior) - $[H^+]$ — Acid concentration Acid diffusion and quenching: $$ \frac{\partial [H^+]}{\partial t} = D_H \nabla^2 [H^+] - k_q [H^+][Q] $$ where $Q$ is quencher concentration. ### 7.3 Development: Mack Model Development rate as a function of inhibitor concentration $m$: $$ R(m) = R_{max} \cdot \frac{(a+1)(1-m)^n}{a + (1-m)^n} + R_{min} $$ **Parameters:** - $a, n$ — Kinetic parameters - $R_{max}$ — Maximum development rate - $R_{min}$ — Minimum development rate (unexposed) This creates the **nonlinear resist response** that sharpens edges. ## 8. Optical Proximity Correction (OPC) ### 8.1 The Inverse Problem Given target pattern $T$, find mask $M$ such that: $$ \text{Image}(M) \approx T $$ ### 8.2 Model-Based OPC Iterative edge-based correction. Cost function: $$ \mathcal{L} = \sum_i w_i \cdot (EPE_i)^2 + \lambda \cdot R(M) $$ **Components:** - $EPE_i$ — Edge Placement Error (distance from target at evaluation point $i$) - $w_i$ — Weight for each evaluation point - $R(M)$ — Regularization term for mask manufacturability Gradient descent update: $$ M^{(k+1)} = M^{(k)} - \eta \frac{\partial \mathcal{L}}{\partial M} $$ **Gradient Computation Methods:** - Adjoint methods (efficient for many output points) - Direct differentiation of SOCS kernels ### 8.3 Inverse Lithography Technology (ILT) Full pixel-based mask optimization: $$ \min_M \left\| I(M) - I_{target} \right\|^2 + \lambda_1 \|M\|_{TV} + \lambda_2 \|\nabla^2 M\|^2 $$ **Regularization Terms:** - $\|M\|_{TV}$ — Total Variation promotes sharp mask edges - $\|\nabla^2 M\|^2$ — Laplacian term controls curvature **Result:** ILT produces **curvilinear masks** with superior imaging, enabled by multi-beam mask writers. ## 9. Source-Mask Optimization (SMO) Joint optimization of illumination source $J$ and mask $M$: $$ \min_{J,M} \mathcal{L}(J,M) = \left\| I(J,M) - I_{target} \right\|^2 + \text{process window terms} $$ ### 9.1 Constraints **Source Constraints:** - Pixelized representation - Non-negative intensity: $J \geq 0$ - Power constraint: $\int J \, dA = P_0$ **Mask Constraints:** - Minimum feature size - Maximum curvature - Manufacturability rules ### 9.2 Mathematical Properties The problem is **bilinear in $J$ and $M$** (linear in each separately), enabling: - Alternating optimization - Joint gradient methods ### 9.3 Process Window Co-optimization Adds robustness across focus and dose variations: $$ \mathcal{L}_{PW} = \sum_{focus, dose} w_{f,d} \cdot \left\| I_{f,d}(J,M) - I_{target} \right\|^2 $$ ## 10. EUV-Specific Mathematics ### 10.1 Multilayer Reflector Mo/Si multilayer with **40–50 bilayer pairs**. 
Peak reflectivity from Bragg condition: $$ 2d \cdot \cos\theta = n\lambda $$ **Parameters:** - $d \approx 6.9\text{nm}$ — Bilayer period for $\lambda = 13.5\text{nm}$ - Near-normal incidence ($\theta \approx 0°$) #### Transfer Matrix Method Reflectivity calculation: $$ \begin{pmatrix} E_{out}^+ \\ E_{out}^- \end{pmatrix} = \prod_{j=1}^{N} M_j \begin{pmatrix} E_{in}^+ \\ E_{in}^- \end{pmatrix} $$ where $M_j$ is the transfer matrix for layer $j$. ### 10.2 Mask 3D Effects EUV masks are **reflective** with absorber patterns. At 6° chief ray angle: - **Shadowing:** Different illumination angles see different absorber profiles - **Best focus shift:** Pattern-dependent focus offsets Requires **full 3D EMF simulation** (RCWA or FDTD) for accurate modeling. ### 10.3 Stochastic Effects At EUV, photon counts are low enough that **shot noise** matters: $$ \sigma_{photon} = \sqrt{N_{photon}} $$ #### Line Edge Roughness (LER) Contributions - Photon shot noise - Acid shot noise - Resist molecular granularity #### Power Spectral Density Model $$ PSD(f) = \frac{A}{1 + (2\pi f \xi)^{2+2H}} $$ **Parameters:** - $\xi$ — Correlation length - $H$ — Hurst exponent (typically $0.5–0.8$) - $A$ — Amplitude #### Stochastic Simulation via Monte Carlo - Poisson-distributed photon absorption - Random acid generation and diffusion - Development with local rate variations ## 11. Process Window Analysis ### 11.1 Bossung Curves CD vs. focus at multiple dose levels: $$ CD(E, F) = CD_0 + a_1 E + a_2 F + a_3 E^2 + a_4 F^2 + a_5 EF + \cdots $$ Polynomial expansion fitted to simulation/measurement. ### 11.2 Normalized Image Log-Slope (NILS) $$ NILS = w \cdot \left. \frac{d \ln I}{dx} \right|_{edge} $$ **Parameters:** - $w$ — Feature width - Evaluated at the edge position **Design Rule:** $NILS > 2$ generally required for acceptable process latitude. **Relationship to Exposure Latitude:** $$ EL \propto NILS $$ ### 11.3 Depth of Focus (DOF) and Exposure Latitude (EL) Trade-off Visualized as overlapping process windows across pattern types — the **common process window** must satisfy all critical features. ## 12. Multi-Patterning Mathematics ### 12.1 SADP (Self-Aligned Double Patterning) $$ \text{Spacer pitch} = \frac{\text{Mandrel pitch}}{2} $$ **Design Rule Constraints:** - Mandrel CD and pitch - Spacer thickness uniformity - Cut pattern overlay ### 12.2 LELE (Litho-Etch-Litho-Etch) Decomposition **Graph coloring problem:** Assign features to masks such that: - Features on same mask satisfy minimum spacing - Total mask count minimized (typically 2) **Computational Properties:** - For 1D patterns: Equivalent to 2-colorable graph (bipartite) - For 2D: **NP-complete** in general **Solution Methods:** - Integer Linear Programming (ILP) - SAT solvers - Heuristic algorithms **Conflict Graph Edge Weight:** $$ w_{ij} = \begin{cases} \infty & \text{if } d_{ij} < d_{min,same} \\ 0 & \text{otherwise} \end{cases} $$ ## 13. Machine Learning Integration ### 13.1 Surrogate Models Neural networks approximate aerial image or resist profile: $$ I_{NN}(x; M) \approx I_{physics}(x; M) $$ **Benefits:** - Training on physics simulation data - Inference 100–1000× faster ### 13.2 OPC with ML - **CNNs:** Predict edge corrections - **GANs:** Generate mask patterns - **Reinforcement Learning:** Iterative OPC optimization ### 13.3 Hotspot Detection Classification of lithographic failure sites: $$ P(\text{hotspot} \mid \text{pattern}) = \sigma(W \cdot \phi(\text{pattern}) + b) $$ where $\sigma$ is the sigmoid function and $\phi$ extracts pattern features. 
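The NILS metric from Section 11.2 is easy to evaluate numerically. The sketch below uses a synthetic sinusoidal aerial-image profile (purely illustrative, not output from an imaging simulator) and a 0.5 print threshold to estimate NILS at the feature edge.

```python
import numpy as np

# Synthetic aerial image for a line/space pattern (illustrative only)
pitch = 90e-9                                      # pattern pitch, m
w = 45e-9                                          # nominal feature width, m
x = np.linspace(0, pitch, 2001)
I = 0.5 + 0.4 * np.cos(2 * np.pi * x / pitch)      # normalized intensity profile

# Edge position: where intensity crosses the assumed print threshold
threshold = 0.5
edge_idx = np.argmin(np.abs(I - threshold))

# NILS = w * d(ln I)/dx evaluated at the edge (Section 11.2)
dlnI_dx = np.gradient(np.log(I), x)
nils = w * abs(dlnI_dx[edge_idx])
print(f"NILS = {nils:.2f}   (rule of thumb: NILS > 2 for acceptable latitude)")
```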
## 14. Mathematical Optimization Framework ### 14.1 Constrained Optimization Formulation $$ \min f(x) \quad \text{subject to} \quad g(x) \leq 0, \quad h(x) = 0 $$ **Solution Methods:** - Sequential Quadratic Programming (SQP) - Interior Point Methods - Augmented Lagrangian ### 14.2 Regularization Techniques | Regularization | Formula | Effect | |:---------------|:--------|:-------| | L1 (Sparsity) | $\|\nabla M\|_1$ | Promotes sparse gradients | | L2 (Smoothness) | $\|\nabla M\|_2^2$ | Promotes smooth transitions | | Total Variation | $\int |\nabla M| \, dx$ | Preserves edges while smoothing | ## 15. Mathematical Stack | Layer | Mathematics | |:------|:------------| | Electromagnetic Propagation | Maxwell's equations, RCWA, FDTD | | Image Formation | Fourier optics, TCC, Hopkins, vector diffraction | | Aberrations | Zernike polynomials, wavefront phase | | Photoresist | Coupled PDEs (reaction-diffusion) | | Correction (OPC/ILT) | Inverse problems, constrained optimization | | SMO | Bilinear optimization, gradient methods | | Stochastics (EUV) | Poisson processes, Monte Carlo | | Multi-Patterning | Graph theory, combinatorial optimization | | Machine Learning | Neural networks, surrogate models | ## Reference Formulas ### Core Equations ``` Resolution: R = k₁ × λ / NA Depth of Focus: DOF = k₂ × λ / NA² Numerical Aperture: NA = n × sin(θ) NILS: NILS = w × (d ln I / dx)|edge Bragg Condition: 2d × cos(θ) = nλ Shot Noise: σ = √N ``` ### Typical Parameter Values | Parameter | Typical Value | Application | |:----------|:--------------|:------------| | $\lambda$ (ArF) | 193 nm | Immersion lithography | | $\lambda$ (EUV) | 13.5 nm | EUV lithography | | $NA$ (Immersion) | 1.35 | High-NA ArF | | $NA$ (EUV) | 0.33 – 0.55 | Current/High-NA EUV | | $k_1$ | 0.3 – 0.4 | Advanced nodes | | $\sigma$ (Partial Coherence) | 0.3 – 0.9 | Illumination | | Zernike RMS | < 0.5 nm | Aberration spec |
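A minimal calculator for the Rayleigh resolution and depth-of-focus formulas, using the typical parameter values tabulated above; $k_1 = 0.35$ and $k_2 = 1.0$ are assumed defaults within the quoted ranges.

```python
# Rayleigh resolution and depth of focus; a minimal sketch, not a scanner model.
def rayleigh(wavelength_nm: float, na: float, k1: float = 0.35, k2: float = 1.0):
    """Return (resolution, depth of focus) in nm from R = k1*lambda/NA, DOF = k2*lambda/NA^2."""
    resolution = k1 * wavelength_nm / na
    dof = k2 * wavelength_nm / na ** 2
    return resolution, dof

for name, lam, na in [("ArF immersion", 193.0, 1.35),
                      ("EUV", 13.5, 0.33),
                      ("High-NA EUV", 13.5, 0.55)]:
    r, dof = rayleigh(lam, na)
    print(f"{name:14s}  R = {r:5.1f} nm   DOF = {dof:6.1f} nm")
```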
# Semiconductor Manufacturing Process: Lithography Mathematical Modeling ## 1. Introduction Lithography is the critical patterning step in semiconductor manufacturing that transfers circuit designs onto silicon wafers. It is essentially the "printing press" of chip making and determines the minimum feature sizes achievable. ### 1.1 Basic Process Flow 1. Coat wafer with photoresist 2. Expose photoresist to light through a mask/reticle 3. Develop the photoresist (remove exposed or unexposed regions) 4. Etch or deposit through the patterned resist 5. Strip the remaining resist ### 1.2 Types of Lithography - **Optical lithography:** DUV at 193nm, EUV at 13.5nm - **Electron beam lithography:** Direct-write, maskless - **Nanoimprint lithography:** Mechanical pattern transfer - **X-ray lithography:** Short wavelength exposure ## 2. Optical Image Formation The foundation of lithography modeling is **partially coherent imaging theory**, formalized through the Hopkins integral. ### 2.1 Hopkins Integral The intensity distribution at the image plane is given by: $$ I(x,y) = \iiint\!\!\!\int TCC(f_1,g_1;f_2,g_2) \cdot \tilde{M}(f_1,g_1) \cdot \tilde{M}^*(f_2,g_2) \cdot e^{2\pi i[(f_1-f_2)x + (g_1-g_2)y]} \, df_1\,dg_1\,df_2\,dg_2 $$ Where: - $I(x,y)$ — Intensity at image plane coordinates $(x,y)$ - $\tilde{M}(f,g)$ — Fourier transform of the mask transmission function - $TCC$ — Transmission Cross Coefficient ### 2.2 Transmission Cross Coefficient (TCC) The TCC encodes both the illumination source and lens pupil: $$ TCC(f_1,g_1;f_2,g_2) = \iint S(f,g) \cdot P(f+f_1,g+g_1) \cdot P^*(f+f_2,g+g_2) \, df\,dg $$ Where: - $S(f,g)$ — Source intensity distribution - $P(f,g)$ — Pupil function (encodes aberrations, NA cutoff) - $P^*$ — Complex conjugate of the pupil function ### 2.3 Sum of Coherent Systems (SOCS) To accelerate computation, the TCC is decomposed using eigendecomposition: $$ TCC(f_1,g_1;f_2,g_2) = \sum_{k=1}^{N} \lambda_k \cdot \phi_k(f_1,g_1) \cdot \phi_k^*(f_2,g_2) $$ The image becomes a weighted sum of coherent images: $$ I(x,y) = \sum_{k=1}^{N} \lambda_k \left| \mathcal{F}^{-1}\{\phi_k \cdot \tilde{M}\} \right|^2 $$ ### 2.4 Coherence Factor The partial coherence factor $\sigma$ is defined as: $$ \sigma = \frac{NA_{source}}{NA_{lens}} $$ - $\sigma = 0$ — Fully coherent illumination - $\sigma = 1$ — Matched illumination - $\sigma > 1$ — Overfilled illumination ## 3. Resolution Limits and Scaling Laws ### 3.1 Rayleigh Criterion The minimum resolvable feature size: $$ R = k_1 \frac{\lambda}{NA} $$ Where: - $R$ — Minimum resolvable feature - $k_1$ — Process factor (theoretical limit $\approx 0.25$, practical $\approx 0.3\text{--}0.4$) - $\lambda$ — Wavelength of light - $NA$ — Numerical aperture $= n \sin\theta$ ### 3.2 Depth of Focus $$ DOF = k_2 \frac{\lambda}{NA^2} $$ Where: - $DOF$ — Depth of focus - $k_2$ — Process-dependent constant ### 3.3 Technology Comparison | Technology | $\lambda$ (nm) | NA | Min. Feature | DOF | |:-----------|:---------------|:-----|:-------------|:----| | DUV ArF | 193 | 1.35 | ~38 nm | ~100 nm | | EUV | 13.5 | 0.33 | ~13 nm | ~120 nm | | High-NA EUV | 13.5 | 0.55 | ~8 nm | ~45 nm | ### 3.4 Resolution Enhancement Techniques (RETs) Key techniques to reduce effective $k_1$: - **Off-Axis Illumination (OAI):** Dipole, quadrupole, annular - **Phase-Shift Masks (PSM):** Alternating, attenuated - **Optical Proximity Correction (OPC):** Bias, serifs, sub-resolution assist features (SRAFs) - **Multiple Patterning:** LELE, SADP, SAQP ## 4. 
Rigorous Electromagnetic Mask Modeling ### 4.1 Thin Mask Approximation (Kirchhoff) For features much larger than wavelength: $$ E_{mask}(x,y) = t(x,y) \cdot E_{incident} $$ Where $t(x,y)$ is the complex transmission function. ### 4.2 Maxwell's Equations For sub-wavelength features, we must solve Maxwell's equations rigorously: $$ \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} $$ $$ \nabla \times \mathbf{H} = \mathbf{J} + \frac{\partial \mathbf{D}}{\partial t} $$ ### 4.3 RCWA (Rigorous Coupled-Wave Analysis) For periodic structures with grating period $d$, fields are expanded in Floquet modes: $$ E(x,z) = \sum_{n=-N}^{N} A_n(z) \cdot e^{i k_{xn} x} $$ Where the wavevector components are: $$ k_{xn} = k_0 \sin\theta_0 + \frac{2\pi n}{d} $$ This yields a matrix eigenvalue problem: $$ \frac{d^2}{dz^2}\mathbf{A} = \mathbf{K}^2 \mathbf{A} $$ Where $\mathbf{K}$ couples different diffraction orders through the dielectric tensor. ### 4.4 FDTD (Finite-Difference Time-Domain) Discretizing Maxwell's equations on a Yee grid: $$ \frac{\partial H_y}{\partial t} = \frac{1}{\mu}\left(\frac{\partial E_x}{\partial z} - \frac{\partial E_z}{\partial x}\right) $$ $$ \frac{\partial E_x}{\partial t} = \frac{1}{\epsilon}\left(\frac{\partial H_y}{\partial z} - J_x\right) $$ ### 4.5 EUV Mask 3D Effects Shadowing from absorber thickness $h$ at angle $\theta$: $$ \Delta x = h \tan\theta $$ For EUV at 6° chief ray angle: $$ \Delta x \approx 0.105 \cdot h $$ ## 5. Photoresist Modeling ### 5.1 Dill ABC Model (Exposure) The photoactive compound (PAC) concentration evolves as: $$ \frac{\partial M(z,t)}{\partial t} = -I(z,t) \cdot M(z,t) \cdot C $$ Light absorption follows Beer-Lambert law: $$ \frac{dI}{dz} = -\alpha(M) \cdot I $$ $$ \alpha(M) = A \cdot M + B $$ Where: - $A$ — Bleachable absorption coefficient - $B$ — Non-bleachable absorption coefficient - $C$ — Exposure rate constant (quantum efficiency) - $M$ — Normalized PAC concentration ### 5.2 Post-Exposure Bake (PEB) — Reaction-Diffusion For chemically amplified resists (CARs): $$ \frac{\partial h}{\partial t} = D \nabla^2 h + k \cdot h \cdot M_{blocking} $$ Where: - $h$ — Acid concentration - $D$ — Diffusion coefficient - $k$ — Reaction rate constant - $M_{blocking}$ — Blocking group concentration The blocking group deprotection: $$ \frac{\partial M_{blocking}}{\partial t} = -k_{amp} \cdot h \cdot M_{blocking} $$ ### 5.3 Mack Development Rate Model $$ r(m) = r_{max} \cdot \frac{(a+1)(1-m)^n}{a + (1-m)^n} + r_{min} $$ Where: - $r$ — Development rate - $m$ — Normalized PAC concentration remaining - $n$ — Contrast (dissolution selectivity) - $a$ — Inhibition depth - $r_{max}$ — Maximum development rate (fully exposed) - $r_{min}$ — Minimum development rate (unexposed) ### 5.4 Enhanced Mack Model Including surface inhibition: $$ r(m,z) = r_{max} \cdot \frac{(a+1)(1-m)^n}{a + (1-m)^n} \cdot \left(1 - e^{-z/l}\right) + r_{min} $$ Where $l$ is the surface inhibition depth. ## 6. Optical Proximity Correction (OPC) ### 6.1 Forward Problem Given mask $M$, compute the printed wafer image: $$ I = F(M) $$ Where $F$ represents the complete optical and resist model. 
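A small sketch of the Mack development-rate model from Section 5.3; the kinetic parameters ($r_{max}$, $r_{min}$, $n$, $a$) are illustrative values chosen only to show the nonlinear, edge-sharpening response.

```python
import numpy as np

# Mack development rate model (Section 5.3); parameter values are illustrative.
def mack_rate(m, r_max=100.0, r_min=0.1, n=5, a=0.05):
    """Development rate (nm/s) vs. remaining normalized PAC concentration m."""
    return r_max * (a + 1) * (1 - m) ** n / (a + (1 - m) ** n) + r_min

for mi in np.linspace(0.0, 1.0, 6):    # 0 = fully exposed, 1 = unexposed
    print(f"m = {mi:.1f}   r = {mack_rate(mi):8.2f} nm/s")
```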
### 6.2 Inverse Problem Given target pattern $T$, find mask $M$ such that: $$ F(M) \approx T $$ ### 6.3 Edge Placement Error (EPE) $$ EPE_i = x_{printed,i} - x_{target,i} $$ ### 6.4 OPC Optimization Formulation Minimize the cost function: $$ \mathcal{L}(M) = \sum_{i=1}^{N} w_i \cdot EPE_i^2 + \lambda \cdot R(M) $$ Where: - $w_i$ — Weight for evaluation point $i$ - $R(M)$ — Regularization term for mask manufacturability - $\lambda$ — Regularization strength ### 6.5 Gradient-Based OPC Using gradient descent: $$ M_{n+1} = M_n - \eta \frac{\partial \mathcal{L}}{\partial M} $$ The gradient requires computing: $$ \frac{\partial \mathcal{L}}{\partial M} = \sum_i 2 w_i \cdot EPE_i \cdot \frac{\partial EPE_i}{\partial M} + \lambda \frac{\partial R}{\partial M} $$ ### 6.6 Adjoint Method for Gradient Computation The sensitivity $\frac{\partial I}{\partial M}$ is computed efficiently using the adjoint formulation: $$ \frac{\partial \mathcal{L}}{\partial M} = \text{Re}\left\{ \tilde{M}^* \cdot \mathcal{F}\left\{ \sum_k \lambda_k \phi_k^* \cdot \mathcal{F}^{-1}\left\{ \phi_k \cdot \frac{\partial \mathcal{L}}{\partial I} \right\} \right\} \right\} $$ This avoids computing individual sensitivities for each mask pixel. ### 6.7 Mask Manufacturability Constraints Common regularization terms: - **Minimum feature size:** $R_1(M) = \sum \max(0, w_{min} - w_i)^2$ - **Minimum space:** $R_2(M) = \sum \max(0, s_{min} - s_i)^2$ - **Edge curvature:** $R_3(M) = \int |\kappa(s)|^2 ds$ - **Shot count:** $R_4(M) = N_{vertices}$ ## 7. Source-Mask Optimization (SMO) ### 7.1 Joint Optimization Formulation $$ \min_{S,M} \sum_{\text{patterns}} \|I(S,M) - T\|^2 + \lambda_S R_S(S) + \lambda_M R_M(M) $$ Where: - $S$ — Source intensity distribution - $M$ — Mask transmission function - $T$ — Target pattern - $R_S(S)$ — Source manufacturability regularization - $R_M(M)$ — Mask manufacturability regularization ### 7.2 Source Parameterization Pixelated source with constraints: $$ S(f,g) = \sum_{i,j} s_{ij} \cdot \text{rect}\left(\frac{f - f_i}{\Delta f}\right) \cdot \text{rect}\left(\frac{g - g_j}{\Delta g}\right) $$ Subject to: $$ 0 \leq s_{ij} \leq 1 \quad \forall i,j $$ $$ \sum_{i,j} s_{ij} = S_{total} $$ ### 7.3 Alternating Optimization **Algorithm:** 1. Initialize $S_0$, $M_0$ 2. For iteration $n = 1, 2, \ldots$: - Fix $S_n$, optimize $M_{n+1} = \arg\min_M \mathcal{L}(S_n, M)$ - Fix $M_{n+1}$, optimize $S_{n+1} = \arg\min_S \mathcal{L}(S, M_{n+1})$ 3. Repeat until convergence ### 7.4 Gradient Computation for SMO Source gradient: $$ \frac{\partial I}{\partial S}(x,y) = \left| \mathcal{F}^{-1}\{P \cdot \tilde{M}\}(x,y) \right|^2 $$ Mask gradient uses the adjoint method as in OPC. ## 8. 
Stochastic Effects and EUV ### 8.1 Photon Shot Noise Photon counts follow a Poisson distribution: $$ P(n) = \frac{\bar{n}^n e^{-\bar{n}}}{n!} $$ For EUV at 13.5 nm, photon energy is: $$ E_{photon} = \frac{hc}{\lambda} = \frac{1240 \text{ eV} \cdot \text{nm}}{13.5 \text{ nm}} \approx 92 \text{ eV} $$ Mean photons per pixel: $$ \bar{n} = \frac{\text{Dose} \cdot A_{pixel}}{E_{photon}} $$ ### 8.2 Relative Shot Noise $$ \frac{\sigma_n}{\bar{n}} = \frac{1}{\sqrt{\bar{n}}} $$ For 30 mJ/cm² dose and 10 nm pixel: $$ \bar{n} \approx 200 \text{ photons} \implies \sigma/\bar{n} \approx 7\% $$ ### 8.3 Line Edge Roughness (LER) Characterized by power spectral density: $$ PSD(f) = \frac{LER^2 \cdot \xi}{1 + (2\pi f \xi)^{2(1+H)}} $$ Where: - $LER$ — RMS line edge roughness (3σ value) - $\xi$ — Correlation length - $H$ — Hurst exponent (0 < H < 1) - $f$ — Spatial frequency ### 8.4 LER Decomposition $$ LER^2 = LWR^2/2 + \sigma_{placement}^2 $$ Where: - $LWR$ — Line width roughness - $\sigma_{placement}$ — Line placement error ### 8.5 Stochastic Defectivity Probability of printing failure (e.g., missing contact): $$ P_{fail} = 1 - \prod_{i} \left(1 - P_{fail,i}\right) $$ For a chip with $10^{10}$ contacts at 99.9999999% yield per contact: $$ P_{chip,fail} \approx 1\% $$ ### 8.6 Monte Carlo Simulation Steps 1. **Photon absorption:** Generate random events $\sim \text{Poisson}(\bar{n})$ 2. **Acid generation:** Each photon generates acid at random location 3. **Diffusion:** Brownian motion during PEB: $\langle r^2 \rangle = 6Dt$ 4. **Deprotection:** Local reaction based on acid concentration 5. **Development:** Cellular automata or level-set method ## 9. Multiple Patterning Mathematics ### 9.1 Graph Coloring Formulation When pitch $< \lambda/(2NA)$, single-exposure patterning fails. **Graph construction:** - Nodes $V$ = features (polygons) - Edges $E$ = spacing conflicts (features too close for one mask) - Colors $C$ = different masks ### 9.2 k-Colorability Problem Find assignment $c: V \rightarrow \{1, 2, \ldots, k\}$ such that: $$ c(u) \neq c(v) \quad \forall (u,v) \in E $$ This is **NP-complete** for $k \geq 3$. ### 9.3 Integer Linear Programming (ILP) Formulation Binary variables: $x_{v,c} \in \{0,1\}$ (node $v$ assigned color $c$) **Objective:** $$ \min \sum_{(u,v) \in E} \sum_c x_{u,c} \cdot x_{v,c} \cdot w_{uv} $$ **Constraints:** $$ \sum_{c=1}^{k} x_{v,c} = 1 \quad \forall v \in V $$ $$ x_{u,c} + x_{v,c} \leq 1 \quad \forall (u,v) \in E, \forall c $$ ### 9.4 Self-Aligned Multiple Patterning (SADP) Spacer pitch after $n$ iterations: $$ p_n = \frac{p_0}{2^n} $$ Where $p_0$ is the initial (lithographic) pitch. ## 10. Process Control Mathematics ### 10.1 Overlay Control Polynomial model across the wafer: $$ OVL_x(x,y) = a_0 + a_1 x + a_2 y + a_3 xy + a_4 x^2 + a_5 y^2 + \ldots $$ **Physical interpretation:** | Coefficient | Physical Effect | |:------------|:----------------| | $a_0$ | Translation | | $a_1$, $a_2$ | Scale (magnification) | | $a_3$ | Rotation | | $a_4$, $a_5$ | Non-orthogonality | ### 10.2 Overlay Correction Least squares fitting: $$ \mathbf{a} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y} $$ Where $\mathbf{X}$ is the design matrix and $\mathbf{y}$ is measured overlay. 
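The overlay model fit from Sections 10.1 and 10.2 reduces to an ordinary least-squares problem. The sketch below builds the polynomial design matrix and recovers assumed coefficients from synthetic, noisy overlay measurements; all numbers are illustrative.

```python
import numpy as np

# Least-squares fit of the overlay polynomial model (Sections 10.1-10.2); synthetic data.
rng = np.random.default_rng(0)
x, y = rng.uniform(-1, 1, 50), rng.uniform(-1, 1, 50)            # normalized wafer coords
true_a = np.array([2.0, 0.5, -0.3, 0.1, 0.05, -0.02])            # assumed coefficients, nm
X = np.column_stack([np.ones_like(x), x, y, x * y, x ** 2, y ** 2])  # design matrix
ovl = X @ true_a + rng.normal(0, 0.1, 50)                        # measured overlay + noise

# a = (X^T X)^{-1} X^T y, solved with the numerically stable lstsq routine
a_hat, *_ = np.linalg.lstsq(X, ovl, rcond=None)
print("fitted coefficients (nm):", np.round(a_hat, 3))
```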
### 10.3 Run-to-Run Control — EWMA Exponentially Weighted Moving Average: $$ \hat{y}_{n+1} = \lambda y_n + (1-\lambda)\hat{y}_n $$ Where: - $\hat{y}_{n+1}$ — Predicted output - $y_n$ — Measured output at step $n$ - $\lambda$ — Smoothing factor $(0 < \lambda < 1)$ ### 10.4 CDU Variance Decomposition $$ \sigma^2_{total} = \sigma^2_{local} + \sigma^2_{field} + \sigma^2_{wafer} + \sigma^2_{lot} $$ **Sources:** - **Local:** Shot noise, LER, resist - **Field:** Lens aberrations, mask - **Wafer:** Focus/dose uniformity - **Lot:** Tool-to-tool variation ### 10.5 Process Capability Index $$ C_{pk} = \min\left(\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right) $$ Where: - $USL$, $LSL$ — Upper/lower specification limits - $\mu$ — Process mean - $\sigma$ — Process standard deviation ## 11. Machine Learning Integration ### 11.1 Applications Overview | Application | Method | Purpose | |:------------|:-------|:--------| | Hotspot detection | CNNs | Predict yield-limiting patterns | | OPC acceleration | Neural surrogates | Replace expensive physics sims | | Metrology | Regression models | Virtual measurements | | Defect classification | Image classifiers | Automated inspection | | Etch prediction | Physics-informed NN | Predict etch profiles | ### 11.2 Neural Network Surrogate Model A neural network approximates the forward model: $$ \hat{I}(x,y) = f_{NN}(\text{mask}, \text{source}, \text{focus}, \text{dose}; \theta) $$ Training objective: $$ \theta^* = \arg\min_\theta \sum_{i=1}^{N} \|f_{NN}(M_i; \theta) - I_i^{rigorous}\|^2 $$ ### 11.3 Hotspot Detection with CNNs Binary classification: $$ P(\text{hotspot} | \text{pattern}) = \sigma(\mathbf{W} \cdot \mathbf{features} + b) $$ Where $\sigma$ is the sigmoid function and features are extracted by convolutional layers. ### 11.4 Inverse Lithography with Deep Learning Generator network $G$ maps target to mask: $$ \hat{M} = G(T; \theta_G) $$ Training with physics-based loss: $$ \mathcal{L} = \|F(G(T)) - T\|^2 + \lambda \cdot R(G(T)) $$ ## 12. Mathematical Disciplines | Mathematical Domain | Application in Lithography | |:--------------------|:---------------------------| | **Fourier Optics** | Image formation, aberrations, frequency analysis | | **Electromagnetic Theory** | RCWA, FDTD, rigorous mask simulation | | **Partial Differential Equations** | Resist diffusion, development, reaction kinetics | | **Optimization Theory** | OPC, SMO, inverse problems, gradient descent | | **Probability & Statistics** | Shot noise, LER, SPC, process control | | **Linear Algebra** | Matrix methods, eigendecomposition, least squares | | **Graph Theory** | Multiple patterning decomposition, routing | | **Numerical Methods** | FEM, finite differences, Monte Carlo | | **Machine Learning** | Surrogate models, pattern recognition, CNNs | | **Signal Processing** | Image analysis, metrology, filtering | ## Key Equations Quick Reference ### Imaging $$ I(x,y) = \sum_{k} \lambda_k \left| \mathcal{F}^{-1}\{\phi_k \cdot \tilde{M}\} \right|^2 $$ ### Resolution $$ R = k_1 \frac{\lambda}{NA} $$ ### Depth of Focus $$ DOF = k_2 \frac{\lambda}{NA^2} $$ ### Development Rate $$ r(m) = r_{max} \cdot \frac{(a+1)(1-m)^n}{a + (1-m)^n} + r_{min} $$ ### LER Power Spectrum $$ PSD(f) = \frac{LER^2 \cdot \xi}{1 + (2\pi f \xi)^{2(1+H)}} $$ ### OPC Cost Function $$ \mathcal{L}(M) = \sum_{i} w_i \cdot EPE_i^2 + \lambda \cdot R(M) $$
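Referring back to the EWMA run-to-run controller in Section 10.3, a minimal update loop looks like the sketch below; the measurement series and smoothing factor are illustrative.

```python
# EWMA update from Section 10.3: y_hat_{n+1} = lambda * y_n + (1 - lambda) * y_hat_n.
def ewma_predict(measurements, lam=0.3, y_hat0=0.0):
    """Return the running EWMA prediction after each measurement."""
    y_hat, history = y_hat0, []
    for y in measurements:
        y_hat = lam * y + (1 - lam) * y_hat   # blend new measurement with prior estimate
        history.append(y_hat)
    return history

cd_errors = [1.2, 0.8, 1.5, 0.9, 1.1, 0.7]    # e.g. CD error per lot, nm (illustrative)
print(ewma_predict(cd_errors))
```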
Model resist exposure and development.
Improved version of Llama with better safety and performance.
llama.cpp runs LLMs efficiently on CPU. Quantization. Local inference.
Llama Guard classifies unsafe content. Input and output filtering. Meta open source.
Meta's open-source foundation language model family.
LlamaIndex enables data-augmented LLM applications with retrieval and indexing.
Data framework for LLM applications focused on ingestion and retrieval.
LlamaIndex specializes in RAG and data connectors. Index various data sources.
Visual instruction tuning for LLMs.
LLaVA fine-tunes LLM with visual instruction data. Strong open multimodal model.
Llemma is an open math model. Trained on Proof-Pile. Math reasoning.
LLM agents use language models to autonomously pursue goals through iterative planning and tool use.
LLM-as-judge uses a strong model to evaluate other models' outputs. Scales better than human evaluation and correlates reasonably with human judgments.
Use strong LLM to evaluate other model outputs.
# LLM Mathematics Modeling
## 1. Mathematical Foundations of LLMs
### 1.1 Transformer Architecture Mathematics
The transformer architecture (Vaswani et al., 2017) consists of these core mathematical operations:
#### Self-Attention Mechanism
The scaled dot-product attention is defined as:
$$
\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V
$$
**Variable Definitions:**
- $Q$ — Query matrix: $Q = XW^Q$ where $W^Q \in \mathbb{R}^{d_{model} \times d_k}$
- $K$ — Key matrix: $K = XW^K$ where $W^K \in \mathbb{R}^{d_{model} \times d_k}$
- $V$ — Value matrix: $V = XW^V$ where $W^V \in \mathbb{R}^{d_{model} \times d_v}$
- $d_k$ — Dimension of key vectors (scaling factor prevents gradient vanishing)
- $\sqrt{d_k}$ — Scaling factor to normalize dot products
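A minimal single-head NumPy sketch of the scaled dot-product attention formula above; it omits the learned projections $W^Q$, $W^K$, $W^V$, masking, and batching, and the toy shapes are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # (seq_q, seq_k) similarity logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # weighted sum of value vectors

# Toy example: 4 tokens, d_model = d_k = d_v = 8, no learned projections
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(X, X, X)             # self-attention: Q = K = V = X
print(out.shape)                                        # (4, 8)
```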
#### Multi-Head Attention
$$
\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, ..., \text{head}_h)W^O
$$
Where each head is computed as:
$$
\text{head}_i = \text{Attention}(QW_i^Q, KW_i^K, VW_i^V)
$$
**Parameters:**
- $h$ — Number of attention heads (typically 8, 12, 32, or more)
- $W^O \in \mathbb{R}^{hd_v \times d_{model}}$ — Output projection matrix
#### Feed-Forward Networks (FFN)
Position-wise feed-forward network applied to each position:
$$
\text{FFN}(x) = \max(0, xW_1 + b_1)W_2 + b_2
$$
Or with GELU activation (more common in modern LLMs):
$$
\text{FFN}(x) = \text{GELU}(xW_1 + b_1)W_2 + b_2
$$
Where GELU is defined as:
$$
\text{GELU}(x) = x \cdot \Phi(x) = x \cdot \frac{1}{2}\left[1 + \text{erf}\left(\frac{x}{\sqrt{2}}\right)\right]
$$
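A short NumPy sketch of the GELU-activated position-wise FFN defined above; the toy dimensions are illustrative (real models typically set $d_{ff} = 4 d_{model}$).

```python
import numpy as np
from scipy.special import erf

def gelu(x):
    """GELU(x) = x * Phi(x), using the exact erf form given above."""
    return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def ffn(x, W1, b1, W2, b2):
    """Position-wise feed-forward network: GELU(x W1 + b1) W2 + b2."""
    return gelu(x @ W1 + b1) @ W2 + b2

# Toy shapes: d_model = 8, d_ff = 32 (illustrative)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W1, b1 = rng.normal(size=(8, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 8)), np.zeros(8)
print(ffn(x, W1, b1, W2, b2).shape)    # (4, 8)
```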
#### Layer Normalization
$$
\text{LayerNorm}(x) = \gamma \cdot \frac{x - \mu}{\sigma + \epsilon} + \beta
$$
**Where:**
- $\mu = \frac{1}{d}\sum_{i=1}^{d} x_i$ — Mean across features
- $\sigma = \sqrt{\frac{1}{d}\sum_{i=1}^{d}(x_i - \mu)^2}$ — Standard deviation
- $\gamma, \beta$ — Learnable scale and shift parameters
- $\epsilon$ — Small constant for numerical stability (typically $10^{-5}$)
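A direct NumPy implementation of the layer normalization formula as written above (dividing by $\sigma + \epsilon$), normalizing over the feature dimension; the input values are illustrative.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """LayerNorm over the last (feature) dimension, matching the formula above."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)     # population standard deviation
    return gamma * (x - mu) / (sigma + eps) + beta

x = np.array([[1.0, 2.0, 3.0, 4.0]])
gamma, beta = np.ones(4), np.zeros(4)
print(layer_norm(x, gamma, beta))             # zero mean, unit scale per row
```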
### 1.2 Statistical Language Modeling
#### Autoregressive Probability Model
LLMs estimate the conditional probability distribution:
$$
P(w_t | w_1, w_2, ..., w_{t-1}; \theta)
$$
The joint probability of a sequence factorizes as:
$$
P(w_1, w_2, ..., w_T) = \prod_{t=1}^{T} P(w_t | w_1, w_2, ..., w_{t-1})
$$
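The factorization above is usually applied in log space. A tiny sketch, with made-up per-step probabilities standing in for a model's softmax outputs, sums the conditional log-probabilities to obtain the sequence log-probability.

```python
import numpy as np

# Chain rule in log space: log P(w_1..w_T) = sum_t log P(w_t | w_<t).
# The per-step probabilities below are invented for illustration; a real LLM
# would produce them from its softmax distribution at each position.
step_probs = [0.20, 0.65, 0.90, 0.05, 0.40]        # P(w_t | w_<t) for t = 1..5
log_prob = float(np.sum(np.log(step_probs)))
print(f"sequence log-probability = {log_prob:.3f}")
```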
Query language for constraining LLM generation.
LM Studio provides GUI for local LLMs. Download, run, chat. User-friendly.
Platform for comparing models via human voting.
Load balancer distributes requests across model servers. Nginx/HAProxy route traffic. Health checks ensure availability.
Ensure experts are used roughly equally to avoid underutilization.
Load balancing distributes work evenly preventing bottlenecks in agent systems.