Open weights but not training code/data. Middle ground. Most open LLMs are this.
Answer using retrieved documents.
Free-form conversations.
Target has unknown classes.
Open-source models have publicly available weights and training details.
Detect any object described in text.
Detect and classify unknown objects.
OpenAI text-embedding-ada-002. Easy to use. Decent quality.
The OpenAI SDK is available in Python and TypeScript. Official client. Streaming, tools.
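A minimal sketch of the v1-style Python client, assuming `OPENAI_API_KEY` is set in the environment; the model name is a placeholder:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stream a chat completion chunk by chunk.
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "One-line summary of operator fusion?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")
```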
OpenAPI (Swagger) specifies REST APIs. Auto-generate docs and clients.
OpenHermes is a community fine-tune by Teknium. Strong instruction following.
Unified observability framework.
OpenVINO optimizes models for Intel hardware using graph optimization and kernel libraries.
Intel's toolkit for optimizing and deploying models.
Operating expense is the money spent converting inventory into throughput; it represents ongoing costs.
Test under normal conditions.
Stress level for reliable operation.
Operation primitives are basic computational blocks, such as convolutions, pooling, and skip connections, used in NAS.
Change execution order for efficiency.
Operational carbon emissions occur during the product-use phase, from energy consumption.
Verify operational performance.
Operator fusion combines multiple operations into a single kernel, reducing memory traffic and latency.
Merge consecutive operations to reduce memory transfers.
Operators are primitives (matmul, conv, attention). Kernels are hardware implementations. Optimize hot operators.
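A conceptual sketch of what fusion saves: the unfused version makes three passes over memory with temporaries, while a fused kernel would traverse the data once. Plain NumPy cannot actually fuse (the honest limitation is noted in the comments); the function names are illustrative:

```python
import numpy as np

x = np.random.rand(1_000_000).astype(np.float32)

def unfused(x):
    a = x * 2.0                  # pass 1: materializes a temporary array
    b = a + 1.0                  # pass 2: another full read/write of memory
    return np.maximum(b, 0.0)    # pass 3

def fused_expression(x):
    # NumPy still allocates temporaries here; true fusion needs a compiler
    # (e.g. XLA or TorchInductor) that emits one kernel for the expression.
    return np.maximum(x * 2.0 + 1.0, 0.0)

assert np.allclose(unfused(x), fused_expression(x))
```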
OPT is Meta's Open Pre-trained Transformer. Various sizes up to 175B.
Compare measured spectra to library.
Optical emission failure analysis detects photon emission from hot carriers or breakdown events, localizing defects.
Reference surface for flatness checks.
Optical flow estimation computes pixel motion between frames for interpolation and stabilization.
Estimate pixel motion between frames.
Estimate motion between frames.
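A minimal dense optical-flow sketch using OpenCV's Farneback method; the frame paths are placeholders and `opencv-python` is assumed installed:

```python
import cv2

# Placeholder frame paths; convert to grayscale for flow estimation.
prev = cv2.cvtColor(cv2.imread("frame0.png"), cv2.COLOR_BGR2GRAY)
curr = cv2.cvtColor(cv2.imread("frame1.png"), cv2.COLOR_BGR2GRAY)

# Returns an HxWx2 array of per-pixel (dx, dy) motion vectors.
flow = cv2.calcOpticalFlowFarneback(
    prev, curr, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0,
)
```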
# Optical Proximity Correction (OPC): Mathematical Modeling ## 1. The Physical Problem When projecting mask patterns onto a silicon wafer using light (typically 193nm DUV or 13.5nm EUV), several phenomena distort the image: - **Diffraction**: Light bending around features near or below the wavelength - **Interference**: Constructive/destructive wave interactions - **Optical aberrations**: Lens imperfections - **Resist effects**: Photochemical behavior during exposure and development - **Etch loading**: Pattern-density-dependent etch rates **OPC pre-distorts the mask** so that after all these effects, the printed pattern matches the design intent. ### Key Parameters | Parameter | Typical Value | Description | |-----------|---------------|-------------| | $\lambda$ | 193 nm (DUV), 13.5 nm (EUV) | Exposure wavelength | | $NA$ | 0.33 - 1.35 | Numerical aperture | | $k_1$ | 0.25 - 0.40 | Process factor | | Resolution | $\frac{k_1 \lambda}{NA}$ | Minimum feature size | ## 2. Hopkins Imaging Model The foundational mathematical framework for **partially coherent lithographic imaging** comes from Hopkins' theory (1953). ### Aerial Image Intensity The aerial image intensity at position $\mathbf{r} = (x, y)$ is given by: $$ I(\mathbf{r}) = \iiint\!\!\!\iint TCC(\mathbf{f}_1, \mathbf{f}_2) \cdot M(\mathbf{f}_1) \cdot M^*(\mathbf{f}_2) \cdot e^{2\pi i (\mathbf{f}_1 - \mathbf{f}_2) \cdot \mathbf{r}} \, d\mathbf{f}_1 \, d\mathbf{f}_2 $$ Where: - $M(\mathbf{f})$ — Fourier transform of the mask transmission function - $M^*(\mathbf{f})$ — Complex conjugate of $M(\mathbf{f})$ - $TCC$ — Transmission Cross Coefficient - $\mathbf{f} = (f_x, f_y)$ — Spatial frequency coordinates ### Transmission Cross Coefficient (TCC) The TCC encodes the optical system characteristics: $$ TCC(\mathbf{f}_1, \mathbf{f}_2) = \iint J(\mathbf{f}) \cdot H(\mathbf{f} + \mathbf{f}_1) \cdot H^*(\mathbf{f} + \mathbf{f}_2) \, d\mathbf{f} $$ Where: - $J(\mathbf{f})$ — Source (illumination) intensity distribution (mutual intensity at mask) - $H(\mathbf{f})$ — Pupil function of the projection lens - $H^*(\mathbf{f})$ — Complex conjugate of pupil function ### Pupil Function For an ideal circular aperture: $$ H(\mathbf{f}) = \begin{cases} 1 & \text{if } |\mathbf{f}| \leq \frac{NA}{\lambda} \\ 0 & \text{otherwise} \end{cases} $$ With aberrations included: $$ H(\mathbf{f}) = P(\mathbf{f}) \cdot e^{i \cdot W(\mathbf{f})} $$ Where $W(\mathbf{f})$ is the wavefront aberration function (Zernike polynomial expansion). ## 3. 
SOCS Decomposition ### Sum of Coherent Systems To make computation tractable, the TCC (a Hermitian matrix when discretized) is decomposed via **eigenvalue decomposition**: $$ TCC(\mathbf{f}_1, \mathbf{f}_2) = \sum_{n=1}^{N} \lambda_n \cdot \phi_n(\mathbf{f}_1) \cdot \phi_n^*(\mathbf{f}_2) $$ Where: - $\lambda_n$ — Eigenvalues (sorted in descending order) - $\phi_n(\mathbf{f})$ — Eigenvectors (orthonormal kernels) ### Image Computation This allows the image to be computed as a **sum of coherent images**: $$ I(\mathbf{r}) = \sum_{n=1}^{N} \lambda_n \left| \mathcal{F}^{-1}\{\phi_n \cdot M\} \right|^2 $$ Or equivalently: $$ I(\mathbf{r}) = \sum_{n=1}^{N} \lambda_n \left| I_n(\mathbf{r}) \right|^2 $$ Where each coherent image is: $$ I_n(\mathbf{r}) = \mathcal{F}^{-1}\{\phi_n(\mathbf{f}) \cdot M(\mathbf{f})\} $$ ### Practical Considerations - **Eigenvalue decay**: $\lambda_n$ decay rapidly; typically only 10–50 terms needed - **Speedup**: Converts one $O(N^4)$ partially coherent calculation into $\sim$20 $O(N^2 \log N)$ FFT operations - **Accuracy**: Trade-off between number of terms and simulation accuracy ## 4. OPC Problem Formulation ### Forward Problem Given mask $M(\mathbf{r})$, predict wafer pattern $W(\mathbf{r})$: $$ M \xrightarrow{\text{optics}} I(\mathbf{r}) \xrightarrow{\text{resist}} R(\mathbf{r}) \xrightarrow{\text{etch}} W(\mathbf{r}) $$ **Mathematical chain:** 1. **Optical Model**: $I = \mathcal{O}(M)$ — Hopkins/SOCS imaging 2. **Resist Model**: $R = \mathcal{R}(I)$ — Threshold or convolution model 3. **Etch Model**: $W = \mathcal{E}(R)$ — Etch bias and loading ### Inverse Problem (OPC) Given target pattern $T(\mathbf{r})$, find mask $M(\mathbf{r})$ such that: $$ W(M) \approx T $$ **This is fundamentally ill-posed:** - Non-unique: Many masks could produce similar results - Nonlinear: The imaging equation is quadratic in mask transmission - Constrained: Mask must be manufacturable ## 5. Edge Placement Error Minimization ### Objective Function The standard OPC objective minimizes **Edge Placement Error (EPE)**: $$ \min_M \mathcal{L}(M) = \sum_{i=1}^{N_{\text{edges}}} w_i \cdot \text{EPE}_i^2 $$ Where: $$ \text{EPE}_i = x_i^{\text{printed}} - x_i^{\text{target}} $$ - $x_i^{\text{printed}}$ — Actual edge position after lithography - $x_i^{\text{target}}$ — Desired edge position from design - $w_i$ — Weight for edge $i$ (can prioritize critical features) ### Constraints Subject to mask manufacturability: - **Minimum feature size**: $\text{CD}_{\text{mask}} \geq \text{CD}_{\min}$ - **Minimum spacing**: $\text{Space}_{\text{mask}} \geq \text{Space}_{\min}$ - **Maximum jog**: Limit on edge fragmentation complexity - **MEEF constraint**: Mask Error Enhancement Factor within spec ### Iterative Edge-Based OPC Algorithm The classic algorithm moves mask edges iteratively: $$ \Delta x^{(n+1)} = \Delta x^{(n)} - \alpha \cdot \text{EPE}^{(n)} $$ Where: - $\Delta x$ — Edge movement from original position - $\alpha$ — Damping factor (typically 0.3–0.8) - $n$ — Iteration number **Convergence criterion:** $$ \max_i |\text{EPE}_i| < \epsilon \quad \text{or} \quad n > n_{\max} $$ ### Gradient Computation Using the chain rule: $$ \frac{\partial \text{EPE}}{\partial m} = \frac{\partial \text{EPE}}{\partial I} \cdot \frac{\partial I}{\partial m} $$ Where $m$ represents mask parameters (edge positions, segment lengths). 
At a contour position where $I = I_{th}$: $$ \frac{\partial x_{\text{edge}}}{\partial m} = -\frac{1}{|\nabla I|} \cdot \frac{\partial I}{\partial m} $$ The **image log-slope (ILS)** is a key metric: $$ \text{ILS} = \frac{1}{I} \left| \frac{\partial I}{\partial x} \right|_{I = I_{th}} $$ Higher ILS → better process latitude, lower EPE sensitivity. ## 6. Resist Modeling ### Threshold Model (Simplest) The resist develops where intensity exceeds threshold: $$ R(\mathbf{r}) = \begin{cases} 1 & \text{if } I(\mathbf{r}) > I_{th} \\ 0 & \text{otherwise} \end{cases} $$ The printed contour is the $I_{th}$ isoline. ### Variable Threshold Resist (VTR) The threshold varies with local context: $$ I_{th}(\mathbf{r}) = I_{th,0} + \beta_1 \cdot \bar{I}_{\text{local}} + \beta_2 \cdot \nabla^2 I + \beta_3 \cdot (\nabla I)^2 + \ldots $$ Where: - $I_{th,0}$ — Base threshold - $\bar{I}_{\text{local}}$ — Local average intensity (density effect) - $\nabla^2 I$ — Laplacian (curvature effect) - $\beta_i$ — Fitted coefficients ### Compact Phenomenological Models For OPC speed, empirical models are used instead of physics-based resist simulation: $$ R(\mathbf{r}) = \sum_{j=1}^{N_k} w_j \cdot \left( K_j \otimes g_j(I) \right) $$ Where: - $K_j$ — Convolution kernels (typically Gaussians): $$K_j(\mathbf{r}) = \frac{1}{2\pi\sigma_j^2} \exp\left( -\frac{|\mathbf{r}|^2}{2\sigma_j^2} \right)$$ - $g_j(I)$ — Nonlinear functions: $I$, $I^2$, $\log(I)$, $\sqrt{I}$, etc. - $w_j$ — Fitted weights - $\otimes$ — Convolution operator ### Physical Interpretation | Kernel Width | Physical Effect | |--------------|-----------------| | Small $\sigma$ | Optical proximity effects | | Medium $\sigma$ | Acid/base diffusion in resist | | Large $\sigma$ | Long-range loading effects | ### Model Calibration Parameters are fitted to wafer measurements: $$ \min_{\theta} \sum_{k=1}^{N_{\text{test}}} \left( \text{CD}_k^{\text{measured}} - \text{CD}_k^{\text{model}}(\theta) \right)^2 + \lambda \|\theta\|^2 $$ Where: - $\theta = \{w_j, \sigma_j, \beta_i, \ldots\}$ — Model parameters - $\lambda \|\theta\|^2$ — Regularization term - Test structures: Lines, spaces, contacts, line-ends at various pitches/densities ## 7. Inverse Lithography Technology ### Full Optimization Formulation ILT treats the mask as a continuous optimization variable (pixelated): $$ \min_{M} \mathcal{L}(M) = \| W(M) - T \|^2 + \lambda \cdot \mathcal{R}(M) $$ Where: - $W(M)$ — Predicted wafer pattern - $T$ — Target pattern - $\mathcal{R}(M)$ — Regularization for manufacturability - $\lambda$ — Regularization weight ### Cost Function Components **Pattern Fidelity Term:** $$ \mathcal{L}_{\text{fidelity}} = \int \left( W(\mathbf{r}) - T(\mathbf{r}) \right)^2 d\mathbf{r} $$ Or in discrete form: $$ \mathcal{L}_{\text{fidelity}} = \sum_{\mathbf{r} \in \text{grid}} \left( W(\mathbf{r}) - T(\mathbf{r}) \right)^2 $$ ### Regularization Terms **Total Variation** (promotes piecewise constant, sharp edges): $$ \mathcal{R}_{TV}(M) = \int |\nabla M| \, d\mathbf{r} = \int \sqrt{\left(\frac{\partial M}{\partial x}\right)^2 + \left(\frac{\partial M}{\partial y}\right)^2} \, d\mathbf{r} $$ **Curvature Penalty** (promotes smooth contours): $$ \mathcal{R}_{\kappa}(M) = \oint_{\partial M} \kappa^2 \, ds $$ Where $\kappa$ is the local curvature of the mask boundary. 
**Minimum Feature Size** (MRC - Mask Rule Check): $$ \mathcal{R}_{MRC}(M) = \sum_{\text{violations}} \text{penalty}(\text{violation severity}) $$ **Sigmoid Regularization** (push mask toward binary): $$ \mathcal{R}_{\text{binary}}(M) = \int M(1-M) \, d\mathbf{r} $$ ### Level Set Formulation Represent the mask boundary implicitly via level set function $\phi(\mathbf{r})$: - Inside chrome: $\phi(\mathbf{r}) < 0$ - Outside chrome: $\phi(\mathbf{r}) > 0$ - Boundary: $\phi(\mathbf{r}) = 0$ **Evolution equation:** $$ \frac{\partial \phi}{\partial t} = -v \cdot |\nabla \phi| $$ Where velocity $v$ is derived from the cost function gradient: $$ v = -\frac{\delta \mathcal{L}}{\delta \phi} $$ **Advantages:** - Naturally handles topological changes (features splitting/merging) - Implicit curvature regularization available - Well-studied numerical methods ### Optimization Algorithms Since the problem is **non-convex**, various methods are used: 1. **Gradient Descent with Momentum:** $$ M^{(n+1)} = M^{(n)} - \eta \nabla_M \mathcal{L} + \mu \left( M^{(n)} - M^{(n-1)} \right) $$ 2. **Conjugate Gradient:** $$ d^{(n+1)} = -\nabla \mathcal{L}^{(n+1)} + \beta^{(n)} d^{(n)} $$ 3. **Adam Optimizer:** $$ m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t $$ $$ v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2 $$ $$ M_{t+1} = M_t - \eta \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} $$ 4. **Genetic Algorithms** (for discrete/combinatorial aspects) 5. **Simulated Annealing** (for escaping local minima) ## 8. Source-Mask Optimization ### Joint Optimization SMO optimizes both illumination source $S$ and mask $M$ simultaneously: $$ \min_{S, M} \sum_{j \in \text{PW}} w_j \cdot \| W(S, M, \text{condition}_j) - T \|^2 $$ ### Source Parameterization **Pixelated Source:** $$ S = \{s_{ij}\} \quad \text{where } s_{ij} \in [0, 1] $$ Each pixel in the pupil plane is a free variable. **Parametric Source:** - Annular: $(R_{\text{inner}}, R_{\text{outer}})$ - Quadrupole: $(R, \theta, \sigma)$ - Freeform: Spline or Zernike coefficients ### Alternating Optimization **Algorithm:** ``` Initialize: S⁰, M⁰ for k = 1 to max_iter: # Step 1: Fix S, optimize M (standard OPC) M^k = argmin_M L(S^(k-1), M) # Step 2: Fix M, optimize S S^k = argmin_S L(S, M^k) # Check convergence if |L^k - L^(k-1)| < tolerance: break ``` **Note:** Step 2 is often convex in $S$ when $M$ is fixed (linear in source pixels for intensity-based metrics). ### Mathematical Form for Source Optimization When mask is fixed, the image is linear in source: $$ I(\mathbf{r}; S) = \sum_{ij} s_{ij} \cdot I_{ij}(\mathbf{r}) $$ Where $I_{ij}$ is the image contribution from source pixel $(i,j)$. This makes source optimization a **quadratic program** (convex if cost is convex in $I$). ## 9. Process Window Optimization ### Multi-Condition Optimization Real manufacturing has variations. Robust OPC optimizes across a **process window (PW)**: $$ \min_M \sum_{j \in \text{PW}} w_j \cdot \mathcal{L}(M, \text{condition}_j) $$ ### Process Window Dimensions | Dimension | Typical Range | Effect | |-----------|---------------|--------| | Focus | $\pm 50$ nm | Defocus blur | | Dose | $\pm 3\%$ | Threshold shift | | Mask CD | $\pm 2$ nm | Feature size bias | | Aberrations | Per-lens | Pattern distortion | ### Worst-Case (Minimax) Formulation $$ \min_M \max_{j \in \text{PW}} \text{EPE}_j(M) $$ This is more conservative but ensures robustness. 
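A schematic sketch of this minimax evaluation: score a mask by its worst-case EPE over the process-window corners. Here `simulate_epe` is a hypothetical stand-in for a calibrated lithography model:

```python
import numpy as np

def simulate_epe(mask, focus, dose):
    # Hypothetical stand-in for a calibrated litho model: per-edge EPEs in nm.
    rng = np.random.default_rng(abs(hash((focus, dose))) % 2**32)
    return rng.normal(0.0, 1.0 + abs(focus) / 100.0, size=64)

def worst_case_epe(mask, pw_conditions):
    # pw_conditions: (focus_nm, relative_dose) corners of the process window.
    return max(
        float(np.max(np.abs(simulate_epe(mask, f, d))))
        for f, d in pw_conditions
    )

corners = [(-50, 0.97), (-50, 1.03), (0, 1.0), (50, 0.97), (50, 1.03)]
print(worst_case_epe(mask=None, pw_conditions=corners))
```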
### Soft Constraints via Barrier Functions $$ \mathcal{L}_{PW}(M) = \sum_j w_j \cdot \text{EPE}_j^2 + \mu \sum_j \sum_i \max(0, |\text{EPE}_{ij}| - \text{spec})^2 $$ ### Process Window Metrics **Common Process Window (CPW):** $$ \text{CPW} = \text{Focus Range} \times \text{Dose Range} $$ Where all specs are simultaneously met. **Exposure Latitude (EL):** $$ \text{EL} = \frac{\Delta \text{Dose}}{\text{Dose}_{\text{nom}}} \times 100\% $$ **Depth of Focus (DOF):** $$ \text{DOF} = \text{Focus range where } |\text{EPE}| < \text{spec} $$ ## 10. Stochastic Effects (EUV) At EUV wavelengths (13.5 nm), **photon counts are low** and shot noise becomes significant. ### Photon Statistics Number of photons per pixel follows **Poisson distribution**: $$ P(n | \bar{n}) = \frac{\bar{n}^n e^{-\bar{n}}}{n!} $$ Where: $$ \bar{n} = \frac{E \cdot A \cdot \eta}{\frac{hc}{\lambda}} $$ - $E$ — Exposure dose (mJ/cm²) - $A$ — Pixel area - $\eta$ — Quantum efficiency - $\frac{hc}{\lambda}$ — Photon energy ### Signal-to-Noise Ratio $$ \text{SNR} = \frac{\bar{n}}{\sqrt{\bar{n}}} = \sqrt{\bar{n}} $$ For reliable imaging, need $\text{SNR} > 5$, requiring $\bar{n} > 25$ photons/pixel. ### Line Edge Roughness (LER) Random edge fluctuations characterized by: - **3σ LER**: $3 \times \text{standard deviation of edge position}$ - **Correlation length** $\xi$: Spatial extent of roughness **Power Spectral Density:** $$ \text{PSD}(f) = \frac{2\sigma^2 \xi}{1 + (2\pi f \xi)^{2\alpha}} $$ Where $\alpha$ is the roughness exponent (typically 0.5–1.0). ### Stochastic Defect Probability Probability of a stochastic failure (missing contact, bridging): $$ P_{\text{fail}} = 1 - \prod_{\text{features}} (1 - p_i) $$ For rare events, approximately: $$ P_{\text{fail}} \approx \sum_i p_i $$ ### Stochastic-Aware OPC Objective $$ \min_M \mathbb{E}[\text{EPE}^2] + \lambda_1 \cdot \text{Var}(\text{EPE}) + \lambda_2 \cdot P_{\text{fail}} $$ ### Monte Carlo Simulation For stochastic modeling: 1. Sample photon arrival: $n_{ij} \sim \text{Poisson}(\bar{n}_{ij})$ 2. Simulate acid generation: Proportional to absorbed photons 3. Simulate diffusion: Random walk or stochastic PDE 4. Simulate development: Threshold with noise 5. Repeat $N$ times, compute statistics ## 11. Machine Learning Approaches ### Neural Network Forward Models Train networks to approximate expensive simulations: $$ \hat{I} = f_\theta(M) \approx I_{\text{optical}}(M) $$ **Architectures:** - **CNN**: Convolutional neural networks for local pattern effects - **U-Net**: Encoder-decoder for image-to-image translation - **GAN**: Generative adversarial networks for realistic image generation **Training:** $$ \min_\theta \sum_{k} \| f_\theta(M_k) - I_k^{\text{simulation}} \|^2 $$ ### End-to-End ILT with Deep Learning Directly predict corrected masks: $$ \hat{M}_{\text{OPC}} = G_\theta(T) $$ **Training data:** Pairs $(T, M_{\text{optimal}})$ from conventional ILT. **Loss function:** $$ \mathcal{L} = \| W(G_\theta(T)) - T \|^2 + \lambda \| G_\theta(T) - M_{\text{ref}} \|^2 $$ ### Hybrid Approaches Combine ML speed with physics accuracy: 1. **ML Initialization**: $M^{(0)} = G_\theta(T)$ 2. 
**Physics Refinement**: Run conventional OPC starting from $M^{(0)}$ **Benefits:** - Faster convergence (good starting point) - Physics ensures accuracy - ML handles global pattern context ### Neural Network Architectures for OPC | Architecture | Use Case | Advantages | |--------------|----------|------------| | CNN | Local correction prediction | Fast inference | | U-Net | Full mask prediction | Multi-scale features | | GAN | Realistic mask generation | Sharp boundaries | | Transformer | Global context | Long-range dependencies | | Physics-Informed NN | Constrained prediction | Respects physics | ## 12. Computational Complexity ### Scale of Full-Chip OPC - **Features per chip**: $10^9 - 10^{10}$ - **Evaluation points**: $\sim 10^{12}$ (multiple points per feature) - **Iterations**: 10–50 per feature - **Optical simulations**: $O(N \log N)$ per FFT ### Complexity Analysis **Single feature OPC:** $$ T_{\text{feature}} = O(N_{\text{iter}} \times N_{\text{SOCS}} \times N_{\text{grid}} \log N_{\text{grid}}) $$ **Full chip:** $$ T_{\text{chip}} = O(N_{\text{features}} \times T_{\text{feature}}) $$ **Result:** Hours to days on large compute clusters. ### Acceleration Strategies **Hierarchical Processing:** - Identify repeated cells (memory arrays, standard cells) - Compute OPC once, reuse for identical instances - Speedup: $10\times - 100\times$ for regular designs **GPU Parallelization:** - FFTs parallelize well on GPUs - Convolutions map to tensor operations - Multiple features processed simultaneously - Speedup: $10\times - 50\times$ **Approximate Models:** - **Kernel-based**: Pre-compute influence functions - **Variable resolution**: Fine grid only near edges - **Neural surrogates**: Replace simulation with inference **Domain Decomposition:** - Divide chip into tiles - Process tiles in parallel - Handle tile boundaries with overlap or iteration ## 13. Mathematical Toolkit Summary | Domain | Techniques | |--------|-----------| | **Optics** | Fourier transforms, Hopkins theory, SOCS decomposition, Abbe imaging | | **Optimization** | Gradient descent, conjugate gradient, level sets, genetic algorithms, simulated annealing | | **Linear Algebra** | Eigendecomposition (TCC), sparse matrices, SVD, matrix factorization | | **PDEs** | Diffusion equations (resist), level set evolution, Hamilton-Jacobi | | **Statistics** | Poisson processes, Monte Carlo, stochastic simulation, Bayesian inference | | **Machine Learning** | CNNs, GANs, U-Net, transformers, physics-informed neural networks | | **Computational Geometry** | Polygon operations, fragmentation, contour extraction, Boolean operations | | **Numerical Methods** | FFT, finite differences, quadrature, interpolation | ## Equations Quick Reference ### Hopkins Imaging $$ I(\mathbf{r}) = \iiint\!\!\!\iint TCC(\mathbf{f}_1, \mathbf{f}_2) \cdot M(\mathbf{f}_1) \cdot M^*(\mathbf{f}_2) \cdot e^{2\pi i (\mathbf{f}_1 - \mathbf{f}_2) \cdot \mathbf{r}} \, d\mathbf{f}_1 \, d\mathbf{f}_2 $$ ### SOCS Image $$ I(\mathbf{r}) = \sum_{n=1}^{N} \lambda_n \left| \mathcal{F}^{-1}\{\phi_n \cdot M\} \right|^2 $$ ### EPE Minimization $$ \min_M \sum_{i} w_i \left( x_i^{\text{printed}} - x_i^{\text{target}} \right)^2 $$ ### ILT Cost Function $$ \min_{M} \| W(M) - T \|^2 + \lambda \cdot \mathcal{R}(M) $$ ### Level Set Evolution $$ \frac{\partial \phi}{\partial t} = -v \cdot |\nabla \phi| $$ ### Poisson Photon Statistics $$ P(n | \bar{n}) = \frac{\bar{n}^n e^{-\bar{n}}}{n!} $$
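As a concrete illustration of the SOCS image equation above, a minimal NumPy sketch; the demo kernel is a synthetic low-pass filter, whereas real kernels come from the TCC eigendecomposition:

```python
import numpy as np

def socs_image(mask, kernels, eigenvalues):
    # mask: 2D real-space transmission; kernels: frequency-domain phi_n arrays
    # on the same FFT grid; eigenvalues: lambda_n sorted descending.
    M = np.fft.fft2(mask)                      # mask spectrum M(f)
    image = np.zeros(mask.shape, dtype=float)
    for lam, phi in zip(eigenvalues, kernels):
        field = np.fft.ifft2(phi * M)          # one coherent image
        image += lam * np.abs(field) ** 2      # weighted incoherent sum
    return image

# Toy demo: a single low-pass kernel acts like a coherent imaging cutoff.
mask = np.zeros((64, 64)); mask[28:36, 16:48] = 1.0
fx = np.fft.fftfreq(64); FX, FY = np.meshgrid(fx, fx)
lowpass = (np.hypot(FX, FY) < 0.2).astype(float)
aerial = socs_image(mask, [lowpass], [1.0])
```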
Feature size depends on neighboring patterns.
# Optics and Lithography Mathematical Modeling A comprehensive guide to the mathematical foundations of semiconductor lithography, covering electromagnetic theory, Fourier optics, optimization mathematics, and stochastic processes. 1. Fundamental Imaging Theory 1.1 The Resolution Limits The Rayleigh equations define the physical limits of optical lithography: Resolution: $$ R = k_1 \cdot \frac{\lambda}{NA} $$ Depth of Focus: $$ DOF = k_2 \cdot \frac{\lambda}{NA^2} $$ Parameter Definitions: - $\lambda$ — Wavelength of light (193nm for ArF immersion, 13.5nm for EUV) - $NA = n \cdot \sin(\theta)$ — Numerical aperture - $n$ — Refractive index of immersion medium - $\theta$ — Half-angle of the lens collection cone - $k_1, k_2$ — Process-dependent factors (typically $k_1 \geq 0.25$ from Rayleigh criterion; modern processes achieve $k_1 \sim 0.3–0.4$) Fundamental Tension: - Improving resolution requires: - Increasing $NA$, OR - Decreasing $\lambda$ - Both degrade depth of focus quadratically ($\propto NA^{-2}$) 2. Fourier Optics Framework The projection lithography system is modeled as a linear shift-invariant system in the Fourier domain. 2.1 Coherent Imaging For a perfectly coherent source, the image field is given by convolution: $$ E_{image}(x,y) = E_{object}(x,y) \otimes h(x,y) $$ In frequency space (via Fourier transform): $$ \tilde{E}_{image}(f_x, f_y) = \tilde{E}_{object}(f_x, f_y) \cdot H(f_x, f_y) $$ Key Components: - $h(x,y)$ — Amplitude Point Spread Function (PSF) - $H(f_x, f_y)$ — Coherent Transfer Function (pupil function) - Typically a `circ` function for circular aperture - Cuts off spatial frequencies beyond $\frac{NA}{\lambda}$ 2.2 Partially Coherent Imaging — The Hopkins Formulation Real lithography systems operate in the partially coherent regime : $$ \sigma = 0.3 - 0.9 $$ where $\sigma$ is the ratio of condenser NA to objective NA. Transmission Cross Coefficient (TCC) Integral The aerial image intensity is: $$ I(x,y) = \int\!\!\!\int\!\!\!\int\!\!\!\int TCC(f_1,g_1,f_2,g_2) \cdot M(f_1,g_1) \cdot M^*(f_2,g_2) \cdot e^{2\pi i[(f_1-f_2)x + (g_1-g_2)y]} \, df_1 \, dg_1 \, df_2 \, dg_2 $$ The TCC itself is defined as: $$ TCC(f_1,g_1,f_2,g_2) = \int\!\!\!\int J(f,g) \cdot P(f+f_1, g+g_1) \cdot P^*(f+f_2, g+g_2) \, df \, dg $$ Parameter Definitions: - $J(f,g)$ — Source intensity distribution (conventional, annular, dipole, quadrupole, or freeform) - $P$ — Pupil function (including aberrations) - $M$ — Mask transmission/diffraction spectrum - $M^*$ — Complex conjugate of mask spectrum Computational Note: This is a 4D integral over frequency space for every image point — computationally expensive but essential for accuracy. 3. Computational Acceleration: SOCS Decomposition Direct TCC computation is prohibitive. The Sum of Coherent Systems (SOCS) method uses eigendecomposition: $$ TCC(f_1,g_1,f_2,g_2) \approx \sum_{i=1}^{N} \lambda_i \cdot \phi_i(f_1,g_1) \cdot \phi_i^*(f_2,g_2) $$ Decomposition Components: - $\lambda_i$ — Eigenvalues (sorted by magnitude) - $\phi_i$ — Eigenfunctions (kernels) The image becomes a sum of coherent images: $$ I(x,y) \approx \sum_{i=1}^{N} \lambda_i \cdot \left| m(x,y) \otimes \phi_i(x,y) \right|^2 $$ Computational Properties: - Typically $N = 10–50$ kernels capture $>99\%$ of imaging behavior - Each convolution computed via FFT - Complexity: $O(N \log N)$ per kernel 4. Vector Electromagnetic Effects at High NA When $NA > 0.7$ (immersion lithography reaches $NA \sim 1.35$), scalar diffraction theory fails. The vector nature of light must be modeled. 
4.1 Richards-Wolf Vector Diffraction The electric field near focus: $$ \mathbf{E}(r,\psi,z) = -\frac{ikf}{2\pi} \int_0^{\theta_{max}} \int_0^{2\pi} \mathbf{A}(\theta,\phi) \cdot P(\theta,\phi) \cdot e^{ik[z\cos\theta + r\sin\theta\cos(\phi-\psi)]} \sin\theta \, d\theta \, d\phi $$ Variables: - $\mathbf{A}(\theta,\phi)$ — Polarization-dependent amplitude vector - $P(\theta,\phi)$ — Pupil function - $k = \frac{2\pi}{\lambda}$ — Wave number - $(r, \psi, z)$ — Cylindrical coordinates at image plane 4.2 Polarization Effects For high-NA imaging, polarization significantly affects image contrast: | Polarization | Description | Behavior | |:-------------|:------------|:---------| | TE (s-polarization) | Electric field ⊥ to plane of incidence | Interferes constructively | | TM (p-polarization) | Electric field ∥ to plane of incidence | Suffers contrast loss at high angles | Consequences: - Horizontal vs. vertical features print differently - Requires illumination polarization control: - Tangential polarization - Radial polarization - Optimized/freeform polarization 5. Aberration Modeling: Zernike Polynomials Wavefront aberrations are expanded in Zernike polynomials over the unit pupil: $$ W(\rho,\theta) = \sum_{n,m} Z_n^m \cdot R_n^{|m|}(\rho) \cdot \begin{cases} \cos(m\theta) & m \geq 0 \\ \sin(|m|\theta) & m < 0 \end{cases} $$ 5.1 Key Aberrations Affecting Lithography | Zernike Term | Aberration | Effect on Imaging | |:-------------|:-----------|:------------------| | $Z_4$ | Defocus | Pattern-dependent CD shift | | $Z_5, Z_6$ | Astigmatism | H/V feature difference | | $Z_7, Z_8$ | Coma | Pattern shift, asymmetric printing | | $Z_9$ | Spherical | Through-pitch CD variation | | $Z_{10}, Z_{11}$ | Trefoil | Three-fold symmetric distortion | 5.2 Aberrated Pupil Function The pupil function with aberrations: $$ P(\rho,\theta) = P_0(\rho,\theta) \cdot \exp\left[\frac{2\pi i}{\lambda} W(\rho,\theta)\right] $$ Engineering Specifications: - Modern scanners control Zernikes through adjustable lens elements - Typical specification: $< 0.5\text{nm}$ RMS wavefront error 6. Rigorous Mask Modeling 6.1 Thin Mask (Kirchhoff) Approximation Assumes the mask is infinitely thin: $$ M(x,y) = t(x,y) \cdot e^{i\phi(x,y)} $$ Limitations: - Fails for advanced nodes - Mask topography (absorber thickness $\sim 50–70\text{nm}$) affects diffraction 6.2 Rigorous Electromagnetic Field (EMF) Methods 6.2.1 Rigorous Coupled-Wave Analysis (RCWA) The mask is treated as a periodic grating . Fields are expanded in Fourier series: $$ E(x,z) = \sum_n E_n(z) \cdot e^{i(k_{x0} + nK)x} $$ Parameters: - $K = \frac{2\pi}{\text{pitch}}$ — Grating vector - $k_{x0}$ — Incident wave x-component Substituting into Maxwell's equations yields coupled ODEs solved as an eigenvalue problem in each z-layer. 6.2.2 FDTD (Finite-Difference Time-Domain) Directly discretizes Maxwell's curl equations on a Yee grid : $$ \frac{\partial \mathbf{E}}{\partial t} = \frac{1}{\epsilon} \nabla \times \mathbf{H} $$ $$ \frac{\partial \mathbf{H}}{\partial t} = -\frac{1}{\mu} \nabla \times \mathbf{E} $$ Characteristics: - Explicit time-stepping - Computationally intensive - Handles arbitrary geometries 7. 
Photoresist Modeling 7.1 Exposure: Dill ABC Model The photoactive compound (PAC) concentration $M$ evolves as: $$ \frac{\partial M}{\partial t} = -I(z,t) \cdot [A \cdot M + B] \cdot M $$ Parameters: - $A$ — Bleachable absorption coefficient - $B$ — Non-bleachable absorption coefficient - $I(z,t)$ — Intensity in the resist Light intensity in the resist follows Beer-Lambert: $$ \frac{\partial I}{\partial z} = -\alpha(M) \cdot I $$ where $\alpha = A \cdot M + B$. 7.2 Post-Exposure Bake: Reaction-Diffusion For chemically amplified resists (CAR) : $$ \frac{\partial m}{\partial t} = D\nabla^2 m - k_{amp} \cdot m \cdot [H^+] $$ Variables: - $m$ — Blocking group concentration - $D$ — Diffusivity (temperature-dependent, Arrhenius behavior) - $[H^+]$ — Acid concentration Acid diffusion and quenching: $$ \frac{\partial [H^+]}{\partial t} = D_H \nabla^2 [H^+] - k_q [H^+][Q] $$ where $Q$ is quencher concentration. 7.3 Development: Mack Model Development rate as a function of inhibitor concentration $m$: $$ R(m) = R_{max} \cdot \frac{(a+1)(1-m)^n}{a + (1-m)^n} + R_{min} $$ Parameters: - $a, n$ — Kinetic parameters - $R_{max}$ — Maximum development rate - $R_{min}$ — Minimum development rate (unexposed) This creates the nonlinear resist response that sharpens edges. 8. Optical Proximity Correction (OPC) 8.1 The Inverse Problem Given target pattern $T$, find mask $M$ such that: $$ \text{Image}(M) \approx T $$ 8.2 Model-Based OPC Iterative edge-based correction. Cost function: $$ \mathcal{L} = \sum_i w_i \cdot (EPE_i)^2 + \lambda \cdot R(M) $$ Components: - $EPE_i$ — Edge Placement Error (distance from target at evaluation point $i$) - $w_i$ — Weight for each evaluation point - $R(M)$ — Regularization term for mask manufacturability Gradient descent update: $$ M^{(k+1)} = M^{(k)} - \eta \frac{\partial \mathcal{L}}{\partial M} $$ Gradient Computation Methods: - Adjoint methods (efficient for many output points) - Direct differentiation of SOCS kernels 8.3 Inverse Lithography Technology (ILT) Full pixel-based mask optimization: $$ \min_M \left\| I(M) - I_{target} \right\|^2 + \lambda_1 \|M\|_{TV} + \lambda_2 \|\nabla^2 M\|^2 $$ Regularization Terms: - $\|M\|_{TV}$ — Total Variation promotes sharp mask edges - $\|\nabla^2 M\|^2$ — Laplacian term controls curvature Result: ILT produces curvilinear masks with superior imaging, enabled by multi-beam mask writers. 9. Source-Mask Optimization (SMO) Joint optimization of illumination source $J$ and mask $M$: $$ \min_{J,M} \mathcal{L}(J,M) = \left\| I(J,M) - I_{target} \right\|^2 + \text{process window terms} $$ 9.1 Constraints Source Constraints: - Pixelized representation - Non-negative intensity: $J \geq 0$ - Power constraint: $\int J \, dA = P_0$ Mask Constraints: - Minimum feature size - Maximum curvature - Manufacturability rules 9.2 Mathematical Properties The problem is bilinear in $J$ and $M$ (linear in each separately), enabling: - Alternating optimization - Joint gradient methods 9.3 Process Window Co-optimization Adds robustness across focus and dose variations: $$ \mathcal{L}_{PW} = \sum_{focus, dose} w_{f,d} \cdot \left\| I_{f,d}(J,M) - I_{target} \right\|^2 $$ 10. EUV-Specific Mathematics 10.1 Multilayer Reflector Mo/Si multilayer with 40–50 bilayer pairs . 
Peak reflectivity from Bragg condition: $$ 2d \cdot \cos\theta = n\lambda $$ Parameters: - $d \approx 6.9\text{nm}$ — Bilayer period for $\lambda = 13.5\text{nm}$ - Near-normal incidence ($\theta \approx 0°$) Transfer Matrix Method Reflectivity calculation: $$ \begin{pmatrix} E_{out}^+ \\ E_{out}^- \end{pmatrix} = \prod_{j=1}^{N} M_j \begin{pmatrix} E_{in}^+ \\ E_{in}^- \end{pmatrix} $$ where $M_j$ is the transfer matrix for layer $j$. 10.2 Mask 3D Effects EUV masks are reflective with absorber patterns. At 6° chief ray angle: - Shadowing: Different illumination angles see different absorber profiles - Best focus shift: Pattern-dependent focus offsets Requires full 3D EMF simulation (RCWA or FDTD) for accurate modeling. 10.3 Stochastic Effects At EUV, photon counts are low enough that shot noise matters: $$ \sigma_{photon} = \sqrt{N_{photon}} $$ Line Edge Roughness (LER) Contributions - Photon shot noise - Acid shot noise - Resist molecular granularity Power Spectral Density Model $$ PSD(f) = \frac{A}{1 + (2\pi f \xi)^{2+2H}} $$ Parameters: - $\xi$ — Correlation length - $H$ — Hurst exponent (typically $0.5–0.8$) - $A$ — Amplitude Stochastic Simulation via Monte Carlo - Poisson-distributed photon absorption - Random acid generation and diffusion - Development with local rate variations 11. Process Window Analysis 11.1 Bossung Curves CD vs. focus at multiple dose levels: $$ CD(E, F) = CD_0 + a_1 E + a_2 F + a_3 E^2 + a_4 F^2 + a_5 EF + \cdots $$ Polynomial expansion fitted to simulation/measurement. 11.2 Normalized Image Log-Slope (NILS) $$ NILS = w \cdot \left. \frac{d \ln I}{dx} \right|_{edge} $$ Parameters: - $w$ — Feature width - Evaluated at the edge position Design Rule: $NILS > 2$ generally required for acceptable process latitude. Relationship to Exposure Latitude: $$ EL \propto NILS $$ 11.3 Depth of Focus (DOF) and Exposure Latitude (EL) Trade-off Visualized as overlapping process windows across pattern types — the common process window must satisfy all critical features. 12. Multi-Patterning Mathematics 12.1 SADP (Self-Aligned Double Patterning) $$ \text{Spacer pitch} = \frac{\text{Mandrel pitch}}{2} $$ Design Rule Constraints: - Mandrel CD and pitch - Spacer thickness uniformity - Cut pattern overlay 12.2 LELE (Litho-Etch-Litho-Etch) Decomposition Graph coloring problem: Assign features to masks such that: - Features on same mask satisfy minimum spacing - Total mask count minimized (typically 2) Computational Properties: - For 1D patterns: Equivalent to 2-colorable graph (bipartite) - For 2D: NP-complete in general Solution Methods: - Integer Linear Programming (ILP) - SAT solvers - Heuristic algorithms Conflict Graph Edge Weight: $$ w_{ij} = \begin{cases} \infty & \text{if } d_{ij} < d_{min,same} \\ 0 & \text{otherwise} \end{cases} $$ 13. Machine Learning Integration 13.1 Surrogate Models Neural networks approximate aerial image or resist profile: $$ I_{NN}(x; M) \approx I_{physics}(x; M) $$ Benefits: - Training on physics simulation data - Inference 100–1000× faster 13.2 OPC with ML - CNNs: Predict edge corrections - GANs: Generate mask patterns - Reinforcement Learning: Iterative OPC optimization 13.3 Hotspot Detection Classification of lithographic failure sites: $$ P(\text{hotspot} \mid \text{pattern}) = \sigma(W \cdot \phi(\text{pattern}) + b) $$ where $\sigma$ is the sigmoid function and $\phi$ extracts pattern features. 14. 
Mathematical Optimization Framework 14.1 Constrained Optimization Formulation $$ \min f(x) \quad \text{subject to} \quad g(x) \leq 0, \quad h(x) = 0 $$ Solution Methods: - Sequential Quadratic Programming (SQP) - Interior Point Methods - Augmented Lagrangian 14.2 Regularization Techniques | Regularization | Formula | Effect | |:---------------|:--------|:-------| | L1 (Sparsity) | $\|\nabla M\|_1$ | Promotes sparse gradients | | L2 (Smoothness) | $\|\nabla M\|_2^2$ | Promotes smooth transitions | | Total Variation | $\int |\nabla M| \, dx$ | Preserves edges while smoothing | 15. Mathematical Stack: | Layer | Mathematics | |:------|:------------| | Electromagnetic Propagation | Maxwell's equations, RCWA, FDTD | | Image Formation | Fourier optics, TCC, Hopkins, vector diffraction | | Aberrations | Zernike polynomials, wavefront phase | | Photoresist | Coupled PDEs (reaction-diffusion) | | Correction (OPC/ILT) | Inverse problems, constrained optimization | | SMO | Bilinear optimization, gradient methods | | Stochastics (EUV) | Poisson processes, Monte Carlo | | Multi-Patterning | Graph theory, combinatorial optimization | | Machine Learning | Neural networks, surrogate models | Formulas: Core Equations Resolution: R = k₁ × λ / NA Depth of Focus: DOF = k₂ × λ / NA² Numerical Aperture: NA = n × sin(θ) NILS: NILS = w × (d ln I / dx)|edge Bragg Condition: 2d × cos(θ) = nλ Shot Noise: σ = √N
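As a worked illustration of the Mack development-rate model from section 7.3, a small sketch with illustrative parameter values (not calibrated to any real resist):

```python
import numpy as np

def mack_rate(m, r_max=100.0, r_min=0.05, a=5.0, n=4):
    # Development rate vs. normalized inhibitor concentration m in [0, 1].
    return r_max * (a + 1.0) * (1.0 - m) ** n / (a + (1.0 - m) ** n) + r_min

m = np.linspace(0.0, 1.0, 6)
print(np.round(mack_rate(m), 2))  # rate collapses as m -> 1 (unexposed resist)
```

The steep drop in rate as the inhibitor concentration rises is the edge-sharpening nonlinearity described above.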
Mathematically optimal experiment plans.
Computer-generated custom design.
# Semiconductor Manufacturing Process Optimization and Computational Mathematical Modeling ## 1. The Fundamental Challenge Modern semiconductor manufacturing involves **500–1000+ sequential process steps** to produce chips with billions of transistors at nanometer scales. Each step has dozens of tunable parameters, creating an optimization challenge that is: - **Extraordinarily high-dimensional** — hundreds to thousands of parameters - **Highly nonlinear** — complex interactions between process variables - **Expensive to explore experimentally** — each wafer costs thousands of dollars - **Multi-objective** — balancing yield, throughput, cost, and performance **Key Manufacturing Processes:** 1. **Lithography** — Pattern transfer using light/EUV exposure 2. **Etching** — Material removal (wet/dry plasma etching) 3. **Deposition** — Material addition (CVD, PVD, ALD) 4. **Ion Implantation** — Dopant introduction 5. **Thermal Processing** — Diffusion, annealing, oxidation 6. **Chemical-Mechanical Planarization (CMP)** — Surface planarization ## 2. The Mathematical Foundation ### 2.1 Governing Physics: Partial Differential Equations Nearly all semiconductor processes are governed by systems of coupled PDEs. #### Heat Transfer (Thermal Processing, Laser Annealing) $$ \rho c_p \frac{\partial T}{\partial t} = \nabla \cdot (k \nabla T) + Q $$ Where: - $\rho$ — density ($\text{kg/m}^3$) - $c_p$ — specific heat capacity ($\text{J/(kg}\cdot\text{K)}$) - $T$ — temperature ($\text{K}$) - $k$ — thermal conductivity ($\text{W/(m}\cdot\text{K)}$) - $Q$ — volumetric heat source ($\text{W/m}^3$) #### Mass Diffusion (Dopant Redistribution, Oxidation) $$ \frac{\partial C}{\partial t} = \nabla \cdot \left( D(C, T) \nabla C \right) + R(C) $$ Where: - $C$ — concentration ($\text{atoms/cm}^3$) - $D(C, T)$ — diffusion coefficient (concentration and temperature dependent) - $R(C)$ — reaction/generation term **Common Diffusion Models:** - **Constant source diffusion:** $$C(x, t) = C_s \cdot \text{erfc}\left( \frac{x}{2\sqrt{Dt}} \right)$$ - **Limited source diffusion:** $$C(x, t) = \frac{Q}{\sqrt{\pi D t}} \exp\left( -\frac{x^2}{4Dt} \right)$$ #### Fluid Dynamics (CVD, Etching Reactors) **Navier-Stokes Equations:** $$ \rho \left( \frac{\partial \mathbf{v}}{\partial t} + \mathbf{v} \cdot \nabla \mathbf{v} \right) = -\nabla p + \mu \nabla^2 \mathbf{v} + \mathbf{f} $$ **Continuity Equation:** $$ \frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{v}) = 0 $$ **Species Transport:** $$ \frac{\partial c_i}{\partial t} + \mathbf{v} \cdot \nabla c_i = D_i \nabla^2 c_i + \sum_j R_{ij} $$ Where: - $\mathbf{v}$ — velocity field ($\text{m/s}$) - $p$ — pressure ($\text{Pa}$) - $\mu$ — dynamic viscosity ($\text{Pa}\cdot\text{s}$) - $c_i$ — species concentration - $R_{ij}$ — reaction rates between species #### Electromagnetics (Lithography, Plasma Physics) **Maxwell's Equations:** $$ \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} $$ $$ \nabla \times \mathbf{H} = \mathbf{J} + \frac{\partial \mathbf{D}}{\partial t} $$ **Hopkins Formulation for Partially Coherent Imaging:** $$ I(\mathbf{x}) = \iint J(\mathbf{f}_1, \mathbf{f}_2) \tilde{O}(\mathbf{f}_1) \tilde{O}^*(\mathbf{f}_2) e^{2\pi i (\mathbf{f}_1 - \mathbf{f}_2) \cdot \mathbf{x}} \, d\mathbf{f}_1 \, d\mathbf{f}_2 $$ Where: - $J(\mathbf{f}_1, \mathbf{f}_2)$ — mutual intensity (transmission cross-coefficient) - $\tilde{O}(\mathbf{f})$ — Fourier transform of mask transmission function ### 2.2 Surface Evolution and Topography Etching and deposition cause 
surfaces to evolve over time. The **Level Set Method** elegantly handles this: $$ \frac{\partial \phi}{\partial t} + V_n |\nabla \phi| = 0 $$ Where: - $\phi$ — level set function (surface defined by $\phi = 0$) - $V_n$ — normal velocity determined by local etch/deposition rates **Advantages:** - Naturally handles topological changes (void formation, surface merging) - No need for explicit surface tracking - Handles complex geometries **Etch Rate Models:** - **Ion-enhanced etching:** $$V_n = k_0 + k_1 \Gamma_{\text{ion}} + k_2 \Gamma_{\text{neutral}}$$ - **Visibility-dependent deposition:** $$V_n = V_0 \cdot \Omega(\mathbf{x})$$ where $\Omega(\mathbf{x})$ is the solid angle visible from point $\mathbf{x}$ ## 3. Computational Methods ### 3.1 Discretization Approaches #### Finite Element Methods (FEM) FEM dominates stress/strain analysis, thermal modeling, and electromagnetic simulation. The **weak formulation** transforms strong-form PDEs into integral equations: For the heat equation $-\nabla \cdot (k \nabla T) = Q$: $$ \int_\Omega \nabla w \cdot (k \nabla T) \, d\Omega = \int_\Omega w Q \, d\Omega + \int_{\Gamma_N} w q \, dS $$ Where: - $w$ — test/weight function - $\Omega$ — domain - $\Gamma_N$ — Neumann boundary **Galerkin Approximation:** $$ T(\mathbf{x}) \approx \sum_{i=1}^{N} T_i N_i(\mathbf{x}) $$ Where $N_i(\mathbf{x})$ are shape functions and $T_i$ are nodal values. #### Finite Difference Methods (FDM) Efficient for regular geometries and time-dependent problems. **Explicit Scheme (Forward Euler):** $$ \frac{T_i^{n+1} - T_i^n}{\Delta t} = \alpha \frac{T_{i+1}^n - 2T_i^n + T_{i-1}^n}{\Delta x^2} $$ **Stability Condition (CFL):** $$ \Delta t \leq \frac{\Delta x^2}{2\alpha} $$ **Implicit Scheme (Backward Euler):** $$ \frac{T_i^{n+1} - T_i^n}{\Delta t} = \alpha \frac{T_{i+1}^{n+1} - 2T_i^{n+1} + T_{i-1}^{n+1}}{\Delta x^2} $$ - Unconditionally stable but requires solving linear systems #### Monte Carlo Methods Essential for stochastic processes, particularly **ion implantation**. **Binary Collision Approximation (BCA):** 1. Sample impact parameter from screened Coulomb potential 2. Calculate scattering angle using: $$\theta = \pi - 2 \int_{r_{\min}}^{\infty} \frac{b \, dr}{r^2 \sqrt{1 - \frac{V(r)}{E_{\text{CM}}} - \frac{b^2}{r^2}}}$$ 3. Compute energy transfer: $$T = \frac{4 M_1 M_2}{(M_1 + M_2)^2} E \sin^2\left(\frac{\theta}{2}\right)$$ 4. Track recoils, vacancies, and interstitials 5. Accumulate statistics over $10^4 - 10^6$ ions ### 3.2 Multi-Scale Modeling | Scale | Length | Time | Methods | |:------|:-------|:-----|:--------| | Quantum | 0.1–1 nm | fs | DFT, ab initio MD | | Atomistic | 1–100 nm | ps–ns | Classical MD, Kinetic MC | | Mesoscale | 100 nm–10 μm | μs–ms | Phase field, Continuum MC | | Continuum | μm–mm | ms–hours | FEM, FDM, FVM | | Equipment | cm–m | seconds–hours | CFD, Thermal/Mechanical | **Information Flow Between Scales:** - **Upscaling:** Parameters computed at lower scales inform higher-scale models - Reaction barriers from DFT → Kinetic Monte Carlo rates - Surface mobilities from MD → Continuum deposition models - **Downscaling:** Boundary conditions and fields from higher scales - Temperature fields → Local reaction rates - Stress fields → Defect migration barriers ## 4. 
Optimization Frameworks ### 4.1 The General Problem Structure Semiconductor process optimization typically takes the form: $$ \min_{\mathbf{x} \in \mathcal{X}} f(\mathbf{x}) \quad \text{subject to} \quad g_i(\mathbf{x}) \leq 0, \quad h_j(\mathbf{x}) = 0 $$ Where: - $\mathbf{x} \in \mathbb{R}^n$ — process parameters (temperatures, pressures, times, flows, powers) - $f(\mathbf{x})$ — objective function (often negative yield or weighted combination) - $g_i(\mathbf{x}) \leq 0$ — inequality constraints (equipment limits, process windows) - $h_j(\mathbf{x}) = 0$ — equality constraints (design requirements) **Typical Parameter Vector:** $$ \mathbf{x} = \begin{bmatrix} T_1 \\ T_2 \\ P_{\text{chamber}} \\ t_{\text{process}} \\ \text{Flow}_{\text{gas1}} \\ \text{Flow}_{\text{gas2}} \\ \text{RF Power} \\ \vdots \end{bmatrix} $$ ### 4.2 Response Surface Methodology (RSM) Classical RSM builds polynomial surrogate models from designed experiments: **Second-Order Model:** $$ \hat{y} = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \sum_{j>i}^{k} \beta_{ij} x_i x_j + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \epsilon $$ **Matrix Form:** $$ \hat{y} = \beta_0 + \mathbf{x}^T \mathbf{b} + \mathbf{x}^T \mathbf{B} \mathbf{x} $$ Where: - $\mathbf{b}$ — vector of linear coefficients - $\mathbf{B}$ — matrix of quadratic and interaction coefficients **Design of Experiments (DOE) Types:** | Design Type | Runs for k Factors | Best For | |:------------|:-------------------|:---------| | Full Factorial | $2^k$ | Small k, all interactions | | Fractional Factorial | $2^{k-p}$ | Screening, main effects | | Central Composite | $2^k + 2k + n_c$ | Response surfaces | | Box-Behnken | Varies | Quadratic models, efficient | **Optimal Point (for quadratic model):** $$ \mathbf{x}^* = -\frac{1}{2} \mathbf{B}^{-1} \mathbf{b} $$ ### 4.3 Bayesian Optimization For expensive black-box functions, Bayesian optimization is remarkably efficient. **Gaussian Process Prior:** $$ f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')) $$ **Common Kernels:** - **Squared Exponential (RBF):** $$k(\mathbf{x}, \mathbf{x}') = \sigma^2 \exp\left( -\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\ell^2} \right)$$ - **Matérn 5/2:** $$k(\mathbf{x}, \mathbf{x}') = \sigma^2 \left(1 + \frac{\sqrt{5}r}{\ell} + \frac{5r^2}{3\ell^2}\right) \exp\left(-\frac{\sqrt{5}r}{\ell}\right)$$ where $r = \|\mathbf{x} - \mathbf{x}'\|$ **Posterior Distribution:** Given observations $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$: $$ \mu(\mathbf{x}^*) = \mathbf{k}_*^T (\mathbf{K} + \sigma_n^2 \mathbf{I})^{-1} \mathbf{y} $$ $$ \sigma^2(\mathbf{x}^*) = k(\mathbf{x}^*, \mathbf{x}^*) - \mathbf{k}_*^T (\mathbf{K} + \sigma_n^2 \mathbf{I})^{-1} \mathbf{k}_* $$ **Acquisition Functions:** - **Expected Improvement (EI):** $$\text{EI}(\mathbf{x}) = \mathbb{E}\left[\max(f(\mathbf{x}) - f^+, 0)\right]$$ Closed form: $$\text{EI}(\mathbf{x}) = (\mu(\mathbf{x}) - f^+ - \xi) \Phi(Z) + \sigma(\mathbf{x}) \phi(Z)$$ where $Z = \frac{\mu(\mathbf{x}) - f^+ - \xi}{\sigma(\mathbf{x})}$ - **Upper Confidence Bound (UCB):** $$\text{UCB}(\mathbf{x}) = \mu(\mathbf{x}) + \kappa \sigma(\mathbf{x})$$ - **Probability of Improvement (PI):** $$\text{PI}(\mathbf{x}) = \Phi\left(\frac{\mu(\mathbf{x}) - f^+ - \xi}{\sigma(\mathbf{x})}\right)$$ ### 4.4 Metaheuristic Methods For highly non-convex, multimodal optimization landscapes. #### Genetic Algorithms (GA) **Algorithmic Steps:** 1. **Initialize** population of $N$ candidate solutions 2. **Evaluate** fitness $f(\mathbf{x}_i)$ for each individual 3. 
**Select** parents using tournament/roulette wheel selection 4. **Crossover** to create offspring: - Single-point: $\mathbf{x}_{\text{child}} = [\mathbf{x}_1(1:c), \mathbf{x}_2(c+1:n)]$ - Blend: $\mathbf{x}_{\text{child}} = \alpha \mathbf{x}_1 + (1-\alpha) \mathbf{x}_2$ 5. **Mutate** with probability $p_m$: $$x_i' = x_i + \mathcal{N}(0, \sigma^2)$$ 6. **Replace** population and repeat #### Particle Swarm Optimization (PSO) **Update Equations:** $$ \mathbf{v}_i^{t+1} = \omega \mathbf{v}_i^t + c_1 r_1 (\mathbf{p}_i - \mathbf{x}_i^t) + c_2 r_2 (\mathbf{g} - \mathbf{x}_i^t) $$ $$ \mathbf{x}_i^{t+1} = \mathbf{x}_i^t + \mathbf{v}_i^{t+1} $$ Where: - $\omega$ — inertia weight (typically 0.4–0.9) - $c_1, c_2$ — cognitive and social parameters (typically ~2.0) - $\mathbf{p}_i$ — personal best position - $\mathbf{g}$ — global best position - $r_1, r_2$ — random numbers in $[0, 1]$ #### Simulated Annealing (SA) **Acceptance Probability:** $$ P(\text{accept}) = \begin{cases} 1 & \text{if } \Delta E < 0 \\ \exp\left(-\frac{\Delta E}{k_B T}\right) & \text{if } \Delta E \geq 0 \end{cases} $$ **Cooling Schedule:** $$ T_{k+1} = \alpha T_k \quad \text{(geometric, } \alpha \approx 0.95\text{)} $$ ### 4.5 Multi-Objective Optimization Real optimization involves trade-offs between competing objectives. **Multi-Objective Problem:** $$ \min_{\mathbf{x}} \mathbf{F}(\mathbf{x}) = \begin{bmatrix} f_1(\mathbf{x}) \\ f_2(\mathbf{x}) \\ \vdots \\ f_m(\mathbf{x}) \end{bmatrix} $$ **Pareto Dominance:** Solution $\mathbf{x}_1$ dominates $\mathbf{x}_2$ (written $\mathbf{x}_1 \prec \mathbf{x}_2$) if: - $f_i(\mathbf{x}_1) \leq f_i(\mathbf{x}_2)$ for all $i$ - $f_j(\mathbf{x}_1) < f_j(\mathbf{x}_2)$ for at least one $j$ **NSGA-II Algorithm:** 1. Non-dominated sorting to assign ranks 2. Crowding distance calculation: $$d_i = \sum_{m=1}^{M} \frac{f_m^{i+1} - f_m^{i-1}}{f_m^{\max} - f_m^{\min}}$$ 3. Selection based on rank and crowding distance 4. Standard crossover and mutation ### 4.6 Robust Optimization Manufacturing variability is inevitable. Robust optimization explicitly accounts for it. **Mean-Variance Formulation:** $$ \min_{\mathbf{x}} \mathbb{E}_\xi[f(\mathbf{x}, \xi)] + \lambda \cdot \text{Var}_\xi[f(\mathbf{x}, \xi)] $$ **Minimax (Worst-Case) Formulation:** $$ \min_{\mathbf{x}} \max_{\xi \in \mathcal{U}} f(\mathbf{x}, \xi) $$ **Chance-Constrained Formulation:** $$ \min_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad P(g(\mathbf{x}, \xi) \leq 0) \geq 1 - \alpha $$ **Taguchi Signal-to-Noise Ratios:** - **Smaller-is-better:** $\text{SNR} = -10 \log_{10}\left(\frac{1}{n}\sum_{i=1}^{n} y_i^2\right)$ - **Larger-is-better:** $\text{SNR} = -10 \log_{10}\left(\frac{1}{n}\sum_{i=1}^{n} \frac{1}{y_i^2}\right)$ - **Nominal-is-best:** $\text{SNR} = 10 \log_{10}\left(\frac{\bar{y}^2}{s^2}\right)$ ## 5. Advanced Topics and Modern Approaches ### 5.1 Physics-Informed Neural Networks (PINNs) PINNs embed physical laws directly into neural network training. 
**Loss Function:** $$ \mathcal{L} = \mathcal{L}_{\text{data}} + \lambda \mathcal{L}_{\text{physics}} + \gamma \mathcal{L}_{\text{BC}} $$ Where: $$ \mathcal{L}_{\text{data}} = \frac{1}{N_d} \sum_{i=1}^{N_d} |u_\theta(\mathbf{x}_i) - u_i|^2 $$ $$ \mathcal{L}_{\text{physics}} = \frac{1}{N_p} \sum_{j=1}^{N_p} |\mathcal{N}[u_\theta(\mathbf{x}_j)]|^2 $$ $$ \mathcal{L}_{\text{BC}} = \frac{1}{N_b} \sum_{k=1}^{N_b} |\mathcal{B}[u_\theta(\mathbf{x}_k)] - g_k|^2 $$ **Example: Heat Equation PINN** For $\frac{\partial T}{\partial t} = \alpha \nabla^2 T$: $$ \mathcal{L}_{\text{physics}} = \frac{1}{N_p} \sum_{j=1}^{N_p} \left| \frac{\partial T_\theta}{\partial t} - \alpha \nabla^2 T_\theta \right|^2_{\mathbf{x}_j, t_j} $$ **Advantages:** - Dramatically reduced data requirements - Physical consistency guaranteed - Effective for inverse problems ### 5.2 Digital Twins and Real-Time Optimization A digital twin is a continuously updated simulation model of the physical process. **Kalman Filter for State Estimation:** **Prediction Step:** $$ \hat{\mathbf{x}}_{k|k-1} = \mathbf{F}_k \hat{\mathbf{x}}_{k-1|k-1} + \mathbf{B}_k \mathbf{u}_k $$ $$ \mathbf{P}_{k|k-1} = \mathbf{F}_k \mathbf{P}_{k-1|k-1} \mathbf{F}_k^T + \mathbf{Q}_k $$ **Update Step:** $$ \mathbf{K}_k = \mathbf{P}_{k|k-1} \mathbf{H}_k^T (\mathbf{H}_k \mathbf{P}_{k|k-1} \mathbf{H}_k^T + \mathbf{R}_k)^{-1} $$ $$ \hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k (\mathbf{z}_k - \mathbf{H}_k \hat{\mathbf{x}}_{k|k-1}) $$ $$ \mathbf{P}_{k|k} = (\mathbf{I} - \mathbf{K}_k \mathbf{H}_k) \mathbf{P}_{k|k-1} $$ **Run-to-Run Control:** $$ \mathbf{u}_{k+1} = \mathbf{u}_k + \mathbf{G} (\mathbf{y}_{\text{target}} - \hat{\mathbf{y}}_k) $$ Where $\mathbf{G}$ is the controller gain matrix. ### 5.3 Machine Learning for Virtual Metrology **Virtual Metrology Model:** $$ \hat{y} = f_{\text{ML}}(\mathbf{x}_{\text{sensor}}, \mathbf{x}_{\text{recipe}}, \mathbf{x}_{\text{context}}) $$ Where: - $\mathbf{x}_{\text{sensor}}$ — in-situ sensor data (OES, RF impedance, etc.) - $\mathbf{x}_{\text{recipe}}$ — process recipe parameters - $\mathbf{x}_{\text{context}}$ — chamber state, maintenance history **Domain Adaptation Challenge:** $$ \mathcal{L}_{\text{total}} = \mathcal{L}_{\text{task}} + \lambda \mathcal{L}_{\text{domain}} $$ Using adversarial training to minimize distribution shift between chambers. ### 5.4 Reinforcement Learning for Sequential Decisions **Markov Decision Process (MDP) Formulation:** - **State** $s$: Current wafer/chamber conditions - **Action** $a$: Recipe adjustments - **Reward** $r$: Yield, throughput, quality metrics - **Transition** $P(s'|s, a)$: Process dynamics **Policy Gradient (REINFORCE):** $$ \nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta} \left[ \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t|s_t) \cdot G_t \right] $$ Where $G_t = \sum_{k=t}^{T} \gamma^{k-t} r_k$ is the return. ## 6. 
Specific Process Case Studies ### 6.1 Lithography: Computational Imaging and OPC **Optical Proximity Correction Optimization:** $$ \mathbf{m}^* = \arg\min_{\mathbf{m}} \|\mathbf{T}_{\text{target}} - \mathbf{I}(\mathbf{m})\|^2 + R(\mathbf{m}) $$ Where: - $\mathbf{m}$ — mask transmission function - $\mathbf{I}(\mathbf{m})$ — forward imaging model - $R(\mathbf{m})$ — regularization (manufacturability, minimum features) **Aerial Image Formation (Scalar Model):** $$ I(x, y) = \left| \int_{-\text{NA}}^{\text{NA}} \tilde{M}(f_x) H(f_x) e^{2\pi i f_x x} df_x \right|^2 $$ **Source-Mask Optimization (SMO):** $$ \min_{\mathbf{m}, \mathbf{s}} \sum_{p} \|I_p(\mathbf{m}, \mathbf{s}) - T_p\|^2 + \lambda_m R_m(\mathbf{m}) + \lambda_s R_s(\mathbf{s}) $$ Jointly optimizing mask pattern and illumination source. ### 6.2 CMP: Pattern-Dependent Modeling **Preston Equation:** $$ \frac{dz}{dt} = K_p \cdot p \cdot V $$ Where: - $K_p$ — Preston coefficient (material-dependent) - $p$ — local pressure - $V$ — relative velocity **Pattern-Dependent Pressure Model:** $$ p_{\text{eff}}(x, y) = p_{\text{applied}} \cdot \frac{1}{\rho(x, y) * K(x, y)} $$ Where $\rho(x, y)$ is the local pattern density and $*$ denotes convolution with a planarization kernel $K$. **Step Height Evolution:** $$ \frac{d(\Delta z)}{dt} = K_p V (p_{\text{high}} - p_{\text{low}}) $$ ### 6.3 Plasma Etching: Plasma-Surface Interactions **Species Balance in Plasma:** $$ \frac{dn_i}{dt} = \sum_j k_{ji} n_j n_e - \sum_k k_{ik} n_i n_e - \frac{n_i}{\tau_{\text{res}}} + S_i $$ Where: - $n_i$ — density of species $i$ - $k_{ji}$ — rate coefficients (Arrhenius form) - $\tau_{\text{res}}$ — residence time - $S_i$ — source terms **Ion Energy Distribution Function:** $$ f(E) = \frac{1}{\sqrt{2\pi}\sigma_E} \exp\left(-\frac{(E - \bar{E})^2}{2\sigma_E^2}\right) $$ **Etch Yield:** $$ Y(E, \theta) = Y_0 \cdot \sqrt{E - E_{\text{th}}} \cdot f(\theta) $$ Where $f(\theta)$ is the angular dependence. ## 7. The Mathematics of Yield **Poisson Defect Model:** $$ Y = e^{-D \cdot A} $$ Where: - $D$ — defect density ($\text{defects/cm}^2$) - $A$ — chip area ($\text{cm}^2$) **Negative Binomial (Clustered Defects):** $$ Y = \left(1 + \frac{DA}{\alpha}\right)^{-\alpha} $$ Where $\alpha$ is the clustering parameter (smaller = more clustered). **Parametric Yield:** For a parameter with distribution $p(\theta)$ and specification $[\theta_{\min}, \theta_{\max}]$: $$ Y_{\text{param}} = \int_{\theta_{\min}}^{\theta_{\max}} p(\theta) \, d\theta $$ For Gaussian distribution: $$ Y_{\text{param}} = \Phi\left(\frac{\theta_{\max} - \mu}{\sigma}\right) - \Phi\left(\frac{\theta_{\min} - \mu}{\sigma}\right) $$ **Process Capability Index:** $$ C_{pk} = \min\left(\frac{\mu - \text{LSL}}{3\sigma}, \frac{\text{USL} - \mu}{3\sigma}\right) $$ **Total Yield:** $$ Y_{\text{total}} = Y_{\text{defect}} \times Y_{\text{parametric}} \times Y_{\text{test}} $$ ## 8. Open Challenges 1. **High-Dimensional Optimization** - Hundreds to thousands of interacting parameters - Curse of dimensionality in sampling-based methods - Need for effective dimensionality reduction 2. **Uncertainty Quantification** - Error propagation across model hierarchies - Aleatory vs. epistemic uncertainty separation - Confidence bounds on predictions 3. **Data Scarcity** - Each experimental data point costs \$1000+ - Models must learn from small datasets - Transfer learning between processes/tools 4. 
**Interpretability** - Black-box models limit root cause analysis - Need for physics-informed feature engineering - Explainable AI for process engineering 5. **Real-Time Constraints** - Run-to-run control requires millisecond decisions - Reduced-order models needed - Edge computing for in-situ optimization 6. **Integration Complexity** - Multiple physics domains coupled - Full-flow optimization across 500+ steps - Design-technology co-optimization ## 9. Optimization summary Semiconductor manufacturing process optimization represents one of the most sophisticated applications of computational mathematics in industry. It integrates: - **Classical numerical methods** (FEM, FDM, Monte Carlo) - **Statistical modeling** (DOE, RSM, uncertainty quantification) - **Optimization theory** (convex/non-convex, single/multi-objective, deterministic/robust) - **Machine learning** (neural networks, Gaussian processes, reinforcement learning) - **Control theory** (Kalman filtering, run-to-run control, MPC) The field continues to evolve as feature sizes shrink toward atomic scales, process complexity grows, and computational capabilities expand. Success requires not just mathematical sophistication but deep physical intuition about the processes being modeled—the best work reflects genuine synthesis across disciplines.
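As a compact illustration of the Expected Improvement closed form from section 4.3, a NumPy/SciPy sketch for the maximization convention; `mu` and `sigma` are assumed GP posterior arrays over candidate points:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    # EI for maximization; mu/sigma are the GP posterior mean/std per candidate.
    sigma = np.maximum(sigma, 1e-12)          # guard against zero variance
    z = (mu - f_best - xi) / sigma
    return (mu - f_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

mu = np.array([0.2, 0.55, 0.9]); sd = np.array([0.30, 0.05, 0.40])
print(expected_improvement(mu, sd, f_best=0.6))  # high mean or high uncertainty wins
```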
Optimization-based inversion iteratively refines latent codes minimizing reconstruction error.
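A schematic latent-optimization loop in PyTorch: refine `z` to minimize reconstruction error against a target image. The `generator` is a hypothetical pretrained, frozen G(z), and the 512-dim latent is an assumption:

```python
import torch

def invert(generator, target, latent_dim=512, steps=500, lr=0.05):
    # Optimize only the latent code z; the generator's weights stay fixed.
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(generator(z), target)
        loss.backward()   # only z is updated; G is assumed frozen
        opt.step()
    return z.detach()
```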
AI improvement loop: measure, analyze, hypothesize, experiment, deploy. Continuous iteration beats one-shot.
Optimize considering variability.
Optimize latent to match image.
Before optimizing, I help you profile; then we focus on the real bottlenecks with concrete code or architectural changes.
To optimize latency, profile the stack: network, tokenizer, model, quantization, batching. Fix the slowest layer first (profiling before tuning).
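A bare-bones sketch of per-stage timing before tuning anything; the stage functions here are hypothetical placeholders for a real pipeline:

```python
import time

def profile(stages, payload):
    # stages: list of (name, fn) applied in order; returns seconds per stage.
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        payload = fn(payload)
        timings[name] = time.perf_counter() - start
    return timings

stages = [
    ("tokenize", lambda s: s.split()),
    ("model", lambda toks: " ".join(toks).upper()),  # stand-in for inference
]
print(profile(stages, "measure before you tune"))  # fix the slowest stage first
```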
Optimizers update weights using gradients. Adam adapts learning rate per parameter. SGD is simpler. Learning rate controls step size.
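To make "adapts learning rate per parameter" concrete, a minimal NumPy sketch of the bias-corrected Adam update with the standard default hyperparameters:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad          # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2     # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)             # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter effective step
    return w, m, v
```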
Options framework enables hierarchical RL through temporally extended actions, each with an initiation set, termination condition, and intra-option policy.
Temporal abstractions in RL.
Optuna is a hyperparameter optimization framework. Efficient search.
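A minimal Optuna study, assuming `pip install optuna`; the objective is a toy stand-in for a real validation metric:

```python
import optuna

def objective(trial):
    x = trial.suggest_float("x", -10.0, 10.0)
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    return (x - 2.0) ** 2 + lr  # replace with a real validation metric

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```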
Orca Mini applies Orca techniques to small models. Punches above its weight.
Orca learns from GPT-4 reasoning traces. Microsoft Research.
An orchestrator routes each request to the best model or tool (cheap vs. expensive, code vs. chat) and can chain multiple steps into a workflow.
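A toy sketch of an orchestrator's routing-then-chaining flow; the backend names and routing heuristics are illustrative assumptions (real routers use classifiers or cost models):

```python
def route(request: str) -> str:
    # Naive heuristics; real routers use classifiers or cost models.
    if "def " in request or "code" in request.lower():
        return "code-model"
    if len(request) < 200:
        return "small-cheap-model"
    return "large-model"

def orchestrate(request: str, backends: dict) -> str:
    draft = backends[route(request)](request)                   # step 1: answer
    return backends["small-cheap-model"]("Refine: " + draft)    # step 2: chained refinement

backends = {
    "code-model": lambda req: "[code-model] " + req[:40],
    "small-cheap-model": lambda req: "[small] " + req[:40],
    "large-model": lambda req: "[large] " + req[:40],
}
print(orchestrate("What is operator fusion?", backends))
```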