deepar, time series models
DeepAR is an autoregressive recurrent network that learns probabilistic forecasts across multiple related time series using Monte Carlo sampling.
DeepEval provides unit tests for LLMs. Assert on metrics. CI integration.
Find minimal perturbation to decision boundary.
Decompose predictions to input features.
Microsoft's training optimization library.
Random walk-based graph embeddings.
# The Full Stack AI Build: A Comprehensive Analysis
## Overview
The 5-Layer AI Stack represents the complete vertical integration required to build frontier AI systems:
$$
\text{Energy} \rightarrow \text{Chips} \rightarrow \text{Infrastructure} \rightarrow \text{Models} \rightarrow \text{Applications}
$$
## Layer 1: Energy (Electricity)
The foundational constraint upon which all other layers depend.
### Key Metrics
- **Training Energy Consumption**: A frontier LLM requires approximately $50-100+ \text{ GWh}$
- **Power Usage Effectiveness (PUE)**:
$$
\text{PUE} = \frac{\text{Total Facility Energy}}{\text{IT Equipment Energy}}
$$
- **Target PUE**: $\text{PUE} \leq 1.2$ for modern AI data centers
### Critical Concerns
- Reliability and uptime requirements: $99.999\%$ availability
- Cost optimization: USD per kWh directly impacts training costs
- Carbon footprint: $\text{kg CO}_2/\text{kWh}$ varies by source
- Geographic availability constraints
### Energy Cost Model
$$
C_{\text{energy}} = P_{\text{peak}} \times T_{\text{training}} \times \text{PUE} \times \text{Cost}_{\text{kWh}}
$$
Where:
- $P_{\text{peak}}$ = Peak power consumption (kW; a value in MW must be multiplied by $10^3$ to match the per-kWh cost)
- $T_{\text{training}}$ = Training duration (hours)
- $\text{Cost}_{\text{kWh}}$ = Energy cost per kilowatt-hour
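As a rough illustration of the cost model above, the following Python sketch evaluates it for one hypothetical run. The inputs (20 MW peak, 90 days, a PUE of 1.2, USD 0.08 per kWh) are assumptions chosen only to exercise the formula, and the function name is illustrative. The MW-to-kW conversion is made explicit because the energy price is quoted per kWh.

```python
def training_energy_cost(peak_power_mw, training_hours, pue, usd_per_kwh):
    """Return (facility energy in GWh, energy cost in USD) for one training run."""
    it_energy_kwh = peak_power_mw * 1_000 * training_hours  # MW -> kW, then kWh
    facility_energy_kwh = it_energy_kwh * pue               # add cooling and other overhead
    cost_usd = facility_energy_kwh * usd_per_kwh
    return facility_energy_kwh / 1e6, cost_usd


# Hypothetical inputs: 20 MW for 90 days (2160 h), PUE 1.2, USD 0.08 per kWh.
energy_gwh, cost_usd = training_energy_cost(20, 2160, 1.2, 0.08)
print(f"{energy_gwh:.2f} GWh, USD {cost_usd:,.0f}")
```

Under these assumptions the run lands near the lower end of the 50-100 GWh range quoted above.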
## Layer 2: Chips
The computational substrate transforming electricity into useful operations.
### 2.1 Design Chips
#### Architecture Decisions
- **GPU vs TPU vs Custom ASIC**: Trade-offs in flexibility vs efficiency
- **Core compute unit**: Matrix multiplication engines
$$
C = A \times B \quad \text{where } A \in \mathbb{R}^{m \times k}, B \in \mathbb{R}^{k \times n}
$$
- **Operations count**:
$$
\text{FLOPs}_{\text{matmul}} = 2 \times m \times n \times k
$$
#### Memory Bandwidth Optimization
The real bottleneck for transformer models:
$$
\text{Arithmetic Intensity} = \frac{\text{FLOPs}}{\text{Bytes Accessed}}
$$
For transformers:
$$
\text{AI}_{\text{attention}} = \frac{2 \times n^2 \times d}{n^2 + 2nd} \approx O(d) \text{ (memory bound)}
$$
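To see how this ratio behaves, the sketch below evaluates the expression above for a few sequence lengths. It is only an illustration: the function name and the choice $d = 128$ are assumptions, and the denominator counts elements rather than bytes, so dividing by the bytes per element would give FLOPs per byte.

```python
def attention_arithmetic_intensity(n, d):
    """FLOPs per element accessed, following the expression above."""
    flops = 2 * n * n * d          # multiply-adds for an (n x d) by (d x n) product
    elements = n * n + 2 * n * d   # score matrix plus the two n x d operands
    return flops / elements


# The ratio grows with n but never exceeds 2 * d.
for n in (512, 4096, 32768):
    print(n, round(attention_arithmetic_intensity(n, d=128), 1))
```

The value saturates at $2d$ for $n \gg d$, which is why longer sequences do not raise the arithmetic intensity of this step.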
#### Numerical Precision Trade-offs
| Precision | Bits | Dynamic Range | Use Case |
|-----------|------|---------------|----------|
| FP32 | 32 | $\pm 3.4 \times 10^{38}$ | Reference |
| FP16 | 16 | $\pm 65504$ | Training |
| BF16 | 16 | $\pm 3.4 \times 10^{38}$ | Training |
| FP8 | 8 | $\pm 448$ (E4M3) | Inference |
| INT8 | 8 | $[-128, 127]$ | Quantized inference |
| INT4 | 4 | $[-8, 7]$ | Extreme quantization |
Quantization error bound:
$$
\|W - W_q\|_F \leq \frac{\Delta}{2}\sqrt{n}
$$
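Where $\Delta$ is the quantization step size and $n$ is the number of quantized weights; the bound follows because round-to-nearest changes each weight by at most $\Delta/2$.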
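The bound can be checked numerically. The sketch below assumes uniform round-to-nearest quantization without clipping; the random weights, the seed, and the step size $2/255$ are hypothetical values used only for the check.

```python
import math
import random


def quantize(weights, step):
    """Uniform round-to-nearest quantization onto a grid with spacing `step`."""
    return [step * round(w / step) for w in weights]


random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(1000)]  # hypothetical weights
step = 2.0 / 255                                            # hypothetical step size
quantized = quantize(weights, step)

# Frobenius norm of the error versus the (step / 2) * sqrt(n) bound.
err = math.sqrt(sum((w - q) ** 2 for w, q in zip(weights, quantized)))
bound = step / 2 * math.sqrt(len(weights))
print(err <= bound)
```

The measured error is typically well below the bound, because the bound assumes every weight sits at the worst-case distance from the grid.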
#### Power Efficiency
$$
\text{Efficiency} = \frac{\text{FLOPS}}{\text{Watt}} \quad [\text{FLOPS/W}]
$$
Modern targets:
- Training: $> 300 \text{ TFLOPS/chip}$ at $< 700\text{W}$
- Inference: $> 1000 \text{ TOPS/chip}$ (INT8)
### 2.2 Wafer Fabrication
#### Process Technology
- **Transistor density**:
$$
D = \frac{N_{\text{transistors}}}{A_{\text{die}}} \quad [\text{transistors/mm}^2]
$$
- **Node progression**: $7\text{nm} \rightarrow 5\text{nm} \rightarrow 3\text{nm} \rightarrow 2\text{nm}$
#### Yield Optimization
Defect density model (Poisson):
$$
Y = e^{-D_0 \times A}
$$
Where:
- $Y$ = Yield (probability of good die)
- $D_0$ = Defect density (defects/cm²)
- $A$ = Die area (cm²)
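A short sketch of the Poisson yield model follows. The defect density of 0.1 defects/cm² and the two die areas are hypothetical inputs, picked to show how quickly yield falls as die area grows.

```python
import math


def poisson_yield(defect_density_per_cm2, die_area_cm2):
    """Probability that a die has zero defects under the Poisson model."""
    return math.exp(-defect_density_per_cm2 * die_area_cm2)


# Hypothetical defect density of 0.1 defects/cm^2.
small_die = poisson_yield(0.1, 1.0)  # 1 cm^2 die
large_die = poisson_yield(0.1, 8.0)  # 8 cm^2 die
print(f"{small_die:.3f} {large_die:.3f}")
```

This sensitivity to area is one motivation for the chiplet approach listed under Advanced Packaging.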
#### Advanced Packaging
- **CoWoS** (Chip-on-Wafer-on-Substrate)
- **Chiplets**: Disaggregated design
- **3D Stacking**: HBM memory integration
Bandwidth scaling:
$$
\text{BW}_{\text{HBM3}} = N_{\text{stacks}} \times \text{BW}_{\text{per\_stack}} \approx 6 \times 819 \text{ GB/s} \approx 4.9 \text{ TB/s}
$$
## Layer 3: Infrastructure
Systems layer orchestrating chips into usable compute.
### 3.1 Build AI Infrastructure
#### Cluster Architecture
- **Nodes per cluster**: $N_{\text{nodes}} = 1000-10000+$
- **GPUs per node**: $G_{\text{per\_node}} = 8$ (typical)
- **Total GPUs**:
$$
G_{\text{total}} = N_{\text{nodes}} \times G_{\text{per\_node}}
$$
#### Network Topology
**Fat-tree bandwidth**:
$$
\text{Bisection BW} = \frac{N \times \text{BW}_{\text{link}}}{2}
$$
**All-reduce communication cost** (ring algorithm):
$$
T_{\text{all-reduce}} = 2(n-1) \times \frac{M}{n \times \text{BW}} + 2(n-1) \times \alpha
$$
Where:
- $M$ = Message size
- $n$ = Number of participants
- $\alpha$ = Latency per hop
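The expression above has the form of the usual ring all-reduce cost: $2(n-1)$ steps, each moving a chunk of size $M/n$. The sketch below evaluates it; the message size, bandwidth, and latency are hypothetical, and the function name is illustrative.

```python
def ring_all_reduce_time(message_bytes, n, bandwidth_bytes_per_s, latency_s):
    """Estimated all-reduce time: 2(n-1) steps, each sending M/n bytes."""
    steps = 2 * (n - 1)
    transfer = steps * message_bytes / (n * bandwidth_bytes_per_s)
    return transfer + steps * latency_s


# Hypothetical: 1 GB of gradients, 8 participants, 50 GB/s links, 5 us per hop.
t = ring_all_reduce_time(1e9, 8, 50e9, 5e-6)
print(f"{t * 1e3:.2f} ms")
```

For large messages the bandwidth term dominates and approaches $2M/\text{BW}$ regardless of $n$; the latency term grows linearly with the number of participants.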
#### Storage Requirements
Training data storage:
$$
S_{\text{data}} = N_{\text{tokens}} \times \text{bytes\_per\_token} \times \text{redundancy}
$$
For 10T tokens:
$$
S \approx 10^{13} \times 2 \times 3 = 6 \times 10^{13} \text{ bytes} = 60 \text{ TB}
$$
#### Reliability Engineering
**Checkpointing overhead**:
$$
\text{Overhead} = \frac{T_{\text{checkpoint}}}{T_{\text{checkpoint\_interval}}}
$$
**Mean Time Between Failures (MTBF)** for cluster:
$$
\text{MTBF}_{\text{cluster}} = \frac{\text{MTBF}_{\text{component}}}{N_{\text{components}}}
$$
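The two reliability formulas can be combined in a few lines. This sketch assumes independent, identical components; the 50,000-hour component MTBF, the cluster size, and the checkpoint timings are hypothetical figures, not measurements.

```python
def cluster_mtbf_hours(component_mtbf_hours, n_components):
    """Cluster MTBF when any single component failure interrupts the job."""
    return component_mtbf_hours / n_components


def checkpoint_overhead(checkpoint_seconds, interval_seconds):
    """Fraction of wall-clock time spent writing checkpoints."""
    return checkpoint_seconds / interval_seconds


# Hypothetical: 8192 GPUs at 50,000 h MTBF each; 60 s checkpoints every 30 min.
mtbf = cluster_mtbf_hours(50_000, 8192)
overhead = checkpoint_overhead(60, 1800)
print(f"cluster MTBF ~ {mtbf:.1f} h, checkpoint overhead ~ {overhead:.1%}")
```

With these numbers a failure is expected roughly every six hours, so the checkpoint interval needs to be much shorter than that to limit lost work.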
## Layer 4: Models (LLMs)
Where core AI capability emerges.
### 4.1 Build Large Language Models
#### Transformer Architecture
**Self-attention mechanism**:
$$
\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V
$$
Where:
- $Q = XW_Q \in \mathbb{R}^{n \times d_k}$ (Queries)
- $K = XW_K \in \mathbb{R}^{n \times d_k}$ (Keys)
- $V = XW_V \in \mathbb{R}^{n \times d_v}$ (Values)
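A minimal, dependency-free sketch of the attention formula is given below. It takes $Q$, $K$, $V$ as lists of rows, uses helper names that are ours, and is meant for reading rather than performance.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain row-by-column matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for one head."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


# Tiny example: two tokens, d_k = d_v = 2.
out = attention([[1.0, 0.0], [0.0, 1.0]],
                [[1.0, 0.0], [0.0, 1.0]],
                [[1.0, 2.0], [3.0, 4.0]])
print(out)
```

Each output row is a convex combination of the rows of $V$, weighted toward the keys most similar to that row's query.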
**Multi-Head Attention (MHA)**:
$$
\text{MHA}(X) = \text{Concat}(\text{head}_1, ..., \text{head}_h)W_O
$$
$$
\text{head}_i = \text{Attention}(XW_Q^i, XW_K^i, XW_V^i)
$$
**Grouped Query Attention (GQA)**:
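Sharing each key/value head among $g$ query heads reduces the KV cache by a factor of $g$ relative to MHA:
$$
\text{KV\_cache} = 2 \times L \times n \times d_k \times \frac{h}{g} \times \text{bytes}
$$
Where $L$ is the number of layers, $n$ the sequence length, $d_k$ the per-head dimension, $h$ the number of query heads, and the leading factor of 2 accounts for keys and values.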
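The following sketch evaluates the KV cache size for one sequence. It assumes the dimension in the formula is the per-head dimension and that $h$ is divisible by $g$; the model shape used (32 layers, 32 heads of size 128, 4096 tokens, 2-byte values) is hypothetical.

```python
def kv_cache_bytes(n_layers, seq_len, head_dim, n_heads, group_size, bytes_per_value=2):
    """KV cache size for one sequence; group_size = query heads per KV head."""
    n_kv_heads = n_heads // group_size  # assumes n_heads is divisible by group_size
    return 2 * n_layers * seq_len * head_dim * n_kv_heads * bytes_per_value


# Hypothetical model shape, comparing MHA (g = 1) with GQA (g = 4).
mha = kv_cache_bytes(32, 4096, 128, 32, group_size=1)
gqa = kv_cache_bytes(32, 4096, 128, 32, group_size=4)
print(mha / 2**30, gqa / 2**30)  # sizes in GiB
```

The saving scales linearly with sequence length and batch size, which is where KV cache memory usually becomes the limiting factor.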
#### Feed-Forward Network
$$
\text{FFN}(x) = \text{GELU}(xW_1 + b_1)W_2 + b_2
$$
SwiGLU variant:
$$
\text{SwiGLU}(x) = (\text{Swish}(xW_1) \odot xW_3)W_2
$$
#### Model Parameter Count
For decoder-only transformer:
$$
P = 12 \times L \times d^2 + V \times d
$$
Where:
- $L$ = Number of layers
- $d$ = Model dimension
- $V$ = Vocabulary size
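A quick evaluation of this estimate is sketched below. The $12 L d^2$ term is commonly broken down as $4d^2$ for the attention projections plus $8d^2$ for a feed-forward block with hidden size $4d$, with biases and normalization parameters ignored. The shape used here is hypothetical.

```python
def decoder_param_count(n_layers, d_model, vocab_size):
    """Approximate parameter count: 12 * L * d^2 for the blocks plus V * d for embeddings."""
    per_layer = 4 * d_model**2 + 8 * d_model**2  # attention + FFN (hidden size 4d)
    return n_layers * per_layer + vocab_size * d_model


# Hypothetical shape: 32 layers, d = 4096, 32k vocabulary.
params = decoder_param_count(32, 4096, 32_000)
print(f"{params / 1e9:.2f}B parameters")
```

Architectures that use SwiGLU or GQA change the per-layer constant, so the result should be read as an order-of-magnitude estimate.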
#### Mixture of Experts (MoE)
$$
y = \sum_{i=1}^{N} G(x)_i \cdot E_i(x)
$$
Where $G(x)$ is the gating function:
$$
G(x) = \text{TopK}(\text{softmax}(xW_g))
$$
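The routing step can be sketched as follows. To stay close to the formula above, the gate takes the top-$k$ softmax probabilities without renormalizing them; many implementations renormalize, so treat that as a design choice. Experts are scalar functions here purely for illustration, and all names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(gate_logits, k):
    """Keep the k largest softmax probabilities; all other experts get weight 0."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return {i: probs[i] for i in top}


def moe_output(x, experts, gate_logits, k=2):
    """Weighted sum over the selected experts only."""
    gates = top_k_gate(gate_logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())


# Four toy experts; gate_logits stands in for x @ W_g.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
gate_logits = [2.0, 1.0, 0.1, -1.0]
gates = top_k_gate(gate_logits, 2)
y = moe_output(3.0, experts, gate_logits, k=2)
print(gates, y)
```

Only the selected experts are evaluated, which is how MoE layers raise parameter count without a proportional increase in compute per token.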
### 4.2 Pre-training
#### Training Objective
**Next-token prediction (autoregressive)**:
$$
\mathcal{L} = -\sum_{t=1}^{T} \log P(x_t \mid x_{<t})
$$
Defect density models such as the Poisson and negative binomial models relate defect counts to die area, predicting yield from defect measurements.
Layer multiple safety mechanisms for robustness.
Learn offset locations for attention.
Models that can deform to fit data.
Deformation fields warp canonical representations to model non-rigid motion.
Analyze partially degraded units.
Train ViT efficiently with distillation.
Use special markers to separate sections.
Demand control ventilation modulates outdoor air intake based on occupancy, reducing conditioning loads.
Demand forecasting predicts future material requirements based on production schedules and market trends.
Democratic co-learning trains multiple diverse models that teach each other through weighted voting on unlabeled examples.
Demographic parity: equal positive rates across groups. May conflict with accuracy. Choose metric carefully.
Model outputs same distribution for all demographic groups.
Core framework for diffusion-based generation.
Learn scores through denoising.
Control generation vs input preservation.
Generate captions for regions.
Standard model where all parameters activate for every input.
Learn attention patterns.
DenseNAS enables efficient one-shot search in dense spaces through dimension-extended sampling.
Depth conditioning guides generation using depth maps, preserving spatial structure.
Condition on depth information.
Depthwise convolutions apply separate filters per input channel without cross-channel interaction.
Depthwise separable convolutions factorize standard convolutions into depthwise and pointwise operations, reducing computation.
Depthwise temporal convolutions process each channel independently across time.
Desiccant dehumidification removes moisture using hygroscopic materials, reducing cooling loads.
Design for recycling optimizes products for material recovery and reuse at end-of-life.
Exception to design rule for special cases.
Ongoing competition between detectors and generators.
Eliminate non-deterministic operations.
Remove toxic or harmful content from generations.
# Device Physics & Mathematical Modeling
## 1. Fundamental Mathematical Structure
Semiconductor modeling is built on coupled nonlinear partial differential equations spanning multiple scales:

| Scale | Methods | Typical Equations |
|:------|:--------|:------------------|
| Quantum (< 1 nm) | DFT, Schrödinger | $H\psi = E\psi$ |
| Atomistic (1–100 nm) | MD, Kinetic Monte Carlo | Newton's equations, master equations |
| Continuum (nm–mm) | Drift-diffusion, FEM | PDEs (Poisson, continuity, heat) |
| Circuit | SPICE | ODEs, compact models |

### Multiscale Hierarchy
The mathematics forms a hierarchy of models through successive averaging:
$$
\boxed{\text{Schrödinger} \xrightarrow{\text{averaging}} \text{Boltzmann} \xrightarrow{\text{moments}} \text{Drift-Diffusion} \xrightarrow{\text{fitting}} \text{Compact Models}}
$$
## 2. Process Physics & Models
### 2.1 Oxidation: Deal-Grove Model
Thermal oxidation of silicon follows linear-parabolic kinetics:
$$
\frac{dx_{ox}}{dt} = \frac{B}{A + 2x_{ox}}
$$
where:
- $x_{ox}$ = oxide thickness
- $B/A$ = linear rate constant (surface-reaction limited)
- $B$ = parabolic rate constant (diffusion limited)

**Limiting Cases:**
- Thin oxide (reaction-limited):
$$
x_{ox} \approx \frac{B}{A} \cdot t
$$
- Thick oxide (diffusion-limited):
$$
x_{ox} \approx \sqrt{B \cdot t}
$$

**Physical Mechanism:**
1. O₂ transport from gas to oxide surface
2. O₂ diffusion through growing SiO₂ layer
3. Reaction at Si/SiO₂ interface: $\text{Si} + \text{O}_2 \rightarrow \text{SiO}_2$

> Note: This is a Stefan problem (moving boundary PDE).
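Integrating the rate equation with $x_{ox}(0) = 0$ gives $x_{ox}^2 + A x_{ox} = B t$, which can be solved directly for the thickness. The sketch below does that and checks both limiting cases. The values of $A$ and $B$ are illustrative only, and the optional `tau` argument (a time offset for an initial oxide) is an assumption added for convenience.

```python
import math


def oxide_thickness(t, a, b, tau=0.0):
    """Positive root of x^2 + a*x = b*(t + tau); units follow those of a and b."""
    return 0.5 * (-a + math.sqrt(a * a + 4.0 * b * (t + tau)))


# Illustrative parameters: a in um, b in um^2/h, t in hours.
A, B = 0.165, 0.0117

thin = oxide_thickness(1e-4, A, B)   # very short time: linear regime
thick = oxide_thickness(1e6, A, B)   # very long time: parabolic regime
print(thin / ((B / A) * 1e-4), thick / math.sqrt(B * 1e6))  # both ratios close to 1
```

The crossover between the two regimes occurs around $t \sim A^2 / (4B)$, where the two terms under the square root are comparable.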
### 2.2 Diffusion: Fick's Laws
Dopant redistribution follows Fick's second law:
$$
\frac{\partial C}{\partial t} = \nabla \cdot \left( D(C, T) \nabla C \right)
$$
For constant $D$ in 1D:
$$
\frac{\partial C}{\partial t} = D \frac{\partial^2 C}{\partial x^2}
$$
**Analytical Solutions (1D, constant D):**
- Constant surface concentration (infinite source):
$$
C(x,t) = C_s \cdot \text{erfc}\left( \frac{x}{2\sqrt{Dt}} \right)
$$
- Limited source (e.g., implant drive-in):
$$
C(x,t) = \frac{Q}{\sqrt{\pi D t}} \exp\left( -\frac{x^2}{4Dt} \right)
$$
where $Q$ = dose (atoms/cm²)

**Complications at High Concentrations:**
- Concentration-dependent diffusivity: $D = D(C)$
- Electric field effects: Charged point defects create internal fields
- Vacancy/interstitial mechanisms: Different diffusion pathways

With a field-driven drift flux included, the transport equation becomes:
$$
\frac{\partial C}{\partial t} = \frac{\partial}{\partial x}\left[ D(C) \frac{\partial C}{\partial x} + \mu C \frac{\partial \phi}{\partial x} \right]
$$
### 2.3 Ion Implantation: Range Theory
The implanted dopant profile is approximately Gaussian:
$$
C(x) = \frac{\Phi}{\sqrt{2\pi} \Delta R_p} \exp\left( -\frac{(x - R_p)^2}{2 (\Delta R_p)^2} \right)
$$
where:
- $\Phi$ = implant dose (ions/cm²)
- $R_p$ = projected range (mean depth)
- $\Delta R_p$ = straggle (standard deviation)

LSS Theory (Lindhard-Scharff-Schiøtt) predicts stopping power:
$$
-\frac{dE}{dx} = N \left[ S_n(E) + S_e(E) \right]
$$
where:
- $S_n(E)$ = nuclear stopping power (dominant at low energy)
- $S_e(E)$ = electronic stopping power (dominant at high energy)
- $N$ = target atomic density

For asymmetric profiles, the Pearson IV distribution is used:
$$
C(x) = \frac{\Phi \cdot K}{\Delta R_p} \left[ 1 + \left( \frac{x - R_p}{a} \right)^2 \right]^{-m} \exp\left[ -\nu \arctan\left( \frac{x - R_p}{a} \right) \right]
$$
> Modern approach: Monte Carlo codes (SRIM/TRIM) for accurate profiles including channeling effects.
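The two analytical diffusion solutions from Section 2.2 are easy to evaluate with the standard library. The sketch below also integrates the limited-source profile numerically to confirm that it conserves the dose $Q$. The diffusivity, anneal time, dose, and surface concentration are hypothetical values in cm and seconds.

```python
import math


def constant_source_profile(x, c_s, d, t):
    """erfc profile for a fixed surface concentration c_s."""
    return c_s * math.erfc(x / (2.0 * math.sqrt(d * t)))


def limited_source_profile(x, q, d, t):
    """Gaussian profile for a fixed dose q initially at the surface."""
    return q / math.sqrt(math.pi * d * t) * math.exp(-x * x / (4.0 * d * t))


# Hypothetical drive-in: D = 1e-13 cm^2/s for one hour, dose 1e14 cm^-2.
D, T, Q = 1e-13, 3600.0, 1e14
length = 2.0 * math.sqrt(D * T)  # characteristic diffusion length in cm

# Trapezoidal integration of the limited-source profile from 0 to 10 lengths.
steps = 2000
dx = 10.0 * length / steps
samples = [limited_source_profile(i * dx, Q, D, T) for i in range(steps + 1)]
dose = dx * (sum(samples) - 0.5 * (samples[0] + samples[-1]))
print(dose / Q)  # close to 1: the dose is conserved
```

The constant-source case, by contrast, keeps adding dopant over time, so its integrated dose grows as $\sqrt{Dt}$.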
### 2.4 Lithography: Optical Imaging
Aerial image formation follows Hopkins' partially coherent imaging theory:
$$
I(\mathbf{r}) = \iint TCC(f, f') \cdot \tilde{M}(f) \cdot \tilde{M}^*(f') \cdot e^{2\pi i (f - f') \cdot \mathbf{r}} \, df \, df'
$$
where:
- $TCC$ = Transmission Cross-Coefficient
- $\tilde{M}(f)$ = mask spectrum (Fourier transform of mask pattern)
- $\mathbf{r}$ = position in image plane

**Fundamental Limits:**
- Rayleigh resolution criterion:
$$
CD_{\min} = k_1 \frac{\lambda}{NA}
$$
- Depth of focus:
$$
DOF = k_2 \frac{\lambda}{NA^2}
$$
where:
- $\lambda$ = wavelength (193 nm for ArF, 13.5 nm for EUV)
- $NA$ = numerical aperture
- $k_1, k_2$ = process-dependent factors

**Resist Modeling: Dill Equations**
$$
\frac{\partial M}{\partial t} = -C \cdot I(z) \cdot M
$$
$$
\frac{dI}{dz} = -(\alpha M + \beta) I
$$
where $M$ = photoactive compound concentration.
### 2.5 Etching & Deposition: Surface Evolution
Topography evolution is modeled with the level set method:
$$
\frac{\partial \phi}{\partial t} + V |\nabla \phi| = 0
$$
where:
- $\phi(\mathbf{r}, t) = 0$ defines the surface
- $V$ = local velocity (etch rate or deposition rate)

For anisotropic etching:
$$
V = V(\theta, \phi, \text{ion flux}, \text{chemistry})
$$
**CVD in High Aspect Ratio Features:** Knudsen diffusion limits step coverage:
$$
\frac{\partial C}{\partial t} = D_K \nabla^2 C - k_s C \cdot \delta_{\text{surface}}
$$
where:
- $D_K = \frac{d}{3}\sqrt{\frac{8k_BT}{\pi m}}$ (Knudsen diffusivity)
- $d$ = feature width
- $k_s$ = surface reaction rate

**ALD (Atomic Layer Deposition):** Self-limiting surface reactions follow Langmuir kinetics:
$$
\theta = \frac{K \cdot P}{1 + K \cdot P}
$$
where $\theta$ = surface coverage, $P$ = precursor partial pressure.
## 3. Device Physics: Semiconductor Equations
The core mathematical framework for device simulation consists of three coupled PDEs:
### 3.1 Poisson's Equation (Electrostatics)
$$
\nabla \cdot (\varepsilon \nabla \psi) = -q \left( p - n + N_D^+ - N_A^- \right)
$$
where:
- $\psi$ = electrostatic potential
- $n, p$ = electron and hole concentrations
- $N_D^+, N_A^-$ = ionized donor and acceptor concentrations
### 3.2 Continuity Equations (Carrier Conservation)
Electrons:
$$
\frac{\partial n}{\partial t} = \frac{1}{q} \nabla \cdot \mathbf{J}_n + G - R
$$
Holes:
$$
\frac{\partial p}{\partial t} = -\frac{1}{q} \nabla \cdot \mathbf{J}_p + G - R
$$
where:
- $G$ = generation rate
- $R$ = recombination rate
### 3.3 Current Density Equations (Transport)
**Drift-Diffusion Model:**
$$
\mathbf{J}_n = q \mu_n n \mathbf{E} + q D_n \nabla n
$$
$$
\mathbf{J}_p = q \mu_p p \mathbf{E} - q D_p \nabla p
$$
**Einstein Relation** (here $V_T$ denotes the thermal voltage, not the threshold voltage of Section 3.5):
$$
\frac{D_n}{\mu_n} = \frac{D_p}{\mu_p} = \frac{k_B T}{q} = V_T
$$
### 3.4 Recombination Models
**Shockley-Read-Hall (SRH) Recombination:**
$$
R_{SRH} = \frac{np - n_i^2}{\tau_p (n + n_1) + \tau_n (p + p_1)}
$$
**Auger Recombination:**
$$
R_{Auger} = C_n n (np - n_i^2) + C_p p (np - n_i^2)
$$
**Radiative Recombination:**
$$
R_{rad} = B (np - n_i^2)
$$
### 3.5 MOSFET Physics
**Threshold Voltage:**
$$
V_T = V_{FB} + 2\phi_B + \frac{\sqrt{2 \varepsilon_{Si} q N_A (2\phi_B)}}{C_{ox}}
$$
where:
- $V_{FB}$ = flat-band voltage
- $\phi_B = \frac{k_BT}{q} \ln\left(\frac{N_A}{n_i}\right)$ = bulk potential
- $C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}}$ = oxide capacitance per unit area

**Drain Current (Gradual Channel Approximation):**
- Linear region ($V_{DS} < V_{GS} - V_T$):
$$
I_D = \frac{W}{L} \mu_n C_{ox} \left[ (V_{GS} - V_T) V_{DS} - \frac{V_{DS}^2}{2} \right]
$$
- Saturation region ($V_{DS} \geq V_{GS} - V_T$):
$$
I_D = \frac{W}{2L} \mu_n C_{ox} (V_{GS} - V_T)^2
$$
## 4. Quantum Effects at Nanoscale
For modern devices with gate lengths $L_g < 10$ nm, classical models fail.
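For reference, the classical picture that breaks down at these dimensions is the gradual-channel drain current of Section 3.5, which can be sketched in a few lines. The lumped gain `k` (standing for $(W/L)\,\mu_n C_{ox}$), the bias values, and the zero current below threshold are simplifying assumptions for illustration.

```python
def drain_current(v_gs, v_ds, v_t, k):
    """Long-channel MOSFET drain current; k stands for (W/L) * mu_n * C_ox."""
    v_ov = v_gs - v_t                 # overdrive voltage
    if v_ov <= 0:
        return 0.0                    # assumption: subthreshold current ignored
    if v_ds < v_ov:
        return k * (v_ov * v_ds - v_ds**2 / 2)   # linear region
    return 0.5 * k * v_ov**2                     # saturation region


# Hypothetical device: k = 1 mA/V^2, threshold 0.5 V.
i_lin = drain_current(1.5, 0.2, 0.5, 1e-3)
i_sat = drain_current(1.5, 1.2, 0.5, 1e-3)
print(i_lin, i_sat)
```

The two branches meet continuously at $V_{DS} = V_{GS} - V_T$, which is a quick sanity check for any implementation of this model.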
### 4.1 Quantum Confinement
In thin silicon channels, carrier energy becomes quantized:
$$
E_n = \frac{\hbar^2 \pi^2 n^2}{2 m^* t_{Si}^2}
$$
where:
- $n$ = quantum number (1, 2, 3, ...)
- $m^*$ = effective mass
- $t_{Si}$ = silicon body thickness

Effects:
- Increased threshold voltage
- Modified density of states: $g_{2D}(E) = \frac{m^*}{\pi \hbar^2}$ per subband (step function)
### 4.2 Quantum Tunneling
**Gate Leakage (Direct Tunneling):** WKB approximation:
$$
T \approx \exp\left( -2 \int_0^{t_{ox}} \kappa(x) \, dx \right)
$$
where $\kappa = \sqrt{\frac{2m^*(\Phi_B - E)}{\hbar^2}}$

**Source-Drain Tunneling:** Sets a floor on the achievable OFF-state current in ultra-short channels.

**Band-to-Band Tunneling:** Enables Tunnel FETs (TFETs):
$$
I_{BTBT} \propto \exp\left( -\frac{4\sqrt{2m^*} E_g^{3/2}}{3q\hbar |\mathbf{E}|} \right)
$$
### 4.3 Ballistic Transport
When channel length $L < \lambda_{mfp}$ (mean free path), the Landauer formalism applies:
$$
I = \frac{2q}{h} \int T(E) \left[ f_S(E) - f_D(E) \right] dE
$$
where:
- $T(E)$ = transmission probability
- $f_S, f_D$ = source and drain Fermi functions

**Ballistic Conductance Quantum:**
$$
G_0 = \frac{2q^2}{h} \approx 77.5 \, \mu\text{S}
$$
### 4.4 NEGF Formalism
The Non-Equilibrium Green's Function method is the gold standard for quantum transport:
$$
G^R = \left[ EI - H - \Sigma_1 - \Sigma_2 \right]^{-1}
$$
where:
- $H$ = device Hamiltonian
- $\Sigma_1, \Sigma_2$ = contact self-energies
- $G^R$ = retarded Green's function

Observables:
- Electron density: $n(\mathbf{r}) = -\frac{i}{2\pi} \int G^<(\mathbf{r}, \mathbf{r}; E) \, dE$
- Current: $I = \frac{2q}{h} \int \text{Tr}[\Gamma_1 G^R \Gamma_2 G^A] \left[ f_S(E) - f_D(E) \right] dE$, so that $T(E) = \text{Tr}[\Gamma_1 G^R \Gamma_2 G^A]$ in the Landauer formula, with $\Gamma_{1,2} = i(\Sigma_{1,2} - \Sigma_{1,2}^\dagger)$ and $G^A = (G^R)^\dagger$
## 5. Numerical Methods
### 5.1 Discretization: Scharfetter-Gummel Scheme
The drift-diffusion current requires special treatment to avoid numerical instability:
$$
J_{n,i+1/2} = \frac{q D_n}{h} \left[ n_{i+1} B\left( -\frac{\Delta \psi}{V_T} \right) - n_i B\left( \frac{\Delta \psi}{V_T} \right) \right]
$$
where $\Delta\psi = \psi_i - \psi_{i+1}$, $h$ is the mesh spacing (not Planck's constant), $V_T = k_B T / q$ is the thermal voltage, and the Bernoulli function is:
$$
B(x) = \frac{x}{e^x - 1}
$$
Properties:
- $B(0) = 1$
- $B(x) \to 0$ as $x \to \infty$
- $B(-x) = x + B(x)$
### 5.2 Solution Strategies
**Gummel Iteration (Decoupled):**
1. Solve Poisson for $\psi$ (fixed $n$, $p$)
2. Solve electron continuity for $n$ (fixed $\psi$, $p$)
3. Solve hole continuity for $p$ (fixed $\psi$, $n$)
4. Repeat until convergence

**Newton-Raphson (Fully Coupled):** Solve the Jacobian system:
$$
\begin{pmatrix} \frac{\partial F_\psi}{\partial \psi} & \frac{\partial F_\psi}{\partial n} & \frac{\partial F_\psi}{\partial p} \\ \frac{\partial F_n}{\partial \psi} & \frac{\partial F_n}{\partial n} & \frac{\partial F_n}{\partial p} \\ \frac{\partial F_p}{\partial \psi} & \frac{\partial F_p}{\partial n} & \frac{\partial F_p}{\partial p} \end{pmatrix} \begin{pmatrix} \delta \psi \\ \delta n \\ \delta p \end{pmatrix} = - \begin{pmatrix} F_\psi \\ F_n \\ F_p \end{pmatrix}
$$
### 5.3 Time Integration
**Stiffness Problem:** Time scales span ~15 orders of magnitude:

| Process | Time Scale |
|:--------|:-----------|
| Carrier relaxation | ~ps |
| Thermal response | ~μs–ms |
| Dopant diffusion | min–hours |

Solution: Use implicit methods (Backward Euler, BDF).
### 5.4 Mesh Requirements
**Debye Length Constraint:** The mesh must resolve the Debye length:
$$
\lambda_D = \sqrt{\frac{\varepsilon k_B T}{q^2 n}}
$$
For $n = 10^{18}$ cm⁻³ in silicon at 300 K: $\lambda_D \approx 4$ nm

**Adaptive Mesh Refinement:**
- Refine near junctions, interfaces, corners
- Coarsen in bulk regions
- Use Delaunay triangulation for quality
## 6. Compact Models for Circuit Simulation
For SPICE-level simulation, physics is abstracted into algebraic/empirical equations.
### Industry Standard Models

| Model | Device | Key Features |
|:------|:-------|:-------------|
| BSIM4 | Planar MOSFET | ~300 parameters, channel length modulation |
| BSIM-CMG | FinFET | Tri-gate geometry, quantum effects |
| BSIM-GAA | Nanosheet | Stacked channels, sheet width |
| PSP | Bulk MOSFET | Surface-potential-based |

### Key Physics Captured
- Short-channel effects: DIBL, $V_T$ roll-off
- Quantum corrections: Inversion layer quantization
- Mobility degradation: Surface scattering, velocity saturation
- Parasitic effects: Series resistance, overlap capacitance
- Variability: Statistical mismatch models
### Threshold Voltage Variability (Pelgrom's Law)
$$
\sigma_{V_T} = \frac{A_{VT}}{\sqrt{W \cdot L}}
$$
where $A_{VT}$ is a technology-dependent constant.
## 7. TCAD Co-Simulation Workflow
The complete semiconductor design flow:
```text
┌─────────────────────────────────────────────────────────────┐
│  ┌───────────────┐   ┌───────────────┐   ┌───────────────┐  │
│  │   Process     │──▶│    Device     │──▶│   Parameter   │  │
│  │  Simulation   │   │  Simulation   │   │  Extraction   │  │
│  │  (Sentaurus)  │   │  (Sentaurus)  │   │  (BSIM Fit)   │  │
│  └───────────────┘   └───────────────┘   └───────────────┘  │
│          │                   │                   │          │
│          ▼                   ▼                   ▼          │
│  ┌───────────────┐   ┌───────────────┐   ┌───────────────┐  │
│  │• Implantation │   │• I-V, C-V     │   │• BSIM params  │  │
│  │• Diffusion    │   │• Breakdown    │   │• Corner extr. │  │
│  │• Oxidation    │   │• Hot carrier  │   │• Variability  │  │
│  │• Etching      │   │• Noise        │   │  statistics   │  │
│  └───────────────┘   └───────────────┘   └───────────────┘  │
│                                                  │          │
│                                                  ▼          │
│                                          ┌───────────────┐  │
│                                          │    Circuit    │  │
│                                          │  Simulation   │  │
│                                          │(SPICE,Spectre)│  │
│                                          └───────────────┘  │
└─────────────────────────────────────────────────────────────┘
```
**Key Challenge:** Propagating variability through the entire chain:
- Line Edge Roughness (LER)
- Random Dopant Fluctuation (RDF)
- Work function variation
- Thickness variations
## 8. Mathematical Frontiers
### 8.1 Machine Learning + Physics
- Physics-Informed Neural Networks (PINNs):
$$
\mathcal{L} = \mathcal{L}_{data} + \lambda \mathcal{L}_{physics}
$$
where $\mathcal{L}_{physics}$ enforces PDE residuals.
- Surrogate models for expensive TCAD simulations
- Inverse design and topology optimization
- Defect prediction in manufacturing
### 8.2 Stochastic Modeling
**Random Dopant Fluctuation:**
$$
\sigma_{V_T} \propto \frac{t_{ox} \, N_A^{1/4}}{\sqrt{W \cdot L}}
$$
Approaches:
- Atomistic Monte Carlo (place individual dopants)
- Statistical impedance field method
- Compact model statistical extensions
### 8.3 Multiphysics Coupling
**Electro-Thermal Self-Heating:**
$$
\rho C_p \frac{\partial T}{\partial t} = \nabla \cdot (\kappa \nabla T) + \mathbf{J} \cdot \mathbf{E}
$$
**Stress Effects on Mobility (Piezoresistance):**
$$
\frac{\Delta \mu}{\mu_0} = \pi_L \sigma_L + \pi_T \sigma_T
$$
**Electromigration in Interconnects:**
$$
\mathbf{J}_{atoms} = \frac{D C}{k_B T} \left( Z^* q \mathbf{E} - \Omega \nabla \sigma \right)
$$
### 8.4 Atomistic-Continuum Bridging
Strategies:
- Coarse-graining from MD/DFT
- Density gradient quantum corrections:
$$
V_{QM} = \frac{\gamma \hbar^2}{12 m^*} \frac{\nabla^2 \sqrt{n}}{\sqrt{n}}
$$
- Hybrid methods: atomistic core + continuum far-field

The mathematics of semiconductor manufacturing and device physics encompasses:
$$
\boxed{ \begin{aligned} &\text{Process:} && \text{Stefan problems, diffusion PDEs, reaction kinetics} \\ &\text{Device:} && \text{Coupled Poisson + continuity equations} \\ &\text{Quantum:} && \text{Schrödinger, NEGF, tunneling} \\ &\text{Numerical:} && \text{FEM/FDM, Scharfetter-Gummel, Newton iteration} \\ &\text{Circuit:} && \text{Compact models (BSIM), variability statistics} \end{aligned} }
$$
Each level trades accuracy for computational tractability. The art lies in knowing when each approximation breaks down, and modern scaling is pushing toward the quantum limit where classical continuum models become inadequate.
# Device Physics, TCAD, and Mathematical Modeling
## 1. Physical Foundation
### 1.1 Band Theory and Electronic Structure
- Energy bands arise from the periodic potential of the crystal lattice
  - Conduction band (empty states available for electron transport)
  - Valence band (filled states; holes represent missing electrons)
  - Bandgap $E_g$ separates these bands (Si: ~1.12 eV at 300K)
- Effective mass approximation
  - Electrons and holes behave as quasi-particles with modified mass
  - Electron effective mass: $m_n^*$
  - Hole effective mass: $m_p^*$
- Carrier statistics follow the Fermi-Dirac distribution:
$$
f(E) = \frac{1}{1 + \exp\left(\frac{E - E_F}{k_B T}\right)}
$$
- Carrier concentrations in non-degenerate semiconductors:
$$
n = N_C \exp\left(-\frac{E_C - E_F}{k_B T}\right)
$$
$$
p = N_V \exp\left(-\frac{E_F - E_V}{k_B T}\right)
$$
Where:
- $N_C$, $N_V$ = effective density of states in conduction/valence bands
- $E_C$, $E_V$ = conduction/valence band edges
- $E_F$ = Fermi level
### 1.2 Carrier Transport Mechanisms

| Mechanism | Driving Force | Current Density |
|-----------|---------------|-----------------|
| Drift | Electric field $\mathbf{E}$ | $\mathbf{J} = qn\mu\mathbf{E}$ |
| Diffusion | Concentration gradient | $\mathbf{J} = qD\nabla n$ |
| Thermionic emission | Thermal energy over barrier | Exponential in $\phi_B/k_BT$ |
| Tunneling | Quantum penetration | Exponential in barrier |

- Einstein relation connects mobility and diffusivity:
$$
D = \frac{k_B T}{q} \mu
$$
### 1.3 Generation and Recombination
- Thermal equilibrium condition:
$$
np = n_i^2
$$
- Three primary recombination mechanisms:
  1. Shockley-Read-Hall (SRH): trap-assisted
  2. Auger: three-particle process (dominant at high injection)
  3. Radiative: photon emission (important in direct bandgap materials)
## 2. Mathematical Hierarchy
### 2.1 Quantum Mechanical Level (Most Fundamental)
**Time-Independent Schrödinger Equation**
$$
\left[-\frac{\hbar^2}{2m^*}\nabla^2 + V(\mathbf{r})\right]\psi = E\psi
$$
Where:
- $\hbar$ = reduced Planck constant
- $m^*$ = effective mass
- $V(\mathbf{r})$ = potential energy
- $\psi$ = wavefunction
- $E$ = energy eigenvalue

**Non-Equilibrium Green's Function (NEGF)**
For open quantum systems (nanoscale devices, tunneling):
$$
G^R = [EI - H - \Sigma]^{-1}
$$
- $G^R$ = retarded Green's function
- $H$ = device Hamiltonian
- $\Sigma$ = self-energy (encodes contact coupling)

Applications:
- Tunnel FETs
- Ultra-scaled MOSFETs ($L_g < 10$ nm)
- Quantum well devices
- Resonant tunneling diodes
### 2.2 Boltzmann Transport Level
**Boltzmann Transport Equation (BTE)**
$$
\frac{\partial f}{\partial t} + \mathbf{v} \cdot \nabla_{\mathbf{r}} f + \frac{\mathbf{F}}{\hbar} \cdot \nabla_{\mathbf{k}} f = \left(\frac{\partial f}{\partial t}\right)_{\text{coll}}
$$
Where:
- $f(\mathbf{r}, \mathbf{k}, t)$ = distribution function in phase space
- $\mathbf{v}$ = group velocity
- $\mathbf{F}$ = external force
- RHS = collision integral

Solution Methods:
- Monte Carlo (stochastic particle tracking)
- Spherical Harmonics Expansion (SHE)
- Moments methods → leads to drift-diffusion, hydrodynamic

Captures:
- Hot carrier effects
- Velocity overshoot
- Non-equilibrium distributions
- Ballistic transport
### 2.3 Hydrodynamic / Energy Balance Level
Derived from moments of the BTE with carrier temperature as a variable:
$$
\frac{\partial (nw)}{\partial t} + \nabla \cdot \mathbf{S} = \mathbf{J} \cdot \mathbf{E} - \frac{n(w - w_0)}{\tau_w}
$$
- $w$ = carrier energy density
- $\mathbf{S}$ = energy flux
- $\tau_w$ = energy relaxation time
- $w_0$ = equilibrium energy density

Key feature: Carrier temperature $T_n \neq$ lattice temperature $T_L$
### 2.4 Drift-Diffusion Level (The Workhorse)
The most widely used TCAD formulation, consisting of three coupled PDEs:

**Poisson's Equation (Electrostatics)**
$$
\nabla \cdot (\varepsilon \nabla \psi) = -\rho = -q(p - n + N_D^+ - N_A^-)
$$
- $\psi$ = electrostatic potential
- $\varepsilon$ = permittivity
- $\rho$ = charge density
- $N_D^+$, $N_A^-$ = ionized donor/acceptor concentrations

**Electron Continuity Equation**
$$
\frac{\partial n}{\partial t} = \frac{1}{q}\nabla \cdot \mathbf{J}_n + G_n - R_n
$$
**Hole Continuity Equation**
$$
\frac{\partial p}{\partial t} = -\frac{1}{q}\nabla \cdot \mathbf{J}_p + G_p - R_p
$$
**Current Density Equations**
Standard form:
$$
\mathbf{J}_n = q\mu_n n \mathbf{E} + qD_n \nabla n
$$
$$
\mathbf{J}_p = q\mu_p p \mathbf{E} - qD_p \nabla p
$$
Quasi-Fermi level formulation:
$$
\mathbf{J}_n = q\mu_n n \nabla E_{F,n}
$$
$$
\mathbf{J}_p = q\mu_p p \nabla E_{F,p}
$$
System characteristics:
- Coupled, nonlinear, elliptic-parabolic PDEs
- Carrier concentrations vary exponentially with potential
- Spans 10+ orders of magnitude across junctions
## 3. Numerical Methods
### 3.1 Spatial Discretization
**Finite Difference Method (FDM)**
- Simple implementation
- Limited to structured (rectangular) grids
- Box integration for conservation

**Finite Element Method (FEM)**
- Handles complex geometries
- Basis function expansion
- Weak (variational) formulation

**Finite Volume Method (FVM)**
- Ensures local conservation
- Natural for semiconductor equations
- Control volume integration
### 3.2 Scharfetter-Gummel Discretization
Critical for numerical stability; handles exponential carrier variations:
$$
J_{n,i+\frac{1}{2}} = \frac{qD_n}{h}\left[n_{i+1} B\left(\frac{\psi_{i+1} - \psi_i}{V_T}\right) - n_i B\left(\frac{\psi_i - \psi_{i+1}}{V_T}\right)\right]
$$
Where $h$ is the mesh spacing and the Bernoulli function is:
$$
B(x) = \frac{x}{e^x - 1}
$$
Properties:
- Reduces to central difference for small $\Delta\psi$
- Reduces to upwind for large $\Delta\psi$
- Prevents spurious oscillations
- Thermal voltage: $V_T = k_B T / q \approx 26$ mV at 300K
### 3.3 Mesh Generation
- 2D: Delaunay triangulation
- 3D: Tetrahedral meshing

Adaptive refinement criteria:
- Junction regions (high field gradients)
- Oxide interfaces
- Contact regions
- High current density areas

Quality metrics:
- Aspect ratio
- Orthogonality (important for FVM)
- Delaunay property (circumsphere criterion)
### 3.4 Nonlinear Solvers
**Gummel Iteration (Decoupled)**
Repeat until convergence:
1. Solve Poisson equation → $\psi$
2. Solve electron continuity → $n$
3. Solve hole continuity → $p$

Pros:
- Simple implementation
- Robust for moderate bias
- Each subproblem is smaller

Cons:
- Poor convergence at high injection
- Slow for strongly coupled systems

**Newton-Raphson (Fully Coupled)**
Solve the linearized system:
$$
\mathbf{J} \cdot \delta\mathbf{x} = -\mathbf{F}(\mathbf{x})
$$
Where:
- $\mathbf{J}$ = Jacobian matrix $\partial \mathbf{F}/\partial \mathbf{x}$
- $\mathbf{F}$ = residual vector
- $\delta\mathbf{x}$ = update vector

Pros:
- Quadratic convergence near solution
- Handles strong coupling

Cons:
- Requires good initial guess
- Expensive Jacobian assembly
- Larger linear systems

**Hybrid Methods**
- Start with Gummel to get close
- Switch to Newton for fast final convergence
### 3.5 Linear Solvers
For large, sparse, ill-conditioned Jacobian systems:

| Method | Type | Characteristics |
|--------|------|-----------------|
| LU (PARDISO, UMFPACK) | Direct | Robust, memory-intensive |
| GMRES | Iterative | Krylov subspace, needs preconditioning |
| BiCGSTAB | Iterative | Non-symmetric systems |
| Multigrid | Iterative | Optimal for Poisson-like equations |

## 4. Physical Models in TCAD
### 4.1 Mobility Models
**Matthiessen's Rule**
Combines independent scattering mechanisms:
$$
\frac{1}{\mu} = \frac{1}{\mu_{\text{lattice}}} + \frac{1}{\mu_{\text{impurity}}} + \frac{1}{\mu_{\text{surface}}} + \cdots
$$
**Lattice Scattering**
$$
\mu_L = \mu_0 \left(\frac{T}{300}\right)^{-\alpha}
$$
- Si electrons: $\alpha \approx 2.4$
- Si holes: $\alpha \approx 2.2$

**Ionized Impurity Scattering**
Brooks-Herring model:
$$
\mu_I \propto \frac{T^{3/2}}{N_I \left[\ln(1 + b^2) - \frac{b^2}{1+b^2}\right]}
$$
**High-Field Saturation (Caughey-Thomas)**
$$
\mu(E) = \frac{\mu_0}{\left[1 + \left(\frac{\mu_0 E}{v_{\text{sat}}}\right)^\beta\right]^{1/\beta}}
$$
- $v_{\text{sat}}$ = saturation velocity (~$10^7$ cm/s for Si)
- $\beta$ = fitting parameter (~2 for electrons, ~1 for holes)
### 4.2 Recombination Models
**Shockley-Read-Hall (SRH)**
$$
R_{\text{SRH}} = \frac{np - n_i^2}{\tau_p(n + n_1) + \tau_n(p + p_1)}
$$
Where:
- $\tau_n$, $\tau_p$ = carrier lifetimes
- $n_1 = n_i \exp[(E_t - E_i)/k_BT]$
- $p_1 = n_i \exp[(E_i - E_t)/k_BT]$
- $E_t$ = trap energy level

**Auger Recombination**
$$
R_{\text{Auger}} = (C_n n + C_p p)(np - n_i^2)
$$
- $C_n$, $C_p$ = Auger coefficients (~$10^{-31}$ cm$^6$/s for Si)
- Dominant at high carrier densities ($>10^{18}$ cm$^{-3}$)

**Radiative Recombination**
$$
R_{\text{rad}} = B(np - n_i^2)
$$
- $B$ = radiative coefficient
- Important in direct bandgap materials (GaAs, InP)
### 4.3 Band-to-Band Tunneling
For tunnel FETs, Zener diodes:
$$
G_{\text{BTBT}} = A \cdot E^2 \exp\left(-\frac{B}{E}\right)
$$
- $A$, $B$ = material-dependent parameters
- $E$ = electric field magnitude
### 4.4 Quantum Corrections
**Density Gradient Method**
Adds quantum potential to classical equations:
$$
V_Q = -\frac{\hbar^2}{6m^*} \frac{\nabla^2\sqrt{n}}{\sqrt{n}}
$$
Or equivalently, the quantum potential term:
$$
\Lambda_n = \frac{\hbar^2}{12 m_n^* k_B T} \nabla^2 \ln(n)
$$
Applications:
- Inversion layer quantization in MOSFETs
- Thin body SOI devices
- FinFETs, nanowires

**1D Schrödinger-Poisson**
For stronger quantum confinement:
1. Solve 1D Schrödinger in confinement direction → subbands $E_i$, $\psi_i$
2. Calculate 2D density of states
3. Compute carrier density from subband occupation
4. Solve 2D Poisson with quantum charge
5. Iterate to self-consistency
### 4.5 Bandgap Narrowing
At high doping ($N > 10^{17}$ cm$^{-3}$):
$$
\Delta E_g = A \cdot N^{1/3} + B \cdot \ln\left(\frac{N}{N_{\text{ref}}}\right)
$$
Effect: Increases $n_i^2$ → affects recombination and device characteristics
### 4.6 Interface Models
- Interface trap density: $D_{it}(E)$, in states per cm$^2$·eV
- Oxide charges:
  - Fixed oxide charge $Q_f$
  - Mobile ionic charge $Q_m$
  - Oxide trapped charge $Q_{ot}$
  - Interface trapped charge $Q_{it}$
## 5. Process TCAD
### 5.1 Ion Implantation
**Monte Carlo Method**
- Track individual ion trajectories
- Binary collision approximation
- Accurate for low doses, complex geometries

**Analytical Profiles**
Gaussian:
$$
N(x) = \frac{\Phi}{\sqrt{2\pi}\Delta R_p} \exp\left[-\frac{(x - R_p)^2}{2\Delta R_p^2}\right]
$$
- $\Phi$ = dose (ions/cm$^2$)
- $R_p$ = projected range
- $\Delta R_p$ = straggle

Pearson IV: Adds skewness and kurtosis for better accuracy
### 5.2 Diffusion
Fick's First Law:
$$
\mathbf{J} = -D \nabla C
$$
Fick's Second Law:
$$
\frac{\partial C}{\partial t} = \nabla \cdot (D \nabla C)
$$
Concentration-dependent diffusion:
$$
D = D_i \left(\frac{n}{n_i}\right)^2 + D_v + D_x \left(\frac{n}{n_i}\right)
$$
(Accounts for charged point defects)
### 5.3 Oxidation
Deal-Grove Model:
$$
x_{ox}^2 + A \cdot x_{ox} = B(t + \tau)
$$
- $x_{ox}$ = oxide thickness
- $A$, $B$ = temperature-dependent parameters
- $\tau$ = time offset accounting for any initial oxide
- Linear regime: $x_{ox} \approx (B/A) \cdot t$ (thin oxide)
- Parabolic regime: $x_{ox} \approx \sqrt{B \cdot t}$ (thick oxide)
### 5.4 Etching and Deposition
Level-set method for surface evolution:
$$
\frac{\partial \phi}{\partial t} + v_n |\nabla \phi| = 0
$$
- $\phi$ = level-set function (zero contour = surface)
- $v_n$ = normal velocity (etch/deposition rate)
## 6. Multiphysics and Advanced Topics
### 6.1 Electrothermal Coupling
Heat equation:
$$
\rho c_p \frac{\partial T}{\partial t} = \nabla \cdot (\kappa \nabla T) + H
$$
Heat generation:
$$
H = \mathbf{J} \cdot \mathbf{E} + (R - G)(E_g + 3k_BT)
$$
- First term: Joule heating
- Second term: recombination heating

Thermoelectric effects:
- Seebeck effect
- Peltier effect
- Thomson effect
### 6.2 Electromechanical Coupling
Strain effects on mobility:
$$
\mu_{\text{strained}} = \mu_0 (1 + \Pi \cdot \sigma)
$$
- $\Pi$ = piezoresistance coefficient
- $\sigma$ = mechanical stress

Applications: Strained Si, SiGe channels
### 6.3 Statistical Variability
Sources of random variation:
- Random Dopant Fluctuations (RDF): discrete dopant positions
- Line Edge Roughness (LER): gate patterning variation
- Metal Gate Granularity (MGG): work function variation
- Oxide Thickness Variation (OTV)

Simulation approach:
- Monte Carlo sampling over device instances
- Statistical TCAD → threshold voltage distributions
### 6.4 Reliability Modeling
Bias Temperature Instability (BTI):
- Defect generation at Si/SiO$_2$ interface
- Reaction-diffusion models

Hot Carrier Injection (HCI):
- High-energy carriers damage interface
- Coupled with energy transport
### 6.5 Noise Modeling
Noise sources:
- Thermal noise: $S_I = 4k_BT/R$
- Shot noise: $S_I = 2qI$
- 1/f noise (flicker): $S_I \propto I^2/(f \cdot N)$

Impedance field method for spatial correlation

7.
Computational Architecture 7.1 Model Hierarchy Comparison | Level | Physics | Math | Cost | Accuracy | |-------|---------|------|------|----------| | NEGF | Quantum coherence | $G = [E-H-\Sigma]^{-1}$ | $$$$$ | Highest | | Monte Carlo | Distribution function | Stochastic DEs | $$$$ | High | | Hydrodynamic | Carrier temperature | Hyperbolic-parabolic PDEs | $$$ | Good | | Drift-Diffusion | Continuum transport | Elliptic-parabolic PDEs | $$ | Moderate | | Compact Models | Empirical | Algebraic | $ | Calibrated | 7.2 Software Architecture ```text ┌─────────────────────────────────────────┐ │ User Interface (GUI) │ ├─────────────────────────────────────────┤ │ Structure Definition │ │ (Geometry, Mesh, Materials) │ ├─────────────────────────────────────────┤ │ Physical Models │ │ (Mobility, Recombination, Quantum) │ ├─────────────────────────────────────────┤ │ Numerical Engine │ │ (Discretization, Solvers, Linear Alg) │ ├─────────────────────────────────────────┤ │ Post-Processing │ │ (Visualization, Parameter Extraction) │ └─────────────────────────────────────────┘ ``` 7.3 TCAD ↔ Compact Model Flow ```text ┌──────────┐ calibrate ┌──────────────┐ │ TCAD │ ──────────────► │ Compact Model│ │(Physics) │ │ (BSIM,PSP) │ └──────────┘ └──────────────┘ │ │ │ validate │ enable ▼ ▼ ┌──────────┐ ┌──────────────┐ │ Silicon │ │ Circuit │ │ Data │ │ Simulation │ └──────────┘ └──────────────┘ ``` Equations: Fundamental Constants | Symbol | Name | Value | |--------|------|-------| | $q$ | Elementary charge | $1.602 \times 10^{-19}$ C | | $k_B$ | Boltzmann constant | $1.381 \times 10^{-23}$ J/K | | $\hbar$ | Reduced Planck | $1.055 \times 10^{-34}$ J·s | | $\varepsilon_0$ | Vacuum permittivity | $8.854 \times 10^{-12}$ F/m | | $V_T$ | Thermal voltage (300K) | 25.9 mV | Silicon Properties (300K) | Property | Value | |----------|-------| | Bandgap $E_g$ | 1.12 eV | | Intrinsic carrier density $n_i$ | $1.0 \times 10^{10}$ cm$^{-3}$ | | Electron mobility $\mu_n$ | 1450 cm$^2$/V·s | | Hole 
mobility $\mu_p$ | 500 cm$^2$/V·s | | Electron saturation velocity | $1.0 \times 10^7$ cm/s | | Relative permittivity $\varepsilon_r$ | 11.7 |
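
As a rough illustration of how the tabulated constants and silicon values feed the models of Section 4, the Python sketch below evaluates the thermal voltage $V_T = k_B T / q$, the Caughey-Thomas high-field mobility, and the SRH recombination rate. The function names, the default lifetimes of 1 µs, and the mid-gap trap assumption ($n_1 = p_1 = n_i$) are illustrative choices made here, not values taken from the text.

```python
# Minimal sketch: plug the tabulated constants into a few Section 4 formulas.
# All function names and default arguments are hypothetical illustrations.

Q = 1.602e-19      # elementary charge, C
K_B = 1.381e-23    # Boltzmann constant, J/K
N_I = 1.0e10       # Si intrinsic carrier density at 300 K, cm^-3
V_SAT = 1.0e7      # Si electron saturation velocity, cm/s


def thermal_voltage(temp_k=300.0):
    """Thermal voltage V_T = k_B * T / q, in volts."""
    return K_B * temp_k / Q


def caughey_thomas_mobility(mu0, field, v_sat=V_SAT, beta=2.0):
    """Caughey-Thomas mobility in cm^2/V.s.

    mu0   : low-field mobility, cm^2/V.s
    field : electric field magnitude, V/cm
    beta  : fitting parameter (~2 for electrons, ~1 for holes)
    """
    return mu0 / (1.0 + (mu0 * field / v_sat) ** beta) ** (1.0 / beta)


def srh_rate(n, p, tau_n=1.0e-6, tau_p=1.0e-6, n1=N_I, p1=N_I, n_i=N_I):
    """SRH net recombination rate in cm^-3/s.

    Defaults assume a mid-gap trap (n1 = p1 = n_i) and 1 us lifetimes;
    both are assumptions made for this example only.
    """
    return (n * p - n_i ** 2) / (tau_p * (n + n1) + tau_n * (p + p1))


if __name__ == "__main__":
    print(f"V_T(300 K) = {thermal_voltage() * 1e3:.1f} mV")
    for field in (1.0e2, 1.0e4, 1.0e6):  # V/cm
        mu = caughey_thomas_mobility(1450.0, field)
        print(f"E = {field:.0e} V/cm: mu = {mu:.1f} cm^2/V.s, "
              f"v = {mu * field:.3e} cm/s")
    print(f"SRH rate at n = p = 1e16: {srh_rate(1.0e16, 1.0e16):.3e} cm^-3/s")
```

At 300 K the computed thermal voltage rounds to the 25.9 mV listed in the constants table, and the drift velocity $\mu(E) \cdot E$ approaches but never exceeds $v_{\text{sat}}$ as the field grows, which is the behavior the Caughey-Thomas form is meant to capture. At equilibrium ($np = n_i^2$) the SRH rate is zero.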
Deionized water with high resistivity is essential in semiconductor processing for rinsing and chemical preparation, where it minimizes ionic contamination.
Suggest possible diagnoses from symptoms.
Assess what information is present.
Generate diagrams from descriptions. Mermaid, PlantUML.
Find sparse feature representations.
Die shear testing measures die attach adhesion by applying lateral force until separation occurs.
Diffusion-GAN combines diffusion models with adversarial training for diverse graph generation.
Search architectures via gradient descent.
Advanced memory-augmented network.
Differentiable rendering enables gradient-based optimization of 3D representations from images.