← Back to AI Factory Chat

AI Factory Glossary

31 technical terms and definitions

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z All
Showing page 1 of 1 (31 entries)

fab-wide control,metrology

Use metrology data across all tools to maintain process targets.

fan-out wafer-level packaging,advanced packaging

Package dies at wafer level with RDL extending beyond die area.

fib (focused ion beam),fib,focused ion beam,metrology

Use ion beam to cut cross-sections or edit circuits for FA.

filler in molding compound, packaging

Silica particles in EMC.

filler loading, packaging

Percentage of filler.

final inspection,metrology

Defect check before shipment.

fine-pitch bga, packaging

Small ball pitch.

fine-pitch interconnects, advanced packaging

Very small spacing between connections.

first wafer effect, production

Performance difference after maintenance.

flame retardant in emc, packaging

Meet flammability standards.

flip chip,advanced packaging

Mount die face-down with solder bumps for connections.

flip-chip bonding, packaging

Face-down die attachment.

floor life, packaging

Time allowed out of bag.

fluid dynamics, semiconductor fluid dynamics, navier stokes, reynolds number, cfd, wet processing, cmp slurry, gas dynamics

# Fluid Dynamics: Mathematical Modeling 1. Overview: Where Fluid Dynamics Matters Fluid dynamics plays a critical role in numerous semiconductor fabrication steps: - Chemical Vapor Deposition (CVD) — Precursor gas transport and reaction - Spin Coating — Photoresist film formation - Chemical Mechanical Planarization (CMP) — Slurry flow and material removal - Wet Etching/Cleaning — Etchant transport to surfaces - Immersion Lithography — Water flow between lens and wafer - Electrochemical Deposition — Electrolyte flow and ion transport Each process involves distinct physics, but they share a common mathematical foundation. 2. Fundamental Governing Equations 2.1 Navier-Stokes Framework The foundation is the incompressible Navier-Stokes equations. Continuity Equation $$ \nabla \cdot \mathbf{u} = 0 $$ Momentum Equation $$ \rho\left(\frac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u}\right) = -\nabla p + \mu \nabla^2 \mathbf{u} + \mathbf{F} $$ Where: - $\mathbf{u}$ — Velocity field vector - $p$ — Pressure field - $\rho$ — Fluid density - $\mu$ — Dynamic viscosity - $\mathbf{F}$ — Body forces (gravity, electromagnetic, etc.) Species Transport Equation $$ \frac{\partial C_i}{\partial t} + \mathbf{u} \cdot \nabla C_i = D_i \nabla^2 C_i + R_i $$ Where: - $C_i$ — Concentration of species $i$ - $D_i$ — Diffusion coefficient of species $i$ - $R_i$ — Reaction rate (source/sink term) Energy Equation $$ \rho c_p \left(\frac{\partial T}{\partial t} + \mathbf{u} \cdot \nabla T\right) = k \nabla^2 T + Q $$ Where: - $c_p$ — Specific heat capacity - $T$ — Temperature - $k$ — Thermal conductivity - $Q$ — Heat source (reaction heat, Joule heating, etc.) 3. Chemical Vapor Deposition (CVD) CVD is one of the most mathematically complex processes, coupling gas-phase transport, homogeneous reactions, and heterogeneous surface chemistry. 3.1 Reactor-Scale Transport In a typical showerhead reactor, gas enters through distributed holes and flows toward a heated wafer. The classic stagnation-point flow solution applies. Similarity Solution For axisymmetric flow toward a disk: $$ u_r = r f'(\eta), \quad u_z = -\sqrt{\nu a} \cdot f(\eta) $$ Where: - $\eta = z\sqrt{a/\nu}$ — Similarity variable - $a$ — Strain rate parameter - $\nu$ — Kinematic viscosity This yields the Hiemenz equation : $$ f''' + ff'' - (f')^2 + 1 = 0 $$ With boundary conditions: - $f(0) = f'(0) = 0$ (no-slip at surface) - $f'(\infty) = 1$ (far-field condition) 3.2 Key Dimensionless Groups Damköhler Number $$ \text{Da} = \frac{k_s L}{D} $$ Physical meaning: Ratio of surface reaction rate to diffusive transport rate. | Regime | Condition | Implication | |--------|-----------|-------------| | Transport-limited | $\text{Da} \gg 1$ | Uniformity controlled by flow | | Reaction-limited | $\text{Da} \ll 1$ | Uniformity controlled by temperature | Péclet Number $$ \text{Pe} = \frac{UL}{D} $$ Physical meaning: Ratio of convective to diffusive transport. Grashof Number $$ \text{Gr} = \frac{g\beta \Delta T L^3}{\nu^2} $$ Physical meaning: Ratio of buoyancy to viscous forces (important in horizontal reactors). Where: - $g$ — Gravitational acceleration - $\beta$ — Thermal expansion coefficient - $\Delta T$ — Temperature difference 3.3 Surface Boundary Conditions The critical coupling between transport and chemistry at the wafer surface: $$ -D \left.\frac{\partial C}{\partial n}\right|_{\text{surface}} = k_s \cdot f(C, T, \theta) $$ This is a Robin boundary condition linking diffusive flux to surface kinetics. Langmuir-Hinshelwood Kinetics $$ R = \frac{k C}{1 + KC} $$ Features: - First-order at low concentration ($C \ll 1/K$) - Zero-order (saturated) at high concentration ($C \gg 1/K$) Sticking Coefficient Model $$ s = s_0 \cdot f(T) \cdot (1 - \theta) $$ Where: - $s_0$ — Base sticking coefficient - $\theta$ — Surface coverage fraction 3.4 Multi-Scale Challenge CVD spans enormous length scales: | Scale | Dimension | Physics | |-------|-----------|---------| | Reactor chamber | 0.1–1 m | Continuum CFD | | Boundary layer | 1–10 mm | Convection-diffusion | | Surface features | 10–100 nm | Ballistic/Knudsen transport | | Molecular mean free path | 0.1–10 μm | Molecular dynamics | Knudsen Number $$ \text{Kn} = \frac{\lambda}{L} $$ Where $\lambda$ is the molecular mean free path. | Regime | Condition | Modeling Approach | |--------|-----------|-------------------| | Continuum | $\text{Kn} < 0.01$ | Navier-Stokes | | Slip flow | $0.01 < \text{Kn} < 0.1$ | Navier-Stokes + slip BC | | Transition | $0.1 < \text{Kn} < 10$ | DSMC, Boltzmann | | Free molecular | $\text{Kn} > 10$ | Ballistic transport | 4. Spin Coating Spin coating deposits thin photoresist films through centrifugal spreading and solvent evaporation. 4.1 Thin Film Lubrication Theory For a thin viscous layer ($h \ll R$) on a rotating disk, the lubrication approximation applies: $$ \frac{\partial h}{\partial t} + \frac{1}{r}\frac{\partial}{\partial r}(r h \bar{u}_r) = -E $$ Where: - $h(r,t)$ — Film thickness - $\bar{u}_r$ — Depth-averaged radial velocity - $E$ — Evaporation rate 4.2 Velocity Profile Integrating the momentum equation with: - No-slip at substrate ($u_r = 0$ at $z = 0$) - Zero shear at free surface ($\partial u_r / \partial z = 0$ at $z = h$) Yields: $$ u_r(z) = \frac{\rho \omega^2 r}{2\mu}(2hz - z^2) $$ Depth-averaged velocity: $$ \bar{u}_r = \frac{\rho \omega^2 r h^2}{3\mu} $$ 4.3 Emslie-Bonner-Peck Solution For a Newtonian fluid without evaporation: $$ \frac{\partial h}{\partial t} = -\frac{\rho \omega^2}{3\mu} \cdot \frac{1}{r}\frac{\partial (r h^3)}{\partial r} $$ For uniform initial thickness $h_0$: $$ h(t) = \frac{h_0}{\sqrt{1 + \dfrac{4\rho \omega^2 h_0^2 t}{3\mu}}} $$ Asymptotic behavior: - Short time: $h \approx h_0$ - Long time: $h \propto t^{-1/2}$ 4.4 Non-Newtonian Photoresists Real photoresists are shear-thinning. Using a power-law model : $$ \tau = K\left(\frac{\partial u}{\partial z}\right)^n $$ Where: - $K$ — Consistency index - $n$ — Power-law index ($n < 1$ for shear-thinning) The governing equation becomes: $$ \frac{\partial h}{\partial t} = -\frac{n}{2n+1}\left(\frac{\rho \omega^2}{K}\right)^{1/n} \frac{1}{r}\frac{\partial}{\partial r}\left(r h^{(2n+1)/n}\right) $$ 4.5 Evaporation and Marangoni Effects Coupled Concentration Equation $$ \frac{\partial(h x_s)}{\partial t} + \frac{1}{r}\frac{\partial}{\partial r}(r h x_s \bar{u}_r) = -\frac{e}{\rho_s} $$ Where: - $x_s$ — Solvent mass fraction - $e$ — Evaporation mass flux - $\rho_s$ — Solvent density Marangoni Stress Surface tension gradients drive Marangoni flows: $$ \tau_{\text{surface}} = \frac{\partial \sigma}{\partial r} = \frac{d\sigma}{dC}\frac{\partial C}{\partial r} $$ Marangoni Number $$ \text{Ma} = \frac{\Delta\sigma \cdot L}{\mu \alpha} $$ Where $\alpha$ is thermal diffusivity. 5. Chemical Mechanical Planarization (CMP) CMP combines chemical etching with mechanical abrasion, mediated by slurry flow between pad and wafer. 5.1 Reynolds Lubrication Equation For the thin fluid film: $$ \frac{\partial}{\partial x}\left(h^3 \frac{\partial p}{\partial x}\right) + \frac{\partial}{\partial y}\left(h^3 \frac{\partial p}{\partial y}\right) = 6\mu U \frac{\partial h}{\partial x} + 12\mu \frac{\partial h}{\partial t} $$ Terms: - Left side: Pressure-driven (Poiseuille) flow - First term on right: Shear-driven (Couette) flow (wedge effect) - Second term on right: Squeeze film effect 5.2 Slurry as Suspension CMP slurries contain abrasive particles exhibiting complex rheology. Shear-Induced Migration (Leighton-Acrivos) $$ \mathbf{J}_{\text{shear}} = -K_c a^2 \phi \nabla(\dot{\gamma} \phi) - K_\eta a^2 \dot{\gamma} \phi^2 \nabla(\ln \eta) $$ Where: - $a$ — Particle radius - $\phi$ — Particle volume fraction - $\dot{\gamma}$ — Shear rate - $K_c, K_\eta$ — Empirical constants Physical effect: Particles migrate from high-shear to low-shear regions. Effective Viscosity (Krieger-Dougherty) $$ \eta_{\text{eff}} = \eta_0 \left(1 - \frac{\phi}{\phi_m}\right)^{-[\eta]\phi_m} $$ Where: - $\phi_m$ — Maximum packing fraction (~0.64) - $[\eta]$ — Intrinsic viscosity (~2.5 for spheres) 5.3 Material Removal Models Classical Preston Equation $$ \text{MRR} = K_p \cdot p \cdot V $$ Where: - MRR — Material removal rate - $K_p$ — Preston coefficient - $p$ — Applied pressure - $V$ — Relative velocity Enhanced Models $$ \text{MRR} = f(\tau_{\text{shear}}, \phi_{\text{particle}}, k_{\text{chem}}, T) $$ Incorporating: - Fluid shear stress: $\tau = \mu \left.\dfrac{\partial u}{\partial z}\right|_{\text{surface}}$ - Local particle flux - Chemical reaction rate - Temperature-dependent kinetics 5.4 Contact Mechanics When pad asperities contact wafer: Greenwood-Williamson Model $$ p_{\text{contact}} = \frac{4}{3} E^* n \int_d^\infty (z-d)^{3/2} \phi(z) \, dz $$ Where: - $E^*$ — Effective elastic modulus - $n$ — Asperity density - $\phi(z)$ — Asperity height distribution - $d$ — Separation distance Force Balance $$ p_{\text{fluid}} + p_{\text{contact}} = P_{\text{applied}} $$ 6. Wet Etching: Mass Transfer Limited Processes 6.1 Convective-Diffusion Equation $$ \frac{\partial C}{\partial t} + \mathbf{u} \cdot \nabla C = D \nabla^2 C $$ At the reactive surface (fast reaction limit): $$ C|_{\text{surface}} = 0 $$ Etch rate: $$ \text{ER} \propto D \left.\frac{\partial C}{\partial n}\right|_{\text{surface}} $$ 6.2 Rotating Disk Solution (Levich) For a wafer rotating in etchant: Velocity Components $$ u_r = r\omega F(\zeta), \quad u_\theta = r\omega G(\zeta), \quad u_z = \sqrt{\nu\omega} H(\zeta) $$ Where $\zeta = z\sqrt{\omega/\nu}$. Boundary Layer Thickness $$ \delta = 1.61 D^{1/3} \nu^{1/6} \omega^{-1/2} $$ Mass Flux (Levich Equation) $$ j = 0.62 D^{2/3} \nu^{-1/6} \omega^{1/2} C_\infty $$ Key insight : The etch rate is uniform across an infinite disk . This explains why rotating processes achieve excellent uniformity. 6.3 Feature-Scale Transport In high-aspect-ratio trenches: Knudsen Diffusion $$ D_{\text{Kn}} = \frac{d}{3}\sqrt{\frac{8RT}{\pi M}} $$ Where: - $d$ — Trench width - $M$ — Molecular weight Concentration Profile in Trench For a trench of depth $L$ with reactive bottom: $$ \frac{d^2 C}{dz^2} = 0 \quad \text{(diffusion only)} $$ With boundary conditions: - $C(0) = C_{\text{top}}$ (top of trench) - $-D\dfrac{dC}{dz}\big|_{z=L} = k_s C(L)$ (reactive bottom) Solution: $$ \frac{C(z)}{C_{\text{top}}} = 1 - \frac{z}{L} \cdot \frac{1}{1 + D/(k_s L)} $$ Thiele Modulus $$ \phi = L\sqrt{\frac{k_s}{D}} $$ - $\phi \ll 1$: Reaction-limited (uniform etch in feature) - $\phi \gg 1$: Transport-limited (RIE lag) 7. Immersion Lithography At 193 nm wavelength, water ($n \approx 1.44$) fills the gap between lens and wafer, increasing numerical aperture. 7.1 Free Surface Dynamics Capillary Number $$ \text{Ca} = \frac{\mu U}{\sigma} $$ Where $\sigma$ is surface tension. - $\text{Ca} < \text{Ca}_{\text{crit}} \approx 0.1$: Stable meniscus - $\text{Ca} > \text{Ca}_{\text{crit}}$: Bubble entrainment risk Young-Laplace Equation $$ \Delta p = \sigma \kappa = \sigma \left(\frac{1}{R_1} + \frac{1}{R_2}\right) $$ Where $\kappa$ is the interface curvature. 7.2 Interface Tracking Methods Level Set Method $$ \frac{\partial \phi}{\partial t} + \mathbf{u} \cdot \nabla \phi = 0 $$ Where: - $\phi > 0$: Liquid phase - $\phi < 0$: Gas phase - $\phi = 0$: Interface Volume of Fluid (VOF) $$ \frac{\partial \alpha}{\partial t} + \nabla \cdot (\alpha \mathbf{u}) = 0 $$ Where $\alpha$ is the volume fraction. 7.3 Thermal Management Light absorption heats the water: $$ \rho c_p \left(\frac{\partial T}{\partial t} + \mathbf{u} \cdot \nabla T\right) = k\nabla^2 T + Q_{\text{abs}} $$ Refractive Index Sensitivity $$ \frac{dn}{dT} \approx -1 \times 10^{-4} \text{ K}^{-1} $$ Temperature variations cause refractive index changes, introducing imaging errors (aberrations). 8. Numerical Methods 8.1 Finite Volume Method (FVM) The workhorse for semiconductor CFD. Starting from integral form: $$ \frac{\partial}{\partial t}\int_V \rho \phi \, dV + \oint_S \rho \phi \mathbf{u} \cdot \mathbf{n} \, dS = \oint_S \Gamma \nabla \phi \cdot \mathbf{n} \, dS + \int_V S_\phi \, dV $$ Discretization $$ \frac{(\rho \phi)_P^{n+1} - (\rho \phi)_P^n}{\Delta t} V_P + \sum_f F_f \phi_f = \sum_f \Gamma_f (\nabla \phi)_f \cdot \mathbf{A}_f + S_\phi V_P $$ Where: - $P$ — Cell center - $f$ — Face index - $F_f = \rho \mathbf{u}_f \cdot \mathbf{A}_f$ — Face flux 8.2 Advection Schemes | Scheme | Order | Properties | |--------|-------|------------| | Upwind | 1st | Stable, diffusive | | Central | 2nd | Unstable for high Pe | | QUICK | 3rd | Good accuracy, bounded | | MUSCL | 2nd | TVD, shock-capturing | 8.3 Pressure-Velocity Coupling SIMPLE Algorithm 1. Guess pressure field $p^*$ 2. Solve momentum for $\mathbf{u}^*$ 3. Solve pressure correction: $\nabla \cdot (D \nabla p') = \nabla \cdot \mathbf{u}^*$ 4. Correct: $p = p^* + \alpha_p p'$, $\mathbf{u} = \mathbf{u}^* - D \nabla p'$ 5. Iterate until convergence 8.4 Moving Boundary Problems For etching/deposition where geometry evolves: Arbitrary Lagrangian-Eulerian (ALE) $$ \left.\frac{\partial \phi}{\partial t}\right|_{\chi} + (\mathbf{u} - \mathbf{u}_{\text{mesh}}) \cdot \nabla \phi = \text{RHS} $$ Where $\mathbf{u}_{\text{mesh}}$ is mesh velocity. Level Set Velocity Extension $$ \frac{\partial d}{\partial \tau} + \text{sign}(\phi)(|\nabla d| - 1) = 0 $$ Reinitializes the level set to a signed distance function. 8.5 Stiff Chemistry CVD with multiple reactions has time scales from ns (gas reactions) to s (deposition). Operator Splitting 1. Solve transport: $\dfrac{\partial C}{\partial t} + \mathbf{u} \cdot \nabla C = D\nabla^2 C$ 2. Solve chemistry: $\dfrac{dC}{dt} = R(C)$ (using stiff ODE solver) Implicit Methods For stiff systems: $$ \mathbf{C}^{n+1} = \mathbf{C}^n + \Delta t \cdot \mathbf{R}(\mathbf{C}^{n+1}) $$ Requires Newton iteration with Jacobian $\partial R_i / \partial C_j$. 9. Dimensionless | Group | Definition | Physical Meaning | |-------|------------|------------------| | Reynolds (Re) | $\dfrac{\rho UL}{\mu}$ | Inertia / Viscosity | | Péclet (Pe) | $\dfrac{UL}{D}$ | Convection / Diffusion | | Damköhler (Da) | $\dfrac{k_s L}{D}$ | Reaction / Transport | | Knudsen (Kn) | $\dfrac{\lambda}{L}$ | Mean free path / Length | | Capillary (Ca) | $\dfrac{\mu U}{\sigma}$ | Viscous / Surface tension | | Marangoni (Ma) | $\dfrac{\Delta\sigma \cdot L}{\mu \alpha}$ | Marangoni / Viscous | | Grashof (Gr) | $\dfrac{g\beta \Delta T L^3}{\nu^2}$ | Buoyancy / Viscous | | Schmidt (Sc) | $\dfrac{\nu}{D}$ | Momentum / Mass diffusivity | | Sherwood (Sh) | $\dfrac{k_m L}{D}$ | Convective / Diffusive mass transfer | | Thiele ($\phi$) | $L\sqrt{\dfrac{k_s}{D}}$ | Reaction / Diffusion in pores | 10. Current Research Frontiers 10.1 Machine Learning Integration - Surrogate models replacing expensive CFD for real-time process control - Physics-informed neural networks (PINNs) for solving PDEs - Digital twins for predictive maintenance and optimization 10.2 Atomic Layer Processes (ALD/ALE) - Highly transient, surface-reaction-dominated - Requires time-dependent modeling of pulse/purge cycles - Surface coverage evolution: $$ \frac{d\theta}{dt} = k_{\text{ads}} C (1-\theta) - k_{\text{des}} \theta $$ 10.3 Extreme Aspect Ratios - 3D NAND with aspect ratios > 100 - Transition to molecular flow ($\text{Kn} > 0.1$) - Transmission probability methods : $$ P = \frac{1}{1 + 3L/(8r)} $$ 10.4 EUV-Related Flows - Hydrogen buffer gas flow for debris mitigation - Tin droplet dynamics in source - Molecular outgassing and mask contamination 10.5 Plasma-Flow Coupling Low-pressure plasma processes require multi-physics: $$ \nabla \cdot \mathbf{J}_e = S_e - R_e \quad \text{(electron continuity)} $$ $$ \nabla \cdot \mathbf{J}_i = S_i - R_i \quad \text{(ion continuity)} $$ $$ \nabla \cdot (\epsilon \nabla \phi) = -e(n_i - n_e) \quad \text{(Poisson)} $$ Coupled to neutral gas Navier-Stokes equations.

flux residue, packaging

Remaining flux after reflow.

focus-exposure matrix, fem, lithography

Test different focus and dose combinations.

focused ion beam - atom probe, fib-apt, metrology

Combine FIB sample prep with APT.

focused ion beam (fib),focused ion beam,fib,metrology

Use ion beam to mill and image samples.

focused ion beam repair, fib, lithography

Use ion beam to repair mask defects.

forward bonding, packaging

Bond to die first.

four-point probe mapping, metrology

Map sheet resistance at many points.

four-point probe,metrology

Measure sheet resistance of doped or metal films.

fourier optics, computational lithography, hopkins formulation, transmission cross coefficient, tcc, socs, zernike polynomials, partial coherence, opc, ilt

# Computational Lithography Mathematics Modern semiconductor manufacturing faces a fundamental physical challenge: creating nanoscale features using light with wavelengths much larger than the target dimensions. Computational lithography bridges this gap through sophisticated mathematical techniques. 1. The Core Challenge 1.1 Resolution Limits The Rayleigh criterion defines the minimum resolvable feature size: $$ R = k_1 \cdot \frac{\lambda}{NA} $$ Where: - $R$ = minimum resolution - $k_1$ = process-dependent factor (theoretical limit: 0.25) - $\lambda$ = wavelength of light (193 nm for ArF, 13.5 nm for EUV) - $NA$ = numerical aperture of the lens system 1.2 Depth of Focus $$ DOF = k_2 \cdot \frac{\lambda}{NA^2} $$ 2. Wave Optics Fundamentals 2.1 Partially Coherent Imaging The aerial image intensity on the wafer is described by Hopkins' equation: $$ I(x, y) = \iint TCC(f_1, f_2) \cdot M(f_1) \cdot M^*(f_2) \, df_1 \, df_2 $$ Where: - $I(x, y)$ = intensity at wafer position $(x, y)$ - $TCC(f_1, f_2)$ = Transmission Cross Coefficient - $M(f)$ = Fourier transform of the mask pattern - $M^*(f)$ = complex conjugate of $M(f)$ 2.2 Transmission Cross Coefficient The TCC captures the optical system behavior: $$ TCC(f_1, f_2) = \iint S(\xi, \eta) \cdot H(f_1 + \xi, \eta) \cdot H^*(f_2 + \xi, \eta) \, d\xi \, d\eta $$ Where: - $S(\xi, \eta)$ = source intensity distribution - $H(f)$ = pupil function of the projection optics 3. Optical Proximity Correction (OPC) 3.1 The Inverse Problem OPC solves the inverse imaging problem: $$ \min_{M} \sum_{i} \left\| I(x_i, y_i; M) - I_{\text{target}}(x_i, y_i) \right\|^2 + \lambda R(M) $$ Where: - $M$ = mask pattern (optimization variable) - $I_{\text{target}}$ = desired wafer pattern - $R(M)$ = regularization term for manufacturability - $\lambda$ = regularization weight 3.2 Gradient-Based Optimization The gradient with respect to mask pixels: $$ \frac{\partial J}{\partial M_k} = \sum_{i} 2 \left( I_i - I_{\text{target},i} \right) \cdot \frac{\partial I_i}{\partial M_k} $$ 3.3 Key Correction Features - Serifs : Corner additions/subtractions to correct corner rounding - Hammerheads : Line-end extensions to prevent line shortening - Assist features : Sub-resolution features that improve main feature fidelity - Scattering bars : Improve depth of focus for isolated features 4. Inverse Lithography Technology (ILT) 4.1 Full Pixel-Based Optimization ILT treats each mask pixel as an independent variable: $$ \min_{\mathbf{m}} \left\| \mathbf{I}(\mathbf{m}) - \mathbf{I}_{\text{target}} \right\|_2^2 + \alpha \|\nabla \mathbf{m}\|_1 + \beta \text{TV}(\mathbf{m}) $$ Where: - $\mathbf{m} \in [0, 1]^N$ = continuous mask pixel values - $\text{TV}(\mathbf{m})$ = Total Variation regularization - $\|\nabla \mathbf{m}\|_1$ = sparsity-promoting term 4.2 Level-Set Formulation Mask boundaries represented implicitly: $$ \frac{\partial \phi}{\partial t} = -V \cdot |\nabla \phi| $$ Where: - $\phi(x, y)$ = level-set function - Mask region: $\{(x,y) : \phi(x,y) > 0\}$ - $V$ = velocity field derived from optimization gradient 5. Source Mask Optimization (SMO) 5.1 Joint Optimization Problem $$ \min_{S, M} \sum_{i} \left[ I(x_i, y_i; S, M) - I_{\text{target}}(x_i, y_i) \right]^2 $$ Subject to: - Source constraints: $\int S(\xi, \eta) \, d\xi \, d\eta = 1$, $S \geq 0$ - Mask manufacturability constraints 5.2 Alternating Optimization 1. Fix source $S$, optimize mask $M$ 2. Fix mask $M$, optimize source $S$ 3. Repeat until convergence 6. Rigorous Electromagnetic Simulation 6.1 Maxwell's Equations For accurate 3D mask effects: $$ \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} $$ $$ \nabla \times \mathbf{H} = \mathbf{J} + \frac{\partial \mathbf{D}}{\partial t} $$ $$ \nabla \cdot \mathbf{D} = \rho $$ $$ \nabla \cdot \mathbf{B} = 0 $$ 6.2 Numerical Methods - FDTD (Finite-Difference Time-Domain) : $$ \frac{\partial E_x}{\partial t} = \frac{1}{\epsilon} \left( \frac{\partial H_z}{\partial y} - \frac{\partial H_y}{\partial z} \right) $$ - RCWA (Rigorous Coupled-Wave Analysis) : Expansion in Fourier harmonics $$ \mathbf{E}(x, y, z) = \sum_{m,n} \mathbf{E}_{mn}(z) \cdot e^{i(k_{xm}x + k_{yn}y)} $$ 7. Photoresist Modeling 7.1 Dill Model for Absorption $$ I(z) = I_0 \exp\left( -\int_0^z \alpha(z') \, dz' \right) $$ Where absorption coefficient: $$ \alpha = A \cdot M + B $$ - $A$ = bleachable absorption - $B$ = non-bleachable absorption - $M$ = photoactive compound concentration 7.2 Exposure Kinetics $$ \frac{dM}{dt} = -C \cdot I \cdot M $$ - $C$ = exposure rate constant 7.3 Acid Diffusion (Post-Exposure Bake) Reaction-diffusion equation: $$ \frac{\partial [H^+]}{\partial t} = D \nabla^2 [H^+] - k_{\text{loss}} [H^+] $$ Where: - $D$ = diffusion coefficient (temperature-dependent) - $k_{\text{loss}}$ = acid loss rate 7.4 Development Rate Mack model: $$ r = r_{\max} \cdot \frac{(a+1)(1-m)^n}{a + (1-m)^n} + r_{\min} $$ Where $m$ = normalized remaining PAC concentration. 8. Stochastic Effects 8.1 Photon Shot Noise Photon count follows Poisson distribution: $$ P(n) = \frac{\lambda^n e^{-\lambda}}{n!} $$ Standard deviation: $$ \sigma_n = \sqrt{\bar{n}} $$ 8.2 Line Edge Roughness (LER) Power spectral density: $$ PSD(f) = \frac{A}{1 + (2\pi f \xi)^{2\alpha}} $$ Where: - $\xi$ = correlation length - $\alpha$ = roughness exponent - $A$ = amplitude 8.3 Stochastic Defect Probability For extreme ultraviolet (EUV): $$ P_{\text{defect}} = 1 - \exp\left( -\frac{A_{\text{pixel}}}{N_{\text{photons}} \cdot \eta} \right) $$ 9. Multi-Patterning Mathematics 9.1 Graph Coloring Formulation Given conflict graph $G = (V, E)$: - $V$ = features - $E$ = edges connecting features with spacing $< \text{min}_{\text{space}}$ Find $k$-coloring $c: V \rightarrow \{1, 2, \ldots, k\}$ such that: $$ \forall (u, v) \in E: c(u) \neq c(v) $$ 9.2 Integer Linear Programming Formulation $$ \min \sum_{(i,j) \in E} w_{ij} \cdot y_{ij} $$ Subject to: $$ \sum_{k=1}^{K} x_{ik} = 1 \quad \forall i \in V $$ $$ x_{ik} + x_{jk} - y_{ij} \leq 1 \quad \forall (i,j) \in E, \forall k $$ $$ x_{ik}, y_{ij} \in \{0, 1\} $$ 10. EUV Lithography Specific Mathematics 10.1 Multilayer Mirror Reflectivity Bragg condition for Mo/Si multilayers: $$ 2d \sin\theta = n\lambda $$ Reflectivity at each interface: $$ r = \frac{n_1 - n_2}{n_1 + n_2} $$ Total reflectivity (matrix method): $$ \mathbf{M}_{\text{total}} = \prod_{j=1}^{N} \mathbf{M}_j $$ 10.2 Mask 3D Effects Shadow effect for off-axis illumination: $$ \Delta x = h_{\text{absorber}} \cdot \tan(\theta_{\text{chief ray}}) $$ 11. Machine Learning in Computational Lithography 11.1 Neural Network as Fast Surrogate Model $$ I_{\text{predicted}} = f_{\theta}(M) $$ Where $f_{\theta}$ is a trained CNN, training minimizes: $$ \mathcal{L} = \sum_{i} \left\| f_{\theta}(M_i) - I_{\text{rigorous}}(M_i) \right\|^2 $$ 11.2 Physics-Informed Neural Networks Loss function incorporating physics: $$ \mathcal{L} = \mathcal{L}_{\text{data}} + \lambda_{\text{physics}} \mathcal{L}_{\text{physics}} $$ Where: $$ \mathcal{L}_{\text{physics}} = \left\| \nabla^2 E + k^2 \epsilon E \right\|^2 $$ 12. Key Mathematical Techniques Summary | Technique | Application | |-----------|-------------| | Fourier Analysis | Optical imaging, frequency domain calculations | | Inverse Problems | OPC, ILT, metrology | | Non-convex Optimization | Mask optimization, SMO | | Partial Differential Equations | EM simulation, resist diffusion | | Graph Theory | Multi-patterning decomposition | | Stochastic Processes | Shot noise, LER modeling | | Linear Algebra | Large sparse system solutions | | Machine Learning | Fast surrogate models, pattern recognition | 13. Computational Complexity 13.1 Full-Chip OPC Scale - Features : $\sim 10^{12}$ polygon edges - Variables : $\sim 10^8$ optimization parameters - Compute time : hours to days on $1000+$ CPU cores - Memory : terabytes of working data 13.2 Complexity Classes | Operation | Complexity | |-----------|------------| | FFT for imaging | $O(N \log N)$ | | RCWA per wavelength | $O(M^3)$ where $M$ = harmonics | | ILT optimization | $O(N \cdot k)$ where $k$ = iterations | | Graph coloring | NP-complete (general case) | Notation: | Symbol | Meaning | |--------|---------| | $\lambda$ | Wavelength | | $NA$ | Numerical Aperture | | $TCC$ | Transmission Cross Coefficient | | $M(f)$ | Mask Fourier transform | | $I(x,y)$ | Intensity at wafer | | $\phi$ | Level-set function | | $D$ | Diffusion coefficient | | $\sigma$ | Standard deviation | | $PSD$ | Power Spectral Density |

fourier transform infrared spectroscopy (ftir),fourier transform infrared spectroscopy,ftir,metrology

Identify chemical bonds.

fpga alternative,chip design hobby,logisim,ngspice

# FPGA Mathematical Modeling A comprehensive guide to mathematical techniques, representations, and methodologies for implementing algorithms in Field-Programmable Gate Arrays. ## 1. Number Representation Models ### 1.1 Fixed-Point Representation Fixed-point representation maintains the decimal point within a fixed position, enabling straightforward arithmetic operations. #### Notation Format - **Q-format:** $Q(m, n)$ or $(x, y)$ - $m$ or $x$ = number of integer bits - $n$ or $y$ = number of fractional bits - Total bits = $m + n + 1$ (including sign bit for signed) #### Mathematical Formulations **Integer bits calculation:** $$ \text{Integer Bits} = \lceil \log_2(\text{max\_value}) \rceil $$ **Value representation:** $$ \text{Value} = \sum_{i=-n}^{m-1} b_i \cdot 2^i $$ Where $b_i \in \{0, 1\}$ for unsigned, or using two's complement for signed. **Resolution (LSB value):** $$ \text{Resolution} = 2^{-n} $$ **Range for signed Q(m,n):** $$ \text{Range} = \left[ -2^{m-1}, \, 2^{m-1} - 2^{-n} \right] $$ #### Fixed-Point Arithmetic Rules - **Addition/Subtraction:** - Decimal points must be aligned - Result: $\max(m_1, m_2) + 1$ integer bits, $\max(n_1, n_2)$ fractional bits $$ Q(m_1, n_1) \pm Q(m_2, n_2) \rightarrow Q(\max(m_1, m_2) + 1, \max(n_1, n_2)) $$ - **Multiplication:** $$ Q(m_1, n_1) \times Q(m_2, n_2) \rightarrow Q(m_1 + m_2, n_1 + n_2) $$ - **Scaling by powers of 2:** - Left shift by $k$: multiply by $2^k$ - Right shift by $k$: divide by $2^k$ #### Example Calculation For a 16-bit signed Q8.8 format: - Integer range: $[-128, 127]$ - Fractional resolution: $2^{-8} = 0.00390625$ - Example: Binary `0000_0001.1000_0000` = $1.5_{10}$ ### 1.2 Floating-Point Representation #### IEEE 754 Standard Formats | Format | Total Bits | Sign | Exponent | Mantissa | |--------|-----------|------|----------|----------| | Half | 16 | 1 | 5 | 10 | | Single | 32 | 1 | 8 | 23 | | Double | 64 | 1 | 11 | 52 | #### Mathematical Representation $$ \text{Value} = (-1)^{s} \times (1 + M) \times 2^{E - \text{bias}} $$ Where: - $s$ = sign bit - $M$ = mantissa (fractional part) - $E$ = exponent - $\text{bias} = 2^{k-1} - 1$ (where $k$ = exponent bits) #### Bias Values - Half precision: $\text{bias} = 15$ - Single precision: $\text{bias} = 127$ - Double precision: $\text{bias} = 1023$ #### Resource Comparison | Operation | Fixed-Point LUTs | Floating-Point LUTs | Ratio | |-----------|-----------------|---------------------|-------| | Add (32-bit) | ~32 | ~400 | ~12.5× | | Multiply (32-bit) | ~64 (DSP) | ~600 | ~9× | | Divide (32-bit) | ~200 | ~1200 | ~6× | ## 2. Mathematical Function Implementation ### 2.1 CORDIC Algorithm **CORDIC** (COordinate Rotation DIgital Computer) computes trigonometric, hyperbolic, and other functions using only shift-add operations. #### Core Rotation Matrix $$ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} $$ #### CORDIC Simplification By restricting $\tan(\theta_i) = 2^{-i}$, the rotation becomes: $$ \begin{bmatrix} x' \\ y' \end{bmatrix} = \cos\theta_i \begin{bmatrix} 1 & -\sigma_i 2^{-i} \\ \sigma_i 2^{-i} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} $$ Where $\sigma_i \in \{-1, +1\}$ is the rotation direction. #### Iterative Equations $$ \begin{aligned} x_{i+1} &= x_i - \sigma_i \cdot y_i \cdot 2^{-i} \\ y_{i+1} &= y_i + \sigma_i \cdot x_i \cdot 2^{-i} \\ z_{i+1} &= z_i - \sigma_i \cdot \arctan(2^{-i}) \end{aligned} $$ #### CORDIC Gain (Scale Factor) $$ K_n = \prod_{i=0}^{n-1} \cos(\arctan(2^{-i})) = \prod_{i=0}^{n-1} \frac{1}{\sqrt{1 + 2^{-2i}}} $$ For $n \rightarrow \infty$: $$ K_\infty \approx 0.6072529350088812561694 $$ #### CORDIC Modes | Mode | Condition | Computes | |------|-----------|----------| | Rotation | $\sigma_i = \text{sign}(z_i)$ | $x_n = K(x_0\cos z_0 - y_0\sin z_0)$ | | Vectoring | $\sigma_i = -\text{sign}(y_i)$ | $z_n = z_0 + \arctan(y_0/x_0)$ | #### CORDIC Functions Table | Function | Mode | Initial Values | Result | |----------|------|----------------|--------| | $\sin(\theta)$ | Rotation | $x_0=1/K$, $y_0=0$, $z_0=\theta$ | $y_n$ | | $\cos(\theta)$ | Rotation | $x_0=1/K$, $y_0=0$, $z_0=\theta$ | $x_n$ | | $\arctan(a)$ | Vectoring | $x_0=1$, $y_0=a$, $z_0=0$ | $z_n$ | | $\sqrt{x^2+y^2}$ | Vectoring | $x_0=x$, $y_0=y$, $z_0=0$ | $K \cdot x_n$ | #### CORDIC VHDL Pseudo-Structure ```vhdl -- CORDIC iteration stage process(clk) begin if rising_edge(clk) then if z(i) >= 0 then -- σ = +1 x(i+1) <= x(i) - shift_right(y(i), i); y(i+1) <= y(i) + shift_right(x(i), i); z(i+1) <= z(i) - atan_table(i); else -- σ = -1 x(i+1) <= x(i) + shift_right(y(i), i); y(i+1) <= y(i) - shift_right(x(i), i); z(i+1) <= z(i) + atan_table(i); end if; end if; end process; ``` ### 2.2 Polynomial Approximation #### Taylor Series Expansion $$ f(x) = \sum_{n=0}^{N} \frac{f^{(n)}(a)}{n!}(x-a)^n + R_N(x) $$ Where $R_N(x)$ is the remainder term. #### Common Taylor Expansions **Sine function:** $$ \sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!} $$ **Cosine function:** $$ \cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!} $$ **Exponential function:** $$ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{n=0}^{\infty} \frac{x^n}{n!} $$ **Natural logarithm (for $|x| < 1$):** $$ \ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{n+1} x^n}{n} $$ #### Horner's Method for Evaluation For polynomial $P(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$: $$ P(x) = a_0 + x(a_1 + x(a_2 + x(a_3 + \cdots + x(a_{n-1} + x \cdot a_n)\cdots))) $$ **Advantages:** - Reduces multiplications from $\frac{n(n+1)}{2}$ to $n$ - Reduces additions from $n$ to $n$ - Enables pipelining #### Minimax Polynomial Approximation Minimizes the maximum error over an interval: $$ \min_{p \in P_n} \max_{x \in [a,b]} |f(x) - p(x)| $$ Where $P_n$ is the set of polynomials of degree $\leq n$. **Chebyshev approximation** often provides near-minimax results: $$ T_n(x) = \cos(n \cdot \arccos(x)) $$ ### 2.3 Lookup Table (LUT) Methods #### Direct Table Lookup $$ f(x) \approx \text{LUT}[\lfloor x \cdot 2^n \rfloor] $$ **Memory requirement:** $$ \text{Memory (bits)} = 2^{\text{input\_bits}} \times \text{output\_bits} $$ #### Linear Interpolation $$ f(x) \approx f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0} \cdot (x - x_0) $$ Simplified for uniform spacing ($\Delta x = x_1 - x_0$): $$ f(x) \approx f(x_0) + \frac{\Delta f}{\Delta x} \cdot (x - x_0) $$ #### Bipartite Table Method Decomposes the function into two smaller tables: $$ f(x) \approx T_1(x_{\text{MSB}}) + T_2(x_{\text{MSB}}, x_{\text{LSB}}) $$ **Memory savings:** $$ \text{Bipartite Memory} \approx 2 \cdot 2^{n/2} \ll 2^n \text{ (Direct)} $$ #### Piecewise Polynomial with LUT $$ f(x) = \sum_{k=0}^{d} c_k[i] \cdot (x - x_i)^k \quad \text{for } x \in [x_i, x_{i+1}) $$ Where coefficients $c_k[i]$ are stored in lookup tables. ## 3. System-Level Mathematical Modeling ### 3.1 Boolean Algebra Model #### Basic Operations | Operation | Symbol | VHDL | Verilog | |-----------|--------|------|---------| | AND | $A \cdot B$ | `A and B` | `A & B` | | OR | $A + B$ | `A or B` | `A ∣ B` | | NOT | $\overline{A}$ | `not A` | `~A` | | XOR | $A \oplus B$ | `A xor B` | `A ^ B` | | NAND | $\overline{A \cdot B}$ | `A nand B` | `~(A & B)` | | NOR | $\overline{A + B}$ | `A nor B` | `~(A ∣ B)` | #### Boolean Algebra Laws **Identity:** $$ A + 0 = A \quad \text{and} \quad A \cdot 1 = A $$ **Complement:** $$ A + \overline{A} = 1 \quad \text{and} \quad A \cdot \overline{A} = 0 $$ **De Morgan's Theorems:** $$ \overline{A + B} = \overline{A} \cdot \overline{B} \quad \text{and} \quad \overline{A \cdot B} = \overline{A} + \overline{B} $$ **Absorption:** $$ A + A \cdot B = A \quad \text{and} \quad A \cdot (A + B) = A $$ #### Sum of Products (SOP) Form $$ F = \sum_{i} m_i \quad \text{where } m_i \text{ are minterms} $$ **Example:** $$ F(A, B, C) = \overline{A} \cdot B \cdot C + A \cdot \overline{B} \cdot C + A \cdot B \cdot \overline{C} $$ ### 3.2 Finite State Machine (FSM) Models #### Moore Machine $$ \begin{aligned} \text{Next State:} \quad S_{n+1} &= \delta(S_n, I_n) \\ \text{Output:} \quad O_n &= \lambda(S_n) \end{aligned} $$ #### Mealy Machine $$ \begin{aligned} \text{Next State:} \quad S_{n+1} &= \delta(S_n, I_n) \\ \text{Output:} \quad O_n &= \lambda(S_n, I_n) \end{aligned} $$ #### State Encoding | Encoding | States | Flip-Flops | Transitions | |----------|--------|------------|-------------| | Binary | $N$ | $\lceil \log_2 N \rceil$ | Complex | | One-Hot | $N$ | $N$ | Simple | | Gray | $N$ | $\lceil \log_2 N \rceil$ | Low power | ### 3.3 Dataflow Graph Model #### Directed Acyclic Graph (DAG) Representation - **Nodes ($V$):** Operations (add, multiply, etc.) - **Edges ($E$):** Data dependencies - **Graph:** $G = (V, E)$ #### Critical Path Length $$ T_{\text{critical}} = \max_{\text{paths } p} \sum_{v \in p} t_v $$ Where $t_v$ is the delay of operation $v$. #### Parallelism Metrics **Available parallelism:** $$ P = \frac{\sum_{v \in V} t_v}{T_{\text{critical}}} $$ **Resource-constrained schedule length:** $$ T_{\text{schedule}} \geq \max\left( T_{\text{critical}}, \left\lceil \frac{\sum_{v \in V} t_v}{R} \right\rceil \right) $$ Where $R$ is the number of available resources. ### 3.4 State-Space Models #### Continuous-Time State-Space $$ \begin{aligned} \dot{\mathbf{x}}(t) &= \mathbf{A}\mathbf{x}(t) + \mathbf{B}\mathbf{u}(t) \\ \mathbf{y}(t) &= \mathbf{C}\mathbf{x}(t) + \mathbf{D}\mathbf{u}(t) \end{aligned} $$ #### Discrete-Time State-Space (for FPGA) $$ \begin{aligned} \mathbf{x}[k+1] &= \mathbf{A}_d\mathbf{x}[k] + \mathbf{B}_d\mathbf{u}[k] \\ \mathbf{y}[k] &= \mathbf{C}_d\mathbf{x}[k] + \mathbf{D}_d\mathbf{u}[k] \end{aligned} $$ #### Discretization (Zero-Order Hold) $$ \mathbf{A}_d = e^{\mathbf{A}T_s} \quad \text{and} \quad \mathbf{B}_d = \mathbf{A}^{-1}(e^{\mathbf{A}T_s} - \mathbf{I})\mathbf{B} $$ Where $T_s$ is the sampling period. ## 4. Numerical Methods for FPGA ### 4.1 ODE Solvers #### Forward Euler Method $$ y_{n+1} = y_n + h \cdot f(t_n, y_n) $$ **Local truncation error:** $O(h^2)$ **Global error:** $O(h)$ #### Backward Euler Method $$ y_{n+1} = y_n + h \cdot f(t_{n+1}, y_{n+1}) $$ Requires solving implicit equation (more stable but complex). #### Trapezoidal Method (Crank-Nicolson) $$ y_{n+1} = y_n + \frac{h}{2}\left[f(t_n, y_n) + f(t_{n+1}, y_{n+1})\right] $$ **Local truncation error:** $O(h^3)$ #### 4th Order Runge-Kutta (RK4) $$ y_{n+1} = y_n + \frac{h}{6}(k_1 + 2k_2 + 2k_3 + k_4) $$ Where: $$ \begin{aligned} k_1 &= f(t_n, y_n) \\ k_2 &= f\left(t_n + \frac{h}{2}, y_n + \frac{h}{2}k_1\right) \\ k_3 &= f\left(t_n + \frac{h}{2}, y_n + \frac{h}{2}k_2\right) \\ k_4 &= f(t_n + h, y_n + h \cdot k_3) \end{aligned} $$ **Local truncation error:** $O(h^5)$ **Global error:** $O(h^4)$ #### Method Comparison for FPGA | Method | Accuracy | Stability | FPGA Resources | Latency | |--------|----------|-----------|----------------|---------| | Forward Euler | $O(h)$ | Conditional | Low | 1 mult | | Backward Euler | $O(h)$ | Unconditional | Medium | Iterative | | Trapezoidal | $O(h^2)$ | A-stable | Medium | Iterative | | RK4 | $O(h^4)$ | Conditional | High | 4 stages | ### 4.2 Linear Algebra Operations #### Matrix-Vector Multiplication $$ \mathbf{y} = \mathbf{A}\mathbf{x} \quad \Rightarrow \quad y_i = \sum_{j=1}^{n} a_{ij} x_j $$ **FPGA Implementation:** - Fully parallel: $n^2$ multipliers, $O(n)$ adder tree - Systolic array: $n$ multipliers, $n$ cycles #### Matrix-Matrix Multiplication $$ \mathbf{C} = \mathbf{A}\mathbf{B} \quad \Rightarrow \quad c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} $$ **Computational complexity:** $O(n^3)$ multiplications #### LU Decomposition $$ \mathbf{A} = \mathbf{L}\mathbf{U} $$ **FPGA considerations:** - Division required for pivot elements - Sequential dependencies limit parallelism ## 5. Timing and Resource Models ### 5.1 Static Timing Analysis (STA) #### Setup Time Constraint $$ T_{\text{clk-to-q}} + T_{\text{logic}} + T_{\text{routing}} + T_{\text{setup}} \leq T_{\text{clk}} $$ Rearranged: $$ T_{\text{logic}} + T_{\text{routing}} \leq T_{\text{clk}} - T_{\text{clk-to-q}} - T_{\text{setup}} $$ #### Hold Time Constraint $$ T_{\text{clk-to-q}} + T_{\text{logic}} + T_{\text{routing}} \geq T_{\text{hold}} $$ #### Slack Calculations **Setup slack:** $$ \text{Slack}_{\text{setup}} = T_{\text{clk}} - T_{\text{clk-to-q}} - T_{\text{setup}} - T_{\text{data\_path}} $$ **Hold slack:** $$ \text{Slack}_{\text{hold}} = T_{\text{data\_path}} + T_{\text{clk-to-q}} - T_{\text{hold}} $$ **Condition for valid timing:** $$ \text{Slack} \geq 0 $$ #### Maximum Operating Frequency $$ f_{\text{max}} = \frac{1}{T_{\text{clk-to-q}} + T_{\text{logic}} + T_{\text{routing}} + T_{\text{setup}}} $$ ### 5.2 Resource Utilization Models #### LUT Utilization For $n$-input boolean function: $$ \text{LUTs required} = \left\lceil \frac{n}{k} \right\rceil^{\text{levels}} $$ Where $k$ is the LUT input count (typically 4-6). #### DSP Slice Utilization For multiplication $A \times B$: $$ \text{DSP slices} = \left\lceil \frac{\text{bitwidth}_A}{18} \right\rceil \times \left\lceil \frac{\text{bitwidth}_B}{18} \right\rceil $$ (For typical 18×18 DSP multipliers) #### Block RAM Utilization $$ \text{BRAMs} = \left\lceil \frac{\text{depth}}{2^{k}} \right\rceil \times \left\lceil \frac{\text{width}}{w} \right\rceil $$ Where $k$ and $w$ are BRAM address and data widths. #### Register Utilization $$ \text{Registers} = \sum_{\text{signals}} \text{bitwidth} \times \text{pipeline\_stages} $$ ### 5.3 Power Models #### Dynamic Power $$ P_{\text{dynamic}} = \alpha \cdot C_L \cdot V_{DD}^2 \cdot f $$ Where: - $\alpha$ = switching activity factor - $C_L$ = load capacitance - $V_{DD}$ = supply voltage - $f$ = clock frequency #### Static Power $$ P_{\text{static}} = I_{\text{leakage}} \cdot V_{DD} $$ #### Total Power $$ P_{\text{total}} = P_{\text{dynamic}} + P_{\text{static}} $$ ## 6. High-Level Synthesis Models ### 6.1 Quantization Models #### Fixed-Point Quantization $$ x_q = \text{round}\left(\frac{x}{2^{-n}}\right) \cdot 2^{-n} $$ Where $n$ is the number of fractional bits. #### Quantization Error $$ \epsilon_q = x - x_q $$ **For rounding:** $$ -\frac{2^{-n}}{2} \leq \epsilon_q < \frac{2^{-n}}{2} $$ **Quantization noise power (uniform distribution):** $$ \sigma_q^2 = \frac{(2^{-n})^2}{12} = \frac{2^{-2n}}{12} $$ #### Signal-to-Quantization-Noise Ratio (SQNR) $$ \text{SQNR (dB)} = 10 \log_{10}\left(\frac{\sigma_x^2}{\sigma_q^2}\right) \approx 6.02n + 1.76 \text{ dB} $$ For full-scale sinusoidal signal with $n$ fractional bits. ### 6.2 Scheduling Models #### ASAP (As Soon As Possible) Scheduling $$ t_{\text{ASAP}}(v) = \max_{u \in \text{pred}(v)} \left( t_{\text{ASAP}}(u) + d_u \right) $$ Where $d_u$ is the delay of operation $u$. #### ALAP (As Late As Possible) Scheduling $$ t_{\text{ALAP}}(v) = \min_{w \in \text{succ}(v)} \left( t_{\text{ALAP}}(w) - d_v \right) $$ #### Mobility (Slack) $$ \text{Mobility}(v) = t_{\text{ALAP}}(v) - t_{\text{ASAP}}(v) $$ Operations with zero mobility are on the critical path. ### 6.3 Resource Binding Models #### Resource Sharing Constraint $$ \sum_{v: t(v) = t} r(v, k) \leq R_k \quad \forall t, k $$ Where: - $r(v, k)$ = 1 if operation $v$ uses resource type $k$ - $R_k$ = number of available resources of type $k$ #### Conflict Graph Two operations conflict if they: 1. Execute in the same time step 2. Require the same resource type **Minimum resources = Chromatic number of conflict graph** ## 7. Optimization Models ### 7.1 Pipeline Optimization #### Pipeline Stages $$ \text{Stages} = \left\lceil \frac{T_{\text{combinational}}}{T_{\text{target\_clock}}} \right\rceil $$ #### Throughput $$ \text{Throughput} = \frac{f_{\text{clk}}}{\text{II}} $$ Where II = Initiation Interval (cycles between new inputs). #### Latency $$ \text{Latency} = \text{Stages} \times T_{\text{clk}} $$ #### Pipeline Efficiency $$ \eta = \frac{\text{Ideal Throughput}}{\text{Actual Throughput}} = \frac{1}{\text{II}} $$ ### 7.2 Loop Optimization #### Loop Unrolling For unroll factor $U$: $$ \text{New iterations} = \left\lceil \frac{N}{U} \right\rceil $$ **Resource increase:** ~$U \times$ original **Throughput increase:** up to $U \times$ #### Loop Pipelining $$ \text{II}_{\text{achieved}} = \max(\text{II}_{\text{resource}}, \text{II}_{\text{recurrence}}, \text{II}_{\text{target}}) $$ Where: - $\text{II}_{\text{resource}}$ = resource-constrained II - $\text{II}_{\text{recurrence}}$ = loop-carried dependency II #### Loop Tiling For tile size $T$: $$ \text{for } (ii = 0; ii < N; ii += T) \\ \quad \text{for } (i = ii; i < \min(ii + T, N); i++) $$ **Memory accesses reduced by factor of** $T$ for reused data. ### 7.3 Area-Time Trade-offs #### Area-Time Product $$ \text{AT} = \text{Area} \times \text{Time} $$ #### Pareto Optimality A design $D_1$ dominates $D_2$ if: $$ \text{Area}(D_1) \leq \text{Area}(D_2) \quad \text{AND} \quad \text{Time}(D_1) \leq \text{Time}(D_2) $$ with at least one strict inequality. #### Cost Function $$ \text{Cost} = \alpha \cdot \text{Area} + \beta \cdot \text{Latency} + \gamma \cdot \text{Power} $$ Where $\alpha + \beta + \gamma = 1$ are weighting factors. ## 8. Practical Examples ### 8.1 FIR Filter Implementation #### Mathematical Definition $$ y[n] = \sum_{k=0}^{N-1} h[k] \cdot x[n-k] $$ #### Direct Form Implementation **Resources:** - Multipliers: $N$ - Adders: $N-1$ - Registers: $N-1$ (delay line) **Accumulator sizing:** $$ \text{Acc bits} = \text{input bits} + \text{coef bits} + \lceil \log_2(N) \rceil $$ #### Transpose Form (Better for FPGA) **Advantages:** - All multipliers fed by same input - Shorter critical path - Better for high-speed designs ### 8.2 FFT Implementation #### Radix-2 DIT FFT $$ X[k] = \sum_{n=0}^{N-1} x[n] \cdot W_N^{nk} $$ Where $W_N = e^{-j2\pi/N}$ (twiddle factor). #### Butterfly Operation $$ \begin{aligned} X[k] &= A + W_N^k \cdot B \\ X[k + N/2] &= A - W_N^k \cdot B \end{aligned} $$ #### Resource Requirements | Architecture | Multipliers | Memory | Throughput | |--------------|-------------|--------|------------| | Radix-2 Serial | 1 complex | $N$ | $N \log_2 N$ cycles | | Radix-2 Parallel | $N/2$ complex | $N$ | $\log_2 N$ cycles | | Pipelined | $\log_2 N$ complex | $2N$ | 1 sample/cycle | ### 8.3 PID Controller #### Continuous-Time PID $$ u(t) = K_p \cdot e(t) + K_i \int_0^t e(\tau) d\tau + K_d \frac{de(t)}{dt} $$ #### Discrete-Time PID (Tustin Transform) $$ u[n] = u[n-1] + a_0 \cdot e[n] + a_1 \cdot e[n-1] + a_2 \cdot e[n-2] $$ Where: $$ \begin{aligned} a_0 &= K_p + K_i \frac{T_s}{2} + K_d \frac{2}{T_s} \\ a_1 &= -K_p + K_i \frac{T_s}{2} - K_d \frac{4}{T_s} \\ a_2 &= K_d \frac{2}{T_s} \end{aligned} $$ #### FPGA Implementation Considerations - **Coefficient quantization:** affects stability - **Accumulator overflow:** requires saturation logic - **Anti-windup:** needed for integrator term ## Common Mathematical Constants for FPGA | Constant | Value | Q16.16 Hex | |----------|-------|------------| | $\pi$ | 3.14159265 | `0x0003_243F` | | $e$ | 2.71828183 | `0x0002_B7E1` | | $\sqrt{2}$ | 1.41421356 | `0x0001_6A09` | | $\ln(2)$ | 0.69314718 | `0x0000_B172` | | $1/\pi$ | 0.31830989 | `0x0000_517C` | | CORDIC $K$ | 0.60725293 | `0x0000_9B74` | ## FPGA Vendor DSP Specifications | Vendor | Family | DSP Name | Multiplier Size | Pre-adder | |--------|--------|----------|-----------------|-----------| | AMD/Xilinx | 7 Series | DSP48E1 | 25×18 | Yes | | AMD/Xilinx | UltraScale | DSP48E2 | 27×18 | Yes | | Intel | Stratix 10 | DSP Block | 18×19 or 27×27 | Yes | | Lattice | ECP5 | MULT18X18D | 18×18 | No |

fpga alternative,chip design hobby,logisim,ngspice

# FPGA Mathematical Modeling A comprehensive guide to mathematical techniques, representations, and methodologies for implementing algorithms in Field-Programmable Gate Arrays. ## 1. Number Representation Models ### 1.1 Fixed-Point Representation Fixed-point representation maintains the decimal point within a fixed position, enabling straightforward arithmetic operations. #### Notation Format - **Q-format:** $Q(m, n)$ or $(x, y)$ - $m$ or $x$ = number of integer bits - $n$ or $y$ = number of fractional bits - Total bits = $m + n + 1$ (including sign bit for signed) #### Mathematical Formulations **Integer bits calculation:** $$ \text{Integer Bits} = \lceil \log_2(\text{max\_value}) \rceil $$ **Value representation:** $$ \text{Value} = \sum_{i=-n}^{m-1} b_i \cdot 2^i $$ Where $b_i \in \{0, 1\}$ for unsigned, or using two's complement for signed. **Resolution (LSB value):** $$ \text{Resolution} = 2^{-n} $$ **Range for signed Q(m,n):** $$ \text{Range} = \left[ -2^{m-1}, \, 2^{m-1} - 2^{-n} \right] $$ #### Fixed-Point Arithmetic Rules - **Addition/Subtraction:** - Decimal points must be aligned - Result: $\max(m_1, m_2) + 1$ integer bits, $\max(n_1, n_2)$ fractional bits $$ Q(m_1, n_1) \pm Q(m_2, n_2) \rightarrow Q(\max(m_1, m_2) + 1, \max(n_1, n_2)) $$ - **Multiplication:** $$ Q(m_1, n_1) \times Q(m_2, n_2) \rightarrow Q(m_1 + m_2, n_1 + n_2) $$ - **Scaling by powers of 2:** - Left shift by $k$: multiply by $2^k$ - Right shift by $k$: divide by $2^k$ #### Example Calculation For a 16-bit signed Q8.8 format: - Integer range: $[-128, 127]$ - Fractional resolution: $2^{-8} = 0.00390625$ - Example: Binary `0000_0001.1000_0000` = $1.5_{10}$ ### 1.2 Floating-Point Representation #### IEEE 754 Standard Formats | Format | Total Bits | Sign | Exponent | Mantissa | |--------|-----------|------|----------|----------| | Half | 16 | 1 | 5 | 10 | | Single | 32 | 1 | 8 | 23 | | Double | 64 | 1 | 11 | 52 | #### Mathematical Representation $$ \text{Value} = (-1)^{s} \times (1 + M) \times 2^{E - \text{bias}} $$ Where: - $s$ = sign bit - $M$ = mantissa (fractional part) - $E$ = exponent - $\text{bias} = 2^{k-1} - 1$ (where $k$ = exponent bits) #### Bias Values - Half precision: $\text{bias} = 15$ - Single precision: $\text{bias} = 127$ - Double precision: $\text{bias} = 1023$ #### Resource Comparison | Operation | Fixed-Point LUTs | Floating-Point LUTs | Ratio | |-----------|-----------------|---------------------|-------| | Add (32-bit) | ~32 | ~400 | ~12.5× | | Multiply (32-bit) | ~64 (DSP) | ~600 | ~9× | | Divide (32-bit) | ~200 | ~1200 | ~6× | ## 2. Mathematical Function Implementation ### 2.1 CORDIC Algorithm **CORDIC** (COordinate Rotation DIgital Computer) computes trigonometric, hyperbolic, and other functions using only shift-add operations. #### Core Rotation Matrix $$ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} $$ #### CORDIC Simplification By restricting $\tan(\theta_i) = 2^{-i}$, the rotation becomes: $$ \begin{bmatrix} x' \\ y' \end{bmatrix} = \cos\theta_i \begin{bmatrix} 1 & -\sigma_i 2^{-i} \\ \sigma_i 2^{-i} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} $$ Where $\sigma_i \in \{-1, +1\}$ is the rotation direction. #### Iterative Equations $$ \begin{aligned} x_{i+1} &= x_i - \sigma_i \cdot y_i \cdot 2^{-i} \\ y_{i+1} &= y_i + \sigma_i \cdot x_i \cdot 2^{-i} \\ z_{i+1} &= z_i - \sigma_i \cdot \arctan(2^{-i}) \end{aligned} $$ #### CORDIC Gain (Scale Factor) $$ K_n = \prod_{i=0}^{n-1} \cos(\arctan(2^{-i})) = \prod_{i=0}^{n-1} \frac{1}{\sqrt{1 + 2^{-2i}}} $$ For $n \rightarrow \infty$: $$ K_\infty \approx 0.6072529350088812561694 $$ #### CORDIC Modes | Mode | Condition | Computes | |------|-----------|----------| | Rotation | $\sigma_i = \text{sign}(z_i)$ | $x_n = K(x_0\cos z_0 - y_0\sin z_0)$ | | Vectoring | $\sigma_i = -\text{sign}(y_i)$ | $z_n = z_0 + \arctan(y_0/x_0)$ | #### CORDIC Functions Table | Function | Mode | Initial Values | Result | |----------|------|----------------|--------| | $\sin(\theta)$ | Rotation | $x_0=1/K$, $y_0=0$, $z_0=\theta$ | $y_n$ | | $\cos(\theta)$ | Rotation | $x_0=1/K$, $y_0=0$, $z_0=\theta$ | $x_n$ | | $\arctan(a)$ | Vectoring | $x_0=1$, $y_0=a$, $z_0=0$ | $z_n$ | | $\sqrt{x^2+y^2}$ | Vectoring | $x_0=x$, $y_0=y$, $z_0=0$ | $K \cdot x_n$ | #### CORDIC VHDL Pseudo-Structure ```vhdl -- CORDIC iteration stage process(clk) begin if rising_edge(clk) then if z(i) >= 0 then -- σ = +1 x(i+1) <= x(i) - shift_right(y(i), i); y(i+1) <= y(i) + shift_right(x(i), i); z(i+1) <= z(i) - atan_table(i); else -- σ = -1 x(i+1) <= x(i) + shift_right(y(i), i); y(i+1) <= y(i) - shift_right(x(i), i); z(i+1) <= z(i) + atan_table(i); end if; end if; end process; ``` ### 2.2 Polynomial Approximation #### Taylor Series Expansion $$ f(x) = \sum_{n=0}^{N} \frac{f^{(n)}(a)}{n!}(x-a)^n + R_N(x) $$ Where $R_N(x)$ is the remainder term. #### Common Taylor Expansions **Sine function:** $$ \sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!} $$ **Cosine function:** $$ \cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!} $$ **Exponential function:** $$ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{n=0}^{\infty} \frac{x^n}{n!} $$ **Natural logarithm (for $|x| < 1$):** $$ \ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{n+1} x^n}{n} $$ #### Horner's Method for Evaluation For polynomial $P(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$: $$ P(x) = a_0 + x(a_1 + x(a_2 + x(a_3 + \cdots + x(a_{n-1} + x \cdot a_n)\cdots))) $$ **Advantages:** - Reduces multiplications from $\frac{n(n+1)}{2}$ to $n$ - Reduces additions from $n$ to $n$ - Enables pipelining #### Minimax Polynomial Approximation Minimizes the maximum error over an interval: $$ \min_{p \in P_n} \max_{x \in [a,b]} |f(x) - p(x)| $$ Where $P_n$ is the set of polynomials of degree $\leq n$. **Chebyshev approximation** often provides near-minimax results: $$ T_n(x) = \cos(n \cdot \arccos(x)) $$ ### 2.3 Lookup Table (LUT) Methods #### Direct Table Lookup $$ f(x) \approx \text{LUT}[\lfloor x \cdot 2^n \rfloor] $$ **Memory requirement:** $$ \text{Memory (bits)} = 2^{\text{input\_bits}} \times \text{output\_bits} $$ #### Linear Interpolation $$ f(x) \approx f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0} \cdot (x - x_0) $$ Simplified for uniform spacing ($\Delta x = x_1 - x_0$): $$ f(x) \approx f(x_0) + \frac{\Delta f}{\Delta x} \cdot (x - x_0) $$ #### Bipartite Table Method Decomposes the function into two smaller tables: $$ f(x) \approx T_1(x_{\text{MSB}}) + T_2(x_{\text{MSB}}, x_{\text{LSB}}) $$ **Memory savings:** $$ \text{Bipartite Memory} \approx 2 \cdot 2^{n/2} \ll 2^n \text{ (Direct)} $$ #### Piecewise Polynomial with LUT $$ f(x) = \sum_{k=0}^{d} c_k[i] \cdot (x - x_i)^k \quad \text{for } x \in [x_i, x_{i+1}) $$ Where coefficients $c_k[i]$ are stored in lookup tables. ## 3. System-Level Mathematical Modeling ### 3.1 Boolean Algebra Model #### Basic Operations | Operation | Symbol | VHDL | Verilog | |-----------|--------|------|---------| | AND | $A \cdot B$ | `A and B` | `A & B` | | OR | $A + B$ | `A or B` | `A ∣ B` | | NOT | $\overline{A}$ | `not A` | `~A` | | XOR | $A \oplus B$ | `A xor B` | `A ^ B` | | NAND | $\overline{A \cdot B}$ | `A nand B` | `~(A & B)` | | NOR | $\overline{A + B}$ | `A nor B` | `~(A ∣ B)` | #### Boolean Algebra Laws **Identity:** $$ A + 0 = A \quad \text{and} \quad A \cdot 1 = A $$ **Complement:** $$ A + \overline{A} = 1 \quad \text{and} \quad A \cdot \overline{A} = 0 $$ **De Morgan's Theorems:** $$ \overline{A + B} = \overline{A} \cdot \overline{B} \quad \text{and} \quad \overline{A \cdot B} = \overline{A} + \overline{B} $$ **Absorption:** $$ A + A \cdot B = A \quad \text{and} \quad A \cdot (A + B) = A $$ #### Sum of Products (SOP) Form $$ F = \sum_{i} m_i \quad \text{where } m_i \text{ are minterms} $$ **Example:** $$ F(A, B, C) = \overline{A} \cdot B \cdot C + A \cdot \overline{B} \cdot C + A \cdot B \cdot \overline{C} $$ ### 3.2 Finite State Machine (FSM) Models #### Moore Machine $$ \begin{aligned} \text{Next State:} \quad S_{n+1} &= \delta(S_n, I_n) \\ \text{Output:} \quad O_n &= \lambda(S_n) \end{aligned} $$ #### Mealy Machine $$ \begin{aligned} \text{Next State:} \quad S_{n+1} &= \delta(S_n, I_n) \\ \text{Output:} \quad O_n &= \lambda(S_n, I_n) \end{aligned} $$ #### State Encoding | Encoding | States | Flip-Flops | Transitions | |----------|--------|------------|-------------| | Binary | $N$ | $\lceil \log_2 N \rceil$ | Complex | | One-Hot | $N$ | $N$ | Simple | | Gray | $N$ | $\lceil \log_2 N \rceil$ | Low power | ### 3.3 Dataflow Graph Model #### Directed Acyclic Graph (DAG) Representation - **Nodes ($V$):** Operations (add, multiply, etc.) - **Edges ($E$):** Data dependencies - **Graph:** $G = (V, E)$ #### Critical Path Length $$ T_{\text{critical}} = \max_{\text{paths } p} \sum_{v \in p} t_v $$ Where $t_v$ is the delay of operation $v$. #### Parallelism Metrics **Available parallelism:** $$ P = \frac{\sum_{v \in V} t_v}{T_{\text{critical}}} $$ **Resource-constrained schedule length:** $$ T_{\text{schedule}} \geq \max\left( T_{\text{critical}}, \left\lceil \frac{\sum_{v \in V} t_v}{R} \right\rceil \right) $$ Where $R$ is the number of available resources. ### 3.4 State-Space Models #### Continuous-Time State-Space $$ \begin{aligned} \dot{\mathbf{x}}(t) &= \mathbf{A}\mathbf{x}(t) + \mathbf{B}\mathbf{u}(t) \\ \mathbf{y}(t) &= \mathbf{C}\mathbf{x}(t) + \mathbf{D}\mathbf{u}(t) \end{aligned} $$ #### Discrete-Time State-Space (for FPGA) $$ \begin{aligned} \mathbf{x}[k+1] &= \mathbf{A}_d\mathbf{x}[k] + \mathbf{B}_d\mathbf{u}[k] \\ \mathbf{y}[k] &= \mathbf{C}_d\mathbf{x}[k] + \mathbf{D}_d\mathbf{u}[k] \end{aligned} $$ #### Discretization (Zero-Order Hold) $$ \mathbf{A}_d = e^{\mathbf{A}T_s} \quad \text{and} \quad \mathbf{B}_d = \mathbf{A}^{-1}(e^{\mathbf{A}T_s} - \mathbf{I})\mathbf{B} $$ Where $T_s$ is the sampling period. ## 4. Numerical Methods for FPGA ### 4.1 ODE Solvers #### Forward Euler Method $$ y_{n+1} = y_n + h \cdot f(t_n, y_n) $$ **Local truncation error:** $O(h^2)$ **Global error:** $O(h)$ #### Backward Euler Method $$ y_{n+1} = y_n + h \cdot f(t_{n+1}, y_{n+1}) $$ Requires solving implicit equation (more stable but complex). #### Trapezoidal Method (Crank-Nicolson) $$ y_{n+1} = y_n + \frac{h}{2}\left[f(t_n, y_n) + f(t_{n+1}, y_{n+1})\right] $$ **Local truncation error:** $O(h^3)$ #### 4th Order Runge-Kutta (RK4) $$ y_{n+1} = y_n + \frac{h}{6}(k_1 + 2k_2 + 2k_3 + k_4) $$ Where: $$ \begin{aligned} k_1 &= f(t_n, y_n) \\ k_2 &= f\left(t_n + \frac{h}{2}, y_n + \frac{h}{2}k_1\right) \\ k_3 &= f\left(t_n + \frac{h}{2}, y_n + \frac{h}{2}k_2\right) \\ k_4 &= f(t_n + h, y_n + h \cdot k_3) \end{aligned} $$ **Local truncation error:** $O(h^5)$ **Global error:** $O(h^4)$ #### Method Comparison for FPGA | Method | Accuracy | Stability | FPGA Resources | Latency | |--------|----------|-----------|----------------|---------| | Forward Euler | $O(h)$ | Conditional | Low | 1 mult | | Backward Euler | $O(h)$ | Unconditional | Medium | Iterative | | Trapezoidal | $O(h^2)$ | A-stable | Medium | Iterative | | RK4 | $O(h^4)$ | Conditional | High | 4 stages | ### 4.2 Linear Algebra Operations #### Matrix-Vector Multiplication $$ \mathbf{y} = \mathbf{A}\mathbf{x} \quad \Rightarrow \quad y_i = \sum_{j=1}^{n} a_{ij} x_j $$ **FPGA Implementation:** - Fully parallel: $n^2$ multipliers, $O(n)$ adder tree - Systolic array: $n$ multipliers, $n$ cycles #### Matrix-Matrix Multiplication $$ \mathbf{C} = \mathbf{A}\mathbf{B} \quad \Rightarrow \quad c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} $$ **Computational complexity:** $O(n^3)$ multiplications #### LU Decomposition $$ \mathbf{A} = \mathbf{L}\mathbf{U} $$ **FPGA considerations:** - Division required for pivot elements - Sequential dependencies limit parallelism ## 5. Timing and Resource Models ### 5.1 Static Timing Analysis (STA) #### Setup Time Constraint $$ T_{\text{clk-to-q}} + T_{\text{logic}} + T_{\text{routing}} + T_{\text{setup}} \leq T_{\text{clk}} $$ Rearranged: $$ T_{\text{logic}} + T_{\text{routing}} \leq T_{\text{clk}} - T_{\text{clk-to-q}} - T_{\text{setup}} $$ #### Hold Time Constraint $$ T_{\text{clk-to-q}} + T_{\text{logic}} + T_{\text{routing}} \geq T_{\text{hold}} $$ #### Slack Calculations **Setup slack:** $$ \text{Slack}_{\text{setup}} = T_{\text{clk}} - T_{\text{clk-to-q}} - T_{\text{setup}} - T_{\text{data\_path}} $$ **Hold slack:** $$ \text{Slack}_{\text{hold}} = T_{\text{data\_path}} + T_{\text{clk-to-q}} - T_{\text{hold}} $$ **Condition for valid timing:** $$ \text{Slack} \geq 0 $$ #### Maximum Operating Frequency $$ f_{\text{max}} = \frac{1}{T_{\text{clk-to-q}} + T_{\text{logic}} + T_{\text{routing}} + T_{\text{setup}}} $$ ### 5.2 Resource Utilization Models #### LUT Utilization For $n$-input boolean function: $$ \text{LUTs required} = \left\lceil \frac{n}{k} \right\rceil^{\text{levels}} $$ Where $k$ is the LUT input count (typically 4-6). #### DSP Slice Utilization For multiplication $A \times B$: $$ \text{DSP slices} = \left\lceil \frac{\text{bitwidth}_A}{18} \right\rceil \times \left\lceil \frac{\text{bitwidth}_B}{18} \right\rceil $$ (For typical 18×18 DSP multipliers) #### Block RAM Utilization $$ \text{BRAMs} = \left\lceil \frac{\text{depth}}{2^{k}} \right\rceil \times \left\lceil \frac{\text{width}}{w} \right\rceil $$ Where $k$ and $w$ are BRAM address and data widths. #### Register Utilization $$ \text{Registers} = \sum_{\text{signals}} \text{bitwidth} \times \text{pipeline\_stages} $$ ### 5.3 Power Models #### Dynamic Power $$ P_{\text{dynamic}} = \alpha \cdot C_L \cdot V_{DD}^2 \cdot f $$ Where: - $\alpha$ = switching activity factor - $C_L$ = load capacitance - $V_{DD}$ = supply voltage - $f$ = clock frequency #### Static Power $$ P_{\text{static}} = I_{\text{leakage}} \cdot V_{DD} $$ #### Total Power $$ P_{\text{total}} = P_{\text{dynamic}} + P_{\text{static}} $$ ## 6. High-Level Synthesis Models ### 6.1 Quantization Models #### Fixed-Point Quantization $$ x_q = \text{round}\left(\frac{x}{2^{-n}}\right) \cdot 2^{-n} $$ Where $n$ is the number of fractional bits. #### Quantization Error $$ \epsilon_q = x - x_q $$ **For rounding:** $$ -\frac{2^{-n}}{2} \leq \epsilon_q < \frac{2^{-n}}{2} $$ **Quantization noise power (uniform distribution):** $$ \sigma_q^2 = \frac{(2^{-n})^2}{12} = \frac{2^{-2n}}{12} $$ #### Signal-to-Quantization-Noise Ratio (SQNR) $$ \text{SQNR (dB)} = 10 \log_{10}\left(\frac{\sigma_x^2}{\sigma_q^2}\right) \approx 6.02n + 1.76 \text{ dB} $$ For full-scale sinusoidal signal with $n$ fractional bits. ### 6.2 Scheduling Models #### ASAP (As Soon As Possible) Scheduling $$ t_{\text{ASAP}}(v) = \max_{u \in \text{pred}(v)} \left( t_{\text{ASAP}}(u) + d_u \right) $$ Where $d_u$ is the delay of operation $u$. #### ALAP (As Late As Possible) Scheduling $$ t_{\text{ALAP}}(v) = \min_{w \in \text{succ}(v)} \left( t_{\text{ALAP}}(w) - d_v \right) $$ #### Mobility (Slack) $$ \text{Mobility}(v) = t_{\text{ALAP}}(v) - t_{\text{ASAP}}(v) $$ Operations with zero mobility are on the critical path. ### 6.3 Resource Binding Models #### Resource Sharing Constraint $$ \sum_{v: t(v) = t} r(v, k) \leq R_k \quad \forall t, k $$ Where: - $r(v, k)$ = 1 if operation $v$ uses resource type $k$ - $R_k$ = number of available resources of type $k$ #### Conflict Graph Two operations conflict if they: 1. Execute in the same time step 2. Require the same resource type **Minimum resources = Chromatic number of conflict graph** ## 7. Optimization Models ### 7.1 Pipeline Optimization #### Pipeline Stages $$ \text{Stages} = \left\lceil \frac{T_{\text{combinational}}}{T_{\text{target\_clock}}} \right\rceil $$ #### Throughput $$ \text{Throughput} = \frac{f_{\text{clk}}}{\text{II}} $$ Where II = Initiation Interval (cycles between new inputs). #### Latency $$ \text{Latency} = \text{Stages} \times T_{\text{clk}} $$ #### Pipeline Efficiency $$ \eta = \frac{\text{Ideal Throughput}}{\text{Actual Throughput}} = \frac{1}{\text{II}} $$ ### 7.2 Loop Optimization #### Loop Unrolling For unroll factor $U$: $$ \text{New iterations} = \left\lceil \frac{N}{U} \right\rceil $$ **Resource increase:** ~$U \times$ original **Throughput increase:** up to $U \times$ #### Loop Pipelining $$ \text{II}_{\text{achieved}} = \max(\text{II}_{\text{resource}}, \text{II}_{\text{recurrence}}, \text{II}_{\text{target}}) $$ Where: - $\text{II}_{\text{resource}}$ = resource-constrained II - $\text{II}_{\text{recurrence}}$ = loop-carried dependency II #### Loop Tiling For tile size $T$: $$ \text{for } (ii = 0; ii < N; ii += T) \\ \quad \text{for } (i = ii; i < \min(ii + T, N); i++) $$ **Memory accesses reduced by factor of** $T$ for reused data. ### 7.3 Area-Time Trade-offs #### Area-Time Product $$ \text{AT} = \text{Area} \times \text{Time} $$ #### Pareto Optimality A design $D_1$ dominates $D_2$ if: $$ \text{Area}(D_1) \leq \text{Area}(D_2) \quad \text{AND} \quad \text{Time}(D_1) \leq \text{Time}(D_2) $$ with at least one strict inequality. #### Cost Function $$ \text{Cost} = \alpha \cdot \text{Area} + \beta \cdot \text{Latency} + \gamma \cdot \text{Power} $$ Where $\alpha + \beta + \gamma = 1$ are weighting factors. ## 8. Practical Examples ### 8.1 FIR Filter Implementation #### Mathematical Definition $$ y[n] = \sum_{k=0}^{N-1} h[k] \cdot x[n-k] $$ #### Direct Form Implementation **Resources:** - Multipliers: $N$ - Adders: $N-1$ - Registers: $N-1$ (delay line) **Accumulator sizing:** $$ \text{Acc bits} = \text{input bits} + \text{coef bits} + \lceil \log_2(N) \rceil $$ #### Transpose Form (Better for FPGA) **Advantages:** - All multipliers fed by same input - Shorter critical path - Better for high-speed designs ### 8.2 FFT Implementation #### Radix-2 DIT FFT $$ X[k] = \sum_{n=0}^{N-1} x[n] \cdot W_N^{nk} $$ Where $W_N = e^{-j2\pi/N}$ (twiddle factor). #### Butterfly Operation $$ \begin{aligned} X[k] &= A + W_N^k \cdot B \\ X[k + N/2] &= A - W_N^k \cdot B \end{aligned} $$ #### Resource Requirements | Architecture | Multipliers | Memory | Throughput | |--------------|-------------|--------|------------| | Radix-2 Serial | 1 complex | $N$ | $N \log_2 N$ cycles | | Radix-2 Parallel | $N/2$ complex | $N$ | $\log_2 N$ cycles | | Pipelined | $\log_2 N$ complex | $2N$ | 1 sample/cycle | ### 8.3 PID Controller #### Continuous-Time PID $$ u(t) = K_p \cdot e(t) + K_i \int_0^t e(\tau) d\tau + K_d \frac{de(t)}{dt} $$ #### Discrete-Time PID (Tustin Transform) $$ u[n] = u[n-1] + a_0 \cdot e[n] + a_1 \cdot e[n-1] + a_2 \cdot e[n-2] $$ Where: $$ \begin{aligned} a_0 &= K_p + K_i \frac{T_s}{2} + K_d \frac{2}{T_s} \\ a_1 &= -K_p + K_i \frac{T_s}{2} - K_d \frac{4}{T_s} \\ a_2 &= K_d \frac{2}{T_s} \end{aligned} $$ #### FPGA Implementation Considerations - **Coefficient quantization:** affects stability - **Accumulator overflow:** requires saturation logic - **Anti-windup:** needed for integrator term ## Common Mathematical Constants for FPGA | Constant | Value | Q16.16 Hex | |----------|-------|------------| | $\pi$ | 3.14159265 | `0x0003_243F` | | $e$ | 2.71828183 | `0x0002_B7E1` | | $\sqrt{2}$ | 1.41421356 | `0x0001_6A09` | | $\ln(2)$ | 0.69314718 | `0x0000_B172` | | $1/\pi$ | 0.31830989 | `0x0000_517C` | | CORDIC $K$ | 0.60725293 | `0x0000_9B74` | ## FPGA Vendor DSP Specifications | Vendor | Family | DSP Name | Multiplier Size | Pre-adder | |--------|--------|----------|-----------------|-----------| | AMD/Xilinx | 7 Series | DSP48E1 | 25×18 | Yes | | AMD/Xilinx | UltraScale | DSP48E2 | 27×18 | Yes | | Intel | Stratix 10 | DSP Block | 18×19 or 27×27 | Yes | | Lattice | ECP5 | MULT18X18D | 18×18 | No |

fractal dimension of surfaces, metrology

Characterize complexity of rough surfaces.

fractured data, lithography

Design broken into write fields.

full array bga, packaging

Balls across entire bottom.

full wafer test, testing

Test all dies on wafer.

fusion bonding, advanced packaging

Silicon oxide bonds formed by thermal treatment.