
AI Factory Glossary

9,967 technical terms and definitions


four-point probe mapping, metrology

Map sheet resistance at many points.

four-point probe, yield enhancement

A four-point probe measures sheet resistance using separate current-forcing and voltage-sensing contacts, eliminating contact resistance from the measurement.
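A worked sketch (values illustrative; assumes the standard collinear-probe geometric factor π/ln 2, valid for a thin film much larger than the probe spacing):

```python
import math

def sheet_resistance(voltage_v, current_a):
    """Sheet resistance in ohms/square from a collinear four-point probe:
    Rs = (pi / ln 2) * V / I, for a thin, laterally large film."""
    return (math.pi / math.log(2)) * voltage_v / current_a

# Illustrative reading: 10 mV across the inner probes while forcing 1 mA.
rs = sheet_resistance(10e-3, 1e-3)
print(f"Rs = {rs:.2f} ohm/sq")  # ~45.32 ohm/sq
```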

four-point probe, metrology

Measure sheet resistance of doped or metal films.

fourier features, neural architecture

Encode inputs with Fourier basis.

fourier neural operator (fno), fourier neural operator, fno, scientific ml

Neural operator using Fourier transforms.

fourier optics, computational lithography, hopkins formulation, transmission cross coefficient, tcc, socs, zernike polynomials, partial coherence, opc, ilt

# Computational Lithography Mathematics

Modern semiconductor manufacturing faces a fundamental physical challenge: creating nanoscale features using light with wavelengths much larger than the target dimensions. Computational lithography bridges this gap through sophisticated mathematical techniques.

## 1. The Core Challenge

### 1.1 Resolution Limits

The Rayleigh criterion defines the minimum resolvable feature size:

$$ R = k_1 \cdot \frac{\lambda}{NA} $$

Where:
- $R$ = minimum resolution
- $k_1$ = process-dependent factor (theoretical limit: 0.25)
- $\lambda$ = wavelength of light (193 nm for ArF, 13.5 nm for EUV)
- $NA$ = numerical aperture of the lens system

### 1.2 Depth of Focus

$$ DOF = k_2 \cdot \frac{\lambda}{NA^2} $$

## 2. Wave Optics Fundamentals

### 2.1 Partially Coherent Imaging

The aerial image intensity on the wafer is described by Hopkins' equation:

$$ I(x, y) = \iint TCC(f_1, f_2) \cdot M(f_1) \cdot M^*(f_2) \, df_1 \, df_2 $$

Where:
- $I(x, y)$ = intensity at wafer position $(x, y)$
- $TCC(f_1, f_2)$ = Transmission Cross Coefficient
- $M(f)$ = Fourier transform of the mask pattern
- $M^*(f)$ = complex conjugate of $M(f)$

### 2.2 Transmission Cross Coefficient

The TCC captures the optical system behavior:

$$ TCC(f_1, f_2) = \iint S(\xi, \eta) \cdot H(f_1 + \xi, \eta) \cdot H^*(f_2 + \xi, \eta) \, d\xi \, d\eta $$

Where:
- $S(\xi, \eta)$ = source intensity distribution
- $H(f)$ = pupil function of the projection optics

## 3. Optical Proximity Correction (OPC)

### 3.1 The Inverse Problem

OPC solves the inverse imaging problem:

$$ \min_{M} \sum_{i} \left\| I(x_i, y_i; M) - I_{\text{target}}(x_i, y_i) \right\|^2 + \lambda R(M) $$

Where:
- $M$ = mask pattern (optimization variable)
- $I_{\text{target}}$ = desired wafer pattern
- $R(M)$ = regularization term for manufacturability
- $\lambda$ = regularization weight

### 3.2 Gradient-Based Optimization

The gradient with respect to mask pixels:

$$ \frac{\partial J}{\partial M_k} = \sum_{i} 2 \left( I_i - I_{\text{target},i} \right) \cdot \frac{\partial I_i}{\partial M_k} $$

### 3.3 Key Correction Features

- **Serifs**: corner additions/subtractions to correct corner rounding
- **Hammerheads**: line-end extensions to prevent line shortening
- **Assist features**: sub-resolution features that improve main feature fidelity
- **Scattering bars**: improve depth of focus for isolated features

## 4. Inverse Lithography Technology (ILT)

### 4.1 Full Pixel-Based Optimization

ILT treats each mask pixel as an independent variable:

$$ \min_{\mathbf{m}} \left\| \mathbf{I}(\mathbf{m}) - \mathbf{I}_{\text{target}} \right\|_2^2 + \alpha \|\nabla \mathbf{m}\|_1 + \beta \text{TV}(\mathbf{m}) $$

Where:
- $\mathbf{m} \in [0, 1]^N$ = continuous mask pixel values
- $\text{TV}(\mathbf{m})$ = Total Variation regularization
- $\|\nabla \mathbf{m}\|_1$ = sparsity-promoting term

### 4.2 Level-Set Formulation

Mask boundaries represented implicitly:

$$ \frac{\partial \phi}{\partial t} = -V \cdot |\nabla \phi| $$

Where:
- $\phi(x, y)$ = level-set function
- Mask region: $\{(x,y) : \phi(x,y) > 0\}$
- $V$ = velocity field derived from optimization gradient

## 5. Source Mask Optimization (SMO)

### 5.1 Joint Optimization Problem

$$ \min_{S, M} \sum_{i} \left[ I(x_i, y_i; S, M) - I_{\text{target}}(x_i, y_i) \right]^2 $$

Subject to:
- Source constraints: $\int S(\xi, \eta) \, d\xi \, d\eta = 1$, $S \geq 0$
- Mask manufacturability constraints

### 5.2 Alternating Optimization

1. Fix source $S$, optimize mask $M$
2. Fix mask $M$, optimize source $S$
3. Repeat until convergence

## 6. Rigorous Electromagnetic Simulation

### 6.1 Maxwell's Equations

For accurate 3D mask effects:

$$ \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} $$
$$ \nabla \times \mathbf{H} = \mathbf{J} + \frac{\partial \mathbf{D}}{\partial t} $$
$$ \nabla \cdot \mathbf{D} = \rho $$
$$ \nabla \cdot \mathbf{B} = 0 $$

### 6.2 Numerical Methods

- **FDTD (Finite-Difference Time-Domain)**:
$$ \frac{\partial E_x}{\partial t} = \frac{1}{\epsilon} \left( \frac{\partial H_z}{\partial y} - \frac{\partial H_y}{\partial z} \right) $$
- **RCWA (Rigorous Coupled-Wave Analysis)**: expansion in Fourier harmonics
$$ \mathbf{E}(x, y, z) = \sum_{m,n} \mathbf{E}_{mn}(z) \cdot e^{i(k_{xm}x + k_{yn}y)} $$

## 7. Photoresist Modeling

### 7.1 Dill Model for Absorption

$$ I(z) = I_0 \exp\left( -\int_0^z \alpha(z') \, dz' \right) $$

Where the absorption coefficient is:

$$ \alpha = A \cdot M + B $$

- $A$ = bleachable absorption
- $B$ = non-bleachable absorption
- $M$ = photoactive compound concentration

### 7.2 Exposure Kinetics

$$ \frac{dM}{dt} = -C \cdot I \cdot M $$

- $C$ = exposure rate constant

### 7.3 Acid Diffusion (Post-Exposure Bake)

Reaction-diffusion equation:

$$ \frac{\partial [H^+]}{\partial t} = D \nabla^2 [H^+] - k_{\text{loss}} [H^+] $$

Where:
- $D$ = diffusion coefficient (temperature-dependent)
- $k_{\text{loss}}$ = acid loss rate

### 7.4 Development Rate

Mack model:

$$ r = r_{\max} \cdot \frac{(a+1)(1-m)^n}{a + (1-m)^n} + r_{\min} $$

Where $m$ = normalized remaining PAC concentration.

## 8. Stochastic Effects

### 8.1 Photon Shot Noise

Photon count follows a Poisson distribution:

$$ P(n) = \frac{\lambda^n e^{-\lambda}}{n!} $$

Standard deviation:

$$ \sigma_n = \sqrt{\bar{n}} $$

### 8.2 Line Edge Roughness (LER)

Power spectral density:

$$ PSD(f) = \frac{A}{1 + (2\pi f \xi)^{2\alpha}} $$

Where:
- $\xi$ = correlation length
- $\alpha$ = roughness exponent
- $A$ = amplitude

### 8.3 Stochastic Defect Probability

For extreme ultraviolet (EUV):

$$ P_{\text{defect}} = 1 - \exp\left( -\frac{A_{\text{pixel}}}{N_{\text{photons}} \cdot \eta} \right) $$

## 9. Multi-Patterning Mathematics

### 9.1 Graph Coloring Formulation

Given conflict graph $G = (V, E)$:
- $V$ = features
- $E$ = edges connecting features with spacing $< \text{min}_{\text{space}}$

Find a $k$-coloring $c: V \rightarrow \{1, 2, \ldots, k\}$ such that:

$$ \forall (u, v) \in E: c(u) \neq c(v) $$

### 9.2 Integer Linear Programming Formulation

$$ \min \sum_{(i,j) \in E} w_{ij} \cdot y_{ij} $$

Subject to:

$$ \sum_{k=1}^{K} x_{ik} = 1 \quad \forall i \in V $$
$$ x_{ik} + x_{jk} - y_{ij} \leq 1 \quad \forall (i,j) \in E, \forall k $$
$$ x_{ik}, y_{ij} \in \{0, 1\} $$

## 10. EUV Lithography Specific Mathematics

### 10.1 Multilayer Mirror Reflectivity

Bragg condition for Mo/Si multilayers:

$$ 2d \sin\theta = n\lambda $$

Reflectivity at each interface:

$$ r = \frac{n_1 - n_2}{n_1 + n_2} $$

Total reflectivity (matrix method):

$$ \mathbf{M}_{\text{total}} = \prod_{j=1}^{N} \mathbf{M}_j $$

### 10.2 Mask 3D Effects

Shadow effect for off-axis illumination:

$$ \Delta x = h_{\text{absorber}} \cdot \tan(\theta_{\text{chief ray}}) $$

## 11. Machine Learning in Computational Lithography

### 11.1 Neural Network as Fast Surrogate Model

$$ I_{\text{predicted}} = f_{\theta}(M) $$

Where $f_{\theta}$ is a trained CNN; training minimizes:

$$ \mathcal{L} = \sum_{i} \left\| f_{\theta}(M_i) - I_{\text{rigorous}}(M_i) \right\|^2 $$

### 11.2 Physics-Informed Neural Networks

Loss function incorporating physics:

$$ \mathcal{L} = \mathcal{L}_{\text{data}} + \lambda_{\text{physics}} \mathcal{L}_{\text{physics}} $$

Where:

$$ \mathcal{L}_{\text{physics}} = \left\| \nabla^2 E + k^2 \epsilon E \right\|^2 $$

## 12. Key Mathematical Techniques Summary

| Technique | Application |
|-----------|-------------|
| Fourier Analysis | Optical imaging, frequency domain calculations |
| Inverse Problems | OPC, ILT, metrology |
| Non-convex Optimization | Mask optimization, SMO |
| Partial Differential Equations | EM simulation, resist diffusion |
| Graph Theory | Multi-patterning decomposition |
| Stochastic Processes | Shot noise, LER modeling |
| Linear Algebra | Large sparse system solutions |
| Machine Learning | Fast surrogate models, pattern recognition |

## 13. Computational Complexity

### 13.1 Full-Chip OPC Scale

- **Features**: $\sim 10^{12}$ polygon edges
- **Variables**: $\sim 10^8$ optimization parameters
- **Compute time**: hours to days on $1000+$ CPU cores
- **Memory**: terabytes of working data

### 13.2 Complexity Classes

| Operation | Complexity |
|-----------|------------|
| FFT for imaging | $O(N \log N)$ |
| RCWA per wavelength | $O(M^3)$ where $M$ = harmonics |
| ILT optimization | $O(N \cdot k)$ where $k$ = iterations |
| Graph coloring | NP-complete (general case) |

### Notation

| Symbol | Meaning |
|--------|---------|
| $\lambda$ | Wavelength |
| $NA$ | Numerical Aperture |
| $TCC$ | Transmission Cross Coefficient |
| $M(f)$ | Mask Fourier transform |
| $I(x,y)$ | Intensity at wafer |
| $\phi$ | Level-set function |
| $D$ | Diffusion coefficient |
| $\sigma$ | Standard deviation |
| $PSD$ | Power Spectral Density |
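The resolution and depth-of-focus formulas above are easy to sanity-check numerically; the sketch below uses typical tool parameters (ArF immersion NA 1.35, EUV NA 0.33) as illustrative assumptions:

```python
def rayleigh_resolution(k1, wavelength_nm, na):
    """Minimum resolvable feature R = k1 * lambda / NA (nm)."""
    return k1 * wavelength_nm / na

def depth_of_focus(k2, wavelength_nm, na):
    """DOF = k2 * lambda / NA^2 (nm)."""
    return k2 * wavelength_nm / na**2

# ArF immersion (193 nm, NA 1.35) at the k1 = 0.25 theoretical limit:
print(rayleigh_resolution(0.25, 193, 1.35))   # ~35.7 nm
# EUV (13.5 nm, NA 0.33):
print(rayleigh_resolution(0.25, 13.5, 0.33))  # ~10.2 nm
```

Note how EUV's much shorter wavelength yields roughly a 3.5× resolution gain even at a far lower numerical aperture.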

fourier position encoding, computer vision

Use Fourier features for position.

fourier transform analysis, data analysis

Frequency domain analysis of signals.
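A minimal sketch of frequency-domain analysis with NumPy's FFT (signal parameters are illustrative):

```python
import numpy as np

fs = 1000.0                            # sampling rate, Hz
t = np.arange(0, 1.0, 1.0 / fs)
# 50 Hz tone plus a weaker 120 Hz tone
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

spectrum = np.abs(np.fft.rfft(x))      # magnitude spectrum
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
dominant = freqs[np.argmax(spectrum)]
print(dominant)  # 50.0
```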

fourier transform infrared spectroscopy (ftir), fourier transform infrared spectroscopy, ftir, metrology

Identify chemical bonds by measuring infrared absorption spectra.

fourier transform process, manufacturing operations

Fourier transforms decompose time series into frequency components.

fouriermix, computer vision

Frequency-domain mixing.

fourteen points alternating, spc

Fourteen consecutive points alternating up and down indicate systematic oscillation (Nelson rule 4).

fowler-nordheim tunneling, device physics

Field emission through triangular barrier.

fp16 training, fp16, optimization

Half-precision training.

fp16, half precision, convert

FP16 is a 16-bit floating-point format offering 2× memory savings over FP32. Its narrow exponent range causes some accuracy loss, so training typically requires loss scaling.
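A small NumPy demonstration of why loss scaling is needed: float16's smallest subnormal is about 6e-8, so tiny gradients underflow to zero unless scaled up first (the 1e-8 gradient and 1024× scale are illustrative):

```python
import numpy as np

grad = 1e-8                        # a small gradient magnitude
print(np.float16(grad))            # underflows to 0.0 in half precision

scale = 1024.0                     # loss-scaling factor
scaled = np.float16(grad * scale)  # now representable as a float16 subnormal
recovered = float(scaled) / scale  # unscale in higher precision
print(scaled, recovered)
```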

fp32, single precision, float

FP32 is the standard 32-bit floating-point format. It gives full accuracy but is slower and more memory-hungry than reduced-precision formats for inference.

fp8, 8bit, latest

FP8 is an emerging 8-bit floating-point standard with two formats: E4M3 (more precision) and E5M2 (more range). NVIDIA's H100 supports FP8 natively.
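The two formats trade mantissa bits for exponent bits; a quick sketch of their maximum finite values, following the widely used OCP FP8 conventions (stated here as an assumption):

```python
def e5m2_max():
    # E5M2 keeps IEEE-style specials (top exponent reserved for inf/NaN),
    # so the largest finite value is 1.11_2 * 2^15 = 1.75 * 2^15
    return (2 - 2**-2) * 2**15

def e4m3_max():
    # E4M3 reserves only mantissa 111 at the top exponent for NaN,
    # so the largest finite value is 1.110_2 * 2^8 = 1.75 * 2^8
    return (2 - 2**-2) * 2**8

print(e5m2_max(), e4m3_max())  # 57344.0 448.0
```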

fpga alternative, chip design hobby, logisim, ngspice

# FPGA Mathematical Modeling

A comprehensive guide to mathematical techniques, representations, and methodologies for implementing algorithms in Field-Programmable Gate Arrays.

## 1. Number Representation Models

### 1.1 Fixed-Point Representation

Fixed-point representation keeps the binary point at a fixed position, enabling straightforward arithmetic operations.

#### Notation Format

- **Q-format:** $Q(m, n)$ or $(x, y)$
  - $m$ or $x$ = number of integer bits
  - $n$ or $y$ = number of fractional bits
  - Total bits = $m + n + 1$ (including sign bit for signed)

#### Mathematical Formulations

**Integer bits calculation:**
$$ \text{Integer Bits} = \lceil \log_2(\text{max\_value}) \rceil $$

**Value representation:**
$$ \text{Value} = \sum_{i=-n}^{m-1} b_i \cdot 2^i $$
Where $b_i \in \{0, 1\}$ for unsigned, or using two's complement for signed.

**Resolution (LSB value):**
$$ \text{Resolution} = 2^{-n} $$

**Range for signed Q(m,n):**
$$ \text{Range} = \left[ -2^{m-1}, \, 2^{m-1} - 2^{-n} \right] $$

#### Fixed-Point Arithmetic Rules

- **Addition/Subtraction:**
  - Binary points must be aligned
  - Result: $\max(m_1, m_2) + 1$ integer bits, $\max(n_1, n_2)$ fractional bits
  $$ Q(m_1, n_1) \pm Q(m_2, n_2) \rightarrow Q(\max(m_1, m_2) + 1, \max(n_1, n_2)) $$
- **Multiplication:**
  $$ Q(m_1, n_1) \times Q(m_2, n_2) \rightarrow Q(m_1 + m_2, n_1 + n_2) $$
- **Scaling by powers of 2:**
  - Left shift by $k$: multiply by $2^k$
  - Right shift by $k$: divide by $2^k$

#### Example Calculation

For a 16-bit signed Q8.8 format:
- Integer range: $[-128, 127]$
- Fractional resolution: $2^{-8} = 0.00390625$
- Example: Binary `0000_0001.1000_0000` = $1.5_{10}$

### 1.2 Floating-Point Representation

#### IEEE 754 Standard Formats

| Format | Total Bits | Sign | Exponent | Mantissa |
|--------|-----------|------|----------|----------|
| Half | 16 | 1 | 5 | 10 |
| Single | 32 | 1 | 8 | 23 |
| Double | 64 | 1 | 11 | 52 |

#### Mathematical Representation

$$ \text{Value} = (-1)^{s} \times (1 + M) \times 2^{E - \text{bias}} $$

Where:
- $s$ = sign bit
- $M$ = mantissa (fractional part)
- $E$ = exponent
- $\text{bias} = 2^{k-1} - 1$ (where $k$ = exponent bits)

#### Bias Values

- Half precision: $\text{bias} = 15$
- Single precision: $\text{bias} = 127$
- Double precision: $\text{bias} = 1023$

#### Resource Comparison

| Operation | Fixed-Point LUTs | Floating-Point LUTs | Ratio |
|-----------|-----------------|---------------------|-------|
| Add (32-bit) | ~32 | ~400 | ~12.5× |
| Multiply (32-bit) | ~64 (DSP) | ~600 | ~9× |
| Divide (32-bit) | ~200 | ~1200 | ~6× |

## 2. Mathematical Function Implementation

### 2.1 CORDIC Algorithm

**CORDIC** (COordinate Rotation DIgital Computer) computes trigonometric, hyperbolic, and other functions using only shift-add operations.

#### Core Rotation Matrix

$$ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} $$

#### CORDIC Simplification

By restricting $\tan(\theta_i) = 2^{-i}$, the rotation becomes:

$$ \begin{bmatrix} x' \\ y' \end{bmatrix} = \cos\theta_i \begin{bmatrix} 1 & -\sigma_i 2^{-i} \\ \sigma_i 2^{-i} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} $$

Where $\sigma_i \in \{-1, +1\}$ is the rotation direction.
#### Iterative Equations

$$ \begin{aligned} x_{i+1} &= x_i - \sigma_i \cdot y_i \cdot 2^{-i} \\ y_{i+1} &= y_i + \sigma_i \cdot x_i \cdot 2^{-i} \\ z_{i+1} &= z_i - \sigma_i \cdot \arctan(2^{-i}) \end{aligned} $$

#### CORDIC Gain (Scale Factor)

$$ K_n = \prod_{i=0}^{n-1} \cos(\arctan(2^{-i})) = \prod_{i=0}^{n-1} \frac{1}{\sqrt{1 + 2^{-2i}}} $$

For $n \rightarrow \infty$:

$$ K_\infty \approx 0.6072529350088812561694 $$

#### CORDIC Modes

| Mode | Condition | Computes |
|------|-----------|----------|
| Rotation | $\sigma_i = \text{sign}(z_i)$ | $x_n = K(x_0\cos z_0 - y_0\sin z_0)$ |
| Vectoring | $\sigma_i = -\text{sign}(y_i)$ | $z_n = z_0 + \arctan(y_0/x_0)$ |

#### CORDIC Functions Table

| Function | Mode | Initial Values | Result |
|----------|------|----------------|--------|
| $\sin(\theta)$ | Rotation | $x_0=1/K$, $y_0=0$, $z_0=\theta$ | $y_n$ |
| $\cos(\theta)$ | Rotation | $x_0=1/K$, $y_0=0$, $z_0=\theta$ | $x_n$ |
| $\arctan(a)$ | Vectoring | $x_0=1$, $y_0=a$, $z_0=0$ | $z_n$ |
| $\sqrt{x^2+y^2}$ | Vectoring | $x_0=x$, $y_0=y$, $z_0=0$ | $K \cdot x_n$ |

#### CORDIC VHDL Pseudo-Structure

```vhdl
-- CORDIC iteration stage
process(clk)
begin
  if rising_edge(clk) then
    if z(i) >= 0 then  -- σ = +1
      x(i+1) <= x(i) - shift_right(y(i), i);
      y(i+1) <= y(i) + shift_right(x(i), i);
      z(i+1) <= z(i) - atan_table(i);
    else               -- σ = -1
      x(i+1) <= x(i) + shift_right(y(i), i);
      y(i+1) <= y(i) - shift_right(x(i), i);
      z(i+1) <= z(i) + atan_table(i);
    end if;
  end if;
end process;
```

### 2.2 Polynomial Approximation

#### Taylor Series Expansion

$$ f(x) = \sum_{n=0}^{N} \frac{f^{(n)}(a)}{n!}(x-a)^n + R_N(x) $$

Where $R_N(x)$ is the remainder term.
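The iterative equations above translate directly into code; a floating-point Python sketch of rotation mode (a fixed-point HDL version would replace the multiplies by $2^{-i}$ with arithmetic shifts):

```python
import math

def cordic_rotate(theta, n=24):
    """Rotation-mode CORDIC: returns (cos(theta), sin(theta)).
    Converges for |theta| < sum(atan(2^-i)) ~ 1.7433 rad."""
    # Gain K_n = prod 1/sqrt(1 + 2^-2i); applied once at the end.
    k = 1.0
    for i in range(n):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0, 0.0, theta
    for i in range(n):
        sigma = 1.0 if z >= 0.0 else -1.0
        x, y, z = (x - sigma * y * 2.0 ** -i,
                   y + sigma * x * 2.0 ** -i,
                   z - sigma * math.atan(2.0 ** -i))
    return k * x, k * y

c, s = cordic_rotate(0.5)
print(c, s)  # ~cos(0.5), ~sin(0.5)
```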
#### Common Taylor Expansions

**Sine function:**
$$ \sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!} $$

**Cosine function:**
$$ \cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!} $$

**Exponential function:**
$$ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{n=0}^{\infty} \frac{x^n}{n!} $$

**Natural logarithm (for $|x| < 1$):**
$$ \ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{n+1} x^n}{n} $$

#### Horner's Method for Evaluation

For polynomial $P(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$:

$$ P(x) = a_0 + x(a_1 + x(a_2 + x(a_3 + \cdots + x(a_{n-1} + x \cdot a_n)\cdots))) $$

**Advantages:**
- Reduces multiplications from $\frac{n(n+1)}{2}$ (naive power evaluation) to $n$
- Additions stay at $n$
- Enables pipelining

#### Minimax Polynomial Approximation

Minimizes the maximum error over an interval:

$$ \min_{p \in P_n} \max_{x \in [a,b]} |f(x) - p(x)| $$

Where $P_n$ is the set of polynomials of degree $\leq n$.
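Horner's recurrence is a one-line loop; the sketch below evaluates the truncated sine series from above (the degree-5 truncation is chosen just for illustration):

```python
import math

def horner(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_n x^n with exactly n multiplies.
    coeffs = [a_0, a_1, ..., a_n]."""
    acc = 0.0
    for a in reversed(coeffs):
        acc = acc * x + a
    return acc

# sin(x) ~ x - x^3/3! + x^5/5!  (truncated Taylor series)
sin_coeffs = [0.0, 1.0, 0.0, -1.0 / 6.0, 0.0, 1.0 / 120.0]
print(horner(sin_coeffs, 0.3), math.sin(0.3))
```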
**Chebyshev approximation** often provides near-minimax results:

$$ T_n(x) = \cos(n \cdot \arccos(x)) $$

### 2.3 Lookup Table (LUT) Methods

#### Direct Table Lookup

$$ f(x) \approx \text{LUT}[\lfloor x \cdot 2^n \rfloor] $$

**Memory requirement:**
$$ \text{Memory (bits)} = 2^{\text{input\_bits}} \times \text{output\_bits} $$

#### Linear Interpolation

$$ f(x) \approx f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0} \cdot (x - x_0) $$

Simplified for uniform spacing ($\Delta x = x_1 - x_0$):

$$ f(x) \approx f(x_0) + \frac{\Delta f}{\Delta x} \cdot (x - x_0) $$

#### Bipartite Table Method

Decomposes the function into two smaller tables:

$$ f(x) \approx T_1(x_{\text{MSB}}) + T_2(x_{\text{MSB}}, x_{\text{LSB}}) $$

**Memory savings:**
$$ \text{Bipartite Memory} \approx 2 \cdot 2^{n/2} \ll 2^n \text{ (Direct)} $$

#### Piecewise Polynomial with LUT

$$ f(x) = \sum_{k=0}^{d} c_k[i] \cdot (x - x_i)^k \quad \text{for } x \in [x_i, x_{i+1}) $$

Where coefficients $c_k[i]$ are stored in lookup tables.

## 3. System-Level Mathematical Modeling

### 3.1 Boolean Algebra Model

#### Basic Operations

| Operation | Symbol | VHDL | Verilog |
|-----------|--------|------|---------|
| AND | $A \cdot B$ | `A and B` | `A & B` |
| OR | $A + B$ | `A or B` | `A ∣ B` |
| NOT | $\overline{A}$ | `not A` | `~A` |
| XOR | $A \oplus B$ | `A xor B` | `A ^ B` |
| NAND | $\overline{A \cdot B}$ | `A nand B` | `~(A & B)` |
| NOR | $\overline{A + B}$ | `A nor B` | `~(A ∣ B)` |

#### Boolean Algebra Laws

**Identity:**
$$ A + 0 = A \quad \text{and} \quad A \cdot 1 = A $$

**Complement:**
$$ A + \overline{A} = 1 \quad \text{and} \quad A \cdot \overline{A} = 0 $$

**De Morgan's Theorems:**
$$ \overline{A + B} = \overline{A} \cdot \overline{B} \quad \text{and} \quad \overline{A \cdot B} = \overline{A} + \overline{B} $$

**Absorption:**
$$ A + A \cdot B = A \quad \text{and} \quad A \cdot (A + B) = A $$

#### Sum of Products (SOP) Form

$$ F = \sum_{i} m_i \quad \text{where } m_i \text{ are minterms} $$

**Example:**
$$ F(A, B, C) = \overline{A} \cdot B \cdot C + A \cdot \overline{B} \cdot C + A \cdot B \cdot \overline{C} $$

### 3.2 Finite State Machine (FSM) Models

#### Moore Machine

$$ \begin{aligned} \text{Next State:} \quad S_{n+1} &= \delta(S_n, I_n) \\ \text{Output:} \quad O_n &= \lambda(S_n) \end{aligned} $$

#### Mealy Machine

$$ \begin{aligned} \text{Next State:} \quad S_{n+1} &= \delta(S_n, I_n) \\ \text{Output:} \quad O_n &= \lambda(S_n, I_n) \end{aligned} $$

#### State Encoding

| Encoding | States | Flip-Flops | Transitions |
|----------|--------|------------|-------------|
| Binary | $N$ | $\lceil \log_2 N \rceil$ | Complex |
| One-Hot | $N$ | $N$ | Simple |
| Gray | $N$ | $\lceil \log_2 N \rceil$ | Low power |

### 3.3 Dataflow Graph Model

#### Directed Acyclic Graph (DAG) Representation

- **Nodes ($V$):** operations (add, multiply, etc.)
- **Edges ($E$):** data dependencies
- **Graph:** $G = (V, E)$

#### Critical Path Length

$$ T_{\text{critical}} = \max_{\text{paths } p} \sum_{v \in p} t_v $$

Where $t_v$ is the delay of operation $v$.

#### Parallelism Metrics

**Available parallelism:**
$$ P = \frac{\sum_{v \in V} t_v}{T_{\text{critical}}} $$

**Resource-constrained schedule length:**
$$ T_{\text{schedule}} \geq \max\left( T_{\text{critical}}, \left\lceil \frac{\sum_{v \in V} t_v}{R} \right\rceil \right) $$

Where $R$ is the number of available resources.
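The critical-path definition above is a longest-path computation over the DAG; a short sketch with a hypothetical four-operation graph (node names and delays are made up):

```python
from functools import lru_cache

# node -> (delay, list of predecessors); an illustrative dataflow graph
graph = {
    "mul1": (3, []),
    "mul2": (3, []),
    "add1": (1, ["mul1", "mul2"]),
    "add2": (1, ["add1"]),
}

def critical_path(g):
    """T_critical: maximum delay-weighted path through the DAG."""
    @lru_cache(maxsize=None)
    def finish(v):
        delay, preds = g[v]
        return delay + max((finish(p) for p in preds), default=0)
    return max(finish(v) for v in g)

print(critical_path(graph))  # 5
```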
### 3.4 State-Space Models

#### Continuous-Time State-Space

$$ \begin{aligned} \dot{\mathbf{x}}(t) &= \mathbf{A}\mathbf{x}(t) + \mathbf{B}\mathbf{u}(t) \\ \mathbf{y}(t) &= \mathbf{C}\mathbf{x}(t) + \mathbf{D}\mathbf{u}(t) \end{aligned} $$

#### Discrete-Time State-Space (for FPGA)

$$ \begin{aligned} \mathbf{x}[k+1] &= \mathbf{A}_d\mathbf{x}[k] + \mathbf{B}_d\mathbf{u}[k] \\ \mathbf{y}[k] &= \mathbf{C}_d\mathbf{x}[k] + \mathbf{D}_d\mathbf{u}[k] \end{aligned} $$

#### Discretization (Zero-Order Hold)

$$ \mathbf{A}_d = e^{\mathbf{A}T_s} \quad \text{and} \quad \mathbf{B}_d = \mathbf{A}^{-1}(e^{\mathbf{A}T_s} - \mathbf{I})\mathbf{B} $$

Where $T_s$ is the sampling period.

## 4. Numerical Methods for FPGA

### 4.1 ODE Solvers

#### Forward Euler Method

$$ y_{n+1} = y_n + h \cdot f(t_n, y_n) $$

**Local truncation error:** $O(h^2)$
**Global error:** $O(h)$

#### Backward Euler Method

$$ y_{n+1} = y_n + h \cdot f(t_{n+1}, y_{n+1}) $$

Requires solving an implicit equation (more stable but complex).

#### Trapezoidal Method (Crank-Nicolson)

$$ y_{n+1} = y_n + \frac{h}{2}\left[f(t_n, y_n) + f(t_{n+1}, y_{n+1})\right] $$

**Local truncation error:** $O(h^3)$

#### 4th Order Runge-Kutta (RK4)

$$ y_{n+1} = y_n + \frac{h}{6}(k_1 + 2k_2 + 2k_3 + k_4) $$

Where:

$$ \begin{aligned} k_1 &= f(t_n, y_n) \\ k_2 &= f\left(t_n + \frac{h}{2}, y_n + \frac{h}{2}k_1\right) \\ k_3 &= f\left(t_n + \frac{h}{2}, y_n + \frac{h}{2}k_2\right) \\ k_4 &= f(t_n + h, y_n + h \cdot k_3) \end{aligned} $$

**Local truncation error:** $O(h^5)$
**Global error:** $O(h^4)$

#### Method Comparison for FPGA

| Method | Accuracy | Stability | FPGA Resources | Latency |
|--------|----------|-----------|----------------|---------|
| Forward Euler | $O(h)$ | Conditional | Low | 1 mult |
| Backward Euler | $O(h)$ | Unconditional | Medium | Iterative |
| Trapezoidal | $O(h^2)$ | A-stable | Medium | Iterative |
| RK4 | $O(h^4)$ | Conditional | High | 4 stages |

### 4.2 Linear Algebra Operations

#### Matrix-Vector Multiplication

$$ \mathbf{y} = \mathbf{A}\mathbf{x} \quad \Rightarrow \quad y_i = \sum_{j=1}^{n} a_{ij} x_j $$

**FPGA Implementation:**
- Fully parallel: $n^2$ multipliers, $O(n)$ adder tree
- Systolic array: $n$ multipliers, $n$ cycles

#### Matrix-Matrix Multiplication

$$ \mathbf{C} = \mathbf{A}\mathbf{B} \quad \Rightarrow \quad c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} $$

**Computational complexity:** $O(n^3)$ multiplications

#### LU Decomposition

$$ \mathbf{A} = \mathbf{L}\mathbf{U} $$

**FPGA considerations:**
- Division required for pivot elements
- Sequential dependencies limit parallelism

## 5. Timing and Resource Models

### 5.1 Static Timing Analysis (STA)

#### Setup Time Constraint

$$ T_{\text{clk-to-q}} + T_{\text{logic}} + T_{\text{routing}} + T_{\text{setup}} \leq T_{\text{clk}} $$

Rearranged:

$$ T_{\text{logic}} + T_{\text{routing}} \leq T_{\text{clk}} - T_{\text{clk-to-q}} - T_{\text{setup}} $$

#### Hold Time Constraint

$$ T_{\text{clk-to-q}} + T_{\text{logic}} + T_{\text{routing}} \geq T_{\text{hold}} $$

#### Slack Calculations

**Setup slack:**
$$ \text{Slack}_{\text{setup}} = T_{\text{clk}} - T_{\text{clk-to-q}} - T_{\text{setup}} - T_{\text{data\_path}} $$

**Hold slack:**
$$ \text{Slack}_{\text{hold}} = T_{\text{data\_path}} + T_{\text{clk-to-q}} - T_{\text{hold}} $$

**Condition for valid timing:**
$$ \text{Slack} \geq 0 $$

#### Maximum Operating Frequency

$$ f_{\text{max}} = \frac{1}{T_{\text{clk-to-q}} + T_{\text{logic}} + T_{\text{routing}} + T_{\text{setup}}} $$

### 5.2 Resource Utilization Models

#### LUT Utilization

For an $n$-input boolean function:

$$ \text{LUTs required} = \left\lceil \frac{n}{k} \right\rceil^{\text{levels}} $$

Where $k$ is the LUT input count (typically 4-6).
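The setup-constraint and slack formulas above reduce to simple arithmetic; a sketch with illustrative delay numbers (all in ns):

```python
def fmax_mhz(t_clk_to_q, t_logic, t_routing, t_setup):
    """Maximum clock frequency (MHz) from the setup-path delays (ns)."""
    return 1000.0 / (t_clk_to_q + t_logic + t_routing + t_setup)

def setup_slack(t_clk, t_clk_to_q, t_setup, t_data_path):
    """Setup slack (ns); timing closes when this is >= 0."""
    return t_clk - t_clk_to_q - t_setup - t_data_path

print(fmax_mhz(0.5, 2.0, 1.0, 0.5))     # 250.0 MHz
print(setup_slack(5.0, 0.5, 0.5, 3.0))  # 1.0 ns of margin
```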
#### DSP Slice Utilization

For multiplication $A \times B$:

$$ \text{DSP slices} = \left\lceil \frac{\text{bitwidth}_A}{18} \right\rceil \times \left\lceil \frac{\text{bitwidth}_B}{18} \right\rceil $$

(For typical 18×18 DSP multipliers)

#### Block RAM Utilization

$$ \text{BRAMs} = \left\lceil \frac{\text{depth}}{2^{k}} \right\rceil \times \left\lceil \frac{\text{width}}{w} \right\rceil $$

Where $k$ and $w$ are the BRAM address and data widths.

#### Register Utilization

$$ \text{Registers} = \sum_{\text{signals}} \text{bitwidth} \times \text{pipeline\_stages} $$

### 5.3 Power Models

#### Dynamic Power

$$ P_{\text{dynamic}} = \alpha \cdot C_L \cdot V_{DD}^2 \cdot f $$

Where:
- $\alpha$ = switching activity factor
- $C_L$ = load capacitance
- $V_{DD}$ = supply voltage
- $f$ = clock frequency

#### Static Power

$$ P_{\text{static}} = I_{\text{leakage}} \cdot V_{DD} $$

#### Total Power

$$ P_{\text{total}} = P_{\text{dynamic}} + P_{\text{static}} $$

## 6. High-Level Synthesis Models

### 6.1 Quantization Models

#### Fixed-Point Quantization

$$ x_q = \text{round}\left(\frac{x}{2^{-n}}\right) \cdot 2^{-n} $$

Where $n$ is the number of fractional bits.

#### Quantization Error

$$ \epsilon_q = x - x_q $$

**For rounding:**
$$ -\frac{2^{-n}}{2} \leq \epsilon_q < \frac{2^{-n}}{2} $$

**Quantization noise power (uniform distribution):**
$$ \sigma_q^2 = \frac{(2^{-n})^2}{12} = \frac{2^{-2n}}{12} $$

#### Signal-to-Quantization-Noise Ratio (SQNR)

$$ \text{SQNR (dB)} = 10 \log_{10}\left(\frac{\sigma_x^2}{\sigma_q^2}\right) \approx 6.02B + 1.76 \text{ dB} $$

For a full-scale sinusoid quantized with $B$ total bits.

### 6.2 Scheduling Models

#### ASAP (As Soon As Possible) Scheduling

$$ t_{\text{ASAP}}(v) = \max_{u \in \text{pred}(v)} \left( t_{\text{ASAP}}(u) + d_u \right) $$

Where $d_u$ is the delay of operation $u$.
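The roughly 6 dB-per-bit rule is easy to verify empirically; the sketch below quantizes a full-scale sine to a 12-bit grid (bit width and tone frequency are illustrative) and compares the measured SQNR with the formula:

```python
import numpy as np

bits = 12                                  # total quantizer bits over [-1, 1)
step = 2.0 / 2**bits
t = np.arange(100_000)
x = np.sin(2 * np.pi * 0.01234567 * t)     # full-scale, non-harmonic tone
xq = np.round(x / step) * step             # round-to-nearest quantization

sqnr_db = 10 * np.log10(np.mean(x**2) / np.mean((x - xq)**2))
print(sqnr_db)  # close to 6.02*12 + 1.76 = 74.0 dB
```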
#### ALAP (As Late As Possible) Scheduling

$$ t_{\text{ALAP}}(v) = \min_{w \in \text{succ}(v)} \left( t_{\text{ALAP}}(w) - d_v \right) $$

#### Mobility (Slack)

$$ \text{Mobility}(v) = t_{\text{ALAP}}(v) - t_{\text{ASAP}}(v) $$

Operations with zero mobility are on the critical path.

### 6.3 Resource Binding Models

#### Resource Sharing Constraint

$$ \sum_{v: t(v) = t} r(v, k) \leq R_k \quad \forall t, k $$

Where:
- $r(v, k)$ = 1 if operation $v$ uses resource type $k$
- $R_k$ = number of available resources of type $k$

#### Conflict Graph

Two operations conflict if they:
1. Execute in the same time step
2. Require the same resource type

**Minimum resources = chromatic number of the conflict graph**

## 7. Optimization Models

### 7.1 Pipeline Optimization

#### Pipeline Stages

$$ \text{Stages} = \left\lceil \frac{T_{\text{combinational}}}{T_{\text{target\_clock}}} \right\rceil $$

#### Throughput

$$ \text{Throughput} = \frac{f_{\text{clk}}}{\text{II}} $$

Where II = Initiation Interval (cycles between new inputs).

#### Latency

$$ \text{Latency} = \text{Stages} \times T_{\text{clk}} $$

#### Pipeline Efficiency

$$ \eta = \frac{\text{Actual Throughput}}{\text{Ideal Throughput}} = \frac{1}{\text{II}} $$

### 7.2 Loop Optimization

#### Loop Unrolling

For unroll factor $U$:

$$ \text{New iterations} = \left\lceil \frac{N}{U} \right\rceil $$

**Resource increase:** ~$U\times$ original
**Throughput increase:** up to $U\times$

#### Loop Pipelining

$$ \text{II}_{\text{achieved}} = \max(\text{II}_{\text{resource}}, \text{II}_{\text{recurrence}}, \text{II}_{\text{target}}) $$

Where:
- $\text{II}_{\text{resource}}$ = resource-constrained II
- $\text{II}_{\text{recurrence}}$ = loop-carried dependency II

#### Loop Tiling

For tile size $T$:

$$ \text{for } (ii = 0; ii < N; ii += T) \\ \quad \text{for } (i = ii; i < \min(ii + T, N); i++) $$

**Memory accesses reduced by a factor of** $T$ for reused data.
### 7.3 Area-Time Trade-offs

#### Area-Time Product

$$ \text{AT} = \text{Area} \times \text{Time} $$

#### Pareto Optimality

A design $D_1$ dominates $D_2$ if:

$$ \text{Area}(D_1) \leq \text{Area}(D_2) \quad \text{AND} \quad \text{Time}(D_1) \leq \text{Time}(D_2) $$

with at least one strict inequality.

#### Cost Function

$$ \text{Cost} = \alpha \cdot \text{Area} + \beta \cdot \text{Latency} + \gamma \cdot \text{Power} $$

Where $\alpha + \beta + \gamma = 1$ are weighting factors.

## 8. Practical Examples

### 8.1 FIR Filter Implementation

#### Mathematical Definition

$$ y[n] = \sum_{k=0}^{N-1} h[k] \cdot x[n-k] $$

#### Direct Form Implementation

**Resources:**
- Multipliers: $N$
- Adders: $N-1$
- Registers: $N-1$ (delay line)

**Accumulator sizing:**
$$ \text{Acc bits} = \text{input bits} + \text{coef bits} + \lceil \log_2(N) \rceil $$

#### Transpose Form (Better for FPGA)

**Advantages:**
- All multipliers fed by the same input
- Shorter critical path
- Better for high-speed designs

### 8.2 FFT Implementation

#### Radix-2 DIT FFT

$$ X[k] = \sum_{n=0}^{N-1} x[n] \cdot W_N^{nk} $$

Where $W_N = e^{-j2\pi/N}$ (twiddle factor).
#### Butterfly Operation

$$ \begin{aligned} X[k] &= A + W_N^k \cdot B \\ X[k + N/2] &= A - W_N^k \cdot B \end{aligned} $$

#### Resource Requirements

| Architecture | Multipliers | Memory | Throughput |
|--------------|-------------|--------|------------|
| Radix-2 Serial | 1 complex | $N$ | $N \log_2 N$ cycles |
| Radix-2 Parallel | $N/2$ complex | $N$ | $\log_2 N$ cycles |
| Pipelined | $\log_2 N$ complex | $2N$ | 1 sample/cycle |

### 8.3 PID Controller

#### Continuous-Time PID

$$ u(t) = K_p \cdot e(t) + K_i \int_0^t e(\tau) d\tau + K_d \frac{de(t)}{dt} $$

#### Discrete-Time PID (Tustin Transform)

$$ u[n] = u[n-1] + a_0 \cdot e[n] + a_1 \cdot e[n-1] + a_2 \cdot e[n-2] $$

Where:

$$ \begin{aligned} a_0 &= K_p + K_i \frac{T_s}{2} + K_d \frac{2}{T_s} \\ a_1 &= -K_p + K_i \frac{T_s}{2} - K_d \frac{4}{T_s} \\ a_2 &= K_d \frac{2}{T_s} \end{aligned} $$

#### FPGA Implementation Considerations

- **Coefficient quantization:** affects stability
- **Accumulator overflow:** requires saturation logic
- **Anti-windup:** needed for the integrator term

## Common Mathematical Constants for FPGA

| Constant | Value | Q16.16 Hex |
|----------|-------|------------|
| $\pi$ | 3.14159265 | `0x0003_243F` |
| $e$ | 2.71828183 | `0x0002_B7E1` |
| $\sqrt{2}$ | 1.41421356 | `0x0001_6A09` |
| $\ln(2)$ | 0.69314718 | `0x0000_B172` |
| $1/\pi$ | 0.31830989 | `0x0000_517C` |
| CORDIC $K$ | 0.60725293 | `0x0000_9B74` |

## FPGA Vendor DSP Specifications

| Vendor | Family | DSP Name | Multiplier Size | Pre-adder |
|--------|--------|----------|-----------------|-----------|
| AMD/Xilinx | 7 Series | DSP48E1 | 25×18 | Yes |
| AMD/Xilinx | UltraScale | DSP48E2 | 27×18 | Yes |
| Intel | Stratix 10 | DSP Block | 18×19 or 27×27 | Yes |
| Lattice | ECP5 | MULT18X18D | 18×18 | No |
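To tie the FIR definition back to code, a direct-form sketch with an explicit delay line, checked against NumPy's convolution (taps and input are illustrative):

```python
import numpy as np

def fir_direct(h, x):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n-k], zero initial state."""
    delay = np.zeros(len(h))               # the register delay line
    y = np.zeros(len(x))
    for n, sample in enumerate(x):
        delay = np.roll(delay, 1)          # shift the delay line
        delay[0] = sample                  # newest sample enters
        y[n] = np.dot(h, delay)            # multiply-accumulate
    return y

h = np.array([0.25, 0.5, 0.25])            # simple 3-tap low-pass
x = np.random.default_rng(0).standard_normal(32)
y = fir_direct(h, x)
print(np.allclose(y, np.convolve(x, h)[:len(x)]))  # True
```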

fpga alternative,chip design hobby,logisim,ngspice

# FPGA Mathematical Modeling

A comprehensive guide to mathematical techniques, representations, and methodologies for implementing algorithms in Field-Programmable Gate Arrays.

## 1. Number Representation Models

### 1.1 Fixed-Point Representation

Fixed-point representation keeps the binary point at a fixed position, enabling straightforward integer arithmetic.

#### Notation Format

- **Q-format:** $Q(m, n)$ or $(x, y)$
  - $m$ or $x$ = number of integer bits
  - $n$ or $y$ = number of fractional bits
  - Total bits = $m + n$ (ARM convention, with the sign bit counted in $m$ for signed values; the TI convention adds a separate sign bit, giving $m + n + 1$)

#### Mathematical Formulations

**Integer bits calculation:**

$$ \text{Integer Bits} = \lceil \log_2(\text{max\_value}) \rceil $$

**Value representation:**

$$ \text{Value} = \sum_{i=-n}^{m-1} b_i \cdot 2^i $$

Where $b_i \in \{0, 1\}$ for unsigned, or using two's complement for signed.

**Resolution (LSB value):**

$$ \text{Resolution} = 2^{-n} $$

**Range for signed Q(m,n):**

$$ \text{Range} = \left[ -2^{m-1}, \, 2^{m-1} - 2^{-n} \right] $$

#### Fixed-Point Arithmetic Rules

- **Addition/Subtraction:**
  - Binary points must be aligned
  - Result: $\max(m_1, m_2) + 1$ integer bits, $\max(n_1, n_2)$ fractional bits
  $$ Q(m_1, n_1) \pm Q(m_2, n_2) \rightarrow Q(\max(m_1, m_2) + 1, \max(n_1, n_2)) $$
- **Multiplication:**
  $$ Q(m_1, n_1) \times Q(m_2, n_2) \rightarrow Q(m_1 + m_2, n_1 + n_2) $$
- **Scaling by powers of 2:**
  - Left shift by $k$: multiply by $2^k$
  - Right shift by $k$: divide by $2^k$

#### Example Calculation

For a 16-bit signed Q8.8 format:

- Integer range: $[-128, 127]$
- Fractional resolution: $2^{-8} = 0.00390625$
- Example: Binary `0000_0001.1000_0000` = $1.5_{10}$

### 1.2 Floating-Point Representation

#### IEEE 754 Standard Formats

| Format | Total Bits | Sign | Exponent | Mantissa |
|--------|-----------|------|----------|----------|
| Half | 16 | 1 | 5 | 10 |
| Single | 32 | 1 | 8 | 23 |
| Double | 64 | 1 | 11 | 52 |

#### Mathematical Representation

$$ \text{Value} = (-1)^{s} \times (1 + M) \times 2^{E - \text{bias}} $$

Where:

- $s$ = sign bit
- $M$ = mantissa (fractional part)
- $E$ = exponent
- $\text{bias} = 2^{k-1} - 1$ (where $k$ = exponent bits)

#### Bias Values

- Half precision: $\text{bias} = 15$
- Single precision: $\text{bias} = 127$
- Double precision: $\text{bias} = 1023$

#### Resource Comparison

| Operation | Fixed-Point LUTs | Floating-Point LUTs | Ratio |
|-----------|-----------------|---------------------|-------|
| Add (32-bit) | ~32 | ~400 | ~12.5× |
| Multiply (32-bit) | ~64 (DSP) | ~600 | ~9× |
| Divide (32-bit) | ~200 | ~1200 | ~6× |

## 2. Mathematical Function Implementation

### 2.1 CORDIC Algorithm

**CORDIC** (COordinate Rotation DIgital Computer) computes trigonometric, hyperbolic, and other functions using only shift-add operations.

#### Core Rotation Matrix

$$ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} $$

#### CORDIC Simplification

By restricting $\tan(\theta_i) = 2^{-i}$, the rotation becomes:

$$ \begin{bmatrix} x' \\ y' \end{bmatrix} = \cos\theta_i \begin{bmatrix} 1 & -\sigma_i 2^{-i} \\ \sigma_i 2^{-i} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} $$

Where $\sigma_i \in \{-1, +1\}$ is the rotation direction.
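A bit-accurate software model is a common first step before committing Q-format choices to RTL. The sketch below models the fixed-point rules above in Python (the helper names are illustrative, not from any library):

```python
# Bit-accurate model of signed Q(m, n) fixed-point (ARM convention:
# total width = m + n, sign bit counted in the m integer bits).

def to_fixed(x: float, n: int) -> int:
    """Quantize a real value to an integer holding n fractional bits."""
    return round(x * (1 << n))

def from_fixed(v: int, n: int) -> float:
    """Interpret an integer as a value with n fractional bits."""
    return v / (1 << n)

def q_mul(a: int, b: int, n: int) -> int:
    """Q(m,n) * Q(m,n): the raw product has 2n fractional bits; shift back."""
    return (a * b) >> n

# Q8.8 example (16-bit signed): resolution 2^-8, range [-128, 127.99609375]
n = 8
a = to_fixed(1.5, n)    # 0b0000_0001_1000_0000 = 384
b = to_fixed(2.25, n)
print(from_fixed(q_mul(a, b, n), n))  # 1.5 * 2.25 = 3.375
```

Note how the multiply requires an explicit right shift to return to the working format, mirroring the $Q(m_1+m_2,\,n_1+n_2)$ growth rule.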
#### Iterative Equations

$$ \begin{aligned} x_{i+1} &= x_i - \sigma_i \cdot y_i \cdot 2^{-i} \\ y_{i+1} &= y_i + \sigma_i \cdot x_i \cdot 2^{-i} \\ z_{i+1} &= z_i - \sigma_i \cdot \arctan(2^{-i}) \end{aligned} $$

#### CORDIC Gain (Scale Factor)

$$ K_n = \prod_{i=0}^{n-1} \cos(\arctan(2^{-i})) = \prod_{i=0}^{n-1} \frac{1}{\sqrt{1 + 2^{-2i}}} $$

For $n \rightarrow \infty$:

$$ K_\infty \approx 0.6072529350088812561694 $$

Because the iterative equations drop the $\cos\theta_i$ factor, the raw outputs are scaled by $1/K \approx 1.6468$; initializing $x_0$ with $K$ (or multiplying the result by $K$) compensates.

#### CORDIC Modes

| Mode | Condition | Computes |
|------|-----------|----------|
| Rotation | $\sigma_i = \text{sign}(z_i)$ | $x_n = \frac{1}{K}(x_0\cos z_0 - y_0\sin z_0)$ |
| Vectoring | $\sigma_i = -\text{sign}(y_i)$ | $z_n = z_0 + \arctan(y_0/x_0)$ |

#### CORDIC Functions Table

| Function | Mode | Initial Values | Result |
|----------|------|----------------|--------|
| $\sin(\theta)$ | Rotation | $x_0=K$, $y_0=0$, $z_0=\theta$ | $y_n$ |
| $\cos(\theta)$ | Rotation | $x_0=K$, $y_0=0$, $z_0=\theta$ | $x_n$ |
| $\arctan(a)$ | Vectoring | $x_0=1$, $y_0=a$, $z_0=0$ | $z_n$ |
| $\sqrt{x^2+y^2}$ | Vectoring | $x_0=x$, $y_0=y$, $z_0=0$ | $K \cdot x_n$ |

#### CORDIC VHDL Pseudo-Structure

```vhdl
-- CORDIC iteration stage (rotation mode)
process(clk)
begin
  if rising_edge(clk) then
    if z(i) >= 0 then  -- σ = +1
      x(i+1) <= x(i) - shift_right(y(i), i);
      y(i+1) <= y(i) + shift_right(x(i), i);
      z(i+1) <= z(i) - atan_table(i);
    else               -- σ = -1
      x(i+1) <= x(i) + shift_right(y(i), i);
      y(i+1) <= y(i) - shift_right(x(i), i);
      z(i+1) <= z(i) + atan_table(i);
    end if;
  end if;
end process;
```

### 2.2 Polynomial Approximation

#### Taylor Series Expansion

$$ f(x) = \sum_{n=0}^{N} \frac{f^{(n)}(a)}{n!}(x-a)^n + R_N(x) $$

Where $R_N(x)$ is the remainder term.
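Before building the pipeline in RTL, the rotation-mode iteration is easy to validate as a floating-point software model (a sketch of the datapath behavior, not synthesizable code):

```python
import math

# CORDIC rotation mode: pre-scaling x0 by K makes the outputs
# converge to (cos(theta), sin(theta)) without a final multiply.
def cordic_sincos(theta: float, n_iter: int = 24):
    K = 1.0
    for i in range(n_iter):
        K *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = K, 0.0, theta
    for i in range(n_iter):
        sigma = 1.0 if z >= 0 else -1.0
        # Tuple assignment models the parallel register update in hardware.
        x, y, z = (x - sigma * y * 2.0 ** -i,
                   y + sigma * x * 2.0 ** -i,
                   z - sigma * math.atan(2.0 ** -i))
    return x, y  # ~cos(theta), ~sin(theta)

c, s = cordic_sincos(0.5)
print(abs(c - math.cos(0.5)) < 1e-6, abs(s - math.sin(0.5)) < 1e-6)
```

The angle must stay inside the convergence range $|\theta| \lesssim 1.743$ rad (the sum of all $\arctan(2^{-i})$); larger angles need quadrant pre-rotation.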
#### Common Taylor Expansions

**Sine function:**

$$ \sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!} $$

**Cosine function:**

$$ \cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!} $$

**Exponential function:**

$$ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{n=0}^{\infty} \frac{x^n}{n!} $$

**Natural logarithm (for $|x| < 1$):**

$$ \ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{n+1} x^n}{n} $$

#### Horner's Method for Evaluation

For polynomial $P(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$:

$$ P(x) = a_0 + x(a_1 + x(a_2 + x(a_3 + \cdots + x(a_{n-1} + x \cdot a_n)\cdots))) $$

**Advantages:**

- Reduces multiplications from $\frac{n(n+1)}{2}$ (naive term-by-term evaluation) to $n$
- Additions stay at $n$, with no extra cost
- Enables pipelining

#### Minimax Polynomial Approximation

Minimizes the maximum error over an interval:

$$ \min_{p \in P_n} \max_{x \in [a,b]} |f(x) - p(x)| $$

Where $P_n$ is the set of polynomials of degree $\leq n$.
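Horner's nesting maps naturally onto a single multiply-accumulate loop, which is exactly the structure a DSP-slice chain implements. A minimal sketch, evaluating the degree-7 sine Taylor polynomial from above:

```python
import math

# Horner evaluation: n multiplies and n adds for a degree-n polynomial.
def horner(coeffs, x):
    """coeffs = [a_n, ..., a_1, a_0], highest degree first."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c   # one MAC per coefficient
    return acc

# Degree-7 Taylor polynomial of sin(x), highest degree first.
sin_coeffs = [-1 / math.factorial(7), 0, 1 / math.factorial(5), 0,
              -1 / math.factorial(3), 0, 1, 0]

print(abs(horner(sin_coeffs, 0.3) - math.sin(0.3)) < 1e-8)  # True near 0
```

The accuracy claim holds only near the expansion point; for $|x|$ approaching $\pi/2$ a higher degree or range reduction is needed.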
**Chebyshev approximation** often provides near-minimax results:

$$ T_n(x) = \cos(n \cdot \arccos(x)) $$

### 2.3 Lookup Table (LUT) Methods

#### Direct Table Lookup

$$ f(x) \approx \text{LUT}[\lfloor x \cdot 2^n \rfloor] $$

**Memory requirement:**

$$ \text{Memory (bits)} = 2^{\text{input\_bits}} \times \text{output\_bits} $$

#### Linear Interpolation

$$ f(x) \approx f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0} \cdot (x - x_0) $$

Simplified for uniform spacing ($\Delta x = x_1 - x_0$):

$$ f(x) \approx f(x_0) + \frac{\Delta f}{\Delta x} \cdot (x - x_0) $$

#### Bipartite Table Method

Decomposes the function into two smaller tables:

$$ f(x) \approx T_1(x_{\text{MSB}}) + T_2(x_{\text{MSB}}, x_{\text{LSB}}) $$

**Memory savings (rough order of magnitude):**

$$ \text{Bipartite Memory} \approx 2 \cdot 2^{n/2} \ll 2^n \text{ (Direct)} $$

#### Piecewise Polynomial with LUT

$$ f(x) = \sum_{k=0}^{d} c_k[i] \cdot (x - x_i)^k \quad \text{for } x \in [x_i, x_{i+1}) $$

Where coefficients $c_k[i]$ are stored in lookup tables.

## 3. System-Level Mathematical Modeling

### 3.1 Boolean Algebra Model

#### Basic Operations

| Operation | Symbol | VHDL | Verilog |
|-----------|--------|------|---------|
| AND | $A \cdot B$ | `A and B` | `A & B` |
| OR | $A + B$ | `A or B` | `A \| B` |
| NOT | $\overline{A}$ | `not A` | `~A` |
| XOR | $A \oplus B$ | `A xor B` | `A ^ B` |
| NAND | $\overline{A \cdot B}$ | `A nand B` | `~(A & B)` |
| NOR | $\overline{A + B}$ | `A nor B` | `~(A \| B)` |

#### Boolean Algebra Laws

**Identity:**

$$ A + 0 = A \quad \text{and} \quad A \cdot 1 = A $$

**Complement:**

$$ A + \overline{A} = 1 \quad \text{and} \quad A \cdot \overline{A} = 0 $$

**De Morgan's Theorems:**

$$ \overline{A + B} = \overline{A} \cdot \overline{B} \quad \text{and} \quad \overline{A \cdot B} = \overline{A} + \overline{B} $$

**Absorption:**

$$ A + A \cdot B = A \quad \text{and} \quad A \cdot (A + B) = A $$

#### Sum of Products (SOP) Form

$$ F = \sum_{i} m_i \quad \text{where } m_i \text{ are minterms} $$

**Example:**

$$ F(A, B, C) = \overline{A} \cdot B \cdot C + A \cdot \overline{B} \cdot C + A \cdot B \cdot \overline{C} $$

### 3.2 Finite State Machine (FSM) Models

#### Moore Machine

$$ \begin{aligned} \text{Next State:} \quad S_{n+1} &= \delta(S_n, I_n) \\ \text{Output:} \quad O_n &= \lambda(S_n) \end{aligned} $$

#### Mealy Machine

$$ \begin{aligned} \text{Next State:} \quad S_{n+1} &= \delta(S_n, I_n) \\ \text{Output:} \quad O_n &= \lambda(S_n, I_n) \end{aligned} $$

#### State Encoding

| Encoding | States | Flip-Flops | Characteristics |
|----------|--------|------------|-----------------|
| Binary | $N$ | $\lceil \log_2 N \rceil$ | Densest; more complex next-state logic |
| One-Hot | $N$ | $N$ | Simple, fast transition logic |
| Gray | $N$ | $\lceil \log_2 N \rceil$ | One bit flips per transition; low power |

### 3.3 Dataflow Graph Model

#### Directed Acyclic Graph (DAG) Representation

- **Nodes ($V$):** Operations (add, multiply, etc.)
- **Edges ($E$):** Data dependencies
- **Graph:** $G = (V, E)$

#### Critical Path Length

$$ T_{\text{critical}} = \max_{\text{paths } p} \sum_{v \in p} t_v $$

Where $t_v$ is the delay of operation $v$.

#### Parallelism Metrics

**Available parallelism:**

$$ P = \frac{\sum_{v \in V} t_v}{T_{\text{critical}}} $$

**Resource-constrained schedule length:**

$$ T_{\text{schedule}} \geq \max\left( T_{\text{critical}}, \left\lceil \frac{\sum_{v \in V} t_v}{R} \right\rceil \right) $$

Where $R$ is the number of available resources.
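The critical-path definition above is a longest-path computation over a topological order of the DAG. A minimal sketch (node names and delays are made up for illustration):

```python
# Critical path of a dataflow DAG via longest path over a topological order.
from graphlib import TopologicalSorter

delays = {"mul1": 3, "mul2": 3, "add1": 1, "add2": 1}
preds  = {"mul1": [], "mul2": [], "add1": ["mul1", "mul2"], "add2": ["add1"]}

# finish[v] = earliest time v can complete, given its predecessors.
finish = {}
for v in TopologicalSorter(preds).static_order():
    finish[v] = delays[v] + max((finish[u] for u in preds[v]), default=0)

t_critical = max(finish.values())
print(t_critical)  # 3 (mul) + 1 (add1) + 1 (add2) = 5
```

The available parallelism for this graph is $P = (3+3+1+1)/5 = 1.6$, i.e., on average fewer than two operations can usefully run concurrently.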
### 3.4 State-Space Models

#### Continuous-Time State-Space

$$ \begin{aligned} \dot{\mathbf{x}}(t) &= \mathbf{A}\mathbf{x}(t) + \mathbf{B}\mathbf{u}(t) \\ \mathbf{y}(t) &= \mathbf{C}\mathbf{x}(t) + \mathbf{D}\mathbf{u}(t) \end{aligned} $$

#### Discrete-Time State-Space (for FPGA)

$$ \begin{aligned} \mathbf{x}[k+1] &= \mathbf{A}_d\mathbf{x}[k] + \mathbf{B}_d\mathbf{u}[k] \\ \mathbf{y}[k] &= \mathbf{C}_d\mathbf{x}[k] + \mathbf{D}_d\mathbf{u}[k] \end{aligned} $$

#### Discretization (Zero-Order Hold)

$$ \mathbf{A}_d = e^{\mathbf{A}T_s} \quad \text{and} \quad \mathbf{B}_d = \mathbf{A}^{-1}(e^{\mathbf{A}T_s} - \mathbf{I})\mathbf{B} $$

Where $T_s$ is the sampling period (the $\mathbf{B}_d$ form assumes $\mathbf{A}$ is invertible).

## 4. Numerical Methods for FPGA

### 4.1 ODE Solvers

#### Forward Euler Method

$$ y_{n+1} = y_n + h \cdot f(t_n, y_n) $$

**Local truncation error:** $O(h^2)$
**Global error:** $O(h)$

#### Backward Euler Method

$$ y_{n+1} = y_n + h \cdot f(t_{n+1}, y_{n+1}) $$

Requires solving an implicit equation (more stable but more complex).

#### Trapezoidal Method (Crank-Nicolson)

$$ y_{n+1} = y_n + \frac{h}{2}\left[f(t_n, y_n) + f(t_{n+1}, y_{n+1})\right] $$

**Local truncation error:** $O(h^3)$

#### 4th Order Runge-Kutta (RK4)

$$ y_{n+1} = y_n + \frac{h}{6}(k_1 + 2k_2 + 2k_3 + k_4) $$

Where:

$$ \begin{aligned} k_1 &= f(t_n, y_n) \\ k_2 &= f\left(t_n + \frac{h}{2}, y_n + \frac{h}{2}k_1\right) \\ k_3 &= f\left(t_n + \frac{h}{2}, y_n + \frac{h}{2}k_2\right) \\ k_4 &= f(t_n + h, y_n + h \cdot k_3) \end{aligned} $$

**Local truncation error:** $O(h^5)$
**Global error:** $O(h^4)$

#### Method Comparison for FPGA

| Method | Accuracy | Stability | FPGA Resources | Latency |
|--------|----------|-----------|----------------|---------|
| Forward Euler | $O(h)$ | Conditional | Low | 1 mult |
| Backward Euler | $O(h)$ | Unconditional | Medium | Iterative |
| Trapezoidal | $O(h^2)$ | A-stable | Medium | Iterative |
| RK4 | $O(h^4)$ | Conditional | High | 4 stages |

### 4.2 Linear Algebra Operations

#### Matrix-Vector Multiplication

$$ \mathbf{y} = \mathbf{A}\mathbf{x} \quad \Rightarrow \quad y_i = \sum_{j=1}^{n} a_{ij} x_j $$

**FPGA Implementation:**

- Fully parallel: $n^2$ multipliers, one $(n-1)$-adder tree per row (depth $O(\log n)$)
- Systolic array: $n$ multipliers, $n$ cycles

#### Matrix-Matrix Multiplication

$$ \mathbf{C} = \mathbf{A}\mathbf{B} \quad \Rightarrow \quad c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} $$

**Computational complexity:** $O(n^3)$ multiplications

#### LU Decomposition

$$ \mathbf{A} = \mathbf{L}\mathbf{U} $$

**FPGA considerations:**

- Division required for pivot elements
- Sequential dependencies limit parallelism

## 5. Timing and Resource Models

### 5.1 Static Timing Analysis (STA)

#### Setup Time Constraint

$$ T_{\text{clk-to-q}} + T_{\text{logic}} + T_{\text{routing}} + T_{\text{setup}} \leq T_{\text{clk}} $$

Rearranged:

$$ T_{\text{logic}} + T_{\text{routing}} \leq T_{\text{clk}} - T_{\text{clk-to-q}} - T_{\text{setup}} $$

#### Hold Time Constraint

$$ T_{\text{clk-to-q}} + T_{\text{logic}} + T_{\text{routing}} \geq T_{\text{hold}} $$

#### Slack Calculations

**Setup slack:**

$$ \text{Slack}_{\text{setup}} = T_{\text{clk}} - T_{\text{clk-to-q}} - T_{\text{setup}} - T_{\text{data\_path}} $$

**Hold slack:**

$$ \text{Slack}_{\text{hold}} = T_{\text{data\_path}} + T_{\text{clk-to-q}} - T_{\text{hold}} $$

**Condition for valid timing:**

$$ \text{Slack} \geq 0 $$

#### Maximum Operating Frequency

$$ f_{\text{max}} = \frac{1}{T_{\text{clk-to-q}} + T_{\text{logic}} + T_{\text{routing}} + T_{\text{setup}}} $$

### 5.2 Resource Utilization Models

#### LUT Utilization

For wide reduction functions (AND, OR, XOR chains), an $n$-input function mapped to $k$-input LUTs needs roughly

$$ \text{LUTs required} \approx \left\lceil \frac{n-1}{k-1} \right\rceil \quad \text{in} \quad \lceil \log_k n \rceil \text{ levels} $$

Where $k$ is the LUT input count (typically 4-6); arbitrary $n$-input functions can require far more.
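The setup-slack and $f_{\text{max}}$ relations in §5.1 reduce to simple arithmetic once the path delays are known. A sketch with made-up timing numbers (nanoseconds, not from any device datasheet):

```python
# Setup slack and f_max from the STA equations; all values in ns.
t_clk, t_cq, t_logic, t_route, t_setup = 5.0, 0.4, 2.1, 1.2, 0.3

data_path = t_logic + t_route
setup_slack = t_clk - t_cq - t_setup - data_path       # must be >= 0
f_max_mhz = 1e3 / (t_cq + t_logic + t_route + t_setup)  # ns -> MHz

print(round(setup_slack, 2), round(f_max_mhz, 1))  # 1.0 250.0
```

Here the path closes timing with 1.0 ns to spare, and the same path limits the clock to 250 MHz regardless of the requested period.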
#### DSP Slice Utilization

For multiplication $A \times B$:

$$ \text{DSP slices} = \left\lceil \frac{\text{bitwidth}_A}{18} \right\rceil \times \left\lceil \frac{\text{bitwidth}_B}{18} \right\rceil $$

(For typical 18×18 DSP multipliers)

#### Block RAM Utilization

$$ \text{BRAMs} = \left\lceil \frac{\text{depth}}{2^{k}} \right\rceil \times \left\lceil \frac{\text{width}}{w} \right\rceil $$

Where $k$ and $w$ are BRAM address and data widths.

#### Register Utilization

$$ \text{Registers} = \sum_{\text{signals}} \text{bitwidth} \times \text{pipeline\_stages} $$

### 5.3 Power Models

#### Dynamic Power

$$ P_{\text{dynamic}} = \alpha \cdot C_L \cdot V_{DD}^2 \cdot f $$

Where:

- $\alpha$ = switching activity factor
- $C_L$ = load capacitance
- $V_{DD}$ = supply voltage
- $f$ = clock frequency

#### Static Power

$$ P_{\text{static}} = I_{\text{leakage}} \cdot V_{DD} $$

#### Total Power

$$ P_{\text{total}} = P_{\text{dynamic}} + P_{\text{static}} $$

## 6. High-Level Synthesis Models

### 6.1 Quantization Models

#### Fixed-Point Quantization

$$ x_q = \text{round}\left(\frac{x}{2^{-n}}\right) \cdot 2^{-n} $$

Where $n$ is the number of fractional bits.

#### Quantization Error

$$ \epsilon_q = x - x_q $$

**For rounding:**

$$ -\frac{2^{-n}}{2} \leq \epsilon_q < \frac{2^{-n}}{2} $$

**Quantization noise power (uniform distribution):**

$$ \sigma_q^2 = \frac{(2^{-n})^2}{12} = \frac{2^{-2n}}{12} $$

#### Signal-to-Quantization-Noise Ratio (SQNR)

$$ \text{SQNR (dB)} = 10 \log_{10}\left(\frac{\sigma_x^2}{\sigma_q^2}\right) \approx 6.02N + 1.76 \text{ dB} $$

For a full-scale sinusoid quantized to $N$ total bits (a sign bit plus $n = N - 1$ fractional bits for a signal spanning $[-1, 1)$).

### 6.2 Scheduling Models

#### ASAP (As Soon As Possible) Scheduling

$$ t_{\text{ASAP}}(v) = \max_{u \in \text{pred}(v)} \left( t_{\text{ASAP}}(u) + d_u \right) $$

Where $d_u$ is the delay of operation $u$.
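The SQNR rule of thumb is easy to check empirically: quantize a full-scale sine to $n$ fractional bits (so $N = n + 1$ total bits including the sign) and measure the actual noise power. A minimal sketch:

```python
import math

# Measure SQNR of a full-scale sine quantized to n fractional bits;
# expected to land near 6.02*(n+1) + 1.76 dB for this [-1, 1) signal.
def sqnr_db(n_bits: int, samples: int = 4096) -> float:
    sig_pow = err_pow = 0.0
    lsb = 2.0 ** -n_bits
    for k in range(samples):
        x = math.sin(2 * math.pi * k / samples)
        xq = round(x / lsb) * lsb          # round-to-nearest quantizer
        sig_pow += x * x
        err_pow += (x - xq) ** 2
    return 10 * math.log10(sig_pow / err_pow)

n = 8
print(abs(sqnr_db(n) - (6.02 * (n + 1) + 1.76)) < 1.0)  # within ~1 dB
```

The agreement holds because over a full sine period the rounding error is approximately uniform in $[-\text{LSB}/2, \text{LSB}/2)$, matching the $\sigma_q^2 = 2^{-2n}/12$ model.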
#### ALAP (As Late As Possible) Scheduling

$$ t_{\text{ALAP}}(v) = \min_{w \in \text{succ}(v)} \left( t_{\text{ALAP}}(w) - d_v \right) $$

#### Mobility (Slack)

$$ \text{Mobility}(v) = t_{\text{ALAP}}(v) - t_{\text{ASAP}}(v) $$

Operations with zero mobility are on the critical path.

### 6.3 Resource Binding Models

#### Resource Sharing Constraint

$$ \sum_{v: t(v) = t} r(v, k) \leq R_k \quad \forall t, k $$

Where:

- $r(v, k)$ = 1 if operation $v$ uses resource type $k$
- $R_k$ = number of available resources of type $k$

#### Conflict Graph

Two operations conflict if they:

1. Execute in the same time step
2. Require the same resource type

**Minimum resources = Chromatic number of conflict graph**

## 7. Optimization Models

### 7.1 Pipeline Optimization

#### Pipeline Stages

$$ \text{Stages} = \left\lceil \frac{T_{\text{combinational}}}{T_{\text{target\_clock}}} \right\rceil $$

#### Throughput

$$ \text{Throughput} = \frac{f_{\text{clk}}}{\text{II}} $$

Where II = Initiation Interval (cycles between new inputs).

#### Latency

$$ \text{Latency} = \text{Stages} \times T_{\text{clk}} $$

#### Pipeline Efficiency

$$ \eta = \frac{\text{Actual Throughput}}{\text{Ideal Throughput (II} = 1)} = \frac{1}{\text{II}} $$

### 7.2 Loop Optimization

#### Loop Unrolling

For unroll factor $U$:

$$ \text{New iterations} = \left\lceil \frac{N}{U} \right\rceil $$

**Resource increase:** ~$U \times$ original
**Throughput increase:** up to $U \times$

#### Loop Pipelining

$$ \text{II}_{\text{achieved}} = \max(\text{II}_{\text{resource}}, \text{II}_{\text{recurrence}}, \text{II}_{\text{target}}) $$

Where:

- $\text{II}_{\text{resource}}$ = resource-constrained II
- $\text{II}_{\text{recurrence}}$ = loop-carried dependency II

#### Loop Tiling

For tile size $T$:

$$ \text{for } (ii = 0; ii < N; ii += T) \\ \quad \text{for } (i = ii; i < \min(ii + T, N); i++) $$

**Memory accesses reduced by a factor of** $T$ for reused data.
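ASAP, ALAP, and mobility can be computed in a few lines for a small dataflow graph (a sketch with unit-delay operations; node names are made up for illustration):

```python
# ASAP/ALAP scheduling and mobility for a tiny unit-delay dataflow graph.
preds = {"a": [], "b": [], "m": ["a", "b"], "s": ["m"], "t": ["b"]}
succs = {v: [w for w in preds if v in preds[w]] for v in preds}
d = {v: 1 for v in preds}                 # unit delays

order = ["a", "b", "m", "t", "s"]         # a valid topological order

asap = {}
for v in order:                           # earliest start times
    asap[v] = max((asap[u] + d[u] for u in preds[v]), default=0)

latency = max(asap[v] + d[v] for v in order)

alap = {}
for v in reversed(order):                 # latest start times
    alap[v] = min((alap[w] - d[v] for w in succs[v]),
                  default=latency - d[v])

mobility = {v: alap[v] - asap[v] for v in order}
print(mobility)  # only "t" can slide; a, b, m, s have zero mobility
```

Zero-mobility operations (`a`, `b`, `m`, `s`) form the critical path; `t` has one cycle of scheduling freedom, which a list scheduler can exploit for resource sharing.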
### 7.3 Area-Time Trade-offs

#### Area-Time Product

$$ \text{AT} = \text{Area} \times \text{Time} $$

#### Pareto Optimality

A design $D_1$ dominates $D_2$ if:

$$ \text{Area}(D_1) \leq \text{Area}(D_2) \quad \text{AND} \quad \text{Time}(D_1) \leq \text{Time}(D_2) $$

with at least one strict inequality.

#### Cost Function

$$ \text{Cost} = \alpha \cdot \text{Area} + \beta \cdot \text{Latency} + \gamma \cdot \text{Power} $$

Where $\alpha + \beta + \gamma = 1$ are weighting factors.

## 8. Practical Examples

### 8.1 FIR Filter Implementation

#### Mathematical Definition

$$ y[n] = \sum_{k=0}^{N-1} h[k] \cdot x[n-k] $$

#### Direct Form Implementation

**Resources:**

- Multipliers: $N$
- Adders: $N-1$
- Registers: $N-1$ (delay line)

**Accumulator sizing:**

$$ \text{Acc bits} = \text{input bits} + \text{coef bits} + \lceil \log_2(N) \rceil $$

#### Transpose Form (Better for FPGA)

**Advantages:**

- All multipliers fed by the same input
- Shorter critical path
- Better for high-speed designs

### 8.2 FFT Implementation

#### Radix-2 DIT FFT

$$ X[k] = \sum_{n=0}^{N-1} x[n] \cdot W_N^{nk} $$

Where $W_N = e^{-j2\pi/N}$ (twiddle factor).
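The direct-form FIR equation maps to an $N$-tap shift register feeding $N$ multipliers and an adder chain. A minimal behavioral model (the 4-tap moving-average coefficients are illustrative):

```python
# Direct-form FIR: N-tap delay line, N multiplies, N-1 adds per output.
def fir(h, x):
    n_taps = len(h)
    delay = [0.0] * n_taps                # shift-register delay line
    out = []
    for sample in x:
        delay = [sample] + delay[:-1]     # shift in the new sample
        out.append(sum(hk * xk for hk, xk in zip(h, delay)))
    return out

h = [0.25, 0.25, 0.25, 0.25]              # 4-tap moving average
y = fir(h, [4.0, 0.0, 0.0, 0.0, 8.0])
print(y)  # [1.0, 1.0, 1.0, 1.0, 2.0] -- each impulse spreads over 4 taps
```

A hardware version would replace the floats with fixed-point words sized by the accumulator rule above ($\text{input} + \text{coef} + \lceil \log_2 N \rceil$ bits).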
#### Butterfly Operation

$$ \begin{aligned} X[k] &= A + W_N^k \cdot B \\ X[k + N/2] &= A - W_N^k \cdot B \end{aligned} $$

#### Resource Requirements

| Architecture | Multipliers | Memory | Throughput |
|--------------|-------------|--------|------------|
| Radix-2 Serial | 1 complex | $N$ | $N \log_2 N$ cycles |
| Radix-2 Parallel | $N/2$ complex | $N$ | $\log_2 N$ cycles |
| Pipelined | $\log_2 N$ complex | $2N$ | 1 sample/cycle |

### 8.3 PID Controller

#### Continuous-Time PID

$$ u(t) = K_p \cdot e(t) + K_i \int_0^t e(\tau) d\tau + K_d \frac{de(t)}{dt} $$

#### Discrete-Time PID (trapezoidal integral, backward-difference derivative)

$$ u[n] = u[n-1] + a_0 \cdot e[n] + a_1 \cdot e[n-1] + a_2 \cdot e[n-2] $$

Where:

$$ \begin{aligned} a_0 &= K_p + K_i \frac{T_s}{2} + \frac{K_d}{T_s} \\ a_1 &= -K_p + K_i \frac{T_s}{2} - \frac{2 K_d}{T_s} \\ a_2 &= \frac{K_d}{T_s} \end{aligned} $$

#### FPGA Implementation Considerations

- **Coefficient quantization:** affects stability
- **Accumulator overflow:** requires saturation logic
- **Anti-windup:** needed for the integrator term

## Common Mathematical Constants for FPGA

| Constant | Value | Q16.16 Hex |
|----------|-------|------------|
| $\pi$ | 3.14159265 | `0x0003_243F` |
| $e$ | 2.71828183 | `0x0002_B7E1` |
| $\sqrt{2}$ | 1.41421356 | `0x0001_6A09` |
| $\ln(2)$ | 0.69314718 | `0x0000_B172` |
| $1/\pi$ | 0.31830989 | `0x0000_517C` |
| CORDIC $K$ | 0.60725293 | `0x0000_9B74` |

## FPGA Vendor DSP Specifications

| Vendor | Family | DSP Name | Multiplier Size | Pre-adder |
|--------|--------|----------|-----------------|-----------|
| AMD/Xilinx | 7 Series | DSP48E1 | 25×18 | Yes |
| AMD/Xilinx | UltraScale | DSP48E2 | 27×18 | Yes |
| Intel | Stratix 10 | DSP Block | 18×19 or 27×27 | Yes |
| Lattice | ECP5 | MULT18X18D | 18×18 | No |
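A velocity-form PID (trapezoidal integral, backward-difference derivative) is a three-multiplier recurrence, which is why it maps so cleanly to a small FPGA datapath. The sketch below closes the loop around a trivial first-order plant; the gains and plant are illustrative, not tuned for any real system:

```python
# Velocity-form PID driving a first-order plant dy/dt = u - y.
Kp, Ki, Kd, Ts = 2.0, 1.0, 0.1, 0.01
a0 = Kp + Ki * Ts / 2 + Kd / Ts
a1 = -Kp + Ki * Ts / 2 - 2 * Kd / Ts
a2 = Kd / Ts

setpoint, y, u = 1.0, 0.0, 0.0
e1 = e2 = 0.0
for _ in range(5000):                     # 50 s of simulated time
    e0 = setpoint - y
    u = u + a0 * e0 + a1 * e1 + a2 * e2   # the difference equation
    e2, e1 = e1, e0
    y += Ts * (u - y)                     # forward-Euler plant update

print(abs(setpoint - y) < 1e-3)  # integrator drives steady-state error to ~0
```

In a hardware version, `u` is the accumulator that needs saturation logic, and clamping the `a0*e0 + a1*e1 + a2*e2` increment is a simple anti-windup measure.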

fpga for ai,hardware

Field-programmable gate arrays customized for ML.

fpga,field programmable

FPGA = Field-Programmable Gate Array. Reconfigurable hardware. Good for prototyping and low-volume. Less efficient than ASIC.

fractal dimension of surfaces, metrology

Characterize complexity of rough surfaces.

fractional factorial,doe

Test subset of combinations to save time.

fractured data, lithography

Design broken into write fields.

fragmentvc, audio & speech

FragmentVC performs any-to-any voice conversion by attending over fragments of target-speaker utterances, fusing source content with target timbre for finer control.

frame interpolation, multimodal ai

Frame interpolation generates intermediate frames between existing frames.

frame interpolation, video generation

Generate intermediate frames.

frame interpolation,computer vision

Generate in-between frames to increase video frame rate or create slow motion.

frame order prediction, video understanding

Determine correct temporal order.

frand licensing, frand, legal

FRAND = Fair, Reasonable, And Non-Discriminatory. Licensing terms required for standard-essential patents (SEPs).

free adversarial training, ai safety

Train robustness with negligible overhead.

free cooling, environmental & sustainability

Free cooling uses ambient air or water when conditions are cold enough, eliminating mechanical chiller operation.

free energy calculations, healthcare ai

Compute binding free energy.

freedom to operate, fto, legal

Ability to commercialize a product without infringing third-party patents.

freematch, semi-supervised learning

Threshold-free pseudo-labeling.

freeze drying, process

Dry by sublimation.

freeze-out, device physics

Carriers freeze onto dopants at low T.

frenkel pair, defects

Vacancy-interstitial pair.

frequency penalty, llm optimization

Frequency penalty decreases probability proportional to token occurrence count.

frequency penalty, text generation

Penalize based on token frequency.

freshness in rag, rag

How current is retrieved information.

friedman test, quality & reliability

Friedman test analyzes repeated measures across conditions non-parametrically.

frontend,ui,ux,react,web

Front-end development covers component structure, state handling, and UX flows, especially around AI features.

frontier model, llm architecture

Frontier models represent state-of-the-art capabilities at largest scale.

frontier model,advanced model,frontier capability

Frontier models are most capable systems. Special safety considerations. Regulatory focus.

frozen features, transfer learning

Pre-trained representations kept fixed.

frozen graph, model optimization

Frozen graphs convert variables to constants enabling optimization and preventing modification.

fsdp,fully sharded,pytorch

FSDP = Fully Sharded Data Parallel. PyTorch native ZeRO-like sharding. Reduces memory per GPU.

fudge,text generation

FUDGE = Future Discriminators for Generation. Steers text generation by reweighting token probabilities with a lightweight classifier predicting whether the desired attribute will hold.

full array bga, packaging

Balls across entire bottom.