
AI Factory Glossary

74 technical terms and definitions


h3 (hungry hungry hippos), h3, hungry hungry hippos, llm architecture

Hybrid SSM+attention architecture.

halide, model optimization

Halide separates the algorithm from the schedule, enabling portable, high-performance image processing.

hallucination detection, ai safety

Identify false or unsupported statements in model outputs.

hallucination in llms, challenges

Generating plausible but false information.

halstead metrics, code ai

Software complexity measures derived from counts of distinct and total operators and operands.
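
A minimal sketch of the standard Halstead formulas, assuming the operator/operand counting has already been done elsewhere; the example counts are illustrative.

```python
import math

def halstead_metrics(n1, n2, N1, N2):
    """Core Halstead measures.
    n1: distinct operators, n2: distinct operands,
    N1: total operators,    N2: total operands."""
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)   # program size in bits
    difficulty = (n1 / 2) * (N2 / n2)         # proxy for error-proneness
    effort = difficulty * volume              # estimated mental effort
    return {"vocabulary": vocabulary, "length": length,
            "volume": volume, "difficulty": difficulty, "effort": effort}

# Example: a tiny snippet with 5 distinct operators, 4 distinct operands,
# 12 operator occurrences and 9 operand occurrences.
print(halstead_metrics(5, 4, 12, 9))
```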

hamiltonian neural networks, scientific ml

Neural networks that learn a Hamiltonian whose gradients define energy-conserving dynamics.
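
A minimal PyTorch sketch of the core idea: an MLP parameterizes the Hamiltonian H(q, p), and time derivatives come from Hamilton's equations via autograd so trajectories conserve the learned energy. The network sizes and data are illustrative, and training against observed derivatives is omitted.

```python
import torch
import torch.nn as nn

class HNN(nn.Module):
    """Parameterize a scalar Hamiltonian H(q, p) with an MLP."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def time_derivatives(self, q, p):
        # Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq
        q = q.requires_grad_(True)
        p = p.requires_grad_(True)
        H = self.net(torch.cat([q, p], dim=-1)).sum()
        dHdq, dHdp = torch.autograd.grad(H, (q, p), create_graph=True)
        return dHdp, -dHdq

model = HNN(dim=1)
q, p = torch.randn(8, 1), torch.randn(8, 1)
dqdt, dpdt = model.time_derivatives(q, p)
```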

han, han, graph neural networks

Heterogeneous Graph Attention Network uses hierarchical attention at node-level and semantic-level to learn from multi-relational graph structures.

hard example mining, advanced training

Hard example mining focuses training on samples with high loss or misclassification to improve model performance on difficult cases.
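
A minimal PyTorch sketch of one common variant (online hard example mining): per-sample losses are computed without reduction and only the highest-loss fraction of the batch contributes to the update. The keep ratio and shapes are illustrative.

```python
import torch
import torch.nn as nn

def ohem_loss(logits, targets, keep_ratio=0.25):
    """Backprop only through the hardest (highest-loss) samples in the batch."""
    per_sample = nn.functional.cross_entropy(logits, targets, reduction="none")
    k = max(1, int(keep_ratio * per_sample.numel()))
    hard_losses, _ = torch.topk(per_sample, k)   # select the hardest samples
    return hard_losses.mean()

logits = torch.randn(32, 10, requires_grad=True)
targets = torch.randint(0, 10, (32,))
loss = ohem_loss(logits, targets)
loss.backward()
```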

hard example mining, machine learning

Focus on difficult examples.

hard routing, llm architecture

Hard routing assigns tokens exclusively to selected experts.
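
A minimal single-device sketch of hard top-k routing in a mixture-of-experts layer: each token's gating logits pick a small set of experts, and only those experts process the token. The linear experts, top-k choice, and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HardTopKRouter(nn.Module):
    def __init__(self, d_model, num_experts, k=1):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(num_experts)])
        self.k = k

    def forward(self, x):                        # x: (tokens, d_model)
        probs = self.gate(x).softmax(dim=-1)     # (tokens, num_experts)
        weights, idx = torch.topk(probs, self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):               # each token visits only its chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

router = HardTopKRouter(d_model=16, num_experts=4, k=1)
y = router(torch.randn(8, 16))
```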

hardware-aware design, model optimization

Hardware-aware design optimizes architectures for specific deployment platforms, considering latency, memory, and energy.

hardware-aware nas, neural architecture

Search considering hardware constraints.

hardware-aware nas, neural architecture search

Hardware-aware neural architecture search optimizes architectures jointly for accuracy and hardware metrics such as latency, energy, or memory footprint.

hardware-software co-design, edge ai

Jointly optimize hardware and algorithms.

harmful content, ai safety

Harmful content includes text promoting violence, illegal activity, or other dangers.

hash routing, llm architecture

Hash routing deterministically assigns tokens based on hashing.
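
A minimal sketch of the idea: each token is mapped to an expert by hashing its token id, so the assignment is deterministic and needs no learned gate. The hash function and expert count here are illustrative.

```python
def hash_route(token_ids, num_experts):
    """Deterministically assign each token id to an expert bucket."""
    return [hash(t) % num_experts for t in token_ids]

# The same token id always lands on the same expert.
print(hash_route([101, 2054, 2003, 101], num_experts=8))
```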

hat, hat, multimodal ai

Hybrid Attention Transformer improves super-resolution through channel and spatial attention.

hat, hat, neural architecture search

Hardware-Aware Transformers optimize transformer architectures jointly for accuracy and hardware-specific latency constraints.

hate speech detection, ai safety

Identify hateful or discriminatory content.

hawkes self-excitation, time series models

Self-excitation in Hawkes processes models how past events increase likelihood of future events.

hazardous waste, environmental & sustainability

Hazardous waste from semiconductor manufacturing requires specialized handling, storage, and disposal.

heat recovery, environmental & sustainability

Heat recovery systems capture waste heat from process tools and HVAC for space heating or power generation, improving energy efficiency.

heat wheel, environmental & sustainability

Heat wheels transfer thermal energy between exhaust and supply air streams through a rotating matrix.

heel crack, failure analysis

Crack at the heel of a wire bond.

hepa filter (high-efficiency particulate air), hepa filter, high-efficiency particulate air, facility

Filter that removes at least 99.97% of airborne particles 0.3 microns in diameter (the most penetrating particle size).

heterogeneous graph neural networks, graph neural networks

GNNs for graphs with different node/edge types.

heterogeneous graph, graph neural networks

Heterogeneous graphs contain multiple node types and edge types, requiring specialized message passing for different relation semantics.

heterogeneous skip-gram, graph neural networks

Heterogeneous skip-gram predicts context nodes of different types given target nodes.

hetsann, graph neural networks

Heterogeneous Self-Attention Neural Network adaptively learns importance of different metapaths and neighbors.

heun method sampling, generative models

Second-order ODE solver that corrects an Euler prediction by averaging the slopes at both ends of the step, used to integrate sampling trajectories in fewer steps.
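
A minimal sketch of a single Heun step as used by ODE-based samplers: an Euler prediction followed by a correction that averages the two slopes. The drift function `f(x, t)` is a placeholder for the model-defined probability-flow ODE; the toy example just integrates exponential decay.

```python
import numpy as np

def heun_step(f, x, t, t_next):
    """One Heun (2nd-order) step of dx/dt = f(x, t) from t to t_next."""
    dt = t_next - t
    d1 = f(x, t)                       # slope at the current point
    x_euler = x + dt * d1              # Euler (first-order) prediction
    d2 = f(x_euler, t_next)            # slope at the predicted point
    return x + dt * 0.5 * (d1 + d2)    # average the two slopes

# Toy check: dx/dt = -x, exact solution decays like exp(-t).
f = lambda x, t: -x
x = np.array([1.0])
ts = np.linspace(0.0, 1.0, 11)
for t, t_next in zip(ts[:-1], ts[1:]):
    x = heun_step(f, x, t, t_next)
print(x)   # close to exp(-1) ≈ 0.3679
```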

hgt, heterogeneous graph transformer, graph neural networks, gnn, heterogeneous graphs, transformer, attention mechanism

# Heterogeneous Graph Transformer (HGT)

**HGT (Heterogeneous Graph Transformer)** is a graph neural network architecture designed specifically for **heterogeneous graphs** — graphs where nodes and edges can have different types. It was introduced by Hu et al. in 2020.

## 1. Problem Setting

### 1.1 Heterogeneous Graph Definition

A heterogeneous graph is defined as:

$$
G = (V, E, \tau, \phi)
$$

Where:

- $V$ — Set of nodes
- $E$ — Set of edges
- $\tau: V \rightarrow \mathcal{T}$ — Node type mapping function
- $\phi: E \rightarrow \mathcal{R}$ — Edge type mapping function
- $\mathcal{T}$ — Set of node types
- $\mathcal{R}$ — Set of edge/relation types

### 1.2 Real-World Examples

- **Academic Networks**:
  - Node types: `Paper`, `Author`, `Venue`, `Institution`
  - Edge types: `writes`, `cites`, `published_in`, `affiliated_with`
- **E-commerce Graphs**:
  - Node types: `User`, `Product`, `Brand`, `Category`
  - Edge types: `purchases`, `reviews`, `belongs_to`, `manufactures`
- **Knowledge Graphs**:
  - Node types: `Person`, `Organization`, `Location`, `Event`
  - Edge types: `works_at`, `located_in`, `participated_in`

## 2. HGT Architecture

### 2.1 Core Components

The HGT layer consists of three main operations:

1. **Heterogeneous Mutual Attention**
2. **Heterogeneous Message Passing**
3. **Target-Specific Aggregation**

### 2.2 Type-Dependent Linear Projections

For each node type $\tau \in \mathcal{T}$, HGT defines separate projection matrices:

$$
Q_{\tau}^{(i)} \in \mathbb{R}^{d \times \frac{d}{h}}, \quad K_{\tau}^{(i)} \in \mathbb{R}^{d \times \frac{d}{h}}, \quad V_{\tau}^{(i)} \in \mathbb{R}^{d \times \frac{d}{h}}
$$

Where:

- $d$ — Hidden dimension
- $h$ — Number of attention heads
- $i$ — Attention head index $(i = 1, 2, \ldots, h)$

## 3. Mathematical Formulation

### 3.1 Attention Mechanism

For a source node $s$ and target node $t$ connected by edge $e$:

#### Step 1: Compute Query and Key

$$
\text{Query}^{(i)}(t) = Q_{\tau(t)}^{(i)} \cdot H^{(l-1)}[t]
$$

$$
\text{Key}^{(i)}(s) = K_{\tau(s)}^{(i)} \cdot H^{(l-1)}[s]
$$

#### Step 2: Compute Attention Score

$$
\text{ATT-head}^{(i)}(s, e, t) = \left( \text{Key}^{(i)}(s) \cdot W_{\phi(e)}^{\text{ATT}} \cdot \text{Query}^{(i)}(t)^T \right) \cdot \frac{\mu_{\langle \tau(s), \phi(e), \tau(t) \rangle}}{\sqrt{d}}
$$

Where:

- $W_{\phi(e)}^{\text{ATT}} \in \mathbb{R}^{\frac{d}{h} \times \frac{d}{h}}$ — Edge-type-specific attention matrix
- $\mu_{\langle \tau(s), \phi(e), \tau(t) \rangle}$ — Prior importance of meta-relation (learnable scalar)

#### Step 3: Softmax Normalization

$$
\text{Attention}(s, e, t) = \text{softmax}_{s \in \mathcal{N}(t)} \left( \text{ATT-head}^{(i)}(s, e, t) \right)
$$

### 3.2 Message Computation

$$
\text{Message}^{(i)}(s, e, t) = V_{\tau(s)}^{(i)} \cdot H^{(l-1)}[s] \cdot W_{\phi(e)}^{\text{MSG}}
$$

Where:

- $W_{\phi(e)}^{\text{MSG}} \in \mathbb{R}^{\frac{d}{h} \times \frac{d}{h}}$ — Edge-type-specific message matrix

### 3.3 Multi-Head Aggregation

$$
\tilde{H}^{(l)}[t] = \bigoplus_{i=1}^{h} \left( \sum_{s \in \mathcal{N}(t)} \text{Attention}^{(i)}(s, e, t) \cdot \text{Message}^{(i)}(s, e, t) \right)
$$

Where $\bigoplus$ denotes concatenation across heads.

### 3.4 Final Output with Residual Connection

$$
H^{(l)}[t] = \sigma \left( W_{\tau(t)}^{\text{OUT}} \cdot \tilde{H}^{(l)}[t] + H^{(l-1)}[t] \right)
$$

Where:

- $W_{\tau(t)}^{\text{OUT}} \in \mathbb{R}^{d \times d}$ — Target-type-specific output projection
- $\sigma$ — Activation function (e.g., ReLU, GELU)

## 4. Relative Temporal Encoding (RTE)

For temporal/dynamic graphs, HGT incorporates time information:

$$
\text{RTE}(\Delta t) = \text{Linear}\left( \text{T2V}(\Delta t) \right)
$$

Where $\Delta t = t_{\text{target}} - t_{\text{source}}$ is the time difference.

### Time2Vec Encoding

$$
\text{T2V}(\Delta t)[i] =
\begin{cases}
\omega_i \cdot \Delta t + \varphi_i & \text{if } i = 0 \\
\sin(\omega_i \cdot \Delta t + \varphi_i) & \text{if } i > 0
\end{cases}
$$

The temporal attention becomes:

$$
\text{ATT-head}^{(i)}(s, e, t) = \left( \text{Key}^{(i)}(s) + \text{RTE}(\Delta t) \right) \cdot W_{\phi(e)}^{\text{ATT}} \cdot \text{Query}^{(i)}(t)^T
$$

## 5. Comparison

| Method | Heterogeneity Handling | Metapaths Required | Parameter Efficiency |
|--------|------------------------|--------------------|----------------------|
| **GCN** | ❌ Homogeneous only | N/A | ✅ High |
| **GAT** | ❌ Homogeneous only | N/A | ✅ High |
| **R-GCN** | ✅ Yes | ❌ No | ❌ Low (separate weights per relation) |
| **HAN** | ✅ Yes | ✅ Yes (manual design) | ⚠️ Medium |
| **HGT** | ✅ Yes | ❌ No (automatic) | ✅ High (decomposition) |

## 6. Implementation

### 6.1 PyTorch Geometric Implementation

```python
import torch
import torch.nn as nn
from torch_geometric.nn import HGTConv, Linear


class HGT(nn.Module):
    def __init__(self, metadata, hidden_channels, out_channels, num_heads, num_layers):
        super().__init__()
        self.node_types = metadata[0]
        self.edge_types = metadata[1]

        # Linear projections for each node type
        self.lin_dict = nn.ModuleDict()
        for node_type in self.node_types:
            self.lin_dict[node_type] = Linear(-1, hidden_channels)

        # HGT convolutional layers
        self.convs = nn.ModuleList()
        for _ in range(num_layers):
            conv = HGTConv(
                in_channels=hidden_channels,
                out_channels=hidden_channels,
                metadata=metadata,
                heads=num_heads,
                group='sum'
            )
            self.convs.append(conv)

        # Output projection
        self.out_lin = Linear(hidden_channels, out_channels)

    def forward(self, x_dict, edge_index_dict):
        # Initial projection
        x_dict = {
            node_type: self.lin_dict[node_type](x).relu()
            for node_type, x in x_dict.items()
        }

        # HGT layers
        for conv in self.convs:
            x_dict = conv(x_dict, edge_index_dict)

        # Output projection to out_channels for each node type
        return {
            node_type: self.out_lin(x)
            for node_type, x in x_dict.items()
        }
```

### 6.2 Usage Example

```python
# Define metadata
metadata = (
    ['paper', 'author', 'venue'],          # Node types
    [
        ('author', 'writes', 'paper'),
        ('paper', 'cites', 'paper'),
        ('paper', 'published_in', 'venue'),
    ]                                      # Edge types as (src, relation, dst)
)

# Initialize model
model = HGT(
    metadata=metadata,
    hidden_channels=64,
    out_channels=16,
    num_heads=4,
    num_layers=2
)

# Forward pass (x_dict and edge_index_dict come from a heterogeneous graph,
# e.g. the x_dict and edge_index_dict attributes of a PyG HeteroData object)
out_dict = model(x_dict, edge_index_dict)
```

## 7. Training Objective

### 7.1 Node Classification

$$
\mathcal{L}_{\text{node}} = -\sum_{v \in V_{\text{labeled}}} \sum_{c=1}^{C} y_{v,c} \log(\hat{y}_{v,c})
$$

Where:

- $y_{v,c}$ — Ground truth label (one-hot)
- $\hat{y}_{v,c} = \text{softmax}(H^{(L)}[v])_c$ — Predicted probability

### 7.2 Link Prediction

$$
\mathcal{L}_{\text{link}} = -\sum_{(s,e,t) \in E} \log \sigma(H^{(L)}[s]^T \cdot W_{\phi(e)} \cdot H^{(L)}[t]) - \sum_{(s,e,t') \in E^{-}} \log \sigma(-H^{(L)}[s]^T \cdot W_{\phi(e)} \cdot H^{(L)}[t'])
$$

Where:

- $E^{-}$ — Negative edge samples
- $\sigma$ — Sigmoid function

## 8. Complexity Analysis

### 8.1 Time Complexity

$$
O\left( |E| \cdot d^2 / h + |V| \cdot d^2 \right)
$$

Where:

- $|E|$ — Number of edges
- $|V|$ — Number of nodes
- $d$ — Hidden dimension
- $h$ — Number of heads

### 8.2 Space Complexity (Parameters)

$$
O\left( |\mathcal{T}| \cdot d^2 + |\mathcal{R}| \cdot d^2 / h \right)
$$

This is more efficient than R-GCN, which requires $O(|\mathcal{R}| \cdot d^2)$.

## 9. Key Advantages

- **No Manual Metapath Design**: Unlike HAN, HGT automatically learns the importance of different meta-relations
- **Parameter Efficient**: Uses decomposition to avoid parameter explosion with many relation types
- **Unified Framework**: Handles any heterogeneous graph schema
- **Temporal Support**: Can incorporate relative time encoding for dynamic graphs
- **Interpretable**: Attention weights reveal learned importance of different relations

## 10. Limitations

- **Computational Overhead**: More complex than homogeneous GNNs
- **Data Requirements**: Needs sufficient examples per node/edge type
- **Memory Usage**: Multi-head attention increases memory consumption
- **Hyperparameter Sensitivity**: Performance depends on number of heads, layers, hidden dimensions

## 11. Reference

| Symbol | Description |
|--------|-------------|
| $G = (V, E, \tau, \phi)$ | Heterogeneous graph |
| $\tau(v)$ | Type of node $v$ |
| $\phi(e)$ | Type of edge $e$ |
| $H^{(l)}[v]$ | Node $v$ representation at layer $l$ |
| $\mathcal{N}(t)$ | Neighbors of target node $t$ |
| $Q, K, V$ | Query, Key, Value projections |
| $W^{\text{ATT}}, W^{\text{MSG}}$ | Attention and Message weight matrices |
| $\mu$ | Learnable meta-relation prior |

hierarchical all-reduce, distributed training

Multi-level gradient aggregation: reduce within each node first, all-reduce across nodes, then broadcast the result back within each node.
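
A toy single-process sketch of the two-level pattern (intra-node reduce, inter-node all-reduce over one leader per node, intra-node broadcast). In practice this runs over NCCL/MPI communicator groups, which are not shown here.

```python
def hierarchical_allreduce(node_grads):
    """node_grads: list of nodes, each a list of per-GPU gradient values."""
    # 1) Intra-node reduce: each node sums its local GPUs' gradients.
    node_sums = [sum(gpus) for gpus in node_grads]
    # 2) Inter-node all-reduce among the node leaders.
    global_sum = sum(node_sums)
    # 3) Intra-node broadcast of the global result back to every GPU.
    return [[global_sum for _ in gpus] for gpus in node_grads]

# Two nodes with two GPUs each; every GPU ends up with the same global sum (10).
print(hierarchical_allreduce([[1, 2], [3, 4]]))
```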

hierarchical attention, transformer

Multi-level attention structure.

hierarchical context, llm architecture

Multi-level context organization.

hierarchical fusion, multimodal ai

Multi-level fusion strategy.

hierarchical planning, ai agents

Hierarchical planning operates at multiple abstraction levels from high-level goals to low-level actions.

hierarchical pooling, graph neural networks

Hierarchical pooling creates multi-resolution graph representations through successive coarsening operations.

high availability (ha), high availability, ha, reliability

System remains operational despite failures.

high dimensional optimization, bayesian optimization, gaussian process, response surface, doe, design of experiments, pareto optimization, robust optimization, surrogate modeling, tcad, run to run control

# Semiconductor Manufacturing Process Recipe Optimization: Mathematical Modeling

## 1. Problem Context

A semiconductor **recipe** is a vector of controllable parameters:

$$
\mathbf{x} = \begin{bmatrix} T \\ P \\ Q_1 \\ Q_2 \\ \vdots \\ t \\ P_{\text{RF}} \end{bmatrix} \in \mathbb{R}^n
$$

Where:

- $T$ = Temperature (°C or K)
- $P$ = Pressure (mTorr or Pa)
- $Q_i$ = Gas flow rates (sccm)
- $t$ = Process time (seconds)
- $P_{\text{RF}}$ = RF power (Watts)

**Goal**: Find optimal $\mathbf{x}$ such that output properties $\mathbf{y}$ meet specifications while accounting for variability.

## 2. Mathematical Modeling Approaches

### 2.1 Physics-Based (First-Principles) Models

#### Chemical Vapor Deposition (CVD) Example

**Mass transport and reaction equation:**

$$
\frac{\partial C}{\partial t} + \nabla \cdot (\mathbf{u}C) = D\nabla^2 C + R(C, T)
$$

Where:

- $C$ = Species concentration
- $\mathbf{u}$ = Velocity field
- $D$ = Diffusion coefficient
- $R(C, T)$ = Reaction rate

**Surface reaction kinetics (Arrhenius form):**

$$
k_s = A \exp\left(-\frac{E_a}{RT}\right)
$$

Where:

- $A$ = Pre-exponential factor
- $E_a$ = Activation energy
- $R$ = Gas constant
- $T$ = Temperature

**Deposition rate (transport-limited regime):**

$$
r = \frac{k_s C_s}{1 + \frac{k_s}{h_g}}
$$

Where:

- $C_s$ = Surface concentration
- $h_g$ = Gas-phase mass transfer coefficient

**Characteristics:**

- **Advantages**: Extrapolates outside training data, physically interpretable
- **Disadvantages**: Computationally expensive, requires detailed mechanism knowledge

### 2.2 Empirical/Statistical Models (Response Surface Methodology)

**Second-order polynomial model:**

$$
y = \beta_0 + \sum_{i=1}^{n}\beta_i x_i + \sum_{i=1}^{n}\beta_{ii}x_i^2 + \sum_{i<j}\beta_{ij}x_i x_j + \varepsilon
$$

**Common modeling challenges and typical methods:**

| Challenge | Typical Methods |
|-----------|-----------------|
| High dimensionality ($> 50$ parameters) | PCA, PLS, sparse regression (LASSO), feature selection |
| Small datasets (limited wafer runs) | Bayesian methods, transfer learning, multi-fidelity modeling |
| Nonlinearity | GPs, neural networks, tree ensembles (RF, XGBoost) |
| Equipment-to-equipment variation | Mixed-effects models, hierarchical Bayesian models |
| Drift over time | Adaptive/recursive estimation, change-point detection, Kalman filtering |
| Multiple correlated responses | Multi-task learning, co-kriging, multivariate GP |
| Missing data | EM algorithm, multiple imputation, probabilistic PCA |

## 6. Dimensionality Reduction

### 6.1 Principal Component Analysis (PCA)

**Objective:**

$$
\max_{\mathbf{w}} \quad \mathbf{w}^T\mathbf{S}\mathbf{w} \quad \text{s.t.} \quad \|\mathbf{w}\|_2 = 1
$$

Where $\mathbf{S}$ is the sample covariance matrix.

**Solution:** Eigenvectors of $\mathbf{S}$

$$
\mathbf{S} = \mathbf{W}\boldsymbol{\Lambda}\mathbf{W}^T
$$

**Reduced representation:**

$$
\mathbf{z} = \mathbf{W}_k^T(\mathbf{x} - \bar{\mathbf{x}})
$$

Where $\mathbf{W}_k$ contains the top $k$ eigenvectors.

### 6.2 Partial Least Squares (PLS)

**Objective:** Maximize covariance between $\mathbf{X}$ and $\mathbf{Y}$

$$
\max_{\mathbf{w}, \mathbf{c}} \quad \text{Cov}(\mathbf{Xw}, \mathbf{Yc}) \quad \text{s.t.} \quad \|\mathbf{w}\|=\|\mathbf{c}\|=1
$$

## 7. Multi-Fidelity Optimization

**Combine cheap simulations with expensive experiments.**

**Auto-regressive model (Kennedy-O'Hagan):**

$$
y_{\text{HF}}(\mathbf{x}) = \rho \cdot y_{\text{LF}}(\mathbf{x}) + \delta(\mathbf{x})
$$

Where:

- $y_{\text{HF}}$ = High-fidelity (experimental) response
- $y_{\text{LF}}$ = Low-fidelity (simulation) response
- $\rho$ = Scaling factor
- $\delta(\mathbf{x}) \sim \mathcal{GP}$ = Discrepancy function

**Multi-fidelity GP:**

$$
\begin{bmatrix} \mathbf{y}_{\text{LF}} \\ \mathbf{y}_{\text{HF}} \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} \mathbf{K}_{\text{LL}} & \rho\mathbf{K}_{\text{LH}} \\ \rho\mathbf{K}_{\text{HL}} & \rho^2\mathbf{K}_{\text{LL}} + \mathbf{K}_{\delta} \end{bmatrix}\right)
$$

## 8. Transfer Learning

**Domain adaptation for tool-to-tool transfer:**

$$
y_{\text{target}}(\mathbf{x}) = y_{\text{source}}(\mathbf{x}) + \Delta(\mathbf{x})
$$

**Offset model (simple):**

$$
\Delta(\mathbf{x}) = c_0 \quad \text{(constant offset)}
$$

**Linear adaptation:**

$$
\Delta(\mathbf{x}) = \mathbf{c}^T\mathbf{x} + c_0
$$

**GP adaptation:**

$$
\Delta(\mathbf{x}) \sim \mathcal{GP}(0, k_\Delta)
$$

## 9. Complete Optimization Framework

**Recipe parameters** (inputs): $x_1$ temperature (°C), $x_2$ pressure (mTorr), $x_3$ gas flow 1 (sccm), $x_4$ gas flow 2 (sccm), $x_5$ RF power (W), $x_6$ time (s).

**Process model**: $\mathbf{y} = f(\mathbf{x}; \boldsymbol{\theta}) + \varepsilon$, subject to uncertainty $\xi$.

**Outputs**: $y_1$ thickness (nm), $y_2$ uniformity (%), $y_3$ CD (nm), $y_4$ defects (#/cm²).

**Optimization problem:**

$$
\min_{\mathbf{x}} \quad \sum_j w_j \left(\mathbb{E}[y_j] - y_{j,\text{target}}\right)^2 + \lambda \cdot \text{Var}[y]
$$

subject to:

- $\mathbf{y}_L \leq \mathbb{E}[\mathbf{y}] \leq \mathbf{y}_U$ (specification limits)
- $\Pr(\mathbf{y} \in \text{spec}) \geq 0.9973$ ($C_{pk} \geq 1.0$)
- $\mathbf{x}_L \leq \mathbf{x} \leq \mathbf{x}_U$ (equipment limits)
- $g(\mathbf{x}) \leq 0$ (process constraints)

## 10. Key Equations Summary

### Process Modeling

| Model Type | Equation |
|:-----------|:---------|
| Linear regression | $y = \mathbf{X}\boldsymbol{\beta} + \varepsilon$ |
| Quadratic RSM | $y = \beta_0 + \sum_i \beta_i x_i + \sum_i \beta_{ii}x_i^2 + \sum_{i<j}\beta_{ij}x_i x_j + \varepsilon$ |
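
A minimal sketch of one iteration of the surrogate-based loop described above, using a Gaussian-process surrogate and an expected-improvement acquisition with scikit-learn and SciPy. The toy `process` function, the normalized recipe space, the kernel choice, and the random candidate search are illustrative assumptions, not part of the original entry.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def process(x):                        # stand-in for an expensive wafer run
    return (x[:, 0] - 0.3) ** 2 + 0.5 * (x[:, 1] - 0.7) ** 2 + rng.normal(0, 0.01, len(x))

X = rng.uniform(0, 1, (8, 2))          # initial DOE in a normalized recipe space
y = process(X)

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X, y)

# Expected improvement (for minimization) over a random candidate set.
cand = rng.uniform(0, 1, (2000, 2))
mu, sigma = gp.predict(cand, return_std=True)
best = y.min()
z = (best - mu) / np.maximum(sigma, 1e-9)
ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

x_next = cand[np.argmax(ei)]           # next recipe to run on the tool
print(x_next)
```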

high-angle grain boundary, defects

Grain boundary with large misorientation (typically above about 15°) between adjacent grains.

high-resolution generation, generative models

Create images beyond training resolution.

higher-order gnn, graph neural networks

Higher-order GNNs increase expressiveness by aggregating information from k-tuples of nodes rather than individual nodes.

highway networks, neural architecture

Gated skip connections.

hint learning, model compression

Student learns from teacher's intermediate layers.

hmm time series, hmm, time series models

Hidden Markov Models for time series assume observations generated by unobserved discrete states transitioning stochastically.
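
A minimal sketch of fitting and decoding such a model, assuming the third-party `hmmlearn` package; the toy regime-switching series, number of states, and settings are illustrative.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
# Toy series that switches between a low-mean and a high-mean regime.
series = np.concatenate([rng.normal(0, 1, 100), rng.normal(5, 1, 100)]).reshape(-1, 1)

model = GaussianHMM(n_components=2, covariance_type="diag", n_iter=50)
model.fit(series)                  # EM (Baum-Welch) parameter estimation
states = model.predict(series)     # Viterbi decoding of the hidden regime
print(states[:5], states[-5:])
```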

holt-winters, time series models

Holt-Winters method extends exponential smoothing to capture level, trend, and seasonality in time series forecasting.
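
A minimal pure-Python sketch of the additive Holt-Winters recursions (level, trend, and seasonal updates). The smoothing constants, crude initialization, and toy series are illustrative; libraries such as statsmodels provide production implementations.

```python
def holt_winters_additive(y, m, alpha=0.3, beta=0.1, gamma=0.2, horizon=4):
    """Additive Holt-Winters: returns `horizon` forecasts past the end of y.
    y: observed series, m: season length."""
    level = sum(y[:m]) / m                        # crude initial level
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / m**2 # crude initial trend
    season = [y[i] - level for i in range(m)]     # crude initial seasonal indices

    for t in range(m, len(y)):
        last_level = level
        level = alpha * (y[t] - season[t % m]) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * season[t % m]

    # Forecast h steps ahead: level + h*trend + matching seasonal index.
    return [level + (h + 1) * trend + season[(len(y) + h) % m] for h in range(horizon)]

# Toy seasonal series with period 4 and an upward drift.
y = [10, 14, 8, 12, 11, 15, 9, 13, 12, 16, 10, 14]
print(holt_winters_additive(y, m=4))
```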

homomorphic encryption, training techniques

Homomorphic encryption enables computation on encrypted data without decryption.

hopfield networks, neural architecture

Associative memory networks; modern continuous Hopfield networks have been shown to be closely related to Transformer attention.

hopskipjump, ai safety

Efficient decision-based adversarial attack that requires only the model's predicted labels, not gradients or scores.

horizontal federated, training techniques

Horizontal federated learning trains on different samples that share the same feature space across parties.
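
A minimal NumPy sketch of the aggregation step in this setting (FedAvg-style): each party trains locally on its own samples over the shared feature space, and the server averages parameter vectors weighted by local sample counts. Names and shapes are illustrative.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of per-client parameter vectors (FedAvg aggregation)."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)                   # (num_clients, num_params)
    return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)

# Three parties with the same features but different numbers of local samples.
w = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
n = [100, 200, 700]
print(fedavg(w, n))   # dominated by the largest client
```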