AI Factory Glossary

285 technical terms and definitions

hepa filter (high-efficiency particulate air),hepa filter,high-efficiency particulate air,facility

Filter that removes at least 99.97% of airborne particles at 0.3 microns, the most penetrating particle size; efficiency is even higher for larger and smaller particles.

hermetic sealing, packaging

Create an airtight seal to protect the packaged device from moisture and contaminants.

heterogeneous graph neural networks,graph neural networks

GNNs for graphs with different node/edge types.

heterogeneous graph, graph neural networks

Heterogeneous graphs contain multiple node types and edge types, requiring specialized message passing for different relation semantics.

heterogeneous info net, recommendation systems

Heterogeneous information networks integrate multiple entity types and relations into unified recommendation frameworks.

heterogeneous integration, business & strategy

Heterogeneous integration combines different technologies, materials, or functions in a single package.

heterogeneous integration,advanced packaging

Combine dies from different technologies or materials in one package.

heterogeneous skip-gram, graph neural networks

Heterogeneous skip-gram predicts context nodes of different types given target nodes.

hetsann, graph neural networks

Heterogeneous Self-Attention Neural Network that adaptively learns the importance of different metapaths and neighbors.

heun method sampling, generative models

Second-order ODE solver that follows an Euler predictor step with a trapezoidal corrector; used in diffusion samplers to take larger, more accurate steps.
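
A minimal sketch of one Heun step (the generic second-order integrator; diffusion samplers apply the same predictor-corrector idea to the sampling ODE):

```python
import numpy as np

def heun_step(f, x, t, dt):
    """One step of Heun's (improved Euler) method, a second-order ODE solver."""
    k1 = f(x, t)                     # slope at the current point
    x_pred = x + dt * k1             # Euler predictor
    k2 = f(x_pred, t + dt)           # slope at the predicted endpoint
    return x + dt * 0.5 * (k1 + k2)  # trapezoidal corrector

# Example: integrate dx/dt = -x from x(0) = 1 over [0, 1]
x, t, dt = 1.0, 0.0, 0.01
for _ in range(100):
    x = heun_step(lambda x, t: -x, x, t, dt)
    t += dt
print(x, np.exp(-1))  # Heun result vs. exact solution e^-1
```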

heuristic quality metrics, data quality

Simple rule-based indicators used as fast proxies for data quality.

hf dip,clean tech

Hydrofluoric acid dip used to remove native oxide or etch oxide films.

hgt, heterogeneous graph transformer, graph neural networks, gnn, heterogeneous graphs, transformer, attention mechanism

# Heterogeneous Graph Transformer (HGT)

**HGT (Heterogeneous Graph Transformer)** is a graph neural network architecture designed specifically for **heterogeneous graphs**, graphs where nodes and edges can have different types. It was introduced by Hu et al. in 2020.

## 1. Problem Setting

### 1.1 Heterogeneous Graph Definition

A heterogeneous graph is defined as:

$$
G = (V, E, \tau, \phi)
$$

Where:

- $V$ — Set of nodes
- $E$ — Set of edges
- $\tau: V \rightarrow \mathcal{T}$ — Node type mapping function
- $\phi: E \rightarrow \mathcal{R}$ — Edge type mapping function
- $\mathcal{T}$ — Set of node types
- $\mathcal{R}$ — Set of edge/relation types

### 1.2 Real-World Examples

- **Academic Networks**:
  - Node types: `Paper`, `Author`, `Venue`, `Institution`
  - Edge types: `writes`, `cites`, `published_in`, `affiliated_with`
- **E-commerce Graphs**:
  - Node types: `User`, `Product`, `Brand`, `Category`
  - Edge types: `purchases`, `reviews`, `belongs_to`, `manufactures`
- **Knowledge Graphs**:
  - Node types: `Person`, `Organization`, `Location`, `Event`
  - Edge types: `works_at`, `located_in`, `participated_in`

## 2. HGT Architecture

### 2.1 Core Components

The HGT layer consists of three main operations:

1. **Heterogeneous Mutual Attention**
2. **Heterogeneous Message Passing**
3. **Target-Specific Aggregation**

### 2.2 Type-Dependent Linear Projections

For each node type $\tau \in \mathcal{T}$, HGT defines separate projection matrices:

$$
Q_{\tau}^{(i)} \in \mathbb{R}^{d \times \frac{d}{h}}, \quad K_{\tau}^{(i)} \in \mathbb{R}^{d \times \frac{d}{h}}, \quad V_{\tau}^{(i)} \in \mathbb{R}^{d \times \frac{d}{h}}
$$

Where:

- $d$ — Hidden dimension
- $h$ — Number of attention heads
- $i$ — Attention head index $(i = 1, 2, \ldots, h)$

## 3. Mathematical Formulation

### 3.1 Attention Mechanism

For a source node $s$ and target node $t$ connected by edge $e$:

#### Step 1: Compute Query and Key

$$
\text{Query}^{(i)}(t) = Q_{\tau(t)}^{(i)} \cdot H^{(l-1)}[t]
$$

$$
\text{Key}^{(i)}(s) = K_{\tau(s)}^{(i)} \cdot H^{(l-1)}[s]
$$

#### Step 2: Compute Attention Score

$$
\text{ATT-head}^{(i)}(s, e, t) = \left( \text{Key}^{(i)}(s) \cdot W_{\phi(e)}^{\text{ATT}} \cdot \text{Query}^{(i)}(t)^T \right) \cdot \frac{\mu_{\langle \tau(s), \phi(e), \tau(t) \rangle}}{\sqrt{d}}
$$

Where:

- $W_{\phi(e)}^{\text{ATT}} \in \mathbb{R}^{\frac{d}{h} \times \frac{d}{h}}$ — Edge-type-specific attention matrix
- $\mu_{\langle \tau(s), \phi(e), \tau(t) \rangle}$ — Prior importance of the meta-relation (learnable scalar)

#### Step 3: Softmax Normalization

$$
\text{Attention}^{(i)}(s, e, t) = \text{softmax}_{s \in \mathcal{N}(t)} \left( \text{ATT-head}^{(i)}(s, e, t) \right)
$$

### 3.2 Message Computation

$$
\text{Message}^{(i)}(s, e, t) = V_{\tau(s)}^{(i)} \cdot H^{(l-1)}[s] \cdot W_{\phi(e)}^{\text{MSG}}
$$

Where:

- $W_{\phi(e)}^{\text{MSG}} \in \mathbb{R}^{\frac{d}{h} \times \frac{d}{h}}$ — Edge-type-specific message matrix

### 3.3 Multi-Head Aggregation

$$
\tilde{H}^{(l)}[t] = \bigoplus_{i=1}^{h} \left( \sum_{s \in \mathcal{N}(t)} \text{Attention}^{(i)}(s, e, t) \cdot \text{Message}^{(i)}(s, e, t) \right)
$$

Where $\bigoplus$ denotes concatenation across heads.

### 3.4 Final Output with Residual Connection

$$
H^{(l)}[t] = \sigma \left( W_{\tau(t)}^{\text{OUT}} \cdot \tilde{H}^{(l)}[t] + H^{(l-1)}[t] \right)
$$

Where:

- $W_{\tau(t)}^{\text{OUT}} \in \mathbb{R}^{d \times d}$ — Target-type-specific output projection
- $\sigma$ — Activation function (e.g., ReLU, GELU)

## 4. Relative Temporal Encoding (RTE)

For temporal/dynamic graphs, HGT incorporates time information:

$$
\text{RTE}(\Delta t) = \text{Linear}\left( \text{T2V}(\Delta t) \right)
$$

Where $\Delta t = t_{\text{target}} - t_{\text{source}}$ is the time difference.

### Time2Vec Encoding

$$
\text{T2V}(\Delta t)[i] = \begin{cases} \omega_i \cdot \Delta t + \varphi_i & \text{if } i = 0 \\ \sin(\omega_i \cdot \Delta t + \varphi_i) & \text{if } i > 0 \end{cases}
$$

The temporal attention becomes:

$$
\text{ATT-head}^{(i)}(s, e, t) = \left( \text{Key}^{(i)}(s) + \text{RTE}(\Delta t) \right) \cdot W_{\phi(e)}^{\text{ATT}} \cdot \text{Query}^{(i)}(t)^T
$$

## 5. Comparison

| Method | Heterogeneity Handling | Metapaths Required | Parameter Efficiency |
|--------|------------------------|--------------------|----------------------|
| **GCN** | ❌ Homogeneous only | N/A | ✅ High |
| **GAT** | ❌ Homogeneous only | N/A | ✅ High |
| **R-GCN** | ✅ Yes | ❌ No | ❌ Low (separate weights per relation) |
| **HAN** | ✅ Yes | ✅ Yes (manual design) | ⚠️ Medium |
| **HGT** | ✅ Yes | ❌ No (automatic) | ✅ High (decomposition) |

## 6. Implementation

### 6.1 PyTorch Geometric Implementation

```python
import torch
import torch.nn as nn
from torch_geometric.nn import HGTConv, Linear


class HGT(nn.Module):
    def __init__(self, metadata, hidden_channels, out_channels, num_heads, num_layers):
        super().__init__()
        self.node_types = metadata[0]
        self.edge_types = metadata[1]

        # Linear projections for each node type
        self.lin_dict = nn.ModuleDict()
        for node_type in self.node_types:
            self.lin_dict[node_type] = Linear(-1, hidden_channels)

        # HGT convolutional layers
        self.convs = nn.ModuleList()
        for _ in range(num_layers):
            conv = HGTConv(
                in_channels=hidden_channels,
                out_channels=hidden_channels,
                metadata=metadata,
                heads=num_heads,
                group='sum',
            )
            self.convs.append(conv)

        # Output projection
        self.out_lin = Linear(hidden_channels, out_channels)

    def forward(self, x_dict, edge_index_dict):
        # Initial projection
        x_dict = {
            node_type: self.lin_dict[node_type](x).relu()
            for node_type, x in x_dict.items()
        }

        # HGT layers
        for conv in self.convs:
            x_dict = conv(x_dict, edge_index_dict)

        # Project each node type's embeddings to the output dimension
        return {
            node_type: self.out_lin(x)
            for node_type, x in x_dict.items()
        }
```

### 6.2 Usage Example

```python
# Define metadata
metadata = (
    ['paper', 'author', 'venue'],   # Node types
    [
        ('author', 'writes', 'paper'),
        ('paper', 'cites', 'paper'),
        ('paper', 'published_in', 'venue'),
    ]                               # Edge types as (src, relation, dst)
)

# Initialize model
model = HGT(
    metadata=metadata,
    hidden_channels=64,
    out_channels=16,
    num_heads=4,
    num_layers=2,
)

# Forward pass (x_dict and edge_index_dict typically come from a
# torch_geometric HeteroData object)
out_dict = model(x_dict, edge_index_dict)
```

## 7. Training Objective

### 7.1 Node Classification

$$
\mathcal{L}_{\text{node}} = -\sum_{v \in V_{\text{labeled}}} \sum_{c=1}^{C} y_{v,c} \log(\hat{y}_{v,c})
$$

Where:

- $y_{v,c}$ — Ground truth label (one-hot)
- $\hat{y}_{v,c} = \text{softmax}(H^{(L)}[v])_c$ — Predicted probability

### 7.2 Link Prediction

$$
\mathcal{L}_{\text{link}} = -\sum_{(s,e,t) \in E} \log \sigma(H^{(L)}[s]^T \cdot W_{\phi(e)} \cdot H^{(L)}[t]) - \sum_{(s,e,t') \in E^{-}} \log \sigma(-H^{(L)}[s]^T \cdot W_{\phi(e)} \cdot H^{(L)}[t'])
$$

Where:

- $E^{-}$ — Negative edge samples
- $\sigma$ — Sigmoid function

## 8. Complexity Analysis

### 8.1 Time Complexity

$$
O\left( |E| \cdot d^2 / h + |V| \cdot d^2 \right)
$$

Where:

- $|E|$ — Number of edges
- $|V|$ — Number of nodes
- $d$ — Hidden dimension
- $h$ — Number of heads

### 8.2 Space Complexity (Parameters)

$$
O\left( |\mathcal{T}| \cdot d^2 + |\mathcal{R}| \cdot d^2 / h \right)
$$

This is more efficient than R-GCN, which requires $O(|\mathcal{R}| \cdot d^2)$.

## 9. Key Advantages

- **No Manual Metapath Design**: Unlike HAN, HGT automatically learns the importance of different meta-relations
- **Parameter Efficient**: Uses decomposition to avoid parameter explosion with many relation types
- **Unified Framework**: Handles any heterogeneous graph schema
- **Temporal Support**: Can incorporate relative time encoding for dynamic graphs
- **Interpretable**: Attention weights reveal the learned importance of different relations

## 10. Limitations

- **Computational Overhead**: More complex than homogeneous GNNs
- **Data Requirements**: Needs sufficient examples per node/edge type
- **Memory Usage**: Multi-head attention increases memory consumption
- **Hyperparameter Sensitivity**: Performance depends on the number of heads, layers, and hidden dimensions

## 11. Symbol Reference

| Symbol | Description |
|--------|-------------|
| $G = (V, E, \tau, \phi)$ | Heterogeneous graph |
| $\tau(v)$ | Type of node $v$ |
| $\phi(e)$ | Type of edge $e$ |
| $H^{(l)}[v]$ | Node $v$ representation at layer $l$ |
| $\mathcal{N}(t)$ | Neighbors of target node $t$ |
| $Q, K, V$ | Query, Key, Value projections |
| $W^{\text{ATT}}, W^{\text{MSG}}$ | Attention and Message weight matrices |
| $\mu$ | Learnable meta-relation prior |

hidden factory, production

Rework not visible in metrics.

hidden factory, quality & reliability

Hidden factory represents the rework and waste consumed in fixing defects, reducing effective capacity.

hidden loss, manufacturing operations

Hidden losses are inefficiencies that are not immediately apparent and require detailed analysis to uncover.

hierarchical all-reduce, distributed training

Multi-level gradient aggregation: reduce within each node over the fast local interconnect first, then all-reduce across nodes over the slower network.
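
A toy simulation of the idea, assuming summed gradients and two nodes of four workers each (illustrative only; real implementations use NCCL or MPI process groups):

```python
import numpy as np

def hierarchical_allreduce(grads, workers_per_node):
    """Simulate a hierarchical all-reduce: reduce inside each node first,
    all-reduce across node leaders, then broadcast back within each node."""
    nodes = [grads[i:i + workers_per_node]
             for i in range(0, len(grads), workers_per_node)]
    # Phase 1: intra-node reduce (fast local interconnect, e.g. NVLink)
    node_sums = [np.sum(node, axis=0) for node in nodes]
    # Phase 2: inter-node all-reduce among one leader per node (slower network)
    global_sum = np.sum(node_sums, axis=0)
    # Phase 3: intra-node broadcast of the global result to every worker
    return [global_sum for _ in grads]

grads = [np.ones(4) * rank for rank in range(8)]  # 8 workers, 2 nodes of 4
out = hierarchical_allreduce(grads, workers_per_node=4)
print(out[0])  # sum over all ranks: [28. 28. 28. 28.]
```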

hierarchical attention, transformer

Multi-level attention structure that attends at several granularities, e.g., over tokens within segments and then over segments.

hierarchical clustering, manufacturing operations

Hierarchical clustering creates tree-structured groupings at multiple similarity levels.
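
A short example using SciPy's standard hierarchical-clustering API (the toy measurement data is hypothetical):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Toy data: two shifted groups of 3-dimensional tool measurements
X = np.vstack([rng.normal(0, 1, (10, 3)), rng.normal(5, 1, (10, 3))])

Z = linkage(X, method="ward")                    # build the cluster tree bottom-up
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 groups
print(labels)                                    # recovers the two groups
```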

hierarchical context, llm architecture

Multi-level context organization.

hierarchical federated learning, federated learning

Multi-level federation structure.

hierarchical fusion, multimodal ai

Multi-level fusion strategy.

hierarchical moe, moe

Multi-level expert organization in which a top-level gate routes to an expert group and a lower-level gate selects experts within that group.
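
A minimal two-level gating sketch in PyTorch (illustrative, not a specific published MoE design; it uses dense soft routing rather than top-k selection for brevity):

```python
import torch
import torch.nn as nn

class HierarchicalMoE(nn.Module):
    """Two-level mixture of experts: a top gate weights expert groups,
    a per-group gate weights experts within each group."""
    def __init__(self, dim, groups=2, experts_per_group=2):
        super().__init__()
        self.top_gate = nn.Linear(dim, groups)
        self.sub_gates = nn.ModuleList(
            [nn.Linear(dim, experts_per_group) for _ in range(groups)])
        self.experts = nn.ModuleList(
            [nn.ModuleList([nn.Linear(dim, dim) for _ in range(experts_per_group)])
             for _ in range(groups)])

    def forward(self, x):
        g_top = self.top_gate(x).softmax(-1)   # weight per expert group
        out = torch.zeros_like(x)
        for gi, (gate, group) in enumerate(zip(self.sub_gates, self.experts)):
            g_sub = gate(x).softmax(-1)        # weights within the group
            group_out = sum(g_sub[..., ei:ei + 1] * expert(x)
                            for ei, expert in enumerate(group))
            out = out + g_top[..., gi:gi + 1] * group_out
        return out

moe = HierarchicalMoE(dim=8)
print(moe(torch.randn(4, 8)).shape)  # torch.Size([4, 8])
```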

hierarchical optimization, optimization

Optimize at different levels sequentially, fixing higher-level decisions before refining lower-level ones.
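
A toy nested-optimization sketch (the objective is hypothetical): the outer level picks a high-level variable, then the inner level optimizes the low-level variable given that choice:

```python
import numpy as np

def inner_objective(x, a):
    # Low-level cost given high-level decision a
    return (x - a) ** 2 + 0.1 * x

def solve_inner(a):
    # Inner level: optimize x for a fixed high-level choice a
    xs = np.linspace(-2, 2, 401)
    vals = inner_objective(xs, a)
    return xs[np.argmin(vals)], vals.min()

# Outer level: pick the high-level choice with the best inner optimum
best_val, best_a = min((solve_inner(a)[1], a) for a in np.linspace(-1, 1, 21))
print("best high-level choice a =", best_a)
```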

hierarchical planning, ai agents

Hierarchical planning operates at multiple abstraction levels, from high-level goals down to low-level actions.

hierarchical pooling, graph neural networks

Hierarchical pooling creates multi-resolution graph representations through successive coarsening operations.

hierarchical rl, reinforcement learning

Decompose tasks into subtasks.

hierarchical rl, reinforcement learning advanced

Hierarchical reinforcement learning decomposes tasks into subtasks with multiple levels of temporal abstraction.
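
A toy two-level control sketch (a hypothetical corridor task, not a specific published algorithm): the high-level policy emits subgoals on a slow timescale, and the low-level policy emits primitive actions until each subgoal is reached:

```python
def high_level_policy(state, goal):
    """Pick an intermediate waypoint (subgoal) toward the final goal."""
    return min(state + 5, goal)

def low_level_policy(state, subgoal):
    """Primitive action: move one step toward the current subgoal."""
    return 1 if subgoal > state else -1

state, goal, steps = 0, 12, 0
while state != goal:
    subgoal = high_level_policy(state, goal)   # slow-timescale decision
    while state != subgoal:                    # fast-timescale control
        state += low_level_policy(state, subgoal)
        steps += 1
print(f"reached goal in {steps} primitive steps")  # 12
```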

hierarchical sampling, 3d vision

Sample points coarsely along each ray first, then concentrate fine samples where the coarse pass found high density.
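
A sketch of the coarse-to-fine idea as used in NeRF-style renderers: importance-sample fine points by inverting the CDF built from the coarse weights (the toy density below is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def fine_samples(t_coarse, weights, n_fine):
    """Importance-sample fine points along a ray from coarse weights
    via inverse-CDF sampling."""
    pdf = weights / weights.sum()
    cdf = np.cumsum(pdf)
    u = rng.uniform(size=n_fine)       # uniform draws in [0, 1)
    return np.interp(u, cdf, t_coarse) # invert the CDF

t_coarse = np.linspace(0.0, 1.0, 16)                # uniform coarse samples
weights = np.exp(-((t_coarse - 0.7) ** 2) / 0.01)   # density peak near t = 0.7
print(fine_samples(t_coarse, weights, 8))           # fine samples cluster near 0.7
```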

hifi-gan, audio & speech

HiFi-GAN is a generative adversarial network vocoder that efficiently synthesizes high-fidelity audio using multi-period and multi-scale discriminators.

high availability (ha),high availability,ha,reliability

System remains operational despite failures.

high bandwidth memory advanced, hbm, advanced packaging

Stacked DRAM with wide interface.

high dimensional optimization, bayesian optimization, gaussian process, response surface, doe, design of experiments, pareto optimization, robust optimization, surrogate modeling, tcad, run to run control

# Semiconductor Manufacturing Process Recipe Optimization: Mathematical Modeling

## 1. Problem Context

A semiconductor **recipe** is a vector of controllable parameters:

$$
\mathbf{x} = \begin{bmatrix} T \\ P \\ Q_1 \\ Q_2 \\ \vdots \\ t \\ P_{\text{RF}} \end{bmatrix} \in \mathbb{R}^n
$$

Where:

- $T$ = Temperature (°C or K)
- $P$ = Pressure (mTorr or Pa)
- $Q_i$ = Gas flow rates (sccm)
- $t$ = Process time (seconds)
- $P_{\text{RF}}$ = RF power (Watts)

**Goal**: Find optimal $\mathbf{x}$ such that output properties $\mathbf{y}$ meet specifications while accounting for variability.

## 2. Mathematical Modeling Approaches

### 2.1 Physics-Based (First-Principles) Models

#### Chemical Vapor Deposition (CVD) Example

**Mass transport and reaction equation:**

$$
\frac{\partial C}{\partial t} + \nabla \cdot (\mathbf{u}C) = D\nabla^2 C + R(C, T)
$$

Where:

- $C$ = Species concentration
- $\mathbf{u}$ = Velocity field
- $D$ = Diffusion coefficient
- $R(C, T)$ = Reaction rate

**Surface reaction kinetics (Arrhenius form):**

$$
k_s = A \exp\left(-\frac{E_a}{RT}\right)
$$

Where:

- $A$ = Pre-exponential factor
- $E_a$ = Activation energy
- $R$ = Gas constant
- $T$ = Temperature

**Deposition rate (transport-limited regime):**

$$
r = \frac{k_s C_s}{1 + \frac{k_s}{h_g}}
$$

Where:

- $C_s$ = Surface concentration
- $h_g$ = Gas-phase mass transfer coefficient

**Characteristics:**

- **Advantages**: Extrapolates outside training data, physically interpretable
- **Disadvantages**: Computationally expensive, requires detailed mechanism knowledge

### 2.2 Empirical/Statistical Models (Response Surface Methodology)

**Second-order polynomial model:**

$$
y = \beta_0 + \sum_{i=1}^{n}\beta_i x_i + \sum_{i=1}^{n}\beta_{ii}x_i^2 + \sum_{i<j}\beta_{ij}x_i x_j + \varepsilon
$$

**Common modeling challenges and suitable methods:**

| Challenge | Suitable Methods |
|-----------|------------------|
| High dimensionality ($n > 50$ parameters) | PCA, PLS, sparse regression (LASSO), feature selection |
| Small datasets (limited wafer runs) | Bayesian methods, transfer learning, multi-fidelity modeling |
| Nonlinearity | GPs, neural networks, tree ensembles (RF, XGBoost) |
| Equipment-to-equipment variation | Mixed-effects models, hierarchical Bayesian models |
| Drift over time | Adaptive/recursive estimation, change-point detection, Kalman filtering |
| Multiple correlated responses | Multi-task learning, co-kriging, multivariate GP |
| Missing data | EM algorithm, multiple imputation, probabilistic PCA |

## 6. Dimensionality Reduction

### 6.1 Principal Component Analysis (PCA)

**Objective:**

$$
\max_{\mathbf{w}} \quad \mathbf{w}^T\mathbf{S}\mathbf{w} \quad \text{s.t.} \quad \|\mathbf{w}\|_2 = 1
$$

Where $\mathbf{S}$ is the sample covariance matrix.

**Solution:** Eigenvectors of $\mathbf{S}$

$$
\mathbf{S} = \mathbf{W}\boldsymbol{\Lambda}\mathbf{W}^T
$$

**Reduced representation:**

$$
\mathbf{z} = \mathbf{W}_k^T(\mathbf{x} - \bar{\mathbf{x}})
$$

Where $\mathbf{W}_k$ contains the top $k$ eigenvectors.

### 6.2 Partial Least Squares (PLS)

**Objective:** Maximize covariance between $\mathbf{X}$ and $\mathbf{Y}$

$$
\max_{\mathbf{w}, \mathbf{c}} \quad \text{Cov}(\mathbf{Xw}, \mathbf{Yc}) \quad \text{s.t.} \quad \|\mathbf{w}\|=\|\mathbf{c}\|=1
$$

## 7. Multi-Fidelity Optimization

**Combine cheap simulations with expensive experiments.**

**Auto-regressive model (Kennedy–O'Hagan):**

$$
y_{\text{HF}}(\mathbf{x}) = \rho \cdot y_{\text{LF}}(\mathbf{x}) + \delta(\mathbf{x})
$$

Where:

- $y_{\text{HF}}$ = High-fidelity (experimental) response
- $y_{\text{LF}}$ = Low-fidelity (simulation) response
- $\rho$ = Scaling factor
- $\delta(\mathbf{x}) \sim \mathcal{GP}$ = Discrepancy function

**Multi-fidelity GP:**

$$
\begin{bmatrix} \mathbf{y}_{\text{LF}} \\ \mathbf{y}_{\text{HF}} \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} \mathbf{K}_{\text{LL}} & \rho\mathbf{K}_{\text{LH}} \\ \rho\mathbf{K}_{\text{HL}} & \rho^2\mathbf{K}_{\text{LL}} + \mathbf{K}_{\delta} \end{bmatrix}\right)
$$

## 8. Transfer Learning

**Domain adaptation for tool-to-tool transfer:**

$$
y_{\text{target}}(\mathbf{x}) = y_{\text{source}}(\mathbf{x}) + \Delta(\mathbf{x})
$$

**Offset model (simple):**

$$
\Delta(\mathbf{x}) = c_0 \quad \text{(constant offset)}
$$

**Linear adaptation:**

$$
\Delta(\mathbf{x}) = \mathbf{c}^T\mathbf{x} + c_0
$$

**GP adaptation:**

$$
\Delta(\mathbf{x}) \sim \mathcal{GP}(0, k_\Delta)
$$

## 9. Complete Optimization Framework

```
RECIPE PARAMETERS                PROCESS MODEL              OUTPUT PROPERTIES
-----------------                -------------              -----------------
x₁: Temperature (°C)   ───►  ┌───────────────┐  ───►  y₁: Thickness (nm)
x₂: Pressure (mTorr)   ───►  │               │  ───►  y₂: Uniformity (%)
x₃: Gas flow 1 (sccm)  ───►  │   y = f(x;θ)  │  ───►  y₃: CD (nm)
x₄: Gas flow 2 (sccm)  ───►  │      + ε      │  ───►  y₄: Defects (#/cm²)
x₅: RF power (W)       ───►  │               │
x₆: Time (s)           ───►  └───────────────┘
                                     ▲
                                     │
                              Uncertainty ξ

OPTIMIZATION PROBLEM:

    min  Σⱼ wⱼ(E[yⱼ] - yⱼ,target)² + λ·Var[y]
     x

    subject to:
        y_L ≤ E[y] ≤ y_U          (specification limits)
        Pr(y ∈ spec) ≥ 0.9973     (Cpk ≥ 1.0)
        x_L ≤ x ≤ x_U             (equipment limits)
        g(x) ≤ 0                  (process constraints)
```

## 10. Key Equations Summary

### Process Modeling

| Model Type | Equation |
|:-----------|:---------|
| Linear regression | $y = \mathbf{X}\boldsymbol{\beta} + \varepsilon$ |
| Quadratic RSM | $y = \beta_0 + \sum_i \beta_i x_i + \sum_i \beta_{ii}x_i^2 + \sum_{i<j} \beta_{ij}x_i x_j$ |
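
As a complement to the framework above, here is a minimal Bayesian-optimization loop (GP surrogate plus expected improvement) on a hypothetical 1-D stand-in for the process response $y = f(\mathbf{x}) + \varepsilon$; the name `run_experiment` and its toy optimum near 0.6 are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def run_experiment(x):
    # Hypothetical noisy process response with its optimum near x = 0.6;
    # in practice x would be the full normalized recipe vector.
    return -(x - 0.6) ** 2 + rng.normal(scale=0.01)

X = np.array([[0.1], [0.5], [0.9]])              # initial wafer runs
y = np.array([run_experiment(x[0]) for x in X])

for _ in range(5):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    grid = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
    mu, sigma = gp.predict(grid, return_std=True)
    # Expected improvement over the current best observation (maximization)
    best = y.max()
    z = (mu - best) / (sigma + 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = grid[np.argmax(ei)]                 # most promising next recipe
    X = np.vstack([X, x_next])
    y = np.append(y, run_experiment(x_next[0]))

print("best recipe setting:", X[np.argmax(y)])   # converges near 0.6
```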

high temperature, text generation

Sampling temperature above 1, which flattens the next-token distribution and makes generation more random and diverse.
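
A small numerical illustration of logit scaling by temperature (the example logits are arbitrary):

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    """Scale logits by 1/T before softmax; T > 1 flattens the distribution
    (more random), T < 1 sharpens it (more deterministic)."""
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

logits = np.array([2.0, 1.0, 0.1])
rng = np.random.default_rng(0)
for T in (0.5, 1.0, 2.0):
    draws = [sample_with_temperature(logits, T, rng) for _ in range(1000)]
    print(T, np.bincount(draws, minlength=3) / 1000)  # flatter as T grows
```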

high vacuum pump, manufacturing operations

High vacuum pumps achieve extremely low pressures for critical processes.

high-angle annular dark field, haadf, metrology

Z-contrast imaging in STEM.

high-angle grain boundary, defects

Grain boundary with large misorientation between adjacent grains, typically greater than about 15°.

high-aspect-ratio mol, process integration

High-aspect-ratio contacts and vias in scaled nodes challenge gap fill and reliability, requiring advanced processes.

high-k dielectric,technology

Dielectric with a dielectric constant higher than that of silicon dioxide (k ≈ 3.9), enabling a thicker physical film at the same capacitance.
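
The standard capacitance and equivalent-oxide-thickness relations show why a higher-k film can be physically thicker at the same gate capacitance:

```latex
% Gate capacitance per unit area and equivalent oxide thickness (EOT).
% Example: a k = 20 film can be about 5x thicker than SiO2 (k = 3.9) at
% the same capacitance, which suppresses gate tunneling leakage.
C = \frac{k\,\varepsilon_0}{t_{\mathrm{diel}}},
\qquad
\mathrm{EOT} = t_{\mathrm{high\text{-}k}} \cdot \frac{k_{\mathrm{SiO_2}}}{k_{\mathrm{high\text{-}k}}}
```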

high-k first, process integration

Deposit high-k before poly gate.

high-k last, process integration

Deposit high-k after removing poly.

high-k metal gate (hkmg),high-k metal gate,hkmg,technology

Gate stack combining a high-dielectric-constant oxide with a metal gate for better performance and lower leakage.

high-k metal gate, process integration

High-k metal gate technology replaces silicon dioxide and polysilicon with high-dielectric-constant materials and metal electrodes, reducing leakage.

high-na euv,lithography

Higher numerical aperture for better resolution.

high-order overlay, metrology

Overlay correction beyond a simple X-Y shift, including higher-order terms such as rotation and scaling.

high-power probe, advanced test & probe

High-power probing tests devices at elevated current and voltage levels, requiring specialized probe tips and thermal management.

high-resolution fine-tuning, computer vision

Adapt to higher resolution images.