Generalized Eigenvalue beamforming estimates spatial filters from speech and noise covariance matrices.
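A minimal sketch (not a reference implementation) of how the GEV / maximum-SNR weights for one frequency bin can be obtained as the principal generalized eigenvector of the speech and noise covariance matrices; SciPy's `eigh` is assumed and the variable names are illustrative. In practice the eigenvector's arbitrary scaling is usually corrected afterwards, e.g. with blind analytic normalization.

```python
import numpy as np
from scipy.linalg import eigh

def gev_weights(phi_speech, phi_noise):
    """Return GEV / max-SNR beamforming weights for one frequency bin.

    phi_speech, phi_noise: (M, M) Hermitian spatial covariance matrices,
    estimated from speech-dominant and noise-dominant time-frequency bins.
    """
    # Solve the generalized eigenproblem  phi_speech w = lambda * phi_noise w.
    # Eigenvalues are returned in ascending order, so the last vector maximizes SNR.
    eigvals, eigvecs = eigh(phi_speech, phi_noise)
    return eigvecs[:, -1]

# Illustrative example: 4-microphone array with placeholder (random) covariances.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
N = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
phi_s = A @ A.conj().T               # placeholder speech covariance (Hermitian PSD)
phi_n = N @ N.conj().T + np.eye(4)   # placeholder noise covariance (regularized, PD)
w = gev_weights(phi_s, phi_n)        # applied per bin as  y = w.conj() @ x
```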
GAN for face restoration.
GGML is a C library for tensor operations. CPU inference. Foundation of llama.cpp.
GGUF is a file format for llama.cpp quantized models. Various quantization levels (Q4, Q5, Q8). CPU inference.
Generate redundant features cheaply.
Ghost modules generate redundant features through cheap linear operations, reducing computation.
Python's GIL prevents true thread-level parallelism for CPU-bound code. Use multiprocessing, async I/O, or native extensions for performance.
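A small illustrative benchmark of that advice (the workload and pool sizes are arbitrary): CPU-bound pure-Python work does not speed up with threads because of the GIL, while a process pool runs it in separate interpreters in parallel.

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_bound(n):
    # Pure-Python arithmetic holds the GIL the whole time
    return sum(i * i for i in range(n))

def timed(executor_cls):
    start = time.perf_counter()
    with executor_cls(max_workers=4) as pool:
        list(pool.map(cpu_bound, [2_000_000] * 4))
    return time.perf_counter() - start

if __name__ == "__main__":
    # Threads serialize on the GIL; processes sidestep it with separate interpreters
    print(f"threads:   {timed(ThreadPoolExecutor):.2f}s")
    print(f"processes: {timed(ProcessPoolExecutor):.2f}s")
```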
# Graph Isomorphism Network (GIN) in Graph Neural Networks

## Overview

The **Graph Isomorphism Network (GIN)** is a graph neural network architecture introduced by Xu et al. in their seminal 2019 paper *"How Powerful are Graph Neural Networks?"*. GIN was specifically designed to maximize the expressive power of message-passing neural networks.

### Key Contributions

- Established a theoretical framework connecting GNN expressiveness to the Weisfeiler-Lehman (WL) test
- Proved that standard GNNs are at most as powerful as the 1-WL test
- Designed GIN to achieve the maximum possible expressiveness for message-passing GNNs
- Demonstrated that aggregation function choice fundamentally limits GNN power

## Theoretical Foundation

### The Weisfeiler-Lehman Test

The **Weisfeiler-Lehman (WL) graph isomorphism test** is a classical algorithm for determining whether two graphs are structurally identical (isomorphic).

#### 1-WL Algorithm Steps

1. **Initialize**: Assign each node an initial label (typically based on node features or degree)
2. **Aggregate**: For each node, collect the multiset of neighbor labels
3. **Hash**: Create a new label by hashing the node's current label with the aggregated neighbor information
4. **Iterate**: Repeat steps 2-3 until labels stabilize
5. **Compare**: Two graphs are potentially isomorphic if they have identical label histograms

#### Mathematical Representation

For node $v$ at iteration $k$:

$$
c^{(k)}(v) = \text{HASH}\left( c^{(k-1)}(v), \{\!\!\{ c^{(k-1)}(u) : u \in \mathcal{N}(v) \}\!\!\} \right)
$$

Where:

- $c^{(k)}(v)$ is the label of node $v$ at iteration $k$
- $\mathcal{N}(v)$ denotes the neighborhood of node $v$
- $\{\!\!\{ \cdot \}\!\!\}$ denotes a multiset (bag)
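To make the refinement procedure above concrete, here is a minimal Python sketch of 1-WL color refinement on a graph stored as an adjacency dictionary (the representation, the use of Python's built-in `hash`, and the `wl_label_histogram` name are illustrative choices, not a reference implementation). It is applied to the two 6-node graphs discussed later in this entry (two triangles joined by an edge vs. a hexagon with a chord), which 1-WL cannot separate.

```python
from collections import Counter

def wl_label_histogram(adj, num_iters=3):
    """Run 1-WL color refinement on a graph given as {node: [neighbors]}.

    Returns the histogram of final node labels; two graphs with different
    histograms are guaranteed to be non-isomorphic.
    """
    # Step 1: initialize every node's label with its degree
    labels = {v: str(len(nbrs)) for v, nbrs in adj.items()}
    for _ in range(num_iters):
        new_labels = {}
        for v, nbrs in adj.items():
            # Step 2: collect the multiset of neighbor labels
            neighbor_multiset = tuple(sorted(labels[u] for u in nbrs))
            # Step 3: "hash" (own label, neighbor multiset) into a new label
            new_labels[v] = str(hash((labels[v], neighbor_multiset)))
        labels = new_labels  # Step 4: iterate
    # Step 5: compare graphs by their label histograms
    return Counter(labels.values())

# Two triangles joined by an edge vs. a hexagon with a chord between opposite nodes
g1 = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
g2 = {0: [1, 5, 3], 1: [0, 2], 2: [1, 3], 3: [2, 4, 0], 4: [3, 5], 5: [4, 0]}
print(wl_label_histogram(g1) == wl_label_histogram(g2))  # True: 1-WL cannot tell them apart
```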
## The GIN Architecture

### Core Insight

The authors proved that for a GNN to be maximally powerful (i.e., as powerful as the 1-WL test), its aggregation function must be **injective** over multisets.

### Design Principles

- **Injective aggregation**: The function must map different multisets to different representations
- **Sum aggregation**: Chosen because it preserves multiset information completely
- **MLP transformation**: Provides universal approximation capability
- **Learnable center weighting**: The $\epsilon$ parameter distinguishes the center node from its neighbors

## Mathematical Formulation

### GIN Update Rule

The GIN layer updates node representations as follows:

$$
h_v^{(k)} = \text{MLP}^{(k)}\left( \left(1 + \epsilon^{(k)}\right) \cdot h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)} \right)
$$

Where:

- $h_v^{(k)}$ is the feature vector of node $v$ at layer $k$
- $h_v^{(k-1)}$ is the feature vector from the previous layer
- $\epsilon^{(k)}$ is a learnable parameter (or fixed scalar)
- $\mathcal{N}(v)$ is the set of neighbors of node $v$
- $\text{MLP}^{(k)}$ is a multi-layer perceptron at layer $k$

### Expanded Form

Breaking down the computation:

$$
h_v^{(k)} = \text{MLP}^{(k)}\left( (1 + \epsilon^{(k)}) \cdot h_v^{(k-1)} + \text{AGGREGATE}\left( \{ h_u^{(k-1)} : u \in \mathcal{N}(v) \} \right) \right)
$$

### Graph-Level Readout

For graph classification, GIN uses a readout function combining features from all layers:

$$
h_G = \text{CONCAT}\left( \text{READOUT}\left( \{ h_v^{(k)} : v \in G \} \right) \, \Big| \, k = 0, 1, \ldots, K \right)
$$

Common readout functions:

- **Sum**: $\text{READOUT}(\{h_v\}) = \sum_{v \in G} h_v$
- **Mean**: $\text{READOUT}(\{h_v\}) = \frac{1}{|G|} \sum_{v \in G} h_v$
- **Max**: $\text{READOUT}(\{h_v\}) = \max_{v \in G} h_v$

## Aggregation Function Analysis

### Why Sum Aggregation?

The choice of aggregation function is critical for GNN expressiveness. The key requirement is **injectivity over multisets**.

### Comparison of Aggregation Functions

| Aggregator | Formula | Injectivity | Information Loss |
|------------|---------|-------------|------------------|
| **Sum** | $\sum_{u \in \mathcal{N}(v)} h_u$ | ✅ Injective | None |
| **Mean** | $\frac{1}{\deg(v)} \sum_{u \in \mathcal{N}(v)} h_u$ | ❌ Not injective | Count information |
| **Max** | $\max_{u \in \mathcal{N}(v)} h_u$ | ❌ Not injective | Multiplicity |

**Formal mathematical notation:**

- **Sum**: $$\text{AGG}_{\text{sum}} = \sum_{u \in \mathcal{N}(v)} h_u$$
- **Mean**: $$\text{AGG}_{\text{mean}} = \frac{1}{|\mathcal{N}(v)|} \sum_{u \in \mathcal{N}(v)} h_u$$
- **Max**: $$\text{AGG}_{\text{max}} = \max_{u \in \mathcal{N}(v)} h_u$$

### Concrete Examples

#### Mean Aggregation Failure

Mean cannot distinguish these multisets:

$$\text{mean}(\{1, 1, 1\}) = 1 = \text{mean}(\{1\})$$

$$\text{mean}(\{1, 2, 3\}) = 2 = \text{mean}(\{2, 2, 2\})$$

#### Max Aggregation Failure

Max cannot distinguish these multisets:

$$\max(\{1, 2, 2, 2\}) = 2 = \max(\{1, 2\})$$

$$\max(\{5, 5, 5, 5\}) = 5 = \max(\{5\})$$

#### Sum Preserves Information

Sum distinguishes multisets that differ in multiplicity:

$$
\text{sum}(\{1, 1, 1\}) = 3 \neq \text{sum}(\{1\}) = 1
$$

On raw scalar labels the sum can still collide, e.g. $\text{sum}(\{1, 2, 3\}) = 6 = \text{sum}(\{2, 2, 2\})$; injectivity therefore requires first mapping each label to a distinct feature vector (e.g. a one-hot encoding), which is the role of $\phi$ in the theorem below.

### Theorem: Sum-Based Aggregation

**Theorem (Xu et al., 2019)**: There exists a mapping $\phi$ such that the function

$$
f\left( c, X \right) = (1 + \epsilon) \cdot \phi(c) + \sum_{x \in X} \phi(x)
$$

is injective over pairs $(c, X)$, where $c$ is a center element and $X$ is a countable multiset; with sufficient capacity, the MLP in GIN can model the required $\phi$ and the subsequent transformation.
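As a quick numeric check of the failure cases above, the sketch below (assuming NumPy; the `agg` helper is illustrative) compares the three aggregators on one-hot encoded labels, which is how discrete node labels typically enter GIN. Note that on one-hot features the sum also separates $\{1, 2, 3\}$ from $\{2, 2, 2\}$, which collide as raw scalars.

```python
import numpy as np

# One-hot features for three possible node labels {1, 2, 3}
feat = {1: np.array([1., 0., 0.]),
        2: np.array([0., 1., 0.]),
        3: np.array([0., 0., 1.])}

def agg(multiset, how):
    X = np.stack([feat[x] for x in multiset])
    return {"sum": X.sum(0), "mean": X.mean(0), "max": X.max(0)}[how]

# Mean collapses multiplicity: {1,1,1} and {1} look identical
print(np.allclose(agg([1, 1, 1], "mean"), agg([1], "mean")))       # True
# Max discards counts: {1,2,2,2} and {1,2} look identical
print(np.allclose(agg([1, 2, 2, 2], "max"), agg([1, 2], "max")))   # True
# Sum keeps per-label counts, so these multisets stay distinct
print(np.allclose(agg([1, 2, 3], "sum"), agg([2, 2, 2], "sum")))   # False
```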
## Expressiveness and Limitations

### What GIN Can Distinguish

GIN (matching 1-WL) can distinguish:

- Graphs with different node counts
- Graphs with different edge counts
- Graphs with different degree distributions
- Most random graphs
- Trees with different structures

### What GIN Cannot Distinguish

GIN (and 1-WL) fails on:

- **Regular graphs**: Cannot distinguish some $k$-regular graphs
- **Symmetric structures**: Certain pairs of non-isomorphic graphs with high symmetry

#### Classic 1-WL Failure Example

The following pair of non-isomorphic graphs cannot be distinguished by 1-WL:

**Graph 1**: Two triangles connected by an edge

```
A - - - B       E - - - F
 \     /         \     /
  C - - - - - - - - - D
```

Edges: A-B, A-C, B-C, C-D, D-E, D-F, E-F.

**Graph 2**: A hexagon with a chord between two opposite nodes

```
  A - - - B
 /         \
F           C
 \         /
  E - - - D
```

plus the chord A-D. Edges: A-B, B-C, C-D, D-E, E-F, F-A, A-D.

Both have:

- 6 nodes, all of degree 2 or 3
- The same multiset of neighbor degree sequences

### Higher-Order Extensions

To overcome 1-WL limitations:

| Method | Power | Complexity |
|--------|-------|------------|
| 1-WL / GIN | Baseline | O(n × d) per layer |
| 2-WL | Strictly stronger | O(n²) |
| k-WL | Increasing with k | O(nᵏ) |
| k-FWL (Folklore) | Hierarchy | O(nᵏ) |

## Implementation Details

### GIN Layer (PyTorch)

A usage sketch that stacks this layer into a small classifier appears after the variants below.

```python
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    def __init__(self, input_dim, hidden_dim, epsilon=0.0):
        super().__init__()
        # Two-layer MLP, as recommended below
        self.mlp = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # Learnable epsilon (GIN-epsilon); fix it to 0 for GIN-0
        self.epsilon = nn.Parameter(torch.tensor(epsilon))

    def forward(self, h, adjacency):
        # h: node features [N, D]; adjacency: dense adjacency matrix [N, N]
        # Aggregate neighbor features (sum)
        neighbor_sum = adjacency @ h  # [N, D]
        # Combine with the (1 + epsilon)-weighted center node
        combined = (1 + self.epsilon) * h + neighbor_sum
        # Apply MLP
        return self.mlp(combined)
```

### MLP Architecture Recommendations

The MLP in GIN typically consists of:

$$
\text{MLP}(x) = W_2 \cdot \sigma(W_1 \cdot x + b_1) + b_2
$$

Where:

- $\sigma$ is a non-linear activation (ReLU, LeakyReLU)
- At least 2 layers are recommended
- Batch normalization often improves training

### Hyperparameter Guidelines

| Hyperparameter | Typical Values | Notes |
|----------------|----------------|-------|
| Number of layers $K$ | 3-5 | More layers = larger receptive field |
| Hidden dimension | 64-256 | Task-dependent |
| $\epsilon$ | Learnable or 0 | Learnable often works better |
| Dropout | 0.0-0.5 | Regularization |
| Learning rate | 0.001-0.01 | With Adam optimizer |

## Applications

### Molecular Property Prediction

GIN excels at:

- Drug discovery (molecular classification)
- Toxicity prediction
- Solubility estimation

### Graph Classification Benchmarks

Performance on standard datasets:

| Dataset | Type | GIN Accuracy |
|---------|------|--------------|
| MUTAG | Molecules | ~89% |
| PTC | Molecules | ~64% |
| PROTEINS | Bioinformatics | ~76% |
| IMDB-BINARY | Social | ~75% |
| COLLAB | Social | ~80% |

### Other Applications

- Social network analysis
- Knowledge graph reasoning
- Point cloud processing
- Program analysis

## Variants and Extensions

### GIN-ε (Learnable Epsilon)

$$
h_v^{(k)} = \text{MLP}^{(k)}\left( (1 + \epsilon^{(k)}) \cdot h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)} \right)
$$

Where $\epsilon^{(k)}$ is learned via backpropagation.

### GIN-0 (Fixed Epsilon)

$$
h_v^{(k)} = \text{MLP}^{(k)}\left( h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)} \right)
$$

Setting $\epsilon = 0$ simplifies the architecture and performs comparably in practice.
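Putting the pieces together, here is a usage sketch that stacks the `GINLayer` defined in the implementation section above into a small graph classifier with a sum readout. For brevity it reads out only the final layer rather than concatenating all layers as described earlier; the dimensions, the toy adjacency matrix, and the `GINGraphClassifier` name are illustrative.

```python
import torch
import torch.nn as nn

class GINGraphClassifier(nn.Module):
    """Stacks GIN layers and applies a sum readout over nodes (illustrative sketch).

    Assumes the GINLayer class defined in the implementation section above.
    """
    def __init__(self, input_dim, hidden_dim, num_classes, num_layers=3):
        super().__init__()
        dims = [input_dim] + [hidden_dim] * num_layers
        self.layers = nn.ModuleList(
            GINLayer(dims[i], dims[i + 1]) for i in range(num_layers)
        )
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, h, adjacency):
        for layer in self.layers:
            h = layer(h, adjacency)
        h_graph = h.sum(dim=0)          # sum readout over all nodes
        return self.classifier(h_graph)

# Toy graph: a triangle (3 nodes) with 5-dimensional random node features
adjacency = torch.tensor([[0., 1., 1.],
                          [1., 0., 1.],
                          [1., 1., 0.]])
h = torch.randn(3, 5)
model = GINGraphClassifier(input_dim=5, hidden_dim=16, num_classes=2)
logits = model(h, adjacency)            # shape [num_classes]
```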
### Edge-Featured GIN

For graphs with edge features $e_{uv}$ (a short code sketch of this update follows at the end of this entry):

$$
h_v^{(k)} = \text{MLP}^{(k)}\left( (1 + \epsilon) \cdot h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} \text{ReLU}\left(h_u^{(k-1)} + e_{uv}\right) \right)
$$

## Summary

### Key Takeaways

- GIN achieves maximum expressiveness among message-passing GNNs
- Sum aggregation is crucial for injectivity
- The MLP provides universal approximation capability
- The $\epsilon$ parameter helps distinguish center nodes from neighbors
- GIN matches the power of the 1-WL graph isomorphism test

### When to Use GIN

**Recommended for:**

- Tasks requiring fine-grained structural discrimination
- Graph classification problems
- Molecular property prediction
- When theoretical guarantees matter

**Consider alternatives when:**

- Node features dominate structure
- Computational efficiency is critical
- Graph size varies significantly (may need normalization)

## Mathematical Notation

| Symbol | Meaning |
|--------|---------|
| $G = (V, E)$ | Graph with vertices $V$ and edges $E$ |
| $\mathcal{N}(v)$ | Neighborhood of node $v$ |
| $h_v^{(k)}$ | Feature vector of node $v$ at layer $k$ |
| $\epsilon^{(k)}$ | Learnable/fixed scalar at layer $k$ |
| $\{\!\!\{ \cdot \}\!\!\}$ | Multiset notation |
| $\text{MLP}^{(k)}$ | Multi-layer perceptron at layer $k$ |
| $\sigma(\cdot)$ | Non-linear activation function |
| $\oplus$ | Concatenation operation |
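Finally, the promised sketch of the edge-featured update above, a variant often called GINE (cf. `GINEConv` in PyTorch Geometric, following Hu et al., 2020). The dense `[N, N]` adjacency and `[N, N, D]` edge-feature layout is an illustrative simplification, not the only way to implement it.

```python
import torch
import torch.nn as nn

class EdgeGINLayer(nn.Module):
    """Edge-featured GIN update with a dense adjacency matrix (illustrative sketch)."""
    def __init__(self, dim, epsilon=0.0):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.epsilon = nn.Parameter(torch.tensor(epsilon))

    def forward(self, h, adjacency, edge_feat):
        # h: [N, D]; adjacency: [N, N]; edge_feat[v, u]: feature e_uv, shape [N, N, D]
        # Message from u to v is ReLU(h_u + e_uv), masked by the adjacency matrix
        messages = torch.relu(h.unsqueeze(0) + edge_feat)                # [N, N, D]
        neighbor_sum = (adjacency.unsqueeze(-1) * messages).sum(dim=1)   # [N, D]
        # Combine with the (1 + epsilon)-weighted center node and apply the MLP
        return self.mlp((1 + self.epsilon) * h + neighbor_sum)
```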
Giskard tests ML models for quality and bias. Automated test generation.
Generate git commit messages from diffs. Descriptive, conventional.
Version large files in Git.
Ask about Git and I will explain branching, merging, rebasing, pull requests, and practical workflows for solo or team dev.
Git tracks code versions. Branches, commits, merges.
GitHub Actions runs workflows. Build, test, deploy on push.
AI pair programmer that suggests code completions.
GitHub hosts git repositories. Collaboration, CI/CD, issues.
GitLab provides DevOps platform. Self-host option.
Sparse MoE language model from Google.
Predict glass-forming ability.
Unified grounding and detection.
Global-to-Local Neural Architecture Search discovers hierarchical vision transformer architectures efficiently.
Large and small crops.
Total batch across all devices.
Efficient global context modeling.
Flatness across entire wafer.
Main GPU memory.
Global pooling aggregates all node features into graph-level representations using operations like sum, mean, or attention-weighted averaging.
Global variation affects the entire die uniformly, arising from wafer-level process differences.
Hybrid timing approach.
PyTorch's collective communication library.
I can act as a living glossary: define terms, connect them to each other, and give concrete examples.
Glove boxes provide isolated atmospheres for handling sensitive materials.
Bulk impurity analysis.
Glow-TTS combines normalizing flows with monotonic alignment for fast high-quality speech synthesis.
Different gated activation functions.
Gating mechanism for sequences.
Suite of language understanding tasks.
Standard language understanding evaluation.
General Language Understanding Evaluation tests models on diverse NLP tasks.
Generalized Matrix Factorization applies an element-wise product followed by neural layers, generalizing linear matrix factorization models.
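A minimal PyTorch-style sketch of that idea as used in neural collaborative filtering (embedding size, names, and the sigmoid output are illustrative): the element-wise product of user and item embeddings feeds a learned output layer, which reduces to plain matrix factorization when that layer's weights are fixed to ones.

```python
import torch
import torch.nn as nn

class GMF(nn.Module):
    def __init__(self, num_users, num_items, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, dim)
        self.item_emb = nn.Embedding(num_items, dim)
        # Learned weighting of latent dimensions; fixing it to all-ones
        # recovers a plain dot-product matrix factorization.
        self.out = nn.Linear(dim, 1)

    def forward(self, user_ids, item_ids):
        # Element-wise product of user and item embeddings
        interaction = self.user_emb(user_ids) * self.item_emb(item_ids)
        return torch.sigmoid(self.out(interaction)).squeeze(-1)

model = GMF(num_users=1000, num_items=500)
scores = model(torch.tensor([0, 1]), torch.tensor([10, 20]))  # predicted interaction scores
```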
MLP with gating for language.
Gated MLP for images.
Graph Multiset Transformer uses multiset attention and virtual nodes for expressive graph-level representation learning.
GNN expressiveness theory studies which graph properties can be distinguished by different message passing architectures.
Go-Explore maintains an archive of interesting states and returns to them, enabling exploration of sparse-reward environments.
Goal achievement detection recognizes when objectives are satisfied, enabling termination.
Goal stacks organize objectives hierarchically, tracking completion dependencies.
Learn to reach specified goals.
Goal-conditioned reinforcement learning trains agents to reach diverse goals specified as part of state representation.
GOAT is an arithmetic-fine-tuned LLM. Near-perfect on many arithmetic tasks.