Equivariant Neural Networks are architectures that guarantee that when the input is transformed by a group operation $g$ (rotation, translation, reflection, permutation), the internal features and outputs transform by the same operation or by a well-defined representation of it. By encoding the mathematical structure of symmetry groups directly into the network's computation, they ensure that learned representations respect the geometric structure of the data domain without requiring data augmentation or hoping the model discovers symmetry from examples.
What Are Equivariant Neural Networks?
- Definition: A neural network layer $f$ is equivariant to a group $G$ if for every group element $g \in G$ and input $x$: $f(\rho_{in}(g) \cdot x) = \rho_{out}(g) \cdot f(x)$, where $\rho_{in}$ and $\rho_{out}$ are the group representations acting on the input and output spaces respectively. This means applying a transformation before the layer produces the same result as applying the corresponding transformation after the layer.
- Group Convolution: Standard convolution is equivariant to translations — shifting the input shifts the feature map by the same amount. Equivariant neural networks generalize this to arbitrary groups by replacing standard convolution with group convolution, which also slides and rotates (or reflects, scales, etc.) the filter according to the symmetry group.
- Feature Types: Equivariant networks classify features by their transformation type under the group — scalar features (type-0, invariant), vector features (type-1, rotate with the input), matrix features (type-2, transform as tensors). Different feature types carry different geometric information and interact through Clebsch-Gordan-like tensor product operations.
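The defining equation $f(\rho_{in}(g) \cdot x) = \rho_{out}(g) \cdot f(x)$ can be checked numerically. Below is a minimal NumPy sketch (not from the source) using the permutation group: a toy layer combining a linear map with mean pooling, where both $\rho_{in}$ and $\rho_{out}$ act by permuting the rows of a point set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer: a linear map plus a mean-pooled "context" term.
# Acting on a set of n points (rows), it is permutation-equivariant:
# the mean is permutation-invariant, and the linear map acts row-wise.
W = rng.normal(size=(3, 3))

def layer(x):
    return x @ W + x.mean(axis=0, keepdims=True)

x = rng.normal(size=(5, 3))          # 5 points in R^3
P = np.eye(5)[rng.permutation(5)]    # rho(g): a random row permutation

lhs = layer(P @ x)                   # transform first, then apply the layer
rhs = P @ layer(x)                   # apply the layer, then transform
assert np.allclose(lhs, rhs)         # f(rho(g) x) == rho(g) f(x)
```

The same check pattern works for any layer and any group whose representation you can apply as a matrix.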
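The group-convolution idea can be sketched for the four-fold rotation subgroup of $p4$. A "lifting" correlation applies every rotated copy of the filter, producing one orientation channel per group element; rotating the input then rotates each feature map spatially *and* cyclically shifts the orientation channels. This toy illustration (my own, using `scipy.signal.correlate2d`) assumes a square image and filter:

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(1)

def lift_c4(image, psi):
    """Lifting correlation over C4: correlate the image with all four
    90-degree-rotated copies of the filter, one orientation channel each."""
    return np.stack([correlate2d(image, np.rot90(psi, k), mode="valid")
                     for k in range(4)])

image = rng.normal(size=(6, 6))
psi = rng.normal(size=(3, 3))

out = lift_c4(image, psi)                 # shape (4, 4, 4)
out_rot = lift_c4(np.rot90(image), psi)   # rotate the input first

# Equivariance: rotating the input rotates each channel spatially and
# cyclically shifts which orientation channel is which.
expected = np.stack([np.rot90(out[(k - 1) % 4]) for k in range(4)])
assert np.allclose(out_rot, expected)
```

This shift-and-rotate action on the orientation axis is exactly the "well-defined representation" $\rho_{out}$ from the definition above.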
Why Equivariant Neural Networks Matter
- Molecular Property Prediction: Molecular binding energy, protein docking affinity, and crystal formation energy must not change when the entire system is rotated or translated — these are SE(3)-invariant quantities. An SE(3)-equivariant network guarantees this invariance architecturally, while a standard MLP would need to learn it from data augmentation across all possible 3D orientations.
- Exact Symmetry: Data augmentation can only approximate symmetry — it samples a finite set of transformations during training and hopes generalization covers the rest. Equivariant networks enforce exact symmetry for every possible transformation in the group, including those never seen during training. For continuous groups like SO(3), this is the difference between sampling a handful of rotations and guaranteeing correctness for every one of the infinitely many rotations.
- Scientific Discovery: Equivariant networks are essential for scientific ML where the outputs must respect physical symmetries. Force predictions must be SE(3)-equivariant (forces rotate with the coordinate system), energy must be SE(3)-invariant (scalar under rotation), and stress must be SO(3)-equivariant (tensor transformation). The network architecture enforces these physical constraints.
- AlphaFold Connection: AlphaFold2's structure module uses an Invariant Point Attention mechanism that is SE(3)-equivariant with respect to the protein backbone frames, ensuring that the predicted 3D structure is independent of the arbitrary choice of global coordinate system.
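The SE(3)-invariance of energies is easy to demonstrate: any function built only from pairwise distances is unchanged by rotations and translations of the whole system. A minimal NumPy sketch (the "energy" here is a made-up sum of distances, standing in for a learned invariant readout):

```python
import numpy as np

rng = np.random.default_rng(2)

def energy(pos):
    """Toy 'energy' from pairwise distances only, hence SE(3)-invariant."""
    d = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
    return d[np.triu_indices(len(pos), k=1)].sum()

pos = rng.normal(size=(8, 3))        # 8 atoms in R^3

# Random rigid motion: a proper rotation Q plus a translation t.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
Q *= np.linalg.det(Q)                # flip sign if det = -1, so det(Q) = +1
t = rng.normal(size=3)

assert np.isclose(energy(pos), energy(pos @ Q.T + t))
```

Invariant architectures generalize this: every intermediate quantity the network computes is itself built from invariant scalars like these distances.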
Equivariant Architecture Families
| Architecture | Group | Domain |
|-------------|-------|--------|
| Standard CNN | $\mathbb{Z}^2$ (translation) | 2D image grids |
| Group CNN (Cohen & Welling) | $p4m$ (translation + rotation + flip) | 2D images needing orientation awareness |
| EGNN | $E(n)$ (Euclidean) | 3D molecular graphs |
| SE(3)-Transformers | $SE(3)$ (rotation + translation) | Protein structure, 3D point clouds |
| Tensor Field Networks | $SO(3)$ (rotation) | 3D scalar/vector/tensor field prediction |
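The EGNN row of the table hides a remarkably simple mechanism: coordinates are updated along difference vectors, scaled by a function of invariant squared distances. A stripped-down NumPy sketch of that idea (the scalar function here is a fixed stand-in for EGNN's learned edge MLP):

```python
import numpy as np

rng = np.random.default_rng(3)

def egnn_coord_update(pos):
    """One EGNN-style coordinate update: each point moves along the
    difference vectors to all others, weighted by a function of the
    invariant squared distance (a stand-in for a learned MLP)."""
    diff = pos[:, None, :] - pos[None, :, :]    # (n, n, 3) difference vectors
    d2 = (diff ** 2).sum(-1, keepdims=True)     # invariant squared distances
    phi = 1.0 / (1.0 + d2)                      # toy edge weighting
    return pos + (diff * phi).sum(axis=1)       # diagonal diff is zero

pos = rng.normal(size=(6, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
Q *= np.linalg.det(Q)                           # proper rotation, det = +1
t = rng.normal(size=3)

# E(3)-equivariance: rotating + translating the input does the same
# to the output, with no constraint on the weighting function needed.
out = egnn_coord_update(pos)
out_transformed = egnn_coord_update(pos @ Q.T + t)
assert np.allclose(out_transformed, out @ Q.T + t)
```

Equivariance holds by construction: rotations pass through the difference vectors, translations cancel in them, and the weights depend only on invariants.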
Equivariant Neural Networks are geometry-locked computation — their internal state changes in exact lockstep with transformations of the external world, ensuring that the network's understanding of physics, chemistry, and geometry is independent of the arbitrary coordinate frame used to describe it.