Capsule Networks

Keywords: capsule network dynamic routing, capsule part whole relationship, equivariance capsule, em routing capsule, hinton capsule network

Capsule networks are an alternative neural architecture in which capsules (groups of neurons encoding pose and viewpoint parameters) are routed by dynamic agreement, capturing part-whole hierarchies and equivariance to transformations better than traditional convolutional networks.

Capsule Entity Representation:
- Capsule abstraction: a group of neurons represents an entity at a specific position/scale, and the group's vector encodes pose information (e.g., 8D primary capsules and 16D class capsules in the original CapsNet); see the reshaping sketch after this list
- Pose vector: contains position, size, orientation, and other transformation parameters for detected feature
- Equivariance property: when input transforms (rotation, translation), pose vectors transform correspondingly; not true for standard neurons
- Routing responsibility: capsule outputs routed to higher-level capsules based on agreement; mechanism for part-whole relationships
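
The grouping is easy to see in code. Below is a minimal NumPy sketch, with shapes borrowed from the PrimaryCaps layer of the original CapsNet, of reshaping a convolutional feature map into capsules; the random feature map is a placeholder for a real conv layer's output:

```python
import numpy as np

rng = np.random.default_rng(0)

caps_dim = 8                    # neurons per capsule: an 8D pose vector
n_types = 32                    # capsule types at each spatial position
# Placeholder for the output of a real convolutional layer.
feature_map = rng.normal(size=(6, 6, n_types * caps_dim))

# Group neurons into capsules: one pose vector per type per position.
capsules = feature_map.reshape(6 * 6 * n_types, caps_dim)
print(capsules.shape)           # (1152, 8): 1152 primary capsules
print(capsules[0])              # a single capsule's pose vector
```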

Dynamic Routing by Agreement:
- Routing algorithm: iterative procedure routes lower-level capsule outputs to higher-level capsules based on prediction agreement
- Coupling coefficients: soft routing weights computed by the routing procedure itself rather than learned by backpropagation; updated each iteration from agreement scores
- Routing iterations: typically 2-3 iterations; each iteration refines coupling coefficients to route to agreeing capsules
- Squashing activation: a non-linear squashing function preserves each output vector's direction while mapping its length into [0, 1), so length reads as the probability that the entity is present
- Prediction agreement: when a lower capsule's prediction matches an upper capsule's actual output, the coupling strength between them increases, routing toward agreeing capsules (a runnable sketch follows this list)
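
A minimal NumPy sketch of routing by agreement, following the procedure in Sabour et al. (2017); the toy shapes and random predictions are illustrative only:

```python
import numpy as np

def squash(v, axis=-1, eps=1e-9):
    """Non-linear squashing: preserves direction, maps length into [0, 1)."""
    sq_norm = np.sum(v ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * v / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iters=3):
    """Routing by agreement.

    u_hat: predictions from lower capsules for each upper capsule,
           shape (n_lower, n_upper, dim_upper).
    """
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))              # routing logits
    for _ in range(n_iters):
        # Coupling coefficients: softmax over upper capsules per lower capsule.
        b_stable = b - b.max(axis=1, keepdims=True)
        c = np.exp(b_stable) / np.exp(b_stable).sum(axis=1, keepdims=True)
        # Weighted sum of predictions, then squash.
        s = np.einsum('ij,ijk->jk', c, u_hat)
        v = squash(s)
        # Agreement: dot product between each prediction and the upper output.
        b += np.einsum('ijk,jk->ij', u_hat, v)
    return v

# Toy usage: 1152 lower capsules routing to 10 upper capsules of dimension 16.
rng = np.random.default_rng(0)
u_hat = rng.normal(scale=0.1, size=(1152, 10, 16))
v = dynamic_routing(u_hat)
print(np.linalg.norm(v, axis=-1))  # lengths in [0, 1): presence probabilities
```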

EM Routing (Hinton et al., 2018):
- Expectation-Maximization routing: alternative to dynamic routing; more principled probabilistic approach
- Gaussian modeling: model capsule outputs as mixture of Gaussians; EM algorithm learns mixture weights and parameters
- Linear transformation: pose predictions from lower to higher capsules via learned transformation matrices
- Iterative EM: alternating expectation steps (assign lower-capsule votes to higher-capsule clusters) and maximization steps (update cluster parameters); a simplified sketch follows this list
- Improved performance: EM routing was reported to improve accuracy, notably on the viewpoint-sensitive smallNORB benchmark; the gain must be weighed against its computational cost
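
The sketch below is a simplified illustration of the EM idea, clustering votes with a plain Gaussian mixture. It omits the activation-cost terms (beta_a, beta_u) and the inverse-temperature schedule of the actual paper, and the random votes are placeholders:

```python
import numpy as np

def em_routing(votes, n_iters=3, eps=1e-6):
    """Simplified EM routing sketch: cluster lower-capsule votes with a
    diagonal Gaussian mixture (activation costs of the full method omitted).

    votes: shape (n_lower, n_upper, pose_dim); each lower capsule's vote for
           each upper capsule, produced by learned transformation matrices.
    """
    n_lower, n_upper, pose_dim = votes.shape
    r = np.full((n_lower, n_upper), 1.0 / n_upper)      # responsibilities
    for _ in range(n_iters):
        # M-step: per-cluster mean and diagonal variance, weighted by r.
        w = r / (r.sum(axis=0, keepdims=True) + eps)    # (L, U)
        mu = np.einsum('ij,ijk->jk', w, votes)          # (U, D)
        var = np.einsum('ij,ijk->jk', w, (votes - mu) ** 2) + eps
        # E-step: responsibilities from Gaussian log-likelihoods.
        log_p = -0.5 * (((votes - mu) ** 2) / var
                        + np.log(2 * np.pi * var)).sum(-1)
        r = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
    return mu, var, r

# Toy usage: 32 lower capsules voting for 5 upper capsules with 16D poses.
rng = np.random.default_rng(0)
votes = rng.normal(size=(32, 5, 16))
mu, var, r = em_routing(votes)
print(mu.shape, r.shape)        # (5, 16) cluster poses, (32, 5) assignments
```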

Part-Whole Relationships:
- Hierarchical structure: capsules explicitly encode part-whole relationships; lower-level features → higher-level entities
- Compositional learning: the model learns that wheels, doors, and windows compose cars; an explicit semantic hierarchy (the vote sketch after this list illustrates the part-to-whole step)
- Robustness to viewpoint: capsule vectors contain viewpoint information; networks generalize across viewpoints
- Inverse graphics: capsules are hypothesized to learn an inverse graphics model, recovering pose parameters from images by inverting the rendering process
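
A toy sketch of the part-to-whole prediction step: each matrix W[i, j] (learned in practice, random here) maps part i's pose into a vote for whole j, and tightly clustered votes signal a present whole. The vote-variance "agreement" score below is an illustrative proxy, not the paper's routing rule:

```python
import numpy as np

rng = np.random.default_rng(0)
pose_dim = 4                              # toy pose dimensionality
n_parts, n_wholes = 3, 2

# W[i, j] maps part i's pose to its vote for whole j,
# e.g., a wheel's pose to a predicted car pose.
W = rng.normal(size=(n_parts, n_wholes, pose_dim, pose_dim))
part_poses = rng.normal(size=(n_parts, pose_dim))

votes = np.einsum('ijkl,il->ijk', W, part_poses)   # (parts, wholes, pose_dim)

# If several parts' votes for the same whole cluster tightly, routing
# activates that whole; negative vote variance is a cheap agreement proxy.
agreement = -votes.var(axis=0).sum(axis=-1)
print(agreement)                                   # higher = tighter cluster
```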

Equivariance to Transformations:
- Equivariance advantage: standard CNNs have limited equivariance (only translation for convolution); capsules equivariant to more transforms
- Pose generalization: viewpoint transformation in input reflected in pose vector; enables better generalization
- Affine transformations: capsule networks are hypothesized to be approximately equivariant to affine transforms; empirical support is partial
- Robustness benefits: equivariance is hypothesized to improve adversarial robustness; empirical validation is ongoing (a toy invariance-vs-equivariance demo follows this list)
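
The toy example below contrasts invariance and equivariance with a hand-built "capsule" rather than a trained network: the scalar activation (max intensity) is invariant to translating a blob, while the pose output (intensity-weighted centroid) shifts with it:

```python
import numpy as np

def blob_image(cx, cy, size=16):
    """Toy image: a Gaussian blob centered at (cx, cy)."""
    y, x = np.mgrid[0:size, 0:size]
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / 4.0)

def toy_capsule(img):
    """Hand-built 'capsule': activation = max intensity (invariant);
    pose = intensity-weighted centroid (equivariant to translation)."""
    y, x = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    total = img.sum()
    pose = np.array([(x * img).sum() / total, (y * img).sum() / total])
    return img.max(), pose

a1, pose1 = toy_capsule(blob_image(5, 5))
a2, pose2 = toy_capsule(blob_image(9, 5))   # translate blob by +4 in x

print(a1 == a2)          # True: activation is invariant to the translation
print(pose2 - pose1)     # approximately [4, 0]: pose shifts with the input
```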

Capsule Network Architecture:
- CapsNet for MNIST: a standard convolutional layer, a convolutional capsule layer (PrimaryCaps), and a fully connected capsule layer (DigitCaps); margin loss for multiclass classification (a margin-loss sketch follows this list)
- Weight sharing: each capsule type shares its transformation weights across spatial positions, reducing parameter count relative to fully connected alternatives
- Reconstruction regularizer: add reconstruction loss (decoder reconstructs image from class capsule); additional supervision signal
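
A minimal NumPy implementation of the margin loss from Sabour et al. (2017), using the paper's constants m+ = 0.9, m- = 0.1, lambda = 0.5; the example lengths are made up:

```python
import numpy as np

def margin_loss(v_lengths, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss over class-capsule lengths.

    v_lengths: (batch, n_classes) capsule output lengths in [0, 1).
    targets:   (batch, n_classes) one-hot labels.
    """
    # Present classes are penalized for lengths below m_pos,
    # absent classes (down-weighted by lam) for lengths above m_neg.
    present = targets * np.maximum(0.0, m_pos - v_lengths) ** 2
    absent = lam * (1 - targets) * np.maximum(0.0, v_lengths - m_neg) ** 2
    return (present + absent).sum(axis=1).mean()

lengths = np.array([[0.95, 0.2, 0.05]])   # lengths of 3 class capsules
labels = np.array([[1.0, 0.0, 0.0]])
print(margin_loss(lengths, labels))       # small: correct class is confident
```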

Limitations and Challenges:
- Scalability: routing cost grows with the number of capsule pairs and with network depth; the computational overhead is substantial
- Training difficulty: capsule networks harder to train than CNNs; require careful initialization and hyperparameter tuning
- Performance gains: improvements over CNNs modest on standard benchmarks; larger benefits hypothesized for novel viewpoints
- Interpretability: capsule poses should in principle encode interpretable factors (rotations, positions, etc.); empirical evidence for interpretable pose dimensions is mixed

Capsule networks introduce geometric structure — routing by agreement and pose vectors encoding transformations — proposing a more biologically inspired alternative to standard convolutions with advantages in capturing part-whole hierarchies.
