Superposition is the phenomenon in which a neural network represents more features (concepts) than it has dimensions by encoding them as overlapping, nearly orthogonal directions in activation space. It explains why individual neurons are polysemantic (responding to multiple unrelated concepts) and why interpreting large models one neuron at a time is so difficult.
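The "nearly orthogonal" claim can be checked numerically. The sketch below draws N = 5d random unit vectors in R^d, so there are five times more directions than dimensions, and measures how much pairs of them overlap (|cos| between directions). Random directions are an assumption made for illustration; trained networks learn their feature directions rather than sampling them. The function names are hypothetical.

```python
import math
import random


def random_directions(n, d, seed=0):
    """Sample n directions uniformly on the unit sphere in R^d (normalized Gaussians)."""
    rng = random.Random(seed)
    dirs = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        dirs.append([x / norm for x in v])
    return dirs


def overlap_stats(n, d, seed=0):
    """Return (mean |cos|, max |cos|) over all pairs of n random directions in R^d."""
    dirs = random_directions(n, d, seed)
    total, worst, pairs = 0.0, 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            c = abs(sum(a * b for a, b in zip(dirs[i], dirs[j])))
            total += c
            worst = max(worst, c)
            pairs += 1
    return total / pairs, worst


if __name__ == "__main__":
    # Five times more features than dimensions in each case.
    for d in (10, 100):
        mean_c, max_c = overlap_stats(5 * d, d)
        print(f"d={d:4d}  N={5 * d:4d}  mean|cos|={mean_c:.3f}  max|cos|={max_c:.3f}")
```

Only d of these directions could ever be exactly orthogonal. Even so, the typical overlap shrinks as d grows: it scales roughly like sqrt(2 / (pi d)), which is about 0.08 at d = 100. This shrinking overlap is the geometric slack that superposition exploits.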

What Is Superposition?

Why Superposition Matters

The Mathematics of Superposition

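Consider a d-dimensional activation space that must represent N features, with N ≫ d. At most d directions can be exactly orthogonal, but far more than d directions can be nearly orthogonal. Each feature is therefore assigned a direction that overlaps slightly with many others.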
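The cost of that overlap can be written out under a linear-representation assumption. Suppose feature i has activation f_i and a unit direction w_i in R^d, and the hidden state h is the activation-weighted sum of the feature directions. Reading feature j back out by projecting onto its own direction then gives:

```latex
\begin{aligned}
h &= \sum_{i=1}^{N} f_i\,\mathbf{w}_i, \qquad \mathbf{w}_i \in \mathbb{R}^d,\ \lVert \mathbf{w}_i \rVert = 1 \\
\mathbf{w}_j^{\top} h &= f_j + \sum_{i \neq j} f_i\,\bigl(\mathbf{w}_j^{\top}\mathbf{w}_i\bigr)
\end{aligned}
```

The first term is the signal and the sum is interference. The interference vanishes in two cases: when the directions are mutually orthogonal, which is only possible for N ≤ d, or when every feature that overlaps with j is inactive (f_i = 0).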

When Does Superposition Occur?

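Neural networks "choose" superposition through an implicit cost-benefit trade-off. Representing extra features adds capacity, but overlapping directions produce interference whenever features whose directions overlap are active at the same time.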

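Superposition is preferred when features are sparse, meaning each one is rarely active. Interference between overlapping directions then matters only on the uncommon inputs where several overlapping features are active together.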
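The role of sparsity in this trade-off can be illustrated with a small simulation. It makes several assumptions chosen only for illustration:

- There are 20 binary features with random unit directions in a 5-dimensional space.
- Each feature is independently active with probability `p_active`.
- A plain linear readout calls feature j "on" when its projection exceeds 0.5.

These settings and the function names are hypothetical, not the setup of any particular study.

```python
import math
import random


def random_directions(n, d, rng):
    """n random unit vectors in R^d (assumed directions, not learned ones)."""
    dirs = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        dirs.append([x / norm for x in v])
    return dirs


def readout_error_rate(p_active, n_features=20, d=5, samples=2000, threshold=0.5, seed=0):
    """Fraction of (sample, feature) readouts whose on/off state is decoded wrongly.

    Each feature is independently active (value 1.0) with probability p_active.
    The hidden state h is the sum of the active features' directions, and
    feature j is decoded as 'on' when w_j . h exceeds the threshold.
    """
    rng = random.Random(seed)
    dirs = random_directions(n_features, d, rng)
    errors = 0
    for _ in range(samples):
        active = [rng.random() < p_active for _ in range(n_features)]
        # Superpose the active features into a single d-dimensional vector.
        h = [0.0] * d
        for w, on in zip(dirs, active):
            if on:
                h = [hk + wk for hk, wk in zip(h, w)]
        # Project back onto every feature direction and compare with the truth.
        for w, on in zip(dirs, active):
            readout = sum(wk * hk for wk, hk in zip(w, h))
            if (readout > threshold) != on:
                errors += 1
    return errors / (samples * n_features)


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.2, 0.5):
        print(f"p_active={p:.2f}  error rate={readout_error_rate(p):.3f}")
```

With the same 20-in-5 packing, sparse inputs are decoded far more reliably than dense ones. The interference term is mostly zero when few features fire at once, which is why sparsity makes superposition cheap.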

Toy Model Demonstration

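Anthropic trained a simple model (5 inputs → 2D hidden layer → 5 outputs) and found that its solution depends on feature sparsity. With dense inputs, the model represents only the two most important features, each on its own orthogonal axis. With sufficiently sparse inputs, it packs all five features into a pentagon in the 2D hidden space and accepts some interference in exchange.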
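The forward pass of such a model can be sketched with hand-set weights instead of trained ones. This assumes the ReLU output architecture used in Anthropic's toy-model work, x' = ReLU(WᵀWx + b). The pentagon weight matrix and the shared bias of -0.4 are hypothetical values chosen for illustration, not trained parameters; the original model learns one bias per feature.

```python
import math


def pentagon_weights():
    """Hypothetical 2x5 weight matrix: five feature directions spaced 72 degrees apart."""
    angles = [2 * math.pi * i / 5 for i in range(5)]
    return [[math.cos(a) for a in angles], [math.sin(a) for a in angles]]


def toy_forward(x, W, bias):
    """ReLU(W^T W x + b): compress the features into 2D, then reconstruct them."""
    n = len(x)
    # Project the 5 input features down to the 2D hidden space: h = W x.
    h = [sum(W[r][i] * x[i] for i in range(n)) for r in range(len(W))]
    out = []
    for i in range(n):
        # Read feature i back out along its own direction, then apply bias and ReLU.
        pre = sum(W[r][i] * h[r] for r in range(len(W))) + bias
        out.append(max(0.0, pre))
    return out


if __name__ == "__main__":
    W = pentagon_weights()
    bias = -0.4  # hypothetical shared bias
    one_hot = [0.0, 0.0, 1.0, 0.0, 0.0]
    print([round(v, 3) for v in toy_forward(one_hot, W, bias)])  # [0.0, 0.0, 0.6, 0.0, 0.0]
    two_on = [1.0, 0.0, 1.0, 0.0, 0.0]
    print([round(v, 3) for v in toy_forward(two_on, W, bias)])   # [0.0, 0.218, 0.0, 0.0, 0.0]
```

When a single feature is active, the negative bias suppresses the small positive overlaps with its neighbors, and only the correct output fires. When two non-adjacent features are active together, their interference adds up on feature 1. The model then reports a feature that was never present and loses both real ones, which is the cost superposition pays on dense inputs.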

Polysemanticity in Practice

Superposition vs. Monosemanticity

Representation | Features per neuron | Interpretability | Information density
--- | --- | --- | ---
Monosemantic | 1 | High | Low
Polysemantic (superposition) | Many | Low | High
SAE features | ~1 (decomposed) | High | Moderate

Implications for Alignment and Safety

Superposition is a fundamental reason neural networks are so difficult to interpret. It shows that the basic unit of neural computation (the neuron) is not the basic unit of representation (the feature). In doing so, superposition theory reframes the interpretability challenge and motivates the research agenda of sparse autoencoders and mechanistic feature analysis.
