Induction Heads

Induction heads are attention heads that, as part of a two-layer circuit in transformer models, implement pattern matching: they search the context for an earlier occurrence of the current token and predict the token that followed it. They have been identified as a mechanistic foundation of in-context learning and are among the most significant discoveries in mechanistic interpretability research.
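The behavior described above, [A][B] ... [A] -> [B], can be written as a plain lookup rule. The following is a minimal sketch of that input-output behavior only, not of how a transformer computes it; the function name `induction_predict` is a hypothetical name chosen for illustration.

```python
from typing import Optional, Sequence


def induction_predict(tokens: Sequence[str]) -> Optional[str]:
    """Predict the next token with the rule [A][B] ... [A] -> [B].

    Returns None when the current (last) token has not appeared earlier.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions for the most recent match.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # Copy whatever followed the earlier occurrence.
            return tokens[i + 1]
    return None


print(induction_predict(["the", "cat", "sat", "on", "the"]))  # cat
```

A real induction head does this softly, through attention weights rather than an exact search, so it can also fire on approximate matches.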

What Are Induction Heads?

Why Induction Heads Matter

How Induction Heads Work — The Mechanism

The Two-Head Circuit:

Head 1 — Previous Token Head (layer L₁):

Head 2 — Induction Head (layer L₂, L₂ > L₁):
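The two heads named above can be sketched with hand-built attention patterns in NumPy. This is an illustrative toy under strong assumptions: tokens are one-hot vectors, the previous-token head attends exactly one position back, and the names `two_head_circuit` and `sharpness` are hypothetical. Trained models use learned weight matrices and typically positional information instead.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def two_head_circuit(ids, vocab_size, sharpness=50.0):
    """Toy two-layer induction circuit on one-hot tokens (assumed setup)."""
    ids = np.asarray(ids)
    seq = len(ids)
    tok = np.eye(vocab_size)[ids]               # (seq, vocab) one-hot tokens

    # Head 1 (previous-token head, earlier layer): position j attends to
    # position j-1 and copies that token's identity into its own residual.
    prev_attn = np.eye(seq, k=-1)               # row j has a 1 at column j-1
    prev_tok = prev_attn @ tok                  # row 0 stays all zeros

    # Head 2 (induction head, later layer): the query is the current token,
    # the key is the "previous token" information written by head 1.
    # scores[i, j] is large when ids[i] == ids[j-1].
    scores = sharpness * (tok @ prev_tok.T)
    causal = np.tril(np.ones((seq, seq), dtype=bool))
    scores = np.where(causal, scores, -np.inf)  # no attention to the future
    attn = softmax(scores, axis=-1)

    # Value/output path: copy the attended token's identity into the logits.
    logits = attn @ tok
    return attn, logits


attn, logits = two_head_circuit([3, 1, 4, 2, 3], vocab_size=5)
print(int(logits[-1].argmax()))  # 1, the token that followed the earlier 3
```

The key step is composition across layers: head 2 can only find "the position after an earlier copy of the current token" because head 1 has already written each position's preceding token into it.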

In-Context Few-Shot Learning:

The Phase Transition

During transformer training, a clear phase transition occurs within a narrow window of training steps: induction heads form, and the model's in-context learning ability improves abruptly.

Evidence: Ablating the attention heads that form during the phase transition returns the loss to roughly its pre-transition level, which indicates that these heads causally produce the capability.
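The logic of the ablation test can be illustrated with a toy model. The sketch below is not a reproduction of the experiment and uses no trained network: it assumes a model whose only non-uniform contribution to the logits comes from a hand-written induction rule, then zero-ablates that contribution and compares the loss on a repeated random sequence. The names `induction_logits`, `mean_loss`, and `strength` are hypothetical.

```python
import numpy as np


def induction_logits(ids, vocab_size, strength=10.0):
    """Toy head output: boost the token that followed the most recent
    earlier occurrence of the current token."""
    logits = np.zeros((len(ids), vocab_size))
    last_seen = {}
    for i, t in enumerate(ids):
        if t in last_seen:
            logits[i, ids[last_seen[t] + 1]] += strength
        last_seen[t] = i
    return logits


def mean_loss(ids, vocab_size, ablate_head=False):
    """Mean next-token cross-entropy over the repeated second half."""
    ids = list(ids)
    head_out = induction_logits(ids, vocab_size)
    if ablate_head:
        head_out = np.zeros_like(head_out)   # zero-ablation of the head
    logits = head_out                        # rest of the toy model: uniform
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    half = len(ids) // 2
    # Position i predicts the token at position i + 1.
    return -np.mean([log_probs[i, ids[i + 1]] for i in range(half, len(ids) - 1)])


rng = np.random.default_rng(0)
vocab_size = 50
first_half = rng.permutation(vocab_size)[:20]    # 20 distinct random tokens
ids = np.concatenate([first_half, first_half])   # the sequence repeats once

print("intact :", mean_loss(ids, vocab_size))
print("ablated:", mean_loss(ids, vocab_size, ablate_head=True))
```

With the head ablated, the toy model falls back to uniform guessing, so its loss equals log(vocab_size). In a real model the comparison is made against measured pre-transition loss rather than a uniform baseline.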

Induction Head Variants

Implications for AI Safety

Induction heads are the Rosetta Stone of mechanistic interpretability: one of the first detailed, mechanistic accounts of a transformer capability. They lent strong support to the research program of understanding neural networks as algorithms that can be reverse-engineered rather than as inscrutable black boxes, and showed that even a seemingly mysterious capability like in-context learning can have a precise, understandable mechanical implementation.
