Home Knowledge Base Attention Rollout

Attention Rollout is a visualization technique that aggregates attention weights across all transformer layers — recursively multiplying attention matrices to reveal which input tokens ultimately influence the final output, providing insight into multi-layer information flow in transformer models like BERT and GPT.

What Is Attention Rollout?

Why Attention Rollout Matters

How Attention Rollout Works

Step 1: Extract Attention Matrices:

Step 2: Account for Residual Connections:

Step 3: Recursive Multiplication:

Step 4: Visualization:

Mathematical Formulation

Computation:

A_rollout = ∏(l=1 to L) (0.5 × A_l + 0.5 × I)

Interpretation:

Benefits & Limitations

Benefits:

Limitations:

Variants & Extensions

Applications

Model Debugging:

Research Insights:

Tools & Platforms

Attention Rollout is a foundational tool for transformer interpretability — despite known limitations, it provides valuable insights into multi-layer attention flow and remains one of the most popular methods for understanding what transformers learn and how they make decisions.

attention rolloutexplainable ai

Explore 500+ Semiconductor & AI Topics

From EUV lithography to CUDA optimization — search the full knowledge base or chat with our AI assistant.