Shifted Window Attention

Shifted window attention is the cross-window communication mechanism in Swin Transformer. Between consecutive transformer layers, the window partition grid is displaced by half a window size (⌊M/2⌋ tokens for an M×M window), so tokens that sat in different windows in one layer share a window in the next. This lets information flow across window boundaries while keeping the cost of local window attention, and stacking such layers grows the receptive field toward global context.

What Is Shifted Window Attention?

Why Shifted Window Attention Matters
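
One way to make the efficiency argument concrete is to compare attention cost. The sketch below uses the complexity estimates given in the Swin Transformer paper for a feature map of h×w tokens with C channels: roughly 4hwC² + 2(hw)²C for global self-attention and 4hwC² + 2M²hwC for attention inside M×M windows. The function names and the example sizes (a 56×56 map with 96 channels) are assumptions chosen for illustration.

```python
def global_attention_flops(h, w, c):
    """Approximate cost of global self-attention: quadratic in the token count h*w."""
    tokens = h * w
    return 4 * tokens * c * c + 2 * tokens * tokens * c


def window_attention_flops(h, w, c, m):
    """Approximate cost of attention inside M x M windows: linear in h*w for fixed M."""
    tokens = h * w
    return 4 * tokens * c * c + 2 * (m * m) * tokens * c


# Illustrative sizes (assumed): a 56 x 56 token map, 96 channels, 7 x 7 windows.
global_cost = global_attention_flops(56, 56, 96)
window_cost = window_attention_flops(56, 56, 96, 7)
print(f"global: {global_cost:,}  windowed: {window_cost:,}  ratio: {global_cost / window_cost:.1f}x")
```

Because the windowed cost is linear in the number of tokens, doubling the feature map side multiplies it by four, whereas the quadratic term of global attention grows by sixteen.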

How Shifted Window Attention Works

Regular Window (Layer L):
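
As a minimal sketch of the regular partition, the snippet below labels every token of a small feature map with the index of the non-overlapping M×M window that contains it. The 8×8 map, the 4×4 window, and the helper name `regular_window_id` are assumptions for illustration; attention in this layer is computed only among tokens that share a label.

```python
def regular_window_id(row, col, width, m):
    """Index of the M x M window containing token (row, col) on a map `width` tokens wide."""
    windows_per_row = width // m
    return (row // m) * windows_per_row + (col // m)


HEIGHT, WIDTH, M = 8, 8, 4  # assumed toy sizes; HEIGHT and WIDTH are multiples of M

grid = [[regular_window_id(r, c, WIDTH, M) for c in range(WIDTH)] for r in range(HEIGHT)]
for row in grid:
    print(row)
```

Tokens (3, 3) and (4, 4) are direct neighbours on the map but receive different labels, so they cannot exchange information in this layer.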

Shifted Window (Layer L+1):
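
The shifted partition can be sketched as a cyclic roll of the map by ⌊M/2⌋ tokens followed by the usual window split. The snippet below, using an assumed 8×8 map with 4×4 windows and a hypothetical helper `window_id`, checks that two neighbouring tokens separated by a window boundary in the regular layout end up in the same window after the shift.

```python
def window_id(row, col, height, width, m, shift=0):
    """Window index of token (row, col) after cyclically rolling the map by `shift` tokens."""
    rolled_row = (row - shift) % height
    rolled_col = (col - shift) % width
    return (rolled_row // m) * (width // m) + (rolled_col // m)


H, W, M = 8, 8, 4  # assumed toy sizes
a, b = (3, 3), (4, 4)  # neighbours on opposite sides of a regular window boundary

same_regular = window_id(*a, H, W, M) == window_id(*b, H, W, M)
same_shifted = window_id(*a, H, W, M, shift=M // 2) == window_id(*b, H, W, M, shift=M // 2)
print("share a window (regular):", same_regular)
print("share a window (shifted):", same_shifted)

# The cyclic roll also groups opposite corners of the map into one window.
corners_grouped = window_id(0, 0, H, W, M, shift=M // 2) == window_id(7, 7, H, W, M, shift=M // 2)
print("opposite corners grouped by the roll:", corners_grouped)
```

The last check shows the side effect of the roll: tokens from opposite edges of the image land in the same window even though they are not spatial neighbours, which is why the rolled layout needs an attention mask.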

Efficient Masking Implementation:
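
A common way to implement the mask, sketched here in plain Python under assumed toy sizes, is to label each position of the rolled map by the region it came from and then allow attention only between tokens with the same label inside each window. The helper names are hypothetical, and the additive value -100.0 stands in for any large negative number that drives the softmax weight toward zero.

```python
def region_label(pos, size, m, shift):
    """Label 0, 1, or 2 along one axis of the rolled map: bulk, first wrapped strip, second strip."""
    if pos < size - m:
        return 0
    if pos < size - shift:
        return 1
    return 2


def build_shift_masks(height, width, m):
    """One (M*M) x (M*M) additive mask per window of the rolled map."""
    shift = m // 2
    masks = []
    for top in range(0, height, m):
        for left in range(0, width, m):
            labels = [
                region_label(r, height, m, shift) * 3 + region_label(c, width, m, shift)
                for r in range(top, top + m)
                for c in range(left, left + m)
            ]
            # 0.0 keeps a pair; -100.0 suppresses it once added to the attention logits.
            masks.append([[0.0 if a == b else -100.0 for b in labels] for a in labels])
    return masks


masks = build_shift_masks(8, 8, 4)  # assumed 8 x 8 map, 4 x 4 windows
for index, mask in enumerate(masks):
    allowed = sum(value == 0.0 for row in mask for value in row)
    print(f"window {index}: {allowed} of {len(mask) ** 2} pairs allowed")
```

With this layout the number of windows, and therefore the batch shape of the attention computation, stays the same as in the regular layer; only the mask changes.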

Information Flow Example

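| Layer | Window config | Cross-window information |
|---|---|---|
| Layer 1 | Regular windows | None (each window is isolated) |
| Layer 2 | Shifted windows | Adjacent windows connected |
| Layer 3 | Regular windows | 2-hop connections form |
| Layer 4 | Shifted windows | 3-hop connections (near global on small feature maps) |
| Layer 5+ | Alternating | Receptive field approaches global as more layers stack |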
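
The progression in the table can be checked with a small one-dimensional simulation. This is a simplified sketch under assumed sizes (16 tokens, windows of 4), not a model of the full network: each layer lets every token absorb whatever the other tokens in its window already know, with windows alternating between the regular and the shifted layout and wrapped-around regions kept apart as masking would do.

```python
def layer_groups(n, m, shift):
    """Token groups that attend to each other in one layer (masked, so no wraparound)."""
    groups = {}
    for i in range(n):
        groups.setdefault((i - shift) // m, []).append(i)
    return list(groups.values())


def receptive_field_sizes(n, m, num_layers, token):
    """Number of input tokens that can influence `token` after each layer."""
    reach = [{i} for i in range(n)]
    sizes = []
    for layer in range(num_layers):
        shift = 0 if layer % 2 == 0 else m // 2  # regular, shifted, regular, ...
        new_reach = list(reach)
        for group in layer_groups(n, m, shift):
            merged = set().union(*(reach[i] for i in group))
            for i in group:
                new_reach[i] = merged
        reach = new_reach
        sizes.append(len(reach[token]))
    return sizes


sizes = receptive_field_sizes(n=16, m=4, num_layers=4, token=8)
print(sizes)
```

In this toy setting the receptive field of a central token widens by one window width per layer, so the number of layers needed for full coverage depends on how many windows span the feature map.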

Swin Transformer Architecture

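| Stage | Layers | Window | Feature map resolution (224×224 input) | Cross-window |
|---|---|---|---|---|
| Stage 1 | 2 | 7×7 | 56×56 | 1 shifted layer |
| Stage 2 | 2 | 7×7 | 28×28 | 1 shifted layer |
| Stage 3 | 6-18 | 7×7 | 14×14 | 3-9 shifted layers |
| Stage 4 | 2 | 7×7 | 7×7 | 1 shifted layer (a single window covers the whole map, so attention is global) |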
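
The resolutions in the table follow from simple arithmetic, sketched below. The defaults are assumptions matching a common configuration (224×224 input, 4×4 patches, 7×7 windows, and depths of 2, 2, 6, 2 layers); the function name is hypothetical. Each stage halves the side of the token map, and every second layer in a stage uses the shifted layout.

```python
def swin_stage_summary(image_size=224, patch_size=4, window=7, depths=(2, 2, 6, 2)):
    """Per-stage token-map side, window count, and number of shifted layers."""
    rows = []
    side = image_size // patch_size  # token map side after patch embedding
    for stage, depth in enumerate(depths, start=1):
        rows.append({
            "stage": stage,
            "resolution": side,
            "windows": (side // window) ** 2,
            "shifted_layers": depth // 2,
        })
        side //= 2  # patch merging halves the side between stages
    return rows


for row in swin_stage_summary():
    print(row)
```

The last stage has a single 7×7 window, which is why attention there covers the entire feature map.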

Performance Impact

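| Model | Attention type | ImageNet-1K top-1 | FLOPs |
|---|---|---|---|
| ViT-B/16 | Global | 77.9% | 55.4G (384×384 input) |
| DeiT-B | Global + distillation | 83.4% | 17.6G |
| Swin-B | Shifted window | 83.5% | 15.4G |
| Swin-L | Shifted window | 87.3% | 103.9G (384×384 input, ImageNet-22K pre-training) |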

Shifted window attention addresses the locality-efficiency tradeoff in Vision Transformers. By alternating window positions between layers, Swin Transformer propagates information across the whole image using only local attention computations, which shows that a well-chosen architectural constraint can match global attention at lower cost.
