Relative position bias

Relative position bias is a learned spatial encoding in Vision Transformers that captures the relative distance and direction between pairs of patches rather than their absolute positions. Because the bias depends only on the offset between two patches, it is translation invariant: a spatial relationship such as "nose is above mouth" is encoded the same way wherever the face appears in the image. This improves generalization and makes it easier to handle inputs at different resolutions.

What Is Relative Position Bias?

Why Relative Position Bias Matters

How Relative Position Bias Works

Bias Table Construction:
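
As a minimal sketch of the table behind this step, assume the window-based formulation in which attention runs over a window of M × M patches. Relative offsets along each axis then range from -(M-1) to M-1, giving (2M-1)² distinct offsets, and each head learns one scalar per offset. The names `make_bias_table`, `window_size`, and `num_heads` are illustrative, and the small-normal initialization is an assumption, not something this page specifies.

```python
import numpy as np

def make_bias_table(window_size: int, num_heads: int,
                    std: float = 0.02, seed: int = 0) -> np.ndarray:
    """Return a table of shape ((2M-1)^2, num_heads): one scalar per offset per head."""
    rng = np.random.default_rng(seed)
    num_offsets = (2 * window_size - 1) ** 2  # distinct (row, col) offsets in an M x M window
    # Assumed initialization: small random values; in training these would be learned.
    return rng.normal(0.0, std, size=(num_offsets, num_heads))

table = make_bias_table(window_size=7, num_heads=3)
print(table.shape)  # (169, 3)
```

The table size depends only on the window side M and the head count, not on the image size, which is why the parameter cost is (2M-1)² per head.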

Index Mapping:
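
One common way to map each pair of patches to a row of a (2M-1)² bias table is sketched below, assuming patches in an M × M window are flattened in row-major order. The function name `relative_position_index` is illustrative. Each pair's (row, column) offset is shifted to be non-negative and then flattened into a single integer.

```python
import numpy as np

def relative_position_index(window_size: int) -> np.ndarray:
    """Return an (M*M, M*M) integer array; entry [i, j] is the table row for the pair (i, j)."""
    M = window_size
    # (row, col) coordinates of every patch, flattened row-major: shape (2, M*M)
    coords = np.stack(np.meshgrid(np.arange(M), np.arange(M), indexing="ij")).reshape(2, -1)
    # Pairwise offsets coords[:, i] - coords[:, j], each component in [-(M-1), M-1]
    rel = coords[:, :, None] - coords[:, None, :]
    # Shift into [0, 2M-2] so the offsets can be used as indices
    rel = rel + (M - 1)
    # Flatten the 2D offset into one index in [0, (2M-1)^2 - 1]
    return rel[0] * (2 * M - 1) + rel[1]

index = relative_position_index(2)
print(index)
# [[4 3 1 0]
#  [5 4 2 1]
#  [7 6 4 3]
#  [8 7 5 4]]
```

Pairs with the same offset share an index: patches 0 and 1 (top row) and patches 2 and 3 (bottom row) both map to 3, which is how the same learned value is reused wherever that spatial relationship occurs.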

Attention Computation:
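
The bias enters attention as an additive term on the logits, softmax(QKᵀ / √d + B) V, where B holds one value per head and per pair of patches. The sketch below assumes B has already been gathered into shape (heads, N, N); the function name `attention_with_bias` and the random inputs are illustrative stand-ins, not values from a trained model.

```python
import numpy as np

def attention_with_bias(q, k, v, bias):
    """q, k, v: (heads, N, d); bias: (heads, N, N), added to the logits before softmax."""
    d = q.shape[-1]
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(d) + bias
    logits = logits - logits.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v, weights

rng = np.random.default_rng(0)
heads, N, d = 2, 4, 8
q, k, v = (rng.normal(size=(heads, N, d)) for _ in range(3))
bias = rng.normal(size=(heads, N, N))  # stand-in for per-head biases looked up from a learned table
out, weights = attention_with_bias(q, k, v, bias)
print(out.shape)  # (2, 4, 8)
```

Because the bias is added before the softmax, it shifts how much each query attends to each key based on their relative offset, without changing the queries, keys, or values themselves.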

Comparison of Position Encoding Methods

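| Method | Type | Translation Invariant | Resolution Flexible | Parameters |
| --- | --- | --- | --- | --- |
| Learned Absolute | Additive embedding | No | No (fixed length) | N × D |
| Sinusoidal Absolute | Fixed, no learning | No | Partially | 0 |
| Relative Position Bias | Attention bias | Yes | Yes (interpolate) | (2M-1)² per head |
| RoPE (Rotary) | Rotation in Q/K | Yes | Yes | 0 |
| Conditional (CPE) | Conv-based | Yes | Yes | Conv params |

Here N is the number of patches, D is the embedding dimension, and M is the side length, in patches, of the region over which attention is computed.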
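
The "Yes (interpolate)" entry can be illustrated with a short sketch: when the window side changes from M to M′, the (2M-1) × (2M-1) grid of learned biases is resampled to (2M′-1) × (2M′-1). The version below uses plain bilinear interpolation with aligned endpoints for brevity, which is an assumption; other interpolation modes, such as bicubic, are also possible. The function name `resize_bias_table` is illustrative.

```python
import numpy as np

def resize_bias_table(table: np.ndarray, old_m: int, new_m: int) -> np.ndarray:
    """Bilinearly resize a ((2*old_m-1)^2, heads) table to ((2*new_m-1)^2, heads)."""
    old_s, new_s = 2 * old_m - 1, 2 * new_m - 1
    grid = table.reshape(old_s, old_s, -1)      # (old_s, old_s, heads)
    pos = np.linspace(0.0, old_s - 1, new_s)    # sample positions, endpoints aligned
    lo = np.floor(pos).astype(int)              # lower neighbor of each sample
    hi = np.minimum(lo + 1, old_s - 1)          # upper neighbor, clamped at the edge
    frac = pos - lo                             # interpolation weight toward hi
    # Interpolate along the row axis, then along the column axis
    rows = grid[lo] * (1 - frac)[:, None, None] + grid[hi] * frac[:, None, None]
    out = rows[:, lo] * (1 - frac)[None, :, None] + rows[:, hi] * frac[None, :, None]
    return out.reshape(new_s * new_s, -1)

old_table = np.random.default_rng(0).normal(size=(13 * 13, 3))  # M = 7, 3 heads
new_table = resize_bias_table(old_table, old_m=7, new_m=12)
print(new_table.shape)  # (529, 3)
```

With aligned endpoints, the largest offsets in the old table map to the largest offsets in the new one, and the zero-offset entry stays at the center of the grid.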

Relative Position Bias Variants

Relative position bias is a widely used position encoding method in modern Vision Transformers. By learning how patches relate to each other rather than where they sit in absolute terms, it provides the spatial understanding transformers need while keeping the flexibility to generalize across resolutions and spatial configurations.
