2D Sinusoidal Position Encoding

2D sinusoidal position encoding is a fixed position representation that extends the 1D sinusoidal encoding from "Attention Is All You Need" to two spatial dimensions. Each patch's (x, y) grid position is encoded with sine and cosine functions applied independently along each axis, and the two per-axis encodings are concatenated, giving Vision Transformers deterministic, parameter-free spatial information.

What Is 2D Sinusoidal Position Encoding?

Why 2D Sinusoidal Encoding Matters

How 2D Sinusoidal Encoding Works

Step 1 — Separate Axes:

Step 2 — Encode Each Axis:

Step 3 — Concatenate:
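The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `sincos_1d` and `sincos_2d`, the base of 10000, the "all sines, then all cosines" channel layout, and the "y half first, x half second" ordering are choices made for this example. Real codebases differ on layout and ordering.

```python
import math


def sincos_1d(positions, dim, base=10000.0):
    """Encode scalar positions into `dim` features each (dim must be even).

    Assumed layout per position: [sin(p*w_0), ..., sin(p*w_{k-1}),
                                  cos(p*w_0), ..., cos(p*w_{k-1})], k = dim // 2.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Assumed geometric frequency schedule: w_i = base ** (-i / half)
    freqs = [base ** (-i / half) for i in range(half)]
    encoded = []
    for p in positions:
        angles = [p * w for w in freqs]
        encoded.append([math.sin(a) for a in angles] + [math.cos(a) for a in angles])
    return encoded


def sincos_2d(grid_h, grid_w, embed_dim):
    """Return grid_h * grid_w encodings in row-major order, each of length embed_dim."""
    if embed_dim % 4 != 0:
        raise ValueError("embed_dim must be divisible by 4")
    # Step 1: separate axes (one row index and one column index per patch)
    ys = [row for row in range(grid_h) for _ in range(grid_w)]
    xs = [col for _ in range(grid_h) for col in range(grid_w)]
    # Step 2: encode each axis independently with half of the channels
    enc_y = sincos_1d(ys, embed_dim // 2)
    enc_x = sincos_1d(xs, embed_dim // 2)
    # Step 3: concatenate the two per-axis encodings
    return [ey + ex for ey, ex in zip(enc_y, enc_x)]


pos = sincos_2d(grid_h=3, grid_w=4, embed_dim=8)
print(len(pos), len(pos[0]))  # 12 8
```

In a Vision Transformer the resulting table would typically be added to the patch embeddings once, before the first attention layer. Because it comes from a formula, it can be regenerated for a different grid size without retraining.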

Frequency Design:
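A common choice, stated here as an assumption, reuses the geometric frequency schedule of the original Transformer along each axis. For an embedding dimension $D$ divisible by 4, each axis receives $D/2$ channels built from $D/4$ frequencies; the symbols $\omega_i$, $x$, $y$ and the channel ordering are notation chosen for this sketch.

```latex
\omega_i = 10000^{-\,i/(D/4)}, \qquad i = 0, 1, \dots, \tfrac{D}{4} - 1

\mathrm{PE}(x, y) = \Big[
  \underbrace{\sin(\omega_i\, y),\ \cos(\omega_i\, y)}_{D/2 \text{ channels for } y},\;
  \underbrace{\sin(\omega_i\, x),\ \cos(\omega_i\, x)}_{D/2 \text{ channels for } x}
\Big]_{i = 0}^{D/4 - 1}
```

Under this schedule the wavelengths form a geometric progression starting at $2\pi$ and approaching $10000 \cdot 2\pi$, so low-index channels distinguish neighbouring patches while high-index channels vary slowly across the whole grid.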

Mathematical Properties

2D Sinusoidal vs. Other Position Encodings

| Property | 2D Sinusoidal | Learned | Relative Bias | CPE |
|---|---|---|---|---|
| Parameters | 0 | N × D | (2M-1)² | Conv params |
| Resolution flexibility | Good | Poor (interpolation) | Good | Excellent |
| Translation invariant | No | No | Yes | Yes |
| Extrapolation | Moderate | Poor | Limited | Good |
| Implementation | Simple formula | Embedding table | Index lookup | Conv layer |
| Training stability | Perfect (fixed) | May overfit | Stable | Stable |

Usage in Vision Transformers

When to Use 2D Sinusoidal Encoding

2D sinusoidal position encoding gives a transformer spatial awareness without any learned parameters: it encodes each patch's row and column with multi-frequency sine and cosine waves, so every patch on the grid receives a distinct, deterministic coordinate vector. It shows that a simple, fixed formula can also be one of the strongest choices.
