euclidean distance

Euclidean distance (also called L2 distance) is the straight-line distance between two points (vectors) in multi-dimensional space, calculated as the square root of the sum of squared differences across all dimensions: d(a, b) = √(Σ(a_i - b_i)²). In vector databases and similarity search, Euclidean distance measures how far apart two embedding vectors are in the geometric sense — smaller distances indicate more similar vectors. Euclidean distance is one of the most intuitive distance metrics because it corresponds to physical distance in 2D and 3D space, extending naturally to high-dimensional embedding spaces. In vector search applications, it is commonly used for: image embeddings (where spatial relationships in embedding space correspond to visual similarity), recommendation systems (where items are represented as points in a feature space), and anomaly detection (identifying points far from cluster centers). Comparison with other distance metrics used in vector databases: cosine similarity measures the angle between vectors regardless of magnitude — preferred for text embeddings because document length shouldn't affect semantic similarity; dot product measures alignment and magnitude together — used when embedding magnitudes carry meaning; and Manhattan distance (L1) sums absolute differences rather than squared differences — more robust to outliers in individual dimensions. Important considerations for high-dimensional spaces: the curse of dimensionality causes Euclidean distances to concentrate — in very high dimensions, the difference between the nearest and farthest points becomes proportionally small, reducing discriminative power. This is why dimensionality reduction and approximate nearest neighbor algorithms (HNSW, IVF, product quantization) are essential for practical vector search. For normalized vectors (unit length), Euclidean distance and cosine similarity are monotonically related: d² = 2(1 - cos(θ)), meaning they produce identical nearest-neighbor rankings — so the choice between them is irrelevant for normalized embeddings.

Want to learn more?