Euclidean distance (also called L2 distance or straight-line distance) measures the direct distance between two points in space. It is calculated using the Pythagorean theorem and is one of the most common distance metrics in machine learning.

What Is Euclidean Distance?

Mathematical Formula

2D (Plane): d = √[(x₂-x₁)² + (y₂-y₁)²]

Example: from (0,0) to (3,4), d = √[(3-0)² + (4-0)²] = √[9 + 16] = √25 = 5 units

N-Dimensional: d(A, B) = √[Σ(aᵢ - bᵢ)²] for i = 1 to n

Intuition: Sum of squared differences, then take square root

Python Implementation

NumPy Method:

import numpy as np

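def euclidean_distance(a, b):
    """Calculate Euclidean distance between two points of equal dimension."""
    # Convert inputs so plain lists and tuples work as well as arrays
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b)**2))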

# Example
point1 = np.array([1, 2, 3])
point2 = np.array([4, 5, 6])
distance = euclidean_distance(point1, point2)
# = √[(4-1)² + (5-2)² + (6-3)²]
# = √[9 + 9 + 9] = √27 ≈ 5.196

SciPy:

from scipy.spatial.distance import euclidean

distance = euclidean([1, 2, 3], [4, 5, 6])
# ≈ 5.196 (same result; for many points, cdist/pdist in the same module compute distances in bulk)

Scikit-learn (Pairwise):

from sklearn.metrics.pairwise import euclidean_distances

# Compare multiple points
X = [[1, 2], [3, 4], [5, 6]]
Y = [[1, 2], [7, 8]]

distances = euclidean_distances(X, Y)
# Returns matrix of all pairwise distances
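
To make the shape of that result concrete, here is a minimal NumPy-only sketch of the same pairwise computation using broadcasting. The helper name `pairwise_euclidean` is made up for illustration and is not part of scikit-learn; it assumes both inputs are 2-D with the same number of columns.

```python
import numpy as np

def pairwise_euclidean(X, Y):
    """Return a (len(X), len(Y)) matrix of Euclidean distances (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Broadcast to shape (len(X), len(Y), n_features), then reduce over features
    diff = X[:, np.newaxis, :] - Y[np.newaxis, :, :]
    return np.sqrt(np.sum(diff**2, axis=2))

X = [[1, 2], [3, 4], [5, 6]]
Y = [[1, 2], [7, 8]]
D = pairwise_euclidean(X, Y)
print(D.shape)   # (3, 2): one row per point in X, one column per point in Y
print(D[0, 0])   # 0.0, because X[0] and Y[0] are the same point
```

Row i, column j holds the distance from X[i] to Y[j]. Broadcasting materializes the full difference array, so this sketch is only suitable for small inputs.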

Use Cases

K-Nearest Neighbors:

Clustering:

Anomaly Detection:

Image Similarity:

Recommendation Systems:

Information Retrieval:

Mathematical Properties

Metric Properties:
1. Non-negativity: d(a,b) ≥ 0
2. Identity: d(a,b) = 0 if and only if a = b
3. Symmetry: d(a,b) = d(b,a)
4. Triangle inequality: d(a,c) ≤ d(a,b) + d(b,c)
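
The four metric properties can be spot-checked numerically. The sketch below uses the standard library's `math.dist` (Python 3.8+) on a handful of hypothetical points; it is a sanity check on samples, not a proof.

```python
import math
from itertools import product

# Hypothetical sample points; any real-valued tuples of equal length would do
points = [(0.0, 0.0), (3.0, 4.0), (-1.0, 2.5), (6.0, -2.0)]

for a, b, c in product(points, repeat=3):
    d_ab = math.dist(a, b)
    assert d_ab >= 0                              # non-negativity
    assert (d_ab == 0) == (a == b)                # zero only for identical points
    assert math.isclose(d_ab, math.dist(b, a))    # symmetry
    # Triangle inequality, with a tiny tolerance for floating-point rounding
    assert math.dist(a, c) <= d_ab + math.dist(b, c) + 1e-12

print("all four properties hold on the sample points")
```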

Invariance:

Relationship to Other Metrics:

When to Use Euclidean Distance

✅ Excellent For:

❌ Not Ideal For:

Normalization Importance

Problem: Different feature scales distort distance

# Without normalization
import numpy as np

person1 = np.array([30, 50000])  # [age, salary]
person2 = np.array([32, 51000])

distance = np.sqrt(np.sum((person2 - person1)**2))
# = sqrt((32-30)² + (51000-50000)²)
# = sqrt(4 + 1,000,000)
# ≈ 1000.002  # Salary dominates!

Solution: Normalize before computing distance

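from scipy.spatial.distance import euclidean
from sklearn.preprocessing import StandardScaler

# X: feature matrix of shape (n_samples, n_features), e.g. columns [age, salary]
scaler = StandardScaler()
X_normalized = scaler.fit_transform(X)

distance = euclidean(X_normalized[0], X_normalized[1])
# Now age and salary contribute on comparable scales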
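
As a rough illustration of why this matters, the sketch below z-scores a small, invented [age, salary] dataset by hand (subtract the column mean, divide by the column standard deviation, which is assumed to match StandardScaler's default behavior) and checks which row is nearest to the first one before and after scaling. The data and the helper `nearest_to_first` are hypothetical.

```python
import numpy as np

# Hypothetical [age, salary] rows, invented for illustration
people = np.array([
    [30, 50000],
    [32, 51000],
    [45, 50500],
    [23, 90000],
], dtype=float)

def nearest_to_first(data):
    """Index of the row closest to row 0 (excluding row 0 itself)."""
    d = np.sqrt(np.sum((data[1:] - data[0])**2, axis=1))
    return int(np.argmin(d)) + 1

# Z-score each column: subtract the mean, divide by the standard deviation
scaled = (people - people.mean(axis=0)) / people.std(axis=0)

print(nearest_to_first(people))  # 2: a salary gap of 500 outweighs a 15-year age gap
print(nearest_to_first(scaled))  # 1: close in both age and salary
```

On the raw data the 45-year-old looks closest to the 30-year-old purely because their salaries differ by only 500; after scaling, the 32-year-old is the nearest neighbor.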

Performance Optimization

Squared Distance (avoid sqrt):

# If you only need relative distances
squared_distance = np.sum((a - b)**2)
# Ranking is same, but faster (no sqrt)
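
A small check of that claim, using made-up points: because the square root is monotonically increasing, sorting by squared distance gives the same order as sorting by true distance.

```python
import numpy as np

reference = np.array([0.0, 0.0])
points = np.array([[3.0, 4.0], [1.0, 1.0], [6.0, 8.0], [0.0, 2.0]])

squared = np.sum((points - reference)**2, axis=1)   # 25, 2, 100, 4
true_dist = np.sqrt(squared)                        # 5, ~1.414, 10, 2

order_squared = np.argsort(squared)
order_true = np.argsort(true_dist)
print(order_squared)                               # [1 3 0 2]
print(np.array_equal(order_squared, order_true))   # True
```

The shortcut does not apply when the actual distance values are needed, for example when comparing against a threshold expressed in distance units or when summing distances.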

Vectorized Computation:

# Slow: Python loop
distances = [euclidean(point, reference) for point in points]

# Fast: NumPy vectorization
distances = np.sqrt(np.sum((points - reference)**2, axis=1))
# Often orders of magnitude faster for large arrays (exact speedup depends on data size and hardware)

Common Mistakes

❌ Using it on non-normalized features: larger-scale features dominate
❌ Using it in high dimensions without care: distances become less meaningful
❌ Computing distance on raw text data: Euclidean distance is designed for numerical features
❌ Not considering alternatives: cosine distance is often a better fit for high-dimensional, sparse data
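
The high-dimension problem can be observed directly. The following sketch, assuming points drawn uniformly from the unit hypercube, measures how much farther the farthest point is than the nearest one relative to a random query. The function name `relative_contrast` and its parameters are illustrative choices.

```python
import numpy as np

def relative_contrast(n_dims, n_points=1000, seed=0):
    """(farthest - nearest) / nearest distance from a random query point."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))   # uniform in the unit hypercube
    query = rng.random(n_dims)
    d = np.sqrt(np.sum((points - query)**2, axis=1))
    return (d.max() - d.min()) / d.min()

for n_dims in (2, 10, 100, 1000):
    print(n_dims, relative_contrast(n_dims))
```

Under these assumptions the printed contrast should shrink as the dimension grows, meaning the nearest and farthest points become almost equally far away and nearest-neighbor rankings carry less information.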

Euclidean vs Manhattan vs Cosine

Property        Euclidean       Manhattan        Cosine
Formula         √Σ(dᵢ²)         Σ|dᵢ|            1 - (A·B)/(‖A‖‖B‖)
High Dims       Struggles       Better           Best
Sparse Data     Poor            Better           Best
Interpretation  Straight line   Grid path        Angle
Scaling         Sensitive       Less sensitive   Scale invariant

Here dᵢ = aᵢ - bᵢ is the difference in the i-th coordinate.
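
To see the three metrics side by side, the sketch below evaluates each on the same pair of vectors, where one is simply twice the other. The vectors are arbitrary illustrative values.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = 2 * a   # same direction as a, twice the magnitude

euclidean_d = np.sqrt(np.sum((a - b)**2))   # sqrt(1 + 4 + 9) = sqrt(14)
manhattan_d = np.sum(np.abs(a - b))         # 1 + 2 + 3 = 6
cosine_d = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(euclidean_d)  # ≈ 3.742
print(manhattan_d)  # 6.0
print(cosine_d)     # ≈ 0 (up to floating-point rounding)
```

Euclidean and Manhattan distance both register the change in magnitude, while cosine distance is essentially zero because the two vectors point in the same direction.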

Benchmark Example

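import numpy as np
import time

# Generate random points
X = np.random.randn(10000, 784)  # 10K synthetic vectors, 784 features (e.g. flattened 28×28 images)

# Euclidean distance from every row to the first row
start = time.perf_counter()  # monotonic, high-resolution timer
distances = np.sqrt(np.sum((X - X[0])**2, axis=1))
euclidean_time = time.perf_counter() - start

print(f"Euclidean: {euclidean_time:.4f}s")
# Hardware-dependent; typically a fraction of a second for 10K points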

Euclidean distance is the foundation of geometric reasoning in ML. Simple yet powerful, it works well for continuous features on comparable scales and serves as the baseline against which other distance metrics are commonly compared.
