ScaNN: Google's Efficient Vector Search

Overview

ScaNN (Scalable Nearest Neighbors) is a vector similarity search library open-sourced by Google Research. It powers search inside many Google products. It is known for strong performance on CPU: in the ann-benchmarks comparisons it has often outperformed HNSW-based libraries and FAISS at comparable recall.

Key Innovation: Anisotropic Quantization

Standard vector compression (quantization) minimizes reconstruction error uniformly in every direction. ScaNN's anisotropic quantization instead penalizes the error component parallel to each datapoint more heavily than the orthogonal component. This preserves accuracy for high inner-product values (the ones that matter for search results) at the expense of accuracy for low values. Result: higher recall at the same compression rate.
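The idea can be illustrated with a small weighted loss. This is a minimal sketch in plain Python, not ScaNN's implementation: the function name `anisotropic_loss` and the weights `parallel_weight` and `orthogonal_weight` are hypothetical, chosen only to show the effect. The residual between a vector and its quantized form is split into a part parallel to the vector and a part orthogonal to it, and the parallel part is penalized more.

```python
import math

def anisotropic_loss(x, x_quantized, parallel_weight=4.0, orthogonal_weight=1.0):
    """Weighted squared error that penalizes the residual parallel to x more heavily.

    The weights are illustrative assumptions, not values used by ScaNN.
    """
    residual = [a - b for a, b in zip(x, x_quantized)]
    norm_sq = sum(a * a for a in x)
    # Project the residual onto the direction of x.
    scale = sum(r * a for r, a in zip(residual, x)) / norm_sq
    parallel = [scale * a for a in x]
    orthogonal = [r - p for r, p in zip(residual, parallel)]
    parallel_sq = sum(p * p for p in parallel)
    orthogonal_sq = sum(o * o for o in orthogonal)
    return parallel_weight * parallel_sq + orthogonal_weight * orthogonal_sq

x = [1.0, 0.0]
candidate_parallel_error = [0.9, 0.0]    # off by 0.1 along x
candidate_orthogonal_error = [1.0, 0.1]  # off by 0.1 perpendicular to x

# Both candidates have the same plain squared error (about 0.01),
# but the anisotropic loss prefers the one whose error is orthogonal to x.
loss_parallel = anisotropic_loss(x, candidate_parallel_error)
loss_orthogonal = anisotropic_loss(x, candidate_orthogonal_error)
```

A query with a high inner product against `x` points in roughly the same direction as `x`, so an error along `x` distorts that inner product far more than an error perpendicular to it. That is why the parallel component carries the larger weight.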

Architecture

1. Partitioning: Divide space into regions (like IVFFlat).
2. Scoring: Score points in the region using SIMD-optimized routines.
3. Rescoring: Re-check the top candidates with full precision.
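The three stages can be sketched in plain Python on a toy dataset. This is an illustration under simplifying assumptions, not how ScaNN is implemented: the helpers `build_partitions`, `coarse`, and `search` are hypothetical, centroids are given rather than learned, dot product is used for partition assignment, and rounding to one decimal stands in for real quantization.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_partitions(dataset, centroids):
    """Index time: assign each point to the centroid with the highest dot product."""
    partitions = {c: [] for c in range(len(centroids))}
    for idx, point in enumerate(dataset):
        best = max(range(len(centroids)), key=lambda c: dot(point, centroids[c]))
        partitions[best].append(idx)
    return partitions

def coarse(vector):
    """Stand-in for quantization: keep one decimal per coordinate."""
    return [round(x, 1) for x in vector]

def search(query, dataset, centroids, partitions, leaves_to_search=1, reorder=2, k=1):
    # 1. Partitioning: only look inside the most promising regions.
    leaves = sorted(range(len(centroids)),
                    key=lambda c: dot(query, centroids[c]), reverse=True)[:leaves_to_search]
    candidates = [i for leaf in leaves for i in partitions[leaf]]
    # 2. Scoring: rank candidates with cheap, approximate scores.
    shortlist = sorted(candidates,
                       key=lambda i: dot(query, coarse(dataset[i])), reverse=True)[:reorder]
    # 3. Rescoring: re-rank the shortlist with exact scores.
    return sorted(shortlist, key=lambda i: dot(query, dataset[i]), reverse=True)[:k]

dataset = [[0.98, 0.2], [0.94, 0.34], [0.1, 0.99], [-0.2, 0.98], [-1.0, 0.0]]
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
partitions = build_partitions(dataset, centroids)

query = [0.9, 0.4]
with_rescoring = search(query, dataset, centroids, partitions, reorder=2)
without_rescoring = search(query, dataset, centroids, partitions, reorder=1)
```

In this toy case the approximate scores rank point 0 first, while the exact scores rank point 1 first. Keeping two candidates for rescoring recovers the true best match; keeping only one does not.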

Usage (Python)

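import numpy as np
import scann

# Placeholder data for illustration; replace with your own float32 embeddings.
dataset = np.random.rand(1_000_000, 128).astype(np.float32)
query_vector = np.random.rand(128).astype(np.float32)

# Create the index: 10 neighbors per query, dot-product similarity.
# The parentheses let the builder chain span multiple lines.
searcher = (
    scann.scann_ops_pybind.builder(dataset, 10, "dot_product")
    # Partitioning: 2000 leaves, 100 of them searched per query.
    .tree(num_leaves=2000, num_leaves_to_search=100, training_sample_size=250000)
    # Scoring: asymmetric hashing with anisotropic quantization.
    .score_ah(2, anisotropic_quantization_threshold=0.2)
    # Rescoring: re-rank the top 100 candidates at full precision.
    .reorder(100)
    .build()
)

# Search: returns neighbor indices and their similarity scores.
neighbors, distances = searcher.search(query_vector, final_num_neighbors=10)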
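When tuning parameters such as `num_leaves_to_search` or the `reorder` size, a common check is to compare the approximate results against exact brute-force results and measure recall. The helper below is a minimal sketch of that measurement; `compute_recall` is a hypothetical name, not part of the ScaNN API, and the neighbor lists are made-up illustrative values.

```python
def compute_recall(approx_neighbors, exact_neighbors):
    """Fraction of the exact neighbors that the approximate search also returned.

    Each argument is a list with one row of neighbor indices per query.
    """
    hits = 0
    total = 0
    for approx_row, exact_row in zip(approx_neighbors, exact_neighbors):
        hits += len(set(approx_row) & set(exact_row))
        total += len(exact_row)
    return hits / total

# Illustrative values: two queries, three neighbors each.
approx = [[3, 7, 1], [4, 2, 9]]
exact = [[3, 1, 5], [4, 2, 9]]
recall = compute_recall(approx, exact)  # 5 of 6 exact neighbors were found
```

Raising `num_leaves_to_search` or the `reorder` size generally increases recall at the cost of query speed, so this number is usually tracked alongside throughput.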

Pros/Cons

Use ScaNN when you need maximum query throughput on CPU hardware.
