Vector Databases

Vector databases are specialized storage systems optimized for storing, indexing, and searching high-dimensional embedding vectors. They enable fast similarity search across millions to billions of vectors, which makes them essential infrastructure for RAG systems, semantic search, recommendation engines, and any other application that needs to find "similar" items in embedding space.

What Are Vector Databases?

Why Vector Databases Matter

Core Concepts

Embedding Vectors:

Distance Metrics:

Metric              | Formula                    | Use Case
--------------------|----------------------------|------------------
Cosine Similarity   | (A·B)/(|A||B|)             | Text embeddings
Euclidean (L2)      | sqrt(Σ(ai-bi)²)            | Image features
Dot Product (IP)    | A·B                        | Normalized vectors
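The three metrics in the table can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names are made up for this example, and real vector databases compute these with optimized native code. Note that cosine similarity is the ratio (A·B)/(|A||B|), while cosine distance is 1 minus that ratio.

```python
import math

def dot(a, b):
    """Inner product A·B."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length |A|."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """(A·B)/(|A||B|): 1.0 for same direction, 0.0 for orthogonal vectors."""
    return dot(a, b) / (norm(a) * norm(b))

def cosine_distance(a, b):
    """1 - cosine similarity, so smaller means more similar."""
    return 1.0 - cosine_similarity(a, b)

def euclidean(a, b):
    """L2 distance sqrt(Σ(ai-bi)²)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 0.0]
b = [0.0, 1.0]
c = [2.0, 0.0]

print(cosine_similarity(a, c))  # 1.0: same direction, length is ignored
print(cosine_similarity(a, b))  # 0.0: orthogonal
print(euclidean(a, c))          # 1.0
print(dot(a, c))                # 2.0: dot product is sensitive to length
```

For vectors that are already normalized to unit length, the dot product and cosine similarity give the same value, which is why the dot product is commonly paired with normalized vectors.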

Index Types:

Major Vector Databases

Dedicated Vector DBs:

Database   | Highlights                        | Best For
-----------|-----------------------------------|------------------
FAISS      | Meta, library, CPU/GPU            | Research, embedded
Milvus     | Distributed, scalable, open source| Large-scale prod
Qdrant     | Rust, filtering, rich features    | Production RAG
Pinecone   | Managed, serverless, easy         | Quick start, scale
Weaviate   | Hybrid search, GraphQL            | Complex queries
ChromaDB   | Simple, embedded, dev-friendly    | Prototyping, local

Database Extensions:

Performance Comparison

Database    | Vectors   | QPS (K=10) | Recall@10
------------|-----------|------------|----------
Milvus      | 1B        | 2,000+     | 95%+
Qdrant      | 100M      | 5,000+     | 98%+
Pinecone    | 1B        | ~1,000     | 95%+
pgvector    | 10M       | ~500       | 99%+
ChromaDB    | 1M        | ~1,000     | 99%+

Note: these figures vary significantly with hardware, index configuration, and vector dimension.
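The Recall@10 column measures how many of the true 10 nearest neighbors an approximate index actually returns. A minimal sketch of that calculation, assuming the ground truth comes from an exact (brute-force) search, is shown here; `recall_at_k` and the ID lists are hypothetical names and data chosen for illustration.

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)

# Hypothetical result IDs for one query.
exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # from an exact search
approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 42]   # from an approximate index

print(recall_at_k(approx, exact))  # 0.9: one true neighbor was missed
```

In practice the value is averaged over many queries, and it usually trades off against QPS: index settings that search more candidates raise recall and lower throughput.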

RAG Architecture with Vector DB

User Query: "How does photosynthesis work?"
       ↓
┌─────────────────────────────────────────┐
│  Embed query → [0.23, -0.45, ...]       │
├─────────────────────────────────────────┤
│  Vector DB similarity search            │
│  → Find top 5 most similar chunks       │
├─────────────────────────────────────────┤
│  Retrieved context + original query     │
├─────────────────────────────────────────┤
│  LLM generates response with context    │
└─────────────────────────────────────────┘
       ↓
Response: "Photosynthesis is the process by which..."
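The retrieval steps in the diagram can be sketched end to end with a toy example. Everything here is an assumption made for illustration: `embed` is a word-count stand-in for a real embedding model, `top_k` is a brute-force scan standing in for the vector database, and the final LLM call is left out, so the sketch stops at the assembled prompt. It uses k=2 rather than 5 because the toy corpus has only three chunks.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    return [w.strip(".,?!\"'").lower() for w in text.split()]

def embed(text, vocab):
    """Toy stand-in for an embedding model: L2-normalized word counts."""
    counts = Counter(tokenize(text))
    vec = [float(counts[w]) for w in vocab]
    length = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / length for x in vec]

def top_k(query_vec, index, k):
    """Brute-force similarity search: score every stored vector, keep the best k."""
    scored = [(sum(q * v for q, v in zip(query_vec, vec)), chunk)
              for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

def build_prompt(query, context_chunks):
    """Combine retrieved context with the original query for the LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"

chunks = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "Vector databases index embeddings for similarity search.",
    "Chlorophyll absorbs light during photosynthesis.",
]
vocab = sorted({w for c in chunks for w in tokenize(c)})
index = [(c, embed(c, vocab)) for c in chunks]  # the "vector DB"

query = "How does photosynthesis work?"
retrieved = top_k(embed(query, vocab), index, k=2)
prompt = build_prompt(query, retrieved)
print(prompt)  # this text would be sent to the LLM
```

A production system would swap `embed` for a real embedding model and `top_k` for a query against an approximate nearest-neighbor index, but the data flow stays the same.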

Key Features to Consider

Selection Criteria

Vector databases are the infrastructure foundation for semantic AI applications. As more applications need to find "similar" rather than "exact" matches, vector databases provide the scalable, fast retrieval that makes RAG, recommendation systems, and semantic search practical at production scale.

