Why hybrid?

Keywords: hybrid search, RAG

Hybrid search combines dense (semantic) and sparse (keyword) retrieval so that each method covers the other's blind spots.

Why hybrid?: Dense retrieval excels at semantic similarity but may miss exact matches; sparse retrieval catches exact keywords but misses synonyms. Together they cover both cases.

Fusion methods:
- Reciprocal Rank Fusion (RRF): combines the ranked lists from each retriever using ranks only.
- Linear combination: a weighted sum of the scores from both methods.
- Cascaded: sparse retrieval first, then a dense rerank of the candidates.

RRF formula: score(d) = Σ_i 1/(k + rank_i(d)), summed across retrieval systems i, where rank_i(d) is the rank of document d in system i and k is a constant, typically 60.

Implementation: Run BM25 and vector search in parallel, merge the results, and optionally rerank with a cross-encoder.

Score normalization: BM25 scores and vector similarities are on different scales, so apply min-max scaling or z-score normalization before a linear combination. RRF uses only ranks and does not need this step.

Weight tuning: Domain-specific. Technical docs may favor keyword matching; conversational queries favor semantic matching.

Production systems: Elasticsearch with dense vectors, Vespa, Weaviate hybrid mode.

Results: Roughly 10-20% improvement over single-method retrieval on benchmarks.

Best practices: Start with equal weights, tune on a validation set, and consider query-dependent weighting for advanced systems.
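The two score-level fusion methods described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function names (reciprocal_rank_fusion, min_max_normalize, weighted_fusion), the document IDs, and the example scores are all hypothetical, and the retrievers themselves (BM25 and vector search) are assumed to have already run and returned their results.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based: the top hit of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal; treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fusion(sparse_scores, dense_scores, sparse_weight=0.5):
    """Linear combination of min-max normalized sparse and dense scores."""
    sparse_n = min_max_normalize(sparse_scores)
    dense_n = min_max_normalize(dense_scores)
    fused = {}
    for doc_id in set(sparse_n) | set(dense_n):
        # A document missing from one retriever contributes 0 for that side
        # (an assumption of this sketch; other defaults are possible).
        fused[doc_id] = (sparse_weight * sparse_n.get(doc_id, 0.0)
                         + (1.0 - sparse_weight) * dense_n.get(doc_id, 0.0))
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical retriever outputs, for illustration only.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])

bm25_scores = {"doc_a": 12.0, "doc_b": 9.0, "doc_c": 3.0}
vector_scores = {"doc_b": 0.9, "doc_d": 0.8, "doc_a": 0.5}
weighted = weighted_fusion(bm25_scores, vector_scores, sparse_weight=0.5)

print([doc_id for doc_id, _ in fused])
print([doc_id for doc_id, _ in weighted])
```

In this sketch a document found by only one retriever still receives a score, which is what lets the fused list cover both exact-match and semantic hits. Starting with sparse_weight=0.5 matches the equal-weights starting point above; the value would then be tuned on a validation set.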
