AI Factory Glossary

145 technical terms and definitions


r-gcn, relational graph convolutional networks, graph neural networks

Relational Graph Convolutional Networks extend GCNs to multi-relational data by learning separate transformation matrices for each edge type.
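
A minimal NumPy sketch of one such layer, consistent with this definition: each relation gets its own weight matrix, plus a self-loop term. The mean normalization, ReLU, and all names here are illustrative choices, not a definitive implementation.

```python
import numpy as np

def rgcn_layer(h, adj_by_relation, w_by_relation, w_self):
    """One R-GCN layer (illustrative): a separate weight matrix per edge type.

    h:                (n, d_in) node features
    adj_by_relation:  list of (n, n) adjacency matrices, one per edge type
    w_by_relation:    list of (d_in, d_out) weight matrices, one per edge type
    w_self:           (d_in, d_out) self-loop weight matrix
    """
    out = h @ w_self  # self-connection term
    for adj, w_r in zip(adj_by_relation, w_by_relation):
        deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)  # per-node normalizer
        out += (adj / deg) @ h @ w_r  # aggregate relation-r neighbors, then transform
    return np.maximum(out, 0.0)  # ReLU activation
```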

rademacher complexity, advanced training

Rademacher complexity bounds generalization error by measuring how well a function class can fit random noise.
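
For reference, the standard empirical form: given a sample $S = (x_1, \dots, x_n)$ and function class $\mathcal{F}$,

$$
\hat{\mathcal{R}}_S(\mathcal{F}) = \mathbb{E}_{\sigma}\left[\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i)\right]
$$

where the $\sigma_i$ are i.i.d. Rademacher variables taking values $\pm 1$ with equal probability; the "random noise" being fit is exactly these sign variables.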

radiology report generation, healthcare ai

Describe medical images in text.

rag, retrieval, knowledge base, retrieval augmented generation, vector search, embeddings, semantic search, llm

# RAG, Retrieval, and Knowledge Bases

A comprehensive technical guide with mathematical foundations

## 1. Overview

**RAG (Retrieval-Augmented Generation)** is an architecture that enhances Large Language Models (LLMs) by grounding their responses in external knowledge sources.

### Core Components

- **Generator**: The LLM that produces the final response
- **Retriever**: The system that finds relevant documents
- **Knowledge Base**: The corpus of documents being searched

## 2. Mathematical Foundations

### 2.1 Vector Embeddings

Documents and queries are converted to dense vectors in $\mathbb{R}^d$ where $d$ is the embedding dimension (typically 384, 768, or 1536).

**Embedding Function:**

$$
E: \text{Text} \rightarrow \mathbb{R}^d
$$

For a document $D$ and query $Q$:

$$
\vec{d} = E(D) \in \mathbb{R}^d
$$

$$
\vec{q} = E(Q) \in \mathbb{R}^d
$$

### 2.2 Similarity Metrics

#### Cosine Similarity

$$
\text{sim}_{\cos}(\vec{q}, \vec{d}) = \frac{\vec{q} \cdot \vec{d}}{||\vec{q}|| \cdot ||\vec{d}||} = \frac{\sum_{i=1}^{d} q_i \cdot d_i}{\sqrt{\sum_{i=1}^{d} q_i^2} \cdot \sqrt{\sum_{i=1}^{d} d_i^2}}
$$

#### Euclidean Distance (L2)

$$
\text{dist}_{L2}(\vec{q}, \vec{d}) = ||\vec{q} - \vec{d}||_2 = \sqrt{\sum_{i=1}^{d} (q_i - d_i)^2}
$$

#### Dot Product

$$
\text{sim}_{\text{dot}}(\vec{q}, \vec{d}) = \vec{q} \cdot \vec{d} = \sum_{i=1}^{d} q_i \cdot d_i
$$

### 2.3 BM25 (Sparse Retrieval)

$$
\text{BM25}(Q, D) = \sum_{i=1}^{n} \text{IDF}(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}
$$

Where:

- $f(q_i, D)$ = frequency of term $q_i$ in document $D$
- $|D|$ = document length
- $\text{avgdl}$ = average document length in corpus
- $k_1$ = term frequency saturation parameter (typically 1.2–2.0)
- $b$ = length normalization parameter (typically 0.75)

**Inverse Document Frequency (IDF):**

$$
\text{IDF}(q_i) = \ln\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
$$

Where:

- $N$ = total number of documents
- $n(q_i)$ = number of documents containing $q_i$

## 3. RAG Pipeline Architecture

### 3.1 Pipeline Stages

1. **Indexing Phase**
   - Document ingestion
   - Chunking strategy selection
   - Embedding generation
   - Vector storage
2. **Query Phase**
   - Query embedding: $\vec{q} = E(Q)$
   - Top-$k$ retrieval: $\mathcal{D}_k = \text{argmax}_{D \in \mathcal{C}}^k \text{sim}(\vec{q}, \vec{d})$
   - Context assembly
   - LLM generation

### 3.2 Retrieval Formula

Given a query $Q$ and corpus $\mathcal{C}$, retrieve top-$k$ documents:

$$
\mathcal{D}_k = \{D_1, D_2, ..., D_k\} \quad \text{where} \quad \text{sim}(Q, D_1) \geq \text{sim}(Q, D_2) \geq ... \geq \text{sim}(Q, D_k)
$$

### 3.3 Generation with Context

$$
P(\text{Response} \mid Q, \mathcal{D}_k) = \text{LLM}(Q \oplus \mathcal{D}_k)
$$

Where $\oplus$ denotes context concatenation.

## 4. Chunking Strategies

### 4.1 Fixed-Size Chunking

- **Chunk size**: $c$ tokens (typically 256–1024)
- **Overlap**: $o$ tokens (typically 10–20% of $c$)

$$
\text{Number of chunks} = \left\lceil \frac{|D| - o}{c - o} \right\rceil
$$

### 4.2 Semantic Chunking

- Split by semantic boundaries (paragraphs, sections)
- Use sentence embeddings to detect topic shifts
- Threshold: $\theta$ for similarity drop detection

$$
\text{Split at } i \quad \text{if} \quad \text{sim}(s_i, s_{i+1}) < \theta
$$

### 4.3 Recursive Chunking

- Hierarchical splitting: Document → Sections → Paragraphs → Sentences
- Maintains context hierarchy
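A minimal Python sketch of the fixed-size strategy from Section 4.1, using the chunk-count formula above; the default `chunk_size` and `overlap` values are illustrative, not prescriptive.

```python
import math

def num_chunks(doc_len: int, chunk_size: int, overlap: int) -> int:
    """Number of chunks = ceil((|D| - o) / (c - o)), per Section 4.1."""
    return math.ceil((doc_len - overlap) / (chunk_size - overlap))

def fixed_size_chunks(tokens: list[str], chunk_size: int = 512,
                      overlap: int = 64) -> list[list[str]]:
    """Split a token sequence into fixed-size chunks with overlap."""
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size]
            for i in range(0, max(len(tokens) - overlap, 1), step)]
```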
## 5. Knowledge Base Design

### 5.1 Metadata Schema

```json
{
  "chunk_id": "string",
  "document_id": "string",
  "content": "string",
  "embedding": "vector[d]",
  "metadata": {
    "source": "string",
    "title": "string",
    "author": "string",
    "date_created": "ISO8601",
    "date_modified": "ISO8601",
    "section": "string",
    "page_number": "integer",
    "chunk_index": "integer",
    "total_chunks": "integer",
    "tags": ["string"],
    "confidence_score": "float"
  }
}
```

### 5.2 Index Types

- **Flat Index**: Exact search, $O(n)$ complexity
- **IVF (Inverted File)**: Approximate, $O(\sqrt{n})$ complexity
- **HNSW (Hierarchical Navigable Small World)**: Graph-based, $O(\log n)$ complexity

**HNSW Search Complexity:**

$$
O(d \cdot \log n)
$$

Where $d$ is embedding dimension and $n$ is corpus size.

## 6. Evaluation Metrics

### 6.1 Retrieval Metrics

#### Recall@k

$$
\text{Recall@}k = \frac{|\text{Relevant} \cap \text{Retrieved@}k|}{|\text{Relevant}|}
$$

#### Precision@k

$$
\text{Precision@}k = \frac{|\text{Relevant} \cap \text{Retrieved@}k|}{k}
$$

#### Mean Reciprocal Rank (MRR)

$$
\text{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\text{rank}_i}
$$

#### Normalized Discounted Cumulative Gain (NDCG)

$$
\text{DCG@}k = \sum_{i=1}^{k} \frac{2^{\text{rel}_i} - 1}{\log_2(i + 1)}
$$

$$
\text{NDCG@}k = \frac{\text{DCG@}k}{\text{IDCG@}k}
$$

### 6.2 Generation Metrics

- **Faithfulness**: Is response grounded in retrieved context?
- **Relevance**: Does response answer the query?
- **Groundedness Score**:

$$
G = \frac{|\text{Claims supported by context}|}{|\text{Total claims}|}
$$
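A minimal Python sketch of Recall@k and MRR from the formulas in Section 6.1; string document IDs are an assumption, and a query with no relevant hit contributes 0 to MRR.

```python
def recall_at_k(relevant: set[str], retrieved: list[str], k: int) -> float:
    """Recall@k = |Relevant ∩ Retrieved@k| / |Relevant|."""
    return len(relevant & set(retrieved[:k])) / len(relevant)

def mean_reciprocal_rank(relevant_sets: list[set[str]],
                         rankings: list[list[str]]) -> float:
    """MRR = (1/|Q|) * sum_i 1/rank_i, where rank_i is the position of the
    first relevant document for query i (no relevant hit contributes 0)."""
    total = 0.0
    for relevant, ranking in zip(relevant_sets, rankings):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(relevant_sets)
```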
## 7. Advanced Techniques

### 7.1 Hybrid Search

Combine dense and sparse retrieval:

$$
\text{score}_{\text{hybrid}} = \alpha \cdot \text{score}_{\text{dense}} + (1 - \alpha) \cdot \text{score}_{\text{sparse}}
$$

Where $\alpha \in [0, 1]$ is the weighting parameter.

### 7.2 Reranking

Apply cross-encoder reranking to top-$k$ results:

$$
\text{score}_{\text{rerank}}(Q, D) = \text{CrossEncoder}(Q, D)
$$

Cross-encoder complexity: $O(k \cdot |Q| \cdot |D|)$

### 7.3 Query Expansion

- **HyDE (Hypothetical Document Embeddings)**:

$$
\vec{q}_{\text{HyDE}} = E(\text{LLM}(Q))
$$

- **Multi-Query Retrieval**:

$$
\mathcal{D}_{\text{merged}} = \bigcup_{i=1}^{m} \text{Retrieve}(Q_i)
$$

### 7.4 Contextual Compression

Reduce retrieved context before generation:

$$
C_{\text{compressed}} = \text{Compress}(\mathcal{D}_k, Q)
$$

## 8. Vector Database Options

| Database | Index Types | Hosting | Scalability |
|----------|-------------|---------|-------------|
| Pinecone | HNSW, IVF | Cloud | High |
| Weaviate | HNSW | Self/Cloud | High |
| Qdrant | HNSW | Self/Cloud | High |
| Milvus | IVF, HNSW | Self/Cloud | Very High |
| FAISS | Flat, IVF, HNSW | Self | Medium |
| Chroma | HNSW | Self | Low-Medium |
| pgvector | IVFFlat, HNSW | Self | Medium |

## 9. Best Practices Checklist

- Choose appropriate chunk size based on content type
- Implement chunk overlap to preserve context
- Store rich metadata for filtering
- Use hybrid search for better recall
- Implement reranking for precision
- Monitor retrieval metrics continuously
- Evaluate groundedness of generated responses
- Handle edge cases (no results, low confidence)
- Implement caching for common queries
- Version control your knowledge base

## 10. Code Examples

### 10.1 Cosine Similarity (Python)

```python
import numpy as np

def cosine_similarity(vec_q: np.ndarray, vec_d: np.ndarray) -> float:
    """
    Calculate cosine similarity between two vectors.

    $$\text{sim}_{\cos}(\vec{q}, \vec{d}) = \frac{\vec{q} \cdot \vec{d}}{||\vec{q}|| \cdot ||\vec{d}||}$$
    """
    dot_product = np.dot(vec_q, vec_d)
    norm_q = np.linalg.norm(vec_q)
    norm_d = np.linalg.norm(vec_d)
    return dot_product / (norm_q * norm_d)
```

### 10.2 BM25 Implementation

```python
import math
from collections import Counter

def bm25_score(
    query_terms: list[str],
    document: list[str],
    corpus: list[list[str]],
    k1: float = 1.5,
    b: float = 0.75
) -> float:
    """
    Calculate BM25 score for a query-document pair.
    """
    doc_len = len(document)
    avg_doc_len = sum(len(d) for d in corpus) / len(corpus)
    doc_freq = Counter(document)
    N = len(corpus)

    score = 0.0
    for term in query_terms:
        # Document frequency
        n_q = sum(1 for d in corpus if term in d)
        # IDF calculation
        idf = math.log((N - n_q + 0.5) / (n_q + 0.5) + 1)
        # Term frequency in document
        f_q = doc_freq.get(term, 0)
        # BM25 term score
        numerator = f_q * (k1 + 1)
        denominator = f_q + k1 * (1 - b + b * (doc_len / avg_doc_len))
        score += idf * (numerator / denominator)

    return score
```
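### 10.3 Hybrid Score Fusion (Illustrative Sketch)

A minimal sketch of the Section 7.1 formula, added for illustration: dense (cosine) and sparse (BM25) scores live on different scales, so they are min-max normalized before weighting. The normalization choice is an assumption; reciprocal rank fusion is a common alternative.

```python
def min_max_normalize(scores: list[float]) -> list[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(dense: list[float], sparse: list[float],
                  alpha: float = 0.5) -> list[float]:
    """score_hybrid = alpha * score_dense + (1 - alpha) * score_sparse
    (Section 7.1), computed per candidate after normalization."""
    d = min_max_normalize(dense)
    s = min_max_normalize(sparse)
    return [alpha * di + (1 - alpha) * si for di, si in zip(d, s)]
```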

rainbow dqn, reinforcement learning

Combine multiple DQN improvements.

raised floor, facility

Elevated floor creating space underneath for utilities, cabling, and air return.

raised source-drain, process integration

Raised source-drain structures elevate junctions above the channel, reducing parasitic resistance and capacitance.

random failure, business & standards

Random failures occur unpredictably at a constant rate during the useful-life period.

random feature attention, llm architecture

Approximate attention with random features.

random grain boundary, defects

A general grain boundary with no special (low-energy, coincidence-lattice) misorientation relationship.

random routing, llm architecture

Random routing assigns tokens to experts stochastically.

random search, model training

Sample random hyperparameter combinations.

random synthesizer, transformer

Replace learned query-key attention with random (fixed or learnable) attention patterns.

randomized smoothing, ai safety

Certify robustness by smoothing a classifier's predictions over random noise perturbations.

rare earth recovery, environmental & sustainability

Rare earth recovery extracts valuable lanthanides from electronic waste for reuse.

rate limiting, llm optimization

Rate limiting restricts request frequency preventing abuse and ensuring fair access.

ray marching, multimodal ai

Ray marching samples points along rays through scene accumulating color and opacity.
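
A minimal sketch of that accumulation step, assuming emission-absorption volume rendering (as in NeRF-style ray marching); all names are illustrative.

```python
import math

def march_ray(samples):
    """Accumulate color and opacity along one ray.

    samples: list of (rgb, density, step_size) tuples ordered front to back.
    alpha_i = 1 - exp(-sigma_i * delta_i); each sample's color is weighted
    by transmittance * alpha, and transmittance decays multiplicatively.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for rgb, sigma, delta in samples:
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        color = [c + weight * x for c, x in zip(color, rgb)]
        transmittance *= 1.0 - alpha
    return color, 1.0 - transmittance  # accumulated color and opacity
```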

rba, responsible business alliance, environmental & sustainability

Responsible Business Alliance establishes labor rights and environmental standards for electronics supply chains.

re-sampling strategies, machine learning

Adjust sampling (e.g., over- or under-sampling) to handle class imbalance.

reachability analysis, ai safety

Compute the set of outputs reachable from a given set of inputs.

react (reasoning + acting), react, reasoning + acting, ai agent

Agent pattern where model alternates between reasoning steps and taking actions.

reaction condition recommendation, chemistry ai

Suggest optimal reaction conditions.

reaction extraction, chemistry ai

Extract chemical reactions from text.

reaction prediction, chemistry ai

Predict products of chemical reactions.

readout functions, graph neural networks

Readout functions aggregate node features into graph-level representations for classification or regression tasks.
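
A one-function sketch of the simplest permutation-invariant readouts (sum, mean, max pooling); names are illustrative.

```python
import numpy as np

def readout(node_features: np.ndarray, mode: str = "mean") -> np.ndarray:
    """Aggregate (n_nodes, d) node features into a single (d,) graph vector."""
    if mode == "mean":
        return node_features.mean(axis=0)
    if mode == "sum":
        return node_features.sum(axis=0)
    if mode == "max":
        return node_features.max(axis=0)
    raise ValueError(f"unknown readout mode: {mode}")
```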

reagent selection, chemistry ai

Choose appropriate reagents for synthesis.

real-esrgan, multimodal ai

Real-ESRGAN uses adversarial training for practical blind super-resolution.

realm (retrieval-augmented language model), realm, retrieval-augmented language model, foundation model

End-to-end training of retriever and generator.

recency bias, training phenomena

Recent training examples exert disproportionate influence on model behavior.

recurrent memory transformer, llm architecture

Combines recurrence with attention by passing memory tokens between segments, acting as external memory.

recurrent state space models, rssm, reinforcement learning

Model dynamics with recurrent and stochastic components.

recurrent video models, video understanding

RNN/LSTM for temporal modeling.

recursive forecasting, time series models

Recursive forecasting applies a one-step-ahead model iteratively, feeding each prediction back as an input to produce multi-step forecasts.
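
A minimal sketch of that loop; the `predict(window)` interface is an assumption for illustration.

```python
def recursive_forecast(predict, history: list[float], horizon: int,
                       window: int) -> list[float]:
    """Multi-step forecast from a one-step model: each prediction is
    appended to the history and fed back as input for the next step."""
    buf = list(history)
    forecasts = []
    for _ in range(horizon):
        y_hat = predict(buf[-window:])  # one-step-ahead prediction
        forecasts.append(y_hat)
        buf.append(y_hat)               # feed prediction back as input
    return forecasts
```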

recursive reward modeling, ai safety

Trains reward models hierarchically, using one model to help supervise another.

red teaming, ai safety

Adversarial testing by human experts to find model vulnerabilities, weaknesses, or harmful behaviors, informing safety training.

reference image conditioning, generative models

Use a reference image to guide generation.

reference image, multimodal ai

Reference images guide generation by transferring style or content through adapter networks.

referring expression comprehension, multimodal ai

Find objects from descriptions.

referring expression generation, multimodal ai

Generate descriptions of objects.

reflection agent, ai agents

Reflection enables agents to critique their own plans and outputs, identifying improvements.

reflexion, ai agent

Agent learns from feedback and mistakes by generating reflections and improving.

reformer, llm architecture

Uses locality-sensitive hashing to approximate attention, reducing complexity from quadratic to O(n log n).

refusal behavior, ai safety

Model declining to answer.

refusal calibration, ai safety

Balance safety and helpfulness.

refusal training, ai safety

Refusal training explicitly teaches models when to decline requests.