
AI Factory Glossary

9,967 technical terms and definitions


gpu memory hierarchy, gpu, hardware

The levels of memory on a GPU, from fast per-thread registers through shared memory and caches to global memory (VRAM).

gpu memory utilization, gpu, optimization

How much GPU memory is used.

gpu operator,device plugin,nvidia

NVIDIA GPU Operator automates installation and management of GPU drivers and the device plugin in Kubernetes. Typically required for GPU workloads on k8s.

gpu utilization,optimization

Percentage of GPU compute actually being used.

gpu, gpus, graphics card, accelerator, parallel processing, cuda, opencl, graphics processing unit, compute

# GPU (Graphics Processing Unit)

## Graphics Processing Unit

- **GPU (Graphics Processing Unit)**: A specialized processor designed for parallel processing tasks
- **GPUs**: Plural form of GPU
- **Graphics Card**: Physical hardware component containing a GPU, VRAM, and cooling system
- **Accelerator**: Specialized hardware that offloads computation from the CPU

## Architecture Fundamentals

### Core Components

- **Streaming Multiprocessors (SMs)**: Contain multiple CUDA cores for parallel execution
- **VRAM (Video RAM)**: High-bandwidth memory dedicated to the GPU
- **Memory Bus**: Data pathway between GPU and VRAM
- **PCIe Interface**: Connection to the motherboard/CPU

### Parallelism Model

GPUs excel at **SIMD** (Single Instruction, Multiple Data) operations:

$$
\text{Speedup} = \frac{T_{\text{sequential}}}{T_{\text{parallel}}} \leq \frac{1}{(1-P) + \frac{P}{N}}
$$

Where:
- $P$ = Parallelizable fraction of code
- $N$ = Number of parallel processors
- This is **Amdahl's Law**

## Performance Metrics

### FLOPS (Floating Point Operations Per Second)

$$
\text{FLOPS} = \text{Cores} \times \text{Clock Speed (Hz)} \times \text{FLOPs per cycle}
$$

Example calculation for a GPU with 10,000 cores at 2 GHz (2 FLOPs per cycle via fused multiply-add):

$$
\text{FLOPS} = 10{,}000 \times 2 \times 10^9 \times 2 = 40 \text{ TFLOPS}
$$

### Memory Bandwidth

$$
\text{Bandwidth (GB/s)} = \frac{\text{Memory Clock (Hz)} \times \text{Bus Width (bits)} \times \text{Data Rate}}{8 \times 10^9}
$$

### Arithmetic Intensity

$$
\text{Arithmetic Intensity} = \frac{\text{FLOPs}}{\text{Bytes Accessed}}
$$

The **Roofline Model** bounds performance:

$$
\text{Attainable FLOPS} = \min\left(\text{Peak FLOPS}, \text{Bandwidth} \times \text{Arithmetic Intensity}\right)
$$

## GPU Computing Concepts

### Thread Hierarchy (CUDA Model)

- **Thread**: Smallest unit of execution
  - Each thread has unique indices: `threadIdx.x`, `threadIdx.y`, `threadIdx.z`
- **Block**: Group of threads that can cooperate
  - Shared memory accessible within block
  - Maximum threads per block: typically 1024
- **Grid**: Collection of blocks
  - Total threads: $\text{Grid Size} \times \text{Block Size}$

### Memory Hierarchy

| Memory Type | Scope | Latency | Size |
|-------------|-------|---------|------|
| Registers | Thread | ~1 cycle | ~256 KB total |
| Shared Memory | Block | ~5 cycles | 48-164 KB |
| L1 Cache | SM | ~30 cycles | 128 KB |
| L2 Cache | Device | ~200 cycles | 4-50 MB |
| Global Memory (VRAM) | Device | ~400 cycles | 8-80 GB |

## Matrix Operations (Key for AI/ML)

### Matrix Multiplication Complexity

Standard matrix multiplication for $A_{m \times k} \cdot B_{k \times n}$:

$$
C_{ij} = \sum_{l=1}^{k} A_{il} \cdot B_{lj}
$$

- **Time Complexity**: $O(m \times n \times k)$
- **Naive**: $O(n^3)$ for square matrices
- **Strassen's Algorithm**: $O(n^{2.807})$

### Tensor Core Operations

Mixed-precision matrix multiply-accumulate:

$$
D = A \times B + C
$$

Where:
- $A, B$ are FP16 (16-bit floating point)
- $C, D$ are FP32 (32-bit floating point)

Throughput comparison:
- **FP32 CUDA Cores**: ~40 TFLOPS
- **FP16 Tensor Cores**: ~300+ TFLOPS
- **INT8 Tensor Cores**: ~600+ TFLOPS

## Power and Thermal Equations

### Thermal Design Power (TDP)

$$
P_{\text{dynamic}} = \alpha \cdot C \cdot V^2 \cdot f
$$

Where:
- $\alpha$ = Activity factor
- $C$ = Capacitance
- $V$ = Voltage
- $f$ = Frequency

### Temperature Relationship

$$
T_{\text{junction}} = T_{\text{ambient}} + (P \times R_{\theta})
$$

Where $R_{\theta}$ is thermal resistance in °C/W.

## Deep Learning Operations

### Convolution (CNN)

For a 2D convolution with input $I$, kernel $K$, output $O$:

$$
O(i,j) = \sum_{m}\sum_{n} I(i+m, j+n) \cdot K(m,n)
$$

Output dimensions:

$$
O_{\text{size}} = \left\lfloor \frac{I_{\text{size}} - K_{\text{size}} + 2P}{S} \right\rfloor + 1
$$

Where:
- $P$ = Padding
- $S$ = Stride

### Attention Mechanism (Transformers)

$$
\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V
$$

Memory complexity: $O(n^2 \cdot d)$ where $n$ is sequence length.

## Major GPU Vendors

### NVIDIA
- **Gaming**: GeForce RTX series
- **Professional**: Quadro / RTX A-series
- **Data Center**: A100, H100, H200, B100, B200
- **CUDA Ecosystem**: Dominant in AI/ML

### AMD
- **Gaming**: Radeon RX series
- **Data Center**: Instinct MI series (MI300X)
- **ROCm**: Open-source GPU computing platform

### Intel
- **Consumer**: Arc A-series
- **Data Center**: Gaudi accelerators, Max series

## Code Example: CUDA Kernel

```cuda
// Vector addition kernel: each thread adds one element pair
__global__ void vectorAdd(float *A, float *B, float *C, int N) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) {
        C[idx] = A[idx] + B[idx];
    }
}

// Launch configuration
int threadsPerBlock = 256;
int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);
```

## Formulas

| Metric | Formula |
|--------|---------|
| Thread Index (1D) | $\text{idx} = \text{blockIdx.x} \times \text{blockDim.x} + \text{threadIdx.x}$ |
| Memory Bandwidth | $BW = \frac{\text{Clock} \times \text{Width} \times 2}{8}$ GB/s |
| FLOPS | $\text{Cores} \times \text{Freq} \times \text{FMA}$ |
| Power Efficiency | $\frac{\text{TFLOPS}}{\text{Watts}}$ |
| Utilization | $\frac{\text{Active Warps}}{\text{Max Warps}} \times 100\%$ |

## Architecture
- NVIDIA CUDA
- AMD ROCm
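
The roofline bound above is easy to evaluate numerically. A minimal Python sketch; the hardware numbers (40 TFLOPS peak, 1.5 TB/s bandwidth) are illustrative assumptions, not any specific card:

```python
peak_flops = 40e12   # hypothetical peak: 40 TFLOPS FP32
bandwidth = 1.5e12   # hypothetical memory bandwidth: 1.5 TB/s

def attainable_flops(arithmetic_intensity: float) -> float:
    """Roofline model: performance is capped by compute or by memory traffic."""
    return min(peak_flops, bandwidth * arithmetic_intensity)

print(attainable_flops(10) / 1e12)   # 15.0 TFLOPS -> memory-bound kernel
print(attainable_flops(50) / 1e12)   # 40.0 TFLOPS -> compute-bound kernel
```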

gqa (general question answering),gqa,general question answering,evaluation

A benchmark for compositional question answering and reasoning over real-world visual scenes.

graceful degradation, llm optimization

Graceful degradation maintains partial functionality when components fail.

graceful degradation,reliability

The system maintains reduced but useful functionality when components fail.

graclus pooling, graph neural networks

Graclus pooling uses a deterministic graph coarsening algorithm for hierarchical graph classification.

gradcam, explainable ai

Visualize important regions for predictions.

gradcam, interpretability

Gradient-weighted Class Activation Mapping highlights image regions important for CNN predictions.
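
A compact PyTorch sketch of the idea, where `model` and a chosen convolutional `target_layer` are placeholders you supply: hooks capture the layer's activations and gradients, channels are weighted by their average gradient, and only positive evidence is kept.

```python
import torch

def grad_cam(model, target_layer, image, class_idx):
    activations, gradients = {}, {}

    def fwd_hook(module, inp, out):
        activations["value"] = out.detach()        # feature maps (1, C, H, W)

    def bwd_hook(module, grad_in, grad_out):
        gradients["value"] = grad_out[0].detach()  # gradients w.r.t. those maps

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)

    scores = model(image)             # (1, num_classes)
    scores[0, class_idx].backward()   # backprop from the target class score
    h1.remove(); h2.remove()

    # Weight each channel by its global-average-pooled gradient
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)  # (1, C, 1, 1)
    cam = (weights * activations["value"]).sum(dim=1)            # (1, H, W)
    return torch.relu(cam)            # keep regions with positive influence
```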

gradcam++, explainable ai

An improved version of Grad-CAM with better localization, especially when multiple instances of a class appear in the image.

gradient accumulation in vit, computer vision

Simulate larger batches.

gradient accumulation steps, optimization

The number of mini-batches whose gradients are accumulated before each optimizer step; effective batch size = per-device batch size × accumulation steps.

gradient accumulation,effective batch

Gradient accumulation sums gradients over mini-batches before update. Simulates larger batch size when GPU memory is limited.

gradient accumulation,microbatch

Gradient accumulation sums gradients over multiple microbatches before update. Simulates larger batch size.

gradient accumulation,model training

Accumulate gradients over multiple mini-batches before updating weights.
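
A minimal PyTorch training-loop sketch of gradient accumulation; `model`, `optimizer`, `loss_fn`, and `loader` are assumed to exist:

```python
accum_steps = 4  # effective batch = per-step batch size x 4

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = loss_fn(model(x), y) / accum_steps  # scale so the sum is an average
    loss.backward()                            # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()                       # one update per 4 mini-batches
        optimizer.zero_grad()
```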

gradient boosting for defect detection, data analysis

Boosted trees for identifying defects.

gradient boosting,xgboost,lgbm

Gradient boosting builds trees sequentially, each one fitting the residual errors of the ensemble so far. Popular implementations: XGBoost, LightGBM, CatBoost.
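
A small scikit-learn example on synthetic data (the sklearn estimator stands in for XGBoost/LightGBM/CatBoost, which expose a similar fit/predict interface):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 100 shallow trees, each correcting the errors of the ones before it
clf = GradientBoostingClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```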

gradient bucketing, distributed training

Group gradients for efficient communication.

gradient centralization, optimization

Center gradients to improve training.

gradient clipping, training techniques

Gradient clipping bounds gradient norms, preventing training instability; in differentially private training it also bounds each example's influence, limiting privacy leakage.

gradient clipping,max norm,stability

Gradient clipping limits gradient magnitude. Prevents exploding gradients. Typical max norm 1.0.

gradient clipping,model training

Cap gradient magnitude to prevent exploding gradients.
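
A runnable PyTorch sketch using `torch.nn.utils.clip_grad_norm_`, which rescales all gradients so their global L2 norm is at most `max_norm`:

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(8, 10)).pow(2).mean()
loss.backward()

# Rescale gradients so their combined L2 norm does not exceed 1.0
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```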

gradient compression for privacy, privacy

Compress gradients while preserving privacy.

gradient compression techniques, distributed training

Reduce gradient communication.

gradient compression,communication

Gradient compression reduces communication in distributed training. Quantize or sparsify gradients.
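
One common sparsification scheme is top-k: transmit only the largest-magnitude gradient entries as (index, value) pairs. A minimal PyTorch sketch; the 1% keep-fraction is an illustrative assumption:

```python
import torch

def topk_sparsify(grad: torch.Tensor, k_fraction: float = 0.01):
    """Keep only the largest-magnitude entries of a gradient tensor."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * k_fraction))
    _, indices = flat.abs().topk(k)     # select by magnitude
    return indices, flat[indices]       # values keep their original sign

g = torch.randn(1000, 1000)
idx, vals = topk_sparsify(g)
print(f"sending {vals.numel()} of {g.numel()} values")
```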

gradient episodic memory, gem, continual learning

Constrain gradients to preserve knowledge.

gradient flow in deep vits, computer vision

Maintaining gradients in very deep models.

gradient flow preservation,model training

Maintain gradient flow in sparse networks.

gradient masking, ai safety

Make gradients uninformative to obstruct gradient-based adversarial attacks; often gives a false sense of robustness.

gradient noise, optimization

Add noise to gradients for regularization.

gradient normalization, optimization

Normalize gradient magnitude.

gradient penalty, generative models

Regularize gradient magnitude in GANs, e.g., the WGAN-GP penalty that pushes the critic's gradient norm toward 1.
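
A PyTorch sketch of the WGAN-GP variant, which penalizes the critic's gradient norm away from 1 at samples interpolated between real and fake data; `critic` is a placeholder and image-shaped (B, C, H, W) inputs are assumed:

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    # Random interpolation point between each real/fake pair
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x = (eps * real + (1 - eps) * fake).requires_grad_(True)

    scores = critic(x)
    grads, = torch.autograd.grad(
        scores.sum(), x, create_graph=True)  # keep graph so the penalty trains
    norms = grads.flatten(1).norm(2, dim=1)  # per-sample gradient norm
    return lam * ((norms - 1) ** 2).mean()   # push norms toward 1
```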

gradient quantization for communication, distributed training

Quantize gradients for transmission.

gradient reversal layer, domain adaptation

Reverse gradients for adversarial training.

gradient scaling, optimization

Scale the loss (and hence gradients) to prevent FP16 underflow in mixed-precision training.
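
A PyTorch automatic-mixed-precision sketch: the scaler multiplies the loss before backward so small FP16 gradients do not underflow, then unscales before the optimizer step. A CUDA device plus `model`, `optimizer`, `loss_fn`, and `loader` are assumed:

```python
import torch

scaler = torch.cuda.amp.GradScaler()

for x, y in loader:
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()   # scaled loss -> scaled gradients
    scaler.step(optimizer)          # unscales; skips step if inf/nan found
    scaler.update()                 # adjusts the scale factor over time
```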

gradient sparsification, optimization

Send only significant gradient components.

gradient synchronization, distributed training

Aggregate gradients across devices.
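
A manual PyTorch sketch of the all-reduce pattern (in practice `DistributedDataParallel` does this automatically, overlapped with the backward pass); an initialized process group via `dist.init_process_group` is assumed:

```python
import torch
import torch.distributed as dist

def sync_gradients(model: torch.nn.Module) -> None:
    """Average gradients across all ranks after loss.backward()."""
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)  # sum across ranks
            p.grad /= world_size                           # average
```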

gradient-based masking, nlp

Mask tokens with large gradients.

gradient-based nas, neural architecture

Optimize architecture with gradients.

gradient-based prompt tuning,fine-tuning

Optimize continuous prompt embeddings using gradients.

gradient-based pruning, model optimization

Gradient-based pruning estimates weight importance using gradient information.

gradient-based pruning,model optimization

Use gradients to determine importance.

gradient,backprop,backward pass

Backpropagation computes gradients via the chain rule, flowing error from output to input. An optimizer then uses these gradients to update the weights.
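
A minimal autograd example: for $y = x^2$, backpropagation yields $dy/dx = 2x$.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
y.backward()   # chain rule, from output back to input
print(x.grad)  # tensor(6.)
```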

gradio,interface,demo

Gradio creates ML demo interfaces quickly. Hugging Face integration. Share instantly.
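
A minimal runnable demo using the standard Gradio API (the `greet` function is an illustrative placeholder):

```python
import gradio as gr

def greet(name: str) -> str:
    return f"Hello, {name}!"

demo = gr.Interface(fn=greet, inputs="text", outputs="text")
demo.launch(share=True)  # share=True creates a temporary public link
```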

gradual rollout,deployment

Slowly increase traffic to new version.

gradual rollout,percentage,traffic

Gradually increase traffic to new model: 1%, 10%, 50%, 100%. Monitor metrics at each stage.
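
A sketch of one common way to implement the split: hash user IDs into buckets so assignment is sticky as the percentage ramps (names here are illustrative, not a specific library's API):

```python
import hashlib

def assign_to_new_model(user_id: str, rollout_pct: float) -> bool:
    """Sticky traffic split: each user hashes to a fixed bucket in [0, 100),
    so they consistently see the same model version as the rollout ramps."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_pct

# Ramp 1% -> 10% -> 50% -> 100%, monitoring metrics at each stage
print(assign_to_new_model("user-42", rollout_pct=10))
```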

gradual unfreezing, fine-tuning

Unfreeze layers gradually during fine-tuning, starting from the top (output) layers and working down.

grafana,dashboard,visualize

Grafana dashboards visualize metrics. Alerts on thresholds. Operations visibility.