Unified Memory

Unified Memory is a CUDA programming-model feature that provides a single address space accessible from both the CPU and the GPU. The driver migrates data between host and device on demand through page faulting, which removes the need for explicit cudaMemcpy calls and enables memory oversubscription (allocating more managed memory than the GPU physically has). This simplifies development while reaching 70-95% of the performance of manual memory management when properly optimized with prefetching and usage hints.
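As a minimal sketch of the single-address-space idea, the program below allocates one managed buffer with cudaMallocManaged, writes it on the host, updates it in a kernel, and reads it back on the host with no cudaMemcpy. The kernel `scale` and the helper `run_managed_demo` are hypothetical names chosen for illustration, and error handling is kept deliberately small.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Multiply every element by `factor` on the GPU.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: returns 0 on success, nonzero on a CUDA error or a wrong result.
int run_managed_demo(int n) {
    float* data = nullptr;
    // One allocation and one pointer, valid on both host and device.
    if (cudaMallocManaged(&data, (size_t)n * sizeof(float)) != cudaSuccess) return 1;

    // The host writes through the managed pointer directly.
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    // The same pointer is passed to the kernel; the driver makes the data
    // available on the GPU (on demand where page faulting is supported).
    scale<<<blocks, threads>>>(data, n, 2.0f);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(data); return 2; }

    // Kernel launches are asynchronous: wait before touching the data on the host.
    if (cudaDeviceSynchronize() != cudaSuccess) { cudaFree(data); return 3; }

    // The host reads the results back without any explicit copy.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (data[i] != 2.0f) ++mismatches;
    }
    cudaFree(data);
    return mismatches == 0 ? 0 : 4;
}

int main() {
    int status = run_managed_demo(1 << 20);
    std::printf("managed demo status: %d\n", status);
    return status;
}
```

The cudaDeviceSynchronize call is the one piece of explicit coordination that remains: without it the host loop could read the buffer while the kernel is still running.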

Unified Memory Fundamentals:

Page Migration and Faulting:

Prefetching and Hints:
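One way to avoid paying for page faults inside a kernel is to move the pages ahead of time with cudaMemPrefetchAsync. The sketch below assumes a CUDA 12.x or earlier toolkit, where the destination is an integer device ordinal; newer toolkits take a cudaMemLocation argument instead, so the calls may need adjusting. The kernel `add_one` and the helper `run_prefetch_demo` are hypothetical names, and the prefetch is skipped on devices that do not report concurrent managed access, since the call is not supported there.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Increment every element on the GPU.
__global__ void add_one(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: returns 0 on success, nonzero otherwise.
int run_prefetch_demo(int n) {
    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return 1;

    // Prefetching managed memory needs concurrent managed access on the device.
    int concurrent = 0;
    cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, device);

    size_t bytes = (size_t)n * sizeof(int);
    int* data = nullptr;
    if (cudaMallocManaged(&data, bytes) != cudaSuccess) return 2;
    for (int i = 0; i < n; ++i) data[i] = i;   // pages are first touched on the host

    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) { cudaFree(data); return 3; }

    if (concurrent) {
        // Move the pages to the GPU before the kernel needs them.
        // Assumed signature (CUDA 12.x and earlier): (ptr, bytes, int device, stream).
        cudaMemPrefetchAsync(data, bytes, device, stream);
    }

    add_one<<<(n + 255) / 256, 256, 0, stream>>>(data, n);

    if (concurrent) {
        // Queue the return trip so the host loop below does not fault page by page.
        cudaMemPrefetchAsync(data, bytes, cudaCpuDeviceId, stream);
    }

    cudaError_t err = cudaDeviceSynchronize();
    int mismatches = 0;
    if (err == cudaSuccess) {
        for (int i = 0; i < n; ++i) {
            if (data[i] != i + 1) ++mismatches;
        }
    }
    cudaStreamDestroy(stream);
    cudaFree(data);
    return (err == cudaSuccess && mismatches == 0) ? 0 : 4;
}

int main() {
    int status = run_prefetch_demo(1 << 20);
    std::printf("prefetch demo status: %d\n", status);
    return status;
}
```

Because the prefetches and the kernel are queued on the same stream, they run in order: data moves to the GPU, the kernel executes, and the data moves back before the host reads it.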

Memory Advice Flags:
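The CUDA runtime exposes usage hints through cudaMemAdvise, with flags such as cudaMemAdviseSetReadMostly, cudaMemAdviseSetPreferredLocation, and cudaMemAdviseSetAccessedBy. The sketch below applies the first two and again assumes the CUDA 12.x and earlier signature that takes an integer device ordinal. The kernel `lookup`, the helper `run_advice_demo`, and the table size are hypothetical choices for illustration; the hints are advisory and the program produces the same results without them.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each output element reads from a small table that the GPU never writes.
__global__ void lookup(const float* table, int table_size, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = table[i % table_size];
}

// Hypothetical helper: returns 0 on success, nonzero otherwise.
int run_advice_demo(int n) {
    const int table_size = 256;
    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return 1;

    int concurrent = 0;
    cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, device);

    size_t table_bytes = (size_t)table_size * sizeof(float);
    size_t out_bytes = (size_t)n * sizeof(float);
    float* table = nullptr;
    float* out = nullptr;
    if (cudaMallocManaged(&table, table_bytes) != cudaSuccess) return 2;
    if (cudaMallocManaged(&out, out_bytes) != cudaSuccess) { cudaFree(table); return 2; }

    for (int i = 0; i < table_size; ++i) table[i] = 0.5f * i;

    if (concurrent) {
        // Read-mostly: the driver may keep read-only copies near each reader.
        // Assumed signature (CUDA 12.x and earlier): (ptr, bytes, advice, int device).
        cudaMemAdvise(table, table_bytes, cudaMemAdviseSetReadMostly, device);
        // Preferred location: ask the driver to keep the output resident on the GPU.
        cudaMemAdvise(out, out_bytes, cudaMemAdviseSetPreferredLocation, device);
    }

    lookup<<<(n + 255) / 256, 256>>>(table, table_size, out, n);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    int mismatches = 0;
    if (err == cudaSuccess) {
        for (int i = 0; i < n; ++i) {
            if (out[i] != 0.5f * (i % table_size)) ++mismatches;
        }
    }
    cudaFree(table);
    cudaFree(out);
    return (err == cudaSuccess && mismatches == 0) ? 0 : 3;
}

int main() {
    int status = run_advice_demo(1 << 20);
    std::printf("advice demo status: %d\n", status);
    return status;
}
```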

Performance Optimization:

Multi-GPU Unified Memory:

Limitations and Trade-offs:

Use Cases:

Performance Comparison:

Unified Memory is a productivity feature that simplifies CUDA programming by removing explicit host-to-device copies. Combined with strategic prefetching and memory advice, it achieves near-optimal performance while providing automatic data migration, memory oversubscription, and simpler multi-GPU programming, which makes it a strong default memory model for modern CUDA applications.

