Distributed Shared Memory (DSM) and PGAS are programming abstractions that present a single shared address space to processes running on physically separate machines, each with its own local memory. They let programmers write parallel code using shared-memory semantics (reads, writes, pointers) while the runtime or hardware transparently handles data movement between nodes, bridging the ease of shared-memory programming with the scalability of distributed-memory systems.
DSM Concept
```
Physical reality:            Programmer's view:

[Node 0: Local RAM]          [Single Shared Address Space]
[Node 1: Local RAM]          All nodes can read/write any address
[Node 2: Local RAM]          Runtime handles data movement
Connected by network         Transparent to application
```
DSM vs. Other Models
| Model | Abstraction | Communication | Example |
|-------|-----------|---------------|--------|
| Shared memory | Global address space | Load/store | OpenMP, pthreads |
| Message passing | Separate address spaces | Send/receive | MPI |
| DSM | Virtual shared address space | Load/store (with runtime) | TreadMarks, IVY |
| PGAS | Partitioned shared space | Local fast, remote explicit | Chapel, Co-array Fortran |
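To make the contrast concrete, here is a minimal sketch of a transfer in two-sided message passing (MPI), where both ranks must actively participate; compare it with the one-sided OpenSHMEM put/get later in this section. The buffer name and sizes are illustrative.

```c
#include <mpi.h>

int main(int argc, char **argv) {
    int rank;
    long buf[50] = {0};
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Two-sided: the send on rank 0 must be matched
       by an explicit receive on rank 1 */
    if (rank == 0)
        MPI_Send(buf, 50, MPI_LONG, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(buf, 50, MPI_LONG, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}
```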
Software DSM Implementation
- Virtual memory (VM) based: pages of the shared region are mapped across nodes.
- Page fault on remote access → runtime fetches the page from the owning node → maps it locally (a minimal fault-handler sketch follows this list).
- Consistency: invalidation protocol (like hardware cache coherence but at page granularity).
- Granularity problem: a page (4 KB) is much larger than a cache line (64 B), so false sharing is severe.
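A minimal sketch of the page-fault mechanism on Linux, assuming mmap/mprotect and a SIGSEGV handler; fetch_page_from_owner is a hypothetical stand-in for the network transfer a real DSM runtime would perform:

```c
/* Minimal VM-based DSM sketch: pages start inaccessible ("remote");
   the first touch faults, the handler "fetches" the page and maps it.
   Error checks omitted for brevity. */
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SIZE (16 * 4096)
static char *region;
static long page_size;

/* Hypothetical: a real DSM runtime would pull the page contents
   from the owning node over the network (e.g., RDMA) here. */
static void fetch_page_from_owner(void *page) {
    memset(page, 0, (size_t)page_size);
}

static void fault_handler(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)ctx;
    void *page = (void *)((uintptr_t)si->si_addr & ~((uintptr_t)page_size - 1));
    mprotect(page, (size_t)page_size, PROT_READ | PROT_WRITE); /* map locally first */
    fetch_page_from_owner(page);                               /* then fill contents */
}

int main(void) {
    page_size = sysconf(_SC_PAGESIZE);
    region = mmap(NULL, REGION_SIZE, PROT_NONE,   /* every page starts "remote" */
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    struct sigaction sa = {0};
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = fault_handler;
    sigaction(SIGSEGV, &sa, NULL);

    region[12345] = 'x';   /* faults once; handler maps the page, access retries */
    printf("wrote %c\n", region[12345]);
    return 0;
}
```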
PGAS (Partitioned Global Address Space)
```
Node 0 memory       Node 1 memory       Node 2 memory
[LOCAL | REMOTE]    [LOCAL | REMOTE]    [LOCAL | REMOTE]
  ↑ fast  ↑ slow      ↑ fast  ↑ slow      ↑ fast  ↑ slow

Each thread has fast LOCAL access + slower REMOTE access
Programmer controls data placement for performance
```
PGAS Languages
| Language | Developer | Key Feature |
|----------|----------|------------|
| UPC (Unified Parallel C) | UC Berkeley | C extension, shared arrays |
| Co-array Fortran | Standard (F2008) | Square bracket syntax for remote access |
| Chapel | Cray/HPE | High-level, productive, domain maps |
| X10 | IBM | Place-based, async activities |
| OpenSHMEM | Consortium | C/Fortran library, one-sided comms |
Chapel Example
```chapel
use BlockDist;

// Distributed array across all nodes (locales);
// each locale owns a contiguous chunk
const D = {1..1000000} dmapped Block(boundingBox={1..1000000});
var A: [D] real;

// Access any element with simple indexing:
A[500000] = 3.14; // Local or remote: Chapel handles it

// Parallel loop: each locale processes its local elements
forall i in A.domain do
  A[i] = compute(A[i]); // Runs locally where the data resides (compute is user-defined)
```
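Assuming the standard Chapel toolchain, this compiles with `chpl` and runs across multiple locales with, e.g., `./a.out -nl 4`; the `forall` then executes in parallel across all four locales, each working on its own block.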
OpenSHMEM One-Sided Operations
```c
#include <shmem.h>

static long data[1000];        /* Symmetric variable (exists on all PEs) */

int main(void) {
    long local_buf[1000] = {0}; /* Private buffer; only the remote target must be symmetric */
    shmem_init();

    /* PE 0 writes to PE 1's data array */
    if (shmem_my_pe() == 0)
        shmem_long_put(&data[100], local_buf, 50, 1);  /* Put 50 longs to PE 1 */

    shmem_barrier_all();       /* Ensure the put completes before anyone reads */

    /* PE 1 reads from PE 0's data array */
    if (shmem_my_pe() == 1)
        shmem_long_get(local_buf, &data[0], 100, 0);   /* Get 100 longs from PE 0 */

    shmem_finalize();
    return 0;
}
```
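With a typical OpenSHMEM installation this compiles with `oshcc` and runs with `oshrun -np 2 ./a.out` (launcher names vary by implementation). Note that neither transfer requires any action from the target PE; the barrier is what makes PE 0's put visible before PE 1 proceeds.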
Performance Considerations
| Access | Latency | Bandwidth |
|--------|---------|----------|
| Local memory | ~100 ns | ~200 GB/s (DDR5) |
| Remote (same rack, InfiniBand) | ~1-2 µs | ~25-50 GB/s |
| Remote (cross-rack) | ~5-10 µs | ~12-25 GB/s |
- Key optimization: data locality. Keep accesses local and minimize remote traffic (see the fast-path sketch after this list).
- PGAS advantage over DSM: the programmer explicitly knows what is local vs. remote.
- PGAS advantage over MPI: simpler syntax and one-sided operations (no matching receive needed).
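As an illustration of locality-aware PGAS code, here is a small sketch using OpenSHMEM's `shmem_ptr`, which returns a directly usable local pointer when the target PE's symmetric memory is reachable by plain load/store (e.g., on the same node) and NULL otherwise; `read_counter` is a hypothetical helper:

```c
#include <shmem.h>
#include <stddef.h>

static long counters[1024];   /* Symmetric: exists on every PE */

/* Hypothetical helper: prefer a plain load when the target PE's
   memory is directly addressable; otherwise fall back to a get. */
long read_counter(int pe, int idx) {
    long *p = shmem_ptr(counters, pe);  /* NULL if not directly accessible */
    if (p != NULL)
        return p[idx];                  /* Fast path: local/shared-memory load */
    long val;
    shmem_long_get(&val, &counters[idx], 1, pe);  /* Slow path: one-sided get */
    return val;
}
```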
Distributed shared memory and PGAS are the programming-model bridge between shared-memory simplicity and distributed-memory scalability. By providing a global address space abstraction over physically distributed memory, DSM and PGAS languages let parallel programmers write cleaner, more intuitive code for distributed systems while staying aware of data locality for performance, making them increasingly relevant for large-scale scientific computing and emerging memory architectures such as CXL-connected memory pools.