Distributed Shared Memory (DSM) and PGAS

Distributed Shared Memory (DSM) and the Partitioned Global Address Space (PGAS) model are programming abstractions that present a shared address space to processes running on physically separate machines, each with its own local memory. Programmers write parallel code with shared-memory semantics (reads, writes, pointers) while the runtime or hardware handles data movement between nodes. DSM hides that movement from the application; PGAS keeps the global address space but exposes which part of it is local to each process. Both aim to combine the ease of shared-memory programming with the scalability of distributed-memory systems.

DSM Concept

 Physical reality:              Programmer's view:
 [Node 0: Local RAM]            [Single Shared Address Space]
 [Node 1: Local RAM]            All nodes can read/write any address
 [Node 2: Local RAM]            Runtime handles data movement
 Connected by network            Transparent to application

DSM vs. Other Models

Model             Abstraction                    Communication                       Example
Shared memory     Global address space           Load/store                          OpenMP, pthreads
Message passing   Separate address spaces        Send/receive                        MPI
DSM               Virtual shared address space   Load/store (runtime moves data)     TreadMarks, IVY
PGAS              Partitioned shared space       Local fast, remote costlier and     UPC, OpenSHMEM, Chapel,
                                                 often explicit (get/put)            Co-array Fortran

Software DSM Implementation
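
One common way to build software DSM is page-based: each node maps the shared region through its virtual memory system, and touching a page the node does not hold raises a fault that the runtime services by fetching the page over the network. The C sketch below simulates that idea inside a single process, using a single-writer, multiple-reader invalidation protocol. It is an illustration under simplifying assumptions, not any particular system's implementation: the names (`Dsm`, `dsm_read`, `dsm_write`), the sizes, and the in-memory stand-in for the network are all hypothetical, and a real system would rely on page protection and messaging rather than array copies.

```c
#include <assert.h>
#include <string.h>

#define NODES      3
#define PAGES      4
#define PAGE_WORDS 8

/* Access rights a node currently holds for a page. */
typedef enum { INVALID, READ_ONLY, READ_WRITE } PageState;

typedef struct {
    PageState state[NODES][PAGES];            /* per-node access rights     */
    long      copy[NODES][PAGES][PAGE_WORDS]; /* per-node copy of each page */
    int       owner[PAGES];                   /* node with the latest copy  */
    int       faults;                         /* simulated page faults      */
} Dsm;

/* Start with node 0 owning every page read-write; all other copies invalid. */
void dsm_init(Dsm *d) {
    memset(d, 0, sizeof *d);                  /* INVALID == 0, owner == 0 */
    for (int p = 0; p < PAGES; p++) d->state[0][p] = READ_WRITE;
}

/* Stand-in for a network transfer: copy the page from its current owner. */
static void fetch_page(Dsm *d, int node, int page) {
    int o = d->owner[page];
    memcpy(d->copy[node][page], d->copy[o][page], sizeof d->copy[o][page]);
}

long dsm_read(Dsm *d, int node, int page, int word) {
    assert(node >= 0 && node < NODES && page >= 0 && page < PAGES);
    if (d->state[node][page] == INVALID) {           /* read fault */
        d->faults++;
        fetch_page(d, node, page);
        d->state[d->owner[page]][page] = READ_ONLY;  /* owner downgrades */
        d->state[node][page] = READ_ONLY;
    }
    return d->copy[node][page][word];
}

void dsm_write(Dsm *d, int node, int page, int word, long value) {
    assert(node >= 0 && node < NODES && page >= 0 && page < PAGES);
    if (d->state[node][page] != READ_WRITE) {        /* write fault */
        d->faults++;
        if (d->state[node][page] == INVALID) fetch_page(d, node, page);
        for (int n = 0; n < NODES; n++)              /* invalidate other copies */
            if (n != node) d->state[n][page] = INVALID;
        d->owner[page] = node;
        d->state[node][page] = READ_WRITE;
    }
    d->copy[node][page][word] = value;
}
```

Starting from node 0 owning every page, a read by node 1 costs one fault and downgrades node 0 to read-only; a later write by node 2 costs another fault and invalidates both other copies. Every fault in this model stands for at least one network round trip, which is why page-based DSM is sensitive to access patterns.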

PGAS (Partitioned Global Address Space)

  Node 0 memory    Node 1 memory    Node 2 memory
  [LOCAL | REMOTE] [LOCAL | REMOTE] [LOCAL | REMOTE]
   ↑ fast   ↑ slow  ↑ fast   ↑ slow  ↑ fast   ↑ slow

  Each thread has fast LOCAL access + slower REMOTE access
  Programmer controls data placement for performance

PGAS Languages and Libraries

Language                   Developer          Key Feature
UPC (Unified Parallel C)   UC Berkeley        C extension, shared arrays
Co-array Fortran           Standard (F2008)   Square bracket syntax for remote access
Chapel                     Cray/HPE           High-level, productive, domain maps
X10                        IBM                Place-based, async activities
OpenSHMEM                  Consortium         C/Fortran library, one-sided comms

Chapel Example

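use BlockDist;

// Placeholder for user code; any real -> real function works here
proc compute(x: real): real { return 2.0 * x; }

// Block-distributed domain: each locale (node) owns a contiguous chunk
// (blockDist is the Chapel 2.0 name; older releases call it Block)
const D = blockDist.createDomain({1..1000000});
var A: [D] real;

// Access any element with simple indexing:
A[500000] = 3.14;  // Local or remote; Chapel handles the communication

// Parallel loop: each iteration runs on the locale that owns index i
forall i in D do
    A[i] = compute(A[i]);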
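
Whether an access such as `A[500000]` is local depends on which node owns that index. The C sketch below shows one plausible way to map indices of a block-distributed 1-D array to owners, splitting `1..n` into contiguous chunks whose sizes differ by at most one element. The function names `block_owner` and `block_lo` are hypothetical, and the formula is an assumption chosen for illustration rather than a statement of what the Chapel runtime does internally.

```c
#include <assert.h>

/* Owner (0-based locale id) of 1-based index i when indices 1..n are
   split into num_locales contiguous chunks of near-equal size. */
int block_owner(long long i, long long n, int num_locales) {
    assert(n > 0 && num_locales > 0 && i >= 1 && i <= n);
    return (int)(((i - 1) * num_locales) / n);
}

/* First 1-based index owned by locale k under the same mapping:
   ceil(k * n / num_locales) + 1. */
long long block_lo(int k, long long n, int num_locales) {
    assert(n > 0 && num_locales > 0 && k >= 0 && k < num_locales);
    return (k * n + num_locales - 1) / num_locales + 1;
}

/* Nonzero when index i lives on the locale identified by `here`. */
int is_local(long long i, long long n, int num_locales, int here) {
    return block_owner(i, n, num_locales) == here;
}
```

Under this mapping with four locales, `block_owner(500000, 1000000, 4)` returns 1, so the assignment to `A[500000]` would be a local store only when it executes on locale 1 and would need communication anywhere else.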

OpenSHMEM One-Sided Operations

#include <shmem.h>

static long data[1000];  // Symmetric variable (exists on all PEs)

// Run with at least 2 PEs
int main(void) {
    long local_buf[100] = {0};  // Private buffer, local to each PE

    shmem_init();
    int me = shmem_my_pe();

    // PE 0 writes 50 longs into data[100..149] on PE 1
    if (me == 0) {
        shmem_long_put(&data[100], local_buf, 50, 1);
    }
    shmem_barrier_all();  // Completes outstanding puts and synchronizes all PEs

    // PE 1 reads data[0..99] from PE 0
    if (me == 1) {
        shmem_long_get(local_buf, &data[0], 100, 0);
    }

    shmem_finalize();
    return 0;
}
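
In the put and get calls above, `&data[100]` and `&data[0]` are local addresses that name the same offset inside another PE's copy of the symmetric array. The C sketch below models that addressing in a single process, with no OpenSHMEM library involved: each simulated PE has its own instance of the array, and a put or get translates the caller's address to the matching offset on the target. All names here (`sym_data`, `sym_translate`, `sim_long_put`, `sim_long_get`) are hypothetical, and a real implementation moves the data over the network instead of calling `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NPES   2
#define NELEMS 1000

/* One instance of the symmetric array per simulated PE. */
long sym_data[NPES][NELEMS];

/* Map an address inside from_pe's instance to the same offset inside
   to_pe's instance, checking that n elements fit. */
long *sym_translate(const long *addr, size_t n, int from_pe, int to_pe) {
    assert(from_pe >= 0 && from_pe < NPES && to_pe >= 0 && to_pe < NPES);
    ptrdiff_t off = addr - sym_data[from_pe];
    assert(off >= 0 && (size_t)off + n <= NELEMS);
    return sym_data[to_pe] + off;
}

/* Simulated one-sided put: PE `me` copies n longs from its private buffer
   `src` into target_pe's instance at the offset named by `dest`. */
void sim_long_put(int me, long *dest, const long *src, size_t n, int target_pe) {
    memcpy(sym_translate(dest, n, me, target_pe), src, n * sizeof *src);
}

/* Simulated one-sided get: PE `me` copies n longs from source_pe's instance,
   at the offset named by `src`, into its private buffer `dest`. */
void sim_long_get(int me, long *dest, const long *src, size_t n, int source_pe) {
    memcpy(dest, sym_translate(src, n, me, source_pe), n * sizeof *dest);
}
```

The target PE takes no part in either call, which is what "one-sided" means: only the initiator names the data and the direction of the transfer.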

Performance Considerations

Access                           Latency     Bandwidth
Local memory                     ~100 ns     ~200 GB/s (DDR5)
Remote (same rack, InfiniBand)   ~1-2 µs     ~25-50 GB/s
Remote (cross-rack)              ~5-10 µs    ~12-25 GB/s
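
These figures explain why PGAS code tends to batch remote accesses. A rough first-order model, assumed here for illustration, charges each transfer a fixed latency plus its size divided by bandwidth. The C sketch below encodes that model; the function names are hypothetical, and the model ignores overlap, pipelining, and contention, so its results are estimates rather than measurements.

```c
#include <assert.h>

/* Estimated time in microseconds for one transfer of `bytes`:
   fixed latency plus bytes / bandwidth. */
double transfer_us(double latency_us, double bandwidth_gb_s, double bytes) {
    assert(latency_us >= 0.0 && bandwidth_gb_s > 0.0 && bytes >= 0.0);
    /* 1 GB/s = 1e9 bytes/s = 1e3 bytes per microsecond */
    return latency_us + bytes / (bandwidth_gb_s * 1e3);
}

/* Estimated time for `count` transfers of `bytes_each`, issued one after
   another with no overlap. */
double many_transfers_us(double latency_us, double bandwidth_gb_s,
                         double bytes_each, long count) {
    assert(count >= 0);
    return (double)count * transfer_us(latency_us, bandwidth_gb_s, bytes_each);
}
```

Taking 1.5 µs and 25 GB/s from the same-rack row, the model puts 1,000 separate 8-byte gets at roughly 1,500 µs (`many_transfers_us(1.5, 25.0, 8.0, 1000)`) and a single 8,000-byte get at about 1.8 µs (`transfer_us(1.5, 25.0, 8000.0)`). Latency dominates small transfers, so aggregating them matters far more than raw bandwidth.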

DSM and PGAS bridge shared-memory simplicity and distributed-memory scalability. By layering a global address space over physically distributed memory, they let programmers write cleaner, more intuitive code for distributed systems while staying aware of data locality for performance. That combination keeps them relevant for large-scale scientific computing and for emerging memory architectures such as CXL-connected memory pools.
