Distributed Memory Programming and Domain Decomposition is the parallel computing methodology in which a large computational domain is partitioned into subdomains, each processed by a separate MPI rank in its own memory space, with explicit message passing to exchange boundary data (ghost cells / halo regions) between neighboring subdomains. It is the foundational approach for scaling scientific simulations (fluid dynamics, molecular dynamics, climate models) across thousands of compute nodes. Domain decomposition transforms a single large problem that would not fit in one machine's memory into a distributed problem that can grow with the number of nodes available.
Why Distributed Memory (Not Shared Memory)?
- Shared memory (OpenMP): Scales to ~100 cores on a single node → limited.
- Distributed memory (MPI): Scales to 10,000+ nodes → petaflop-class computation.
- Memory wall: A 10-terabyte simulation domain cannot fit in one node's RAM → must distribute.
- MPI model: Each process has its own private memory → no automatic data sharing → explicit messages (see the minimal sketch below).
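To make the explicit-message model concrete, here is a minimal sketch (an illustration, not taken from the text above) of two ranks exchanging a small buffer with MPI_Send/MPI_Recv; the buffer size and tag are arbitrary example values.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buf[4] = {0};
    if (rank == 0) {
        for (int i = 0; i < 4; i++) buf[i] = i;   /* data owned by rank 0 */
        MPI_Send(buf, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Nothing is shared automatically: rank 1 only sees the data
           once it explicitly receives the message. */
        MPI_Recv(buf, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %.0f %.0f %.0f %.0f\n",
               buf[0], buf[1], buf[2], buf[3]);
    }

    MPI_Finalize();
    return 0;
}
```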
Domain Decomposition
- Divide the simulation domain (e.g., 3D grid, graph, mesh) into P subdomains (P = number of MPI ranks).
- Each subdomain assigned to one MPI rank → owned by that process's memory.
- Goal: Minimize communication (boundary data exchange) while balancing computation load.
1D, 2D, 3D Decomposition
| Decomposition | Communication Partners | Surface-to-Volume Ratio |
|--------------|----------------------|------------------------|
| 1D (slab) | 2 neighbors | High (large surfaces) |
| 2D (pencil) | 4 neighbors | Medium |
| 3D (cube) | 6 neighbors | Lowest (best scalability) |
- 3D decomposition scales best: Each rank's computation scales with its volume, N/P, while its communication scales with its surface, (N/P)^(2/3), so the communication-to-computation ratio grows only as P^(1/3) (versus P^(1/2) for 2D pencils and P for 1D slabs); see the Cartesian-topology sketch below.
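A common way to set up such a decomposition is MPI's Cartesian topology support. The sketch below (illustrative, not from the text above) builds a 3D process grid and looks up the six face neighbors each rank will exchange halos with.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Let MPI factor P into a balanced 3D process grid dims[0] x dims[1] x dims[2]. */
    int dims[3] = {0, 0, 0};
    MPI_Dims_create(size, 3, dims);

    /* Non-periodic 3D Cartesian communicator; allow MPI to reorder ranks. */
    int periods[3] = {0, 0, 0};
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1, &cart);

    /* This rank's coordinates and its six face neighbors
       (MPI_PROC_NULL where the subdomain touches the domain edge). */
    int rank, coords[3], nbr_lo[3], nbr_hi[3];
    MPI_Comm_rank(cart, &rank);
    MPI_Cart_coords(cart, rank, 3, coords);
    for (int d = 0; d < 3; d++)
        MPI_Cart_shift(cart, d, 1, &nbr_lo[d], &nbr_hi[d]);

    printf("rank %d at (%d,%d,%d) in a %dx%dx%d grid, x-neighbors %d/%d\n",
           rank, coords[0], coords[1], coords[2],
           dims[0], dims[1], dims[2], nbr_lo[0], nbr_hi[0]);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```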
Ghost Cells (Halo Regions)
- Each subdomain needs boundary data from neighboring subdomains to compute stencil operations (finite difference, finite element).
- Ghost cells: Extra rows/columns/layers at the subdomain boundary → filled with copies of neighbor data (see the storage sketch below).
- Halo width: Determined by the stencil width (nearest-neighbor / 5-point stencil → 1-cell halo; higher-order stencils → wider halo).
- Halo exchange: MPI sends/receives boundary data to/from each neighbor → fill ghost cells → then compute the interior.
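As an illustration of the storage layout (sizes and names are assumptions, not from the text above): a rank owning an nx×ny patch with a width-1 halo typically allocates (nx+2)×(ny+2) cells, with indices 1..nx and 1..ny as the owned interior and index 0 / nx+1 (resp. ny+1) as ghost cells.

```c
#include <stdlib.h>

/* Local patch: nx x ny owned cells plus a width-1 halo on every side.
   Index (i, j) with i in [0, nx+1], j in [0, ny+1]; the owned interior is
   i in [1, nx], j in [1, ny]; the outermost layer holds ghost cells. */
typedef struct {
    int nx, ny;      /* owned cells per direction (halo excluded)  */
    double *cells;   /* (nx + 2) * (ny + 2) values, row-major      */
} Patch;

static inline double *cell(Patch *p, int i, int j) {
    return &p->cells[(size_t)i * (p->ny + 2) + j];
}

Patch patch_alloc(int nx, int ny) {
    Patch p = { nx, ny,
                calloc((size_t)(nx + 2) * (ny + 2), sizeof(double)) };
    return p;
}
```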
Halo Exchange Pattern
```
MPI Rank 0:                        MPI Rank 1:
+------------+-------+             +-------+------------+
|   owned    | ghost |             | ghost |   owned    |
|   data     | cells |             | cells |   data     |
+------------+-------+             +-------+------------+
                  ^                    ^
                  |                    |
      MPI Send/Recv: each rank's owned boundary data
      fills the neighboring rank's ghost cells
```
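The exchange in this diagram can be written as a pair of MPI_Sendrecv() calls. The sketch below assumes a 1D decomposition with a width-1 halo, where u[0] and u[n+1] are the ghost cells and left/right are the neighbor ranks (e.g., from MPI_Cart_shift, or MPI_PROC_NULL at the domain edge, which turns that side into a no-op).

```c
#include <mpi.h>

/* One step of a width-1 halo exchange on a 1D domain decomposition.
   u[0] and u[n+1] are ghost cells; u[1..n] are owned (assumed layout). */
void halo_exchange_1d(double *u, int n, int left, int right, MPI_Comm comm) {
    /* Send the rightmost owned cell to the right neighbor;
       receive the left neighbor's boundary into the left ghost. */
    MPI_Sendrecv(&u[n], 1, MPI_DOUBLE, right, 0,
                 &u[0], 1, MPI_DOUBLE, left,  0,
                 comm, MPI_STATUS_IGNORE);

    /* Send the leftmost owned cell to the left neighbor;
       receive the right neighbor's boundary into the right ghost. */
    MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  1,
                 &u[n + 1], 1, MPI_DOUBLE, right, 1,
                 comm, MPI_STATUS_IGNORE);
}
```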
MPI Communication Patterns
- MPI_Sendrecv(): Send to one neighbor and receive from the other in a single call → deadlock-free exchange.
- MPI_Isend()/MPI_Irecv(): Non-blocking → overlap communication with computation on interior cells.
- MPI_Waitall(): Wait for all non-blocking communications to complete before using the ghost data.
- Optimized: Start halo exchange → compute interior (cells away from the boundary) → wait for halos → compute boundary cells (see the sketch below).
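A sketch of that optimized pattern for the same 1D layout as above; compute_cells stands in for the application's stencil kernel and, like the other names, is a hypothetical placeholder.

```c
#include <mpi.h>

/* Overlap halo exchange with interior computation (width-1 halo, 1D).
   u[0] and u[n+1] are ghosts; u[1..n] are owned (assumed layout). */
void step_with_overlap(double *u, int n, int left, int right, MPI_Comm comm,
                       void (*compute_cells)(double *u, int lo, int hi)) {
    MPI_Request reqs[4];

    /* 1. Post receives into the ghost cells and sends of the owned boundary. */
    MPI_Irecv(&u[0],     1, MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Irecv(&u[n + 1], 1, MPI_DOUBLE, right, 1, comm, &reqs[1]);
    MPI_Isend(&u[1],     1, MPI_DOUBLE, left,  1, comm, &reqs[2]);
    MPI_Isend(&u[n],     1, MPI_DOUBLE, right, 0, comm, &reqs[3]);

    /* 2. Compute interior cells that do not depend on the halo. */
    compute_cells(u, 2, n - 1);

    /* 3. Wait for the halo data, then compute the two boundary cells. */
    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
    compute_cells(u, 1, 1);
    compute_cells(u, n, n);
}
```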
Load Balancing
- Static: Divide the domain equally → works for uniform computation (structured grids); a block-partition sketch follows this list.
- Dynamic: Some subdomains have more work (localized physics events, adaptive mesh refinement) → rebalance at runtime.
- Dynamic load balancing: Periodic remapping → METIS/ParMETIS graph partitioning → minimize cut edges → minimize communication.
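For the static case, a common convention (shown here as an illustrative sketch, not prescribed by the text above) splits N cells so that the first N mod P ranks receive one extra cell, keeping per-rank loads within one cell of each other.

```c
/* Owned index range [lo, hi) of `rank` when n cells are split across p ranks.
   The first n % p ranks receive one extra cell, so sizes differ by at most 1. */
void block_range(long n, int p, int rank, long *lo, long *hi) {
    long base = n / p;        /* minimum cells per rank        */
    long rem  = n % p;        /* ranks that get one extra cell */
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);
}
```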
Applications of Domain Decomposition
| Application | Domain Type | Decomposition |
|------------|------------|---------------|
| Weather/climate models | 3D atmosphere grid | 2D horizontal (or 3D) |
| Molecular dynamics (LAMMPS) | Particle positions | 3D spatial cube |
| Finite element/volume analysis (ANSYS, OpenFOAM) | Unstructured mesh | Graph partitioning |
| Turbulence simulation (DNS) | 3D Cartesian grid | Pencil (2D) |
| Lattice Boltzmann | 3D grid | 3D block |
Scalability Analysis
- Strong scaling: Fixed problem size, increasing P → communication fraction increases → efficiency drops.
- Weak scaling: Problem size grows with P → communication fraction stays roughly constant → near-ideal scaling.
- Amdahl serial fraction: Even 1% serial code → maximum speedup of 100× → limits strong scaling.
- Halo-to-interior ratio: As P increases, each rank's subdomain shrinks → the halo fraction grows → communication dominates → limits strong scaling (see the estimate below).
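A back-of-the-envelope illustration of that last point, assuming a cubic global grid of n^3 cells split into P cubic subdomains with a width-1 halo (the numbers are illustrative, not from the text above):

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    const double n = 1024.0;                      /* global grid: n^3 cells (assumed) */
    const int procs[] = {64, 512, 4096, 32768};

    for (int i = 0; i < 4; i++) {
        double local    = cbrt(n * n * n / procs[i]); /* local cube edge length   */
        double interior = local * local * local;      /* cells computed per rank  */
        double halo     = 6.0 * local * local;        /* cells exchanged per rank */
        printf("P = %6d: local edge = %6.1f, halo/interior = %.3f\n",
               procs[i], local, halo / interior);
    }
    return 0;   /* the ratio grows as P^(1/3): communication slowly takes over */
}
```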
Distributed memory programming with domain decomposition is the engine of scientific discovery at planetary scale, enabling climate simulations that model every square kilometer of Earth's atmosphere, molecular dynamics simulations with billions of atoms, and turbulence studies at Reynolds numbers unreachable on any smaller system. These techniques transform the impossible into the merely expensive, making large-scale distributed memory programming one of the most consequential engineering disciplines in modern computational science and engineering.