On 3/16/21 6:30 AM, JCampbell wrote:
> I also have no experience with climate modeling, but have been using OpenMP to improve computing performance in structural finite element analysis.
> My reason for this is that it is a simpler approach involving a single executable, without the complexity of distributed memory or multiple processors.[...]
It is a common misconception that shared-memory programming is "simpler"
than distributed-memory programming. Which model seems simpler is largely
a matter of perspective. For a wide class of problems, single-program
multiple-data (SPMD) distributed-memory programming is the simplest and
most straightforward approach, especially when trying to scale beyond the
small 8-to-16-processor range.
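For concreteness, here is roughly what that SPMD style looks like with
MPI: every rank runs the same program on its own block of data and
communicates explicitly. This is only a minimal sketch (it assumes the
mpi_f08 module; the names and array sizes are arbitrary):

   program spmd_demo
      use mpi_f08
      implicit none
      integer :: me, np, n
      real, allocatable :: x(:)
      real :: local_sum, total
      call MPI_Init()
      call MPI_Comm_rank(MPI_COMM_WORLD, me)
      call MPI_Comm_size(MPI_COMM_WORLD, np)
      n = 1000                    ! each rank owns and works on its own block
      allocate(x(n))
      x = real(me + 1)
      local_sum = sum(x)          ! purely local computation
      call MPI_Allreduce(local_sum, total, 1, MPI_REAL, MPI_SUM, MPI_COMM_WORLD)
      if (me == 0) print *, 'global sum over', np, 'ranks =', total
      call MPI_Finalize()
   end program spmd_demo
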
Message-passing libraries were developed in the late 1980s, when parallel
computing first became popular. Their main advantage then was that no
compiler or language modifications were required, and thus they
were portable to a wide range of underlying hardware. MPI was initially
a collection of the features of these early libraries. It has been around
for almost 30 years now, which is why application areas such as climate
modeling, fluid dynamics, and quantum chemistry adopted it early and have
continued to use it. Higher-level libraries, such as ScaLAPACK and the
Global Arrays library, have been developed on top of MPI.
At that time it was not uncommon to see SPMD programs running on
shared-memory machines that outperformed the compiler-driven
shared-memory programs. The SPMD model bypasses many performance
bottlenecks associated with memory bandwidth and cache coherence. And of
course, the shared-memory programs of that era were tied to specific
compilers on specific hardware, whereas the MPI SPMD programs were
portable across a much wider range of compiler and hardware combinations.
In the late 1980s and early 1990s there was a wide variety of compilers,
operating systems, network interconnects, and CPU architectures in use.
OpenMP is an attempt to address those portability issues for the
shared-memory model, but it is still a shared-memory model, so it is
inherently limited in how far it can scale.
I view coarrays as another way to write SPMD programs, one that fits
within the same application domain as MPI and Global Arrays. I know there are
shared-memory implementations of coarrays, but I have not used those. I
first used coarray fortran in 2004. At that time it was just a Cray
compiler feature, so it was not portable. Now it is part of the fortran
language standard, which makes it more portable in some ways, but less
so in others. The problem, I think, is that MPI itself is pretty easy to
use in most applications, and it works with mixed-language fortran-C
applications, so coarrays have made only slow progress over the last
decade. But at the same time, MPI has not kept up with modern fortran
practices (scope hierarchy, allocatable arrays, pointers, etc.), so that
makes coarray programming more appealing.
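To make the comparison concrete, here is a rough sketch of the same kind
of SPMD reduction written with coarrays; it is only illustrative, but it
shows how naturally coarrays coexist with ordinary allocatable arrays and
other modern fortran features:

   program caf_demo
      implicit none
      integer :: me, np, i
      real, allocatable :: x(:)
      real :: local_sum
      real :: total[*]            ! coarray: one copy of total on every image
      me = this_image()
      np = num_images()
      allocate(x(1000))           ! ordinary allocatable, local to each image
      x = real(me)
      local_sum = sum(x)          ! purely local computation
      total = local_sum
      sync all                    ! make every image's partial sum visible
      if (me == 1) then
         do i = 2, np
            total = total + total[i]   ! image 1 pulls the remote partial sums
         end do
         print *, 'global sum =', total
      end if
   end program caf_demo

(In fortran 2018 the explicit loop can be replaced by the co_sum
collective, which is closer in spirit to MPI_Allreduce.)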
I personally would have preferred that MPI keep up with the fortran
language over the last 20 years, rather than introducing coarrays as a
new programming layer into the language. What we have now is kind of the
worst possible situation for fortran: for parallel scaling we need an
MPI-based distributed-memory approach, but for maximum performance on the
shared-memory multicore nodes we also need some combination of coarrays
and OpenMP. That is a difficult programming model. Maybe coarray
implementations can somehow grow at both ends and bridge that gap,
allowing million-node SPMD programs and efficient use of shared-memory
nodes.
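To illustrate the kind of layering I mean: MPI across the nodes, OpenMP
(or coarrays) across the cores within each node. A hypothetical toy
sketch, assuming an MPI library that supports at least
MPI_THREAD_FUNNELED:

   program hybrid_demo
      use mpi_f08
      implicit none
      integer :: me, provided, i, n
      real, allocatable :: x(:)
      real :: local_sum, total
      ! MPI handles the distributed-memory level (one or a few ranks per node)
      call MPI_Init_thread(MPI_THREAD_FUNNELED, provided)
      call MPI_Comm_rank(MPI_COMM_WORLD, me)
      n = 1000000
      allocate(x(n))
      x = 1.0
      ! OpenMP handles the shared-memory cores within each rank
      local_sum = 0.0
      !$omp parallel do reduction(+:local_sum)
      do i = 1, n
         local_sum = local_sum + x(i)
      end do
      !$omp end parallel do
      ! MPI combines the per-rank results across the whole machine
      call MPI_Allreduce(local_sum, total, 1, MPI_REAL, MPI_SUM, MPI_COMM_WORLD)
      if (me == 0) print *, 'global sum =', total
      call MPI_Finalize()
   end program hybrid_demo
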
$.02 -Ron Shepard