On Thursday, April 28, 2016 at 5:13:54 AM UTC-7, Alberto Ramos wrote:
> Hi everybody,
>
> FORTRAN coarrays are supposed to be able to substitute both MPI and openMP.
As others have noted, coarrays could substitute for many aspects of MPI, but they
can also be mixed harmoniously with MPI. Cray's documentation says this explicitly.
Intel's documentation was silent on the matter when I first looked several years ago,
but I would imagine that has changed.
> For me it is very clear how coarrays imitate a typical MPI code.
Hmm... in principle, I can imagine why that would seem clear. In practice, not so
much: a "typical" MPI code uses two-sided communication; whereas a code based
on MPI-3 one-sided communication would be a much closer comparison to
coarrays. Most application developers with whom I've talked, however, had either
not heard about one-sided MPI or had investigated it and found the switch from
two-sided to one-sided communication too time-consuming and troublesome to
make the transition. And that transition will become increasingly necessary to
take advantage of the latest interconnects. This is where it's time to finally
relegate MPI to its originally intended purpose as the assembly language of
parallel programming. Let the compiler or parallel runtime library generate
the one-sided MPI-3.
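To make the contrast concrete, here's a minimal sketch of one-sided communication in coarray syntax. In two-sided MPI, a ring exchange needs matched MPI_Send/MPI_Recv calls on both sides; in coarray Fortran, one image writes directly into its neighbor's memory and the neighbor executes no communication call at all. The names (sendbuf, recvbuf, the ring topology) are mine for illustration, not from the original post:

```fortran
program onesided_sketch
  implicit none
  real :: sendbuf(8)                   ! ordinary local array
  real :: recvbuf(8)[*]                ! coarray: one copy per image
  integer :: right

  right = this_image() + 1
  if (right > num_images()) right = 1  ! ring topology

  sendbuf = real(this_image())
  sync all                             ! everyone ready before the puts
  recvbuf(:)[right] = sendbuf(:)       ! one-sided "put": the target image
                                       ! takes no part in the transfer
  sync all                             ! after this, the puts are visible
end program onesided_sketch
```

With OpenCoarrays this would build and run as something like `caf onesided_sketch.f90 && cafrun -n 4 ./a.out`; the compiler and runtime decide whether the put becomes an MPI-3 one-sided operation, shared-memory copy, or something else.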
> But for the case of openMP the situation is not that clear.
>
This owes at least partly to the fact that we often closely associate a language
or programming model with a specific programming paradigm -- probably
because of the way in which the developers of the model or language
describe their intentions. The first sentence of the OpenMP standard's Execution
Model section states, "The OpenMP API uses the fork-join model of parallel
execution." By contrast, section 1.1 of the Fortran 2008 standard states,
"Coarrays and synchronization constructs support parallel programming using
a single program multiple data (SPMD) model."
Fork/Join and SPMD might at first seem orthogonal, but I recall seeing a couple
of fascinating journal articles from the early 2000s in which the authors describe
an approach that involves forking all threads at the beginning of execution and
only joining them at the end of execution. In that approach, an executing OpenMP
program resembles in many ways an SPMD coarray program except that the
OpenMP program uses logically shared memory (which doesn't necessarily have
to imply physically shared memory but burdens the hardware with maintaining
cache coherency and such), while the coarray program uses logically distributed
memory (which could be physically shared memory and the coarray implementation
might exploit that fact where possible).
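A sketch of the fork-once-at-startup style those articles describe, with the coarray analogues noted in comments (the program itself is my illustration, not code from the articles):

```fortran
program spmd_openmp_sketch
  use omp_lib
  implicit none
  integer :: me, nthreads

  ! One parallel region spans the whole computation, so threads fork
  ! once and join only at program end -- like coarray images, which
  ! exist for the life of the program.
  !$omp parallel private(me) shared(nthreads)
    me = omp_get_thread_num()            ! analogous to this_image() - 1
    !$omp single
      nthreads = omp_get_num_threads()   ! analogous to num_images()
    !$omp end single
    ! ... the entire algorithm runs here, indexed by "me" ...
    print *, 'thread', me, 'of', nthreads
  !$omp end parallel
end program spmd_openmp_sketch
```

The remaining difference is exactly the one above: every thread here sees one logically shared address space, whereas each coarray image owns its data and remote access is explicit in the syntax.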
> My question is the following is there a way to tell that several coarrays
> share the same variables (space in memory)?
No.
>
> The idea is that a loop
>
> do i = 1, N
> a(i) = b(i) + c(i)
> enddo
>
> Can be split into different images *without* having to transfer data between
> coarrays.
I'm not sure I'm understanding, but because you're only showing one loop, are
you imagining a fork/join execution model? Coarray programs cannot
fork new images after program start-up. If the data is created in a distributed
manner, then no transfer is necessary in the above example -- although I'm
inferring a lot because your syntax doesn't explicitly show any coarrays.
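Here's what I'm inferring, as a sketch: if a, b, and c are declared as coarrays, each image owns its own N-element slice, and the loop runs entirely on local data with no inter-image transfer. The declarations and sizes are my guesses at what was intended:

```fortran
program local_loop_sketch
  implicit none
  integer, parameter :: n = 1000     ! elements *per image*
  real :: a(n)[*], b(n)[*], c(n)[*]  ! each image owns one slice
  integer :: i

  b = 1.0                            ! each image fills its own slice
  c = 2.0

  do i = 1, n
     a(i) = b(i) + c(i)   ! purely local: a(i) means this image's a(i);
  end do                  ! no square brackets, so no communication
end program local_loop_sketch
```

The rule of thumb: communication happens only where square brackets appear, so a loop like this one is guaranteed transfer-free.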
>
> I guess the answer to the previous question is not. In that case, one could
> still transfer the data between images, but one would like these images not
> to be on different nodes across a network (this would make the code
> terribly inefficient). Is there a way to know which images are on the same
> node in a run?
The standard doesn't guarantee this, but Bob Numrich, the co-inventor of
coarrays, envisioned that a program might use multiple codimensions to
describe locality. The first codimension could span the cores on a chip.
The second codimension could span the chips in a node. And the third
codimension could span across nodes. See slide 13 of the following
presentation:
http://tinyurl.com/z56f4ve.
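A sketch of what such a declaration might look like; the codimension bounds (8 cores per chip, 2 chips per node) are made-up numbers, not anything the standard or a vendor prescribes:

```fortran
program codim_sketch
  implicit none
  ! cosubscript 1: core within a chip  (1..8)
  ! cosubscript 2: chip within a node  (1..2)
  ! cosubscript 3: node (upper bound left open via *)
  real :: a(100)[8, 2, *]
  integer :: me(3)

  me = this_image(a)   ! this image's (core, chip, node) cosubscripts
  print *, 'core', me(1), 'chip', me(2), 'node', me(3)
end program codim_sketch
```

Nothing in the language ties those cosubscripts to actual hardware; the hope is that the implementation maps them that way.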
At least one vendor has indicated to me that such a mapping is likely to
be accurate even though they don't guarantee it. I like the idea, but I'm
uncertain about the extent to which I want to get involved with describing
the hardware layout in my code. Looking at where things are headed with
many-core processors, do I also want to design my data structures in a
way that somehow reflects the 2D mesh arrangement of the 36 tiles,
each containing 2 cores, on a single processor? (See slide 4 of
the following presentation:
http://tinyurl.com/j3saxhg) I hope not, but
then again, we might have no choice if we want performance. I really
don't know the answer.
>
> In more general terms: Does anyone have concrete working examples of an
> approach equivalent to a mix of MPI/openMP but using only coarray fortran?
Would papers suffice or are you looking for source code? There are several
papers that report on mixed coarray/OpenMP code.
If you're starting from scratch and your use for OpenMP is multi-threading,
however, you might consider Pthreads instead of OpenMP. Even the recent
OpenMP 4.0 standard explicitly states that it does not support numerous
Fortran 2003 features, including type extension, type-bound procedures,
polymorphic entities, defined operators, parameterized derived types, etc.
I would imagine that some of these features might work anyway, but I'd hate
to open that can of worms. If inserting OpenMP really does restrict the code
that much, then it will greatly limit the ability to take advantage of lots of
newer language features.
If your use for OpenMP is vectorization, I wonder how much
DO CONCURRENT would help. But before going down this path at all,
it's worth seeing if you can get the performance you desire by simply
running coarray Fortran with one image per core. If that's the case, then
anything more sophisticated could wait until the compilers and parallel
runtime libraries are sophisticated enough to map images to threads.
I'm fairly certain this day will come. We won't necessarily work on it
anytime soon in OpenCoarrays, but it's certainly on our long-term
roadmap to evaluate someday.
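For concreteness, here's the original loop rewritten with DO CONCURRENT, which asserts that the iterations are independent so the compiler may vectorize or thread them without directives. This is standard Fortran 2008, no coarrays or OpenMP involved:

```fortran
program do_concurrent_sketch
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n), c(n)
  integer :: i

  b = 1.0
  c = 2.0
  do concurrent (i = 1:n)   ! iterations declared independent
     a(i) = b(i) + c(i)
  end do
  print *, a(1), a(n)       ! both are 3.0
end program do_concurrent_sketch
```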
Damian