
COARRAY and openMP


Alberto Ramos

Apr 28, 2016, 8:13:54 AM

Hi everybody,

Fortran coarrays are supposed to be able to substitute for both MPI and OpenMP.
To me it is very clear how coarrays imitate a typical MPI code, but for the
case of OpenMP the situation is not as clear.

My question is the following: is there a way to tell that several coarrays
share the same variables (the same space in memory)?

The idea is that a loop

do i = 1, N
   a(i) = b(i) + c(i)
end do

can be split across different images *without* having to transfer data between
coarrays.

I guess the answer to the previous question is no. In that case, one could
still transfer the data between images, but one would like these images not
to be on different nodes across a network (this would make the code
terribly inefficient). Is there a way to know which images are on the same
node in a run?

In more general terms: does anyone have concrete working examples of an
approach equivalent to a mix of MPI/OpenMP but using only coarray Fortran?

Many thanks,

A.

vladim...@gmail.com

Apr 28, 2016, 10:08:03 AM

> Fortran coarrays are supposed to be able to substitute for both MPI and OpenMP.

By whom? Do you have a reference?

> My question is the following: is there a way to tell that several coarrays
> share the same variables (the same space in memory)?

Coarrays are a distributed-memory concept (somewhat like MPI). Theoretically you could mix coarrays and OpenMP threads, because the two are independent. I don't know whether any compiler supports that, though.
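A minimal, hypothetical sketch of such a mix (assuming a compiler and runtime that support both coarrays and OpenMP; nothing here is guaranteed by the Fortran standard):

```fortran
! Hypothetical sketch: OpenMP threading inside each coarray image.
! Support for mixing the two is implementation-dependent and is not
! defined by the Fortran standard.
program hybrid_sketch
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n), c(n)   ! ordinary (non-coarray) local arrays
  integer :: i
  b = 1.0
  c = 2.0
  !$omp parallel do
  do i = 1, n
     a(i) = b(i) + c(i)      ! threaded work on this image's own data
  end do
  !$omp end parallel do
  print *, 'image', this_image(), 'finished, a(1) =', a(1)
end program hybrid_sketch
```

Each image would run its own thread team over purely local data; no inter-image communication happens in this fragment.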

Vladimir

Anton Shterenlikht

Apr 28, 2016, 10:59:42 AM
Alberto Ramos <albert...@desy.de> writes:

>Fortran coarrays are supposed to be able to substitute for both MPI and OpenMP.

It might sound pedantic, but I think it's a very
important point. Coarrays and their behaviour are
defined in the Fortran standard. The Fortran standard
says nothing on OpenMP and nothing (in the normative text)
on MPI. Coarray usage must conform to the standard.
If one is then able to use, in addition to coarrays,
MPI or OpenMP or both, in a way that does not violate
the standard [1], then this is fine.
However, the behaviour can only be expected, not guaranteed.
The standard cannot say what is or is not correct
behaviour of such a program.

>My question is the following: is there a way to tell that several coarrays
>share the same variables (the same space in memory)?

The standard does not use the term "share" when
dealing with coarrays. But in broad terms one
can say that coarray variables definitely do not
share storage.

>still transfer the data between images, but one would like these images not
>to be on different nodes across a network (this would make the code
>terribly inefficient). Is there a way to know which images are on the same
>node in a run?

Again, the standard says nothing on this.
This is probably entirely OS and runtime
environment dependent. There might or might
not be compiler flags to influence this.
There might or might not be switches
for the batch queue affecting this.
I don't know of any.

>In more general terms: Does anyone have concrete working examples of an
>approach equivalent to a mix of MPI/openMP but using only coarray fortran?

ECMWF codes use MPI+OpenMP+coarrays [1].
We have a prototype MPI+coarrays code [2], but no OpenMP.

Anton

[1] Mozdzynski G. et al (2015) Int J. High Perf Comp 29:261-273.
DOI: 10.1177/1094342015576773

[2] https://sourceforge.net/p/parafem/code/HEAD/tree/trunk/parafem/src/programs/dev/xx14/
Also p. 13 in http://eis.bris.ac.uk/~mexas/pub/2015acmff.pdf





>many thanks,

>A.

michael siehl

Apr 28, 2016, 11:14:41 AM
Hi Alberto,
I personally use coarrays as a novel approach and do not compare them with MPI/OpenMP.
The one article that I am aware of that describes something similar is here:
http://phys.org/news/2015-08-future-weather-agency-titan-advance.html
Nevertheless, I can't tell if the overlap of communication with computation can be regarded as a complete substitute for OpenMP.
See also this recent technical report by Alessandro Fanfarillo et al., pointing out some of the current problems with today's MPI implementations and today's hardware (which may also apply to today's MPI-based coarray implementations):
https://art.torvergata.it/retrieve/handle/2108/140530/291158/mpiprog.pdf

Best Regards
Michael

fj

Apr 28, 2016, 12:17:21 PM
On a multi-core machine, it is possible to implement coarrays using threads sharing the same memory, as OpenMP does. But implementing coarrays with OpenMP seems a little strange...

Damian Rouson

Apr 29, 2016, 2:11:21 AM
On Thursday, April 28, 2016 at 5:13:54 AM UTC-7, Alberto Ramos wrote:
> Hi everybody,
>
> Fortran coarrays are supposed to be able to substitute for both MPI and OpenMP.

As others have noted, coarrays could substitute for many aspects of MPI, but they
can also be mixed harmoniously with MPI. Cray's documentation says this explicitly.
Intel's documentation was silent on the matter when I first looked several years ago,
but I would imagine that has changed.

> For me it is very clear how coarrays imitate a typical MPI code.

Hmm... in principle, I can imagine why that would seem clear. In practice, not so
much: a "typical" MPI code uses two-sided communication; whereas a code based
on MPI-3 one-sided communication would be a much closer comparison to
coarrays. Most application developers I've talked with, however, had either
not heard of one-sided MPI or had investigated it and found the switch from
two-sided to one-sided communication too time-consuming and troublesome.
Yet that transition will become increasingly necessary to take advantage of
the latest interconnects. This is where it's time to finally relegate MPI
to its originally intended purpose as the assembly language of parallel
programming: let the compiler or parallel runtime library generate the
one-sided MPI-3.
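As a sketch of the analogy (assuming a run with at least two images): a coarray assignment to another image is effectively a one-sided put, with no matching receive posted by the target image:

```fortran
! Sketch of one-sided ("put") communication with coarrays: image 1
! writes directly into image 2's copy of buf. Requires at least two
! images at run time; purely illustrative.
program put_sketch
  implicit none
  real :: buf(4)[*]        ! coarray: one copy of buf on every image
  buf = 0.0
  sync all                 ! make sure every buf is initialised
  if (this_image() == 1) buf(:)[2] = [1.0, 2.0, 3.0, 4.0]
  sync all                 ! ensure the put is complete before reading
  if (this_image() == 2) print *, 'received:', buf
end program put_sketch
```

Image 2 does nothing to receive the data; the runtime (which may itself sit on top of one-sided MPI-3) moves it.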


> But for the case of openMP the situation is not that clear.
>

This owes at least partly to the fact that we often closely associate a language
or programming model with a specific programming paradigm -- probably
because of the way in which the developers of the model or language
describe their intentions. The first sentence of the OpenMP standard's Execution
Model section states, "The OpenMP API uses the fork-join model of parallel
execution." By contrast, section 1.1 of the Fortran 2008 standard states,
"Coarrays and synchronization constructs support parallel programming using
a single program multiple data (SPMD) model."

Fork/Join and SPMD might at first seem orthogonal, but I recall seeing a couple
of fascinating journal articles from the early 2000's in which the authors describe
an approach that involves forking all threads at the beginning of execution and
only joining them at the end of execution. In that approach, an executing OpenMP
program resembles in many ways an SPMD coarray program except that the
OpenMP program uses logically shared memory (which doesn't necessarily have
to imply physically shared memory but burdens the hardware with maintaining
cache coherency and such), while the coarray program uses logically distributed
memory (which could be physically shared memory and the coarray implementation
might exploit that fact where possible).

> My question is the following: is there a way to tell that several coarrays
> share the same variables (the same space in memory)?

No.

>
> The idea is that a loop
>
> do i = 1, N
> a(i) = b(i) + c(i)
> enddo
>
> Can be split into different images *without* having to transfer data between
> coarrays.

I'm not sure I understand, but because you're only showing one loop, are
you imagining a fork/join execution model? Coarray programs cannot
fork new images after program start-up. If the data is created in a distributed
manner, then no transfer is necessary in the above example -- although I'm
inferring a lot because your syntax doesn't explicitly show any coarrays.
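To make that inference concrete, a minimal sketch (with an assumed block distribution, and n_total assumed divisible by the number of images) in which each image creates and updates only its own slice, so no inter-image transfer ever occurs:

```fortran
! Sketch: the loop from the original post, split across images with a
! block distribution. Assumes n_total is divisible by num_images().
program split_loop
  implicit none
  integer, parameter :: n_total = 1024
  integer :: n_local, i
  real, allocatable :: a(:)[:], b(:)[:], c(:)[:]
  n_local = n_total / num_images()
  allocate(a(n_local)[*], b(n_local)[*], c(n_local)[*])
  b = 1.0
  c = 2.0
  do i = 1, n_local
     a(i) = b(i) + c(i)    ! purely local work; no communication
  end do
  sync all
end program split_loop
```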

>
> I guess the answer to the previous question is no. In that case, one could
> still transfer the data between images, but one would like these images not
> to be on different nodes across a network (this would make the code
> terribly inefficient). Is there a way to know which images are on the same
> node in a run?

The standard doesn't guarantee this, but Bob Numrich, the co-inventor of
coarrays, envisioned that a program might use multiple codimensions to
describe locality. The first codimension could span the cores on a chip.
The second codimension could span the chips in a node. And the third
codimension could span across nodes. See slide 13 of the following
presentation: http://tinyurl.com/z56f4ve
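A hedged sketch of that idea; the cosubscript-to-hardware mapping suggested in the comments is purely illustrative and is not guaranteed by any standard or implementation:

```fortran
! Illustrative only: three codimensions intended to mirror a
! cores-per-chip x chips-per-node x nodes hierarchy. No standard
! guarantees that cosubscripts map onto hardware this way.
program locality_sketch
  implicit none
  real    :: u(100)[4,2,*]   ! e.g. 4 "cores" x 2 "chips" x nodes
  integer :: me(3)
  me = this_image(u)         ! this image's cosubscripts for u
  print *, 'image', this_image(), 'has cosubscripts', me
end program locality_sketch
```

Under this scheme an image would be addressed as u(:)[core, chip, node], so "nearby" cosubscripts might (or might not) correspond to nearby hardware.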

At least one vendor has indicated to me that such a mapping is likely to
be accurate even though they don't guarantee it. I like the idea, but I'm
uncertain about the extent to which I want to get involved with describing
the hardware layout in my code. Looking at where things are headed with
many-core processors, do I also want to design my data structures in a
way that somehow reflects the 2D mesh arrangement of the 36 tiles,
each containing 2 cores, on a single processor? (See slide 4 of
the following presentation: http://tinyurl.com/j3saxhg) I hope not, but
then again, we might have no choice if we want performance. I really
don't know the answer.

>
> In more general terms: Does anyone have concrete working examples of an
> approach equivalent to a mix of MPI/openMP but using only coarray fortran?

Would papers suffice or are you looking for source code? There are several
papers that report on mixed coarray/OpenMP code.

If you're starting from scratch and your use for OpenMP is multi-threading,
however, you might consider Pthreads instead of OpenMP. Even the recent
OpenMP 4.0 standard explicitly states that it does not support numerous
Fortran 2003 features, including type extension, type-bound procedures,
polymorphic entities, defined operators, parameterized derived types, etc.
I would imagine that some of these features might work anyway, but I'd hate
to open that can of worms. If inserting OpenMP really does restrict the code
that much, then it will greatly limit the ability to take advantage of lots of
newer language features.

If your use for OpenMP is vectorization, I wonder how much
DO CONCURRENT would help. But before going down this path at all,
it's worth seeing if you can get the performance you desire by simply
running coarray Fortran with one image per core. If that's the case, then
anything more sophisticated could wait until the compilers and parallel
runtime libraries are sophisticated enough to map images to threads.
I'm fairly certain this day will come. We won't necessarily work on it
anytime soon in OpenCoarrays, but it's certainly on our long-term
roadmap to evaluate someday.

Damian

campbel...@gmail.com

Apr 29, 2016, 5:22:59 AM
On Friday, April 29, 2016 at 4:11:21 PM UTC+10, Damian Rouson wrote:
>
> ... In that approach, an executing OpenMP
> program resembles in many ways an SPMD coarray program except that the
> OpenMP program uses logically shared memory (which doesn't necessarily have
> to imply physically shared memory but burdens the hardware with maintaining
> cache coherency and such),
> ...
> If your use for OpenMP is vectorization, I wonder how much
> DO CONCURRENT would help. But before going down this path at all,
> it's worth seeing if you can get the performance you desire by simply
> running coarray Fortran with one image per core. If that's the case, then
> anything more sophisticated could wait until the compilers and parallel
> runtime libraries are sophisticated enough to map images to threads.
> I'm fairly certain this day will come. We won't necessarily work on it
> anytime soon in OpenCoarrays, but it's certainly on our long-term
> roadmap to evaluate someday.
>
> Damian

My understanding is that OpenMP and coarrays are very different but also similar: while coarrays manage sharing information between different processors, OpenMP deals with sharing information between threads/cores/caches.

I have been investigating improved performance on a single multi-core processor by combining vectorization with OpenMP. I have found that, for simple loops in each thread, vector instructions require cache coherency to work well. When using multiple threads, you use strategies to hopefully achieve this, but there is little control over what shared information is in the L1, L2 or L3 cache. Depending on how independent the threads are, there is again little control over which "shared" variables are cohabiting in the cache. Managing what is in the cache is important to performance, but doing so is not part of any Fortran or OpenMP instructions that I know of.
Addressing this problem is probably not suitable for Fortran, as it is a short-term issue based on the current hardware architecture and can be better addressed in the compiler, as with ifort and gfortran.
Are you suggesting DO CONCURRENT could be a Fortran replacement for !$OMP PARALLEL DO?
It would be good if a syntax such as the one developed for coarray Fortran could be developed to assist performance management in OpenMP, or is that the role of a smart compiler?

John


Anton Shterenlikht

Apr 29, 2016, 5:29:45 AM
campbel...@gmail.com writes:
>Are you suggesting DO CONCURRENT could be a Fortran replacement for !$OMP PARALLEL DO?

I'd rather say that it is possible for an optimising
compiler to implement DO CONCURRENT via some threaded
model, e.g. OpenMP, if the execution is on a shared
memory system. This will likely require some instructions
from the user to the compiler. It is possible that
the compiler will try to estimate the likely performance
gain from parallelising the user's DO CONCURRENT,
and might well decide that it's not worth the effort.
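A minimal sketch: DO CONCURRENT asserts only that the iterations can execute in any order; whether the compiler then uses threads, SIMD instructions, or plain serial code is entirely up to the implementation:

```fortran
! Sketch: DO CONCURRENT declares the iterations independent. The
! compiler MAY parallelise or vectorise it, but is free not to.
program do_conc_sketch
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n), c(n)
  integer :: i
  b = 1.0
  c = 2.0
  do concurrent (i = 1:n)
     a(i) = b(i) + c(i)    ! no cross-iteration dependences
  end do
  print *, a(1), a(n)
end program do_conc_sketch
```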

Anton

Arjen Markus

Apr 29, 2016, 5:42:51 AM
Op vrijdag 29 april 2016 11:29:45 UTC+2 schreef Anton Shterenlikht:
That is my experience with the Intel Fortran compiler. Note, however, that compilers are required to be very conservative about such things: if they cannot prove that a loop can be safely parallelised, they won't parallelise it. A write statement in the loop may prevent it from being parallelised, for instance.

With OpenMP you are yourself responsible for making sure the loop is parallelisable. It does give you the freedom to put in write statements and accept that they will appear in an unpredictable order, though, which can be useful for debugging.

Regards,

Arjen

vladim...@gmail.com

Apr 29, 2016, 9:32:48 AM
> If you're starting from scratch and your use for OpenMP is multi-threading,
> however, you might consider Pthreads instead of OpenMP. Even the recent
> OpenMP 4.0 standard explicitly states that it does not support numerous
> Fortran 2003 features, including type extension, type-bound procedures,
> polymorphic entities, defined operators, parameterized derived types, etc.
> I would imagine that some of these features might work anyway, but I'd hate
> to open that can of worms. If inserting OpenMP really does restrict the code
> that much, then it will greatly limit the ability to take advantage of lots

Yes, but you don't have to avoid all these features in your code just because you use OpenMP in other parts. I avoid many of these in performance-sensitive parts of the code anyway, and parameterized derived types are still an unimplemented feature for me. Pthreads bring so much complication compared with OpenMP that I don't see them as worthwhile, especially when your code is already coarray-parallel.