Hi Zaak,
sorry I couldn't make it to FOSDEM; I'm down with the flu. (The following may contain mistakes.)
Within our parallel programming, there might be three (main) types of code:
1. Codes for purely local computations that execute on the cores/images and require no remote interaction. These are the main codes that feed the cores and keep them busy.
2. Codes that directly declare, define, and access coarrays to establish and use data transfer channels among cores/images. I call these coarray wrapper codes.
3. Codes that comprise (new kinds of) parallel algorithms. I seek to implement such codes as (a kind of) distributed objects.
With current gfortran 9.0.0, I have identified the following code containers for use with the above types of codes:
1. For the local computations: anything from (existing) Fortran 77 procedures to F95-style ADTs or F03 classes.
2. For the coarray wrapper codes: only F95-style ADTs; gfortran still does not allow F03 type-bound procedures to access coarrays or atomic subroutines directly.
3. For any parallel algorithm as distributed objects: F95-style ADTs or, better, F03-style classes together with (full?) use of OOP.
My recent use of OpenCoarrays is for testing different ways of implementing (a kind of) distributed objects. I believe that (a certain kind of) distributed objects are the perfect way to use coarrays for developing general-purpose parallel applications in a simple and safe manner (including parallel error detection and handling through atomics).
Current gfortran (9.0.0) still does not allow direct access to coarrays (or atomic subroutines) from type-bound procedures. The simple trick to circumvent this limitation (to me it is not really a limitation yet) remains the use of F9x-style ADTs as coarray wrappers: any direct access to coarrays or atomic subroutines must be encapsulated in them. I have already succeeded in keeping the code in these coarray wrappers to an absolute minimum. (Besides, code reuse with these coarray wrappers can be achieved relatively easily through a code generator.)
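Below is a minimal sketch of such a coarray wrapper. The module and procedure names (coarray_wrapper_mod, wrapper_put, etc.) are hypothetical, and a plain integer payload stands in for the derived-type coarray just to keep it short; only these module procedures touch the coarray or the atomic subroutines directly:

module coarray_wrapper_mod   ! hypothetical name
  use, intrinsic :: iso_fortran_env, only: atomic_int_kind
  implicit none
  private
  public :: wrapper_init, wrapper_put, wrapper_get, wrapper_signal, wrapper_check

  integer, save :: buffer[*]                   ! data transfer channel
  integer(atomic_int_kind), save :: flag[*]    ! control channel (atomics)

contains

  subroutine wrapper_init()
    call atomic_define(flag, 0)       ! define the local flag before first use
  end subroutine wrapper_init

  subroutine wrapper_put(value, remote_image)
    integer, intent(in) :: value, remote_image
    buffer[remote_image] = value      ! put: remote write
  end subroutine wrapper_put

  function wrapper_get() result(value)
    integer :: value
    value = buffer                    ! local read
  end function wrapper_get

  subroutine wrapper_signal(remote_image, state)
    integer, intent(in) :: remote_image, state
    call atomic_define(flag[remote_image], state)
  end subroutine wrapper_signal

  subroutine wrapper_check(state)
    integer, intent(out) :: state
    call atomic_ref(state, flag)
  end subroutine wrapper_check

end module coarray_wrapper_mod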
The true parallel logic codes (for implementing any kind of parallel algorithm) can and should reside outside the coarray wrappers: a customized (user-defined) synchronization procedure is a simple example. It is no problem to implement such procedures as type-bound procedures and to use them with Fortran 2003 classes.
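As a minimal sketch (assuming the hypothetical wrapper above), such a customized synchronization can live in a type-bound procedure that never touches a coarray or an atomic subroutine itself but only calls the wrapper's module procedures:

module sync_object_mod   ! hypothetical name
  use :: coarray_wrapper_mod, only: wrapper_check
  implicit none
  private
  public :: sync_object

  type :: sync_object
    integer :: expected_state = 1
  contains
    procedure :: wait_for_signal
  end type sync_object

contains

  ! customized (user-defined) synchronization: spin until the wrapper's
  ! control flag reaches the expected state
  subroutine wait_for_signal(this)
    class(sync_object), intent(in) :: this
    integer :: state
    do
      call wrapper_check(state)
      if (state == this % expected_state) exit
    end do
  end subroutine wait_for_signal

end module sync_object_mod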
Implementing a very simple parallel algorithm may require two methods, for example: method A executes on image/core 1 to control the distributed execution of method B, while method B executes 99 times on images/cores 2-100 to do the computation. To achieve that, the distributed execution of methods A and B may require remote data transfers between them at multiple points in time. Such highly flexible but limited data transfers with procedure-level parallelism can be achieved through atomics (Fortran only, I think).
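A minimal sketch of that control flow, again assuming the hypothetical wrapper above: image 1 plays the role of method A and signals the other images, which spin-wait in the role of method B before they compute:

program controller_worker_sketch
  use :: coarray_wrapper_mod, only: wrapper_init, wrapper_signal, wrapper_check
  implicit none
  integer, parameter :: start_signal = 1
  integer :: img, state

  call wrapper_init()
  sync all                              ! make the initial flag values visible

  if (this_image() == 1) then
    ! "method A": tell every other image to start computing
    do img = 2, num_images()
      call wrapper_signal(img, start_signal)
    end do
  else
    ! "method B": wait for the start signal, then do the local work
    do
      call wrapper_check(state)
      if (state == start_signal) exit
    end do
    ! ... local computation goes here ...
  end if
end program controller_worker_sketch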
With a simple parallel algorithm we could encapsulate all the parallel logic codes into a single derived-type object (using a Fortran 2003 class). With more sophisticated parallel algorithms we may want to implement our codes with multiple distinct derived-type objects (using Fortran 2003 classes).
To establish data transfer channels between distributed objects (implemented as Fortran 2003 classes) of the same or of distinct(!) derived types, we USE the same coarray wrapper: the coarray correspondence between the distributed objects is established by USE association. (The coarray of derived type is declared within the coarray wrapper.)
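A minimal sketch of that idea, still assuming the hypothetical wrapper above: two distinct derived types (say, a producer running on one set of images and a consumer on another) both USE the same wrapper, so their objects share the coarray correspondence without being of the same type:

module producer_mod   ! hypothetical name
  use :: coarray_wrapper_mod, only: wrapper_put, wrapper_signal
  implicit none
  type :: producer
  contains
    procedure :: send
  end type producer
contains
  subroutine send(this, value, remote_image)
    class(producer), intent(in) :: this
    integer, intent(in) :: value, remote_image
    call wrapper_put(value, remote_image)     ! data first ...
    sync memory                               ! order the put before the signal
    call wrapper_signal(remote_image, 1)      ! ... then the signal
  end subroutine send
end module producer_mod

module consumer_mod   ! hypothetical name
  use :: coarray_wrapper_mod, only: wrapper_get, wrapper_check
  implicit none
  type :: consumer
  contains
    procedure :: receive
  end type consumer
contains
  function receive(this) result(value)
    class(consumer), intent(in) :: this
    integer :: value, state
    do                                        ! wait for the producer's signal
      call wrapper_check(state)
      if (state == 1) exit
    end do
    sync memory                               ! order the signal before reading the data
    value = wrapper_get()
  end function receive
end module consumer_mod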
Coarray components are not a different concept but only a simple extension of coarray usage: we can use the same coarray wrapper (as above) as a coarray component. But using it as a coarray component has some implications (if not limitations): with coarray components, the data transfer channels (the corresponding coarrays) are not established by USE association but rather by instantiation of the surrounding F03 class. This may make it impossible to establish data transfer channels between distributed objects of distinct types; it may only work if they have the same type. (A limited solution could be to use inheritance.) This may be a major limitation for implementing sophisticated parallel algorithms with coarray components. Also, the use of OOP appears to be limited if a class contains a coarray component, whereas the above USE association could allow a more unrestricted use of OOP with parallel programming (not tested yet).
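For contrast, a minimal (untested) sketch of the coarray-component route, with a hypothetical type name; here the channel only comes into existence when the surrounding object's component is allocated:

module channel_component_mod   ! hypothetical name
  implicit none
  type :: channel
    integer, allocatable :: buffer[:]   ! a coarray component must be allocatable
  end type channel
end module channel_component_mod

program coarray_component_sketch
  use :: channel_component_mod
  implicit none
  type(channel), save :: ch        ! a variable of such a type needs SAVE (or must be a dummy)
  allocate(ch % buffer[*])         ! instantiation/allocation establishes the channel (and synchronizes)
  if (this_image() == 1 .and. num_images() > 1) ch % buffer[2] = 42   ! put to image 2
  sync all
  if (this_image() == 2) print *, 'image 2 received', ch % buffer
end program coarray_component_sketch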
It appears that Remote Procedure Calls (RPCs) are a major feature of distributed objects. (See Wikipedia and also the UPC++ Programmer's Guide, chapter 9, Distributed Objects, except in the most recent version, where that chapter has been removed. To me, distributed objects in UPC++ appear somewhat complicated and even limited.) RPCs also appear to be a major complaint against using distributed objects; see https://www.martinfowler.com/articles/distributed-objects-microservices.html . If I understand correctly, RPCs are also possible with coarrays. (I did not test this, but it should not yet work with gfortran, because a type-bound procedure would be required for direct use with a coarray.)
I do not want to use RPCs with my distributed objects; instead I should be able to implement similar functionality through data transfers only (mainly put operations, i.e. remote writes), but with better runtime efficiency: I think the PGAS model is perfect for implementing distributed objects based on data transfers alone. The programmer must implement the logic codes to achieve the correct and required distinct execution paths on the distinct images (considering that SPMD is only the underlying model).
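A minimal sketch of that idea, with hypothetical names: instead of invoking a procedure remotely, image 1 only writes the data plus a request code (two puts), and the logic on image 2 decides locally which execution path to take:

program put_instead_of_rpc
  use, intrinsic :: iso_fortran_env, only: atomic_int_kind
  implicit none
  integer, parameter :: do_scale = 1, do_negate = 2
  integer, save :: payload[*]
  integer(atomic_int_kind), save :: request[*]
  integer :: code

  call atomic_define(request, 0)
  sync all

  if (this_image() == 1 .and. num_images() > 1) then
    payload[2] = 21                             ! put: remote write of the data
    sync memory                                 ! order the data put before the signal
    call atomic_define(request[2], do_scale)    ! put: remote write of the request code
  else if (this_image() == 2) then
    do                                          ! local logic selects the execution path
      call atomic_ref(code, request)
      if (code /= 0) exit
    end do
    sync memory                                 ! order the signal before reading the data
    select case (code)
    case (do_scale)
      payload = 2 * payload
    case (do_negate)
      payload = -payload
    end select
    print *, 'image 2 computed', payload
  end if
end program put_instead_of_rpc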
A final note on F18 coarray teams: I think their backbone is the ALLOCATE statement (for coarrays), which allows one to (newly) establish the data transfer channels, to (newly) establish the segment ordering for a coarray within a team, and even to repair a defective data transfer channel by reallocating a coarray on the images of a team (from my testing with ifort/Intel MPI and OpenCoarrays/gfortran/MPICH).
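A minimal sketch of that F18 pattern, splitting the images into two teams by even/odd image index; the ALLOCATE inside CHANGE TEAM establishes the coarray, and with it the data transfer channels and segment ordering, within each team:

program team_allocate_sketch
  use, intrinsic :: iso_fortran_env, only: team_type
  implicit none
  type(team_type) :: subteam
  integer, allocatable :: channel[:]
  integer :: color

  color = merge(1, 2, mod(this_image(), 2) == 0)   ! split the images into two teams
  form team (color, subteam)

  change team (subteam)
    allocate(channel[*])      ! (re)establishes the channel and segment ordering within the team
    channel = this_image()    ! image index relative to the current team
    sync all                  ! synchronizes the current team only
    if (this_image() == 1 .and. num_images() > 1) print *, 'team', color, 'sees', channel[2]
    ! the coarray is deallocated automatically at END TEAM if still allocated
  end team
end program team_allocate_sketch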
Best Regards