do concurrent vs forall

Anton Shterenlikht

Mar 1, 2013, 10:17:29 AM

I have this triple loop:

do x3 = lbr(3),ubr(3)
  do x2 = lbr(2),ubr(2)
    do x1 = lbr(1),ubr(1)
      if (coarray(x1,x2,x3).le.0) then
        call random_number(candidate) ! 0 .le. candidate .lt. 1
        step = nint(candidate*2-1)    ! step = [-1 0 1]
        array(x1,x2,x3) = &
          coarray(x1+step(1),x2+step(2),x3+step(3))
      end if
    end do
  end do
end do

where

integer(kind=iarr),allocatable,intent(inout) :: coarray(:,:,:)[:,:,:]

real :: candidate(3)

integer(kind=iarr),allocatable :: array(:,:,:)
integer(kind=idef) :: lbr(3), ubr(3), x1, x2, x3, step(3)

and iarr=idef=4

As far as I understand the rules, e.g. pp. 359-361 of MFE,
this loop can be transformed into a DO CONCURRENT construct,
right? I presume that random_number is pure.

However, is it worth doing?
And I don't understand how DO CONCURRENT differs
from FORALL (p.114 of MFE). The two constructs
seem to do the same thing.

Thanks

Anton

Tobias Burnus

Mar 1, 2013, 11:05:57 AM

Anton Shterenlikht wrote:
> And I don't understand how do concurrent differs
> from forall (p.114 of MFE). The two contructs
> seem to do the same thing.

Said bluntly, those who added FORALL thought that it would act like DO
CONCURRENT but it doesn't. DO CONCURRENT was then added as "FORALL but
done correctly".

Less enigmatic:

FORALL is a fancy assignment statement and a bit limited in what can be
in the forall-body. Like all assignment statements, the right side
is first evaluated and then assigned to the left. If the compiler cannot
determine whether there is an interdependence between the left and right
side of the assignment, it has to generate a temporary array. Temporary
arrays may cause memory problems and make the execution slower.

That's similar to whole-array/array-section assignments such as:
A(m:n) = B
Also in that case, the right side has to be evaluated first. [If A is a
pointer and B a pointer or target, a temporary may be needed because the
two might overlap. For instance, one might have: "A => B(n:1:-1)".]
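To make the "right side is evaluated first" rule concrete, here is a minimal sketch (the module shift_demo and its procedures are hypothetical names made up for this illustration) that shifts an array by one element in three ways. The FORALL and the array-section assignment behave as if every a(i-1) were read before any a(i) is stored, which is why the compiler may need a temporary (or must prove it can run the loop backwards). The plain forward DO loop instead reads values it has just overwritten.

```fortran
module shift_demo
  implicit none
contains
  ! FORALL: every a(i-1) on the right is the value from BEFORE the statement.
  subroutine shift_forall(a)
    integer, intent(inout) :: a(:)
    integer :: i
    forall (i = 2:size(a)) a(i) = a(i-1)
  end subroutine shift_forall

  ! Array-section assignment: same semantics as the FORALL above.
  subroutine shift_section(a)
    integer, intent(inout) :: a(:)
    a(2:) = a(:size(a)-1)
  end subroutine shift_section

  ! Forward DO loop: a(i-1) has already been overwritten when a(i) is set,
  ! so the first element is propagated through the whole array.
  subroutine shift_do(a)
    integer, intent(inout) :: a(:)
    integer :: i
    do i = 2, size(a)
      a(i) = a(i-1)
    end do
  end subroutine shift_do
end module shift_demo
```

With a = [1, 2, 3, 4], shift_forall and shift_section leave [1, 1, 2, 3], while shift_do leaves [1, 1, 1, 1]. For the same reason this loop may not be written as DO CONCURRENT: iteration i reads an element that iteration i-1 defines.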

With DO CONCURRENT, the user guarantees that execution will give the
same result, independent of the index order. The standard helps by
imposing some constraints, which the compiler can and must diagnose. If
the compiler cannot check, it assumes with DO CONCURRENT that there is
no issue, while with FORALL it assumes the worst and creates a temporary.
In addition, DO CONCURRENT allows a lot of constructs in the body while
FORALL is more limited.

In terms of performance: If the compiler does not generate a temporary
variable, all are likely to have the same performance. Actually, they
might even generate exactly the same assembler code. As FORALL is not
that widely used, it is likely that the compiler does not always detect
whether a temporary is needed or not, even if it could. For
array-section/whole-array assignments, those are usually a bit better
optimized as they occur more often; still, if the LHS or RHS is a
pointer, the alias analysis is very difficult.

For normal loops and for DO CONCURRENT, the user automatically writes
the loop such that no temporary is needed (unless there are array
sections in the body of the loop). DO CONCURRENT allows in principle
some more optimizations, but I do not think that this is currently
really used for optimization. Some compilers might use the DO CONCURRENT
information when autoparallelization is used. DO CONCURRENT also helps
with manual parallelization such as with OpenMP, as it highlights that
a loop can be parallelized, and the constraints checked by the compiler
help to ensure that certain nonparallelizable constructs are not in the
code.

I personally would avoid FORALL; in particular, I find it less
readable than other constructs. Thus, I would use either
whole-array/array-section constructs or DO loops, and DO CONCURRENT
if the loop allows it and all the compilers you need support it.

But if you find FORALL more readable, you should use it. Similarly,
whole-arrays/array-sections can be more readable than DO loops. Usually,
code maintainability is more important than performance, especially as
one easily guesses wrongly which version is faster. In particular, if
the left side is a nonpointer, nontarget variable which does not also
appear on the right side, and no impure functions are called, compilers
shouldn't generate temporary arrays. But even if they do, changing the
code to a DO (CONCURRENT) loop only makes sense if either the array is
very large (memory issues) or the code is in a hot loop (performance).
(Replacing POINTER by something else is often also a good idea.)

If you use the gfortran compiler, -Warray-temporaries tells you when the
compiler uses a temporary array in assignments and FORALL. (And when it
inserts code which might do copy-in/copy-out in procedure calls. For the
latter, -fcheck=array-temps tells at run time whether it actually did a
copy-in/copy-out.)

Tobias

Richard Maine

Mar 1, 2013, 11:30:53 AM

Anton Shterenlikht <me...@mech-cluster241.men.bris.ac.uk> wrote:

> I presume that random_number is pure.

No. Not at all. Quite the opposite. Random_number is about as impure as
one can get. That's why it is a subroutine instead of a function. While
a user can write impure functions, the standard intrinsic functions are
all pure.

> And I don't understand how do concurrent differs
> from forall (p.114 of MFE). The two contructs
> seem to do the same thing.

No. Again not at all. Forall is an array assignment. That's *ALL*.
Nothing other than the array assignment (or nesting other foralls and
wheres) can be done in a forall. Do not be misled by the fact that its
syntax is somewhat reminiscent of a DO loop. It is not a general DO
loop.

DO concurrent, on the other hand, is indeed a DO loop. You can have all
kinds of statements in a DO concurrent, basically as long as the
different iterations don't interact in ways that make the order of
iteration matter.
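As a rough sketch of the difference (hypothetical names, default real kind assumed), the body below uses a scalar temporary and an IF construct. That is fine in DO CONCURRENT, but it cannot be written as a FORALL body, which may contain only assignments and nested WHERE and FORALL constructs. The temporary t is allowed because each iteration defines it before referencing it; its value after the loop is undefined.

```fortran
module clamp_demo
  implicit none
contains
  ! Several statements, including an IF construct, in one DO CONCURRENT body.
  subroutine scale_and_clamp(x, limit, y)
    real, intent(in)  :: x(:)
    real, intent(in)  :: limit
    real, intent(out) :: y(:)   ! assumed to have the same size as x
    real    :: t                ! defined before it is used in every iteration
    integer :: i
    do concurrent (i = 1:size(x))
      t = 2.0 * x(i)
      if (t > limit) then
        y(i) = limit
      else
        y(i) = t
      end if
    end do
  end subroutine scale_and_clamp
end module clamp_demo
```

For x = [1.0, 2.0, 3.0, 4.0] and limit = 5.0, y becomes [2.0, 4.0, 5.0, 5.0]; no iteration depends on another, so the order does not matter.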

Order matters very much in calls to random_number.

--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain

Anton Shterenlikht

Mar 4, 2013, 5:52:48 AM

nos...@see.signature (Richard Maine) writes:

>Anton Shterenlikht <me...@mech-cluster241.men.bris.ac.uk> wrote:

>> I presume that random_number is pure.

>No. Not at all. Quite the opposite. Random_number is about as impure as
>one can get. That's why it is a subroutine instead of a function. While
>a user can write impure functions, the standard intrinsic functions are
>all pure.

Ok, I see your point. The results will indeed depend
on the order of execution. In other words, if the order
of execution is unpredictable, then the results from
random_number are also unpredictable, which is exactly
what a user will want from random_number, right?

Although I can see that in some applications it might
be necessary to predict what random_number will return.

To quote MFE (p.359): "by using this construct, the
programmer asserts that there are no interdependencies between
loop iterations." Sure, there are interdependencies
in this loop, but they are ones I don't care about.
I don't care which random number is being used
in which iteration. Is it then up to me to deal
with the consequences?

Anyway, this fragment:

do concurrent (x1=lbr(1):ubr(1), &
               x2=lbr(2):ubr(2), &
               x3=lbr(3):ubr(3), coarray(x1,x2,x3).le.0)

  call random_number(candidate) ! 0 .le. candidate .lt. 1
  step = nint(candidate*2-1)    ! step = [-1 0 1]

  array(x1,x2,x3) = coarray(x1+step(1),x2+step(2),x3+step(3))

end do

compiles fine with Intel compiler with "-warn all"
switched on. Are you saying this is in violation
of the standard?

If a reference to a procedure that is not pure
is not allowed, then this code is not standard-conforming,
which the compiler should catch, right?

Many thanks

Anton

Tobias Burnus

Mar 4, 2013, 6:24:09 AM

Anton Shterenlikht wrote:
> Anyway, this fragment:
>
> do concurrent (x1=lbr(1):ubr(1), &
> x2=lbr(2):ubr(2), &
> x3=lbr(3):ubr(3), coarray(x1,x2,x3).le.0)
>
> call random_number(candidate) ! 0 .le. candidate .lt. 1
> step = nint(candidate*2-1) ! step = [-1 0 1]
> array (x1,x2,x3) = coarray (x1+step(1),x2+step(2),x3+step(3))
>
> end do
>
> compiles fine with Intel compiler with "-warn all"
> switched on. Are you saying this is in violation
> of the standard?

It violates the following constraint - and a standard conforming
compiler is required to diagnose it:

C825 A reference to a nonpure procedure shall not appear within a
DO CONCURRENT construct.

A compiler might accept it as vendor extension and only reject it with,
e.g. "-stand f08". But I strongly suspect that it is a bug in the compiler.

Tobias

PS: gfortran also wrongly accepts it. For some reason, the pureness
check is never reached for intrinsic subroutines [only for DO
CONCURRENT]. The bug is now tracked as PR 56519.
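One way to make Anton's loop conform, sketched under assumptions: make the RANDOM_NUMBER call outside the construct, once on a whole array, so that the DO CONCURRENT body contains only pure operations. To stay self-contained the sketch uses ordinary arrays instead of the coarray and assumes src carries a one-cell halo around lbr:ubr, so that x+step stays in bounds; random_steps and walk_update are hypothetical names. Unlike the serial loop, it draws numbers for every cell, not only the masked ones, so the values assigned may differ from the original loop even with the same seed.

```fortran
module walk_demo
  implicit none
contains
  ! Draw one step in {-1, 0, 1} per axis for every cell of the region,
  ! serially, so that RANDOM_NUMBER is never referenced in DO CONCURRENT.
  subroutine random_steps(lb, ub, steps)
    integer, intent(in) :: lb(3), ub(3)
    integer, allocatable, intent(out) :: steps(:,:,:,:)
    real, allocatable :: r(:,:,:,:)
    allocate (r(3, lb(1):ub(1), lb(2):ub(2), lb(3):ub(3)))
    allocate (steps(3, lb(1):ub(1), lb(2):ub(2), lb(3):ub(3)))
    call random_number(r)          ! 0 <= r < 1
    steps = nint(r*2 - 1)          ! -1, 0 or 1, as in the original loop
  end subroutine random_steps

  ! The update itself: only pure operations inside DO CONCURRENT.
  ! src is assumed to carry a one-cell halo around lb:ub.
  subroutine walk_update(lb, ub, src, dst, steps)
    integer, intent(in) :: lb(3), ub(3)
    integer, intent(in) :: src(lb(1)-1:ub(1)+1, lb(2)-1:ub(2)+1, lb(3)-1:ub(3)+1)
    integer, intent(inout) :: dst(lb(1):ub(1), lb(2):ub(2), lb(3):ub(3))
    integer, intent(in) :: steps(3, lb(1):ub(1), lb(2):ub(2), lb(3):ub(3))
    integer :: x1, x2, x3
    do concurrent (x1 = lb(1):ub(1), x2 = lb(2):ub(2), x3 = lb(3):ub(3), &
                   src(x1, x2, x3) <= 0)
      dst(x1, x2, x3) = src(x1 + steps(1, x1, x2, x3), &
                            x2 + steps(2, x1, x2, x3), &
                            x3 + steps(3, x1, x2, x3))
    end do
  end subroutine walk_update
end module walk_demo
```

A caller would first do call random_steps(lbr, ubr, steps) and then call walk_update(lbr, ubr, src, array, steps), with array allocated over lbr:ubr.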

timprince

Mar 4, 2013, 10:53:46 AM

On 3/1/2013 11:30 AM, Richard Maine wrote:
> Anton Shterenlikht <me...@mech-cluster241.men.bris.ac.uk> wrote:
>
>> I presume that random_number is pure.
>
> No. Not at all. Quite the opposite. Random_number is about as impure as
> one can get. That's why it is a subroutine instead of a function. While
> a user can write impure functions, the standard intrinsic functions are
> all pure.
>
>> And I don't understand how do concurrent differs
>> from forall (p.114 of MFE). The two contructs
>> seem to do the same thing.
>
> No. Again not at all. Forall is an array assignment. That's *ALL*.
> Nothing other than the array assignment (or nesting other foralls and
> wheres) can be done in a forall. Do not be mislead by the fact that its
> syntax is somewhat reminiscent of a DO loop. It is not a general DO
> loop.
>
> DO concurrent, on the other hand, is indeed a DO loop. You can have all
> kinds of statements in a DO concurrent, basically as long as the
> different iterations don't interact in ways that make the order of
> iteration matter.
>
> Order matters very much in calls to random_number.
>

In particular, forall (putting the conditional as a mask expression)
requires the conditional to be applied at each step, subsequent to
completion of the previous assignment step. It would be up to a
compiler to check carefully to see that the same optimizations could be
performed as in the similar-looking do concurrent (in part, that the
forall doesn't include any assignments conflicting with the additional
rules for do concurrent, nor, like the random_number, with the rules of
both forall and do concurrent).
Intel Fortran makes a single-assignment forall preceded by !dir$ ivdep
roughly equivalent to a do concurrent, but the directive makes it an
extension from standard Fortran.
The do concurrent rules permit the compiler to translate as if it were a
counted do loop, so there shouldn't be loss in most cases relative to
the older do loop. OpenMP application is an evident exception; Intel
Fortran tries to facilitate auto-parallelization of do concurrent, so
that could be an alternative to OpenMP.
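A small illustration of the point about the FORALL mask, as a hypothetical sketch (module and procedure names are invented for the example): in a FORALL the whole mask is evaluated, using the old values, before any element is assigned, whereas a DO loop with an IF sees elements that earlier iterations have already changed.

```fortran
module mask_demo
  implicit none
contains
  ! FORALL: the mask a(i-1) > 0 is evaluated for all i before any a(i) is set.
  subroutine zero_after_positive_forall(a)
    integer, intent(inout) :: a(:)
    integer :: i
    forall (i = 2:size(a), a(i-1) > 0) a(i) = 0
  end subroutine zero_after_positive_forall

  ! DO + IF: the test in iteration i sees a(i-1) as updated by iteration i-1.
  subroutine zero_after_positive_do(a)
    integer, intent(inout) :: a(:)
    integer :: i
    do i = 2, size(a)
      if (a(i-1) > 0) a(i) = 0
    end do
  end subroutine zero_after_positive_do
end module mask_demo
```

Starting from a = [1, 1, 1, 1], the FORALL version gives [1, 0, 0, 0] and the DO version gives [1, 0, 1, 0].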

--
Tim Prince

Richard Maine

Mar 4, 2013, 11:03:48 AM

Anton Shterenlikht <me...@mech-cluster241.men.bris.ac.uk> wrote:

> If a reference to a procedure that is not pure
> is not allowed, then this becomes a non standard
> compliant code, which the compiler should catch, right?

Just because something is nonstandard, that does not mean that a
compiler should catch it. There are many, many things that are
nonstandard but will not realistically be caught by compilers. Some of
them are things that are literally impossible to catch at compile time.

There is a specific list of things that compilers are required to be
able to diagnose. That list is far smaller than the whole standard. A
good quality compiler should be able to diagnose more than this minimum
requirement, but it still won't be close to the entire standard.

As it happens, this particular error is in the list of things that
compilers are required to diagnose, as Tobias pointed out. So your
conclusion that a compiler should be able to catch this is correct, but
I felt it important to point out that the reasoning you presented above
is not correct, or at least is incomplete.

glen herrmannsfeldt

Mar 4, 2013, 11:58:47 AM

Anton Shterenlikht <me...@mech-cluster241.men.bris.ac.uk> wrote:
> nos...@see.signature (Richard Maine) writes:

>>Anton Shterenlikht <me...@mech-cluster241.men.bris.ac.uk> wrote:

>>> I presume that random_number is pure.

>>No. Not at all. Quite the opposite. Random_number is about as impure as
>>one can get. That's why it is a subroutine instead of a function. While
>>a user can write impure functions, the standard intrinsic functions are
>>all pure.

> Ok, I see your point. The results will indeed depend
> on the order of execution. In other words, if the order
> of execution is unpredictable, then the results from
> random_number are also unpredictable, which is exactly
> what a user will want from random_number, right?

Well, sometimes one wants repeatable random numbers, especially
for debugging, which is why a seed is allowed to be specified.

But yes, sometimes one might really want them randomly
assigned to array elements, in which case it would be fine
if it did them out of order. Though there is still the problem
with possible non-reentrancy of random_number. You might,
for example, get the same number twice if you call it concurrently.
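For the repeatable case, a minimal sketch using the standard RANDOM_SEED intrinsic: query the seed size, PUT a fixed seed, then draw. Filling every seed element with one value is just a simple choice for the sketch; the seed length and how the values are used are processor dependent. The module and procedure names are hypothetical.

```fortran
module seed_demo
  implicit none
contains
  ! Put a fixed seed, then fill x, so that two calls with the same
  ! seed_value return the same sequence.
  subroutine repeatable_numbers(seed_value, x)
    integer, intent(in) :: seed_value
    real, intent(out)   :: x(:)
    integer, allocatable :: seed(:)
    integer :: n
    call random_seed(size=n)       ! processor-dependent seed length
    allocate (seed(n))
    seed = seed_value              ! every element set to the same value
    call random_seed(put=seed)
    call random_number(x)          ! 0 <= x < 1
  end subroutine repeatable_numbers
end module seed_demo
```

Calling repeatable_numbers(12345, x) twice should give the same x both times, which is what one usually wants while debugging.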

> Although I can see that in some applications it might
> be necessary to predict what random_number will return.

> To quote MFE (p.359): "by using this construct, the
> programmer asserts that there are no interdependencies between
> loop iterations." Sure there are inter-dependencies
> in this loop, but such that I don't care about.
> I don't care which random number is being used
> in which iteration. Is it then up to me to deal
> with the consequences?

Notice that there is no exception for "I don't care about it."

It might even be that, given that assertion, the compiler
is allowed to call random_number once and use the value for
all the assignments.

-- glen

michael...@compuserve.com

Mar 4, 2013, 12:12:35 PM

On Monday, March 4, 2013 5:58:47 PM UTC+1, glen herrmannsfeldt wrote:
> It might even be that with that statement that the compiler
> is allowed to call random_number once and use the value for
> all the assignments.

But see bullet 4 on the following page, where the ghastly details are spelled out: a reference to a procedure that is not pure [is required to be detected].

Regards,

Mike Metcalf

glen herrmannsfeldt

Mar 4, 2013, 12:47:46 PM

Yes, and after that, if you decide to run the program anyway,
(some compilers will still generate an object module even when
an error is reported) then see what happens.

Is it allowed to be just a warning?

-- glen

michael...@compuserve.com

Mar 5, 2013, 3:46:57 AM

C825 A reference to a nonpure procedure shall not appear within a DO CONCURRENT construct.

which means it would be non-standard to accept it.

Regards,

Mike Metcalf

robert....@oracle.com

Mar 5, 2013, 5:02:26 AM

On Monday, March 4, 2013 9:47:46 AM UTC-8, glen herrmannsfeldt wrote:

The relevant part of the standard is the Conformance clause.
In Fortran 2008, the Conformance clause is Clause 1.5. It
states

A processor conforms to this part of ISO/IEC 1539 if
. . .
(3) it contains the capability to detect and report
the use within a submitted program unit of an
additional form or relationship that is not
permitted by the numbered syntax rules or
constraints, including the deleted features
specified in Annex B
. . .

The Conformance clause does not require a conforming processor
to report any use of extensions by default. A conforming
processor must have the capability to report the use of the
specified additional forms or relationships, but the standard
does not say how to enable that capability.

An example of a syntax rule that is not a numbered syntax rule
is the requirement that a free form line shall contain at most
132 characters (see Clause 3.3.2.1). Because the requirement
is not a numbered syntax rule or constraint, a conforming
processor is allowed to accept lines longer than 132 characters
without reporting their appearance. The requirement is a
limitation on programs, not processors.

Bob Corbett