Anton Shterenlikht wrote:
> And I don't understand how DO CONCURRENT differs
> from FORALL (p.114 of MFE). The two constructs
> seem to do the same thing.
Said bluntly, those who added FORALL thought that it would act like DO
CONCURRENT but it doesn't. DO CONCURRENT was then added as "FORALL but
done correctly".
Less enigmatic:
FORALL is a fancy assignment statement and is rather limited in what can
appear in the forall-body. As with all assignment statements, the right
side is evaluated first and then assigned to the left. If the compiler
cannot determine whether the left and right sides of the assignment
depend on each other, it has to generate a temporary array. Temporary
arrays may cause memory problems and make execution slower.
That's similar to whole-array/array-section assignments such as:
A(m:n) = B
Also in that case, the right side has to be evaluated first. [If A is a
pointer and B a pointer or target, the compiler usually cannot rule out
that they overlap, so a temporary is needed. For instance, one might
have: "A => B(n:1:-1)".]
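To make the "evaluate the right side first" rule concrete, here is a
small sketch (the program and variable names are my own, purely for
illustration). It shifts an array by one element in three ways. FORALL
and the array-section assignment both use the old values on the right
side, while the ordinary DO loop reuses values it has just written:

```fortran
program forall_vs_do
  implicit none
  integer, parameter :: n = 5
  integer :: a(n), b(n), c(n), i

  a = [(i, i = 1, n)]   ! 1 2 3 4 5
  b = a
  c = a

  ! FORALL: all right-hand sides are evaluated before any element is
  ! assigned, so each element receives the OLD value of its neighbour.
  forall (i = 2:n) a(i) = a(i-1)

  ! Array-section assignment follows the same rule.
  b(2:n) = b(1:n-1)

  ! Ordinary DO loop: iterations run in order, so the freshly written
  ! c(i-1) is reused and c(1) propagates through the whole array.
  do i = 2, n
    c(i) = c(i-1)
  end do

  print '(a, 5i3)', 'forall : ', a
  print '(a, 5i3)', 'section: ', b
  print '(a, 5i3)', 'do     : ', c

  ! Self-checks for the behaviour described above.
  if (any(a /= [1, 1, 2, 3, 4])) error stop 'unexpected FORALL result'
  if (any(b /= [1, 1, 2, 3, 4])) error stop 'unexpected section result'
  if (any(c /= 1)) error stop 'unexpected DO result'
end program forall_vs_do
```

Because left and right side overlap here, the compiler must either use a
temporary or prove that some other evaluation order gives the same
result. Which of the two it does depends on the compiler.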
With DO CONCURRENT, the user guarantees that execution will give the
same result, independent of the index order. The standard helps by
posing some constraints, which the compiler can and must diagnose. If
the compiler cannot check, it assumes with DO CONCURRENT that there is
no issue while with FORALL it assumes the worst and creates a temporary.
In addition, DO CONCURRENT allows a lot of constructs in the body while
FORALL is more limited.
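As an illustration of the richer body, here is a minimal sketch (names
and data are made up) of a DO CONCURRENT loop containing an IF
construct, which is not allowed in a forall-body. Every iteration reads
only x(i) and writes only y(i), so the result is independent of the
iteration order:

```fortran
program do_concurrent_demo
  implicit none
  integer, parameter :: n = 8
  real :: x(n), y(n)
  integer :: i

  x = [(real(i) - 4.5, i = 1, n)]   ! -3.5, -2.5, ..., 3.5

  ! The user promises that the iterations are independent. Something
  ! like "y(i) = y(i-1)" in this body would break that promise and make
  ! the program nonconforming; the compiler need not detect it.
  do concurrent (i = 1:n)
    if (x(i) < 0.0) then
      y(i) = 0.0
    else
      y(i) = sqrt(x(i))
    end if
  end do

  print '(8f6.2)', y

  ! Self-checks: negative inputs give zero, the rest give sqrt(x).
  if (any(y(1:4) /= 0.0)) error stop 'negative inputs not clamped'
  if (any(abs(y(5:n)**2 - x(5:n)) > 1.0e-5)) error stop 'sqrt mismatch'
end program do_concurrent_demo
```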
In terms of performance: if the compiler does not generate a temporary
array, all of these constructs are likely to perform the same. In fact,
they might even compile to exactly the same assembly code. As FORALL is
not that widely used, it is likely that the compiler does not always
detect whether a temporary is needed or not, even if it could.
Array-section/whole-array assignments are usually a bit better
optimized as they occur more often; still, if the LHS or RHS is a
pointer, the alias analysis is very difficult.
For normal loops and for DO CONCURRENT, the user automatically writes
the loop such that no temporary is needed (unless there are array
sections in the body of the loop). DO CONCURRENT allows in principle
some more optimizations, but I do not think that compilers currently
exploit this much. Some compilers might use the DO CONCURRENT
information when autoparallelization is enabled. DO CONCURRENT also
helps with manual parallelization such as with OpenMP, as it highlights
that a loop can be parallelized, and the constraints checked by the
compiler help to ensure that certain nonparallelizable constructs are
not in the code.
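A rough sketch of that last point (array names and sizes are invented
for the example): the first loop is written as DO CONCURRENT, which
documents that its iterations are independent. The second is a plain DO
loop of the same shape, annotated by hand with an OpenMP directive.
Without OpenMP enabled (e.g. without -fopenmp in gfortran) the directive
is just a comment and the loop runs serially. Whether the DO CONCURRENT
loop itself is run in parallel depends entirely on the compiler and its
options.

```fortran
program parallel_hint
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n)
  integer :: i

  b = 1.0

  ! Independent iterations, stated in the language itself.
  do concurrent (i = 1:n)
    a(i) = 2.0 * b(i)
  end do

  ! Same kind of loop, parallelized manually with OpenMP.
  !$omp parallel do
  do i = 1, n
    b(i) = a(i) + 1.0
  end do
  !$omp end parallel do

  ! Self-check: the results do not depend on how the loops were run.
  if (any(a /= 2.0) .or. any(b /= 3.0)) error stop 'unexpected result'
  print *, 'ok'
end program parallel_hint
```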
I personally would avoid FORALL; in particular, I find it less
readable than other constructs. Thus, I would use either
whole-array/array-section constructs or DO loops, and DO CONCURRENT
if the loop allows it and all compilers in use support it.
But if you find FORALL more readable, you should use it. Similarly,
whole-arrays/array-sections can be more readable than DO loops. Usually,
code maintainability is more important than performance, especially as
one easily guesses wrongly which version is faster. In particular, if on
the left side is a nonpointer, nontarget variable which is not also on
the right side and no impure functions are called, compilers shouldn't
generate temporary variables. But even if they do, changing the code to
a DO (CONCURRENT) loop only makes sense if either the array is very
large (memory issues) or the assignment is in a hot loop (performance).
(Replacing POINTER by something else is often also a good idea.)
If you use the gfortran compiler, -Warray-temporaries tells you when the
compiler uses a temporary array in assignments and FORALL. (And when it
inserts code which might do copy-in/copy-out in procedure calls. For the
latter, -fcheck=array-temps tells at run time whether it actually did a
copy-in/copy-out.)
Tobias