
DO CONCURRENT and "private" variables


town...@astro.wisc.edu

Jan 14, 2014, 4:51:10 PM
Hi folks --

I'm writing to ask for help in understanding how DO CONCURRENT works.

Consider the following test program:

----
program test_concurrent

   implicit none

   integer, parameter :: n = 100

   real :: a(100)
   real :: b
   integer :: i

   do concurrent (i = 1:n)
      b = 0.5*i
      a(i) = b
   end do

   do i = 1,n
      print *,i,a(i)
   end do

end program test_concurrent
----

I've been having some back-and-forth discussion with a colleague about whether this is valid and does what is expected. The source of confusion is the variable b. Is it "private" to each iteration, so that if the loop is run in parallel then each thread gets a separate copy? Or is there just one b, and the different threads tread on each other's toes writing to it?

From reading John Reid's "New Features in Fortran 2008" document, I note that one restriction on do concurrent is that (paraphrasing) 'a variable referenced in an iteration was previously defined in the same iteration'. This seems to describe b, so it looks like the program is OK.

However, my colleague has been reading this Xeon Phi book:

http://www.amazon.com/gp/product/0124104142/ref=oh_details_o00_s00_i00?ie=UTF8&psc=1

In his words: "And it [this book] is the source of the statement that the definition of "do concurrent" doesn't make clear what happens for "private" vars. You and I and anyone with any sense would expect that private vars would be automatically duplicated for each instance of the iteration, but apparently that's not part of the standard!? And in the book, they say that as of the time of writing (2013) ifort just mapped "do concurrent" to an OpenMP parallel do, but without automatic private for vars set in the loop."
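(For concreteness, here is the kind of OpenMP form I believe the book is alluding to, where the privatization has to be spelled out by the programmer. This is only a sketch for comparison; I'm not claiming it is what ifort actually generates for the do concurrent loop.)

----
!$omp parallel do private(b)
do i = 1, n
   b = 0.5*i      ! each thread works on its own copy of b
   a(i) = b
end do
!$omp end parallel do
----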

So, confusion abounds! I'd appreciate any pointers on the correct behavior of do concurrent.

Best wishes,

Rich

Dan Nagle

Jan 14, 2014, 5:10:31 PM
On 2014-01-14 21:51:10 +0000, town...@astro.wisc.edu said:

> program test_concurrent
>
> implicit none
>
> integer, parameter :: n = 100
>
> real :: a(100)
> real :: b
> integer :: i
>
> do concurrent (i = 1:n)
> b = 0.5*i
> a(i) = b
> end do
>
> do i = 1,n
> print *,i,a(i)
> end do
>
> end program test_concurrent

> So, confusion abounds! I'd appreciate any pointers on the correct
> behavior of do concurrent.

Since b is defined in each iteration before it is used, I'd say
you are within the spirit of the do concurrent. Whether you're
within the letter of the law is another matter.

Just to be safe, you might place a block around the body
of the do concurrent, as per note 8.11 at [178:18+]
in 10-007r1.

So

> do concurrent (i = 1:n)
block
real :: b
> b = 0.5*i
> a(i) = b
end block
> end do

I think you'd then be absolutely safe (and clearer about your intentions as well).
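
Spelled out as a complete program (just a sketch of the same idea, applied to your example):

program test_concurrent

   implicit none

   integer, parameter :: n = 100

   real :: a(100)
   integer :: i

   do concurrent (i = 1:n)
      block
         real :: b      ! b is now a construct entity, fresh in each iteration
         b = 0.5*i
         a(i) = b
      end block
   end do

   do i = 1,n
      print *,i,a(i)
   end do

end program test_concurrent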

--
Cheers!

Dan Nagle

glen herrmannsfeldt

Jan 14, 2014, 5:29:34 PM
Dan Nagle <danl...@me.com> wrote:
> On 2014-01-14 21:51:10 +0000, town...@astro.wisc.edu said:

>> program test_concurrent
>> implicit none
>> integer, parameter :: n = 100
>> real :: a(100)
>> real :: b
>> integer :: i
>>
>> b = 0.5*i
>> a(i) = b
>> end do

Well, for one, any compiler with any non-zero amount of
optimization will optimize B away.

>> do i = 1,n
>> print *,i,a(i)
>> end do
>> end program test_concurrent

>> So, confusion abounds! I'd appreciate any pointers on the correct
>> behavior of do concurrent.

> Since b is defined in each iteration before it is used, I'd say
> you are within the spirit of the do concurrent. Whether you're
> within the letter of the law is another matter.

I think I agree, but I would be much less sure in the case
of OpenMP. Well, for OpenMP it is up to the programmer to
get the private variables correct. For DO CONCURRENT, I would
say the compiler should get it right. Maybe DO IN ANY ORDER
would have made more sense, but not be as readable.

> Just to be safe, you might place a block around the body
> of the do concurrent, as per note 8.11 at [178:18+]
> in 10-007r1.

(snip)

-- glen

Richard Maine

Jan 14, 2014, 8:07:25 PM
glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
[about DO CONCURRENT]
> Maybe DO IN ANY ORDER
> would have made more sense, but not be as readable.

But it also would miss the point. I doubt that such a new feature would
have been introduced just so that the compiler could do the loop iterations
in any order. That would provide no obvious benefit. "Any order" still
implies that there is an order, that is that the iterations are done one
at a time. To my knowledge, the reason for DO CONCURRENT is to
facilitate parallel operation, that is concurrency. Thus the term. When
the iterations are done concurrently, they don't *HAVE* an order.

--
Richard Maine
email: last name at domain . net
domain: summer-triangle

Ian Harvey

Jan 14, 2014, 9:19:44 PM
On 2014-01-15 8:51 AM, town...@astro.wisc.edu wrote:
...
> I've been having some back-and-forth discussion with a colleague
> about whether this is valid and does what is expected. The source of
> confusion is the variable b. Is it "private" to each iteration, so
> that if the loop is run in parallel then each thread gets a separate
> copy? Or is there just one b, and the different threads tread on each
> other's toes writing to it.
>
> From reading John Reid's "New Features in Fortran 2008" document, I
> note that one restriction on do concurrent is that (paraphrasing) 'a
> variable referenced in an iteration was previously defined in the
> same iteration'. This seems to describe b, so it looks like the
> program is OK.
>
> However, my colleague has been reading this Xeon Phi book:
>
> http://www.amazon.com/gp/product/0124104142/ref=oh_details_o00_s00_i00?ie=UTF8&psc=1
>
> In his words: "And it [this book] is the source of the statement
> that the definition of "do concurrent" doesn't make clear what
> happens for "private" vars. You and I and anyone with any sense
> would expect that private vars would be automatically duplicated for
> each instance of the iteration, but apparently that's not part of the
> standard! ? And in the book, they say that as of the time of
> writing (2013) ifort just mapped "do concurrent" to an openmp
> parallel do. but without automatic private for vars set in the
> loop."
>
> So, confusion abounds! I'd appreciate any pointers on the correct
> behavior of do concurrent.

Beyond what Dan says, I think your example is legal - it is within both
the spirit and letter of the law.

For what it's worth (and with considerable room for error on my part
given my knowledge of assembly and the way that optimization can
obfuscate things) - a comparison of the assembly output from the current
release of the Intel compiler (with the appropriate switches to force
parallelization of the do concurrent loop) against openmp variants of
your code shows that the compiler appears to regard b as "private" (or
perhaps more accurately - "not shared").

I wouldn't be surprised if initial implementations of DO CONCURRENT got
some aspects of the behaviour wrong.

glen herrmannsfeldt

Jan 14, 2014, 10:15:48 PM
Richard Maine <nos...@see.signature> wrote:
> glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> [about DO CONCURRENT]
>> Maybe DO IN ANY ORDER
>> would have made more sense, but not be as readable.

> But it also would miss the point. I doubt that such a new feature would
> have been introduced just so that the compiler could do the loop iterations
> in any order. That would provide no obvious benefit. "Any order" still
> implies that there is an order, that is that the iterations are done one
> at a time. To my knowledge, the reason for DO CONCURRENT is to
> facilitate parallel operation, that is concurrency. Thus the term. When
> the iterations are done concurrently, they don't *HAVE* an order.

You don't say whether the OP code should work or not. If it does
it concurrently, as written, then B gets overwritten and the result
is wrong. (Well, often enough.)

The favorite IBM terms (I am not sure about the current CS meanings)
are reentrant, serially reusable, refreshable, or the lack of those.

Each instance is assumed to have its own registers, but memory
locations are shared.

-- glen

Richard Maine

Jan 14, 2014, 10:55:39 PM
glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

> Richard Maine <nos...@see.signature> wrote:
> > glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> > [about DO CONCURRENT]
> >> Maybe DO IN ANY ORDER
> >> would have made more sense, but not be as readable.
>
> > But it also would miss the point. I doubt that such a new feature would
> > have been introduced just so that the compiler could do the loop iterations
> > in any order. That would provide no obvious benefit. "Any order" still
> > implies that there is an order, that is that the iterations are done one
> > at a time. To my knowledge, the reason for DO CONCURRENT is to
> > facilitate parallel operation, that is concurrency. Thus the term. When
> > the iterations are done concurrently, they don't *HAVE* an order.
>
> You don't say whether the OP code should work or not.

I don't say because I don't know. Admittedly, that doesn't always keep me
from mouthing off, but I do try to minimize it. :-)

> If it does
> it concurrently, as written, then B gets overwritten and the result
> is wrong. (Well, often enough.)

I'll not take that as a definitive answer unless supported with specific
citations from the standard. I'm not saying you are wrong - I really
don't know. But I don't think I'll take your explanation of what you
think the word "concurrent" ought to imply as a definitive reading of
the standard. Perhaps B might be considered private to each iteration; I
don't know.

I have not studied the standard's treatment of that feature well enough
to say whether it guarantees that the OP's code is valid or not, or
possibly whether it "ought" to be valid but the standard fails to
adequately cover it and thus needs fixing. I could believe any of those
three possibilities.

You might correctly notice a lot of "I don't knows" above. But the fact
that I don't claim to know everything (or even very much) about the
details of the DO CONCURRENT specification in the standard doesn't keep
me from commenting about one part that I do think I know - that the
feature was intended to support concurrent execution. That's all I said;
I'll stand by that part.

robert....@oracle.com

Jan 15, 2014, 4:20:05 AM
On Tuesday, January 14, 2014 5:07:25 PM UTC-8, Richard Maine wrote:
> glen herrmannsfeldt wrote:
>
> [about DO CONCURRENT]
> > Maybe DO IN ANY ORDER
> > would have made more sense, but not be as readable.
>
> But it also would miss the point. I doubt that such a new feature would
> have been introduced just so that the compiler could do the loop iterations
> in any order. That would provide no obvious benefit. "Any order" still
> implies that there is an order, that is that the iterations are done one
> at a time. To my knowledge, the reason for DO CONCURRENT is to
> facilitate parallel operation, that is concurrency. Thus the term. When
> the iterations are done concurrently, they don't *HAVE* an order.

The semantics of DO CONCURRENT are DO IN ANY ORDER
plus some additional restrictions and some additional
flexibility. The goal certainly was to facilitate
parallel execution, but for complicated cases, I
expect most compilers to fall back on serial
execution with DO IN ANY ORDER semantics.

Bob Corbett

Anton Shterenlikht

Jan 15, 2014, 5:50:31 AM
town...@astro.wisc.edu writes:

>program test_concurrent
> implicit none
> integer, parameter :: n = 100
> real :: a(100)
> real :: b
> integer :: i
> do concurrent (i = 1:n)
> b = 0.5*i
> a(i) = b
> end do
> do i = 1,n
> print *,i,a(i)
> end do
>end program test_concurrent
>
>I've been having some back-and-forth discussion with a colleague about whether this is valid and does what is expected. The source of confusion is the variable b. Is it "private" to each iteration, so that if the loop is run in parallel then each thread gets a separate copy? Or is there just one b, and the different threads tread on each other's toes writing to it?
>
>From reading John Reid's "New Features in Fortran 2008" document, I note that one restriction on do concurrent is that (paraphrasing) 'a variable referenced in an iteration was previously defined in the same iteration'. This seems to describe b, so it looks like the program is OK.

The standard is very clear on this - 10-007r1, p. 178, lines 5-7:

*quote*
A variable that is referenced in an iteration shall neither
be previously defined during that iteration,
or shall not be defined or become undefined during
any other iteration. A variable that is defined or
becomes undefined by more than one iteration becomes
undefined when the loop terminates.
*end quote*

Implementations may and will vary. Comparison with OpenMP
cannot be used to judge the standard compliance of the Fortran code.

Your code is standard compliant even without the BLOCK construct.

Anton

glen herrmannsfeldt

Jan 15, 2014, 6:33:41 AM
robert....@oracle.com wrote:
> On Tuesday, January 14, 2014 5:07:25 PM UTC-8, Richard Maine wrote:

>> [about DO CONCURRENT]
(snip, I wrote)
>> > Maybe DO IN ANY ORDER
>> > would have made more sense, but not be as readable.

>> But it also would miss the point. I doubt that such a new feature would
>> have been introduced just so that the compiler could do the loop iterations
>> in any order. That would provide no obvious benefit. "Any order" still
>> implies that there is an order, that is that the iterations are done one
>> at a time. To my knowledge, the reason for DO CONCURRENT is to
>> facilitate parallel operation, that is concurrency. Thus the term. When
>> the iterations are done concurrently, they don't *HAVE* an order.

> The semantics of DO CONCURRENT are DO IN ANY ORDER
> plus some additional restrictions and some additional
> flexibility. The goal certainly was to facilitate
> parallel execution, but for complicated cases, I
> expect most compilers to fall back on serial
> execution with DO IN ANY ORDER semantics.

The complication of both regular array assignment and FORALL is
that they are defined not to change the left hand side until the
whole expression on the right has been evaluated. If the compiler
can't determine that there is no overlap or aliasing, a temporary
is required.

As a side note, and from an earlier time, PL/I took a different
choice. For PL/I, array assignment is defined to be element by
element such that changed elements are used immediately.

If the programmer knows that array elements can be evaluated
in any order, but the compiler isn't able to determine that,
it is nice to have a way to tell the compiler that order doesn't
matter.
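
For example (a made-up sketch; p, a, and b are names I just invented), with an indirect index the compiler usually can't prove that the iterations are independent, but the programmer can promise it:

! p is known to the programmer (but not provable by the compiler)
! to be a permutation of 1..n, so no two iterations define the same
! element of a, and no iteration touches data defined by another.
do concurrent (i = 1:n)
   a(p(i)) = b(i) + 1.0
end do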

Note that OpenMP has a way to tell the compiler, when doing
concurrent evaluation, that each thread/task/whatever should have
its own copy of intermediate variables. As I indicated, and Robert
noted, the result is AS IF the trips through the loop were done in
any order; that is, there are no dependencies between them, in either
array element access or intermediate variables. (And, as I noted
earlier, in many cases B will be optimized away.) When dependencies
have been removed, including intermediate variables, then
concurrent execution is done.

Finally, from the standard for DO CONCURRENT: "The executions
may occur in any order." (which I hadn't read before the
previous post.)

-- glen


town...@astro.wisc.edu

Jan 15, 2014, 10:48:31 AM
On Wednesday, January 15, 2014 4:50:31 AM UTC-6, Anton Shterenlikht wrote:
It may be *compliant*, but the standard appears to be insufficient to determine what the output of this code will be. Will the code print out 100 numbers, starting at 0.5 and increasing by increments of 0.5? If it doesn't, is the compiler violating the standard?

Richard Maine

Jan 15, 2014, 10:57:52 AM
Anton Shterenlikht <me...@mech-cluster241.men.bris.ac.uk> wrote:

> The standard is very clear on this - 10-007r1, p. 178, lines 5-7:
>
> *quote*
> A variable that is referenced in an iteration shall neither
> be previously defined during that iteration,
> or shall not be defined or become undefined during
> any other iteration. A variable that is defined or
> becomes undefined by more than one iteration becomes
> undefined when the loop terminates.
> *end quote*

"neither" -> "either"

michael...@compuserve.com

Jan 15, 2014, 11:09:10 AM
On Tuesday, January 14, 2014 10:51:10 PM UTC+1, town...@astro.wisc.edu wrote:
>
>
> From reading John Reid's "New Features in Fortran 2008" document, I note that one restriction on do concurrent is that (paraphrasing) 'a variable referenced in an iteration was previously defined in the same iteration'. This seems to describe b, so it looks like the program is OK.
>
I note that John Reid's most recent rendering of this rule, on p. 360 of MFE, is: "By using do concurrent the programmer guarantees [that] any variable referenced is either previously defined in the same iteration, or its value is not affected by any other iteration;"

HTH

Mike Metcalf



Richard Maine

Jan 15, 2014, 11:53:33 AM
<town...@astro.wisc.edu> wrote:

> On Wednesday, January 15, 2014 4:50:31 AM UTC-6, Anton Shterenlikht wrote:

> > The standard is very clear on this - 10-007r1, p. 178, lines 5-7:

> > *quote*
> >
> > A variable that is referenced in an iteration shall neither
> > be previously defined during that iteration,
> > or shall not be defined or become undefined during
> > any other iteration. A variable that is defined or
> > becomes undefined by more than one iteration becomes
> > undefined when the loop terminates.
> > *end quote*

[with my previously noted correction of "neither"->"either"]

> > Your code is standard compliant even without the BLOCK construct.

> It may be *compliant*, but the standard appears to be insufficient to
> determine what the output of this code will be. Will the code print out
> 100 numbers, starting at 0.5 and increasing by increments of 0.5? If it
> doesn't, is the compiler violating the standard?

I'm finding myself agreeing with Rich Townsend here. Looks to me like
the intent is to allow this sort of code, but I don't see that the
cited words do it. Maybe other words do, but these words alone are not
enough. I haven't researched enough to be sure about the possible other
words.

If execution can indeed be concurrent, then it looks to me like the
words as posted don't provide an interpretation of what the result would
be. In that case, the code would be non-conforming because there's a bit
quite early in the standard that says code is non-conforming if the
standard doesn't provide an interpretation of it. We don't generally
like to rely on that rule, preferring to make it explicit that something
is nonconforming instead of having to deduce it that way; there have
been cases before where that kind of argument was used to justify that
something was nonconforming, but an erratum was published to make it
clearer by adding an explicit prohibition.

If the model in the standard is that the iterations are done in any
order, but not concurrently, then all would be well defined, although as
I mentioned before, there wouldn't be much reason for the construct if
that were the end of it.

Possibly (and please note that word, as I haven't researched enough to
be at all sure of it), the standard defines execution as though it were
in any order, but with the idea that the restrictions facilitate
allowing concurrent implementation. In that case, a concurrent
implementation would have to basically make B private to each iteration.
(Optimizing B out would have essentially the same effect).

Per Robert's note, just because the standard is intended to facilitate
concurrent implementation doesn't necessarily mean that all compilers
are required to or will manage to do so.

Ron Shepard

Jan 15, 2014, 12:43:51 PM
In article <1lfiabo.oqxtaqh020gN%nos...@see.signature>,
nos...@see.signature (Richard Maine) wrote:

> If the model in the standard is that the iterations are done in any
> order, but not concurrently, then all would be well defined, although as
> I mentioned before, there wouldn't be much reason for the construct if
> that were the end of it.

I disagree with the last part of that sentence. Those semantics are
what allows compilers to interchange loop orders or to stripmine
loops in order to optimize register usage and memory access. This
functionality is what we programmers have wanted since back in the
'80s, when vector processing (i.e. pipelining) was becoming
more or less standard. Of course vector processing can be viewed as
a form of "concurrency", but now that term usually implies multiple
execution threads or multiple processes.

$.02 -Ron Shepard

town...@astro.wisc.edu

Jan 15, 2014, 1:12:25 PM
FWIW, the colleague I mentioned in my original post was hoping this would be the case -- he wanted a code construct that could easily be vectorized by compilers. Hence his disappointment when he found out that, at least for the Intel compiler, DO CONCURRENT maps onto OpenMP rather than onto SSE thingamajigs.

cheers,

Rich

town...@astro.wisc.edu

Jan 15, 2014, 1:16:47 PM
The key word in this sentence is "or". If the variable referenced has been previously defined in the same iteration, then we are OK as far as the standard goes. But will the variable still have the value it was defined with?

cheers,

Rich

PS Didn't realize you folks have another "... Explained" book out. I've just ordered it -- this will make my 4th book in the 'series'!

Ian Harvey

Jan 15, 2014, 3:24:20 PM
On 2014-01-16 3:53 AM, Richard Maine wrote:
...

> If the model in the standard is that the iterations are done in any
> order, but not concurrently, then all would be well defined, although as
> I mentioned before, there wouldn't be much reason for the construct if
> that were the end of it.
>
> Possibly (and please note that word, as I haven't researched enough to
> be at all sure of it), the standard defines execution as though it were
> in any order, but with the idea that the restrictions facilitate
> allowing concurrent implementation. In that case, a concurrent
> implementation would have to basically make B private to each iteration.
> (Optimizing B out would have essentially the same effect).

It is this.

The possibility of asynchronous execution of statements is only
mentioned in the context of images (and perhaps asynchronous IO).
Otherwise execution within an image is a sequence in time of actions,
actions being a consequence of statements (2.3.5).

The restrictions on do concurrent are such that a conforming program
cannot really tell whether execution is concurrent or just in some
haphazard order (though practically you could probably draw pretty
sensible conclusions from the ordering of output records in a sequential
file, but that ordering is explicitly undefined, so alternatively
perhaps the processor is just rolling dice behind your back).

Consequently, if a processor is able to determine (or arrange) that
concurrent execution would give the same outcome as execution of the
iterations in any order, then it can take advantage of that.

> Per Robert's note, just because the standard is intended to facilitate
> concurrent implementation doesn't necessarily mean that all compilers
> are required to or will manage to do so.

It's easy enough to construct DO CONCURRENT examples that would be quite
difficult for compilers to parallelize. The rules are such that a
decision about whether a particular variable is private or shared (to
use the openmp terminology) might have to be made at runtime.
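
A contrived sketch of the sort of thing I mean (mask, c and t are made-up names):

do concurrent (i = 1:n)
   if (mask(i)) then
      t = 0.5*i      ! defined before it is referenced within this iteration
      c(i) = t
   end if
end do

If mask is true for more than one value of i, t becomes undefined when the loop terminates, so it could safely have been treated as private. If mask is true for exactly one value of i, t retains that iteration's value after the loop, so it cannot simply be discarded. Which of the two cases applies may only be known at run time.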


glen herrmannsfeldt

Jan 15, 2014, 8:56:15 PM
Ron Shepard <ron-s...@nospam.comcast.net> wrote:

(snip)
> I disagree with the last part of that sentence. Those semantics are
> what allows compilers to interchange loop orders or to stripmine
> loops in order to optimize register usage and memory access. This
> functionality is what we programmers have wanted starting back in
> the 80's during vector processing (i.e. pipelining) was becoming
> more or less standard. Of course vector processing can be viewed as
> a form of "concurrency", but now that term usually implies multiple
> execution threads or multiple processes.

As I understand it, FORALL was meant to work with vector
processors, such as those from Cray. If the array fits into
a vector register, you can do the whole computation in registers,
and not need any temporary (in memory) arrays. But vector registers
were often 64 elements long, and arrays often much larger.
If the array is larger than a vector register, the compiler
still has to either verify no overlap or use a temporary.

A way to promise to the compiler that there is no interdependence
is really useful.

-- glen

Dick Hendrickson

Jan 15, 2014, 10:57:27 PM
On 1/15/14 7:56 PM, glen herrmannsfeldt wrote:
> Ron Shepard<ron-s...@nospam.comcast.net> wrote:
>
> (snip)
>> I disagree with the last part of that sentence. Those semantics are
>> what allows compilers to interchange loop orders or to stripmine
>> loops in order to optimize register usage and memory access. This
>> functionality is what we programmers have wanted starting back in
>> the 80's during vector processing (i.e. pipelining) was becoming
>> more or less standard. Of course vector processing can be viewed as
>> a form of "concurrency", but now that term usually implies multiple
>> execution threads or multiple processes.
>
> As I understand it, FORALL was meant to work with vector
> processors, such as those from Cray. If the array fits into

NO, FORALL wasn't designed for a Cray nor for a vector register machine.
Fitting things into vector registers wasn't the goal. Some "vector"
machines at the time were memory-to-memory with "whatever" vector
length. FORALL was just an almost darn good idea that had a 20-20
hindsight flaw.

Register size wasn't the issue. The problem (IMO) was that FORALL
didn't impose any requirements on the programmer to prevent iteration to
iteration dependencies between variable definition and use.
Essentially, an optimizing compiler that did out-of-order load/stores
had to deduce that there were no multi-statement ambiguities between
array element [re]definition and [re]use. This was the same problem
that compilers faced for normal DO loop vectorization or unrolling. The
FORALL syntax offered no guarantees that an optimizer could rely on.
The new DO CONCURRENT requires the programmer to promise not to do
anything bad. That lets the compiler do out-of-order load/stores (i.e.
"vectorization") even if it can't prove that it's safe.

Dick Hendrickson

glen herrmannsfeldt

Jan 15, 2014, 11:41:13 PM
Dick Hendrickson <dick.hen...@att.net> wrote:

(snip, I wrote)
>> As I understand it, FORALL was meant to work with vector
>> processors, such as those from Cray. If the array fits into

> NO, FORALL wasn't designed for a Cray nor for a vector register machine.
> Fitting things into vector registers wasn't the goal. Some "vector"
> machines at the time were memory-to-memory with "whatever" vector
> length. FORALL was just an almost darn good idea that had a 20-20
> hindsight flaw.

I am trying to remember how the Star-100 works. But it never got
very popular, as it wasn't very fast running actual useful programs.

> Register size wasn't the issue. The problem (IMO) was that FORALL
> didn't impose any requirements on the programmer to prevent iteration to
> iteration dependencies between variable definition and use.

Yes.

> Essentially, an optimizing compiler that did out-of-order load/stores,
> had to deduce that there were no multi-statement ambiguities between
> array element [re]definition and [re]use. This was the same problem
> that compilers faced for normal DO loop vectorization or unrolling. The
> FORALL syntax offered no guarantees that an optimizer could rely on.
> The new DO CONCURRENT requires the programmer to promise not to do
> anything bad. That let the compiler do out-of-order load/stores (ie
> "vectorization") even if it can't prove that it's safe.

Or use a temporary. But the limit on a good pipelined processor
is memory bandwidth, which means avoiding temporary copies.

And at the time, the fast vector processors used vector
registers.

-- glen

Steve Lionel

Jan 20, 2014, 2:41:06 PM
On 1/15/2014 1:12 PM, town...@astro.wisc.edu wrote:
> FWIW, the colleague I mentioned in my original post was hoping this would be the case -- he wanted a code construct that could easily be vectorized by compilers. Hence his disappointment when he found out that, at least for the Intel compiler, DO CONCURRENT maps onto OpenMP rather than onto SSE thingamajigs.

If vectorization is primarily what is wanted, I suggest the use of the
OpenMP 4.0 SIMD directives, or for the Intel compiler, its !DIR$ SIMD
directive, to "enhance" vectorization. DO CONCURRENT is for parallelism.

--
Steve Lionel
Developer Products Division
Intel Corporation
Merrimack, NH

For email address, replace "invalid" with "com"

User communities for Intel Software Development Products
http://software.intel.com/en-us/forums/
Intel Software Development Products Support
http://software.intel.com/sites/support/
My Fortran blog
http://www.intel.com/software/drfortran

Refer to http://software.intel.com/en-us/articles/optimization-notice
for more information regarding performance and optimization choices in
Intel software products.