Global array operations: a performance hit?

deltaquattro

Jun 17, 2008, 11:41:47 AM

Hi,

I was wondering whether global array operations, introduced in f90,
can have a negative impact on performance. Compare:

do i=1, ntheta
r(0,i) = rhub
r(nr+1,i) = rmax
dt(0,i) = 0.0
dft(0,i) = 0.0
dfr(0,i) = 0.0
dt(nr+1,i) = 0.0
dft(nr+1,i) = 0.0
dfr(nr+1,i) = 0.0
end do

with:

r(0,:) = rhub
r(nr+1,:) = rmax
dt(0,:) = 0.0
dft(0,:) = 0.0
dfr(0,:) = 0.0
dt(nr+1,:) = 0.0
dft(nr+1,:) = 0.0
dfr(nr+1,:) = 0.0

I found the execution time of the latter to be higher than the former,
as if many DO loops were executed instead of just one. Why use
global array operations then? Isn't it better to stick to old plain DO
loops? Thanks,

regards,

deltaquattro
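A minimal compilable sketch of the two versions above, wrapped as subroutines so they can be compared side by side. The array shape (0:nr+1, ntheta), the use of default REAL, and the module and routine names are assumptions, not taken from the poster's code. Both routines should leave the arrays in exactly the same state; only the compiler's translation of them differs.

```fortran
! Sketch only: assumes real arrays declared (0:nr+1, ntheta), as the
! subscripts in the post suggest. Module and routine names are hypothetical.
module boundary_init
  implicit none
contains

  ! Variant 1: a single fused DO loop over the second subscript.
  subroutine init_fused(r, dt, dft, dfr, nr, ntheta, rhub, rmax)
    integer, intent(in) :: nr, ntheta
    real, intent(inout) :: r(0:nr+1, ntheta), dt(0:nr+1, ntheta)
    real, intent(inout) :: dft(0:nr+1, ntheta), dfr(0:nr+1, ntheta)
    real, intent(in) :: rhub, rmax
    integer :: i
    do i = 1, ntheta
      r(0, i) = rhub
      r(nr+1, i) = rmax
      dt(0, i) = 0.0
      dft(0, i) = 0.0
      dfr(0, i) = 0.0
      dt(nr+1, i) = 0.0
      dft(nr+1, i) = 0.0
      dfr(nr+1, i) = 0.0
    end do
  end subroutine init_fused

  ! Variant 2: eight whole-row array assignments.
  subroutine init_array(r, dt, dft, dfr, nr, ntheta, rhub, rmax)
    integer, intent(in) :: nr, ntheta
    real, intent(inout) :: r(0:nr+1, ntheta), dt(0:nr+1, ntheta)
    real, intent(inout) :: dft(0:nr+1, ntheta), dfr(0:nr+1, ntheta)
    real, intent(in) :: rhub, rmax
    r(0, :) = rhub
    r(nr+1, :) = rmax
    dt(0, :) = 0.0
    dft(0, :) = 0.0
    dfr(0, :) = 0.0
    dt(nr+1, :) = 0.0
    dft(nr+1, :) = 0.0
    dfr(nr+1, :) = 0.0
  end subroutine init_array

end module boundary_init
```

Only the boundary rows 0 and nr+1 are written; interior elements are left untouched by both variants.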

Dennis Wassel

Jun 17, 2008, 1:21:31 PM

On 17 Jun., 17:41, deltaquattro <deltaquat...@gmail.com> wrote:
> Hi,
>
> I was wondering whether global array operations, introduced in f90,
> can have a negative impact on performance.
>

> [snip]

>
> I found the execution time of the latter to be higher than the former,
> as if many DO loops were executed instead of just one. Why use
> global array operations then? Isn't it better to stick to old plain DO
> loops? Thanks,
>
> regards,
>
> deltaquattro

This is quite a strange observation and raises some questions:

1) What optimisation options did you use?

2) Which compiler did you use?
The gcc 4.0 and 4.1 Fortran compilers for instance are still pretty
much in their infancy, so one would expect bugs and strange behaviour
there. Use 4.2 or 4.3 instead, if you use gfortran.

3) How did you measure execution time?
I find that accurate timing on a computer is a nontrivial task. The
'time' command on my machine shows up to 200% variance. I can only
assume you used some clever and appropriately precise way of
measuring.
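Because one pass over a few boundary rows is far too short to time on its own, a common approach is to repeat it many times and time the whole batch with the SYSTEM_CLOCK intrinsic. The sketch below assumes the same (0:nr+1, ntheta) shape as the original example; the module and function names and the repetition count are hypothetical.

```fortran
! Sketch only: shape (0:nr+1, ntheta) and all names are hypothetical.
! The boundary assignments are repeated nrep times and the whole batch
! is timed, since a single pass is too short to measure reliably.
module timing_sketch
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  function time_boundary_rows(nr, ntheta, nrep) result(seconds)
    integer, intent(in) :: nr, ntheta, nrep
    real :: seconds
    real, allocatable :: r(:, :)
    integer(int64) :: t0, t1, rate
    integer :: k

    allocate(r(0:nr+1, ntheta))
    r = 0.0
    call system_clock(t0, rate)
    if (rate == 0) then          ! no clock available on this system
      seconds = 0.0
      return
    end if
    do k = 1, nrep
      r(0, :) = real(k)          ! value changes on every pass
      r(nr+1, :) = real(k)
    end do
    call system_clock(t1)
    seconds = real(t1 - t0) / real(rate)
    ! Use the result so the stores are not obviously dead. An aggressive
    ! optimiser may still collapse the repetitions, which is exactly why
    ! timings of code this small need care.
    if (r(0, 1) /= real(nrep)) print *, "unexpected result"
  end function time_boundary_rows
end module timing_sketch
```

Running each variant several times and comparing the spread of the results gives some idea of how much of a measured difference is just noise.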

I'm not a compiler specialist but AFAIK, array operations should not
usually be slower than explicit loop constructs.

Why? When using array operations like, say, x = MATMUL(A,b) in
contrast to two nested DO-loops, the compiler has a greater amount of
information at hand about what it is you want to do, which allows it
to use more aggressive optimisation methods to generate code, or to
generate calls to (more or less optimised) runtime libraries; the
latter is done by all compilers I know.
Additionally, the gfortran compiler has the '-fexternal-blas' option
which tells the compiler to automagically generate calls to an
optimised vendor BLAS for certain array operations, instead of using
the runtime library. I've never tried this, but using a tuned ATLAS
library will surely speed things up nicely.
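For the MATMUL example, a sketch of the two formulations being compared: the intrinsic call and the equivalent pair of nested DO loops for x = A b. The names are hypothetical. The loop order (column index outermost) is chosen to follow Fortran's column-major storage; whether MATMUL actually ends up faster depends on the compiler, its options and its runtime library.

```fortran
! Sketch only: module and function names are hypothetical.
module matvec_sketch
  implicit none
contains
  ! x = A b via the intrinsic; the compiler may inline it or call a
  ! runtime library routine, depending on compiler and options.
  function matvec_intrinsic(a, b) result(x)
    real, intent(in) :: a(:, :), b(:)
    real :: x(size(a, 1))
    x = matmul(a, b)
  end function matvec_intrinsic

  ! The same product with two nested DO loops. The inner loop runs down
  ! a column of a, i.e. through consecutive memory locations.
  function matvec_loops(a, b) result(x)
    real, intent(in) :: a(:, :), b(:)
    real :: x(size(a, 1))
    integer :: i, j
    x = 0.0
    do j = 1, size(a, 2)
      do i = 1, size(a, 1)
        x(i) = x(i) + a(i, j) * b(j)
      end do
    end do
  end function matvec_loops
end module matvec_sketch
```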

A second benefit of array operations is their conciseness. Take the
MATMUL example again: A single call opposed to two nested DO-loops. Or
think of copying part of an array into another array, anything really!
IMHO, a lot of scientific code completely disregards maintainability
issues for the sake of the highest possible degree of code
optimisation.
Using array operations makes your code more concise, more readable and
therefore easier to maintain in the long run! It *should* also
improve, or at least not hurt, performance.

Cheers,
Dennis

James Van Buskirk

Jun 17, 2008, 1:55:04 PM

"deltaquattro" <deltaq...@gmail.com> wrote in message
news:9c706700-2861-4d17...@2g2000hsn.googlegroups.com...

> do i=1, ntheta
> r(0,i) = rhub
> r(nr+1,i) = rmax
> dt(0,i) = 0.0
> dft(0,i) = 0.0
> dfr(0,i) = 0.0
> dt(nr+1,i) = 0.0
> dft(nr+1,i) = 0.0
> dfr(nr+1,i) = 0.0
> end do

> r(0,:) = rhub
> r(nr+1,:) = rmax
> dt(0,:) = 0.0
> dft(0,:) = 0.0
> dfr(0,:) = 0.0
> dt(nr+1,:) = 0.0
> dft(nr+1,:) = 0.0
> dfr(nr+1,:) = 0.0

> I found the execution time of the latter to be higher than the former,
> as if many DO loops were executed instead of just one. Why use
> global array operations then? Isn't it better to stick to old plain DO
> loops? Thanks,

Normally an initialization loop like this one would be faster as
separate loops than one fused loop because it's faster to access
memory consecutively rather than jumping around as implied by the
fused loop. However in this case the loops appear to be setting
boundary values so they are traversing rows rather than columns of
the arrays. As a consequence the code jumps around in memory no
matter what the compiler does and loop fusion can win out because
it implies less loop overhead which otherwise would be of negligible
importance compared to memory access considerations (assuming that
the data set is too large to fit in cache).

One thing to investigate is whether the r(i,j), dt(i,j), dft(i,j),
and dfr(i,j) always get accessed together. If so, you could group
them as a derived type and the above loop could go 4X as fast as
the structure of arrays code listed above.
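A sketch of the derived-type grouping suggested above (an array of structures instead of four separate arrays), assuming r, dt, dft and dfr really are always used together. The type name cell and the routine name are hypothetical, and no particular speedup is claimed for any given compiler.

```fortran
! Sketch only: type and routine names are hypothetical.
module cell_sketch
  implicit none
  ! The four quantities belonging to one grid point, stored together.
  type :: cell
    real :: r = 0.0
    real :: dt = 0.0
    real :: dft = 0.0
    real :: dfr = 0.0
  end type cell
contains
  subroutine init_boundary(grid, rhub, rmax)
    type(cell), intent(inout) :: grid(0:, :)   ! grid(0:nr+1, ntheta)
    real, intent(in) :: rhub, rmax
    integer :: nb
    nb = ubound(grid, 1)                       ! nb = nr+1
    ! One whole-row assignment per boundary. In practice each element
    ! written is a group of four adjacent reals, so one visit to a
    ! boundary point sets all four quantities.
    grid(0, :)  = cell(rhub, 0.0, 0.0, 0.0)
    grid(nb, :) = cell(rmax, 0.0, 0.0, 0.0)
  end subroutine init_boundary
end module cell_sketch
```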

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end

Richard Maine

Jun 17, 2008, 2:31:06 PM

deltaquattro <deltaq...@gmail.com> wrote:

> I was wondering whether global array operations, introduced in f90,
> can have a negative impact on performance. Compare:
>

[elided initialization with DO loops and whole array operations]

> I found the execution time of the latter to be higher than the former,
> as if many DO loops were executed instead of just one. Why use
> global array operations then? Isn't it better to stick to old plain DO
> loops? Thanks,

Note that the usual terminology is something more like "whole array
operations" or even just an unmodified "array operations" instead of
"global".

The main reasons are clarity and conciseness. If it doesn't help clarity
and conciseness, don't do it. That is, no doubt, an oversimplification;
there are exceptions, etc. But it's a good first approximation. Every
once in a while they might also get you faster execution, but if that is
your primary reason for using them, and you don't have specific
knowledge of exactly why to expect faster execution from your particular
case, then your efforts are probably misplaced.

Execution time is actually not the sole measure of code "goodness". In
many cases, it isn't even particularly high on the list of important
things. Sometimes it isn't on the list at all. Other times it is at the
very top of the list. All generalizations are false, your mileage may
vary, etc.

In that regard, is execution time of an initialization such as this
actually significant in your code? While possible, that would be
unusual, and might suggest that the choice of algorithms is less than
ideal. There can be efficient algorithms like that, but they are rare.
As Dennis says, it can be tricky to even measure execution times
precisely enough to time initializations like this. I'm supposing that
perhaps you are just using this as an example of more "interesting"
cases.

In answer to Dennis, by the way, it is *VERY* common for whole array
operations to be slower than DO loops. No, it is not at all strange. It
is much closer to the usual state of affairs. There are a whole host of
reasons.

1. Compilers have had over 5 decades of time to develop techniques of
optimizing loops. Progress has been made in that time. There has only
been about a decade or two (some work preceded the f90 standard; other
compilers didn't really start until later) of significant work on
optimizing array expressions. Things have improved and are still
improving, but it just is not at the level of experience of DO-loop
optimization.

2. Array temporaries are often a big deal in whole-array expressions. A
naive (aka straightforward) application of the rules very often involves
such array temporaries, which are expensive in time. The compiler has to
do a fair amount of work to figure out whether they can be elided. See
point 1. That's probably not the case for your example, but it is a
common one.

3. Your example illustrates the problem of "loop fusion". The naive
(again aka straightforward) application of the rules for your code
example *DOES* imply a separate loop for each array operation (complete
with all loop overhead). That's how the operations are defined. It is an
optimization for the compiler to recognize when it can usefully fuse
these multiple loops. See point 1.
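Point 2 can be seen with a small overlapping assignment. For a(2:n) = a(1:n-1) the right-hand side must behave as if it were fully evaluated before any element is stored. A naive forward DO loop does not do that, so the compiler must either use a temporary or prove that some other loop order (here, running backwards) is safe. The routine names are hypothetical.

```fortran
! Sketch only: routine names are hypothetical.
module shift_sketch
  implicit none
contains
  ! Array syntax: the result must be as if a(1:n-1) were copied first.
  subroutine shift_array_syntax(a)
    real, intent(inout) :: a(:)
    integer :: n
    n = size(a)
    a(2:n) = a(1:n-1)
  end subroutine shift_array_syntax

  ! Naive translation, and wrong: each store overwrites the value that
  ! the next iteration is about to read.
  subroutine shift_forward_loop(a)
    real, intent(inout) :: a(:)
    integer :: i
    do i = 2, size(a)
      a(i) = a(i-1)
    end do
  end subroutine shift_forward_loop

  ! Correct without a temporary: run the loop backwards.
  subroutine shift_backward_loop(a)
    real, intent(inout) :: a(:)
    integer :: i
    do i = size(a), 2, -1
      a(i) = a(i-1)
    end do
  end subroutine shift_backward_loop
end module shift_sketch
```

Starting from a = (1, 2, 3, 4), the array-syntax and backward-loop versions give (1, 1, 2, 3), while the forward loop gives (1, 1, 1, 1).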

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain

Dennis Wassel

Jun 17, 2008, 4:05:47 PM

On 17 Jun., 20:31, nos...@see.signature (Richard Maine) wrote:
>
> [snipety-snip]

>
> In answer to Dennis, by the way, it is *VERY* common for whole array
> operations to be slower than DO loops. No, it is not at all strange. It
> is much closer to the usual state of affairs. There are a whole host of
> reasons.

Beats me.
After all, when using array operations you have additional information
either readily available, or you are able to extract it fairly easily,
which is not always the case in DO loops. Think of aliasing,
strides, contiguous memory locations etc. But then again, "See point
1".
I actually find this hard to believe, but given that I'm rather new to
the delightful post-77 Fortran world and haven't really done any
serious benchmarking on array operations vs DO loops, I'll gladly
trust your judgement on this. Thanks for enlightening us!

> 1. Compilers have had over 5 decades of time to develop techniques of
> optimizing loops. Progress has been made in that time. There has only
> been about a decade or two (some work preceded the f90 standard; other
> compilers didn't really start until later) of significant work on
> optimizing array expressions. Things have improved and are still
> improving, but it just is not at the level of experience of DO-loop
> optimization.

OK, here's my newfound corner of gcc development that I feel like
doing, as soon as I have more time on my hands than right now. After
all, despite my earlier ramblings about conciseness and
maintainability, performance DOES matter in many cases that are
relevant to me :)

> 2. Array temporaries are often a big deal in whole-array expressions. A
> naive (aka straightforward) application of the rules very often involves
> such array temporaries, which are expensive in time. The compiler has to
> do a fair amount of work to figure out whether they can be elided. See
> point 1. That's probably not the case for your example, but it is a
> common one.

The Intel compiler (10.1, maybe earlier versions as well) throws a
warning at runtime if it finds itself needing to create an array
temporary; I found myself changing pieces of my code due to those
warnings.
Gonna have a look if the gfortran guys already have a feature request
about this...

Steven G. Kargl

Jun 17, 2008, 4:41:49 PM

In article <9e1269be-de8c-4971...@i76g2000hsf.googlegroups.com>,
Dennis Wassel <dennis...@googlemail.com> writes:
> On 17 Jun., 20:31, nos...@see.signature (Richard Maine) wrote:
>
>> 1. Compilers have had over 5 decades of time to develop techniques of
>> optimizing loops. Progress has been made in that time. There has only
>> been about a decade or two (some work preceded the f90 standard; other
>> compilers didn't really start until later) of significant work on
>> optimizing array expressions. Things have improved and are still
>> improving, but it just is not at the level of experience of DO-loop
>> optimization.
>
> OK, here's my newfound corner of gcc development that I feel like
> doing, as soon as I have more time on my hands than right now. After
> all, despite my earlier ramblings about conciseness and
> maintainability, performance DOES matter in many cases that are
> relevant to me :)

For starters, you can see what gfortran does by using the
-fdump-tree-original option. Try it with

subroutine po(x,y)
real x(3,3), y(3,3)
x = 1.
y = 0.
x = matmul(x,y)
end subroutine po

If you're really curious about the internal goop, use -fdump-tree-all.

>> 2. Array temporaries are often a big deal in whole-array expressions. A
>> naive (aka straightforward) applicaion of the rules very often involves
>> such array temporaries, which are expensive in time. The compiler has to
>> do a fair amount of work to figure out whether they can be elided. See
>> point 1. That's probably not the case for your example, but it is a
>> common one.
>
> The Intel compiler (10.1, maybe earlier versions as well) throws a
> warning at runtime if it finds itself needing to create an array
> temporary; I found myself changing pieces of my code due to those
> warnings.
> Gonna have a look if the gfortran guys already have a feature request
> about this...

There is currently no warning and AFAIK no request for such a feature.
gfortran has fairly decent dependency analysis, but in certain situations
it will err on the safe side and generate a temporary array even if
it isn't necessarily needed.

--
Steve
http://troutmask.apl.washington.edu/~kargl/

Dennis Wassel

Jun 17, 2008, 5:21:53 PM

On 17 Jun., 22:41, ka...@troutmask.apl.washington.edu (Steven G.
Kargl) wrote:
> In article <9e1269be-de8c-4971-857d-e95bf639d...@i76g2000hsf.googlegroups.com>,

>
> For starters, you can see what gfortran does by using the
> -fdump-tree-original option. Try it with
>
> subroutine po(x,y)
> real x(3,3), y(3,3)
> x = 1.
> y = 0.
> x = matmul(x,y)
> end subroutine po
>
> If you're really curious about the internal goop, use -fdump-tree-all.

This sounds dangerously like "don't do this, unless you *really* want
to".
Right'o, gimme something to fill my umpteen MBs of terminal buffer :)

> There is currently no warning and AFAIK no request for such a feature.
> gfortran has fairly decent dependency analysis, but in certain situation
> it will err on the safe side and generate a temporary array even if
> it isn't necessarily needed.

There is a feature request -- PR 29952 created by Tobias in late 2006:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29952

I think this would be a cool feature, but then again, fixing bugs is
probably more important right now.

glen herrmannsfeldt

Jun 17, 2008, 8:15:46 PM

James Van Buskirk wrote:
(snip of example with DO loops)

>>r(0,:) = rhub
>>r(nr+1,:) = rmax
>>dt(0,:) = 0.0
>>dft(0,:) = 0.0
>>dfr(0,:) = 0.0
>>dt(nr+1,:) = 0.0
>>dft(nr+1,:) = 0.0
>>dfr(nr+1,:) = 0.0

>>I found the execution time of the latter to be higher than the former,
>>as if many DO loops were executed instead of just one. Why use
>>global array operations then? Isn't it better to stick to old plain DO
>>loops? Thanks,

> Normally an initialization loop like this one would be faster as
> separate loops than one fused loop because it's faster to access
> memory consecutively rather than jumping around as implied by the
> fused loop. However in this case the loops appear to be setting
> boundary values so they are traversing rows rather than columns of
> the arrays. As a consequence the code jumps around in memory no
> matter what the compiler does and loop fusion can win out because
> it implies less loop overhead which otherwise would be of negligible
> importance compared to memory access considerations (assuming that
> the data set is too large to fit in cache).

The cache effect can be complicated in cases like this.
If the different statements are on the same elements of
the same array, then a single loop helps them stay in cache.

If speed is that important, you might try reversing the
subscript order (in the whole program). Well, the general rule
is to arrange the subscripts such that the leftmost subscript
changes fastest in array operations. That is the order they
are stored in memory, the order they will be done in array
operations, and the order for I/O if just an array name is
specified.
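A sketch of the storage-order rule: arrays are laid out in column-major order, so consecutive values of the leftmost subscript are adjacent in memory, while consecutive values of the second subscript are a whole column apart. For an array declared r(0:nr+1, ntheta), r(0,i) and r(0,i+1) are therefore nr+2 elements apart, which is why the boundary-row assignments in this thread jump around in memory. The function names are hypothetical.

```fortran
! Sketch only: function names are hypothetical.
module order_sketch
  implicit none
contains
  ! Zero-based distance, in elements, from the first element of an array
  ! declared a(lb1:ub1, lb2:...) to element (i, j) in column-major order.
  pure integer function offset2(i, j, lb1, ub1, lb2)
    integer, intent(in) :: i, j, lb1, ub1, lb2
    offset2 = (j - lb2) * (ub1 - lb1 + 1) + (i - lb1)
  end function offset2

  ! Sum with the leftmost subscript in the inner loop, so successive
  ! iterations touch adjacent memory locations.
  pure integer function sum_column_major(a)
    integer, intent(in) :: a(:, :)
    integer :: i, j
    sum_column_major = 0
    do j = 1, size(a, 2)
      do i = 1, size(a, 1)
        sum_column_major = sum_column_major + a(i, j)
      end do
    end do
  end function sum_column_major
end module order_sketch
```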

> One thing to investigate is whether the r(i,j), dt(i,j), dft(i,j),
> and dfr(i,j) always get accessed together. If so, you could group
> them as a derived type and the above loop could go 4X as fast as
> the structure of arrays code listed above.

The old struct of array vs. array of struct trick.

http://coding.derkeiler.com/Archive/Fortran/comp.lang.fortran/2005-06/msg00590.html

-- glen

glen herrmannsfeldt

Jun 17, 2008, 8:36:27 PM

Dennis Wassel wrote:
(snip, and previously snipped DO vs. array expressions)

> I'm not a compiler specialist but AFAIK, array operations should not
> usually be slower than explicit loop constructs.

To make a fair comparison, it should be separate DO loops
vs. array operations, and separate DO loops vs.
a single DO loop. Then you can separate the difference
due to memory access patterns and actual instructions.

My usual rule is that simple array operations are better than
(or at least as good as) DO loops, but more complicated ones
are slower. (Especially if temporary arrays are used.)
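For such a comparison, the third variant, one DO loop per statement, might look like the sketch below. It is the straightforward meaning of the eight whole-row assignments before any loop fusion by the compiler. The array shape (0:nr+1, ntheta) and the names are assumptions carried over from the original example.

```fortran
! Sketch only: names and array shape are assumptions.
module separate_loops_sketch
  implicit none
contains
  ! One DO loop per statement, i.e. the eight whole-row assignments
  ! written out without any loop fusion.
  subroutine init_separate(r, dt, dft, dfr, nr, ntheta, rhub, rmax)
    integer, intent(in) :: nr, ntheta
    real, intent(inout) :: r(0:nr+1, ntheta), dt(0:nr+1, ntheta)
    real, intent(inout) :: dft(0:nr+1, ntheta), dfr(0:nr+1, ntheta)
    real, intent(in) :: rhub, rmax
    integer :: i
    do i = 1, ntheta; r(0, i) = rhub; end do
    do i = 1, ntheta; r(nr+1, i) = rmax; end do
    do i = 1, ntheta; dt(0, i) = 0.0; end do
    do i = 1, ntheta; dft(0, i) = 0.0; end do
    do i = 1, ntheta; dfr(0, i) = 0.0; end do
    do i = 1, ntheta; dt(nr+1, i) = 0.0; end do
    do i = 1, ntheta; dft(nr+1, i) = 0.0; end do
    do i = 1, ntheta; dfr(nr+1, i) = 0.0; end do
  end subroutine init_separate
end module separate_loops_sketch
```

Timing this against both the single fused DO loop and the whole-array version separates the cost of loop overhead from the cost of the memory access pattern.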

-- glen

Craig Powers

Jun 17, 2008, 7:40:10 PM

glen herrmannsfeldt wrote:
>
> If speed is that important, you might try reversing the
> subscript order (in the whole program).

I suspect, given the context, that outside of this initialization, the
subscripts are accessed in the correct order.

glen herrmannsfeldt

Jun 17, 2008, 9:34:34 PM

That is the problem with posting only a small part.

Still, if this is such a small part, I do wonder how one could
notice the speed difference. It seemed worth a reminder,
just in case.

-- glen

robert....@sun.com

Jun 17, 2008, 10:04:07 PM

An assignment of a scalar to an array should never use a
temporary array. Even the simplest compiler should get that
right.

Bob Corbett

deltaquattro

Jun 18, 2008, 4:20:25 AM

On 17 Giu, 19:21, Dennis Wassel <dennis.was...@googlemail.com> wrote:
> On 17 Jun., 17:41, deltaquattro <deltaquat...@gmail.com> wrote:

[..]

>
> This is quite a strange observation and raises some questions:
>
> 1) What optimisation options did you use?
>

None.

> 2) Which compiler did you use?

Compaq Visual Fortran.

>
> 3) How did you measure execution time?
> I find that accurate timing on a computer is a nontrivial task. The
> 'time' command on my machine shows up to 200% variance. I can only
> assume you used some clever and appropriately precise way of
> measuring.
>

When Richard told you exactly the same, you didn't ask which "clever
and appropriately precise way of measuring" he used, but you quickly
thanked him for "enlightening us". I can only assume that for you
using "some clever and appropriately precise way of measuring" means
asking for Richard's agreement on the subject.

> I'm not a compiler specialist but AFAIK, array operations should not
> usually be slower than explicit loop constructs.

Well, then you could just try some example for yourself and see what
happens. My experience comes from different CFD codes I wrote using
whole array operations and single DO loops, and with my compiler I
often found significant execution time differences in real life codes
on real life cases. That's enough for me to start asking questions on
the newsgroup, and yes, thanks, I know enough Fortran to reverse index
ordering outside initialization loops: that was just an example.

>
> Cheers,
> Dennis

Best regards,

deltaquattro

Dennis Wassel

Jun 18, 2008, 5:41:55 AM

On 18 Jun., 10:20, deltaquattro <deltaquat...@gmail.com> wrote:
>
> > 3) How did you measure execution time?
> > I find that accurate timing on a computer is a nontrivial task. The
> > 'time' command on my machine shows up to 200% variance. I can only
> > assume you used some clever and appropriately precise way of
> > measuring.
>
> When Richard told you exactly the same, you didn't ask which "clever
> and appropriately precise way of measuring" he used, but you quickly
> thanked him for "enlightening us". I can only assume that for you
> using "some clever and appropriately precise way of measuring" means
> asking for Richard's agreement on the subject.

Hey, easy on the noobs!
That's outright flaming, when all I was trying to do is to help!
Had you made any earlier suggestion that you know your way around
Fortran coding very well, thank you, I might as well have left
answering to the experts.
Instead you *now* post something I can summarise as "STFU n00b", which
tends to scare away new members. Not gonna cow me down, but others
would have left for good by now.

But then again, who am I telling? Of course you know all that!

> > I'm not a compiler specialist but AFAIK, array operations should not
> > usually be slower than explicit loop constructs.
>
> Well, then you could just try some example for yourself and see what
> happens. My experience comes from different CFD codes I wrote using
> whole array operations and single DO loops, and with my compiler I
> often found significant execution time differences in real life codes
> on real life cases.
>

That would have been insightful.

>
> Best regards,
>
> deltaquattro

deltaquattro

Jun 18, 2008, 8:18:15 AM

On 18 Giu, 11:41, Dennis Wassel <dennis.was...@googlemail.com> wrote:
> On 18 Jun., 10:20, deltaquattro <deltaquat...@gmail.com> wrote:
>
>
>
> > > 3) How did you measure execution time?
> > > I find that accurate timing on a computer is a nontrivial task. The
> > > 'time' command on my machine shows up to 200% variance. I can only
> > > assume you used some clever and appropriately precise way of
> > > measuring.
>
> > When Richard told you exactly the same, you didn't ask which "clever
> > and appropriately precise way of measuring" he used, but you quickly
> > thanked him for "enlightening us". I can only assume that for you
> > using "some clever and appropriately precise way of measuring" means
> > asking for Richard's agreement on the subject.
>
> Hey, easy on the noobs!
> That's outright flaming, when all I was trying to do is to help!
> Had you made any earlier suggestion that you know your way around
> Fortran coding very well, thank you, I might as well have left
> answering to the experts.
> Instead you *now* post something I can summarise as "STFU n00b", which
> tends to scare away new members. Not gonna cow me down, but others
> would have left for good by now.
>
> But then again, who am I telling? Of course you know all that!
>

I'm not flaming and I don't think you are a novice: on the contrary,
since you talked about the possibility of you being involved in gcc
development, I think you know Fortran way better than me.
Anyway, I would never say "STFU n00b" to anyone under any
circumstances, if I understand what that means (I'm not sure, since
English is not my mother tongue, and I don't live in an
English speaking country).
The point is that when I posted a genuine doubt about speed of whole
array operations, you replied asking me if I have been clever enough
in doing my timings. When Richard confirmed my doubts, you were ready
to accept his words. To me, it seemed as if you were dismissing my
doubts just because I'm not an expert like Richard, robin, glen, Paul,
Steve and all the many other guys on this newsgroup.
If this isn't so, then I apologize to you and to the rest of the
newsgroup: I have misinterpreted your words, maybe also because of my
imperfect knowledge of English.

Best Regards,

deltaquattro

Dennis Wassel

Jun 18, 2008, 9:47:36 AM

*sigh* I'm afraid it sounded exactly like that.

Sorry if I sounded as if I just dismissed your doubts as pointless, I
am in no position to do that; I wanted to point out that proper timing
is nontrivial, and to ask whether you were aware of that. Apparently you are!
With Richard, his post (and Google, I admit!) gives me some idea about
his background, so I'll tend to trust in what this guy says.

Besides, the fact that array operations may indeed be slower than
explicit loops was (and still is) hard for me to believe. Learned
something today!

> If this isn't so, then I apologize to you and to the rest of the
> newsgroup: I have misinterpreted your words, maybe also because of my
> imperfect knowledge of English.
>
> Best Regards,
>
> deltaquattro

No offence meant, and none taken (I hope)!

BTT:
Is (compiler) optimisation of array operations that much of a
nontrivial task or just something that nobody has got around to
doing, yet? Judging from their mailing list, gfortran seems to have
many construction sites and a number of them apparently with higher
priorities.

But deltaquattro reported this against Compaq, and I also remember
someone from the gfortran mailing list mentioning that ifort shines on
DO-loop optimisation. I really feel like digging into this, but
I've got to get my current project done first.

Cheers,
Dennis

Richard Maine

Jun 18, 2008, 11:34:23 AM

deltaquattro <deltaq...@gmail.com> wrote:

> On 17 Giu, 19:21, Dennis Wassel <dennis.was...@googlemail.com> wrote:
> > On 17 Jun., 17:41, deltaquattro <deltaquat...@gmail.com> wrote:

> > 3) How did you measure execution time?
> > I find that accurate timing on a computer is a nontrivial task. The
> > 'time' command on my machine shows up to 200% variance. I can only
> > assume you used some clever and appropriately precise way of
> > measuring.
> >
> When Richard told you exactly the same, you didn't ask which "clever
> and appropriately precise way of measuring" he used, but you quickly
> thanked him for "enlightening us".

As long as I'm being mentioned by name, I'll note that I thought I was
agreeing with Dennis on this particular point/question. I don't recall
answering it at all, but rather reinforcing it when I said:

>>> As Dennis says, it can be tricky to even measure execution times
>>> precisely enough to time initializations like this.

I did disagree on other points, including the larger one about array
operations often being slower (or not). But for the specific case here,
where it is just an initialization, and thus presumably pretty quick, I
had exactly the same question, even if I stated it implicitly instead of
with a question mark. I just didn't really need to ask because I agreed
with your (deltaquattro's) general observation, even though I wondered
about how this particular case was measured.

Craig Powers

Jun 18, 2008, 1:08:26 PM

It might be part of an outer loop that is executed many times. I had
something of this nature happen in one of my programs---not down to the
subscripting, but simple initialization-to-zero. A bunch of stuff
needed to be initialized to zero on each pass through the outer loop,
and doing it using DO loops (which also, incidentally, cut out
initialization of unused portions of a ragged array) realized a
substantial savings in program execution time for certain input sets.
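A sketch of the ragged-array situation described above, assuming (hypothetically) that column j of a rectangular work array only uses rows 1 to nused(j). The whole-array assignment clears every element, while the loop version clears only the used part. The names work and nused are illustrative, not taken from the poster's program.

```fortran
! Sketch only: names are hypothetical.
module ragged_sketch
  implicit none
contains
  ! Whole-array form: clears every element, used or not.
  subroutine zero_all(work)
    real, intent(inout) :: work(:, :)
    work = 0.0
  end subroutine zero_all

  ! DO-loop form: column j only clears rows 1..nused(j).
  subroutine zero_used(work, nused)
    real, intent(inout) :: work(:, :)
    integer, intent(in) :: nused(:)
    integer :: i, j
    do j = 1, size(work, 2)
      do i = 1, nused(j)
        work(i, j) = 0.0
      end do
    end do
  end subroutine zero_used
end module ragged_sketch
```

When the used portion is much smaller than the full array and this runs inside a frequently executed outer loop, the saving comes from touching less memory, not from DO loops being inherently faster.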

> It seemed worth a reminder, just in case.

Certainly true.

glen herrmannsfeldt

Jun 18, 2008, 6:00:39 PM

Dennis Wassel wrote:
(snip)

> Is (compiler) optimisation of array operations that much of a
> nontrivial task or just something that nobody has come around to
> doing, yet? Judging from their mailing list, gfortran seems to have
> many construction sites and a number of them apparently with higher
> priorities.

If you do exactly the same thing in both cases, the compiler
should do pretty well, except in cases that modify the source array.
There are cases where you know that no array elements are used
after they are modified (using DO loops), but the compiler doesn't
(using array expressions).

Using vector subscripts as an example, say you have arrays X and Y,
both dimensioned N, where X has a permutation of the numbers from
1 to N.

DO I=1,N
Y(I)=Y(X(I))
ENDDO

reorders Y based on the permutation, where you know that
the values in X are unique.

Y=Y(X)

does it using vector subscripts, but the compiler does not
know that the values are unique, so a temporary array is
needed. If you read in X from a file, there is no way
that the compiler could possibly know the values are unique.
If you generated X directly, it is unlikely that even the
best optimizer would figure it out.

-- glen

Dennis Wassel

Jun 19, 2008, 4:16:34 AM

On 19 Jun., 00:00, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> Dennis Wassel wrote:
>
> (snip)
>
> > Is (compiler) optimisation of array operations that much of a
> > nontrivial task or just something that nobody has come around to
> > doing, yet? Judging from their mailing list, gfortran seems to have
> > many construction sites and a number of them apparently with higher
> > priorities.
>
> If you do exactly the same thing in both cases, the compiler
> should do pretty well, except in cases that modify the source array.
> There are cases where you know that no array elements are used
> after they are modified (using DO loops), but the compiler doesn't
> (using array expressions).
>
> Using vector subscripts as an example, say you have arrays X and Y,
> both dimensioned N, where X has a permutation of the numbers from
> 1 to N.
>
> DO I=1,N
> Y(I)=Y(X(I))
> ENDDO

*Errrrrm*
Am I getting something wrong? This won't work, consider for instance X
= (2 1).
Except for the special case X(I) >= I \forall I, this will produce
garbage (and for permutations, that special case is only satisfied by
the identity).
Point is, I'm fairly sure you need temporaries here as well.
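A sketch of the point made here. Y = Y(X) means that the new Y(I) is the old Y(X(I)), so all the old values have to be gathered before Y is overwritten. The in-place DO loop instead reads elements it has already overwritten whenever X(I) < I. The routine names are hypothetical; the explicit-temporary version shows the kind of temporary the vector-subscript form needs.

```fortran
! Sketch only: routine names are hypothetical.
module permute_sketch
  implicit none
contains
  ! Vector subscript: the compiler must gather y(x) before storing.
  subroutine permute_vector_subscript(y, x)
    real, intent(inout) :: y(:)
    integer, intent(in) :: x(:)
    y = y(x)
  end subroutine permute_vector_subscript

  ! In-place DO loop: wrong in general, because y(x(i)) may already
  ! have been overwritten when x(i) < i.
  subroutine permute_in_place_loop(y, x)
    real, intent(inout) :: y(:)
    integer, intent(in) :: x(:)
    integer :: i
    do i = 1, size(y)
      y(i) = y(x(i))
    end do
  end subroutine permute_in_place_loop

  ! Correct DO-loop version: gather into a temporary, then copy back.
  subroutine permute_explicit_temp(y, x)
    real, intent(inout) :: y(:)
    integer, intent(in) :: x(:)
    real :: tmp(size(y))
    integer :: i
    do i = 1, size(y)
      tmp(i) = y(x(i))
    end do
    y = tmp
  end subroutine permute_explicit_temp
end module permute_sketch
```

With Y = (10, 20) and X = (2, 1), the vector-subscript and explicit-temporary versions give (20, 10), while the in-place loop gives (20, 20).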