Is there a performance difference when I execute an operation on an
array, versus splitting it up in do-loops which execute the same
operation element-by-element?
I am asking because in Matlab, there is a big performance impact, and I
am wondering how I should program.
If the answer is compiler-dependent, I would narrow it down to gfortran
and ifort.
Once again, thanks in advance!
Daniel
The short answer (on the performance part of the question) is yes: there
frequently is a difference.
The long answer is no: most compilers expand the code into an intermediate
language before optimizing, and the same transformations are applied
either way. You will need to try it out yourself and study the compiler
reports, as well as using your search tool if you want to see
input from others.
ifort may have more built-in templates which recognize patterns arising
from array operations such as maxloc, in order to optimize them,
particularly for the newer instruction sets. With DO loops, ifort
places more reliance on directives than other current compilers.
You seem to be excluding the common situations where multiple operations
are applied to the same data, and loop fusion may be required for
efficient operation with array assignment. The compilers you mention
give less attention to this than certain others.
Array assignments for multiple rank arrays challenge compilers to find
the appropriate loop nesting, but this may not be an issue if your style
with nested DO loops is to put the nesting backwards.
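
A minimal sketch of the loop-nesting point, assuming a simple scaling
operation (the program and array names are made up for illustration).
Fortran stores arrays column by column, so in explicit loops the first
subscript should normally vary fastest; with the array assignment the
compiler picks the traversal order itself.

```fortran
program loop_order
  implicit none
  integer, parameter :: n = 500
  real :: a(n, n), b(n, n), c(n, n)
  integer :: i, j

  call random_number(b)

  ! Whole-array form: the compiler chooses the traversal order.
  a = 2.0 * b

  ! Explicit loops: the first subscript (i) is in the inner loop,
  ! so memory is accessed contiguously.
  do j = 1, n
     do i = 1, n
        c(i, j) = 2.0 * b(i, j)
     end do
  end do

  if (any(a /= c)) error stop "results differ"
  print *, "both forms agree"
end program loop_order
```

Swapping the two DO statements gives the same result but strided
memory access, which is the "backwards" nesting mentioned above.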
On the question of how you should program, the first priority should be
clarity of expression and ease of finding and correcting bugs.
--
Tim Prince
In Matlab, the loops are interpreted, while the array operations are run
as compiled code. In Fortran, all code is compiled. Hence, the
difference is probably small, at least for low levels of optimization.
That being said, the compiler (speaking generally here) can probably
optimize more easily for array expressions. For instance, on Intel CPUs,
SSE instructions can speed up computation a lot for array expressions.
Also, it makes for less code to write and read. I'd do the following:
1) Write array expressions where that is simple
2) Resort to loops where that is more clear
3) Measure what part of your code takes the most time - optimize this part
Of course, before screwing around with 3), try adding flags to see if
the compiler can magically do the job for you :)
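
For step 3, a rough timing sketch might look like the following. The
operation, the array size, and all names are assumptions chosen only
for illustration; a single run like this is noisy, so repeat it and
compare at the optimization levels you actually use (for example -O2
or -O3 with either gfortran or ifort).

```fortran
program time_both
  use iso_fortran_env, only: int64
  implicit none
  integer, parameter :: n = 5000000
  real, allocatable :: x(:), y(:), z(:), w(:)
  integer(int64) :: t0, t1, t2, rate
  integer :: i

  allocate(x(n), y(n), z(n), w(n))
  call random_number(x)
  call random_number(y)

  call system_clock(t0, rate)
  z = x * y + 1.0               ! array expression
  call system_clock(t1)
  do i = 1, n                   ! equivalent DO loop
     w(i) = x(i) * y(i) + 1.0
  end do
  call system_clock(t2)

  ! Print checksums as well, so the work cannot be optimized away.
  print *, "array expression (s):", real(t1 - t0) / real(rate)
  print *, "DO loop (s):         ", real(t2 - t1) / real(rate)
  print *, "checksums:", sum(z), sum(w)
end program time_both
```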
Cheers
Paul
> Hi
>
> is there a performance difference when I execute an operation on an
> array, versus splitting it up in do-loops which execute the same
> operation element-by-element?
>
> I am asking because in Matlab, there is a big performance impact, and I
> am wondering how I should program.
Matlab is an interpreter, so its atomic actions are quite different.
Assigning to a whole array is only slightly more expensive than assigning
to a single array element, and could even be cheaper if the subscript
calculation takes some amount of time, like looking the subscript up and
converting it to a suitable form. As well, each statement in the looping
structure will generate an interpreter action that takes time.
But in a compiled language the array assignment will require a small number of
machine-level iteration statements in addition to the literal assignment to
the array elements. There is a good chance that the machine instructions for
the compiled iteration will be the same as the ones the compiler generates
for the array assignment.
In other words, you learn almost nothing useful from the timings of
interpreted languages like Matlab. Bytecode languages have the same issues,
with the added confusion that some do not have atomic array operations and
so do not display the huge difference between array assignment and
programmer loops for the same effect.
> If the answer is compiler-dependent, I would narrow it down to gfortran
> and ifort.
The difference will be between compiled and interpreted rather than
between compilers.
Confusion sets in when compilation to interpreted bytecodes is tossed
into the mix.
If you are running into cache and/or paging issues, then the interpreters
may have atomic array operations that are cache/paging aware, so it becomes
even more confusing, but you are not likely to be using an interpreter by
then, for other reasons. A careful compiler could also be using the same
tricks for its array assignments. So learn the gory details of each and
every system by RTFMVC (read the fine manual very carefully). And even then
some of your coffee-time buddies will ask whether you considered some
irrelevant technical issue carefully, because someone they met (six to ten
times removed) said something.
Only a slight take regarding Matlab; even in Matlab "it depends". The
JIT compiler can now, in some instances, make even the conclusion of
"big" impacts there less clear-cut...
Other than noting that Matlab is no longer as purely a step-by-step
interpreter as it once was, I'll concur w/ the other posters.
--
I think there are two separate issues in this question that are
getting confused. The first is the difference between a compiled
and an interpreted language. Yes, Matlab might be able to execute
array operations faster than do loop expressions, but that is
primarily because of how the interpreter works, and that issue does
not apply to Fortran.
In Fortran, the issue is how scalar operations within nested do
loops are optimized compared to the equivalent array operations. In
a simple case, a scalar expression might require using, say, six
registers to hold intermediate results. When written as an array
expression, the compiler might need to allocate space for six
intermediate arrays to hold that same information. Using six
registers is very fast compared to allocating, accessing, and
deallocating six arrays in memory. The same issue applies to six
cached memory locations compared to six arrays that are too large to
be cached. The reason for this is that Fortran semantics require
the expression to be computed "as if" the entire right hand side is
evaluated before any assignment occurs. The optimizer is allowed to
simplify this and to avoid the unnecessary memory allocations, but
this ability has been slow coming over the past 20 years. It is
better now than 20 years ago, but in many cases the programmer can
still do a better job by writing do loops than by relying on the
compiler to optimize the array expression.
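
A small sketch of the "as if" rule, using a made-up five-element
example. When the left-hand side overlaps the right-hand side, the
array assignment and a naive forward loop are not even equivalent, which
is why the compiler sometimes has to fall back on a temporary (or prove
that it can reorder the loop safely).

```fortran
program rhs_first
  implicit none
  integer, parameter :: n = 5
  integer :: a(n), b(n), i

  a = [1, 2, 3, 4, 5]
  b = a

  ! Array assignment: the whole right-hand side is evaluated (as if)
  ! before anything is stored, so each element receives the OLD value
  ! of its left neighbour.
  a(2:n) = a(1:n-1)

  ! Naive forward loop: b(i-1) has already been overwritten when it
  ! is read, so the first value propagates down the array.
  do i = 2, n
     b(i) = b(i-1)
  end do

  print *, a   ! 1 1 2 3 4
  print *, b   ! 1 1 1 1 1
end program rhs_first
```

A hand-written loop that matches the array assignment would have to run
backwards (do i = n, 2, -1) or copy through a scratch array.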
The best thing to do in Fortran is to write the code both ways, time
the results on a collection of compilers, and choose the best source
code form based on these results. But that takes a lot of work. So
people usually just settle on what looks like the clearest
expression of the algorithm that performs acceptably well on one
compiler, and hope for the best elsewhere.
$.02 -Ron Shepard
> On 8/7/2011 5:01 AM, Daniel H wrote:
> > is there a performance difference when I execute an operation on an
> > array, versus splitting it up in do-loops which execute the same
> > operation element-by-element?
> >
> > I am asking because in Matlab, there is a big performance impact, and I
> > am wondering how I should program.
> ...
>
> Only a slight take regarding Matlab; even in Matlab "it depends".
One point that is at least implicit in all the other posts might not
have struck you. Granted all the qualifications about how "it depends",
and granted that one should usually go first for things like clarity of
code, note that when there is a difference in performance between loops
and array expressions in Fortran, it is most common for the difference
to be in the opposite direction from Matlab's; the loops will more often
be faster than the array operations in Fortran.
This certainly is not always true. Doing the loops in particularly
inefficient ways can certainly kill performance, so back to "it
depends". But the most common issues are completely unrelated to the
issues you see in Matlab and happen to be in the opposite direction.
Array expressions in Fortran can result in relatively expensive use of
temporary arrays, with allocation, deallocation, and data copying.
Fortran array expressions can result in multiple loops (under the hood),
multiplying the loop overhead, in cases where you'd write things as a
single DO loop. Optimizers can sometimes fix these things, but
optimizers for array expressions often don't seem to do as well as
optimizers for DO loops, which have had half a century of attention.
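
To illustrate the "multiple loops under the hood" point, here is a
sketch with invented arrays. Conceptually, each array assignment is
its own pass over the data, while the single DO loop does both updates
in one pass. Whether a given compiler fuses the two array statements is
something you would have to check for yourself.

```fortran
program fusion
  implicit none
  integer, parameter :: n = 100000
  real :: a(n), b(n), c(n), d(n), c2(n), d2(n)
  integer :: i

  call random_number(a)
  call random_number(b)

  ! Two array assignments: conceptually two separate loops.
  c = a + b
  d = c * a

  ! One hand-written loop doing both updates per element.
  do i = 1, n
     c2(i) = a(i) + b(i)
     d2(i) = c2(i) * a(i)
  end do

  ! Both forms should agree up to rounding.
  if (maxval(abs(c - c2)) > 1.0e-6) error stop "c differs"
  if (maxval(abs(d - d2)) > 1.0e-6) error stop "d differs"
  print *, "both forms agree"
end program fusion
```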
In closing, let me reemphasize the point about going for clarity. I have
seen (many times) people write horribly messy array expressions to
replace relatively simple DO loops just because they thought that DO
loops were somehow deprecated and should be avoided. They are then
surprised when the messy array version is slower. Don't do that. If you
can replace a messy DO loop with a simple array expression, go ahead.
But if your transformations make the code messier, don't be entirely
surprised if they also make it slower... and even if the messier version
is faster, contemplate whether that is worth the likely extra
maintenance/debugging cost of messy code.
--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain
Agreed on the general ideas; interestingly, the same result can hold in
Matlab. Not infrequently the holy grail of "vectorization" there, when it
leads to truly complex Matlab expressions for complex operations, falls
prey to similar problems, and the straight-ahead loop solution wins. As the
JIT continues to improve, it becomes more and more the case that what
used to be so just ain't necessarily so w/ Matlab; generally, clearly
written code wins there as well, although obviously not always.
Matlab lets you get away w/ lazy declarations in many more places than
does Fortran even w/ the new features; this can transparently end up in
many copies and reallocations that are, indeed, bottlenecks where a
preallocation step can save much.
--
I usually take an empirical approach to questions like these: try it.
>> > is there a performance difference when I execute an operation on an
>> > array, versus splitting it up in do-loops which execute the same
>> > operation element-by-element?
(snip)
> One point that is at least implicit in all the other posts might not
> have struck you. Granted all the qualifications about how "it depends",
> and granted that one should usually go first for things like clarity of
> code, note that when there is a difference in performance between loops
> and array expressions in Fortran, it is most common for the difference
> to be in the opposite direction from Matlab's; the loops will more often
> be faster than the array operations in Fortran.
> This certainly is not always true. Doing the loops in particularly
> inefficient ways can certainly kill performance, so back to "it
> depends". But the most common issues are completely unrelated to the
> issues you see in Matlab and happen to be in the opposite direction.
> Array expressions in Fortran can result in relatively expensive use of
> temporary arrays, with allocation, deallocation, and data copying.
> Fortran array expressions can result in multiple loops (under the hood),
> multiplying the loop overhead, in cases where you'd write things as a
> single DO loop. Optimizers can sometimes fix these things, but
> optimizers for array expressions often don't seem to do as well as
> optimizers for DO loops, which have had half a century of attention.
And that is true when the expressions do a similar computation.
There are a number of cases where the array expression requires
a lot more computation.
If you use MERGE, or any of the functions with a MASK argument,
you might do much more actual calculation than would be naturally
done in a DO loop case. If you exit the DO loops early, that is
likely faster than evaluating a logical expression, and then
using ANY to test if one case is true.
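
A sketch of the early-exit case, with an invented test (looking for a
negative element). The ANY form is written as if the whole logical
array were built first; a compiler may or may not short-circuit it,
whereas the DO loop stops at the first hit by construction.

```fortran
program early_exit
  implicit none
  integer, parameter :: n = 1000000
  real, allocatable :: x(:)
  logical :: found_array, found_loop
  integer :: i

  allocate(x(n))
  call random_number(x)   ! values in [0, 1)
  x(10) = -1.0            ! plant one negative value near the start

  ! Array form: tests every element, at least conceptually.
  found_array = any(x < 0.0)

  ! Loop form: leaves the loop as soon as a match is found.
  found_loop = .false.
  do i = 1, n
     if (x(i) < 0.0) then
        found_loop = .true.
        exit
     end if
  end do

  if (found_array .neqv. found_loop) error stop "results differ"
  print *, "found:", found_loop, " loop stopped at element", i
end program early_exit
```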
> In closing, let me reemphasize the point about going for clarity. I have
> seen (many times) people write horribly messy array expressions to
> replace relatively simple DO loops just because they thought that DO
> loops were somehow deprecated and should be avoided. They are then
> surprised when the messy array version is slower. Don't do that. If you
> can replace a messy DO loop with a simple array expression, go ahead.
> But if your transformations make the code messier, don't be entirely
> surprised if they also make it slower... and even if the messier version
> is faster, contemplate whether that is worth the likely extra
> maintenance/debugging cost of messy code.
Well, there is the fun of finding an array expression that happens
to give the appropriate result, and is especially unobvious.
But don't do that if you want it to run fast, or if you want
anyone else to understand it.
My usual rule is that simple array expressions are faster than
DO loops, complicated array expressions are slower.
And an example from many years ago, from a book on how not
to do it:
      DO 1 I=1,N
      DO 1 J=1,N
    1 X(J,I)=(I/J)*(J/I)
-- glen
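
For anyone puzzling over that last example: assuming I and J are
integers (as implicit typing would make them), (I/J)*(J/I) is 1 only
when I equals J and 0 otherwise, so the loop builds an identity matrix
with two integer divisions per element. A sketch of the same result
written plainly, with names chosen here for illustration:

```fortran
program identity
  implicit none
  integer, parameter :: n = 4
  real :: x(n, n), y(n, n)
  integer :: i, j

  ! The form from the book: integer division hides the intent.
  do i = 1, n
     do j = 1, n
        x(j, i) = (i / j) * (j / i)
     end do
  end do

  ! The plain form: zero everything, then set the diagonal.
  y = 0.0
  do i = 1, n
     y(i, i) = 1.0
  end do

  if (any(x /= y)) error stop "results differ"
  print *, "both give the identity matrix"
end program identity
```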