F90 performace problem on SGI (vs f77)

robert somerville

Sep 16, 2002, 10:16:47 PM

I am finding that f77 coding on matrices sometimes vastly
outperforms F90 coding on SGI IRIX6.5 with latest compilers,
for example, given:

q1(2100,2000), q2(2000,2000)

this coding is twice as slow:

q2(1:2000,:) = q1(1:2000,:)

as compared to f77 coding

do j = 1,2000
do i=1,2000
q2(i,j) = q1(i,j)
enddo
enddo

is this a problem with the compiler, or the F90 newbie ????

thanks
Robert somerville
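The two forms in the question can be compared side by side with a minimal sketch like the one below. It wraps each form in its own routine so each can be timed on its own. The module name copy_kernels and the double-precision kind are assumptions and do not come from the thread. Both routines are standard Fortran 90 and should give identical results. Which one runs faster is up to the compiler.

```fortran
! Hypothetical module (not from the thread) holding both copy forms.
module copy_kernels
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! assumed double precision
contains
  ! Array-section form. Assumes q1 has at least as many rows and
  ! columns as q2; copies the leading size(q2,1) rows of each column.
  subroutine copy_section(q1, q2)
    real(dp), intent(in)  :: q1(:,:)
    real(dp), intent(out) :: q2(:,:)
    q2 = q1(1:size(q2,1), 1:size(q2,2))
  end subroutine copy_section

  ! DO-loop form of the same copy. The row index i varies fastest,
  ! matching Fortran's column-major storage order.
  subroutine copy_loop(q1, q2)
    real(dp), intent(in)  :: q1(:,:)
    real(dp), intent(out) :: q2(:,:)
    integer :: i, j
    do j = 1, size(q2,2)
      do i = 1, size(q2,1)
        q2(i,j) = q1(i,j)
      end do
    end do
  end subroutine copy_loop
end module copy_kernels
```

Suppose you call copy_section(q1, q2) with q1(2100,2000) and q2(2000,2000). It copies the first 2000 rows of every column. q1 and q2 are distinct dummy arguments, and q2 is modified, so Fortran's argument-aliasing rules let the compiler assume the two do not overlap. In principle, then, neither form needs a temporary array.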

Michael Metcalf

Sep 17, 2002, 3:26:52 AM

"robert somerville" <some...@telus.net> wrote in message
news:3D8690AE...@telus.net...

> I am finding that f77 coding on matrices sometimes vastly
> outperforms F90 coding on SGI IRIX6.5 with latest compilers,
> for example, given:
>
> q1(2100,2000), q2(2000,2000)

^

Do you really mean 2100? If it's supposed to be 2000, have you tried

q2 = q1 ?

Regards,

Mike Metcalf

robert somerville

Sep 17, 2002, 6:30:17 AM

no, q1's row dimension (2100) needs to be bigger than q2's (2000) because
of the math derivation; q2 is used in the next step of the solution. In
this case the f77 way is twice as fast as the f90 way I originally coded
it, at least according to the profiler.

> Regards,
>
> Mike Metcalf

Jan C. Vorbrüggen

Sep 17, 2002, 7:34:43 AM

This looks like a problem with the compiler. However, this is an area
where compilers have improved considerably in the past years. Are you
sure you have an up-to-date version of the compiler in question?

Jan

Jan C. Vorbrüggen

Sep 17, 2002, 8:07:08 AM

Attached you'll find a little test program. Compiling the two files separately
with /fast and linking them with CVF 6.6A, I get the following results:

Loop count:10
6.960000 for array operation
6.760000 for loop

Loop count:20
13.62000 for array operation
13.50900 for loop

Loop count:30
20.46000 for array operation
20.25900 for loop

Loop count:30
20.40000 for array operation
20.36900 for loop

So there's a ~2% difference between the two. If I leave out the call to
do_nothing, the run time is always 0.0, as expected 8-|.

Jan

program test

implicit none

real(8) q1(2100,2100), q2(2000,2000)
integer start, stop, rate
integer i, j, c, count

write (*, "(A)", advance="no") "Loop count:"
read (*, *) count

call system_clock (count_rate = rate)

call random_number (q2)   ! standard intrinsic; fills the whole array

call system_clock (count = start)

do c = 1, count
q1(51:2050, 51:2050) = q2
call do_nothing (q1)
enddo

call system_clock (count = stop)

write (*, *) FLOAT(stop-start)/FLOAT(rate), " for array operation"

call system_clock (count = start)

do c = 1, count

do j = 1,2000
do i=1,2000

q1 (i+50, j+50) = q2 (i, j)
enddo
enddo
call do_nothing (q1)
enddo

call system_clock (count = stop)

write (*, *) FLOAT(stop-start)/FLOAT(rate), " for loop"

end program test
!
! must be in a separate file and compiled separately....
!
subroutine do_nothing (x)
real(8) x(*)

return

end

Rob Somerville

Sep 17, 2002, 8:13:18 AM

yes it is the very latest production SGI compiler release :

f90 -version
MIPSpro Compilers: Version 7.3.1.3m

Dr Ivan D. Reid

Sep 17, 2002, 9:04:25 AM

On Tue, 17 Sep 2002 02:16:47 GMT, robert somerville <some...@telus.net>
wrote in <3D8690AE...@telus.net>:

>q1(2100,2000), q2(2000,2000)

I'd suspect that you're triggering an unnecessary copy to
an intermediate at some level. Since you're copying to the whole
of q2, does

q2 = q1(1:2000,:)

run any differently? Alternatively, generate a machine-code listing and
inspect it for unnecessary activity.

--
Ivan Reid, Electronic & Computer Eng., Brunel Uni. Ivan...@brunel.ac.uk
KotPT -- "for stupidity above and beyond the call of duty".

Bil Kleb

Sep 17, 2002, 9:08:38 AM

Rob Somerville wrote:
>
> yes it is the very latest production SGI compiler release

We've found SGI compilers to be quite poor for certain F90
array constructs. See

http://groups.google.com/groups?th=234c8f81140343dc

for some amazing numbers. (Previous versions of the SGI
compiler were giving F90/F77 near 100 for both the work
and io tests!)

--
Bil

Richard Maine

Sep 17, 2002, 10:57:29 AM

robert somerville <some...@telus.net> writes:

> I am finding that f77 coding on matrices sometimes vastly
> outperforms F90 coding on SGI IRIX6.5 with latest compilers,

> for example,...

> as compared to f77 coding
>
> do j = 1,2000
> do i=1,2000
> q2(i,j) = q1(i,j)
> enddo
> enddo

I'll leave the performance comments to others. I see there have been
some such comments. I'll restrict my comment to noting that the above
*IS* f90 coding. It is perfectly legitimate f90 code. (Indeed, it is
not actually legitimate f77 code, do/enddo being an extension of f77).

Both whole array operations and loops are equally legitimate f90
constructs and they both have their places in f90 code. Characterizing
loops as non-f90 is simply incorrect. One might as well characterize
all names shorter than 8 characters as being non-f90.

--
Richard Maine | Good judgment comes from experience;
email: my last name at host.domain | experience comes from bad judgment.
host: altair, domain: dfrc.nasa.gov | -- Mark Twain

Rob Somerville

Sep 17, 2002, 3:32:24 PM

well, today I can't reproduce my claim of yesterday :-( , although I duplicated
it several times! Possibly an interaction of system load and the profiler???

In any case, my relative speeds are comparable to yours!
However, my slight modification to your code results in a segmentation
fault, which may indicate a problem of some sort with our compiler:

./test14
Loop count:5
Segmentation fault

-----------------------------

program test

implicit none

real*4, pointer,dimension(:,:) :: q1, q2
real*4, allocatable,target,dimension(:,:) :: x1, x2

integer start, stop, rate
integer i, j, c, count

allocate(x1(6200,6000),x2(6000,6000))

q1 => x1
q2 => x2

write (*, "(A)", advance="no") "Loop count:"
read (*, *) count

call system_clock (count_rate = rate)

call random_number (q1)

call system_clock (count = start)

do c = 1, count

q2(1:6000, :) = q1(1:6000,:)
! call do_nothing (q1)
enddo

call system_clock (count = stop)

write (*, *) FLOAT(stop-start)/FLOAT(rate), " for array operation"

call system_clock (count = start)

do c = 1, count

do j = 1,6000
do i=1,6000
q2 (i, j) = q1 (i, j)
enddo
enddo
! call do_nothing (q1)
enddo

call system_clock (count = stop)

write (*, *) FLOAT(stop-start)/FLOAT(rate), " for loop"

end program test

j...@watson.ibm.com

Sep 19, 2002, 1:05:35 AM

In article <ueu1ko7...@altair.dfrc.nasa.gov>,
on 17 Sep 2002 07:57:29 -0700,

Richard Maine <nos...@see.signature> writes:
>robert somerville <some...@telus.net> writes:
>
>> I am finding that f77 coding on matrices sometimes vastly
>> outperforms F90 coding on SGI IRIX6.5 with latest compilers,
>> for example,...
>> as compared to f77 coding
>>
>> do j = 1,2000
>> do i=1,2000
>> q2(i,j) = q1(i,j)
>> enddo
>> enddo
>
>I'll leave the performance comments to others. I see there have been
>some such comments. I'll restrict my comment to noting that the above
>*IS* f90 coding. It is perfectly legitimate f90 code. (Indeed, it is
>not actually legitimate f77 code, do/enddo being an extension of f77).
>
>Both whole array operations and loops are equally legitimate f90
>constructs and they both have their places in f90 code. Characterizing
>loops as non-f90 is simply incorrect. One might as well characterize
>all names shorter than 8 characters as being non-f90.

It seemed clear enough to me. Would you prefer "f77 style"
and "f90 style"?
You seem a bit defensive about the fact that f90 style code
is often slower. In this case this may be due to the fact that f90
array operations were defined (unwisely in my opinion) so as to
sometimes require temporary arrays. Compilers are not always clever
enough to eliminate these temporary arrays when they are not needed.
This can add considerable overhead. Some compilers (xlf for example)
have a compiler option which tells the compiler that temporary arrays
are not needed which may allow them to generate better code.
James B. Shearer
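The kind of temporary described above shows up in a small sketch. The module name overlap_demo is hypothetical and does not come from the thread. Fortran defines an array assignment as if the whole right-hand side were evaluated before any element is stored. An overlapping shift therefore cannot simply be compiled as the obvious forward loop, as the different results below show.

```fortran
! Hypothetical illustration (not from the thread) of array-assignment
! semantics versus a naive element-by-element loop.
module overlap_demo
  implicit none
contains
  ! Array syntax: the right-hand side uses the OLD values of a, so the
  ! compiler must buffer them (e.g. in a temporary) or reverse the loop.
  subroutine shift_array_syntax(a)
    integer, intent(inout) :: a(:)
    integer :: n
    n = size(a)
    a(2:n) = a(1:n-1)
  end subroutine shift_array_syntax

  ! Naive forward loop: each iteration reads the value stored by the
  ! previous one, so a(1) is propagated through the whole array.
  subroutine shift_forward_loop(a)
    integer, intent(inout) :: a(:)
    integer :: i
    do i = 2, size(a)
      a(i) = a(i-1)
    end do
  end subroutine shift_forward_loop
end module overlap_demo
```

Starting from (/ 1, 2, 3, 4, 5 /), shift_array_syntax gives (/ 1, 1, 2, 3, 4 /). shift_forward_loop gives (/ 1, 1, 1, 1, 1 /). When a compiler can see the overlap, it may run the loop backwards instead of allocating a temporary. When it cannot prove the two sides are disjoint, for example with pointer arrays, it may fall back to a temporary even where none is actually needed.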

Richard Maine

Sep 19, 2002, 10:51:54 AM

j...@watson.ibm.com writes:

> In article <ueu1ko7...@altair.dfrc.nasa.gov>,
> on 17 Sep 2002 07:57:29 -0700,
> Richard Maine <nos...@see.signature> writes:

> >Both whole array operations and loops are equally legitimate f90
> >constructs and they both have their places in f90 code. Characterizing

> >loops as non-f90 is simply incorrect....

> It seemed clear enough to me. Would you prefer "f77 style"
> and "f90 style"? You seem a bit defensive about the fact that f90 style code
> is often slower.

I'm afraid you miss my whole point. *BOTH* styles are f90 styles.
This is like asking whether you drink liquid or coffee in the morning
- it is not a meaningful distinction because one is a subset of the
other (well, usually :-). It is true that whole array operations are
not f77-style, but it is misleading to say that DO loops are not f90 style.

There exist people who seem to think it is somehow improper to use DO
loops in f90 - that one must use array operations, no matter how slow
or awkward. If so, that is their personal style choice - it is
*NOT* a style choice defined by the language. (Nor is it a style
choice that I personally like).

Yes, whole-array operations are sometimes slower. I do not argue that
point at all. I agree with that point. I argue only that the use of
DO loops does not make a code "f77-style".

Kenneth H. Fairfield

Sep 19, 2002, 5:17:10 PM

Rob Somerville wrote:

> well, today I can't reproduce my claim of yesterday :-( , although i duplicated
> it several times! possibly interaction of system load & the profiler ???
>
> in any case my results are relatively comparable in relative speeds to yours!
> Althougth my slight modification to your code results in a segmentation

================================

If I read the enclosed code correctly, your "slight modification"
completely changed Jan's example code: since you've commented-out the
calls to the procedure do_nothing, the compiler is free to optimize
your whole program away!!! And most good compilers will do that...

Create a skeleton do_nothing subroutine, like that provided in
Jan's example, remove the comments, and rerun your tests. I can't
guarantee that you'll get the performance skewing you saw originally.

-Ken
--
I don't speak for Intel, Intel doesn't speak for me...

Ken Fairfield
D1C Automation VMS System Support
kenneth.h.fairfield#intel.com
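Ken's point about dead code can be illustrated with a minimal sketch of one alternative to Jan's separately compiled do_nothing. The module and routine names are hypothetical and do not come from the thread. Before each repetition the sketch changes one input element, and after the copy it reads back one copied element. The repeated copy is then neither loop-invariant nor a dead store. This is an assumption-laden sketch, not a substitute for inspecting the generated code. An aggressive optimizer, especially with whole-program optimization, may still see through it or through do_nothing.

```fortran
! Hypothetical timing helper (not from the thread).
module copy_timing
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
contains
  ! Times nrep section copies of q1 into q2. Assumes q1 has at least
  ! as many rows and columns as q2.
  subroutine time_section_copy(q1, q2, nrep, seconds, checksum)
    real(dp), intent(inout) :: q1(:,:)   ! perturbed between repetitions
    real(dp), intent(out)   :: q2(:,:)
    integer,  intent(in)    :: nrep
    real(dp), intent(out)   :: seconds, checksum
    integer(i8) :: t0, t1, rate
    integer :: c, j
    checksum = 0.0_dp
    call system_clock(t0, rate)
    do c = 1, nrep
      j = 1 + mod(c - 1, size(q2, 2))        ! column touched this round
      q1(1, j) = q1(1, j) + 1.0_dp           ! change the input ...
      q2 = q1(1:size(q2, 1), 1:size(q2, 2))  ! ... so the copy must be redone
      checksum = checksum + q2(1, j)         ! ... and consume its output
    end do
    call system_clock(t1)
    seconds = real(t1 - t0, dp) / real(max(rate, 1_i8), dp)
  end subroutine time_section_copy
end module copy_timing
```

Printing checksum next to the elapsed time keeps the result observable. A DO-loop version can be timed the same way by swapping the single assignment for the nested loops.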

j...@watson.ibm.com

Sep 27, 2002, 9:19:22 PM

In article <ueznuey...@altair.dfrc.nasa.gov>,
on 19 Sep 2002 07:51:54 -0700,
Richard Maine <nos...@see.signature> writes:

<snip>

>There exist people who seem to think it is somehow improper to use DO
>loops in f90 - that one must use array operations, no matter how slow
>or awkward. If so, that is their personal style choice - it is
>*NOT* a style choice defined by the language. (Nor is it a style
>choice that I personally like).

Considering all the f90 propaganda about what an advance f90
was on f77, it is not surprising that some people think that if f90
provides another way of doing something that you could do in f77 then
the fortran 90 way must be better. The average user is not aware that
many f90 features were defined in ways that are difficult to implement
efficiently.
James B. Shearer

Richard Maine

Sep 28, 2002, 1:12:47 PM

j...@watson.ibm.com writes:

> ...it is not surprising that some people think that if f90

> provides another way of doing something that you could do in f77 then
> the fortran 90 way must be better.

That is part of why I get pedantic about the point of terminology. I
think that the use of misleading terminology contributes to such
misconceptions. Describing DO loops vs whole array operations in
terms of f77 style vs f90 style is easily the topmost item on my list
of such misleading terminologies.

I repeat again that whole array operations are not "the f90 way".
They are one of the f90 ways. If one describes whole array operations
as "the f90 way" then one encourages exactly the kind of misconception
that you mention above. If one impartially says that f90 provides two
(or more, really) ways to do this, then that encourages the
consideration of which way is better for a given situation. I think
that your way of describing it encourages the problem that you
then bemoan.

F90 also allows variable names up to 31 characters long, whereas f77
was limited to 6 (at least in the standard, though extensions were almost
universal in later years). That doesn't mean that the f90 way is to
use long variable names. Instead it means that you can choose (within
the limit of 31) how long a name is appropriate for a given situation.
Names pushing the 31-character limit are appropriate for some situations,
and 1-letter names are appropriate for others. (Names longer than 31 might
even be appropriate in a few situations, though not very many IMO - the
f2k CD raises this to...I think it was 63). The f90 way here is to
choose whatever length is appropriate based on criteria other than the
language.

Likewise, the f90 way is to choose either DO loops or whole array
operations as seems appropriate based on other criteria. The standard
itself offers no hint of preference. Any prejudicial labelling of
only one of these as "the f90 way" is a free (and misleading) extra.

--
Richard Maine
email: my last name at domain
domain: isomedia dot com