g95 -O2 test.f95 -o test.exe
to give the executable test.exe. Other optimization features are
available.
I would add that real(8) is not portable Fortran: the kind value 8 is
compiler-specific.
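A minimal sketch of one portable alternative, assuming at least 15 decimal digits and a decimal exponent range of 307 are wanted (the names kinds_demo, dp and x are made up for illustration and are not from the thread):

```fortran
program kinds_demo
  implicit none
  ! dp is a hypothetical name: ask the compiler for a real kind with
  ! at least 15 significant digits and a decimal exponent range of 307
  integer, parameter :: dp = selected_real_kind(15, 307)
  real(dp) :: x

  x = 1.0_dp / 3.0_dp
  ! The kind number printed here may differ between compilers;
  ! the precision and range guarantees do not.
  write(*,*) 'kind =', dp, ' precision =', precision(x), ' range =', range(x)
end program kinds_demo
```

Declarations such as real(dp) :: pic(1000,1000) could then replace real(8) :: pic(1000,1000).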
Jimmy.
----- Original Message -----
From: "alexpatgoogle" <alexande...@gmx.net>
To: "gg95" <gg...@googlegroups.com>
Sent: Saturday, June 27, 2009 7:14 AM
Subject: Optimizer misunderstood?
>
> Hi all,
>
> the following example
>
> real(8) :: pic(1000,1000)
> real(8) :: picres(1000,1000)
> real(8) :: f(9,9)
> real(8) :: m(9,9)
>
>
> the code:
>
> m(:,:)=pic(901:909,901:909)*f(:,:)
> picres(905,905)=sum(m)
>
> takes twice as long as the one-liner
>
> picres(905,905)=sum(pic(901:909,901:909)*f(:,:))
>
>
> It is clear to me that in the first case the data is, in principle,
> copied in memory (to m), but I always thought that the optimizer would
> recognize that m is never used afterwards and would 'inline' this?
>
>
>
<cut>
program test
  implicit none
  real(8) :: pic(1000,1000)
  real(8) :: picres(1000,1000)
  real(8) :: f(9,9)
  real(8) :: m(9,9)
  real :: t1, t2
  integer :: i
  pic(:,:)=1.0
  f(:,:)=2.0
  ! Two-statement form: store the products in m, then sum m
  call cpu_time(t1)
  do i=1,10000000
    m(:,:)=pic(901:909,901:909)*f(:,:)
    picres(905,905)=sum(m)
  end do
  call cpu_time(t2)
  write(*,*)(t2-t1)
  ! One-liner: sum the products directly
  call cpu_time(t1)
  do i=1,10000000
    picres(905,905)=sum(pic(901:909,901:909)*f(:,:))
  end do
  call cpu_time(t2)
  write(*,*)(t2-t1)
  stop
end program test
gives times of 9.84/5.56 secs without the -O2 switch and 2.30/1.32 secs with
the -O2 switch. Both forms are sped up by a factor of 4+, although the
one-liner is still faster. As you didn't get this speed-up, some other
factors appear to be involved, e.g. as the arrays are large, it might be due
to cache hits/misses.
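The one-liner does not require the products to be stored in m before they are summed. A minimal sketch of the same reduction written as explicit loops, assuming the 9x9 window used in the test program (loop_sum, dp, s, j and k are made-up names, and this variant was not timed in this thread):

```fortran
program loop_sum
  implicit none
  ! dp, s, j, k are hypothetical names used only for this sketch
  integer, parameter :: dp = selected_real_kind(15, 307)
  real(dp) :: pic(1000,1000), f(9,9), s
  integer :: j, k

  pic = 1.0_dp
  f = 2.0_dp

  ! Accumulate the 81 products in a scalar; no 9x9 temporary is written.
  ! The inner loop runs over the first index (column-major order).
  s = 0.0_dp
  do k = 1, 9
    do j = 1, 9
      s = s + pic(900+j, 900+k) * f(j, k)
    end do
  end do

  ! 81 products of 1.0*2.0, so s is 162
  write(*,*) s
end program loop_sum
```

Timing this against the two forms above, with and without -O2, would show whether the store to m accounts for the difference.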
Jimmy.
----- Original Message -----
From: "alexpatgoogle" <alexande...@gmx.net>
To: "gg95" <gg...@googlegroups.com>
Sent: Thursday, July 02, 2009 7:16 AM
Subject: Re: Optimizer misunderstood?
>
> Hallo,
>
>
>>g95 -O2 test.f95 -o test.exe
>
> I tested several optimization options, e.g.:
> -ftree-vectorize -funroll-loops -malign-double -msse2
> -fomit-frame-pointer -march=nocona
>
> but did not find an option which seems to speed up that particular part
> (the one-liner is always by far faster than the multi-line code).
>
>
><cut>