gfortran and Intel Fortran comparison

Sameer Marathe

Apr 24, 2011, 12:24:28 PM

to gnu-f...@googlegroups.com

A few days ago someone posted a link to the Computer Language Benchmarks Game (http://shootout.alioth.debian.org/). When I saw that the benchmark had no gfortran implementations, I decided to try it myself. I had a program for plotting the Mandelbrot set, which I modified slightly to match the requirements of the benchmark game, borrowing parts from the existing Fortran implementation on the benchmark website. I also wrote a Python script, similar to the benchmark's own, that uses Popen and os.wait3 to measure performance.
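For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a Popen/os.wait3 timing harness. It is not the script from the attached tarball; the function name `time_command` and the layout of the returned dictionary are assumptions made for illustration, and it only works on Unix-like systems where os.wait3 is available.

```python
import os
import subprocess
import time


def time_command(argv, runs=3):
    """Run argv `runs` times; return average elapsed, user and system seconds."""
    elapsed = user = system = 0.0
    for _ in range(runs):
        start = time.time()
        # Discard the program's output so terminal I/O does not skew the timing.
        proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL)
        # os.wait3 blocks until a child exits and returns its resource usage.
        # This sketch assumes the process has no other children running.
        pid, status, rusage = os.wait3(0)
        elapsed += time.time() - start
        user += rusage.ru_utime
        system += rusage.ru_stime
        # The child is already reaped; record its exit code on the Popen
        # object so it does not try to wait for the process again.
        if os.WIFEXITED(status):
            proc.returncode = os.WEXITSTATUS(status)
        else:
            proc.returncode = -os.WTERMSIG(status)
    return {
        "elapsed": elapsed / runs,
        "user": user / runs,
        "system": system / runs,
    }
```

A call such as `time_command(["./mandelbrot_gfort", "1600"], runs=3)` would then give the three averages for one binary; the way N is passed on the command line is also an assumption here.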

To cut a long story short, I found that:
1. With optimization turned off, gfortran performed much better than ifort (about 5.6 s versus 14.0 s elapsed).
2. At the O3 optimization level, ifort performed significantly better than at O0, but gfortran's O3 optimization did not improve performance much over gfortran at O0. With O3, ifort was about twice as fast as gfortran.

Below are results from my script:

1. With optimization turned off, plotting the Mandelbrot set for 1600 x 1600 pixels:
N= 1600
Test count= 3
Running mandelbrot_ifort :test 1
Running mandelbrot_ifort :test 2
Running mandelbrot_ifort :test 3
mandelbrot_ifort :
Test count: 3
Avertage elapsed time: 14.0063262781 sec
Average user time: 12.126091 sec
Average system time: 0.0466693333333 sec
Running mandelbrot_gfort :test 1
Running mandelbrot_gfort :test 2
Running mandelbrot_gfort :test 3
mandelbrot_gfort :
Test count: 3
Avertage elapsed time: 5.63605896632 sec
Average user time: 4.75229666667 sec
Average system time: 0.029335 sec

2. With O3 optimization turned on:
N= 1600
Test count= 3
Running mandelbrot_ifort :test 1
Running mandelbrot_ifort :test 2
Running mandelbrot_ifort :test 3
mandelbrot_ifort :
Test count: 3
Avertage elapsed time: 2.64300870895 sec
Average user time: 2.19480366667 sec
Average system time: 0.0346683333333 sec
Running mandelbrot_gfort :test 1
Running mandelbrot_gfort :test 2
Running mandelbrot_gfort :test 3
mandelbrot_gfort :
Test count: 3
Avertage elapsed time: 5.04399307569 sec
Average user time: 4.305602 sec
Average system time: 0.0386686666667 sec

Which leads to my question:
Q1: I use the same source file with both compilers, so I am guessing that something in my compiler options is not right. My makefile looks like this:
---------------------------------------------------------------------------------------
#makefile to make mandelbrot_ifort and mandelbrot_gfort
outp = mandelbrot_ifort mandelbrot_gfort

all: $(outp)

mandelbrot_ifort: mandelbrot.f90
	ifort -o $@ -warn all -static -O3 $?

mandelbrot_gfort: mandelbrot.f90
	gfortran -o $@ -Wall -static -O3 $?
----------------------------------------------------------------------------------------
I am not using -fopenmp (gfortran) or -openmp (ifort) because I have a single-core machine, where OpenMP only adds overhead and actually degrades performance. I am not very familiar with the gfortran (GCC) optimization options. I didn't use the -fast option with ifort because it also turns on -ipo (interprocedural optimization), for which I didn't find an equivalent in gfortran/GCC.

mandelbrot.tar.gz

Sameer Marathe

Apr 24, 2011, 12:36:29 PM

to gnu-f...@googlegroups.com

Forgot to add:

Intel Fortran compiler version 11.1 (non-commercial free license on Linux)
gfortran/gcc version 4.4.3

Tobias Burnus

Apr 25, 2011, 5:05:18 AM

to gnu-f...@googlegroups.com

On 24.04.2011 18:24, Sameer Marathe wrote:
> 2. At O3 optimization level, ifort performance was significantly
> better than O0 level but gfortran O3 optimization didn't improve the
> performance much compared to gfortran O0 level. With O3 optimization,
> ifort was twice as fast as gfortran.

First: It would be quite helpful to know which system (hardware,
operating system, 32-bit or 64-bit mode) you used for the benchmark.

I actually wonder why you didn't see much improvement with -O3.
Usually, -O2/-O3 is significantly faster than -O0. Especially if you
are on a 32-bit system, you could try "-march=native"; otherwise,
GCC/gfortran generates code that also runs on rather old, much less
capable systems. (Intel's equivalent is -xHost, but that compiler
defaults to more capable processors, so it matters less.)

For older GCCs, I usually use "-O3 -march=native -ffast-math
-funroll-loops" as the benchmark setting.
With GCC 4.6/4.7, I additionally use "-finline-limit=600 -fwhole-program
-flto".
With GCC 4.7, I use "-fstack-arrays" on top of the 4.6 settings.

With those settings, I can lower the geometric mean execution time of
the Polyhedron benchmark with GCC 4.7 from 10.56s (=100%) to 9.52s
(90%); using Intel's floating-point math library (libimf), the timing
drops further to 9.06s (85%), while with ifort 11.1 I get 9.86s (93%)
[with -fast, which implies -ipo -O3 -no-prec-div -static -xHost]. Note:
The performance of the individual benchmarks varies hugely.

Cf.
https://userpage.physik.fu-berlin.de/~tburnus/gcc-trunk/benchmark/iff/
(Intel Core(TM)2 Duo CPU E8400 @ 3.00GHz and using CentOS Linux 5.5
(x86-64)).
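For readers unfamiliar with the summary figure used above, here is a small sketch of how a geometric mean and the relative percentages can be computed. The helper names `geometric_mean` and `relative_percent` and the two sample timings are hypothetical; only the four mean times (10.56, 9.52, 9.06 and 9.86 seconds) come from this post.

```python
import math


def geometric_mean(times):
    """Geometric mean: the exponential of the average of the logarithms."""
    return math.exp(sum(math.log(t) for t in times) / len(times))


def relative_percent(time, baseline):
    """Execution time expressed as a percentage of the baseline time."""
    return 100.0 * time / baseline


# Hypothetical per-benchmark timings in seconds, for illustration only.
sample_times = [2.0, 8.0]
print("geometric mean of sample:", geometric_mean(sample_times))

# Geometric mean times quoted in the post, relative to the 10.56 s baseline.
baseline = 10.56
quoted = {
    "GCC 4.7 with the extra options": 9.52,
    "GCC 4.7 with libimf": 9.06,
    "ifort 11.1 with -fast": 9.86,
}
for label, seconds in quoted.items():
    print(f"{label}: {relative_percent(seconds, baseline):.1f}%")
```

The percentages quoted in the post are consistent with truncating these ratios to whole numbers. The geometric mean is a common choice for such summaries because a single long-running benchmark does not dominate it the way it would dominate an arithmetic mean.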

Tobias

PS: I plan to have a look at your benchmark.

PcX

Apr 24, 2011, 1:55:21 PM

to GNU Fortran

As far as I remember, the Intel Fortran compiler enables fast math and
loop unrolling by default.
Could you test gfortran with "-O3 -ffast-math -funroll-loops"?



Transmogrifier

Apr 25, 2011, 9:24:21 AM

to GNU Fortran

Thanks, Tobias.

My machine specs are:
32-bit Intel Pentium M processor, 1.7 GHz, 512 MB RAM, 30 GB HDD
Running Ubuntu 10.04 LTS, Linux kernel 2.6.32-31-generic

I will try some of the options you have mentioned.

> Cf. https://userpage.physik.fu-berlin.de/~tburnus/gcc-trunk/benchmark/iff/

Transmogrifier

Apr 25, 2011, 9:25:29 AM

to GNU Fortran

Thanks PcX. I will try those options.

Transmogrifier

Apr 25, 2011, 11:26:36 PM

to GNU Fortran

Thanks for all your suggestions. Here are my modified make commands,
with the no-optimization commands commented out.

mandelbrot_ifort: mandelbrot.f90
#	ifort -o $@ -warn all -static -xHost -O0 $?
	ifort -o $@ -warn all -fast -unroll $?

mandelbrot_gfort: mandelbrot.f90
#	gfortran -o $@ -Wall -static -march=native -O0 $?
	gfortran -o $@ -Wall -march=native -static -limf -ffast-math -funroll-loops -O3 $?

I now get roughly comparable results without optimization and much
better results from gfortran with optimization.

Below are the results from my tester script:
1> Without optimization (N = 1600, average over 5 test runs)
mandelbrot_ifort :
Avertage elapsed time: 6.30165338516 sec
Average user time: 6.1283828 sec
Average system time: 0.0360018 sec

mandelbrot_gfort :
Avertage elapsed time: 4.67135157585 sec
Average user time: 4.455478 sec
Average system time: 0.0104004 sec

2> With optimization (N=1600, average over 5 test runs)
mandelbrot_ifort :
Avertage elapsed time: 2.56649298668 sec
Average user time: 2.4657538 sec
Average system time: 0.0168006 sec

mandelbrot_gfort :
Avertage elapsed time: 0.655246210098 sec
Average user time: 0.597637 sec
Average system time: 0.0120004 sec
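
To make the comparison easier to read, the elapsed times above can be turned into ratios. This is only a quick sketch: the numbers are copied from the results in this message, and the `speedup` helper is a name made up for illustration.

```python
# Average elapsed times in seconds, copied from the results above.
elapsed = {
    "no optimization": {"ifort": 6.30165338516, "gfortran": 4.67135157585},
    "optimized": {"ifort": 2.56649298668, "gfortran": 0.655246210098},
}


def speedup(slow, fast):
    """How many times faster the second run is than the first."""
    return slow / fast


# gfortran relative to ifort at each optimization setting.
for label, times in elapsed.items():
    ratio = speedup(times["ifort"], times["gfortran"])
    print(f"{label}: gfortran is {ratio:.2f}x as fast as ifort")

# Gain from turning optimization on, for each compiler.
for compiler in ("ifort", "gfortran"):
    gain = speedup(elapsed["no optimization"][compiler],
                   elapsed["optimized"][compiler])
    print(f"{compiler}: optimized build is {gain:.2f}x as fast")
```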

I also get unresolved-external warnings when -ipo is invoked by -fast
for ifort:
ipo: warning #11020: unresolved __rel_iplt_end
Referenced in libc.a(elf-init.o)
ipo: warning #11020: unresolved __rel_iplt_start
Referenced in libc.a(elf-init.o)

I am guessing these symbols are not used by my code; otherwise it
wouldn't run.

I would like to know your opinion on whether the optimization
options for the two compilers are equivalent enough to make the
comparison balanced and valid. I am also going to try some other
implementations from the Computer Language Benchmarks Game site, as
well as the Polyhedron benchmark.

Thanks again.
