benchmarking GPL

john...@googlemail.com

Jun 1, 2009, 7:38:24 AM
Dear All,

In celebration of the arrival of the new GNAT GPL
(20090519) I decided to do some benchmarking
to see how the optimizer's coming along.
I was slightly more than rather pleased with
the results.

Results in gory detail are appended below.

I compared 6 (smallish) programs, written in
both Ada and Fortran. Four of them are in C
also. All of the routines can be found at:

http://web.am.qub.ac.uk/users/j.parker/bench_depository/

4 compilers are used:

gcc 4.3.4,
Intel Fortran ifort 11.0 (latest version),
GNAT GPL (20090519), and
gfortran (based on gcc version 4.3.2).

Operating system: Debian Linux, Lenny.
Processor: Intel x86-64 (Xeon X5460 @ 3.16GHz).

The Intel Fortran (ifort) is an aggressive
optimizing compiler, especially on numerical
linear algebra. Its use here is reassuring:
if our gcc-family results on numerical
calculations were suboptimal by a large factor,
ifort would likely let us know. It also
has the easiest optimization flags: it's a
simple choice between -O3, -fast, -ipo
and a few other things that make almost no
difference.

Two of the 6 programs I wrote
myself: an FFT benchmark called fft1tst.adb,
and a Jacobi eigendecomposition called
jacobi_eigen_bench_1.adb. Both are
accompanied by near-identical Fortran
versions. This exhausted my entire supply of
inter-language benchmarking routines, so 4
of the test programs I downloaded from a
depository of small benchmarking routines:
http://shootout.alioth.debian.org/gp4/
I downloaded C, Fortran, Ada versions of:
nsievebits, nbody, binarytree, mandelbrot.
I made small modifications to 2 of the Ada
programs. In one case I degraded the Ada code
to a slower, older version so that it was
identical to the C version I was comparing
with. In the other case I replaced a packed
boolean array with an array of unsigned
ints. These were the only changes to any of
the programs, so the exercise mostly amounted
to finding good optimization flags. I tried
to find good optimization flags for the C and
Fortran compilations too, and managed to speed
up a few of them after several attempts. The
original Ada test programs were written by
Pat Rogers and Pascal Obry. (Thanks!)

Inter-language benchmarks shouldn't be taken too
seriously, but I learned a few useful things:
the -mfpmath=387 and -mfpmath=387,sse flags came
as a real surprise to me. In several cases they
made all the difference. I also noticed that
GNAT seems to be doing a better job of
optimizing operations on packed boolean arrays,
and a better job on some linear algebra
problems. Unless I'm mistaken, in the linear algebra case
(jacobi_eigen below) the improvement over the
old days is almost a factor of 2. Thanks_gnat!

cheers,
jonathan


FFT1TST2:

Compilation Commands:

gnatmake fft1tst2.adb -O3 -gnatNp -march=native
gfortran fft1tst2.f -O3 -march=native -o fft1tst2
ifort fft1tst2.f -O3 -WB -xT -o fft1tst2

Execute:

time ./fft1tst2

Running Time (using 8192 data points):

gnat: 2.748 seconds
gfortran: 2.812 seconds
ifort: 2.816 seconds

Running Time (using 4096 data points):

gnat: 1.000 seconds
gfortran: 1.004 seconds
ifort: 1.088 seconds

Running Time (using 1024 data points):

gnat: 0.168 seconds
gfortran: 0.184 seconds
ifort: 0.176 seconds

Notes:

The Ada and the Fortran77 FFTs were written over 17
years ago. Both are Radix 4 fast Fourier transforms.
These versions were written for benchmarking rather
than ultimate speed: the idea was to make the 2
the same wherever possible, for language/compiler
comparisons.

If I use -ffast-math in the GNAT compilation it runs
at exactly the same speed as the gfortran version (that
is, very slightly slower than without the flag), so I
suspect gfortran always uses -ffast-math.


JACOBI_EIGEN:

Compilation Commands:

gnatmake jacobi_eigen_bench_1.adb -O3 -gnatNp -march=native -ffast-math -funroll-loops -o eig
gfortran jacobi_eigen.f90 -O3 -march=native -ffast-math -funroll-loops -o eig
ifort jacobi_eigen.f90 -O3 -ipo -static -o eig

Execute:

time ./eig

Running Time (100x100 matrices (100 iterations)):

gnat: 1.636 seconds
gfortran: 1.720 seconds
ifort: 1.440 seconds

Running Time (1000x1000 matrices (1 iteration)):

gnat: 23.7 seconds
gfortran: 39.7 seconds
ifort: 37.9 seconds

Notes:

Matrix size and number of iterations have to be
typed in at the top of the 2 routines:
jacobi_eigen.f90, jacobi_eigen_bench_1.adb.

A year ago the GNAT executables for Jacobi
were much slower than gfortran, and ifort.

Don't know what happened in the 1000x1000
case, but I am not displeased.
Notice we are using the same compiler
flags in the gfortran and GNAT cases.

The number of arithmetical operations performed
by these routines is exactly proportional to
No_of_Rotations,
which is output on completion. The difference
between the Fortran No_of_Rotations and the Ada
No_of_Rotations is under 3% here.
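
To illustrate why the operation count tracks the rotation
count, here is a minimal sketch of a textbook cyclic Jacobi
iteration. It is not the code in jacobi_eigen_bench_1.adb or
jacobi_eigen.f90; the function name, the tolerance and the
sweep limit are assumptions chosen for the illustration, and
only the counter name mirrors No_of_Rotations. Each rotation
touches two rows and two columns, so it costs O(n) arithmetic
operations regardless of which element is being zeroed.

```python
import math

def jacobi_eigenvalues(matrix, tol=1e-22, max_sweeps=50):
    """Cyclic Jacobi sweeps on a symmetric matrix (list of lists).

    Returns (sorted eigenvalues, no_of_rotations).
    """
    n = len(matrix)
    a = [row[:] for row in matrix]          # work on a copy
    no_of_rotations = 0
    for _ in range(max_sweeps):
        # Sum of squares of the strict upper triangle; stop once negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off <= tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue                # already zero: no rotation needed
                # Rotation parameters that zero a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (
                    abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # Update columns p and q, then rows p and q:
                # O(n) work per rotation.
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                no_of_rotations += 1
    return sorted(a[i][i] for i in range(n)), no_of_rotations
```

Since the cost per rotation is fixed for a given matrix size,
time divided by the rotation count is a fair per-operation
measure. For the 1000x1000 timings above, 39.7 / 23.7 is about
1.68, so a rotation-count difference of under 3% cannot account
for the gap between gfortran and GNAT.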


NBODY:

Compilation Commands:

gnatmake nbody.adb -O3 -gnatNp -march=native -ffast-math -funroll-loops -ftracer -freorder-blocks-and-partition -mfpmath=387,sse
gfortran nbody.f90 -O3 -march=native -funroll-all-loops -o nbody
ifort nbody.f90 -O3 -no-prec-div -o nbody
gcc nbody.c -O3 -o nbody

Execute:

time ./nbody 24000000

Running Time:

gnat: 4.908 seconds
gfortran: 5.602 seconds
ifort: 4.660 seconds
gcc: 4.472 seconds

Notes:

The obscure compilation flags
(-ftracer -freorder-blocks-and-partition)
are not needed if you write the inner loop of
nbody_pck.Advance a bit differently. I just
wanted to use the original version of nbody.adb.
nbody2.adb is more like the C version and has
simpler optimization flags, but runs at the same
speed as nbody.adb.
The gcc C is about 9% faster than nbody.adb -
a small but interesting difference I don't
understand.
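
The 9% figure can be read in two ways, and the arithmetic is
short enough to spell out. The sketch below uses only the two
timings listed above; the function names are made up for the
illustration.

```python
def time_saved_fraction(t_slow, t_fast):
    """Fraction of the slower run's time that the faster run saves."""
    return (t_slow - t_fast) / t_slow

def speedup_fraction(t_slow, t_fast):
    """How much more work per second the faster run does."""
    return t_slow / t_fast - 1.0

gnat_seconds = 4.908   # nbody.adb timing from the list above
gcc_seconds = 4.472    # nbody.c timing from the list above

print(f"time saved: {100 * time_saved_fraction(gnat_seconds, gcc_seconds):.1f}%")
print(f"speedup:    {100 * speedup_fraction(gnat_seconds, gcc_seconds):.1f}%")
```

The first number (about 8.9%) is the reduction in running time;
the second (about 9.7%) is the gain in throughput. Either one
rounds to the "about 9%" quoted above, depending on the
convention.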


NSIEVEBITS:

Compilation Commands:

gnatmake nsievebits2.adb -O3 -gnatnp -march=native -funroll-loops -ftracer
gfortran nsievebits.f90 -O3 -march=native -funroll-loops -o nsievebits2
ifort nsievebits.f90 -O3 -ipo -static -o nsievebits2
gcc nsievebits.c -O3 -march=native -funroll-loops -o nsievebits2

Execute:

time ./nsievebits2 11

Running Time:

gnat: 0.320 seconds
gfortran: 0.388 seconds
ifort: 0.372 seconds
gcc: 0.364 seconds

Notes:

nsievebits2.adb uses an array of unsigned
ints to replace the packed boolean array in
nsievebits.adb. Both methods are entirely
legitimate in this exercise. GNAT has
improved remarkably: the packed boolean array
version (nsievebits.adb) is now competitive
with the other languages.
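
For readers unfamiliar with the two representations, here is a
minimal sketch of the same sieve written both ways. It is not
the shootout code and not nsievebits2.adb: the function names
and the 32-bit word size are assumptions, and Python integers
stand in for the unsigned ints. The point is only that the
word-array version does by hand (shift, mask, index) what a
packed boolean array asks the compiler to generate.

```python
WORD_BITS = 32  # assumed word size for the illustration

def count_primes_bools(m):
    """Sieve over 2..m with one flag per list element."""
    is_prime = [True] * (m + 1)
    count = 0
    for i in range(2, m + 1):
        if is_prime[i]:
            count += 1
            for j in range(i + i, m + 1, i):
                is_prime[j] = False
    return count

def count_primes_words(m):
    """Same sieve with flags packed WORD_BITS to an integer word."""
    words = [0xFFFFFFFF] * (m // WORD_BITS + 1)   # all bits set
    count = 0
    for i in range(2, m + 1):
        # Test bit (i mod 32) of word (i div 32).
        if words[i // WORD_BITS] & (1 << (i % WORD_BITS)):
            count += 1
            for j in range(i + i, m + 1, i):
                # Clear the bit for each multiple of i.
                words[j // WORD_BITS] &= ~(1 << (j % WORD_BITS))
    return count
```

Both functions return the same count for any m; the second
uses one thirty-second as many array elements.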


MANDELBROT:

Compilation Commands:

gnatmake mandelbrot.adb -O3 -gnatnp -march=native -ffast-math -funroll-loops -mfpmath=387
gfortran mandelbrot.f90 -O3 -march=native -funroll-loops -o mandelbrot
ifort mandelbrot.f90 -O3 -ipo -static -o mandelbrot
gcc mandelbrot.c -O3 -march=native -ffast-math -funroll-loops -mfpmath=387 -o mandelbrot

Execute:

time ./mandelbrot 3000

Running Time:

print-to-screen disabled:
(present configuration.)
(these are meaningful timings.)

gnat: 0.980 seconds
gfortran: 1.112 seconds
ifort: 1.180 seconds
gcc: 0.960 seconds

print-to-screen enabled
(timings don't mean much here):

gnat: 1.012 seconds
gfortran: 1.582 seconds
ifort: 1.376 seconds
gcc: 1.000 seconds

Notes:

print-to-screen was enabled only to verify that
all 3 gave the same output.

The Fortran uses the original complex number
implementation of the mandelbrot inner loop.
I modified the Ada version to use exactly
the same inner loop as the C version (even though
the modification slowed down the Ada version).
In both the Ada and the C versions it was the
-mfpmath=387 flag (which I assume disables sse)
that did the trick speeding them up.
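
The difference between the two styles of inner loop can be
sketched as follows. This is an illustration only, not the
shootout source: the function names, the iteration limit of 50
and the escape radius of 2 are assumptions. The first function
iterates with a complex variable, as the Fortran version is
described as doing; the second spells out the real and
imaginary parts, in the style of the C and modified Ada
versions.

```python
def escapes_complex(c, max_iter=50, limit=2.0):
    """Iterate z = z*z + c using a complex variable."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > limit:
            return True
    return False

def escapes_real(cr, ci, max_iter=50, limit=2.0):
    """Same iteration with explicit real and imaginary parts."""
    zr = zi = 0.0
    limit_sq = limit * limit
    for _ in range(max_iter):
        # (zr + i*zi)^2 + (cr + i*ci), expanded by hand.
        new_zr = zr * zr - zi * zi + cr
        new_zi = 2.0 * zr * zi + ci
        zr, zi = new_zr, new_zi
        # Compare squared magnitudes to avoid a square root.
        if zr * zr + zi * zi > limit_sq:
            return True
    return False
```

The explicit form exposes the individual multiplications and
additions, which is where floating-point code generation flags
such as -mfpmath have something to act on.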


BINARYTREES:

Compilation Commands:

gnatmake binarytrees.adb -O3 -gnatnp -march=native -ftracer
gfortran binarytrees.f90 -O3 -march=native -o binarytrees
ifort binarytrees.f90 -fast -static -o binarytrees
gcc binarytrees.c -O3 -march=native -lm -o binarytrees

Execute:

time ./binarytrees 16

Running Time (fastest observed):

gnat: 1.232 seconds
gfortran: 1.084 seconds
ifort: 1.676 seconds
gcc: 1.060 seconds

Notes:

Insensitive to optimization flags.
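
Taking the "fastest observed" time by hand gets tedious, so
here is a minimal sketch of a helper that repeats a command and
keeps the best time. It is an assumption about how one might
automate the procedure, not the method used for the numbers
above: it measures wall-clock time from Python, which includes
process start-up and will not match the output of the shell's
time command exactly. The function name and the default of 5
runs are made up for the illustration.

```python
import subprocess
import sys
import time

def fastest_wall_time(cmd, runs=5):
    """Run cmd (a list of arguments) several times; return the best time in seconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # Discard program output so terminal speed does not affect the timing.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    # Stand-in command; replace with e.g. ["./binarytrees", "16"].
    print(fastest_wall_time([sys.executable, "-c", "pass"], runs=3))
```

Using the minimum rather than the mean filters out runs that
were disturbed by other activity on the machine.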
