benchmarking GPL

Jun 1, 2009, 7:38:24 AM
Dear All,

In celebration of the arrival of the new GNAT GPL
(20090519) I decided to do some benchmarking
to see how the optimizer's coming along.
I was slightly more than rather pleased with
the results.

Results in gory detail are appended below.

I compared 6 (smallish) programs, written in
both Ada and Fortran. Four of them are in C
also. All of the routines can be found at:

4 compilers are used:

gcc 4.3.4,
Intel Fortran ifort 11.0 (latest version),
GNAT GPL (20090519), and
gfortran (based on gcc version 4.3.2).

Operating system: Debian Linux, Lenny.
Processor: Intel x86-64 (Xeon X5460 @ 3.16GHz).

The Intel Fortran (ifort) is an aggressive
optimizing compiler, especially on numerical
linear algebra. Its use here is reassuring:
if our gcc-family results on numerical
calculations were suboptimal by a large factor,
ifort would likely let us know. It also
has the easiest optimization flags: it's a
simple choice between -O3, -fast, -ipo
and a few other things that make almost no
difference.

Two of the 6 programs I wrote
myself: an FFT benchmark called fft1tst.adb,
and a Jacobi eigendecomposition called
jacobi_eigen_bench_1.adb. Both are
accompanied by near-identical Fortran
versions. This exhausted my entire supply of
inter-language benchmarking routines, so 4
of the test programs I downloaded from a
repository of small benchmarking routines:
I downloaded C, Fortran, and Ada versions of:
nsievebits, nbody, binarytrees, mandelbrot.
I made small modifications to 2 of the Ada
programs. In one case I degraded the Ada code
to a slower, older version so that it was
identical to the C version I was comparing
with. In the other case I replaced a packed
boolean array with an array of unsigned
ints. These were the only changes to any of
the programs, so the exercise mostly amounted
to finding good optimization flags. I tried
to find good optimization flags for the C and
Fortran compilations too, and managed to speed
up a few of them after several attempts. The
original Ada test programs were written by
Pat Rogers and Pascal Obry. (Thanks!)

Inter-language benchmarks shouldn't be taken too
seriously, but I learned a few useful things:
the -mfpmath=387 and -mfpmath=387,sse flags came
as a real surprise to me. In several cases they
made all the difference. I also noticed that
GNAT seems to be doing a better job of
optimizing operations on packed boolean arrays,
and a better job on some linear algebra
problems. Unless I'm mistaken, in the lin alg case
(jacobi_eigen below) the improvement over the
old days is almost a factor of 2. Thanks_gnat!



Compilation Commands:

gnatmake fft1tst2.adb -O3 -gnatNp -march=native
gfortran fft1tst2.f -O3 -march=native -o fft1tst2
ifort fft1tst2.f -O3 -WB -xT -o fft1tst2


time ./fft1tst2

Running Time (using 8192 data points):

gnat: 2.748 seconds
gfortran: 2.812 seconds
ifort: 2.816 seconds

Running Time (using 4096 data points):

gnat: 1.000 seconds
gfortran: 1.004 seconds
ifort: 1.088 seconds

Running Time (using 1024 data points):

gnat: 0.168 seconds
gfortran: 0.184 seconds
ifort: 0.176 seconds


The Ada and the Fortran77 FFTs were written over 17
years ago. Both are radix-4 fast Fourier transforms.
These versions were written for benchmarking rather
than ultimate speed: the idea was to make the two
the same wherever possible, for language/compiler
comparison.

If I use -ffast-math in the GNAT compilation it runs
at essentially the same speed as the gfortran version
(very slightly slower), so I suspect -ffast-math is
always used by gfortran.


Compilation Commands:

gnatmake jacobi_eigen_bench_1.adb -O3 -gnatNp -march=native -ffast-math -funroll-loops -o eig
gfortran jacobi_eigen.f90 -O3 -march=native -ffast-math -funroll-loops -o eig
ifort jacobi_eigen.f90 -O3 -ipo -static -o eig


time ./eig

Running Time (100x100 matrices (100 iterations)):

gnat: 1.636 seconds
gfortran: 1.720 seconds
ifort: 1.440 seconds

Running Time (1000x1000 matrices (1 iteration)):

gnat: 23.7 seconds
gfortran: 39.7 seconds
ifort: 37.9 seconds


Matrix size and number of iterations have to be
typed in at the top of the two routines:
jacobi_eigen.f90, jacobi_eigen_bench_1.adb.

A year ago the GNAT executables for Jacobi
were much slower than gfortran, and ifort.

Don't know what happened in the 1000x1000
case, but I am not displeased.
Notice we are using the same compiler
flags in the gfortran and GNAT cases.

The number of arithmetical operations performed
by these routines is exactly proportional to the
number of rotations (No_of_Rotations), which is
output on completion. The difference between the
Fortran No_of_Rotations and the Ada
No_of_Rotations is under 3% here.


Compilation Commands:

gnatmake nbody.adb -O3 -gnatNp -march=native -ffast-math -funroll-loops -ftracer -freorder-blocks-and-partition -o nbody
gfortran nbody.f90 -O3 -march=native -funroll-all-loops -o nbody
ifort nbody.f90 -O3 -no-prec-div -o nbody
gcc nbody.c -O3 -o nbody


time ./nbody 24000000

Running Time:

gnat: 4.908 seconds
gfortran: 5.602 seconds
ifort: 4.660 seconds
gcc: 4.472 seconds


The obscure compilation flags
(-ftracer -freorder-blocks-and-partition)
are not needed if you write the inner loop of
nbody_pck.Advance a bit differently. I just
wanted to use the original version of nbody.adb.
nbody2.adb is more like the C version and has
simpler optimization flags, but runs at the same
speed as nbody.adb.
The gcc C is about 9% faster than nbody.adb -
a small but interesting difference I don't
understand.


Compilation Commands:

gnatmake nsievebits2.adb -O3 -gnatnp -march=native -funroll-loops -o nsievebits2
gfortran nsievebits.f90 -O3 -march=native -funroll-loops -o nsievebits2
ifort nsievebits.f90 -O3 -ipo -static -o nsievebits2
gcc nsievebits.c -O3 -march=native -funroll-loops -o nsievebits2


time ./nsievebits2 11

Running Time:

gnat: 0.320 seconds
gfortran: 0.388 seconds
ifort: 0.372 seconds
gcc: 0.364 seconds


nsievebits2.adb uses an array of unsigned
ints to replace the packed boolean array in
nsievebits.adb. Both methods could not be
more legitimate in this exercise. GNAT has
improved remarkably: the packed boolean array
version (nsievebits.adb) is now competitive
with the other languages.


Compilation Commands:

gnatmake mandelbrot.adb -O3 -gnatnp -march=native -ffast-math -funroll-loops -mfpmath=387
gfortran mandelbrot.f90 -O3 -march=native -funroll-loops -o mandelbrot
ifort mandelbrot.f90 -O3 -ipo -static -o mandelbrot
gcc mandelbrot.c -O3 -march=native -ffast-math -funroll-loops -mfpmath=387 -o mandelbrot


time ./mandelbrot 3000

Running Time:

print-to-screen disabled
(the present configuration; these are the
meaningful timings):

gnat: 0.980 seconds
gfortran: 1.112 seconds
ifort: 1.180 seconds
gcc 0.960 seconds

print-to-screen enabled
(timings don't mean much here):

gnat: 1.012 seconds
gfortran: 1.582 seconds
ifort: 1.376 seconds
gcc 1.000 seconds


print-to-screen was enabled only to verify that
all three languages gave the same output.

The Fortran uses the original complex-number
implementation of the mandelbrot inner loop.
I modified the Ada version to use exactly
the same inner loop as the C version (even though
the modification slowed down the Ada version).
In both the Ada and the C versions it was the
-mfpmath=387 flag (which I assume disables SSE)
that did the trick in speeding them up.


Compilation Commands:

gnatmake binarytrees.adb -O3 -gnatnp -march=native -ftracer
gfortran binarytrees.f90 -O3 -march=native -o binarytrees
ifort binarytrees.f90 -fast -static -o binarytrees
gcc binarytrees.c -O3 -march=native -lm -o binarytrees


time ./binarytrees 16

Running Time (fastest observed):

gnat: 1.232 seconds
gfortran: 1.084 seconds
ifort: 1.676 seconds
gcc 1.060 seconds


Insensitive to optimization flags.
