Jun 1, 2009, 7:38:24 AM

Dear All,

In celebration of the arrival of the new GNAT GPL

(20090519) I decided to do some benchmarking

to see how the optimizer's coming along.

I was slightly more than rather pleased with

the results.

Results in gory detail are appended below.

I compared 6 (smallish) programs, written in

both Ada and Fortran. Four of them are in C

also. All of the routines can be found at:

http://web.am.qub.ac.uk/users/j.parker/bench_depository/

4 compilers are used:

gcc 4.3.4,

Intel Fortran ifort 11.0 (latest version),

GNAT GPL (20090519), and

gfortran (based on gcc version 4.3.2).

Operating system: Debian Linux, Lenny.

Processor: Intel x86-64 (Xeon X5460 @ 3.16GHz).

The Intel Fortran (ifort) is an aggressive

optimizing compiler, especially on numerical

linear algebra. Its use here is reassuring:

if our gcc-family results on numerical

calculations were suboptimal by a large factor,

ifort would likely let us know. It also

has the easiest optimization flags: it's a

simple choice between -O3, -fast, -ipo

and a few other things that make almost no

difference.

Two of the 6 programs I wrote

myself: an FFT benchmark called fft1tst.adb,

and a Jacobi eigendecomposition called

jacobi_eigen_bench_1.adb. Both are

accompanied by near identical Fortran

versions. This exhausted my entire supply of

inter-language benchmarking routines, so 4

of the test programs I downloaded from a

depository of small benchmarking routines:

http://shootout.alioth.debian.org/gp4/

I downloaded C, Fortran, Ada versions of:

nsievebits, nbody, binarytrees, mandelbrot.

I made small modifications to 2 of the Ada

programs. In one case I degraded the Ada code

to a slower, older version so that it was

identical to the C version I was comparing

with. In the other case I replaced a packed

boolean array with an array of unsigned

ints. These were the only changes to any of

the programs, so the exercise mostly amounted

to finding good optimization flags. I tried

to find good optimization flags for the C and

Fortran compilations too, and managed to speed

up a few of them after several attempts. The

original Ada test programs were written by

Pat Rogers and Pascal Obry. (Thanks!)

Inter-language benchmarks shouldn't be taken too

seriously, but I learned a few useful things:

the -mfpmath=387 and -mfpmath=387,sse flags came

as a real surprise to me. In several cases they

made all the difference. I also noticed that

GNAT seems to be doing a better job of

optimizing operations on packed boolean arrays,

and a better job on some linear algebra

problems. Unless I'm mistaken, in the lin alg case

(jacobi_eigen below) the improvement over the

old days is almost a factor of 2. Thanks_gnat!

cheers,

jonathan

FFT1TST2:

Compilation Commands:

gnatmake fft1tst2.adb -O3 -gnatNp -march=native

gfortran fft1tst2.f -O3 -march=native -o fft1tst2

ifort fft1tst2.f -O3 -WB -xT -o fft1tst2

Execute:

time ./fft1tst2

Running Time (using 8192 data points):

gnat: 2.748 seconds

gfortran: 2.812 seconds

ifort: 2.816 seconds

Running Time (using 4096 data points):

gnat: 1.000 seconds

gfortran: 1.004 seconds

ifort: 1.088 seconds

Running Time (using 1024 data points):

gnat: 0.168 seconds

gfortran: 0.184 seconds

ifort: 0.176 seconds

Notes:

The Ada and the Fortran77 FFTs were written over 17

years ago. Both are radix-4 fast Fourier transforms.

These versions were written for benchmarking rather

than ultimate speed: the idea was to make the 2

the same wherever possible, for language/compiler

comparisons.

If I use -ffast-math in the GNAT compilation it runs

exactly the same speed as the gfortran (very slightly

slower), so I suspect -ffast-math is always used by

gfortran.

JACOBI_EIGEN

Compilation Commands:

gnatmake jacobi_eigen_bench_1.adb -O3 -gnatNp -march=native -ffast-math -funroll-loops -o eig

gfortran jacobi_eigen.f90 -O3 -march=native -ffast-math -funroll-loops -o eig

ifort jacobi_eigen.f90 -O3 -ipo -static -o eig

Execute:

time ./eig

Running Time (100x100 matrices (100 iterations)):

gnat: 1.636 seconds

gfortran: 1.720 seconds

ifort: 1.440 seconds

Running Time (1000x1000 matrices (1 iteration)):

gnat: 23.7 seconds

gfortran: 39.7 seconds

ifort: 37.9 seconds

Notes:

Matrix size and number of iterations have to be

typed in at the top of the 2 routines:

jacobi_eigen.f90, jacobi_eigen_bench_1.adb.

A year ago the GNAT executables for Jacobi

were much slower than gfortran and ifort.

Don't know what happened in the 1000x1000

case, but I am not displeased.

Notice we are using the same compiler

flags in the gfortran and GNAT cases.

The number of arithmetical operations performed

by these routines is exactly proportional to

No_of_Rotations,

which is output on completion. The difference

between the Fortran No_of_Rotations and the Ada

No_of_Rotations is under 3% here.

NBODY:

Compilation Commands:

gnatmake nbody.adb -O3 -gnatNp -march=native -ffast-math -funroll-loops -ftracer -freorder-blocks-and-partition -mfpmath=387,sse

gfortran nbody.f90 -O3 -march=native -funroll-all-loops -o nbody

ifort nbody.f90 -O3 -no-prec-div -o nbody

gcc nbody.c -O3 -o nbody

Execute:

time ./nbody 24000000

Running Time:

gnat: 4.908 seconds

gfortran: 5.602 seconds

ifort: 4.660 seconds

gcc: 4.472 seconds

Notes:

The obscure compilation flags

(-ftracer -freorder-blocks-and-partition)

are not needed if you write the inner loop of

nbody_pck.Advance a bit differently. I just

wanted to use the original version of nbody.adb.

nbody2.adb is more like the C version and has

simpler optimization flags, but runs at the same

speed as nbody.adb.

The gcc C is about 9% faster than nbody.adb -

a small but interesting difference I don't

understand.

NSIEVEBITS:

Compilation Commands:

gnatmake nsievebits2.adb -O3 -gnatnp -march=native -funroll-loops -ftracer

gfortran nsievebits.f90 -O3 -march=native -funroll-loops -o nsievebits2

ifort nsievebits.f90 -O3 -ipo -static -o nsievebits2

gcc nsievebits.c -O3 -march=native -funroll-loops -o nsievebits2

Execute:

time ./nsievebits2 11

Running Time:

gnat: 0.320 seconds

gfortran: 0.388 seconds

ifort: 0.372 seconds

gcc: 0.364 seconds

Notes:

nsievebits2.adb uses an array of unsigned

ints to replace the packed boolean array in

nsievebits.adb. Both methods could not be

more legitimate in this exercise. GNAT has

improved remarkably: the packed boolean array

version (nsievebits.adb) is now competitive

with the other languages.

MANDELBROT:

Compilation Commands:

gnatmake mandelbrot.adb -O3 -gnatnp -march=native -ffast-math -funroll-loops -mfpmath=387

gfortran mandelbrot.f90 -O3 -march=native -funroll-loops -o mandelbrot

ifort mandelbrot.f90 -O3 -ipo -static -o mandelbrot

gcc mandelbrot.c -O3 -march=native -ffast-math -funroll-loops -mfpmath=387 -o mandelbrot

Execute:

time ./mandelbrot 3000

Running Time:

print-to-screen disabled:

(present configuration.)

(these are meaningful timings.)

gnat: 0.980 seconds

gfortran: 1.112 seconds

ifort: 1.180 seconds

gcc: 0.960 seconds

print-to-screen enabled

(timings don't mean much here):

gnat: 1.012 seconds

gfortran: 1.582 seconds

ifort: 1.376 seconds

gcc: 1.000 seconds

Notes:

print-to-screen was enabled only to verify that

all 3 gave the same output.

The Fortran uses the original complex number

implementation of the mandelbrot inner loop.

I modified the Ada version to use exactly

the same inner loop as the C version (even though

the modification slowed down the Ada version).

In both the Ada and the C versions it was the

-mfpmath=387 flag (which I assume disables sse)

that did the trick speeding them up.

BINARYTREES:

Compilation Commands:

gnatmake binarytrees.adb -O3 -gnatnp -march=native -ftracer

gfortran binarytrees.f90 -O3 -march=native -o binarytrees

ifort binarytrees.f90 -fast -static -o binarytrees

gcc binarytrees.c -O3 -march=native -lm -o binarytrees

Execute:

time ./binarytrees 16

Running Time (fastest observed):

gnat: 1.232 seconds

gfortran: 1.084 seconds

ifort: 1.676 seconds

gcc: 1.060 seconds

Notes:

Insensitive to optimization flags.
