softfp vs. hardfp - povray benchmark

Alexander Holler

Mar 12, 2011, 9:05:49 AM

to beagl...@googlegroups.com

Hello,

Being curious about hardfp, I've used the povray benchmark to get some
numbers. I've used povray 2.6.1, see
http://www.povray.org/download/benchmark.php for an explanation. I think
that will give an impression of how much applications with heavy
floating point usage might gain from hardfp.

I've run those tests on a BeagleBoard C4 (w/o XM, 720 MHz) using the
same (vanilla) kernel 2.6.37.3. Both systems were on the same usb-hd,
using different (ext4-)partitions of the same size.

The whole softfp-system was compiled using

CFLAGS="-Os -pipe -mtune=cortex-a8 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fomit-frame-pointer"
CXXFLAGS="${CFLAGS} -std=gnu++0x -fvisibility-inlines-hidden"
CFLAGS="${CFLAGS} -std=gnu99"
LDFLAGS="-Wl,-O1 -Wl,--enable-new-dtags -Wl,--sort-common -Wl,--as-needed"

and the hardfp-system was compiled using

CFLAGS="-Os -pipe -mtune=cortex-a8 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=hard -fomit-frame-pointer"
CXXFLAGS="${CFLAGS} -std=gnu++0x -fvisibility-inlines-hidden"
CFLAGS="${CFLAGS} -std=gnu99"
LDFLAGS="-Wl,-O1 -Wl,--enable-new-dtags -Wl,--sort-common -Wl,--as-needed"

All package versions were the same and the same patches (if any) were
used. The gcc version was 4.5.2, binutils was 2.21 and glibc was 2.11.2.

Here are the times for "time povray benchmark.ini":

softfp:
Total Time: 10 hours 39 minutes 23 seconds (38363 seconds)
real 639m23.292s
user 639m17.914s
sys 0m0.430s

hardfp:
Total Time: 10 hours 3 minutes 25 seconds (36205 seconds)
real 603m24.803s
user 603m21.188s
sys 0m0.422s

Being curious about the compiler optimisations, I've run the same
benchmark on the same systems, just using -O3 instead of -Os to compile
povray:

softfp:
Total Time: 9 hours 49 minutes 29 seconds (35369 seconds)
real 589m29.634s
user 589m24.016s
sys 0m0.422s

hardfp:
Total Time: 9 hours 22 minutes 13 seconds (33733 seconds)
real 562m12.603s
user 562m9.320s
sys 0m0.469s

So it looks like using hardfp instead of softfp might gain about 5 %
(5.6 % with -Os, 4.6 % with -O3) for applications which make heavy use
of floating point.
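
As a quick check of the arithmetic, here is a minimal sketch that recomputes the relative gains from the "Total Time" values above. The dictionary layout and the gain_percent helper are made up for this illustration and are not part of the benchmark.

```python
# Total times in seconds, copied from the "Total Time" lines above.
times = {
    "-Os": {"softfp": 38363, "hardfp": 36205},
    "-O3": {"softfp": 35369, "hardfp": 33733},
}


def gain_percent(slow, fast):
    """Percentage of run time saved when going from `slow` to `fast`."""
    return (slow - fast) / slow * 100.0


# Gain of hardfp over softfp, per optimisation level.
hardfp_gain = {
    opt: gain_percent(t["softfp"], t["hardfp"]) for opt, t in times.items()
}

# Gain of -O3 over -Os, per float ABI.
o3_gain = {
    abi: gain_percent(times["-Os"][abi], times["-O3"][abi])
    for abi in ("softfp", "hardfp")
}

for opt, gain in hardfp_gain.items():
    print(f"{opt}: hardfp saves {gain:.1f}% over softfp")
for abi, gain in o3_gain.items():
    print(f"{abi}: -O3 saves {gain:.1f}% over -Os")
```

With these numbers, hardfp saves roughly 5.6 % at -Os and 4.6 % at -O3, while switching from -Os to -O3 saves roughly 7.8 % (softfp) and 6.8 % (hardfp).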

I don't want to judge whether -Os, -O2 or -O3 might be better for your
use case. Those optimizations can have heavy implications, especially
with regard to floating point, and the fastest optimization level won't
always fit.

Regards,

Alexander Holler

PS: Before someone asks why I'm using -std=gnu++0x: I'm using it because
c++0x offers some new nice-to-have features, especially with regard to
"perfect forwarding", and I think almost all c++ programs might benefit
from that if those new features are used e.g. by the STL. I haven't
checked whether those new features are already used somewhere in the
standard libraries (or templates), but ...
Be aware that using -std=gnu++0x actually breaks compilation of a few
c++ programs.

Måns Rullgård

Mar 12, 2011, 9:20:41 AM

to beagl...@googlegroups.com

Alexander Holler <hol...@ahsoftware.de> writes:

> Hello,
>
> Being curious about hardfp, I've used the povray benchmark to get
> some numbers. I've used povray 2.6.1, see
> http://www.povray.org/download/benchmark.php for an explanation. I
> think that will give an impression of how much applications with
> heavy floating point usage might gain from hardfp.
>

For applications passing floats to and from functions a lot, yes.
Povray appears to be one of these.

> I don't want to judge whether -Os, -O2 or -O3 might be better for your
> use case. Those optimizations can have heavy implications,
> especially with regard to floating point, and the fastest
> optimization level won't always fit.

If you want fast, get a Panda:

Total Time: 1 hours 36 minutes 38 seconds (5798 seconds)

For reference, on an Intel Core i7 940 (2.93GHz):

Total Time: 0 hours 19 minutes 8 seconds (1148 seconds)

--
Måns Rullgård
ma...@mansr.com

Alexander Holler

Mar 12, 2011, 9:43:59 AM

to beagl...@googlegroups.com

Hello,

On 12.03.2011 15:20, Måns Rullgård wrote:
> If you want fast, get a Panda:

Apart from the fact that I was only interested in some numbers for hardfp
vs. softfp, using a Panda would transport the benchmark back to 1970. ;)

Regards,

Alexander Holler

Laurent GONZALEZ

Mar 12, 2011, 9:45:29 AM

to beagl...@googlegroups.com

It's hard to believe that the Beagle is more than 4 times slower than the
Panda. Moreover, the difference between hardfp and softfp is so small that
I would bet that the Beagle does not use hardfp at all. But I might be
mistaken there ...

Mans, can you confirm the Beagle figures? Just out of curiosity, do you
have numbers for an Atom based system?

Måns Rullgård

Mar 12, 2011, 10:00:55 AM

to beagl...@googlegroups.com

Laurent GONZALEZ <macma...@gmail.com> writes:

Yes, I got similar figures on a Beagle C3. The huge difference is due
to the non-pipelined VFP in Cortex-A8. The hard vs softfp difference
should also be much smaller on A9.

The Nvidia Tegra2 with VFP3-D16 gets this result, also at 1GHz:

Total Time: 1 hours 43 minutes 48 seconds (6228 seconds)

It is a little slower than the Panda but not much.
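
To put the totals quoted in this thread side by side, here is a small sketch that expresses each one as a multiple of the Panda time. It assumes the fastest Beagle run posted earlier (hardfp, -O3) as the Beagle entry. The compiler flags behind the Panda, Tegra2 and i7 numbers are not stated in the thread, so the ratios are only a rough comparison. The totals dictionary and the slowdown helper are names invented for this illustration.

```python
# Total times in seconds as reported in this thread.
# The Beagle entry is the fastest run posted earlier (hardfp, -O3);
# picking that one is an assumption made for this sketch.
totals = {
    "BeagleBoard C4 (hardfp, -O3)": 33733,
    "Panda": 5798,
    "Tegra2": 6228,
    "Core i7 940": 1148,
}


def slowdown(system, baseline="Panda"):
    """How many times longer `system` took than `baseline`."""
    return totals[system] / totals[baseline]


for name in totals:
    print(f"{name}: {slowdown(name):.2f}x the Panda time")
```

On these figures the Tegra2 takes about 7 % longer than the Panda, and the Beagle run takes about 5.8 times as long.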

> Just out of curiosity, do you have numbers for an Atom based system ?

I do not.

--
Måns Rullgård
ma...@mansr.com
