Just following up:
Thanks to folks for the feedback and suggestions. I tried the suggested options and I even "hacked" the nbench benchmarks (which use all doubles in their C code) to internally use all floats only everywhere. It might be of some interest that there was essentially no effect in doing these things. FP performance still lagged an old x86 clone at 1/2 the clock speed.
The floating point performance is important for many of the applications in 3D graphics and robotics for which I had been considering the OMAP 3. I often have to write code that handles LU decompositions, 3D transformations, etc. in real-time. So, the fact that the processor is so slow (relative to its integer performance) seems odd. I'm grateful that the Beagleboard is helping me evaluate it thoroughly.
Any other ideas? Is there a compiler branch somewhere that will let this new "SIMD 128bit pipelined FP unit" that is in there somewhere beat out an AMD K6/233 from 12 years ago? With a hardware FP unit that is so heavily touted (reading ARM's website), it would seem that the gap between FP performance and INT performance should not be so large.
So, I'm still a bit puzzled unless compiler support is so immature for Neon that we're not seeing anything like the real performance.
-Sincerely,
Todd Pack
-------------- Original message ----------------------
From: Måns Rullgård <ma...@mansr.com>
>
> rtp...@comcast.net writes:
>
> > Hello Folks,
> > I built nbench for my beagleboard and compiled with flags that one
> > would be led to believe would enable floating point operation:
> >
> > -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon
>
> Try adding -ffast-math -fno-math-errno
>
> On the Cortex-A8, double-precision floating-point maths is not
> pipelined, and neither is single-precision if full IEEE compliance is
> required. The flags above should let the compiler generate
> floating-point code that can execute in the pipelined NEON unit for
> single-precision maths.
>
>
> Honestly, how often does anyone run code even resembling those
> benchmarks?
>
>
> That baseline is hardly relevant these days.
> Hello Folks,
>
> Just following up:
>
> Thanks to folks for the feedback and suggestions. I tried the
> suggested options and I even "hacked" the nbench benchmarks (which use
> all doubles in their C code) to internally use all floats only
> everywhere. It might be of some interest that there was essentially no
> effect in doing these things. FP performance still lagged an old x86
> clone at 1/2 the clock speed.
>
> The floating point performance is important for many of the
> applications in 3D graphics and robotics for which I had been
> considering the OMAP 3. I often have to write code that handles LU
> decompositions, 3D transformations, etc. in real-time. So, the fact
> that the processor is so slow (relative to its integer performance)
> seems odd. I'm grateful that the Beagleboard is helping me evaluate it
> thoroughly.
>
> Any other ideas? Is there a compiler branch somewhere that will let
> this new "SIMD 128bit pipelined FP unit" that is in there somewhere
> beat out an AMD K6/233 from 12 years ago? It would seem with such a
> touted (reading ARM's website) hardware FP unit, that the gap between
> FP performance and INT performance would not be so large.
You have to make sure that what ARM calls runfast mode is enabled for
normal FP instructions to execute in the NEON pipeline. This includes
disabling FP exceptions and selecting the proper rounding mode. The
details should be in the manual.
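
For illustration, here is a minimal, untested sketch of what enabling runfast mode could look like from C. It assumes the FPSCR layout from the ARM architecture documentation (DN in bit 25, FZ in bit 24, rounding mode in bits 23:22, exception trap enables in bits 15 and 12:8) and assumes round-to-nearest is the rounding mode meant above. The names runfast_fpscr and enable_runfast are made up for this example. Check the bit positions against the manual before relying on it.

```c
#include <stdint.h>

/* FPSCR fields; bit positions are assumptions taken from ARM's documentation. */
#define FPSCR_DN         (1u << 25)                   /* default NaN mode      */
#define FPSCR_FZ         (1u << 24)                   /* flush-to-zero mode    */
#define FPSCR_RMODE_MASK (3u << 22)                   /* 00 = round to nearest */
#define FPSCR_TRAP_MASK  ((1u << 15) | (0x1Fu << 8))  /* IDE, IXE..IOE enables */

/* Hypothetical helper: compute the FPSCR value for runfast mode
 * from the current value, leaving all other bits untouched. */
uint32_t runfast_fpscr(uint32_t fpscr)
{
    fpscr &= ~(FPSCR_RMODE_MASK | FPSCR_TRAP_MASK); /* round to nearest, no traps */
    fpscr |= FPSCR_FZ | FPSCR_DN;                   /* flush-to-zero, default NaN */
    return fpscr;
}

/* Hypothetical helper: apply the setting on an ARM target with hardware FP.
 * On any other target this compiles to a no-op. */
void enable_runfast(void)
{
#if defined(__arm__) && !defined(__SOFTFP__)
    uint32_t fpscr;
    __asm__ volatile ("vmrs %0, fpscr" : "=r"(fpscr));
    fpscr = runfast_fpscr(fpscr);
    __asm__ volatile ("vmsr fpscr, %0" : : "r"(fpscr));
#endif
}
```

Calling enable_runfast() once at the start of main(), before any FP work, should be enough for a single-threaded benchmark like nbench. Note that flush-to-zero and default NaN change results for denormals and NaNs, so this is not IEEE-compliant arithmetic.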
> So, I'm still a bit puzzled unless compiler support is so immature for
> Neon that we're not seeing anything like the real performance.
Compilers are certainly not very good at using the vector operations
the NEON unit is capable of.
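
Until compilers improve, hot loops usually have to be vectorized by hand. As a rough sketch of what that could look like for the 3D transformation case mentioned above, here is a 4x4 matrix times vector product using the NEON intrinsics from arm_neon.h, with a plain C fallback for other targets. The function name mat4_mul_vec4 and the column-major storage are assumptions made for this example, and I have not measured it on a Beagleboard.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* out = M * v, with M stored column-major: element (row r, col c) is m[4*c + r].
 * out must not overlap v (the scalar fallback reads v after writing out). */
void mat4_mul_vec4(const float m[16], const float v[4], float out[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Each step scales one whole column by one component of v. */
    float32x4_t acc = vmulq_n_f32(vld1q_f32(m + 0), v[0]);
    acc = vmlaq_n_f32(acc, vld1q_f32(m + 4),  v[1]);
    acc = vmlaq_n_f32(acc, vld1q_f32(m + 8),  v[2]);
    acc = vmlaq_n_f32(acc, vld1q_f32(m + 12), v[3]);
    vst1q_f32(out, acc);
#else
    /* Portable fallback: same arithmetic, one row at a time. */
    for (int r = 0; r < 4; ++r) {
        out[r] = m[r] * v[0] + m[4 + r] * v[1]
               + m[8 + r] * v[2] + m[12 + r] * v[3];
    }
#endif
}
```

Column-major storage is chosen here so that each column is one contiguous 128-bit load. With row-major data the columns would have to be transposed or gathered first.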
--
Måns Rullgård
ma...@mansr.com
This is described here:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0133c/index.html
Note I did not test it.
This was discussed on IRC yesterday:
http://www.beagleboard.org/irclogs/index.php?date=2008-10-28#T17:54:14
Laurent