Hi Russel.
Wow. Not as simple a question as I had thought. There are three toolchains involved here: the two Xilinx-sourced cross-chains and the Ubuntu-sourced native toolchain on snickerdoodle itself. Each has some options that make sense, others that don't, and some that are simply pointless in this context.
The Ubuntu toolchain is unavoidable, as it would have been used to build everything in the distro. The good news is that it is configured to use hardware FP. The bad news is that it's the stripped-down, 16-register VFPv3-d16 version.
Zynq's ARMv7 core has the full 32-register VFPv3 FPU, no? The Ubuntu toolchain also isn't tuned for the Cortex-A9 processor. I have no idea what practical effect either of those has on real performance, but it's not the preferred configuration. When I have the time, I'll look into Yocto (or just follow this blog).
I compared the three compilers' -v outputs. arm-linux-gnueabihf (both Xilinx's and Ubuntu's) is from Linaro, while arm-xilinx-linux-gnueabi is from Mentor Graphics' Sourcery CodeBench. The configure options disagree quite a bit. I've attached the full outputs, and excerpted the CPU/FPU bits below.
The Xilinx/Sourcery version uses
gcc version 4.9.2 (Sourcery CodeBench Lite 2015.05-17)
--with-arch=armv5te
--with-arch=armv7-a
--with-cpu=cortex-a9
--with-float=softfp
--with-fpu=neon-fp16
while the Xilinx/Linaro version uses
gcc version 4.9.2 20140904 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.09 - Linaro GCC 4.9-2014.09)
--enable-multiarch
--with-arch=armv7-a
--with-float=hard
--with-fpu=vfpv3-d16
--with-tune=cortex-a9
and the Ubuntu/Linaro version is different from both of those:
gcc version 4.8.2 (Ubuntu/Linaro 4.8.2-19ubuntu1)
--enable-multiarch
--with-arch=armv7-a
--with-float=hard
--with-fpu=vfpv3-d16
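If you want to repeat the comparison on your own machines, here is a rough Python sketch that picks the CPU/FPU options out of a compiler's -v text and prints where two toolchains differ. The function name cpu_fpu_defaults and the shortened "Configured with:" strings are my own stand-ins built from the values quoted above; they are not the full attached outputs.

```python
import re

# Configure options that set the default CPU/FPU code generation.
CPU_FPU_KEYS = ("arch", "cpu", "tune", "float", "fpu")

def cpu_fpu_defaults(gcc_v_output):
    """Return {option: [values]} for the CPU/FPU --with-<option>=<value> flags."""
    found = {}
    for key, value in re.findall(r"--with-([a-z]+)=(\S+)", gcc_v_output):
        if key in CPU_FPU_KEYS:
            # Keep a list: an option can appear more than once in one output.
            found.setdefault(key, []).append(value)
    return found

# Shortened stand-ins for the real "Configured with:" lines (assumed layout).
xilinx_linaro = ("Configured with: ../configure --enable-multiarch "
                 "--with-arch=armv7-a --with-float=hard "
                 "--with-fpu=vfpv3-d16 --with-tune=cortex-a9")
ubuntu_linaro = ("Configured with: ../src/configure --enable-multiarch "
                 "--with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16")

a = cpu_fpu_defaults(xilinx_linaro)
b = cpu_fpu_defaults(ubuntu_linaro)
for key in CPU_FPU_KEYS:
    if a.get(key) != b.get(key):
        print(key, a.get(key), b.get(key))
```

To feed it real data, note that gcc writes its -v report to stderr, so something like subprocess.run([compiler, "-v"], capture_output=True, text=True).stderr should give you the text to parse.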
Ideally, it seems, snickerdoodle code would use
--with-cpu=cortex-a9
--with-float=hard
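--with-fpu=vfpv3 if all you need is scalar math (single or double precision)
or
--with-fpu=neon if you also want NEON SIMD (single precision only; as far as I know, neon still includes the VFPv3 unit, so doubles keep working).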
Hmm. In fact, the SDK builds the snickerdoodle BSP using -mcpu=cortex-a9 -mfpu=vfpv3 -mfloat-abi=hard.
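Okay, bottom line: it probably doesn't matter which toolchain you use, as long as you add the CPU/FPU compiler switches you want. The one thing you can't mix is the float ABI. Code built with -mfloat-abi=hard needs hard-float libraries to link against, which the gnueabihf chains provide and the softfp-configured gnueabi chain may not.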
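To make "add the switches you want" concrete, here is a small Python sketch that works out which -m switches a given toolchain would need in order to match the SDK's BSP build line. The names SWITCH_FOR, WANTED and extra_switches are made up for illustration, and the defaults are typed in by hand from the excerpts above.

```python
# Command-line switches that override the configure-time defaults.
SWITCH_FOR = {"cpu": "-mcpu", "float": "-mfloat-abi", "fpu": "-mfpu"}

# Target settings, taken from the SDK's BSP build line.
WANTED = {"cpu": "cortex-a9", "float": "hard", "fpu": "vfpv3"}

def extra_switches(defaults, wanted=WANTED):
    """List the -m switches needed where a toolchain's defaults differ."""
    return [f"{SWITCH_FOR[key]}={value}"
            for key, value in wanted.items()
            if defaults.get(key) != value]

# Configured defaults of the Ubuntu/Linaro compiler, from its -v excerpt.
ubuntu = {"arch": "armv7-a", "float": "hard", "fpu": "vfpv3-d16"}

print(" ".join(extra_switches(ubuntu)))
# prints: -mcpu=cortex-a9 -mfpu=vfpv3
```

In practice it's simpler to pass all three switches every time, since repeating a default is harmless. The sketch is only meant to show what each toolchain is missing.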
Whew.
-Nick