I recently installed AMD64 6.2 Release on 2 PowerEdge servers, both with
dual core Xeon (3070 and 5110). I noticed when I was updating the
sources that it was compiling as an Athlonxp by default. I was wondering
if I should change the CPUTYPE in make.conf to something else. I read at
some places that it is not recommended because it could cause problems
but I thought it would be interesting to start the debate here. Please
note that I would prefer not to go with the -STABLE or -CURRENT branch
because these are going to be essential production servers.
Thank you for your opinions,
Martin
-Garrett
I use FreeBSD 7.0 AMD64, which is stable now that it is frozen for
making RELENG_7, and I didn't set CPUTYPE in make.conf. So I wonder
how you ended up with the athlon-xp CPU arch; I don't see GCC 4.2
doing that on my server.
--
Regards,
-Abdullah Ibn Hamad Al-Marri
Arab Portal
http://www.WeArab.Net/
I extensively benchmarked different compiler options on Xeon 5160 (3.0
GHz Core2) with gcc-4.1.2 and gcc-4.2.
Apart from very minor differences the best was plain "-O3
-finline-limit=xxx" where xxx was different by code, some code ran
faster with 400 and other code with 750 (both beating the 600
default). The inline limit made a bigger difference than most of the
other options and I actually ended up compiling parts of my code with
a different inline-limit than others.
The result was within a percent of all highly tuned CPU-specific
options like -march=k8 -msse3 -mfpmath=sse -ffast-math, and I went
through most iterations. This means that locking your code to one
x86_64 implementation and locking out either AMD or Intel is not worth
the trouble.
Testing was done on gcc-4.1.2 and later partially verified with
gcc-4.2. Gcc-4.2 was a little slower overall but the same options
were about the same speed.
I also tested with Intel's icc 9.0 which didn't even come close to
either gcc, even if you were willing to wait 10 times as long for
compilation to finish (for inter-object file optimizations). No
inlining limit would bring Intel's icc code size down close to what
gcc had and subsequently performance was bad.
gcc-3.4 was blown out of the water by gcc-4, too.
Martin
--
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <crac...@cons.org> http://www.cons.org/cracauer/
FreeBSD - where you want to go, today. http://www.freebsd.org/
Using what benchmark? That makes a *lot* of difference.
> The result was within a percent of all highly tuned CPU-specific
> options like -march=k8 -msse3 -mfpmath=sse -ffast-math, and I went
> through most iterations. This means that locking your code to one
> x86_64 implementation and locking out either AMD or Intel is not worth
> the trouble.
I don't think you've reached the correct conclusions. In particular,
note that doing -mtune instead of -march won't lock you to a specific
CPU, but will instead choose instructions/sequences optimized for your
CPU. So it's a minor win with no downside.
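
One way to see the -march/-mtune distinction for yourself is to ask the
compiler which instruction-set extensions it was allowed to use. The
sketch below is illustrative only: the function names isa_summary and
add_name are made up for this example, and it assumes a GCC-compatible
compiler that predefines macros such as __SSE3__ and __3dNOW__.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append name to buf, space separated, if it still fits.
 * buf must already be NUL-terminated. (Hypothetical helper.) */
static void add_name(char *buf, size_t size, const char *name)
{
    size_t used = strlen(buf);

    if (used + strlen(name) + 2 <= size) {  /* separator + name + NUL */
        if (used > 0)
            strcat(buf, " ");
        strcat(buf, name);
    }
}

/* Fill buf with the extensions enabled at compile time, based on the
 * macros GCC predefines for the selected -march/-m* options. */
const char *isa_summary(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    buf[0] = '\0';
#ifdef __MMX__
    add_name(buf, size, "mmx");
#endif
#ifdef __SSE__
    add_name(buf, size, "sse");
#endif
#ifdef __SSE2__
    add_name(buf, size, "sse2");
#endif
#ifdef __SSE3__
    add_name(buf, size, "sse3");
#endif
#ifdef __3dNOW__
    add_name(buf, size, "3dnow");
#endif
    if (buf[0] == '\0')
        add_name(buf, size, "none");
    return buf;
}
```

Printing isa_summary(buf, sizeof buf) from a small main and building it
once with -mtune=nocona and once with -march=nocona should show sse3
only in the second build, since -mtune changes instruction selection
and scheduling but not the set of instructions the compiler may emit.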
With the x86_64 architecture, you have three choices: unset (x86 +
MMX/SSE/SSE2), nocona (intel, with SSE3) and athlon64 (amd, with
3dNOW!). So changing your Xeon to nocona will just enable SSE3. The
SSE3 extensions are mostly things for doing "horizontal" computations
inside the SSE register file. So unless your benchmark was doing lots
of work on arrays of floats, it's unlikely you actually tested the
SSE3 extensions, in which case all you did was test -mtune. Without
testing the extra instructions, we don't know whether using them is
worth the trouble or not, and you didn't say what your test was.
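
To make "horizontal" concrete, here is a minimal sketch of the kind of
code that would actually exercise SSE3: summing the four floats inside
one SSE register. The function name hsum4 is invented for this example;
_mm_hadd_ps is the SSE3 horizontal-add intrinsic from <pmmintrin.h>, and
the scalar branch is what gets compiled when SSE3 is not enabled.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE3__
#include <pmmintrin.h>  /* SSE3 intrinsics, including _mm_hadd_ps */
#endif

/* Sum four floats; v must point to at least four elements.
 * (hsum4 is a hypothetical name used only for this sketch.) */
float hsum4(const float *v)
{
    assert(v != NULL);
#ifdef __SSE3__
    __m128 x = _mm_loadu_ps(v);  /* unaligned load of v[0..3] */
    x = _mm_hadd_ps(x, x);       /* lanes: v0+v1, v2+v3, v0+v1, v2+v3 */
    x = _mm_hadd_ps(x, x);       /* every lane now holds the full sum */
    return _mm_cvtss_f32(x);     /* extract the lowest lane */
#else
    /* Plain scalar fallback when the compiler may not emit SSE3. */
    return ((v[0] + v[1]) + v[2]) + v[3];
#endif
}
```

A benchmark that never does this sort of within-register reduction on
float data gives the compiler little reason to emit SSE3 instructions,
whichever -march value is chosen.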
3dNOW! is an alternative, instead of an extension, to SSE/SSE2 (and
maybe SSE3). People who hack such things tell me it's much spiffier
than the SSE instructions, so possibly enabling it would cause those
instructions to be used instead of the SSE instructions the compiler
currently uses. But you didn't test this case, so we don't know how
much difference it would make, and hence whether or not it's worth
locking your code to AMD to get it.
Thanks,
<mike
--
Mike Meyer <m...@mired.org> http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.
A proprietary application. I should have mentioned it does almost no
floating point, so the assumption to drop sse might not be valid.
However, the differences in compiler flags were generally huge and
the simpler ones came out on or near the top.