Tested in:
My laptop: family 6 / model 15 (Core 2 Duo T5300).
My desktop: family 15 / model 6 (Pentium D 930).
The "lahf_lm" feature is present in both according to /proc/cpuinfo.
Note that the laptop is "low end" core 2 (in the sense it has no VT
extensions). The pentium D is "high end" (in the sense it has VT
extensions --- low end would be pentium D 8xx). Maybe that makes a
difference?
OTOH, the kvm 64-bit virtual cpu (kvm 72) doesn't seem to know about
"lahf_lm": it won't report it in cpuid, even if the host processor
has it. (I assume the instructions would work anyway.)
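For anyone who wants to repeat the check, here is a minimal sketch of
looking for a flag in /proc/cpuinfo-style text. The function name
has_cpu_flag is made up for this example. It matches whole words only, so
"lahf" alone does not match "lahf_lm". Inside a KVM guest it reports what
the virtual cpu advertises, not what the host has.

```python
def has_cpu_flag(cpuinfo_text, flag):
    """Return True if `flag` is listed as a whole word on any 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only look at lines of the form "flags : fpu vme ..."
        if sep and key.strip() == "flags":
            if flag in value.split():
                return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not on Linux, or /proc is not mounted: nothing to inspect.
        text = ""
    print("lahf_lm reported:", has_cpu_flag(text, "lahf_lm"))
```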
Gonzalo
I do, but I never benchmarked it... I only checked that it compiles and
passes the tests, sorry. When I did the patch for nocona the purpose was
twofold: (a) to make it work (same 64bit vs. 32bit issue) and (b) to make
sure the -march=nocona and/or -mtune=nocona flags are passed to gcc... the
latter, however, doesn't seem to be the case any more.
Maybe someone can write down a step-by-step guide on how to run
mpirbench to benchmark mpir? Even better if the code could be included
in mpir and "make bench" would "just work(tm)".
Gonzalo
To use
1) untar it
2) ./runbench <path-to-mpir>
This figures out whether to use "mpir.h" and "libmpir.a" (the default),
and falls back to "gmp.h" and "libgmp.a" otherwise. I tested this with
mpir-0.9.0 (gmp naming), but I guess it may also work with gmp-4.2.1
itself...
It is not reentrant, though, so test one install at a time...
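The fallback described above could be sketched roughly like this. This is
not the actual runbench code, only an illustration of the "prefer the mpir
names, else use the gmp names" rule. The helper choose_names is
hypothetical, and it takes a plain collection of file names so that it
assumes nothing about where the header and library live under
<path-to-mpir>.

```python
def choose_names(available):
    """Pick (header, static library), preferring the MPIR pair over the GMP pair.

    `available` is any collection of file names found in the install.
    Returns None when neither complete pair is present.
    """
    candidates = (("mpir.h", "libmpir.a"), ("gmp.h", "libgmp.a"))
    for header, library in candidates:
        # Only accept a pair when both halves are there.
        if header in available and library in available:
            return header, library
    return None


if __name__ == "__main__":
    print(choose_names({"mpir.h", "libmpir.a", "gmp.h", "libgmp.a"}))
    print(choose_names({"gmp.h", "libgmp.a"}))
    print(choose_names({"mpir.h"}))
```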
----
Here are my results:
Q9550 (core 2 quad, 2.83GHz, 6M+6M L2 cache):
mpir-0.9.0:
MPIRbench.base.multiply result: 49051
MPIRbench.base.divide result: 26039
MPIRbench.base result: 35739
MPIRbench.app result: 2201.4
MPIRbench result: 8869.8
trunk@1739:
MPIRbench.base.multiply result: 59932
MPIRbench.base.divide result: 26957
MPIRbench.base result: 40194
MPIRbench.app result: 2986.1
MPIRbench result: 10956
For the kvm cpu (running on that same machine) I get 8821 and 10904
respectively, which is roughly 99.5% of the score for the real cpu
(nice).
----------
Pentium D930 (nocona, dual core, 3.00GHz, 2M+2M L2 cache)
mpir-0.9.0:
MPIRbench.base.multiply result: 14913
MPIRbench.base.divide result: 9561.3
MPIRbench.base result: 11941
MPIRbench.app result: 808.53
MPIRbench result: 3107.2
trunk@1738:
MPIRbench.base.multiply result: 23238
MPIRbench.base.divide result: 10681
MPIRbench.base result: 15755
MPIRbench.app result: 1225.1
MPIRbench result: 4393.2
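A small sketch for comparing the numbers above. Two things are assumed
here and not taken from the runbench source: first, that a higher score is
better, so new/old is a speedup; second, that each composite score is the
geometric mean of its two components. The second point is only an
observation from the posted figures (sqrt(multiply*divide) matches "base",
and sqrt(base*app) matches the overall result, to about four significant
figures). The names RESULTS, geomean and speedup are made up for this
example.

```python
import math

# Scores copied from the results above: (multiply, divide, base, app, overall).
RESULTS = {
    ("Q9550", "mpir-0.9.0"): (49051, 26039, 35739, 2201.4, 8869.8),
    ("Q9550", "trunk@1739"): (59932, 26957, 40194, 2986.1, 10956),
    ("Pentium D 930", "mpir-0.9.0"): (14913, 9561.3, 11941, 808.53, 3107.2),
    ("Pentium D 930", "trunk@1738"): (23238, 10681, 15755, 1225.1, 4393.2),
}


def geomean(a, b):
    """Geometric mean of two scores."""
    return math.sqrt(a * b)


def speedup(new, old):
    """Ratio of two scores, assuming higher is better."""
    return new / old


if __name__ == "__main__":
    for (cpu, version), (mul, div, base, app, overall) in RESULTS.items():
        print(f"{cpu} {version}: "
              f"sqrt(mul*div)={geomean(mul, div):.0f} (posted base {base}), "
              f"sqrt(base*app)={geomean(base, app):.1f} (posted overall {overall})")

    q_old = RESULTS[("Q9550", "mpir-0.9.0")][4]
    q_new = RESULTS[("Q9550", "trunk@1739")][4]
    d_old = RESULTS[("Pentium D 930", "mpir-0.9.0")][4]
    d_new = RESULTS[("Pentium D 930", "trunk@1738")][4]
    print(f"Q9550 overall speedup: {speedup(q_new, q_old):.2f}x")
    print(f"Pentium D 930 overall speedup: {speedup(d_new, d_old):.2f}x")
```

With these figures the overall score improves by roughly 1.24x on the
Q9550 and roughly 1.41x on the Pentium D 930 between mpir-0.9.0 and trunk.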
------------
However, I tried hacking config.guess so that my cpu is reported as
"nocona" instead --- just because I wanted to benchmark it that way (it
would also be a test of what happens when compiling for nocona w/o lahf).
But now the configure step fails.
The variables are:
using ABI="64"
CC="gcc -std=gnu99"
CFLAGS="-O2 -m64 -march=nocona -mtune=nocona"
CPPFLAGS=""
MPN_PATH=" x86_64/core2 x86_64 generic"
(so, it seems to be using core2 code, after all...)
The first errors that show up in configure output are:
checking for struct pst_processor.psp_iticksperclktick... no
=yes: command not found: HAVE_NATIVE_%3
=yes: command not found: HAVE_NATIVE_%2
=yes: command not found: HAVE_NATIVE_%3
=yes: command not found: HAVE_NATIVE_%2
And later:
checking size of unsigned short... 0
checking size of unsigned... 0
checking size of unsigned long... 0
checking size of mp_limb_t... 0
configure: error: Oops, mp_limb_t doesn't seem to work
Here it stops...
Gonzalo
Is it in svn? If so, where should I look?
Okay. Here's the idea if you want to try it:
We use the lahf/sahf instructions because the Intel architecture has
some weird dependencies on the carry bit with the inc/dec instructions
(which result in pipeline stalls whenever inc/dec is used with
adc/sbb). Torbjorn suggested using rcx as the counter register and
using the jrcxz instruction for the loop control. You can use the lea
instruction to modify rcx; that way the carry flag isn't touched, and
since lea executes on an address port, it also saves an ALU port.
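To make the constraint concrete, here is a small Python model of what an
adc loop computes. This is not the MPIR assembly and says nothing about
timing. It only shows that the carry produced by one limb has to survive
the loop bookkeeping until the next limb is added. That is why the counter
update must either save and restore the flags (lahf/sahf) or not touch
them at all (lea on rcx plus jrcxz). The names add_n, LIMB_BITS and
LIMB_MASK are made up for this sketch, and 64-bit limbs are assumed.

```python
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1


def add_n(a_limbs, b_limbs):
    """Add two equal-length little-endian limb lists.

    Returns (sum_limbs, carry_out), where carry_out is 0 or 1.
    """
    if len(a_limbs) != len(b_limbs):
        raise ValueError("operands must have the same number of limbs")
    result = []
    carry = 0  # stands in for the cpu carry flag
    for a, b in zip(a_limbs, b_limbs):
        total = a + b + carry         # what one adc computes
        result.append(total & LIMB_MASK)
        carry = total >> LIMB_BITS    # must reach the next iteration intact
    return result, carry


if __name__ == "__main__":
    # A carry rippling from the low limb into the high limb.
    print(add_n([LIMB_MASK, 0], [1, 0]))
    # A carry out of the top limb.
    print(add_n([LIMB_MASK], [1]))
```

If anything between two iterations overwrote `carry`, the high limbs would
come out wrong; the same thing happens in the real loop if the counter
update clobbers the carry flag.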
--jwm