lahf/sahf on Intel64?

Cactus

Mar 14, 2009, 12:40:10 PM

to mpir-devel

The Core2 assembler code for mpn_add_n (and mpn_sub_n?) uses the lahf
and sahf op codes to save and restore the carry flag from rax but my
Intel documents say that this is only valid in 64 bit mode for some
but not all Intel 64-bit processors. To quote:

"It is valid in 64-bit mode only if CPUID.80000001H:ECX.LAHF-SAHF
[bit 0] = 1"

I am hence wondering if this is safe?

As far as I can see, since rax isn't used in the routine except at the
very end, the instructions:

save carry to a: sbb rax, rax
load carry from a: add rax, rax

would do just as well.

But I don't know about speed differences here.
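
For what it's worth, here is a minimal Python model of the flag arithmetic behind that pair. It assumes the standard x86-64 semantics (sbb computes dest - src - CF, and add sets CF to the carry out of bit 63). The function names are made up for illustration, and it models only the carry flag, not the other flags or any timing.

```python
MASK64 = (1 << 64) - 1  # rax is a 64-bit register


def sbb_rax_rax(cf):
    """Model `sbb rax, rax`: rax = rax - rax - CF, so rax becomes 0 or all ones.

    The borrow out equals the incoming CF, so the flag itself is unchanged.
    """
    rax = (0 - cf) & MASK64
    return rax, cf


def add_rax_rax(rax):
    """Model `add rax, rax`: return the new rax and the carry out of bit 63."""
    total = rax + rax
    return total & MASK64, total >> 64


for carry_in in (0, 1):
    saved, _ = sbb_rax_rax(carry_in)
    _, carry_out = add_rax_rax(saved)
    print(carry_in, hex(saved), carry_out)
```

Note that add rax, rax also changes rax, so in this model the carry has to be saved again with sbb after each restore.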

Brian

Jason Moxham

Mar 14, 2009, 12:45:53 PM

to mpir-...@googlegroups.com

I'm pretty sure all Core 2 CPUs have lahf/sahf; it's just that some Pentium Ds
don't have them. You can test the lahf_lm feature bit in cpuid to see whether a
chip has them.

Cactus

Mar 14, 2009, 12:55:09 PM

to mpir-devel

My Intel manuals definitely say that not all 64-bit processors have
these instructions.

And I don't know how we are ensuring that MPIR doesn't end up on any
systems without these instructions.

I don't know what the speed comparison and instruction sequencing
issues would be, but doesn't it make sense to use something that we
know always works?

Brian

Gonzalo Tornaria

Mar 14, 2009, 1:20:25 PM

to mpir-...@googlegroups.com

On Sat, Mar 14, 2009 at 1:45 PM, Jason Moxham <ja...@njkfrudils.plus.com> wrote:
>
>
> I'm pretty sure all Core 2 CPUs have lahf/sahf; it's just that some Pentium Ds
> don't have them. You can test the lahf_lm feature bit in cpuid to see whether a
> chip has them.

Tested in:

My laptop: family 6 / model 15 (Core 2 Duo T5300).
My desktop: family 15 / model 6 (Pentium D 930).

The "lahf_lm" feature is present in both according to /proc/cpuinfo.
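
In case anyone wants to script that check, here is a rough sketch in Python. It assumes the Linux /proc/cpuinfo layout, with a "flags" line holding space-separated feature names. The helper names and the sample text are made up for illustration, and the result only reflects what cpuid reports to the kernel.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_lahf_lm(cpuinfo_text):
    """True if the 'lahf_lm' flag (LAHF/SAHF in 64-bit mode) is listed."""
    return "lahf_lm" in cpu_flags(cpuinfo_text)


# Made-up excerpt for illustration; on a real Linux box you would pass
# open("/proc/cpuinfo").read() instead.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse sse2 lm lahf_lm\n"
print(has_lahf_lm(SAMPLE))
```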

Note that the laptop is "low end" core 2 (in the sense it has no VT
extensions). The pentium D is "high end" (in the sense it has VT
extensions --- low end would be pentium D 8xx). Maybe that makes a
difference?

OTOH, the kvm 64 bit virtual cpu (kvm 72) doesn't seem to know about
the "lahf_lm" flag (meaning it won't report it in cpuid, even if the host
processor has it; I assume the instructions would work anyway).

Gonzalo

Jason Moxham

Mar 14, 2009, 1:41:58 PM

to mpir-...@googlegroups.com

Early Intel CPUs with Intel 64 lacked the LAHF and SAHF instructions available in
AMD64 until the introduction of the Pentium 4 G1 stepping in December 2005. LAHF
and SAHF are load and store instructions, respectively, for certain status flags.
These instructions are used for virtualization and floating-point condition
handling.

I'll find out model numbers soon

Bill Hart

Mar 14, 2009, 1:53:04 PM

to mpir-...@googlegroups.com

This problem is quite difficult to deal with. I thought about it on
the way home, and I don't want to have config.guess return:

nocona-lahf-unknown-gnu-linux

How does the 32 bit code decide if MMX is available etc? I suppose
config.guess returns p4mmx. That's just nasty. Feature flags should be
dealt with separately. I don't even really like passing flags to the
yasm code because it means doctoring the yasm build system, which is
already complicated enough. Hmm, how does it distinguish processors
which support SSE in 32 bit mode? Surely it doesn't do p4mmxsse.

If we could exclude all 64 bit models not having support for LAHF by
having config.guess identify them as x86_64 only, i.e. not amd64,
core2, nocona, etc, then they can just run with the default x86_64
code.

We need to remember to make all changes in fat.c as well as config.guess.

Of course this doesn't help Brian much, who probably wants the core 2
code to run on all 64 bit intel processors so that he doesn't need a
separate build project for them. It may also slow those processors
down (though who knows, really).

I'm open to other suggestions, but so far nothing else looks appealing.

Bill.


Jason Moxham

Mar 14, 2009, 1:56:27 PM

to mpir-...@googlegroups.com

On Saturday 14 March 2009 17:41:58 Jason Moxham wrote:
> Early Intel CPUs with Intel 64 lacked LAHF and SAHF instructions available
> in AMD64 until introduction of Pentium 4 G1 step in December 2005. LAHF and
> SAHF are load and store instructions, respectively, for certain status
> flags. These instructions are used for virtualization and floating-point
> condition handling.
>
>
> I'll find out model numbers soon
>

No need for MPIR-1.0.0: all 64-bit Pentiums default to nocona, which leads to
a generic C build.

Bill Hart

Mar 14, 2009, 1:59:00 PM

to mpir-...@googlegroups.com

Are you sure about that:

case $host in
  x86_64-*-* | i786-*-*)
    path_64="x86_64/amd64 x86_64" ;;
  k10-*-*)
    path_64="x86_64/amd64/k10 x86_64/amd64 x86_64" ;;
  nocona-*-* | core2-*-*)              <<----------
    path_64="x86_64/core2 x86_64" ;;   <<----------
esac

Seems to use the core2 code.
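
To make the reading of that case statement concrete, here is a small Python model of just this snippet. The patterns and path strings are copied from the code above, while the function name and the rule table are made up for illustration. It says nothing about what the rest of configure.in does with the result.

```python
from fnmatch import fnmatchcase

# Patterns and paths copied from the case statement quoted above, in order.
PATH_64_RULES = [
    (("x86_64-*-*", "i786-*-*"), "x86_64/amd64 x86_64"),
    (("k10-*-*",), "x86_64/amd64/k10 x86_64/amd64 x86_64"),
    (("nocona-*-*", "core2-*-*"), "x86_64/core2 x86_64"),
]


def path_64_for(host):
    """Return path_64 for the first matching rule, or None if none matches."""
    for patterns, path in PATH_64_RULES:
        if any(fnmatchcase(host, pattern) for pattern in patterns):
            return path
    return None


print(path_64_for("nocona-unknown-linux-gnu"))
```

As in the shell, the first matching branch wins, so the order of the rules matters if patterns ever overlap.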

Bill.


Cactus

Mar 14, 2009, 1:59:44 PM

to mpir-devel

On Mar 14, 5:53 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> This problem is quite difficult to deal with. I thought about it on
> the way home, and I don't want to have config.guess return:
>
> nocona-lahf-unknown-gnu-linux
>
> How does the 32 bit code decide if MMX is available etc? I suppose
> config.guess returns p4mmx. That's just nasty. Feature flags should be
> dealt with separately. I don't even really like passing flags to the
> yasm code because it means doctoring the yasm build system, which is
> already complicated enough. Hmm, how does it distinguish processors
> which support SSE in 32 bit mode? Surely it doesn't do p4mmxsse.
>
> If we could exclude all 64 bit models not having support for LAHF by
> having config.guess identify them as x86_64 only, i.e. not amd64,
> core2, nocona, etc, then they can just run with the default x86_64
> code.
>
> We need to remember to make all changes in fat.c as well as config.guess.
>
> Of course this doesn't help Brian much, who probably wants the core 2
> code to run on all 64 bit intel processors so that he doesn't need a
> separate build project for them. It may also slow those processors
> down (though who knows, really).

I had hoped it was not going to be an issue that we had to deal with,
but the speed of the code is changed by using different instructions.

It goes from 2.75 cycles/limb to 3.0 cycles/limb.

Brian

Jason Moxham

Mar 14, 2009, 2:00:53 PM

to mpir-...@googlegroups.com

On Saturday 14 March 2009 17:59:00 Bill Hart wrote:
> Are you sure about that:
>
> case $host in
>   x86_64-*-* | i786-*-*)
>     path_64="x86_64/amd64 x86_64" ;;
>   k10-*-*)
>     path_64="x86_64/amd64/k10 x86_64/amd64 x86_64" ;;
>   nocona-*-* | core2-*-*)              <<----------
>     path_64="x86_64/core2 x86_64" ;;   <<----------
> esac
>
>
> Seems to use the core2 code.

Try ./configure --build=nocona-unknown-gnu-linux and you get C.
I've done this on a real Pentium D as well.

Bill Hart

Mar 14, 2009, 2:02:14 PM

to mpir-...@googlegroups.com

I think we should have 64 bit amd's identify as amd64 not x86_64, then
use i786 and/or x86_64 to identify these broken chips.

As for p4mmxsse2, the current configure.in seems to get around this by
assuming all p4's support sse2. Not sure if that is actually true. So
that is probably broken.

Bill.


Bill Hart

Mar 14, 2009, 2:07:59 PM

to mpir-...@googlegroups.com

I get:

wbhart@sage:~/mpir-test$ ./configure --build=nocona-unknown-gnu-linux
checking build system type... Invalid configuration `nocona-unknown-gnu-linux': machine `nocona-unknown-gnu' not recognized
configure: error: /bin/bash ./config.sub nocona-unknown-gnu-linux failed

I think config.sub is broken.

Jason Moxham

Mar 14, 2009, 2:08:39 PM

to mpir-...@googlegroups.com

I was going to suggest a reorganization of the x86_64 (and 32-bit?) code because
it's a mess, but I was waiting until after MPIR-1.0.0 was released.

Do we want to change now or wait till after?

On Saturday 14 March 2009 18:02:14 Bill Hart wrote:
> I think we should have 64 bit amd's identify as amd64 not x86_64, then
> use i786 and/or x86_64 to identify these broken chips.
>
> As for p4mmxsse2, the current configure.in seems to get around this by
> assuming all p4's support sse2. Not sure if that is actually true. So
> that is probably broken.

SSE2 was introduced with the P4, so they all have it.

Jason Moxham

Mar 14, 2009, 2:10:14 PM

to mpir-...@googlegroups.com

On Saturday 14 March 2009 18:07:59 Bill Hart wrote:
> I get:
>
> wbhart@sage:~/mpir-test$ ./configure --build=nocona-unknown-gnu-linux
> checking build system type... Invalid configuration
> `nocona-unknown-gnu-linux': machine `nocona-unknown-gnu' not
> recognized
> configure: error: /bin/bash ./config.sub nocona-unknown-gnu-linux failed
>
> I think config.sub is broken.

Whoops, linux-gnu not gnu-linux.

Bill Hart

Mar 14, 2009, 2:13:00 PM

to mpir-...@googlegroups.com

OK, but I'm still unclear why it doesn't pick up the files in the
core2 directory. That is what it should do based on the code that is
there. This means noconas are giving a generic C build, which I am
sure Gonzalo would have complained about by now if it was the case,
because he has a nocona.

Jason Moxham

Mar 14, 2009, 2:22:21 PM

to mpir-...@googlegroups.com

On Saturday 14 March 2009 18:13:00 Bill Hart wrote:
> OK, but I'm still unclear why it doesn't pick up the files in the
> core2 directory. That is what it should do based on the code that is
> there. This means noconas are giving a generic C build, which I am
> sure Gonzalo would have complained about by now if it was the case,
> because he has a nocona.
>

Perhaps he uses a fat build, which would think it's a core2?

There are some references missing in configure.in, and a shed load of
inconsistencies. I can fix them, but it means another round of testing.

Bill Hart

Mar 14, 2009, 2:35:32 PM

to mpir-...@googlegroups.com

OK, found the problem. Nocona now builds with core2 code. Can you
autoconf this and commit?

So now we need to exclude those broken models after all. :-)

No need for a full round of testing. We only need to check that
configure still works on all the machines we've tested for and just
randomly test a few full builds.

Bill Hart

Mar 14, 2009, 2:36:22 PM

to mpir-...@googlegroups.com

And I agree, I think we should clean up the whole
config.guess/configfsf.sub and configure.in for a service release.

Bill.


Jason Moxham

Mar 14, 2009, 3:03:58 PM

to mpir-...@googlegroups.com

On Saturday 14 March 2009 18:35:32 Bill Hart wrote:
> OK, found the problem. Nocona now builds with core2 code. Can you
> autoconf this and commit.
>

I can install my old autotools on another machine, in a few hours.
The new autotools require ylwrap, which I added with
automake --add-missing
ylwrap is used for parallel makes (which I certainly do).
It's put it in as a symbolic link though; I could copy the file itself.

Committed anyway.

Cactus

Mar 14, 2009, 3:13:37 PM

to mpir-devel


I have just tested my GMP assembler code (based on Pierrick Gaudry's
code) for mpn_add_n and mpn_sub_n in MPIR on Core2 and it achieves the
same speed as the current code.

So I can solve this on Windows without difficulty by simply using this
code.

It is very mature code so I don't see any problem in using it (I will
obviously test it).

Brian

Bill Hart

Mar 14, 2009, 3:13:42 PM

to mpir-...@googlegroups.com

Yeah the newer versions of autotools work fine for me, they just break
the build on Darwin for reasons beyond me (actually I have a
theory....) . At least we know the old version doesn't screw anything
up.

Bill Hart

Mar 14, 2009, 3:14:50 PM

to mpir-...@googlegroups.com

That's an excellent solution. Now we just need to fix linux so that it
still works on these systems. Oh joy!

Bill.


Jason Moxham

Mar 14, 2009, 3:26:21 PM

to mpir-...@googlegroups.com

On Saturday 14 March 2009 19:14:50 Bill Hart wrote:
> That's an excellent solution. Now we just need to fix linux so that it
> still works on these systems. Oh joy!
>
> Bill.
>
> 2009/3/14 Cactus <riem...@googlemail.com>:
> > On Mar 14, 7:03 pm, Jason Moxham <ja...@njkfrudils.plus.com> wrote:
> >> On Saturday 14 March 2009 18:35:32 Bill Hart wrote:
> >> > OK, found the problem. Nocona now builds with core2 code. Can you
> >> > autoconf this and commit.
> >>

Did you use the right configure.in? We seem to have lost some stuff, e.g.
GMPLINK.

Bill Hart

Mar 14, 2009, 3:34:00 PM

to mpir-...@googlegroups.com

Oh dear, I think I didn't. I'll back it out and try again.

Bill Hart

Mar 14, 2009, 3:42:33 PM

to mpir-...@googlegroups.com

OK, can you try autotools and commit again. I went back to revision
1730 and made the changes again, this time hopefully without breaking
everything else.

Really odd it didn't tell me my revision was out of date. It usually
does if commits cross in the aether.

Bill.


Jason Moxham

Mar 14, 2009, 3:51:42 PM

to mpir-...@googlegroups.com

On Saturday 14 March 2009 19:42:33 Bill Hart wrote:
> OK, can you try autotools and commit again. I went back to revision
> 1730 and made the changes again, this time hopefully without breaking
> everything else.
>

done

> Really odd it didn't tell me my revision was out of date. It usually
> does if commits cross in the aether.
>

Mine won't let me commit unless I'm up to date.

Bill Hart

Mar 14, 2009, 4:41:22 PM

to mpir-...@googlegroups.com

Hmm, currently configure is well and truly broken on sage.math. Did
something go wrong during autoconf?

Bill Hart

Mar 14, 2009, 4:58:06 PM

to mpir-...@googlegroups.com

Well that is bizarre. It is totally fine after deleting the directory
and checking it out again. Somehow my files had gotten corrupted. It's
all good now though.

configure, make, make check all pass on sage.math.

Cactus

Mar 14, 2009, 5:02:21 PM

to mpir-devel

On Mar 14, 7:42 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> OK, can you try autotools and commit again. I went back to revision
> 1730 and made the changes again, this time hopefully without breaking
> everything else.
>
> Really odd it didn't tell me my revision was out of date. It usually
> does if commits cross in the aether.
>
> Bill.
>

> 2009/3/14 Bill Hart <goodwillh...@googlemail.com>:

>
> > Oh dear, I think I didn't. I'll back it out and try again.
>
> > Bill.
>
> > 2009/3/14 Jason Moxham <ja...@njkfrudils.plus.com>:
>
> >> On Saturday 14 March 2009 19:14:50 Bill Hart wrote:
> >>> That's an excellent solution. Now we just need to fix linux so that it
> >>> still works on these systems. Oh joy!
>
> >>> Bill.
>

> >>> 2009/3/14 Cactus <rieman...@googlemail.com>:

I have built, tested and committed the revised Core2 assembler for
mpn_add_n and mpn_sub_n on Windows.

As this assembler also provides the *_nc function calls I have added
these as well on Windows.

Brian

Bill Hart

Mar 14, 2009, 7:11:07 PM

to mpir-...@googlegroups.com

I think we should examine the feature flag in config.guess for
LAHF-SAHF. Here is an incomplete list of 64 bit Pentium 4's without
the feature:

Pentium 4 506 E0
Pentium 4 516 E0
Pentium 4 511 E0
Pentium 4 519K E0
Pentium 4 HT 521
Pentium 4 HT 531
Pentium 4 HT 541
Pentium 4 HT 551
Pentium 4 HT 561
Pentium 4 HT 571
Pentium 4 HT 3.2F
Pentium 4 HT 3.4F
Pentium 4 HT 3.6F
Pentium 4 HT 3.8F
Pentium 4 HT 517
Pentium 4 HT 524
All Family 15, Model 4, steppings N0, R0
Pentium 4 EE 3.73

Too messy to test for them all specially.

Can we add a directory called lahf to core2, and have all nocona
processors use only core2 in the build, whereas specifically core2
processors can use core2 and core2/lahf? It will be the simplest fix
for now until we can implement something more sophisticated.
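
Roughly, the directory split suggested here could behave like the following Python sketch. Everything in it is hypothetical: the function name is made up, and putting core2/lahf ahead of core2 in the search order is an assumption (most specific directory first, as in the existing k10 path), not something taken from configure.in.

```python
def proposed_path_64(cpu):
    """Hypothetical search order for the proposed core2/lahf split."""
    if cpu == "core2":
        # Assumed order: the lahf/sahf variants shadow the plain core2 files.
        return ["x86_64/core2/lahf", "x86_64/core2", "x86_64"]
    if cpu == "nocona":
        # nocona never sees the lahf/sahf code.
        return ["x86_64/core2", "x86_64"]
    return ["x86_64"]


for name in ("core2", "nocona"):
    print(name, " ".join(proposed_path_64(name)))
```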

Bill.


Jason Moxham

Mar 14, 2009, 8:32:21 PM

to mpir-...@googlegroups.com

Sounds good, give us 30 mins to do it.

Bill Hart

Mar 14, 2009, 8:49:33 PM

to mpir-...@googlegroups.com

We can do similar to Brian and put the old assembler in for add_n and
sub_n for nocona.

Heh, I've just been reading instruction sets for old CPU's. I think
the 8086 must have been the first main processor of Intel's to have a
mul instruction. I hadn't realised that.

I wonder if there were earlier processors made by other manufacturers,
but used in home computers or games machines that had them.

Bill.


Jason Moxham

Mar 14, 2009, 8:55:40 PM

to mpir-...@googlegroups.com

On Sunday 15 March 2009 00:49:33 Bill Hart wrote:
> We can do similar to Brian and put the old assembler in for add_n and
> sub_n for nocona.
>
> Heh, I've just been reading instruction sets for old CPU's. I think
> the 8086 must have been the first main processor of Intel's to have a
> mul instruction. I hadn't realised that.
>
> I wonder if there were earlier processors made by other manufacturers,
> but used in home computers or games machines that had them.
>
> Bill.
>

I had a Motorola 6809 in a Dragon 32, my first MUL: 8-bit by 8-bit to 16-bit.
Must have been 1982/83.

Bill Hart

Mar 14, 2009, 9:07:09 PM

to mpir-...@googlegroups.com

Amazing that someone implemented transcendental functions on the 6502,
which didn't have even a multiply unit.

http://www.6502.org/source/

Bill.

Gonzalo Tornaria

Mar 14, 2009, 9:27:48 PM

to mpir-...@googlegroups.com

On Sat, Mar 14, 2009 at 3:13 PM, Bill Hart <goodwi...@googlemail.com> wrote:
>
> OK, but I'm still unclear why it doesn't pick up the files in the
> core2 directory. That is what it should do based on the code that is
> there. This means noconas are giving a generic C build, which I am
> sure Gonzalo would have complained about by now if it was the case,
> because he has a nocona.

I do, but I never benchmarked it... only tested compilation and tests,
sorry. When I did the patch for nocona the purpose was twofold (a) to
make it work (same 64bit vs. 32bit issue) and (b) make sure
-march=nocona and/or -mtune=nocona flags are passed to gcc... the
latter, however, doesn't seem to be the case any more.

Maybe someone can write down a step-by-step guide on how to run
mpirbench to benchmark mpir? Even better if the code could be included
in mpir and "make bench" would "just work(tm)".

Gonzalo

Jason Moxham

Mar 14, 2009, 9:29:10 PM

to mpir-...@googlegroups.com

On Sunday 15 March 2009 00:49:33 Bill Hart wrote:

> We can do similar to Brian and put the old assembler in for add_n and
> sub_n for nocona.
>

In svn and committed: the nocona with no lahf will use mpn/x86_64/add_n.as,
which at the moment is GMP's old one. I assume you mean another one?

Bill Hart

Mar 14, 2009, 9:32:23 PM

to mpir-...@googlegroups.com

I was thinking of the original yasm one we had from converting
Pierrick Gaudry's code, i.e. what was there before we switched to your
code. It was probably in the amd64 directory before you revolutionised
it.

Gonzalo Tornaria

Mar 14, 2009, 9:49:38 PM

to mpir-...@googlegroups.com

My nocona with lahf is now a "core2-unknown-linux-gnu". Thanks for
upgrading it for me... I hope it still keeps my office warm in winter
:-)

Gonzalo


Bill Hart

Mar 14, 2009, 9:55:21 PM

to mpir-...@googlegroups.com

mpirbench will definitely be released as part of or very shortly after
mpir-1.0.0 (due out on Monday I believe), but as a separate package on
the website. It should "just work TM".

Roughly speaking here is what you have to do to get it going atm:

1) download the mpirbench-0.1.tar.gz from the files section of this
list (the one I uploaded - I'm not competing with the other three
versions there, they'll all be merged for the final release - and the
instructions will change again).

2) untar it

3) Edit library and include paths sent to gcc in the runbench script
to point to your mpir.h and libmpir (it's linked statically with mpir,
so no need to set LD_LIBRARY_PATH - though on Darwin this is not
possible).

4) compile qexpr.c with gcc to an executable qexpr and set your PATH
to include the location of the executable

5) type ./runbench

Bill.


Jason Moxham

Mar 14, 2009, 10:25:47 PM

to mpir-...@googlegroups.com

On Sunday 15 March 2009 01:32:23 Bill Hart wrote:
> I was thinking of the original yasm one we had from converting
> Pierrick Gaudry's code, i.e. what was there before we switched to your
> code. It was probably in the amd64 directory before you revolutionised
> it.
>
> Bill.
>

Done

Gonzalo Tornaria

Mar 15, 2009, 12:36:45 AM

to mpir-...@googlegroups.com

Thanks for the help with mpirbench. I wouldn't call that "just work",
so I redid the script to be more "automatic". It's pretty small when
binaries are not included (hint, hint), so I'm attaching it (I hope it
will make it through the lists).

To use

1) untar it
2) ./runbench <path-to-mpir>

This figures out whether to use "mpir.h" and "libmpir.a" (default) but
it will also fall back to "gmp.h" and "libgmp.a". I tested this with
mpir-0.9.0 (gmp naming), but I guess it may also work with gmp-4.2.1
itself...

This is not reentrant, so test one install at a time, though...

----

Here are my results:

Q9550 (core 2 quad, 2.83GHz, 6M+6M L2 cache):

mpir-0.9.0:
MPIRbench.base.multiply result: 49051
MPIRbench.base.divide result: 26039
MPIRbench.base result: 35739
MPIRbench.app result: 2201.4
MPIRbench result: 8869.8

trunk@1739:

MPIRbench.base.multiply result: 59932
MPIRbench.base.divide result: 26957
MPIRbench.base result: 40194
MPIRbench.app result: 2986.1
MPIRbench result: 10956

For the kvm cpu (running in that same machine) I get 8821 and 10904
respectively, which is roughly 99.5% of the score for the real cpu
(nice)

----------

Pentium D930 (nocona, dual core, 3.00GHz, 2M+2M L2 cache)

mpir-0.9.0:
MPIRbench.base.multiply result: 14913
MPIRbench.base.divide result: 9561.3
MPIRbench.base result: 11941
MPIRbench.app result: 808.53
MPIRbench result: 3107.2

trunk@1738:
MPIRbench.base.multiply result: 23238
MPIRbench.base.divide result: 10681
MPIRbench.base result: 15755
MPIRbench.app result: 1225.1
MPIRbench result: 4393.2
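
For a quick read of those numbers, the overall speedups work out to around 1.24x on the Q9550 and 1.41x on the Pentium D. A small Python sketch of the arithmetic, using only the overall MPIRbench scores listed above (the dictionary layout and function name are just for illustration):

```python
# Overall MPIRbench scores copied from the results above.
scores = {
    "Q9550": {"mpir-0.9.0": 8869.8, "trunk": 10956.0},
    "Pentium D 930": {"mpir-0.9.0": 3107.2, "trunk": 4393.2},
}


def speedup(old, new):
    """Ratio of the new score to the old one (higher scores are better)."""
    return new / old


for cpu, result in scores.items():
    print("%s: %.2fx" % (cpu, speedup(result["mpir-0.9.0"], result["trunk"])))
```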

------------

mpirbench-0.1-gtl.tar.gz

Gonzalo Tornaria

Mar 15, 2009, 12:55:39 AM

to mpir-...@googlegroups.com

As I mentioned, in current trunk@1739, my nocona is detected as a
"core2" by config.guess, since it does include lahf.

However, I tried hacking config.guess so that my cpu returns "nocona"
instead --- just because I wanted to benchmark it that way (and also
would be a test of what happens when compiling in nocona w/o lahf).
But now, the configure step fails.

The variables are:

using ABI="64"
CC="gcc -std=gnu99"
CFLAGS="-O2 -m64 -march=nocona -mtune=nocona"
CPPFLAGS=""
MPN_PATH=" x86_64/core2 x86_64 generic"

(so, it seems to be using core2 code, after all...)

The first errors that show up in configure output are:

checking for struct pst_processor.psp_iticksperclktick... no
=yes: command not found: HAVE_NATIVE_%3
=yes: command not found: HAVE_NATIVE_%2
=yes: command not found: HAVE_NATIVE_%3
=yes: command not found: HAVE_NATIVE_%2

And later:

checking size of unsigned short... 0
checking size of unsigned... 0
checking size of unsigned long... 0
checking size of mp_limb_t... 0
configure: error: Oops, mp_limb_t doesn't seem to work

Here it stops...

Gonzalo

Bill Hart

Mar 15, 2009, 2:05:04 AM

to mpir-...@googlegroups.com

Damn. It's grepping for GLOBAL_FUNC and finding yasm macros and not
expanding them I think. Hopefully we can fix that.

Bill.


Jason Moxham

Mar 15, 2009, 9:31:36 AM

to mpir-...@googlegroups.com

Done: removed the CRLFs from the old add/sub_n and removed the yasm macros from
the GLOBAL_FUNC names.

Jason Moxham

Mar 15, 2009, 10:07:52 AM

to mpir-...@googlegroups.com

On Sunday 15 March 2009 04:55:39 Gonzalo Tornaria wrote:
> As I mentioned, in current trunk@1739, my nocona is detected as a
> "core2" by config.guess, since it does include lahf.
>
> However, I tried hacking config.guess so that my cpu returns "nocona"
> instead --- just because I wanted to benchmark it that way (and also
> would be a test of what happens when compiling in nocona w/o lahf).
> But now, the configure step fails.

You shouldn't need to hack config.guess if everything is working as it should :)
Then a
./configure --build=your_guess-unknown-linux-gnu
should do it, where -unknown-linux-gnu is the last part of what config.guess
returns.
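
If it helps, the substitution described here is just swapping the first field of the triplet. A tiny Python sketch (the function name is made up, and the triplet in the example is the one reported earlier in the thread):

```python
def build_triplet(guessed, cpu):
    """Swap the CPU field of a config.guess-style triplet, keeping the rest."""
    _, sep, rest = guessed.partition("-")
    if not sep:
        raise ValueError("expected a cpu-vendor-os triplet, got %r" % guessed)
    return cpu + "-" + rest


print("./configure --build=" + build_triplet("core2-unknown-linux-gnu", "nocona"))
```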

Jason Martin

Mar 15, 2009, 1:03:51 PM

to mpir-...@googlegroups.com

Hi Guys,

Sorry for the late reply, but I've been camping for the last couple days...

I believe that I can rewrite the core2 code to avoid the lahf/sahf
instructions without any performance loss. If there is still an
interest or need, let me know and I'll have a go at it.

--jwm

Jason Moxham

Mar 15, 2009, 1:18:38 PM

to mpir-...@googlegroups.com

On Sunday 15 March 2009 17:03:51 Jason Martin wrote:
> Hi Guys,
>
> Sorry for the late reply, but I've been camping for the last couple days...
>
> I believe that I can rewrite the core2 code to avoid the lahf/sahf
> instructions without any performance loss. If there is still an
> interest or need, let me know and I'll have a go at it.
>
> --jwm
>

I've got a 2 c/l (cycles per limb) penryn add/sub but it does use lahf/sahf; it
is probably also 2 c/l on core2. I have no idea if mine can be written without
lahf/sahf. I'm still getting to grips with the Intel microarchitecture; compared
with AMD's it's complicated.

Jason Martin

Mar 15, 2009, 1:29:30 PM

to mpir-...@googlegroups.com

> On Sunday 15 March 2009 17:03:51 Jason Martin wrote:
>> Hi Guys,
>>
>> Sorry for the late reply, but I've been camping for the last couple days...
>>
>> I believe that I can rewrite the core2 code to avoid the lahf/sahf
>> instructions without any performance loss. If there is still an
>> interest or need, let me know and I'll have a go at it.
>>
>> --jwm
>>
>
> I've got a 2 c/l (cycles per limb) penryn add/sub but it does use lahf/sahf; it
> is probably also 2 c/l on core2. I have no idea if mine can be written without
> lahf/sahf. I'm still getting to grips with the Intel microarchitecture; compared
> with AMD's it's complicated.
>

Is it in svn? If so, where should I look?

Jason Moxham

Mar 15, 2009, 1:36:34 PM

to mpir-...@googlegroups.com

No, I haven't done the feed-in/wind-down code yet; I'm just trying to get a feel
for the arch.

Jason Martin

Mar 15, 2009, 1:47:38 PM

to mpir-...@googlegroups.com

Okay. Here's the idea if you want to try it:

We use the lahf/sahf instructions because the Intel architecture has
some weird dependencies on the carry bit with the inc/dec instructions
(which result in pipeline stalls whenever inc/dec is used with
adc/sbb). Torbjorn suggested using rcx as the counter register and
using the jrcxz instruction for the loop control. You can use the lea
instruction to modify rcx so that it doesn't touch the carry flag,
and the lea instruction executes on an address port, saving an ALU
port.
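
As a reference for what such a loop has to compute, here is a minimal Python model of the carry chain. It is a sketch rather than the MPIR code: the names are invented, limbs are assumed to be 64 bits and stored least significant first, and the comments only indicate which instruction each step stands in for. The counter update and loop test never read or write the carry, which is the property the lea/jrcxz approach is after.

```python
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1


def add_n(a, b):
    """Add two equal-length limb lists (least significant limb first).

    Returns (sum_limbs, carry_out). Only the limb addition writes the carry.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same number of limbs")
    result = []
    carry = 0            # plays the role of CF
    remaining = len(a)   # plays the role of rcx
    i = 0
    while remaining != 0:              # jrcxz-style test: looks at the counter only
        total = a[i] + b[i] + carry    # adc
        result.append(total & LIMB_MASK)
        carry = total >> LIMB_BITS
        i += 1
        remaining -= 1                 # lea rcx, [rcx-1]: no flags written
    return result, carry


print(add_n([LIMB_MASK, LIMB_MASK], [1, 0]))
```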

--jwm

Gonzalo Tornaria

Mar 15, 2009, 4:02:01 PM

to mpir-...@googlegroups.com

Thanks, it works ok now, both "core2" and "nocona". However, the
benchmark seems to favor "nocona" by a very slight margin, even
though nocona does not include the lahf/sahf code (it includes the core2 asm
code otherwise). This must be the gcc optimization with -mtune=nocona
instead of -mtune=core2.

I did 48 runs with each configuration, in an otherwise idle system.

The average score was:

config=nocona: avg = 4654.82 (min=4619.2, max=4676.5)

config=core2: avg = 4632.86 (min=4575.1, max=4652.8)
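
To put "very slight" into numbers: the averages differ by a bit under half a percent, and the min/max ranges of the two configurations overlap. A short Python sketch of that arithmetic, using only the figures above (the helper names are made up):

```python
# Averages and extremes copied from the two 48-run summaries above.
runs = {
    "nocona": {"avg": 4654.82, "min": 4619.2, "max": 4676.5},
    "core2": {"avg": 4632.86, "min": 4575.1, "max": 4652.8},
}


def relative_gain(new, old):
    """Fractional difference of new over old."""
    return (new - old) / old


def ranges_overlap(a, b):
    """True if the [min, max] intervals of the two runs intersect."""
    return a["min"] <= b["max"] and b["min"] <= a["max"]


gain = relative_gain(runs["nocona"]["avg"], runs["core2"]["avg"])
print("nocona over core2: %.2f%%" % (100 * gain))
print("min/max ranges overlap:", ranges_overlap(runs["nocona"], runs["core2"]))
```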

unfortunately, I mistakenly deleted the log and I only kept the score
for the whole benchmark, I don't have separate scores for
multiply/divide, etc.

Note that I had previously reported a score of 4393.2 for that same
cpu. That was using gcc-4.1, and I've since upgraded to gcc 4.3.2 as
included in debian/lenny (stable).

Gonzalo
