Weird slowdown

Bill Hart

Jul 25, 2009, 12:01:17 AM
to mpir-dev
I just did a bench on trunk and noticed that the first two or three
gcd and xgcd benches were about 20% slower.

I tracked down the exact place where the slowdown occurred: the
changes to mul.c between revisions 2162 and 2163. But the bizarre thing
is that the changed code only controls the unbalanced toom routines. I
put a trace in, and that code is never called in the benches in question.

I have turned the changes on and off many times and checked. It is not
a timing irregularity (the machine is totally unburdened atm). It is a
real effect.

This is on core2/penryn.

The only thing I can think of is that the library is being stitched
together differently by the linker and bloat is causing it to slow
down.

Bill.

Bill Hart

Jul 25, 2009, 12:30:34 AM
to mpir-dev
It's worse with gcc 4.4, and reverting mul.c in the latest revision
doesn't help; it only helps at revision 2163.

Bill.

Bill Hart

Jul 25, 2009, 3:13:08 AM
to mpir-dev
It turns out that revision 2164 is fine if I use dynamic linking
instead of static linking, but 2165 is not, and that is where I added
David Harvey's mulmid code. But that code is never used, not even by
the test code in 2165.

It looks just as if the library reaches a certain size and then slows
down dramatically. But what to do about it? We are going to keep adding
code and features for years; we can't observe some arbitrary hard
limit on size for performance reasons. That would just be stupid.

The only thing I can think of is to write very optimised gcd and xgcd
for small operands which are largely self-contained. But small
here is a couple of hundred limbs, and I don't really know any such
algorithms.
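
For flavour, the "largely self-contained" style is easy to show at the
single-limb scale, even though the interesting range here is hundreds of
limbs. A minimal sketch of a textbook binary GCD (the name small_gcd is
hypothetical, not an MPIR function):

    #include <stdint.h>

    /* Textbook binary GCD on single limbs: only shifts, compares and
       subtractions, with no calls into the rest of the library. */
    static uint64_t small_gcd(uint64_t u, uint64_t v)
    {
        int shift;
        if (u == 0) return v;
        if (v == 0) return u;
        /* strip the common power of two, remembering it for the end */
        for (shift = 0; ((u | v) & 1) == 0; shift++) {
            u >>= 1;
            v >>= 1;
        }
        while ((u & 1) == 0) u >>= 1;      /* make u odd */
        do {
            while ((v & 1) == 0) v >>= 1;  /* make v odd */
            if (u > v) { uint64_t t = u; u = v; v = t; }
            v -= u;                        /* v becomes even */
        } while (v != 0);
        return u << shift;                 /* restore the common factor */
    }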

Bill Hart

Jul 25, 2009, 6:03:52 AM
to mpir-dev
Exactly the same slowdown occurs between revs 2164 and 2165 on K8.

It's more than a 20% slowdown for both the 128x128 GCD and GCDEXT benchmarks.
I just don't get it. How can adding files which aren't even used
cause completely unrelated functions to slow down?

Yet it does.

It can't be cache related, as the two machines have completely
different cache structures.

Jason Moxham

Jul 25, 2009, 6:24:45 AM
to mpir...@googlegroups.com
I've had the same problem, but I never figured out what was going on. I
guessed it might be function alignment, but when I changed the alignment
nothing changed.

Bill Hart

Jul 25, 2009, 7:48:10 AM
to mpir...@googlegroups.com
It definitely seems to be a size problem of some kind. I went back to
revision 2165 and removed a static array of length 1000 limbs which
David Harvey had in mulmid_n.c. That was enough to make the bench
shoot back up again.

But doing this to the latest svn revision didn't work. So instead I
removed the bgcd function, which is never actually used. This made
the bench shoot back up to the right times again.
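
For anyone following along, the change itself is tiny: the buffer moves
out of the library image and into the caller's hands. A schematic sketch,
with invented names rather than the real mulmid_n.c interface:

    typedef unsigned long mp_limb_t;   /* roughly as in MPIR */

    /* Before: a large file-scope buffer baked into the library's data
       segment, present whether or not the function is ever called. */
    static mp_limb_t scratch_static[1000];

    /* After: the caller passes scratch space in, so the library image
       carries no big array and nothing in its layout shifts. */
    void mulmid_with_scratch(mp_limb_t *rp, const mp_limb_t *ap,
                             const mp_limb_t *bp, long n,
                             mp_limb_t *scratch)
    {
        /* ... same computation, using scratch[] instead of
           scratch_static[] ... */
    }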

I wonder if we should start looking for stuff to remove from MPIR. The
first thing that can go is all the unused gcd functions.

We can also rationalise all the tc4_blah and tc7_blah and
strassen_blah functions.

There are also piles of helper functions like div_1, div_2, gcd_1 and
gcd_2, which appear all over MPIR.

I also wonder whether we should get rid of some of the macros we have
in favour of functions. A lot of the bloat surely comes from stuff
being expanded out over and over again.
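
As a concrete picture of that trade-off (invented names, not real MPIR
internals): a macro body is stamped out at every use site, while a
function body exists once in the whole library.

    /* Macro: ten call sites cost ten copies of this body. */
    #define ADD_WITH_CARRY(hi, lo, a, b) \
        do {                             \
            (lo) = (a) + (b);            \
            (hi) += ((lo) < (a));        \
        } while (0)

    /* Function: one copy of the body; each call site shrinks to a
       call instruction, at the price of call overhead. */
    static void add_with_carry(unsigned long *hi, unsigned long *lo,
                               unsigned long a, unsigned long b)
    {
        *lo = a + b;
        *hi += (*lo < a);   /* unsigned wraparound signals the carry */
    }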

I did try things like compiling with -O1 or -Os, but this just slows
things down way more.

This is such a bizarre problem.

Bill.

Bill Hart

Jul 25, 2009, 8:19:37 AM
to mpir...@googlegroups.com
It gets worse. Getting rid of all the unused gcd functions actually
makes the problem come back.

Just get rid of bgcd and the problem goes away. Get rid of bgcd, sgcd,
rgcd, hgcd, hgcd2 and it comes back.

It has to be some kind of alignment issue. But how does one control
alignment in a library?

Jason Moxham

Jul 25, 2009, 9:46:26 AM
to mpir...@googlegroups.com
On Saturday 25 July 2009 13:19:37 Bill Hart wrote:
> It has to be some kind of alignment issue. But how does one control
> alignment in a library?
>
> Bill.

I know how to in assembler (just use an align directive before the function
label), and I suppose we could intercept gcc to do the same.
In a 4-way associative L1 instruction cache you can have up to 4 regions of
memory with the same starting address (in the low 10 bits or
whatever); if you have more than that, they will repeatedly get thrown
out of L1. Perhaps we are hitting this? It does seem doubtful, as I have
never heard of any other computational library hitting this problem.
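
On the C side, gcc can force the same alignment without hand-written
assembler, and the set arithmetic is easy to check. A sketch, assuming
one plausible L1 geometry (32 KB, 4-way, 64-byte lines) and an invented
function name:

    #include <stdio.h>

    /* Pads the function's entry point to a 64-byte boundary, the same
       effect as an align directive before the label in assembler.
       gcc's -falign-functions=64 does this for every function. */
    __attribute__((aligned(64)))
    static int hot_inner_loop(int x)
    {
        return 2 * x + 1;
    }

    int main(void)
    {
        /* A 32 KB, 4-way L1 with 64-byte lines has 32768/(4*64) = 128
           sets, so address bits 6..12 pick the set: a 5th hot block of
           code landing on the same low bits starts the eviction fight. */
        printf("entry at %p (low 6 bits should be 0)\n",
               (void *) hot_inner_loop);
        return 0;
    }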

Bill Hart

Jul 25, 2009, 10:12:22 AM
to mpir...@googlegroups.com
I don't know what the issue is. But I tried deleting all the unused
gcd functions again, getting rid of David Harvey's static array, and
changing his itch function to a macro, and this time it worked. I'm
sure that's precisely what I did last time, but I don't think the
autoconf/automake combination worked.
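
The itch change goes the opposite way from the macro discussion earlier,
and deliberately so: an itch function only computes a scratch-space size,
so inlining it costs a few bytes per caller while removing a whole
function from the library. A hypothetical sketch, not David Harvey's
actual code:

    typedef long mp_size_t;   /* as in MPIR */

    /* Before: an out-of-line function whose entire job is to report
       how much scratch space the operation wants. */
    mp_size_t mulmid_itch(mp_size_t n)
    {
        return 2 * n + 64;
    }

    /* After: the same size formula as a macro; the arithmetic folds
       into each caller and the function vanishes from the library. */
    #define MULMID_ITCH(n) (2 * (n) + 64)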

Anyhow, the speed is back. Just as well I had been keeping track, or we
might have lost it permanently. The library is just a few bytes past a
megabyte now. A highly suspicious size, if you ask me.

Then again, when you statically link it with a program it is smaller than that.

Who knows. Anyhow, the working hypothesis for now is that the library
simply got too big.