I tracked down the exact place where the slowdown occurred. It was the
changes to mul.c between revisions 2162 and 2163. But the bizarre thing
is that this code only controls the unbalanced Toom multiplication. I
put a trace in, and it is never called in the benches in question.
I have turned the changes on and off many times and checked. It is not
a timing irregularity (the machine is totally unburdened atm). It is a
real effect.
This is on Core 2/Penryn.
The only thing I can think of is that the library is being stitched
together differently by the linker and bloat is causing it to slow
down.
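If that's what is happening, one quick check would be to compare where
the linker actually put the gcd entry points in the two builds. A
minimal sketch, assuming the standard mpz_gcd/mpz_gcdext entry points
(what matters is whether the relative offsets differ between builds):

    /* Print the addresses the linker gave a couple of entry points;
       compile against each revision and diff the output.  The cast to
       void * for printing is the usual informal idiom.  If address
       randomisation is on, compare the offsets between the two lines
       rather than the absolute values. */
    #include <stdio.h>
    #include <gmp.h>

    int main(void)
    {
        printf("mpz_gcd    at %p\n", (void *) mpz_gcd);
        printf("mpz_gcdext at %p\n", (void *) mpz_gcdext);
        return 0;
    }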
Bill.
2009/7/25 Bill Hart <goodwi...@googlemail.com>:
This looks just like the library gets to a certain size and then slows
down dramatically. But what can we do about it? We are going to keep
adding code and features for years; we can't observe some arbitrary
hard limit on size for performance reasons. That would just be stupid.
The only thing I can think of is to write very optimised gcd and xgcd
routines for small operands which are largely self-contained. But
"small" here means a couple of hundred limbs, and I don't really know
any such algorithms.
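At the single-limb level, the kind of self-contained routine I mean is
the classic binary gcd; a minimal sketch (small_gcd is a made-up name,
and the real case would need a multi-limb variant, e.g. Lehmer-style
reduction on the top limbs):

    /* Binary gcd on single limbs -- a sketch of a self-contained
       small-operand routine, not the multi-limb version actually
       needed.  Uses the gcc builtin __builtin_ctzll for the shifts. */
    #include <stdint.h>

    uint64_t small_gcd(uint64_t a, uint64_t b)  /* hypothetical name */
    {
        int shift;
        uint64_t t;

        if (a == 0) return b;
        if (b == 0) return a;

        shift = __builtin_ctzll(a | b);   /* common factor of two */
        a >>= __builtin_ctzll(a);         /* make a odd */
        do {
            b >>= __builtin_ctzll(b);     /* make b odd */
            if (a > b) { t = a; a = b; b = t; }
            b -= a;                       /* b becomes even (or zero) */
        } while (b != 0);

        return a << shift;                /* restore the factor of two */
    }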
It's more than a 20% slowdown for both the 128x128 GCD and GCDEXT benchmarks.
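(For anyone wanting to reproduce this outside the bench suite, a
minimal standalone harness along the lines below should show it. It
assumes the standard mpz interface and 64-bit limbs, and the iteration
count is arbitrary, so it's an illustration, not our actual bench
code.)

    /* Rough standalone timing of gcd on 128-limb operands. */
    #include <stdio.h>
    #include <time.h>
    #include <gmp.h>

    #define BITS  (128 * 64)   /* 128 limbs at 64 bits per limb */
    #define ITERS 10000

    int main(void)
    {
        mpz_t a, b, g;
        gmp_randstate_t st;
        clock_t t0;
        int i;

        mpz_init(a); mpz_init(b); mpz_init(g);
        gmp_randinit_default(st);
        mpz_urandomb(a, st, BITS);
        mpz_urandomb(b, st, BITS);

        t0 = clock();
        for (i = 0; i < ITERS; i++)
            mpz_gcd(g, a, b);
        printf("gcd: %.2f us/call\n",
               1e6 * (double) (clock() - t0) / CLOCKS_PER_SEC / ITERS);

        mpz_clear(a); mpz_clear(b); mpz_clear(g);
        gmp_randclear(st);
        return 0;
    }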
I just don't get it. How can adding files which aren't even used cause
completely unrelated functions to slow down? Yet it does.
It can't be cache-related, as the two machines have completely
different cache structures.