factoring benchmarks

Joel B. Mohler

Dec 19, 2007, 11:58:58 AM
to sage-...@googlegroups.com
Ticket
http://trac.sagemath.org/sage_trac/ticket/1558
has some new code to use NTL to factor elements of ZZ[x]. I'm doing some
benchmarking to confirm or deny the comments in the factor method of
polynomial_element.pyx. Here's a very brief summary of what I'm finding. I may
or may not provide more detail later to substantiate these claims; for the
most part they are not difficult to substantiate.

Here are my findings:

1) pari is faster for polynomials of degree 30-300. For higher degrees, NTL
wins, and wins big asymptotically -- a random polynomial of degree 2000 takes
"forever" with pari and 45 seconds with NTL.

2) pari seems to be a bit better when the coefficients are larger but the
degree is still low. For example, NTL is very fast for small degree (<10), but
once we start choosing larger coefficients (140 bits), pari becomes much more
competitive. However, this point seems fraught with special cases. I think the
explanation may be that pari integer arithmetic beats NTL integer arithmetic,
but naive testing with ntl.ZZ and pari('') indicates otherwise.

3) My tests seem to indicate that NTL performs better when there are small
factors, but the difference doesn't seem decisive. This isn't very helpful for
choosing an algorithm in general, though it could be a point to keep in mind
for more specialized uses of factoring where you know more about the
polynomial at hand.

I'd be curious to hear any comments about this. I'm going to change the
criteria in ticket 1558 so that pari factors when the degree is between 30 and
300 and NTL factors otherwise.
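
For concreteness, that dispatch amounts to something like this (just a
sketch; the real code is the factor method in polynomial_element.pyx, and
_factor_pari/_factor_ntl stand for the two backends on the ticket):

    def factor_ZZX(f):
        # Proposed cutoff logic: pari for the middle range of degrees,
        # NTL for everything else.
        d = f.degree()
        if 30 <= d <= 300:
            return f._factor_pari()
        return f._factor_ntl()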

--
Joel

John Cremona

Dec 19, 2007, 1:03:32 PM
to sage-...@googlegroups.com
I'm surprised by your comment about integer arithmetic, since both
pari and NTL use gmp. At least, they both *can* use gmp, though it
might not be the default. Do you know if the pari and NTL you are
using are gmp-based?

John


--
John Cremona

Joel B. Mohler

Dec 19, 2007, 1:40:11 PM
to sage-...@googlegroups.com
On Wed, Dec 19, 2007 at 06:03:32PM +0000, John Cremona wrote:
> I'm surprised by your comment about integer arithmetic, since both
> pari and NTL use gmp. At least, they both *can* use gmp, though it
> might not be the default. Do you know if the pari and NTL you are
> using are gmp-based?

Well, I know I'm using the default Sage 2.9 build, and I know I see these timings:

sage: p=pari(1234157)
sage: r=ntl.ZZ(1234157)
sage: timeit p^300
10000 loops, best of 3: 33.8 µs per loop
sage: timeit r^300
10000 loops, best of 3: 21.7 µs per loop

I realize that exponentiation is a fairly narrow-minded view of the whole of
integer arithmetic, but the difference there seems a little striking if both
are using gmp. However, I guess it really isn't a fair comparison, since the
pari value is a generic gen object while the NTL ZZ is known to be an integer.
So it might be fairer to compare an ntl.ZZX with pari:

sage: s=ntl.ZZX([1234157])
sage: timeit s^300
10000 loops, best of 3: 94.6 µs per loop

I don't know, that seems bizarre to me!
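
A third data point might help separate wrapper overhead from gmp itself:
Sage's own Integer type sits directly on gmp, so something like this (a
sketch) gives a baseline with no pari or NTL wrapper in the way:

sage: z = ZZ(1234157)   # Sage Integer, backed directly by gmp
sage: timeit z^300      # same exponentiation, pure gmp baseline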

--
Joel


Bill Hart

Dec 19, 2007, 7:19:21 PM
to sage-devel
I get about 7us per loop on sage.math for Pari for the exponentiation.
So perhaps this is all architecture dependent. This would not surprise
me in the slightest.

At any rate, I suspect the algorithms used for factorisation are
implemented quite differently between NTL and Pari. Since both Pari
and NTL use GMP mpn's for underlying integer arithmetic in SAGE, I
think the algorithms being used for factorisation are much more
relevant than the speed of basic arithmetic, which should be the same
for both.

The other point is that both Pari and NTL use finite field stuff to
factor polynomials over the integers. So the speed of integer
arithmetic is almost irrelevant.
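
To make that concrete: roughly speaking, both libraries reduce the
polynomial modulo a small prime, factor over the finite field, Hensel-lift
the result, and recombine; the finite field stage is where the time goes.
As a rough illustration of the first step in Sage (taking p = 7, say):

sage: R.<x> = ZZ[]
sage: f = x^6 - 3*x^2 + 1
sage: f.change_ring(GF(7)).factor()   # factor mod a small prime p = 7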

Having said all that, it is surprising that NTL is behind Pari for
polynomial factorisation, given the amount of work Victor put into
this. Can you try your example problems on sage.math?

Bill.

David Harvey

Dec 19, 2007, 7:24:01 PM
to sage-...@googlegroups.com

On Dec 19, 2007, at 7:19 PM, Bill Hart wrote:

> [...]
> Having said all that, it is surprising that NTL is behind Pari for
> polynomial factorisation, given the amount of work Victor put into
> this. Can you try your example problems on sage.math?

On the other hand, Shoup's research is mostly about the asymptotics
for very large degree problems, so perhaps he didn't bother trying to
optimise the small polynomial case very much (?)

david

Bill Hart

Dec 19, 2007, 7:38:53 PM
to sage-devel
I get 6us per loop for the exponentiation in NTL on sage.math.

Bill Hart

Dec 19, 2007, 9:39:56 PM
to sage-devel
Joel,

I agree with your summary. Here is a graph I plotted timing Pari (red)
vs NTL (blue) for factoring in ZZ, where there are at least two
polynomial factors.

http://sage.math.washington.edu/home/wbhart/flint-trunk/graphing/factor.png

The scale is log to the base 2. At polynomials of about length 256, Pari ran
out of memory and the factorisation stopped. As far as I know, both the NTL
and Pari builds I used are linked against, and are using, GMP with all the
appropriate patches.

Clearly NTL is asymptotically fast for polynomials of large length,
but Pari is faster for shorter polynomials. I didn't really take the
bit size past 2000 or so.

Bill.

Bill Hart

Dec 19, 2007, 9:44:43 PM
to sage-devel
Oops, I mean factoring in ZZ[x].

I should also point out that, for each given bitsize and polynomial length,
the shortest of the measured times is taken in each case. This is of course a
stupid way to time factorisation, but it gives a starting point, I suppose.

Bill.


Bill Hart

Dec 19, 2007, 9:47:38 PM
to sage-devel
Oh, I also forgot to mention: the polynomial sizes are the sizes of the polys
a and b, where we are factoring a*b. So, roughly speaking, you should double
the bitsizes and poly lengths if you want the sizes of the polys actually
being factored. I made no attempt to ensure that a and b are irreducible, so
they may have factors themselves.
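
In Sage terms the inputs are built along these lines -- a sketch, with
made-up numbers for one data point:

sage: R.<x> = ZZ[]
sage: B = 2^100                          # coefficient bound for a and b
sage: a = R.random_element(170, -B, B)
sage: b = R.random_element(170, -B, B)
sage: f = a*b                            # the polynomial actually factored

so f has roughly double the length and bitsize of a and b, as described
above.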

Bill.


Bill Hart

Dec 19, 2007, 11:42:06 PM
to sage-devel
Here is a graph that uses slightly bigger coefficients, up to 10000
bits.

http://sage.math.washington.edu/home/wbhart/flint-trunk/graphing/factor2.png

I don't see the speed increase you mentioned for large coefficients,
but this might be due to some special properties of the polynomials
you are factoring, or perhaps due to the particular architecture you
have. Prime field arithmetic is probably quite architecture dependent,
so that may contribute; I'm not sure.

Bill.

Joel B. Mohler

Dec 20, 2007, 5:10:28 AM
to sage-...@googlegroups.com
On Wednesday 19 December 2007 23:42, Bill Hart wrote:
> Here is a graph that uses slightly bigger coefficients, up to 10000
> bits.
>
> http://sage.math.washington.edu/home/wbhart/flint-trunk/graphing/factor2.png
>
> I don't see the speed increase you mentioned for large coefficients,
> but this might be due to some special properties of the polynomials
> you are factoring, or perhaps due to the particular architecture you
> have. Prime field arithmetic is probably quite architecture dependent,
> so that may contribute; I'm not sure.

Yeah, in the end, I don't think I believe the speed increase for large
coefficients either. I think what I did observe was probably a result of
accidents with semi-random polynomials.

Here's a snippet from my patched branch:
Loading SAGE library. Current Mercurial branch is: uni-factoring
sage: R.<x>=ZZ[]
sage: f=x^2-1
sage: timeit f._factor_ntl() ######## NTL wins
10000 loops, best of 3: 159 µs per loop
sage: timeit f._factor_pari()
1000 loops, best of 3: 794 µs per loop
sage: f=R.random_element(6)
sage: timeit f._factor_ntl() ######### NTL wins
1000 loops, best of 3: 316 µs per loop
sage: timeit f._factor_pari()
1000 loops, best of 3: 533 µs per loop

I'm not seeing this NTL win on your chart. I'm wondering if this might be
because of the cost of converting data to pari. Of course, that data
conversion is a cost of the sage implementation, so when making the cutoff
choices for sage we need to include the conversion costs.
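
One way to get a handle on it would be to time the conversion by itself,
something like this (a sketch; _pari_ is, as far as I know, the coercion
hook Sage calls internally):

sage: f = R.random_element(200, -2^100, 2^100)
sage: timeit f._pari_()   # just the ZZ[x] -> pari gen conversion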

--
Joel

Bill Hart

Dec 20, 2007, 10:56:24 AM
to sage-devel
Hi Joel,

Yes, it occurred to me this morning that conversion costs might be
getting in the way.

Does SAGE use the GP interpreter or the Pari library to do this
computation? If it is using the Pari library, and the data conversion
isn't quadratic time, then I don't think this should be a problem.

The other thing is, you are factoring a single random polynomial, which may
well be irreducible. Factoring generic polynomials may be much faster, since
one only needs to establish irreducibility and then no actual factoring is
needed. I will prepare a graph comparing the time to factor generic
polynomials, though in general I don't think that is such a great benchmark --
unless one specifically wants to tune for that situation (which could be a
nice idea if there were a flag one could set to indicate that the polynomials
being factored are expected to be vaguely generic).
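
Something along these lines, say -- purely hypothetical, not an existing
interface:

    def factor_expecting_generic(f):
        # Hypothetical wrapper: generic input is almost surely
        # irreducible, so test irreducibility first and skip the full
        # factoring machinery in the common case.
        if f.is_irreducible():
            return Factorization([(f, 1)])
        return f.factor()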

Bill.


Joel B. Mohler

Dec 20, 2007, 11:22:02 AM
to sage-...@googlegroups.com
On Thursday 20 December 2007 10:56, Bill Hart wrote:
> Does SAGE use the GP interpreter or the Pari library to do this
> computation? If it is using the Pari library, and the data conversion
> isn't quadratic time, then I don't think this should be a problem.

The library, and I think the conversion is done correctly. It's a bit of a
convoluted path through the sage code to get to this specific case, though,
so there might be other silly issues.

> The other thing is, you are factoring a single random polynomial, which may
> well be irreducible. Factoring generic polynomials may be much faster, since
> one only needs to establish irreducibility and then no actual factoring is
> needed. I will prepare a graph comparing the time to factor generic
> polynomials, though in general I don't think that is such a great benchmark --
> unless one specifically wants to tune for that situation (which could be a
> nice idea if there were a flag one could set to indicate that the polynomials
> being factored are expected to be vaguely generic).

My observation is that random polynomials are almost always irreducible. This
fact seems in startling contrast to the case for integers! For instance:
sage: R.<x>=ZZ[]
sage: R.random_element(6,-2^23,2^23).factor()
# virtually always 1 factor
I've actually been doing tests of 3 different cases (not all of which made it
to the mailing list):
1) product of a bunch of small factors
2) product of 2 medium factors
3) random (irreducible, by the comments above)
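
Concretely, the three families look something like this (a reconstruction,
not my exact test code):

sage: case1 = prod(R.random_element(2) for _ in range(10))  # small factors
sage: case2 = R.random_element(15) * R.random_element(15)   # 2 medium factors
sage: case3 = R.random_element(30)   # random, almost surely irreducible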

NTL seems to gain a bit relative to pari when there are non-trivial factors,
but the effect doesn't seem too significant. Its main influence on my cutoff
points has been to let NTL factor small polys out to degree 30 rather than
degree 10. But both are so competitive for degrees 10-20 that it's not that
big of a deal.

--
Joel

Bill Hart

Dec 20, 2007, 11:32:58 AM
to sage-devel
On 20 Dec, 16:22, "Joel B. Mohler" <j...@kiwistrawberry.us> wrote:
> [...]
>
> My observation is that random polynomials are almost always irreducible. This
> fact seems in startling contrast to the case for integers!

Yes I believe this.

> [...]
>
> NTL seems to gain a bit relative to pari when there are non-trivial factors,
> but it doesn't seem too significant.

That doesn't surprise me. NTL probably tries to find these.

> Its main influence on my cutoff points has been to let NTL factor small
> polys out to degree 30 rather than degree 10. But both are so competitive
> for degrees 10-20 that it's not that big of a deal.

That's interesting. Pari has certainly yielded some surprises
recently.

The generic poly profile is currently running. I'll post the graph when it
is done. So far Pari, at least, doesn't look much faster than it was before.

Bill.

Bill Hart

Dec 20, 2007, 12:49:37 PM
to sage-devel
Here is the graph for generic polynomials:

http://sage.math.washington.edu/home/wbhart/flint-trunk/graphing/factor-gen.png

At about 1000 bits and length 6, NTL does seem to get ahead again.

Bill.

Bill Hart

Dec 20, 2007, 1:46:49 PM
to sage-devel
I've redone both graphs since the generic NTL profiling code in FLINT
was assuming that a length n polynomial had n+1 coefficients (my
student had done this due to some confusion about the way coefficients
were counted).

The graph for polynomials with two large factors (at least):

http://sage.math.washington.edu/home/wbhart/flint-trunk/graphing/factor.png

and the graph for generic polynomials:

http://sage.math.washington.edu/home/wbhart/flint-trunk/graphing/factor-gen.png

Hopefully these are correct now.

Bill.


Bill Hart

Dec 20, 2007, 3:13:05 PM
to sage-devel
For comparison, here is our favourite non-free closed source package
vs Pari and NTL for factoring with two large factors:

Magma vs Pari:

http://sage.math.washington.edu/home/wbhart/flint-trunk/graphing/magpari-factor.png

Magma vs NTL:

http://sage.math.washington.edu/home/wbhart/flint-trunk/graphing/magNTL-factor.png

Something that may not be fair here is that Magma factors out the content, so
I have divided the polynomials by their content before factoring them. But I
didn't include the time taken to divide by the content in the timings. Given
that computing the content should be almost instant generically, I don't see
this as much of a problem though.
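
Dividing out the content amounts to this, in Sage notation (a sketch; the
actual timings were run through the FLINT profiling code):

sage: R.<x> = ZZ[]
sage: f = 6*x^2 + 12*x + 18
sage: c = gcd(f.list())                     # the content: gcd of coeffs = 6
sage: g = R([ci // c for ci in f.list()])   # primitive part, which is timed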

Bill.


William Stein

Dec 20, 2007, 3:19:22 PM
to sage-...@googlegroups.com
On Dec 20, 2007 1:13 PM, Bill Hart <goodwi...@googlemail.com> wrote:
>
> For comparison, here is our favourite non-free closed source package
> vs Pari and NTL for factoring with two large factors:
> [...]

Conclusion: Magma completely, totally devastates both PARI and NTL, except
for some small cases? That's what the plots seem to say.

I now really wonder how Maple and Mathematica do in comparison.

William

Bill Hart

Dec 20, 2007, 3:28:13 PM
to sage-devel
And here we go for generic polynomials:

Magma vs Pari:

http://sage.math.washington.edu/home/wbhart/flint-trunk/graphing/magpari-factor-gen.png

Magma vs NTL:

http://sage.math.washington.edu/home/wbhart/flint-trunk/graphing/magNTL-factor-gen.png

Bill.


Bill Hart

Dec 20, 2007, 4:26:30 PM
to sage-devel
I fixed the four Magma graphs above so that they take into account the time
to compute and divide by the content, since Pari and NTL certainly do that.

The result is just about the same. Magma is significantly faster for
longer polynomials, usually a factor of about 2.5, though it varies a
bit.

Bill.


Bill Hart

Dec 20, 2007, 7:53:45 PM
to sage-devel
Taking a look at the actual data from the timings I did, the time
taken to factor varies much more in Magma than in NTL. Here are the
Magma timings for length 341 polynomials:

341 1 1.020e+06 1.723e+06
341 2 9.600e+05 3.090e+06
341 3 1.937e+06 3.177e+06
341 4 8.900e+05 2.973e+06
341 5 2.200e+06 3.203e+06
341 6 1.120e+06 2.352e+06
341 7 9.200e+05 3.170e+06
341 8 1.075e+06 2.507e+06
341 10 1.167e+06 3.463e+06
341 12 1.383e+06 2.433e+06
341 15 2.177e+06 3.175e+06
341 18 1.173e+06 2.480e+06
341 22 9.500e+05 2.583e+06
341 26 1.233e+06 3.257e+06

Here are the times for NTL:

341 1 3.444e+06 3.740e+06
341 2 3.124e+06 3.634e+06
341 3 3.122e+06 3.707e+06
341 4 3.397e+06 4.052e+06
341 5 3.563e+06 3.797e+06
341 6 3.348e+06 4.000e+06
341 7 3.130e+06 3.895e+06
341 8 3.367e+06 3.684e+06
341 10 3.611e+06 4.001e+06
341 12 3.232e+06 3.776e+06
341 15 3.536e+06 3.978e+06
341 18 3.192e+06 3.744e+06
341 22 3.409e+06 3.833e+06
341 26 3.418e+06 3.787e+06

As you can see, the NTL timings differ between the best and worst cases by
about 10-15%, whereas the Magma timings differ from each other by a factor of
up to 2 or 3.
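
(Reading that off the first column of times above: for NTL the spread is
3.611e+06 / 3.122e+06, about 1.16, i.e. roughly 16%, while for Magma it is
2.200e+06 / 8.900e+05, about 2.5.)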

I believe this is because small factors are being removed at various stages
by some heuristics developed by Allan Steel, as mentioned in the Magma
documentation. If you look at the maximum times taken in each case, NTL is
not that far behind Magma.

Bill.