about karatsuba multiplication for polynomials over generic rings


luisfe

Jan 14, 2011, 8:04:48 AM
to sage-devel
I have rewritten the karatsuba algorithm for
Polynomial_generic_dense_field. The code needs some cleaning, but it
is already usable at #10255

My primary personal motivation is that, for number fields as base
rings, Karatsuba performs worse than the generic multiplication. For
that concrete problem it is probably much better to create a custom
class (ticket #10591), but the patch at #10255 already introduces an
enhancement for several base rings that currently use the
Polynomial_generic_dense_field code, such as:

ZZ[I][x]: 200x faster
ZZ[I,sqrt(2)][x]: 30x faster
QuaternionAlgebra(1,1)[x]: 2x faster
SR[x]: 2x faster
PolynomialRing(MatrixSpace(QQ,2),'x'): 1.45x faster

Comparing the new code with the previous one:

- It makes fewer function calls, slices fewer lists, and has fewer `if`
branches (this has a low impact).
- It does not perform plenty of unnecessary additions of a Python 0 to a
ModuleElement (this has a very high impact for some base rings, like
QQ[I] and ZZ[I], due to the slow coercion of the Python 0).
- The user has ring-wise control over the Karatsuba threshold for falling
back to the schoolbook method, which can speed things up in some cases.
By default the threshold is 0, i.e. no schoolbook method at all, the same
as the previous implementation, but it is customizable using
PolynomialRing.set_karatsuba_threshold. The best value is quite ring and
input dependent, and a bad choice may introduce a performance penalty, so
use this at your own risk!
- The case of inputs of different sizes is treated differently. For
polynomials f and g of degrees m-1 and n-1, the previous implementation
used min(n,m)/2 as the split point. I feel that this is inefficient in
many cases:

{{{
sage: K=ZZ[x]
sage: f=K.random_element(4000)
sage: g=K.random_element(2000)
sage: h=K.random_element(1500)
sage: %time _= f._mul_generic(g)
CPU times: user 0.90 s, sys: 0.00 s, total: 0.90 s
Wall time: 0.90 s
sage: %time _= f._mul_karatsuba(g)
CPU times: user 2.65 s, sys: 0.00 s, total: 2.66 s
Wall time: 2.66 s
sage: %time _= f._mul_karatsuba(h)
CPU times: user 3.41 s, sys: 0.00 s, total: 3.41 s
Wall time: 3.42 s
}}}

The new code, for f of degree m-1 and g of degree n-1 with m < n,
writes g as

g0 + x^m*g1 + x^(2*m)*g2 + ... + x^(q*m)*gq

and computes the products f*gi recursively with Karatsuba, then
reconstructs f*g.
I took the idea from some documentation of an old implementation of
GMP. This is still not optimal in all cases; for instance, the pair of
degrees (6700, 6000) is slightly slower than the pair of degrees
(6700, 6700).
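The block decomposition of g can be sketched like this; again a toy
model on coefficient lists rather than the ticket code, with a
schoolbook multiply standing in for the balanced Karatsuba routine:

```python
def schoolbook(f, g):
    # quadratic reference multiply; stands in for balanced Karatsuba
    prod = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] += a * b
    return prod

def unbalanced_mul(f, g, balanced_mul=schoolbook):
    """Multiply a short f (length m) by a longer g by cutting g into
    length-m blocks, so that g = g0 + x^m*g1 + ... + x^(q*m)*gq and
    f*g is the sum of the shifted near-balanced products f*gi."""
    if len(f) > len(g):
        f, g = g, f
    m = len(f)
    prod = [0] * (len(f) + len(g) - 1)
    for start in range(0, len(g), m):
        piece = balanced_mul(f, g[start:start + m])  # f * gi
        for j, c in enumerate(piece):
            prod[start + j] += c                     # shift by x^(i*m)
    return prod
```

Each block gi has size at most m, so every recursive product is
near-balanced and Karatsuba-friendly.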

Does anyone know of an alternative to Karatsuba, easy to implement,
for polynomials of different degrees?

With the new code

{{{
sage: K=ZZ[x]
sage: f=K.random_element(4000)
sage: g=K.random_element(2000)
sage: h=K.random_element(1500)
sage: %time _= f._mul_generic(g)
CPU times: user 0.90 s, sys: 0.00 s, total: 0.90 s
Wall time: 0.91 s
sage: %time _= f._mul_karatsuba(g)
CPU times: user 0.24 s, sys: 0.00 s, total: 0.25 s
Wall time: 0.25 s
sage: %time _= f._mul_karatsuba(h)
CPU times: user 0.26 s, sys: 0.00 s, total: 0.26 s
Wall time: 0.26 s
}}}

Do you think that it is a good idea to include this code in the Sage
library?

Any comment and/or suggestion for improvement is welcomed.

Bests

Luis

rjf

Jan 14, 2011, 1:07:09 PM
to sage-devel
For a discussion of practical fast polynomial multiplication,
see
http://www.eecs.berkeley.edu/~fateman/papers/dumbisfast.pdf
and also the first reference in that paper.
(As well as other references).
The code in GMP is likely to be well thought out.

You are probably making a major mistake setting the cutoff at
zero, if you care about polynomials of only modest degree
(like 50), and are doing many such operations.

If you are doing operations (or timing operations) solely
on polynomials with 10,000 coefficients, you should consider
if your base field supports some kind of discrete fast-Fourier
transform.

If the only reason you are doing this is speed, the difference
between 3.4 seconds and 2.6 seconds is hardly worth a week's
programming effort. If it is to establish a really good practical
algorithm, you have to do some reading.
Why use karatsuba? If m is small, use whatever is fastest.
If the sizes differ so substantially, use another algorithm entirely.

If you use an FFT-based algorithm, you can compute a transform of f
just once.
If f is small enough, the "schoolboy" algorithm is faster.
Also, there are many alternatives between Karatsuba and FFT (Toom or
Cook-Toom algorithms).


> I took the idea from some documentation of an old implementation of
> gmp.

Why not look at the latest GMP?

luisfe

Jan 16, 2011, 2:08:47 PM
to sage-devel


On Jan 14, 7:07 pm, rjf <fate...@gmail.com> wrote:
> For a discussion of practical fast polynomial multiplication,
> see http://www.eecs.berkeley.edu/~fateman/papers/dumbisfast.pdf
> and also the first reference in that paper.
> (As well as other references).
> The code in GMP is likely to be well thought out.

Thanks. Detecting sparse polynomials and multiplying them accordingly
is certainly a pending subject.

> You are probably making a major mistake setting the cutoff at
> zero, if you care about polynomials of only modest degree
> (like 50), and are doing many such operations.

You are probably right: the number of base rings that will not benefit
from a nonzero threshold is quite small. The best approach would be to
identify most of such rings and act accordingly.

> If you are doing operations (or timing operations) solely
> on polynomials with 10,000 coefficients, you should consider
> if your base field supports some kind of discrete fast-Fourier
> transform.

I have an implementation of a radix-2 FFT as a proof of concept for
characteristic zero fields, but it only starts to win at around
degree 8000, so it is not among my priorities. I know about Cook-Toom
but I have never tried to implement it.
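For reference, the shape of such a proof of concept looks roughly like
the sketch below. This is my own minimal radix-2 FFT multiply over the
complex floats, not the implementation mentioned above, and it is only
exact while the coefficients stay small enough for double precision:

```python
import cmath

def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def ifft(a):
    """Inverse FFT via conjugation."""
    y = fft([x.conjugate() for x in a])
    return [x.conjugate() / len(a) for x in y]

def fft_mul(f, g):
    """Multiply integer coefficient lists via pointwise products of
    their transforms, rounding the float result back to integers."""
    size = 1
    while size < len(f) + len(g) - 1:
        size *= 2
    fa = fft([complex(c) for c in f] + [0j] * (size - len(f)))
    ga = fft([complex(c) for c in g] + [0j] * (size - len(g)))
    prod = ifft([x * y for x, y in zip(fa, ga)])
    return [round(c.real) for c in prod[:len(f) + len(g) - 1]]
```

The padding to a power of two and the O(n log n) transforms are where
the large constant factor comes from, which matches only winning at
high degrees.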

> If the only reason you are doing this is speed, the difference
> between 3.4 seconds and 2.6 seconds is hardly worth a week's
> programming effort. If it is to establish a really good practical
> algorithm, you have to do some reading.

You misread the numbers: the gain is from 3.4 seconds to 0.26, more
than ten times faster with my code.
But what I wanted to point out is that, for a pair of dense
polynomials of degrees 4000 and 1500, the current Karatsuba code is
more than three times as slow as the schoolbook method. This is not
acceptable.

> > The new code, for f is of degree m-1 and g of degree n-1 with m<n,
> > consider g as:
>
> > g0 + x^m*g1 + x^(2*m)*g2 + ... + x^(q*m)*gq
>
> > Compute the products f*gi recursively with karatsuba and reconstruct
> > f*g.
>
> Why use karatsuba? If m is small, use whatever is fastest.
> If the sizes differ so substantially,use another algorithm entirely.

In this case, if the size m is below the threshold, it uses the
schoolbook method. If two polynomials of size m are in the range of
the Karatsuba method, the number of operations of this approach is of
order O(m^0.59 * n), which grows linearly in n. That looks acceptable
to me.
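That estimate can be sanity-checked with a toy cost recurrence (my own
sketch, not code from the ticket): balanced Karatsuba on size-m inputs
does about 3^log2(m) = m^log2(3) ~ m^1.59 base multiplications, and the
block scheme makes roughly n/m such calls, giving about n * m^0.59:

```python
import math

def kara_cost(m):
    """Base multiplications used by balanced Karatsuba on size-m inputs:
    each level replaces one product with three of half the size."""
    if m <= 1:
        return 1
    return 3 * kara_cost((m + 1) // 2)

def block_cost(m, n):
    """Cost of the block scheme: ceil(n/m) balanced calls of size m."""
    return math.ceil(n / m) * kara_cost(m)
```

For fixed m, doubling n doubles `block_cost(m, n)`, i.e. the growth in
n is linear, as claimed.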

> If you use an FFT-based algorithm, you can compute a transform of f
> just once.

This is a good idea.

> If f is small enough, the "schoolboy" algorithm is faster.
> Also, there are many alternatives between karatsuba and fft (Toom or
> Cook-Toom)
> algorithms.
>
> > I took the idea from some documentation of an old implementation of
> > gmp.
>
> Why not look at the latest GMP?

Yes, I have to look at it.

Bests

Luis