Matrix multiplication

136 views
Skip to first unread message

Simon King

unread,
Jul 12, 2011, 12:09:42 PM7/12/11
to sage-devel
Hi!

Working at #11589, I revisited the almost-forgotten ticket #8096 (no
activity since 16 months), which aims at speeding up matrix
multiplication.

#10763 (already merged) and #11589 (positive review) provide some
improvement. However, using the defaults in Sage, the multiplication
of two random dense 2000x2000 matrices over GF(3) still takes 8.34
seconds (it used to be 12 seconds).
That does not compare nicely with my tuned (i.e. unpublished) version
of C-MeatAxe, which completes the same task in 1.72 seconds.

So, obviously it makes sense to try and improve Sage's matrix
multiplication. That may actually be rather easy, because linbox can
do the above multiplication in 1.65 seconds (using _multiply_linbox).

Is there a compelling reason why linbox multiplication is not the
default for dense matrices over finite fields?

Best regards,
Simon

Martin Albrecht

unread,
Jul 12, 2011, 12:26:22 PM7/12/11
to sage-...@googlegroups.com
> Is there a compelling reason why linbox multiplication is not the
> default for dense matrices over finite fields?

IIRC, this dates back to the time when we didn't ship ATLAS, i.e. LinBox could
have been built against some reference implementation BLAS (== slow).

It wasn't touched afterwards because people tried to move Matrix_modn_dense
over to LinBox as the native format (again, IIRC) and these projects stalled.

As far as I can tell the only downside of switching to LinBox is the increased
memory consumption.

Cheers,
Martin

--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF
_www: http://martinralbrecht.wordpress.com/
_jab: martinr...@jabber.ccc.de

Volker Braun

unread,
Jul 12, 2011, 1:10:40 PM7/12/11
to sage-...@googlegroups.com
Does linbox actually use blas for fast matrix multiplication over finite fields? I know its hard to tell from the linbox source code but I don't think floating point stuff could be faster than the direct application of Strassen's algorithm.

Martin Albrecht

unread,
Jul 12, 2011, 1:48:08 PM7/12/11
to sage-...@googlegroups.com

Yes. It uses FFLAS/FFPACK for dense matrices over GF(p). Which in top of the
numerical routines it implements Strassen:

http://ljk.imag.fr/membres/Jean-Guillaume.Dumas/FFLAS/

As far as I know it's the fastest implementation for medium size primes (16
bits or so). For very small primes one can do better

1) by bit-slicing:

http://arxiv.org/abs/0901.1413

or 2) bit-packing:

http://wiki.sagemath.org/days10/JGDumasTalk
http://wiki.sagemath.org/days10?action=AttachFile&do=view&target=Dumas_SD10_talk.pdf

Btw. the main author of FFLAS/FFPACK Clément Pernet was the first Sage Postdoc
at UW.

It's a shame we are not using the libraries we ship to full potential but so
far no one stepped up (me included) to fix this situation.

Simon King

unread,
Jul 12, 2011, 2:57:06 PM7/12/11
to sage-devel
Hi Martin,

On 12 Jul., 19:48, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> As far as I know it's the fastest implementation for medium size primes (16
> bits or so). For very small primes one can do better

But actually linbox is already much faster over GF(3) than what is
currently done in Sage.

> It's a shame we are not using the libraries we ship to full potential but so
> far no one stepped up (me included) to fix this situation.

OK, but how much work would be needed? It appears to me that there is
only very little overhead in calling the linbox routines (via
sage.libs.linbox.linbox), since the data of a Matrix_modn_dense is
already stored in the same format used by linbox (namely mod_int **).
In particular, I wounder about your statement that the memory
consumption would increase. Is that because of internal linbox
computations?

Here is how linbox is wrapped (see
sage.matrix.matrix_modn_dense.Matrix_modn_dense._multiply_linbox):
...
self._init_linbox()
sig_on()
linbox.matrix_matrix_multiply(ans._matrix, B._matrix,
B._nrows, B._ncols)
sig_off()

self._init_linbox() seems rather light-weight, and
linbox.matrix_multiply is also a very thin wrapper of the linbox
routines. Similarly, one proceeds with other routines (computing
echelon form and so on)

Actually I think it would even be easy enough to call the linbox
routines directly (without using the sage.libs.linbox.linbox wrapper,
thus avoiding all overhead). Since I need fast matrix arithmetics over
finite fields, I have some motivation to rewrite _mul_ accordingly.

But one question: Can linbox deal with non-prime finite fields?

Cheers,
Simon

Martin Albrecht

unread,
Jul 12, 2011, 3:09:18 PM7/12/11
to sage-...@googlegroups.com, Clement Pernet, Burcin Erocal
Hi Simon,

> wrote:
> > As far as I know it's the fastest implementation for medium size primes
> > (16 bits or so). For very small primes one can do better
>
> But actually linbox is already much faster over GF(3) than what is
> currently done in Sage.

Yep, it was just meant as a statement about the viability of the chosen
strategy (using ATLAS)

> > It's a shame we are not using the libraries we ship to full potential but
> > so far no one stepped up (me included) to fix this situation.
>
> OK, but how much work would be needed?

It depends on how far you want to go:

1) Only switch multiplication over to LinBox, that's probably easy to do

2) Switch everything to LinBox of FFLAS/FFPACK directly. This probably needs
some hacking around.

> It appears to me that there is
> only very little overhead in calling the linbox routines (via
> sage.libs.linbox.linbox), since the data of a Matrix_modn_dense is
> already stored in the same format used by linbox (namely mod_int **).
> In particular, I wounder about your statement that the memory
> consumption would increase. Is that because of internal linbox
> computations?

> Here is how linbox is wrapped (see
> sage.matrix.matrix_modn_dense.Matrix_modn_dense._multiply_linbox):
> ...
> self._init_linbox()
> sig_on()
> linbox.matrix_matrix_multiply(ans._matrix, B._matrix,
> B._nrows, B._ncols)
> sig_off()
>
> self._init_linbox() seems rather light-weight, and
> linbox.matrix_multiply is also a very thin wrapper of the linbox
> routines. Similarly, one proceeds with other routines (computing
> echelon form and so on)

Actually, this is how we are wrapping our wrapper for LinBox which is part of
the LinBox SPKG. This is where the actual heavy lifting is done such as
converting ints to doubles etc. So what you see in Sage gives you a misleading
impression.

Btw. the reason why this wrapper exists is that LinBox takes so long to
compile and takes so much memory. A harmless doctest change thus triggered a
long compilation. Hence we outsourced it.



> Actually I think it would even be easy enough to call the linbox
> routines directly (without using the sage.libs.linbox.linbox wrapper,
> thus avoiding all overhead). Since I need fast matrix arithmetics over
> finite fields, I have some motivation to rewrite _mul_ accordingly.

If one is serious about this then one should probably use the FFLAS/FFPACK
low-level routines directly (Burcin might have some old patches for this, he's
the last person I know of who tried) since that avoids compiling all thee
layers of C++ in LinBox.

> But one question: Can linbox deal with non-prime finite fields?

They had some experimental code at some point IIRC but nothing for general
use.

Simon King

unread,
Jul 12, 2011, 5:12:18 PM7/12/11
to sage-devel
Hi Martin,

On 12 Jul., 21:09, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> If one is serious about this then one should probably use the FFLAS/FFPACK
> low-level routines directly (Burcin might have some old patches for this, he's
> the last person I know of who tried) since that avoids compiling all thee
> layers of C++ in LinBox.

That would probably be beyond my scope.

> > But one question: Can linbox deal with non-prime finite fields?
>
> They had some experimental code at some point IIRC but nothing for general
> use.

What a pity - I need non-prime fields as well.

But perhaps it is feasible to create a sub-class Matrix_modp_dense of
Matrix_modn_dense for matrices over finite prime fields -- overriding
_mul_ etc. to use the functions from sage.libs.linbox.linbox. In that
way, one could benefit from the LinBox speed at least to some extent.
I suppose that fits into the scope of #8096.

Best regards,
Simon

Burcin Erocal

unread,
Jul 12, 2011, 5:35:51 PM7/12/11
to sage-...@googlegroups.com
On Tue, 12 Jul 2011 14:12:18 -0700 (PDT)
Simon King <simon...@uni-jena.de> wrote:

> On 12 Jul., 21:09, Martin Albrecht <martinralbre...@googlemail.com>
> wrote:
> > If one is serious about this then one should probably use the
> > FFLAS/FFPACK low-level routines directly (Burcin might have some
> > old patches for this, he's the last person I know of who tried)
> > since that avoids compiling all thee layers of C++ in LinBox.
>
> That would probably be beyond my scope.

At SD16, Clement and I looked into wrapping linbox directly. The speeds
were promising, but we never took the time to fix pickling, etc.
The patch is most likely unusable by now:

http://sage.math.washington.edu/home/burcin/dense_ctypes.patch

I can quickly redo the work to create a template that handles the short
and long ints as special cases if there is interest. The linbox calls
in that patch can also be much better now that Cython has C++ support.


Cheers,
Burcin

Simon King

unread,
Jul 13, 2011, 5:48:08 AM7/13/11
to sage-devel
Hi!

On 12 Jul., 23:35, Burcin Erocal <bur...@erocal.org> wrote:
> On Tue, 12 Jul 2011 14:12:18 -0700 (PDT)
> At SD16, Clement and I looked into wrapping linbox directly. The speeds
> were promising, but we never took the time to fix pickling, etc.


I, for one, would appreciate fast matrix multiplication over finite
fields!

However, there is one complication: In my current project (computing
Ext-Algebras of basic algebras), I definitely need finite non-prime
fields, and the current implementation in Sage for that case is
definitely too slow. I just did an example of multiplying two 500x500
matrices over GF(125). Sage took 11.24 seconds, MeatAxe (with some
improvements that I implemented yesterday) took 0.2 seconds.

If LinBox really can not "officially" deal with non-prime fields, I
should probably stick with MeatAxe then (even though, in the small
version, it only supports fields of size <256).

My current MeatAxe wrapper (in my optional cohomology spkg) is based
on an old MeatAxe version (2.2.4). Perhaps I could take the most
recent version (2.4.13), modify my wrapper accordingly, and contribute
it to Sage. The MeatAxe licence is good enough for Sage, isn't it? But
is MeatAxe eligible to become a standard package? Currently, there
only is an experimental meataxe-2.4.3.spkg.

Correct me if the following is nonsense:
- The MeatAxe sources should go into an spkg. Question: How should I
deal with my modifications to the original MeatAxe sources? Do I
remember correctly that the spkg should contain the unmodified
sources, together with some automatically applied patches that are
stored in a sub-directory patches/?
- The spkg should create a library mtx.a, that should go into, say,
SAGE_ROOT/local/lib/mtx-2.4.13/, so that my wrapper can easily link
against it.

Best regards,
Simon

Martin Albrecht

unread,
Jul 13, 2011, 6:11:06 AM7/13/11
to sage-...@googlegroups.com
Hi,

> I, for one, would appreciate fast matrix multiplication over finite
> fields!
>
> However, there is one complication: In my current project (computing
> Ext-Algebras of basic algebras), I definitely need finite non-prime
> fields, and the current implementation in Sage for that case is
> definitely too slow. I just did an example of multiplying two 500x500
> matrices over GF(125). Sage took 11.24 seconds, MeatAxe (with some
> improvements that I implemented yesterday) took 0.2 seconds.

Do you know Tom Boothby's and Robert Bradshaw's excellent paper

Bitslicing and the Method of Four Russians Over Larger Finite Fields

http://arxiv.org/abs/0901.1413

I just took a naive version of that for a spin to see how well it compares:

# We represent matrices over polynomials as polynomials over matrices

sage: P.<A2,A1,A0,B2,B1,B0> = PolynomialRing(GF(5))
sage: R.<x> = P[]
sage: Q.<x> = R.quo(x^3 + 3*x + 3)

sage: f = (A2*x^2+A1*x+A0)
sage: g = (A2*x^2+A1*x+A0)

# so this is what we need to compute:

sage: f*g
(2*A2^2 + A1^2 + 2*A2*A0)*x^2 + (2*A2^2 - A2*A1 + 2*A1*A0)*x - A2*A1 + A0^2

# here, one would actually apply a Karatsuba to get the number of mults down,
# but I was too lazy and only did this (fingers crossed I didn't introduce a
# bug)

sage: def my_mul(A2,A1,A0, B2,B1,B0):
... A22 = A2._multiply_linbox(A2)
... A2A1 = A2._multiply_linbox(A1)
... A22_2 = 2*A22
... C2 = (A22_2 + A1._multiply_linbox(A1) + 2*A2._multiply_linbox(A0))
... C1 = (A22_2 - A2A1 + 2*A1._multiply_linbox(A0))
... C0 = - A2A1 + A0._multiply_linbox(A2)
... return C2,C1,C0

sage: A = [random_matrix(GF(5),500,500) for _ in range(3)]
sage: B = [random_matrix(GF(5),500,500) for _ in range(3)]
sage: %time _ =my_mul(*(A+B))
CPU time: 0.22 s, Wall time: 0.22 s

sage: A = [random_matrix(GF(5),2000,2000) for _ in range(3)]
sage: B = [random_matrix(GF(5),2000,2000) for _ in range(3)]
sage: %time _ =my_mul(*(A+B))
CPU time: 10.23 s, Wall time: 10.57 s

So even with all the Python overhead (and modulo bugs which I may have
introduced) this approach seems to be comparable to whatever Meataxe is doing.
I'd actually guess it gets faster than MeatAxe as dimensions grow since AFAIK
MeatAxe doesn't do matrix multiplication asymptotically fast?

> If LinBox really can not "officially" deal with non-prime fields, I
> should probably stick with MeatAxe then (even though, in the small
> version, it only supports fields of size <256).
>
> My current MeatAxe wrapper (in my optional cohomology spkg) is based
> on an old MeatAxe version (2.2.4). Perhaps I could take the most
> recent version (2.4.13), modify my wrapper accordingly, and contribute
> it to Sage. The MeatAxe licence is good enough for Sage, isn't it? But
> is MeatAxe eligible to become a standard package? Currently, there
> only is an experimental meataxe-2.4.3.spkg.
>
> Correct me if the following is nonsense:
> - The MeatAxe sources should go into an spkg. Question: How should I
> deal with my modifications to the original MeatAxe sources? Do I
> remember correctly that the spkg should contain the unmodified
> sources, together with some automatically applied patches that are
> stored in a sub-directory patches/?
> - The spkg should create a library mtx.a, that should go into, say,
> SAGE_ROOT/local/lib/mtx-2.4.13/, so that my wrapper can easily link
> against it.
>
> Best regards,
> Simon

Cheers,

Simon King

unread,
Jul 13, 2011, 7:23:47 AM7/13/11
to sage-devel
Hi Martin,

On 13 Jul., 12:11, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> Do you know Tom Boothby's and Robert Bradshaw's excellent paper
>
>    Bitslicing and the Method of Four Russians Over Larger Finite Fields
>
>    http://arxiv.org/abs/0901.1413

Thank you for your hint!

> sage: P.<A2,A1,A0,B2,B1,B0> = PolynomialRing(GF(5))
> sage: R.<x> = P[]
> sage: Q.<x> = R.quo(x^3 + 3*x + 3)
>
> sage: f = (A2*x^2+A1*x+A0)
> sage: g = (A2*x^2+A1*x+A0)
>
> # so this is what we need to compute:
>
> sage: f*g
> (2*A2^2 + A1^2 + 2*A2*A0)*x^2 + (2*A2^2 - A2*A1 + 2*A1*A0)*x - A2*A1 + A0^2

That is the same as f*f. Did you intend to do
sage: g = (B2*x^2+B1*x+B0)
sage: f*g
(2*A2*B2 + A0*B2 + A1*B1 + A2*B0)*x^2 + (2*A2*B2 + 2*A1*B2 + 2*A2*B1 +
A0*B1 + A1*B0)*x + 2*A1*B2 + 2*A2*B1 + A0*B0
?

> sage: def my_mul(A2,A1,A0, B2,B1,B0):
> ...       A22 = A2._multiply_linbox(A2)
> ...       A2A1 = A2._multiply_linbox(A1)
> ...       A22_2 = 2*A22
> ...       C2 = (A22_2 + A1._multiply_linbox(A1) + 2*A2._multiply_linbox(A0))
> ...       C1 = (A22_2 - A2A1 + 2*A1._multiply_linbox(A0))
> ...       C0 = - A2A1 + A0._multiply_linbox(A2)
> ...       return C2,C1,C0

B2,B1,B0 are not used, it just computes the square of the first
argument.

f*g from above yields
sage: def my_mul(A2,A1,A0, B2,B1,B0):
....: A0B0 = A0._multiply_linbox(B0)
....: A0B1 = A0._multiply_linbox(B1)
....: A0B2 = A0._multiply_linbox(B2)
....: A1B0 = A1._multiply_linbox(B0)
....: A1B1 = A1._multiply_linbox(B1)
....: A1B2 = A1._multiply_linbox(B2)*2
....: A2B0 = A2._multiply_linbox(B0)
....: A2B1 = A2._multiply_linbox(B1)*2
....: A2B2 = A2._multiply_linbox(B2)*2
....: C2 = A2B2+A0B2+A1B1+A2B0
....: C1 = A2B2+A1B2+A2B1+A0B1+A1B0
....: C0 = A1B2+A2B1+A0B0
....: return C2,C1,C0
....:
sage: A = [random_matrix(GF(5),500,500) for _ in range(3)]
sage: B = [random_matrix(GF(5),500,500) for _ in range(3)]
sage: %time _ =my_mul(*(A+B))
CPU times: user 0.42 s, sys: 0.05 s, total: 0.47 s
Wall time: 0.48 s
sage: A = [random_matrix(GF(5),2000,2000) for _ in range(3)]
sage: B = [random_matrix(GF(5),2000,2000) for _ in range(3)]
sage: %time _ =my_mul(*(A+B))
CPU times: user 15.08 s, sys: 0.39 s, total: 15.46 s
Wall time: 15.51 s

It is a bit slower than MeatAxe (for 2000x2000 over GF(125), it needs
11.68 seconds on my computer), but I think that's not bad for the
first attempt.

Best regards,
Simon

Martin Albrecht

unread,
Jul 13, 2011, 7:50:31 AM7/13/11
to sage-...@googlegroups.com
> > sage: P.<A2,A1,A0,B2,B1,B0> = PolynomialRing(GF(5))
> > sage: R.<x> = P[]
> > sage: Q.<x> = R.quo(x^3 + 3*x + 3)
> >
> > sage: f = (A2*x^2+A1*x+A0)
> > sage: g = (A2*x^2+A1*x+A0)
> >
> > # so this is what we need to compute:
> >
> > sage: f*g
> > (2*A2^2 + A1^2 + 2*A2*A0)*x^2 + (2*A2^2 - A2*A1 + 2*A1*A0)*x - A2*A1 +
> > A0^2
>
> That is the same as f*f. Did you intend to do
> sage: g = (B2*x^2+B1*x+B0)
> sage: f*g
> (2*A2*B2 + A0*B2 + A1*B1 + A2*B0)*x^2 + (2*A2*B2 + 2*A1*B2 + 2*A2*B1 +
> A0*B1 + A1*B0)*x + 2*A1*B2 + 2*A2*B1 + A0*B0
> ?

Ouch, yes.

Note that these timings are for the school book multiplication (i.e. 9 = 3^2
multiplications) instead of Karatsuba-style, Python overhead and Sage <=>
LinBox overhead.

It seems one can bring this down to 7 multiplications:

http://bit.ly/r5PVgv

Btw. Magma does it in 4.37 seconds on my computer, to get a sense of what is
the state-of-the-art.

Simon King

unread,
Jul 13, 2011, 8:40:27 AM7/13/11
to sage-devel
Hi Martin,

On 13 Jul., 13:50, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> Note that these timings are for the school book multiplication (i.e. 9 = 3^2
> multiplications) instead of Karatsuba-style, Python overhead and Sage <=>
> LinBox overhead.

Sure. As you said in your previous post, that implementation is the
most naive, given the basic idea.

> Btw. Magma does it in 4.37 seconds on my computer, to get a sense of what is
> the state-of-the-art.

You mean for 2000x2000 over GF(125)? So, it is a long way to go, even
for MeatAxe. BTW, Sage's current implementation needs 906.69 s for a
single multiplication.


If I understand correctly, the basic idea can be easily automatised:
- The given base field F (=GF(125), for instance) determines a
(Conway) polynomial p of degree d.
- One starts with a polynomial ring R over the corresponding prime
field F0 and 2*d variables, constructs the polynomial ring in x over
R, and mods out p. The result is a quotient ring Q.
- Multiplication of two "general" elements of Q, represented by degree-
(d-1) polynomials whose coefficients are the 2*d variables from R,
results in a d-tuple T of elements of R.
- Matrices over F are represented by d-tuples of matrices over F0, and
multiplication is obtained by inserting the elements of these d-tuples
into the elements of T.

But is it possible to *automatically* find a Karatsuba- or Toom-Cook-
type formula that reduces the number of products that appear in T?

An example from the paper: Over GF(4), the multiplication of A=A0+xA1
with B=B0+xB1 (using p=x^2+x+1) would naively be
(A0*B0+A1*B1) + x*(A1*B1+A1*B0+A0*B1)
But using Karatsuba, it becomes
(A0*B0+A1*B1) + x*((A0+A1)*(B0+B1)+A0*B0),
which saves one product.

Is there an automated procedure that finds the second formula, given
the first? If not, how could one do better than with schoolboy
multiplication, in the general case?

Best regards,
Simon

Martin Albrecht

unread,
Jul 13, 2011, 9:03:09 AM7/13/11
to sage-...@googlegroups.com
Hi,

> > Btw. Magma does it in 4.37 seconds on my computer, to get a sense of what
> > is the state-of-the-art.
>
> You mean for 2000x2000 over GF(125)? So, it is a long way to go, even
> for MeatAxe.

Yes, and it should get worse as you increase the degree or did they implement
Strassen in MeatAxe by now?

> BTW, Sage's current implementation needs 906.69 s for a
> single multiplication.

We suck at this it seems :)

> If I understand correctly, the basic idea can be easily automatised:
> - The given base field F (=GF(125), for instance) determines a
> (Conway) polynomial p of degree d.
> - One starts with a polynomial ring R over the corresponding prime
> field F0 and 2*d variables, constructs the polynomial ring in x over
> R, and mods out p. The result is a quotient ring Q.
> - Multiplication of two "general" elements of Q, represented by degree-
> (d-1) polynomials whose coefficients are the 2*d variables from R,
> results in a d-tuple T of elements of R.
> - Matrices over F are represented by d-tuples of matrices over F0, and
> multiplication is obtained by inserting the elements of these d-tuples
> into the elements of T.

Yep & it would be much faster already than what we have currently.

> But is it possible to *automatically* find a Karatsuba- or Toom-Cook-
> type formula that reduces the number of products that appear in T?
>
> An example from the paper: Over GF(4), the multiplication of A=A0+xA1
> with B=B0+xB1 (using p=x^2+x+1) would naively be
> (A0*B0+A1*B1) + x*(A1*B1+A1*B0+A0*B1)
> But using Karatsuba, it becomes
> (A0*B0+A1*B1) + x*((A0+A1)*(B0+B1)+A0*B0),
> which saves one product.
>
> Is there an automated procedure that finds the second formula, given
> the first? If not, how could one do better than with schoolboy
> multiplication, in the general case?

Well, the paper does search for Karatsuba-style formulas automatically, but
it's a non-trivial (brute-force-ish) computation. So far, there does not seem
to be anything better than this. But other people would know better than me.

Simon King

unread,
Jul 13, 2011, 9:11:03 AM7/13/11
to sage-devel
Hi Martin,

On 13 Jul., 15:03, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> > BTW, Sage's current implementation needs 906.69 s for a
> > single multiplication.
>
> We suck at this it seems :)

Totally.

According to the Boothby-Bradshaw article you were referring to, their
ideas were supposed to be implemented in Sage. Is that done, yet? I
wouldn't like to double the work...

Best regards,
Simon

Clement Pernet

unread,
Jul 13, 2011, 9:25:25 AM7/13/11
to sage-devel
Hi,

Most questions about finite fields matmul have been answered, in
short:
- one could quickly fix the pb that sage does not use linbox by
default (I'm surprised by this fact)
- in the current state, operands are converted from ints to double,
hence a overhead in both time
and memory.
- Burcin & I started to address this by rewriting the matrix-modn-
dense class using floating point
representation: a prototype code was working but we wen't into
troubles when polishing it and making
it work with the coercion system. Unfortunately I never found time to
get back to it since then.
- Further timing improvements are expected by using floats instead of
doubles for tiny fields (eg
roughly GF(p<100)) with no pain (already available in LinBox)
- Non prime fields are supported in LinBox, as long as their
cardinality remains small (LinBox
implementation has to be improved following http://hal.archives-ouvertes.fr/hal-00315772
Theorem 2)
but the wrapping in sage is missing. So I think it might probably
be simpler to wrapp it rather
than creating a new spkg with MeatAxe.

Clément

PS: weird, my reply by email has been rejected and I'm now posting via
the web interface.

Simon King

unread,
Jul 13, 2011, 10:00:34 AM7/13/11
to sage-devel
Hi!

On the one hand, I need multiplication of square matrices over finite
non-prime fields. From the things readily available in Sage (+ my
MeatAxe wrapper), MeatAxe is best, the native Sage implementation
sucks, and the trick described in the Boothby-Bradshaw paper (using
LinBox for the prime field under the hood) is almost as good as
MeatAxe even in naive implementation.

On the other hand, I need multiplication of square matrices with
vectors, again over non-prime fields. Here, Meataxe is clearly best.
Over GF(125), multiplication of a 2000x2000 matrix with a 2000x1
matrix takes 24.6 ms, native Sage needs 327 ms, and the Boothby-
Bradshaw trick needs 109 ms. The problem is that LinBox appears to be
rather slow in multiplying a 2000x2000 matrix with a 2000x1 matrix
over GF(5) -- slower than native Sage, actually.

But I suppose that there is a special LinBox function for
multiplication with a vector. How is that function called? Is it
wrapped?

Things are getting confusing. I need a break.

Best regards,
Simon

Simon King

unread,
Jul 13, 2011, 12:29:19 PM7/13/11
to sage-devel
Hi Clement,

On 13 Jul., 15:25, Clement Pernet <clement.per...@gmail.com> wrote:
> ...
>   but the wrapping in sage is missing. So I think it might  probably
> be simpler to wrapp it rather
> than creating a new spkg with MeatAxe.

I am not so sure if that would be easier FOR ME. If I understand
correctly, finishing the LinBox wrapper requires C++ knowledge, which
I don't really have. But I do have a wrapper for an old (but tuned)
version of MeatAxe, and I do have a wrapper for a more recent (but
untuned) version of MeatAxe. The former is part of my optional
cohomology-spkg.

Currently, it seems that in that way I would get most easily a good
performance for the functions that I need:
1. Matrix times Matrix:
MeatAxe is only slightly slower over prime fields than LinBox.
Over non-prime fields, it is faster than a naive implementation of
the Boothby-Bradshaw trick.

2. Matrix times column:
Over prime fields, the native Sage implementation is clearly
fastest, MeatAxe is slower, LinBox is VERY slow.
Over non-prime fields, MeatAxe is fastest, the naive implementation
of the "trick" is 5 times slower.

3. Row times matrix:
MeatAxe is clearly fastest both over prime and over non-prime
fields -- unless someone has already wrapped a specialised
multiplication command of LinBox??

What I mainly need is 1. and 3., both over prime and over non-prime
fields. It seems to me that extending my MeatAxe wrapper is (for me)
easier than extending the LinBox wrapper, and gives reasonably good
performance. I'll go for it.

QUESTION:
According to their web site, "The MeatAxe should run on most UNIX
platforms including Linux, NetBSD, Ultrix, Solaris, HP-UX, and on
Windows NT." So, a MeatAxe wrapper might actually be (eventually) a
candidate for a standard package, isn't it?
OTOH, the current MeatAxe licence is GPL 2. Does that disqualify
MeatAxe? Would GPL 2+ be required?

> PS: weird, my reply by email has been rejected and I'm now posting via
> the web interface.

I do that all the time...

Cheers,
Simon

William Stein

unread,
Jul 13, 2011, 11:55:41 PM7/13/11
to sage-...@googlegroups.com
On Wed, Jul 13, 2011 at 6:29 PM, Simon King <simon...@uni-jena.de> wrote:
> QUESTION:
> According to their web site, "The MeatAxe should run on most UNIX
> platforms including Linux, NetBSD, Ultrix, Solaris, HP-UX, and on
> Windows NT." So, a MeatAxe wrapper might actually be (eventually) a
> candidate for a standard package, isn't it?
> OTOH, the current MeatAxe licence is GPL 2.
> Does that disqualify MeatAxe?

Yes, it disqualifies it.

> Would GPL 2+ be required?

Yes, that would be required.

I think people will have some issues with including an spkg that
replicates functionality available probably in code already in Sage,
with sufficient work. Every single spkg that is added to sage
increases the maintenance work of Sage a lot. Removing spkg's later
is very hard.

-- William

Simon King

unread,
Jul 14, 2011, 3:21:04 AM7/14/11
to sage-devel
Hi William,

On 14 Jul., 05:55, William Stein <wst...@gmail.com> wrote:
> I think people will have some issues with including an spkg that
> replicates functionality available probably in code already in Sage,
> with sufficient work.

From my perspective, your last line is decisive: I would probably not
easily be able to make Sage full use of LinBox - my C++ knowledge is
rather limited, and I know nothing about LinBox. So, in order to use
that for my project, I would have to wait for other people doing the
work.

Moreover, my benchmarks seem to indicate that *for the purpose of my
project* MeatAxe performs relatively well (unfortunately not compared
with Magma, as it seems).

> Every single spkg that is added to sage
> increases the maintenance work of Sage a lot.  Removing spkg's later
> is very hard.

OK. Then I'll probably proceed as I did with my old cohomology spkg:
I'll use MeatAxe, but not as a separate spkg.

Cheers,
Simon

Simon King

unread,
Jul 14, 2011, 3:28:00 AM7/14/11
to sage-devel
PS:

On 14 Jul., 05:55, William Stein <wst...@gmail.com> wrote:
> Every single spkg that is added to sage
> increases the maintenance work of Sage a lot.  Removing spkg's later
> is very hard.

Note that there already is an experimental MeatAxe spkg. It provides
the MeatAxe executables, but does not provide the MeatAxe C-library
(which I would need) or a wrapper.

Volker Braun

unread,
Jul 14, 2011, 4:59:58 AM7/14/11
to sage-...@googlegroups.com
The LinBox matrix-vector multiplication is 

vectorMul (Vector1 &w, const Matrix &A, const Vector2 &v)  // w <- Av
vectorAxpyin (Vector1 &y, const Matrix &A, const Vector2 &x) // y <- y+Ax

I don't see a rowvector-matrix mul routine, but then that must be avoided in C storage order (row-major). 

Martin Albrecht

unread,
Jul 14, 2011, 7:26:43 AM7/14/11
to sage-...@googlegroups.com
> 2. Matrix times column:
> Over prime fields, the native Sage implementation is clearly
> fastest, MeatAxe is slower, LinBox is VERY slow.
> Over non-prime fields, MeatAxe is fastest, the naive implementation
> of the "trick" is 5 times slower.
>
> 3. Row times matrix:
> MeatAxe is clearly fastest both over prime and over non-prime
> fields -- unless someone has already wrapped a specialised
> multiplication command of LinBox??

Note that these operations are quadratic just as the conversion Sage =>
LinBox. Hence, I'm not that surprised that our LinBoX wrapper performs poorly
on those.

Simon King

unread,
Jul 14, 2011, 7:53:16 AM7/14/11
to sage-devel
Hi Martin,

On 14 Jul., 13:26, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> Note that these operations are quadratic just as the conversion Sage =>
> LinBox.

I am a bit surprised. My impression from reading
sage.matrix.matrix_modn_dense was that Sage stores the matrix data in
LinBox format anyway. So, did I get the wrong impression?

Cheers,
Simon

Martin Albrecht

unread,
Jul 14, 2011, 8:10:20 AM7/14/11
to sage-...@googlegroups.com

Hi,

linbox.matrix_matrix_multiply

calls

linbox_modn_dense_matrix_matrix_multiply

from the LinBox SPKG (src/interfaces/sage/linbox.C)

which calls

static DenseMatrix<ModInt> linbox_new_modn_matrix(mod_int modulus, mod_int**
matrix, size_t nrows, si\
ze_t ncols) {

ModInt F((double)modulus);

DenseMatrix<ModInt> A ( F, nrows, ncols);

size_t i, j, k;
for (i=0; i < nrows; i++) {
for (j=0; j < ncols; j++) {
A.setEntry(i, j, (double)matrix[i][j]);
}
}
return A;
};

which copies the matrix to a new matrix A.

Simon King

unread,
Jul 14, 2011, 10:37:33 AM7/14/11
to sage-devel
Hi Martin,

On 14 Jul., 14:10, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> Hi,
>
>    linbox.matrix_matrix_multiply
>
> calls
>
>     linbox_modn_dense_matrix_matrix_multiply
>
> from the LinBox SPKG (src/interfaces/sage/linbox.C)
>
> ...
>
> which copies the matrix to a new matrix A.

I see. So, the plan for a proper wrapper would be to use Linbox's data
format in the first place.

Perhaps it would be best to study http://sage.math.washington.edu/home/burcin/dense_ctypes.patch

If really coercion is the only missing bit then it is worth to give it
a try.

Cheers,
Simon

Simon King

unread,
Jul 14, 2011, 10:40:24 AM7/14/11
to sage-devel
PS:

On 14 Jul., 16:37, Simon King <simon.k...@uni-jena.de> wrote:
> Perhaps it would be best to study http://sage.math.washington.edu/home/burcin/dense_ctypes.patch

Is there a ticket where that patch is dealt with?

Burcin Erocal

unread,
Jul 18, 2011, 5:37:40 AM7/18/11
to sage-...@googlegroups.com
Hi Simon,

AFAIK, there are 3 tickets on trac: 4260, 4968, 4258

We could merge these and close 2 of them as duplicates.

When I get a chance I will take a look at redoing the template
structure in the dense_ctypes.patch. I am back to normal schedule
only today, so this will probably happen sometime tomorrow.

Perhaps you can help with the pickling, coercion, etc. problems once I
get the linbox wrapper working. ;)


Cheers,
Burcin

Martin Albrecht

unread,
Jul 18, 2011, 6:05:55 AM7/18/11
to sage-...@googlegroups.com
> When I get a chance I will take a look at redoing the template
> structure in the dense_ctypes.patch. I am back to normal schedule
> only today, so this will probably happen sometime tomorrow.
>
> Perhaps you can help with the pickling, coercion, etc. problems once I
> get the linbox wrapper working. ;)

Cool! My plan was to work on this during the upcoming Sage Days in Seattle at
the end of August. If you have a working prototype by then I'll try to finish
it off :)

Simon King

unread,
Jul 18, 2011, 6:07:35 AM7/18/11
to sage-devel
Hi Burcin,

On 18 Jul., 11:37, Burcin Erocal <bur...@erocal.org> wrote:
> AFAIK, there are 3 tickets on trac: 4260, 4968, 4258

Thank you!

> Perhaps you can help with the pickling, coercion, etc. problems once I
> get the linbox wrapper working. ;)

I can at least try.

In the meantime, I work on improving MeatAxe. For example, they don't
use Strassen multiplication. So, it is tempting for me to see whether
MeatAxe can benefit from the stuff in sage.matrix.strassen. For my
application (project on computation of Ext algebras), I will try to
write everything generically, so that eventually both MeatAxe and Sage/
Linbox matrices will work.

Cheers,
Simon

Ivo Hedtke

unread,
Jul 18, 2011, 6:15:33 AM7/18/11
to sage-...@googlegroups.com
Hi,

I am new to sage-devel. Sorry if I asked questions that you perhaps discussed in
the the past.

Why do you wan't to use Strassen for Matrix Multiplication? This algorithms is
not as stable as naive algorithm (if one uses floating point numbers, if you use
it for symbol computations that is no problem). It is also not very fast for
small matrices. And if you don't use the in-memory version of the algorithm it
has extremly memory requirements.

Why not Winograd's method. Or something like Tiling or Loop-Unrolling?

Again, sorry if you discussed this in the past.

Best wishes,
Ivo.

Martin Albrecht

unread,
Jul 18, 2011, 6:32:33 AM7/18/11
to sage-...@googlegroups.com
On Monday 18 July 2011, Ivo Hedtke wrote:
> Hi,

Hi Ivo,

> I am new to sage-devel. Sorry if I asked questions that you perhaps
> discussed in the the past.
>
> Why do you wan't to use Strassen for Matrix Multiplication? This algorithms
> is not as stable as naive algorithm (if one uses floating point numbers,
> if you use it for symbol computations that is no problem).

We are discussing using it over finite fields and hence there is no stability
issue.

> It is also not very fast for small matrices.

That's why one usually implements a cutoff: below a certain dimension naive or
or some other multiplication is used: both when the input matrix is small and
when the recursion reaches the small dimension.

> And if you don't use the in-memory version
> of the algorithm it has extremly memory requirements.

The memory overhead is 2/3 n^2 if you set it up right.

> Why not Winograd's method.

Usually, when people talk about Strassen they mean Strassen-Winograd, is that
what you mean? It just reduces the number of additions.

> Or something like Tiling or Loop-Unrolling?

Of course, one implements the base case as efficiently as possible. In
LinBox's case it's outsourced to ATLAS, in other cases there are dedicated
implementations. While these implementation tricks are crucial for good
performance, they won't beat Strassen-Winograd since they are asymptotically
slower: n^3 vs. n^2.807.

Simon King

unread,
Jul 18, 2011, 6:42:27 AM7/18/11
to sage-devel
Hi Ivo,

On 18 Jul., 12:15, Ivo Hedtke <hed...@me.com> wrote:
>
> Why do you wan't to use Strassen for Matrix Multiplication? This algorithms is
> not as stable as naive algorithm (if one uses floating point numbers, if you use
> it for symbol computations that is no problem).

I work over finite fields.

> It is also not very fast for
> small matrices.

That's why there is a cutoff.

> Why not Winograd's method.

Practical answer: Since I found that there is a framework for the
Strassen multiplication in Sage (see sage.matrix.strassen), but I
don't know whether a similar support exists for the Coppersmith-
Winograd algorithm.

search_src('Winograd') told me that the multiplication algorithm used
in sage.matrix.matrix_mod2_dense is called "Strassen-Winograd", but
the asymptotics seems to coincide with what I know as Strassen
algorithm.

> Or something like Tiling or Loop-Unrolling?

Would that change the asymptotics?

Cheers,
Simon

Simon King

unread,
Jul 18, 2011, 6:44:52 AM7/18/11
to sage-devel
Hi Martin,

On 18 Jul., 12:32, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> > Why not Winograd's method.
>
> Usually, when people talk about Strassen they mean Strassen-Winograd, is that
> what you mean? It just reduces the number of additions.

My guess is that Ivo refers to Coppersmith-Winograd, which, according
to Wikipedia, *is* faster than Strassen-Winograd: n^2.376 versus
n^2.807.

Cheers,
Simon

Simon King

unread,
Jul 18, 2011, 6:46:55 AM7/18/11
to sage-devel
PS:

On 18 Jul., 12:44, Simon King <simon.k...@uni-jena.de> wrote:
> > Usually, when people talk about Strassen they mean Strassen-Winograd, is that
> > what you mean? It just reduces the number of additions.
>
> My guess is that Ivo refers to Coppersmith-Winograd, which, according
> to Wikipedia, *is* faster than Strassen-Winograd: n^2.376 versus
> n^2.807.

... but only pays off for matrix sizes too big to be practical. I
guess that's why it is not in Sage.

hedtke

unread,
Jul 18, 2011, 7:32:11 AM7/18/11
to sage-...@googlegroups.com
Hi Simon and Martin,

I know Strassen-Winograd. Strassen uses 18 additions, Strassen-Winograd only 15, which is optimal for 7 multiplications ;-)

With the in-memory variant I mean: "Boyer, Dumans, Pernet and Zhou: Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm. International Symposium on Symbolic and Algebraic Computation 2009."

This needs only a constant number of auxiliary variables, i think.

OK. If you work on finite fields, there is no stability issue.

Loop Unrolling or Tilling doesn't change the asymptotic complexity. But we can use Knowledge about the computer architecture (L1 Cache, etc), to speed up the computations by a factor of 2.

Please remember that the parameters for Strassens algorightm from his original paper are far away from optimal. I mean that the original method only was build for matrices of the size m*2^k. Strassen said something like: If you ant to calculate the product of two n x n matrices for an arbitrary n, choose m=f(n) and k=g(n) for some f and g (see his paper from 1969). These f and g are NOT a good choice. There are better ones from R. Probert and others.

Best wishes,
Ivo.

Am 18. Jul 2011 um 12:46 schrieb Simon King <simon...@uni-jena.de>:

PS:

On 18 Jul., 12:44, Simon King <simon.k...@uni-jena.de> wrote:
> > Usually, when people talk about Strassen they mean Strassen-Winograd, is that
> > what you mean? It just reduces the number of additions.
>
> My guess is that Ivo refers to Coppersmith-Winograd, which, according
> to Wikipedia, *is* faster than Strassen-Winograd: n^2.376 versus
> n^2807.


... but only pays off for matrix sizes too big to be practical. I
guess that's why it is not in Sage.

--
To post to this group, send an email to sage-...@googlegroups.com
To unsubscribe from this group, send an email to sage-devel+...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org

Simon King

unread,
Jul 18, 2011, 8:09:47 AM7/18/11
to sage-devel
Hi Ivo,

On 18 Jul., 13:32, hedtke <hed...@me.com> wrote:
> I know Strassen-Winograd. Strassen uses 18 additions, Strassen-Winograd only 15, which is optimal for 7 multiplications ;-)

Looking at sage.matrix.strassen.strassen_multiply_window, I find 15
additions/subtractions. So, it seems that Strassen==Strassen-Winograd
in Sage.

> With the in-memory variant I mean: "Boyer, Dumans, Pernet and Zhou: Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm. International Symposium on Symbolic and Algebraic Computation 2009."

I am not so sure about the memory efficiency of the implementation,
though. One comment says
# todo: we can probably save some memory in these
# operations by reusing some of the buffers, if we interleave
# these additions with the multiplications (below)

Moreover, the algorithm is called recursively, which probably means
that there are a lot of temporary variables stored in memory.

Thank you for the pointer to a memory efficient version!

Best regards,
Simon

Clement Pernet

unread,
Jul 18, 2011, 8:37:29 AM7/18/11
to sage-devel
Hi Ivo,

On Jul 18, 1:32 pm, hedtke <hed...@me.com> wrote:
> With the in-memory variant I mean: "Boyer, Dumans, Pernet and Zhou: Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm. International Symposium on Symbolic and Algebraic Computation 2009."
>
> This needs only a constant number of auxiliary variables, i think.

Well, we wrote this paper from our experience implementing Strassen-
Winograd's algorithm in LinBox. The version in LinBox is actually not
the fully in-memory version, for it has a sligthly larger constant in
the time complexity.
We use the classic 2/3n^2 extra memory version (for C<-AxB) and the
1n^2 version for (C<-AxB+-C), as the computation time is usually a
bigger concern than memory.
>
> Loop Unrolling or Tilling doesn't change the asymptotic complexity. But we can use Knowledge about the computer architecture (L1 Cache, etc), to speed up the computations by a factor of 2.

As Martin already wrote, this is already taken into consideration for
the base case: using the numercial routines BLAS for LinBox, or table
precomputations+fine tuning+... for M4RI over GF2
>
> Please remember that the parameters for Strassens algorightm from his original paper are far away from optimal. I mean that the original method only was build for matrices of the size m*2^k. Strassen said something like: If you ant to calculate the product of two n x n matrices for an arbitrary n, choose m=f(n) and k=g(n) for some f and g (see his paper from 1969). These f and g are NOT a good choice. There are better ones from R. Probert and others.
Of course we do not use these parameters. In LinBox, they are
determined at install time, by benchmarks.

Clément Pernet

Clement Pernet

unread,
Jul 18, 2011, 8:41:39 AM7/18/11
to sage-devel
Great!!
I have to be in Grenoble during that period for teaching but I'll try
to arrange some free time to hang around on IRC and work remotely with
you guys during that period.
Clément

On Jul 18, 12:05 pm, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> _jab: martinralbre...@jabber.ccc.de

Simon King

unread,
Jul 18, 2011, 9:36:14 AM7/18/11
to sage-devel
Hi Ivo,

On 18 Jul., 13:32, hedtke <hed...@me.com> wrote:
> With the in-memory variant I mean: "Boyer, Dumans, Pernet and Zhou: Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm. International Symposium on Symbolic and Algebraic Computation 2009."

Of course, I can not speak for LinBox. But the stuff in
sage.matrix.strassen could really be a bit more memory efficient: It
uses more than 5 times the temporary memory that is needed even in the
least memory efficient multiplication algorithm from the paper you
cited.

So, I'm now trying to improve the performance of sage.matrix.strassen.

Cheers,
Simon

Ivo Hedtke

unread,
Jul 18, 2011, 10:44:52 AM7/18/11
to sage-...@googlegroups.com
Hi Simon,

what would be a nice piece of work. Good luck and have fun ;-)

Please test the following if your implementation is ready: Try to use your Strassen implementation only with 1 step of recursion (in the next step only the naive algorithm). And a second version with only 3 steps of recursion. This will bound the memory requirements and give a speed up. This is a known heuristic (to stop the recursion early). But I don't know if it is better than the parameters for the algorithm found by the benchmark.

By the way: With "Winograd's method" I mean the one from Section 2.3 (page 10) from http://arxiv.org/abs/1106.1347 .
But please note that the survey article mentioned above is the first draft. It is not complete and the implementations are not optimized.

Best wishes, Ivo.

Simon King

unread,
Jul 18, 2011, 12:22:29 PM7/18/11
to sage-devel
Hi Ivo,

On 18 Jul., 16:44, Ivo Hedtke <hed...@me.com> wrote:
> what would be a nice piece of work. Good luck and have fun ;-)

Thank you! I had fun...

> Please test the following if your implementation is ready: Try to use your Strassen implementation only with 1 step of recursion (in the next step only the naive algorithm). And a second version with only 3 steps of recursion. This will bound the memory requirements and give a speed up. This is a known heuristic (to stop the recursion early).

Concerning memory: Since the memory is used only temporarily, I don't
see how I could determine the exact memory requirements without
inserting get_memory_usage() statements into the code. Below, I
determine the memory consumption by staring at "top" while running the
computation.

Concerning iterations: The algorithm does not count the number of
iterations, but uses a next iteration step only if *all* current
matrix dimensions exceed a given cutoff parameter.

1st test: Multiplication of a 64x83 matrix by a 83x101 matrix over the
integers modulo 2^65 (that is a doc test example).
cutoff=20 seems to be a rather good choice (which amounts to two
iterations for these matrix dimensions).
Default multiplication in Sage: 365 ms.
New Strassen implementation, cutoff=20: 291 ms.
Old Strassen implementation, cutoff=20: 288 ms.

2nd test: Multiplication of two 1000x1000 matrices over GF(125).
Default multiplication in Sage: 102.13 seconds, using 3.9% of my
computer's memory.
New Strassen implementation, cutoff=20: 61.16 seconds using 4.1% of my
computer's memory.
Old Strassen implementation, cutoff=20: 61.78 seconds using 4.7% of my
computer's memory.

3rd test: Multiplication of two 2000x2000 matrices over GF(125).
Default multiplication takes ages.
New Strassen implementation, cutoff=20: 427 seconds using 6.9% of my
computer's memory.
Old Strassen implementation, cutoff=20: 431 seconds using 9.3% of my
computer's memory.


I don't know if it is worth it. Shall I open a ticket and post my
patch?

Best regards,
Simon

Ivo Hedtke

unread,
Jul 18, 2011, 12:27:54 PM7/18/11
to sage-...@googlegroups.com
Hi Simon,

wow, I am impressed.

If the cutoff-parameter is determined by the benchmark: Yes, submit it!

In the other case I see the problem of how to define the cutoff in a good way. In this case the method could be significantly slower.

I look forward to review your patch.

Best wishes from Halle to Jena,
Ivo

Simon King

unread,
Jul 18, 2011, 12:36:23 PM7/18/11
to sage-devel
PS:

On 18 Jul., 18:22, Simon King <simon.k...@uni-jena.de> wrote:
> ...
> 3rd test: Multiplication of two 2000x2000 matrices over GF(125).
> Default multiplication takes ages.
> New Strassen implementation, cutoff=20: 427 seconds using 6.9% of my
> computer's memory.
> Old Strassen implementation, cutoff=20: 431 seconds using 9.3% of my
> computer's memory.
>
> I don't know if it is worth it. Shall I open a ticket and post my
> patch?

Concerning "worth it": Even without Strassen multiplication, MeatAxe
only needs about 12 seconds for that example.

So, one could argue that it is not worth it: One easily gets a better
speed in a different way.

Or one could argue that it is worth it: If one uses
sage.matrix.strassen on top of MeatAxe then the speed might be even
better (perhaps close to Magma speed). By the way, the MeatAxe version
I started with is 2.2.4 -- it was a special release under GPL 2+
(otherwise identical to 2.2.3, which is GPL2 only). So, the licence
problem might be solvable.

Or one could argue that it is not worth it: LinBox already has
Strassen. So, it might be better if Burcin and Martin revived the
LinBox wrapper (so that we have full speed over finite prime fields)
and then use the tricks discussed earlier in this thread to extend it
to non-prime fields (which would be quite a lot of work).

Anyway, I think Strassen multiplication should become the default for
Matrix_modn_dense.

What do people think?
Cheers,
Simon

Ivo Hedtke

unread,
Jul 18, 2011, 12:45:26 PM7/18/11
to sage-...@googlegroups.com
Hi Simon,

Am 18.07.2011 um 18:36 schrieb Simon King:

> Anyway, I think Strassen multiplication should become the default for
> Matrix_modn_dense.
>
> What do people think?

I am no fan of Strassen as standard method.

I usually work with very large matrices. For this size the memory issue is very important. I think that it would be better to use the current standard method and only give Strassen as a additional method. Then one can use Strassen if he knows what he is doing.

Ivo.

Simon King

unread,
Jul 18, 2011, 5:45:08 PM7/18/11
to sage-devel
Hi Ivo and all others,

On 18 Jul., 18:27, Ivo Hedtke <hed...@me.com> wrote:
> If the cutoff-parameter is determined by the benchmark: Yes, submit it!

The cutoff-parameter is not hard-wired. So, everyone can find his/her
own.

> I look forward to review your patch.

The patch is at trac ticket #11610, together with some timings and
memory data. Thank you for your pointer to the article: It was
straight forward to implement a schedule from the article; I chose a
non-destructive version (i.e., the input matrices are preserved). With
that schedule, Strassen-Winograd could be a real alternative to the
default multiplication (school book) of
sage.matrix.matrix_modn_dense.Matrix_modn_dense.

Cheers,
Simon

Martin Albrecht

unread,
Aug 24, 2011, 11:41:08 PM8/24/11
to sage-...@googlegroups.com
Hi everybody,

it's the last day of Sage Days 32 and I push Burcin's and Clément's patch far
enough to have all doctests pass on sage.math, cf.

http://trac.sagemath.org/sage_trac/ticket/4260

With respect to performance the difference is like night and day:

== Night (without #4260)==

sage: A = random_matrix(GF(previous_prime(2^23)),1000,1000)
sage: B = random_matrix(GF(previous_prime(2^23)),1000,1000)
sage: %time A*B
CPU times: user 326.12 s, sys: 1.27 s, total: 327.39 s
Wall time: 328.79 s
1000 x 1000 dense matrix over Finite Field of size 8388593

sage: %time A.echelonize()
CPU times: user 92.59 s, sys: 0.64 s, total: 93.23 s
Wall time: 93.67 s

== Day (with #4260) ==

sage: A = random_matrix(GF(previous_prime(2^23)),1000,1000)
sage: B = random_matrix(GF(previous_prime(2^23)),1000,1000)
sage: %time A*B
CPU times: user 0.26 s, sys: 0.02 s, total: 0.28 s
Wall time: 0.28 s
1000 x 1000 dense matrix over Finite Field of size 8388593

sage: %time A.echelonize()
CPU times: user 0.41 s, sys: 0.00 s, total: 0.41 s
Wall time: 0.42 s

Rob Beezer (sitting next to me :)) is currently looking at the patch, adding
more doctests/documentation and stress testing the whole thing. It would be
great if people could give #4260 a spin on different platforms to check
whether it works for them. Especially

- big endian machines to test whether pickling works
- 32-bit machines and
- non-Linux machines

would be greatly appreciated. Also, people who care deeply about matrices over
ZZ and QQ should give it a spin and check for performance regressions or
improvements (because of multi-modular stuff) There shouldn't be any, but who
knows.

Cheers,
Martin

PS: After SD32 I probably wont' have time to push this patch much further so
if you care about switching from very slow to very very fast for operations
with dense matrices mod n (n < 2^23) then feel free to take charge.

--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF
_www: http://martinralbrecht.wordpress.com/

_jab: martinr...@jabber.ccc.de

Martin Albrecht

unread,
Sep 27, 2011, 11:19:17 AM9/27/11
to sage-...@googlegroups.com
I just fixed the one speed-regression Simon pointed out, so the patch
http://trac.sagemath.org/sage_trac/ticket/4260 needs a review again
... hint, hint!

_www: http://www.informatik.uni-bremen.de/~malb
_jab: martinr...@jabber.ccc.de

Reply all
Reply to author
Forward
0 new messages