Matrix multiplication

138 views
Skip to first unread message

Simon King

unread,
Jul 12, 2011, 12:09:42 PM7/12/11
to sage-devel
Hi!

Working at #11589, I revisited the almost-forgotten ticket #8096 (no
activity since 16 months), which aims at speeding up matrix
multiplication.

#10763 (already merged) and #11589 (positive review) provide some
improvement. However, using the defaults in Sage, the multiplication
of two random dense 2000x2000 matrices over GF(3) still takes 8.34
seconds (it used to be 12 seconds).
That does not compare nicely with my tuned (i.e. unpublished) version
of C-MeatAxe, which completes the same task in 1.72 seconds.

So, obviously it makes sense to try and improve Sage's matrix
multiplication. That may actually be rather easy, because linbox can
do the above multiplication in 1.65 seconds (using _multiply_linbox).

Is there a compelling reason why linbox multiplication is not the
default for dense matrices over finite fields?

Best regards,
Simon

Martin Albrecht

unread,
Jul 12, 2011, 12:26:22 PM7/12/11
to sage-...@googlegroups.com
> Is there a compelling reason why linbox multiplication is not the
> default for dense matrices over finite fields?

IIRC, this dates back to the time when we didn't ship ATLAS, i.e. LinBox could
have been built against some reference implementation BLAS (== slow).

It wasn't touched afterwards because people tried to move Matrix_modn_dense
over to LinBox as the native format (again, IIRC) and these projects stalled.

As far as I can tell the only downside of switching to LinBox is the increased
memory consumption.

Cheers,
Martin

--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF
_www: http://martinralbrecht.wordpress.com/
_jab: martinr...@jabber.ccc.de

Volker Braun

unread,
Jul 12, 2011, 1:10:40 PM7/12/11
to sage-...@googlegroups.com
Does linbox actually use blas for fast matrix multiplication over finite fields? I know its hard to tell from the linbox source code but I don't think floating point stuff could be faster than the direct application of Strassen's algorithm.

Martin Albrecht

unread,
Jul 12, 2011, 1:48:08 PM7/12/11
to sage-...@googlegroups.com

Yes. It uses FFLAS/FFPACK for dense matrices over GF(p). Which in top of the
numerical routines it implements Strassen:

http://ljk.imag.fr/membres/Jean-Guillaume.Dumas/FFLAS/

As far as I know it's the fastest implementation for medium size primes (16
bits or so). For very small primes one can do better

1) by bit-slicing:

http://arxiv.org/abs/0901.1413

or 2) bit-packing:

http://wiki.sagemath.org/days10/JGDumasTalk
http://wiki.sagemath.org/days10?action=AttachFile&do=view&target=Dumas_SD10_talk.pdf

Btw. the main author of FFLAS/FFPACK Clément Pernet was the first Sage Postdoc
at UW.

It's a shame we are not using the libraries we ship to full potential but so
far no one stepped up (me included) to fix this situation.

Simon King

unread,
Jul 12, 2011, 2:57:06 PM7/12/11
to sage-devel
Hi Martin,

On 12 Jul., 19:48, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> As far as I know it's the fastest implementation for medium size primes (16
> bits or so). For very small primes one can do better

But actually linbox is already much faster over GF(3) than what is
currently done in Sage.

> It's a shame we are not using the libraries we ship to full potential but so
> far no one stepped up (me included) to fix this situation.

OK, but how much work would be needed? It appears to me that there is
only very little overhead in calling the linbox routines (via
sage.libs.linbox.linbox), since the data of a Matrix_modn_dense is
already stored in the same format used by linbox (namely mod_int **).
In particular, I wounder about your statement that the memory
consumption would increase. Is that because of internal linbox
computations?

Here is how linbox is wrapped (see
sage.matrix.matrix_modn_dense.Matrix_modn_dense._multiply_linbox):
...
self._init_linbox()
sig_on()
linbox.matrix_matrix_multiply(ans._matrix, B._matrix,
B._nrows, B._ncols)
sig_off()

self._init_linbox() seems rather light-weight, and
linbox.matrix_multiply is also a very thin wrapper of the linbox
routines. Similarly, one proceeds with other routines (computing
echelon form and so on)

Actually I think it would even be easy enough to call the linbox
routines directly (without using the sage.libs.linbox.linbox wrapper,
thus avoiding all overhead). Since I need fast matrix arithmetics over
finite fields, I have some motivation to rewrite _mul_ accordingly.

But one question: Can linbox deal with non-prime finite fields?

Cheers,
Simon

Martin Albrecht

unread,
Jul 12, 2011, 3:09:18 PM7/12/11
to sage-...@googlegroups.com, Clement Pernet, Burcin Erocal
Hi Simon,

> wrote:
> > As far as I know it's the fastest implementation for medium size primes
> > (16 bits or so). For very small primes one can do better
>
> But actually linbox is already much faster over GF(3) than what is
> currently done in Sage.

Yep, it was just meant as a statement about the viability of the chosen
strategy (using ATLAS)

> > It's a shame we are not using the libraries we ship to full potential but
> > so far no one stepped up (me included) to fix this situation.
>
> OK, but how much work would be needed?

It depends on how far you want to go:

1) Only switch multiplication over to LinBox, that's probably easy to do

2) Switch everything to LinBox of FFLAS/FFPACK directly. This probably needs
some hacking around.

> It appears to me that there is
> only very little overhead in calling the linbox routines (via
> sage.libs.linbox.linbox), since the data of a Matrix_modn_dense is
> already stored in the same format used by linbox (namely mod_int **).
> In particular, I wounder about your statement that the memory
> consumption would increase. Is that because of internal linbox
> computations?

> Here is how linbox is wrapped (see
> sage.matrix.matrix_modn_dense.Matrix_modn_dense._multiply_linbox):
> ...
> self._init_linbox()
> sig_on()
> linbox.matrix_matrix_multiply(ans._matrix, B._matrix,
> B._nrows, B._ncols)
> sig_off()
>
> self._init_linbox() seems rather light-weight, and
> linbox.matrix_multiply is also a very thin wrapper of the linbox
> routines. Similarly, one proceeds with other routines (computing
> echelon form and so on)

Actually, this is how we are wrapping our wrapper for LinBox which is part of
the LinBox SPKG. This is where the actual heavy lifting is done such as
converting ints to doubles etc. So what you see in Sage gives you a misleading
impression.

Btw. the reason why this wrapper exists is that LinBox takes so long to
compile and takes so much memory. A harmless doctest change thus triggered a
long compilation. Hence we outsourced it.



> Actually I think it would even be easy enough to call the linbox
> routines directly (without using the sage.libs.linbox.linbox wrapper,
> thus avoiding all overhead). Since I need fast matrix arithmetics over
> finite fields, I have some motivation to rewrite _mul_ accordingly.

If one is serious about this then one should probably use the FFLAS/FFPACK
low-level routines directly (Burcin might have some old patches for this, he's
the last person I know of who tried) since that avoids compiling all thee
layers of C++ in LinBox.

> But one question: Can linbox deal with non-prime finite fields?

They had some experimental code at some point IIRC but nothing for general
use.

Simon King

unread,
Jul 12, 2011, 5:12:18 PM7/12/11
to sage-devel
Hi Martin,

On 12 Jul., 21:09, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> If one is serious about this then one should probably use the FFLAS/FFPACK
> low-level routines directly (Burcin might have some old patches for this, he's
> the last person I know of who tried) since that avoids compiling all thee
> layers of C++ in LinBox.

That would probably be beyond my scope.

> > But one question: Can linbox deal with non-prime finite fields?
>
> They had some experimental code at some point IIRC but nothing for general
> use.

What a pity - I need non-prime fields as well.

But perhaps it is feasible to create a sub-class Matrix_modp_dense of
Matrix_modn_dense for matrices over finite prime fields -- overriding
_mul_ etc. to use the functions from sage.libs.linbox.linbox. In that
way, one could benefit from the LinBox speed at least to some extent.
I suppose that fits into the scope of #8096.

Best regards,
Simon

Burcin Erocal

unread,
Jul 12, 2011, 5:35:51 PM7/12/11
to sage-...@googlegroups.com
On Tue, 12 Jul 2011 14:12:18 -0700 (PDT)
Simon King <simon...@uni-jena.de> wrote:

> On 12 Jul., 21:09, Martin Albrecht <martinralbre...@googlemail.com>
> wrote:
> > If one is serious about this then one should probably use the
> > FFLAS/FFPACK low-level routines directly (Burcin might have some
> > old patches for this, he's the last person I know of who tried)
> > since that avoids compiling all thee layers of C++ in LinBox.
>
> That would probably be beyond my scope.

At SD16, Clement and I looked into wrapping linbox directly. The speeds
were promising, but we never took the time to fix pickling, etc.
The patch is most likely unusable by now:

http://sage.math.washington.edu/home/burcin/dense_ctypes.patch

I can quickly redo the work to create a template that handles the short
and long ints as special cases if there is interest. The linbox calls
in that patch can also be much better now that Cython has C++ support.


Cheers,
Burcin

Simon King

unread,
Jul 13, 2011, 5:48:08 AM7/13/11
to sage-devel
Hi!

On 12 Jul., 23:35, Burcin Erocal <bur...@erocal.org> wrote:
> On Tue, 12 Jul 2011 14:12:18 -0700 (PDT)
> At SD16, Clement and I looked into wrapping linbox directly. The speeds
> were promising, but we never took the time to fix pickling, etc.


I, for one, would appreciate fast matrix multiplication over finite
fields!

However, there is one complication: In my current project (computing
Ext-Algebras of basic algebras), I definitely need finite non-prime
fields, and the current implementation in Sage for that case is
definitely too slow. I just did an example of multiplying two 500x500
matrices over GF(125). Sage took 11.24 seconds, MeatAxe (with some
improvements that I implemented yesterday) took 0.2 seconds.

If LinBox really can not "officially" deal with non-prime fields, I
should probably stick with MeatAxe then (even though, in the small
version, it only supports fields of size <256).

My current MeatAxe wrapper (in my optional cohomology spkg) is based
on an old MeatAxe version (2.2.4). Perhaps I could take the most
recent version (2.4.13), modify my wrapper accordingly, and contribute
it to Sage. The MeatAxe licence is good enough for Sage, isn't it? But
is MeatAxe eligible to become a standard package? Currently, there
only is an experimental meataxe-2.4.3.spkg.

Correct me if the following is nonsense:
- The MeatAxe sources should go into an spkg. Question: How should I
deal with my modifications to the original MeatAxe sources? Do I
remember correctly that the spkg should contain the unmodified
sources, together with some automatically applied patches that are
stored in a sub-directory patches/?
- The spkg should create a library mtx.a, that should go into, say,
SAGE_ROOT/local/lib/mtx-2.4.13/, so that my wrapper can easily link
against it.

Best regards,
Simon

Martin Albrecht

unread,
Jul 13, 2011, 6:11:06 AM7/13/11
to sage-...@googlegroups.com
Hi,

> I, for one, would appreciate fast matrix multiplication over finite
> fields!
>
> However, there is one complication: In my current project (computing
> Ext-Algebras of basic algebras), I definitely need finite non-prime
> fields, and the current implementation in Sage for that case is
> definitely too slow. I just did an example of multiplying two 500x500
> matrices over GF(125). Sage took 11.24 seconds, MeatAxe (with some
> improvements that I implemented yesterday) took 0.2 seconds.

Do you know Tom Boothby's and Robert Bradshaw's excellent paper

Bitslicing and the Method of Four Russians Over Larger Finite Fields

http://arxiv.org/abs/0901.1413

I just took a naive version of that for a spin to see how well it compares:

# We represent matrices over polynomials as polynomials over matrices

sage: P.<A2,A1,A0,B2,B1,B0> = PolynomialRing(GF(5))
sage: R.<x> = P[]
sage: Q.<x> = R.quo(x^3 + 3*x + 3)

sage: f = (A2*x^2+A1*x+A0)
sage: g = (A2*x^2+A1*x+A0)

# so this is what we need to compute:

sage: f*g
(2*A2^2 + A1^2 + 2*A2*A0)*x^2 + (2*A2^2 - A2*A1 + 2*A1*A0)*x - A2*A1 + A0^2

# here, one would actually apply a Karatsuba to get the number of mults down,
# but I was too lazy and only did this (fingers crossed I didn't introduce a
# bug)

sage: def my_mul(A2,A1,A0, B2,B1,B0):
... A22 = A2._multiply_linbox(A2)
... A2A1 = A2._multiply_linbox(A1)
... A22_2 = 2*A22
... C2 = (A22_2 + A1._multiply_linbox(A1) + 2*A2._multiply_linbox(A0))
... C1 = (A22_2 - A2A1 + 2*A1._multiply_linbox(A0))
... C0 = - A2A1 + A0._multiply_linbox(A2)
... return C2,C1,C0

sage: A = [random_matrix(GF(5),500,500) for _ in range(3)]
sage: B = [random_matrix(GF(5),500,500) for _ in range(3)]
sage: %time _ =my_mul(*(A+B))
CPU time: 0.22 s, Wall time: 0.22 s

sage: A = [random_matrix(GF(5),2000,2000) for _ in range(3)]
sage: B = [random_matrix(GF(5),2000,2000) for _ in range(3)]
sage: %time _ =my_mul(*(A+B))
CPU time: 10.23 s, Wall time: 10.57 s

So even with all the Python overhead (and modulo bugs which I may have
introduced) this approach seems to be comparable to whatever Meataxe is doing.
I'd actually guess it gets faster than MeatAxe as dimensions grow since AFAIK
MeatAxe doesn't do matrix multiplication asymptotically fast?

> If LinBox really can not "officially" deal with non-prime fields, I
> should probably stick with MeatAxe then (even though, in the small
> version, it only supports fields of size <256).
>
> My current MeatAxe wrapper (in my optional cohomology spkg) is based
> on an old MeatAxe version (2.2.4). Perhaps I could take the most
> recent version (2.4.13), modify my wrapper accordingly, and contribute
> it to Sage. The MeatAxe licence is good enough for Sage, isn't it? But
> is MeatAxe eligible to become a standard package? Currently, there
> only is an experimental meataxe-2.4.3.spkg.
>
> Correct me if the following is nonsense:
> - The MeatAxe sources should go into an spkg. Question: How should I
> deal with my modifications to the original MeatAxe sources? Do I
> remember correctly that the spkg should contain the unmodified
> sources, together with some automatically applied patches that are
> stored in a sub-directory patches/?
> - The spkg should create a library mtx.a, that should go into, say,
> SAGE_ROOT/local/lib/mtx-2.4.13/, so that my wrapper can easily link
> against it.
>
> Best regards,
> Simon

Cheers,

Simon King

unread,
Jul 13, 2011, 7:23:47 AM7/13/11
to sage-devel
Hi Martin,

On 13 Jul., 12:11, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> Do you know Tom Boothby's and Robert Bradshaw's excellent paper
>
>    Bitslicing and the Method of Four Russians Over Larger Finite Fields
>
>    http://arxiv.org/abs/0901.1413

Thank you for your hint!

> sage: P.<A2,A1,A0,B2,B1,B0> = PolynomialRing(GF(5))
> sage: R.<x> = P[]
> sage: Q.<x> = R.quo(x^3 + 3*x + 3)
>
> sage: f = (A2*x^2+A1*x+A0)
> sage: g = (A2*x^2+A1*x+A0)
>
> # so this is what we need to compute:
>
> sage: f*g
> (2*A2^2 + A1^2 + 2*A2*A0)*x^2 + (2*A2^2 - A2*A1 + 2*A1*A0)*x - A2*A1 + A0^2

That is the same as f*f. Did you intend to do
sage: g = (B2*x^2+B1*x+B0)
sage: f*g
(2*A2*B2 + A0*B2 + A1*B1 + A2*B0)*x^2 + (2*A2*B2 + 2*A1*B2 + 2*A2*B1 +
A0*B1 + A1*B0)*x + 2*A1*B2 + 2*A2*B1 + A0*B0
?

> sage: def my_mul(A2,A1,A0, B2,B1,B0):
> ...       A22 = A2._multiply_linbox(A2)
> ...       A2A1 = A2._multiply_linbox(A1)
> ...       A22_2 = 2*A22
> ...       C2 = (A22_2 + A1._multiply_linbox(A1) + 2*A2._multiply_linbox(A0))
> ...       C1 = (A22_2 - A2A1 + 2*A1._multiply_linbox(A0))
> ...       C0 = - A2A1 + A0._multiply_linbox(A2)
> ...       return C2,C1,C0

B2,B1,B0 are not used, it just computes the square of the first
argument.

f*g from above yields
sage: def my_mul(A2,A1,A0, B2,B1,B0):
....: A0B0 = A0._multiply_linbox(B0)
....: A0B1 = A0._multiply_linbox(B1)
....: A0B2 = A0._multiply_linbox(B2)
....: A1B0 = A1._multiply_linbox(B0)
....: A1B1 = A1._multiply_linbox(B1)
....: A1B2 = A1._multiply_linbox(B2)*2
....: A2B0 = A2._multiply_linbox(B0)
....: A2B1 = A2._multiply_linbox(B1)*2
....: A2B2 = A2._multiply_linbox(B2)*2
....: C2 = A2B2+A0B2+A1B1+A2B0
....: C1 = A2B2+A1B2+A2B1+A0B1+A1B0
....: C0 = A1B2+A2B1+A0B0
....: return C2,C1,C0
....:
sage: A = [random_matrix(GF(5),500,500) for _ in range(3)]
sage: B = [random_matrix(GF(5),500,500) for _ in range(3)]
sage: %time _ =my_mul(*(A+B))
CPU times: user 0.42 s, sys: 0.05 s, total: 0.47 s
Wall time: 0.48 s
sage: A = [random_matrix(GF(5),2000,2000) for _ in range(3)]
sage: B = [random_matrix(GF(5),2000,2000) for _ in range(3)]
sage: %time _ =my_mul(*(A+B))
CPU times: user 15.08 s, sys: 0.39 s, total: 15.46 s
Wall time: 15.51 s

It is a bit slower than MeatAxe (for 2000x2000 over GF(125), it needs
11.68 seconds on my computer), but I think that's not bad for the
first attempt.

Best regards,
Simon

Martin Albrecht

unread,
Jul 13, 2011, 7:50:31 AM7/13/11
to sage-...@googlegroups.com
> > sage: P.<A2,A1,A0,B2,B1,B0> = PolynomialRing(GF(5))
> > sage: R.<x> = P[]
> > sage: Q.<x> = R.quo(x^3 + 3*x + 3)
> >
> > sage: f = (A2*x^2+A1*x+A0)
> > sage: g = (A2*x^2+A1*x+A0)
> >
> > # so this is what we need to compute:
> >
> > sage: f*g
> > (2*A2^2 + A1^2 + 2*A2*A0)*x^2 + (2*A2^2 - A2*A1 + 2*A1*A0)*x - A2*A1 +
> > A0^2
>
> That is the same as f*f. Did you intend to do
> sage: g = (B2*x^2+B1*x+B0)
> sage: f*g
> (2*A2*B2 + A0*B2 + A1*B1 + A2*B0)*x^2 + (2*A2*B2 + 2*A1*B2 + 2*A2*B1 +
> A0*B1 + A1*B0)*x + 2*A1*B2 + 2*A2*B1 + A0*B0
> ?

Ouch, yes.

Note that these timings are for the school book multiplication (i.e. 9 = 3^2
multiplications) instead of Karatsuba-style, Python overhead and Sage <=>
LinBox overhead.

It seems one can bring this down to 7 multiplications:

http://bit.ly/r5PVgv

Btw. Magma does it in 4.37 seconds on my computer, to get a sense of what is
the state-of-the-art.

Simon King

unread,
Jul 13, 2011, 8:40:27 AM7/13/11
to sage-devel
Hi Martin,

On 13 Jul., 13:50, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> Note that these timings are for the school book multiplication (i.e. 9 = 3^2
> multiplications) instead of Karatsuba-style, Python overhead and Sage <=>
> LinBox overhead.

Sure. As you said in your previous post, that implementation is the
most naive, given the basic idea.

> Btw. Magma does it in 4.37 seconds on my computer, to get a sense of what is
> the state-of-the-art.

You mean for 2000x2000 over GF(125)? So, it is a long way to go, even
for MeatAxe. BTW, Sage's current implementation needs 906.69 s for a
single multiplication.


If I understand correctly, the basic idea can be easily automatised:
- The given base field F (=GF(125), for instance) determines a
(Conway) polynomial p of degree d.
- One starts with a polynomial ring R over the corresponding prime
field F0 and 2*d variables, constructs the polynomial ring in x over
R, and mods out p. The result is a quotient ring Q.
- Multiplication of two "general" elements of Q, represented by degree-
(d-1) polynomials whose coefficients are the 2*d variables from R,
results in a d-tuple T of elements of R.
- Matrices over F are represented by d-tuples of matrices over F0, and
multiplication is obtained by inserting the elements of these d-tuples
into the elements of T.

But is it possible to *automatically* find a Karatsuba- or Toom-Cook-
type formula that reduces the number of products that appear in T?

An example from the paper: Over GF(4), the multiplication of A=A0+xA1
with B=B0+xB1 (using p=x^2+x+1) would naively be
(A0*B0+A1*B1) + x*(A1*B1+A1*B0+A0*B1)
But using Karatsuba, it becomes
(A0*B0+A1*B1) + x*((A0+A1)*(B0+B1)+A0*B0),
which saves one product.

Is there an automated procedure that finds the second formula, given
the first? If not, how could one do better than with schoolboy
multiplication, in the general case?

Best regards,
Simon

Martin Albrecht

unread,
Jul 13, 2011, 9:03:09 AM7/13/11
to sage-...@googlegroups.com
Hi,

> > Btw. Magma does it in 4.37 seconds on my computer, to get a sense of what
> > is the state-of-the-art.
>
> You mean for 2000x2000 over GF(125)? So, it is a long way to go, even
> for MeatAxe.

Yes, and it should get worse as you increase the degree or did they implement
Strassen in MeatAxe by now?

> BTW, Sage's current implementation needs 906.69 s for a
> single multiplication.

We suck at this it seems :)

> If I understand correctly, the basic idea can be easily automatised:
> - The given base field F (=GF(125), for instance) determines a
> (Conway) polynomial p of degree d.
> - One starts with a polynomial ring R over the corresponding prime
> field F0 and 2*d variables, constructs the polynomial ring in x over
> R, and mods out p. The result is a quotient ring Q.
> - Multiplication of two "general" elements of Q, represented by degree-
> (d-1) polynomials whose coefficients are the 2*d variables from R,
> results in a d-tuple T of elements of R.
> - Matrices over F are represented by d-tuples of matrices over F0, and
> multiplication is obtained by inserting the elements of these d-tuples
> into the elements of T.

Yep & it would be much faster already than what we have currently.

> But is it possible to *automatically* find a Karatsuba- or Toom-Cook-
> type formula that reduces the number of products that appear in T?
>
> An example from the paper: Over GF(4), the multiplication of A=A0+xA1
> with B=B0+xB1 (using p=x^2+x+1) would naively be
> (A0*B0+A1*B1) + x*(A1*B1+A1*B0+A0*B1)
> But using Karatsuba, it becomes
> (A0*B0+A1*B1) + x*((A0+A1)*(B0+B1)+A0*B0),
> which saves one product.
>
> Is there an automated procedure that finds the second formula, given
> the first? If not, how could one do better than with schoolboy
> multiplication, in the general case?

Well, the paper does search for Karatsuba-style formulas automatically, but
it's a non-trivial (brute-force-ish) computation. So far, there does not seem
to be anything better than this. But other people would know better than me.

Simon King

unread,
Jul 13, 2011, 9:11:03 AM7/13/11
to sage-devel
Hi Martin,

On 13 Jul., 15:03, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> > BTW, Sage's current implementation needs 906.69 s for a
> > single multiplication.
>
> We suck at this it seems :)

Totally.

According to the Boothby-Bradshaw article you were referring to, their
ideas were supposed to be implemented in Sage. Is that done, yet? I
wouldn't like to double the work...

Best regards,
Simon

Clement Pernet

unread,
Jul 13, 2011, 9:25:25 AM7/13/11
to sage-devel
Hi,

Most questions about finite fields matmul have been answered, in
short:
- one could quickly fix the pb that sage does not use linbox by
default (I'm surprised by this fact)
- in the current state, operands are converted from ints to double,
hence a overhead in both time
and memory.
- Burcin & I started to address this by rewriting the matrix-modn-
dense class using floating point
representation: a prototype code was working but we wen't into
troubles when polishing it and making
it work with the coercion system. Unfortunately I never found time to
get back to it since then.
- Further timing improvements are expected by using floats instead of
doubles for tiny fields (eg
roughly GF(p<100)) with no pain (already available in LinBox)
- Non prime fields are supported in LinBox, as long as their
cardinality remains small (LinBox
implementation has to be improved following http://hal.archives-ouvertes.fr/hal-00315772
Theorem 2)
but the wrapping in sage is missing. So I think it might probably
be simpler to wrapp it rather
than creating a new spkg with MeatAxe.

Clément

PS: weird, my reply by email has been rejected and I'm now posting via
the web interface.

Simon King

unread,
Jul 13, 2011, 10:00:34 AM7/13/11
to sage-devel
Hi!

On the one hand, I need multiplication of square matrices over finite
non-prime fields. From the things readily available in Sage (+ my
MeatAxe wrapper), MeatAxe is best, the native Sage implementation
sucks, and the trick described in the Boothby-Bradshaw paper (using
LinBox for the prime field under the hood) is almost as good as
MeatAxe even in naive implementation.

On the other hand, I need multiplication of square matrices with
vectors, again over non-prime fields. Here, Meataxe is clearly best.
Over GF(125), multiplication of a 2000x2000 matrix with a 2000x1
matrix takes 24.6 ms, native Sage needs 327 ms, and the Boothby-
Bradshaw trick needs 109 ms. The problem is that LinBox appears to be
rather slow in multiplying a 2000x2000 matrix with a 2000x1 matrix
over GF(5) -- slower than native Sage, actually.

But I suppose that there is a special LinBox function for
multiplication with a vector. How is that function called? Is it
wrapped?

Things are getting confusing. I need a break.

Best regards,
Simon

Simon King

unread,
Jul 13, 2011, 12:29:19 PM7/13/11
to sage-devel
Hi Clement,

On 13 Jul., 15:25, Clement Pernet <clement.per...@gmail.com> wrote:
> ...
>   but the wrapping in sage is missing. So I think it might  probably
> be simpler to wrapp it rather
> than creating a new spkg with MeatAxe.

I am not so sure if that would be easier FOR ME. If I understand
correctly, finishing the LinBox wrapper requires C++ knowledge, which
I don't really have. But I do have a wrapper for an old (but tuned)
version of MeatAxe, and I do have a wrapper for a more recent (but
untuned) version of MeatAxe. The former is part of my optional
cohomology-spkg.

Currently, it seems that in that way I would get most easily a good
performance for the functions that I need:
1. Matrix times Matrix:
MeatAxe is only slightly slower over prime fields than LinBox.
Over non-prime fields, it is faster than a naive implementation of
the Boothby-Bradshaw trick.

2. Matrix times column:
Over prime fields, the native Sage implementation is clearly
fastest, MeatAxe is slower, LinBox is VERY slow.
Over non-prime fields, MeatAxe is fastest, the naive implementation
of the "trick" is 5 times slower.

3. Row times matrix:
MeatAxe is clearly fastest both over prime and over non-prime
fields -- unless someone has already wrapped a specialised
multiplication command of LinBox??

What I mainly need is 1. and 3., both over prime and over non-prime
fields. It seems to me that extending my MeatAxe wrapper is (for me)
easier than extending the LinBox wrapper, and gives reasonably good
performance. I'll go for it.

QUESTION:
According to their web site, "The MeatAxe should run on most UNIX
platforms including Linux, NetBSD, Ultrix, Solaris, HP-UX, and on
Windows NT." So, a MeatAxe wrapper might actually be (eventually) a
candidate for a standard package, isn't it?
OTOH, the current MeatAxe licence is GPL 2. Does that disqualify
MeatAxe? Would GPL 2+ be required?

> PS: weird, my reply by email has been rejected and I'm now posting via
> the web interface.

I do that all the time...

Cheers,
Simon

William Stein

unread,
Jul 13, 2011, 11:55:41 PM7/13/11
to sage-...@googlegroups.com
On Wed, Jul 13, 2011 at 6:29 PM, Simon King <simon...@uni-jena.de> wrote:
> QUESTION:
> According to their web site, "The MeatAxe should run on most UNIX
> platforms including Linux, NetBSD, Ultrix, Solaris, HP-UX, and on
> Windows NT." So, a MeatAxe wrapper might actually be (eventually) a
> candidate for a standard package, isn't it?
> OTOH, the current MeatAxe licence is GPL 2.
> Does that disqualify MeatAxe?

Yes, it disqualifies it.

> Would GPL 2+ be required?

Yes, that would be required.

I think people will have some issues with including an spkg that
replicates functionality available probably in code already in Sage,
with sufficient work. Every single spkg that is added to sage
increases the maintenance work of Sage a lot. Removing spkg's later
is very hard.

-- William

Simon King

unread,
Jul 14, 2011, 3:21:04 AM7/14/11
to sage-devel
Hi William,

On 14 Jul., 05:55, William Stein <wst...@gmail.com> wrote:
> I think people will have some issues with including an spkg that
> replicates functionality available probably in code already in Sage,
> with sufficient work.

From my perspective, your last line is decisive: I would probably not
easily be able to make Sage full use of LinBox - my C++ knowledge is
rather limited, and I know nothing about LinBox. So, in order to use
that for my project, I would have to wait for other people doing the
work.

Moreover, my benchmarks seem to indicate that *for the purpose of my
project* MeatAxe performs relatively well (unfortunately not compared
with Magma, as it seems).

> Every single spkg that is added to sage
> increases the maintenance work of Sage a lot.  Removing spkg's later
> is very hard.

OK. Then I'll probably proceed as I did with my old cohomology spkg:
I'll use MeatAxe, but not as a separate spkg.

Cheers,
Simon

Simon King

unread,
Jul 14, 2011, 3:28:00 AM7/14/11
to sage-devel
PS:

On 14 Jul., 05:55, William Stein <wst...@gmail.com> wrote:
> Every single spkg that is added to sage
> increases the maintenance work of Sage a lot.  Removing spkg's later
> is very hard.

Note that there already is an experimental MeatAxe spkg. It provides
the MeatAxe executables, but does not provide the MeatAxe C-library
(which I would need) or a wrapper.

Volker Braun

unread,
Jul 14, 2011, 4:59:58 AM7/14/11
to sage-...@googlegroups.com
The LinBox matrix-vector multiplication is 

vectorMul (Vector1 &w, const Matrix &A, const Vector2 &v)  // w <- Av
vectorAxpyin (Vector1 &y, const Matrix &A, const Vector2 &x) // y <- y+Ax

I don't see a rowvector-matrix mul routine, but then that must be avoided in C storage order (row-major). 

Martin Albrecht

unread,
Jul 14, 2011, 7:26:43 AM7/14/11
to sage-...@googlegroups.com
> 2. Matrix times column:
> Over prime fields, the native Sage implementation is clearly
> fastest, MeatAxe is slower, LinBox is VERY slow.
> Over non-prime fields, MeatAxe is fastest, the naive implementation
> of the "trick" is 5 times slower.
>
> 3. Row times matrix:
> MeatAxe is clearly fastest both over prime and over non-prime
> fields -- unless someone has already wrapped a specialised
> multiplication command of LinBox??

Note that these operations are quadratic just as the conversion Sage =>
LinBox. Hence, I'm not that surprised that our LinBoX wrapper performs poorly
on those.

Simon King

unread,
Jul 14, 2011, 7:53:16 AM7/14/11
to sage-devel
Hi Martin,

On 14 Jul., 13:26, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> Note that these operations are quadratic just as the conversion Sage =>
> LinBox.

I am a bit surprised. My impression from reading
sage.matrix.matrix_modn_dense was that Sage stores the matrix data in
LinBox format anyway. So, did I get the wrong impression?

Cheers,
Simon

Martin Albrecht

unread,
Jul 14, 2011, 8:10:20 AM7/14/11
to sage-...@googlegroups.com

Hi,

linbox.matrix_matrix_multiply

calls

linbox_modn_dense_matrix_matrix_multiply

from the LinBox SPKG (src/interfaces/sage/linbox.C)

which calls

static DenseMatrix<ModInt> linbox_new_modn_matrix(mod_int modulus, mod_int**
matrix, size_t nrows, si\
ze_t ncols) {

ModInt F((double)modulus);

DenseMatrix<ModInt> A ( F, nrows, ncols);

size_t i, j, k;
for (i=0; i < nrows; i++) {
for (j=0; j < ncols; j++) {
A.setEntry(i, j, (double)matrix[i][j]);
}
}
return A;
};

which copies the matrix to a new matrix A.

Simon King

unread,
Jul 14, 2011, 10:37:33 AM7/14/11
to sage-devel
Hi Martin,

On 14 Jul., 14:10, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> Hi,
>
>    linbox.matrix_matrix_multiply
>
> calls
>
>     linbox_modn_dense_matrix_matrix_multiply
>
> from the LinBox SPKG (src/interfaces/sage/linbox.C)
>
> ...
>
> which copies the matrix to a new matrix A.

I see. So, the plan for a proper wrapper would be to use Linbox's data
format in the first place.

Perhaps it would be best to study http://sage.math.washington.edu/home/burcin/dense_ctypes.patch

If really coercion is the only missing bit then it is worth to give it
a try.

Cheers,
Simon

Simon King

unread,
Jul 14, 2011, 10:40:24 AM7/14/11
to sage-devel
PS:

On 14 Jul., 16:37, Simon King <simon.k...@uni-jena.de> wrote:
> Perhaps it would be best to study http://sage.math.washington.edu/home/burcin/dense_ctypes.patch

Is there a ticket where that patch is dealt with?

Burcin Erocal

unread,
Jul 18, 2011, 5:37:40 AM7/18/11
to sage-...@googlegroups.com
Hi Simon,

AFAIK, there are 3 tickets on trac: 4260, 4968, 4258

We could merge these and close 2 of them as duplicates.

When I get a chance I will take a look at redoing the template
structure in the dense_ctypes.patch. I am back to normal schedule
only today, so this will probably happen sometime tomorrow.

Perhaps you can help with the pickling, coercion, etc. problems once I
get the linbox wrapper working. ;)


Cheers,
Burcin

Martin Albrecht

unread,
Jul 18, 2011, 6:05:55 AM7/18/11
to sage-...@googlegroups.com
> When I get a chance I will take a look at redoing the template
> structure in the dense_ctypes.patch. I am back to normal schedule
> only today, so this will probably happen sometime tomorrow.
>
> Perhaps you can help with the pickling, coercion, etc. problems once I
> get the linbox wrapper working. ;)

Cool! My plan was to work on this during the upcoming Sage Days in Seattle at
the end of August. If you have a working prototype by then I'll try to finish
it off :)

Simon King

unread,
Jul 18, 2011, 6:07:35 AM7/18/11
to sage-devel
Hi Burcin,

On 18 Jul., 11:37, Burcin Erocal <bur...@erocal.org> wrote:
> AFAIK, there are 3 tickets on trac: 4260, 4968, 4258

Thank you!

> Perhaps you can help with the pickling, coercion, etc. problems once I
> get the linbox wrapper working. ;)

I can at least try.

In the meantime, I work on improving MeatAxe. For example, they don't
use Strassen multiplication. So, it is tempting for me to see whether
MeatAxe can benefit from the stuff in sage.matrix.strassen. For my
application (project on computation of Ext algebras), I will try to
write everything generically, so that eventually both MeatAxe and Sage/
Linbox matrices will work.

Cheers,
Simon

Ivo Hedtke

unread,
Jul 18, 2011, 6:15:33 AM7/18/11
to sage-...@googlegroups.com
Hi,

I am new to sage-devel. Sorry if I asked questions that you perhaps discussed in
the the past.

Why do you wan't to use Strassen for Matrix Multiplication? This algorithms is
not as stable as naive algorithm (if one uses floating point numbers, if you use
it for symbol computations that is no problem). It is also not very fast for
small matrices. And if you don't use the in-memory version of the algorithm it
has extremly memory requirements.

Why not Winograd's method. Or something like Tiling or Loop-Unrolling?

Again, sorry if you discussed this in the past.

Best wishes,
Ivo.

Martin Albrecht

unread,
Jul 18, 2011, 6:32:33 AM7/18/11
to sage-...@googlegroups.com
On Monday 18 July 2011, Ivo Hedtke wrote:
> Hi,

Hi Ivo,

> I am new to sage-devel. Sorry if I asked questions that you perhaps
> discussed in the the past.
>
> Why do you wan't to use Strassen for Matrix Multiplication? This algorithms
> is not as stable as naive algorithm (if one uses floating point numbers,
> if you use it for symbol computations that is no problem).

We are discussing using it over finite fields and hence there is no stability
issue.

> It is also not very fast for small matrices.

That's why one usually implements a cutoff: below a certain dimension naive or
or some other multiplication is used: both when the input matrix is small and
when the recursion reaches the small dimension.

> And if you don't use the in-memory version
> of the algorithm it has extremly memory requirements.

The memory overhead is 2/3 n^2 if you set it up right.

> Why not Winograd's method.

Usually, when people talk about Strassen they mean Strassen-Winograd, is that
what you mean? It just reduces the number of additions.

> Or something like Tiling or Loop-Unrolling?

Of course, one implements the base case as efficiently as possible. In
LinBox's case it's outsourced to ATLAS, in other cases there are dedicated
implementations. While these implementation tricks are crucial for good
performance, they won't beat Strassen-Winograd since they are asymptotically
slower: n^3 vs. n^2.807.

Simon King

unread,
Jul 18, 2011, 6:42:27 AM7/18/11
to sage-devel
Hi Ivo,

On 18 Jul., 12:15, Ivo Hedtke <hed...@me.com> wrote:
>
> Why do you wan't to use Strassen for Matrix Multiplication? This algorithms is
> not as stable as naive algorithm (if one uses floating point numbers, if you use
> it for symbol computations that is no problem).

I work over finite fields.

> It is also not very fast for
> small matrices.

That's why there is a cutoff.

> Why not Winograd's method.

Practical answer: Since I found that there is a framework for the
Strassen multiplication in Sage (see sage.matrix.strassen), but I
don't know whether a similar support exists for the Coppersmith-
Winograd algorithm.

search_src('Winograd') told me that the multiplication algorithm used
in sage.matrix.matrix_mod2_dense is called "Strassen-Winograd", but
the asymptotics seems to coincide with what I know as Strassen
algorithm.

> Or something like Tiling or Loop-Unrolling?

Would that change the asymptotics?

Cheers,
Simon

Simon King

unread,
Jul 18, 2011, 6:44:52 AM7/18/11
to sage-devel
Hi Martin,

On 18 Jul., 12:32, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> > Why not Winograd's method.
>
> Usually, when people talk about Strassen they mean Strassen-Winograd, is that
> what you mean? It just reduces the number of additions.

My guess is that Ivo refers to Coppersmith-Winograd, which, according
to Wikipedia, *is* faster than Strassen-Winograd: n^2.376 versus
n^2.807.

Cheers,
Simon

Simon King

unread,
Jul 18, 2011, 6:46:55 AM7/18/11
to sage-devel
PS:

On 18 Jul., 12:44, Simon King <simon.k...@uni-jena.de> wrote:
> > Usually, when people talk about Strassen they mean Strassen-Winograd, is that
> > what you mean? It just reduces the number of additions.
>
> My guess is that Ivo refers to Coppersmith-Winograd, which, according
> to Wikipedia, *is* faster than Strassen-Winograd: n^2.376 versus
> n^2.807.

... but only pays off for matrix sizes too big to be practical. I
guess that's why it is not in Sage.

hedtke

unread,
Jul 18, 2011, 7:32:11 AM7/18/11