147 views

Skip to first unread message

Jul 12, 2011, 12:09:42 PM7/12/11

to sage-devel

Hi!

Working at #11589, I revisited the almost-forgotten ticket #8096 (no

activity since 16 months), which aims at speeding up matrix

multiplication.

#10763 (already merged) and #11589 (positive review) provide some

improvement. However, using the defaults in Sage, the multiplication

of two random dense 2000x2000 matrices over GF(3) still takes 8.34

seconds (it used to be 12 seconds).

That does not compare nicely with my tuned (i.e. unpublished) version

of C-MeatAxe, which completes the same task in 1.72 seconds.

So, obviously it makes sense to try and improve Sage's matrix

multiplication. That may actually be rather easy, because linbox can

do the above multiplication in 1.65 seconds (using _multiply_linbox).

Is there a compelling reason why linbox multiplication is not the

default for dense matrices over finite fields?

Best regards,

Simon

Working at #11589, I revisited the almost-forgotten ticket #8096 (no

activity since 16 months), which aims at speeding up matrix

multiplication.

#10763 (already merged) and #11589 (positive review) provide some

improvement. However, using the defaults in Sage, the multiplication

of two random dense 2000x2000 matrices over GF(3) still takes 8.34

seconds (it used to be 12 seconds).

That does not compare nicely with my tuned (i.e. unpublished) version

of C-MeatAxe, which completes the same task in 1.72 seconds.

So, obviously it makes sense to try and improve Sage's matrix

multiplication. That may actually be rather easy, because linbox can

do the above multiplication in 1.65 seconds (using _multiply_linbox).

Is there a compelling reason why linbox multiplication is not the

default for dense matrices over finite fields?

Best regards,

Simon

Jul 12, 2011, 12:26:22 PM7/12/11

to sage-...@googlegroups.com

> Is there a compelling reason why linbox multiplication is not the

> default for dense matrices over finite fields?

> default for dense matrices over finite fields?

IIRC, this dates back to the time when we didn't ship ATLAS, i.e. LinBox could

have been built against some reference implementation BLAS (== slow).

It wasn't touched afterwards because people tried to move Matrix_modn_dense

over to LinBox as the native format (again, IIRC) and these projects stalled.

As far as I can tell the only downside of switching to LinBox is the increased

memory consumption.

Cheers,

Martin

--

name: Martin Albrecht

_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99

_otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF

_www: http://martinralbrecht.wordpress.com/

_jab: martinr...@jabber.ccc.de

Jul 12, 2011, 1:10:40 PM7/12/11

to sage-...@googlegroups.com

Does linbox actually use blas for fast matrix multiplication over finite fields? I know its hard to tell from the linbox source code but I don't think floating point stuff could be faster than the direct application of Strassen's algorithm.

Jul 12, 2011, 1:48:08 PM7/12/11

to sage-...@googlegroups.com

Yes. It uses FFLAS/FFPACK for dense matrices over GF(p). Which in top of the

numerical routines it implements Strassen:

http://ljk.imag.fr/membres/Jean-Guillaume.Dumas/FFLAS/

As far as I know it's the fastest implementation for medium size primes (16

bits or so). For very small primes one can do better

1) by bit-slicing:

http://arxiv.org/abs/0901.1413

or 2) bit-packing:

http://wiki.sagemath.org/days10/JGDumasTalk

http://wiki.sagemath.org/days10?action=AttachFile&do=view&target=Dumas_SD10_talk.pdf

Btw. the main author of FFLAS/FFPACK Clément Pernet was the first Sage Postdoc

at UW.

It's a shame we are not using the libraries we ship to full potential but so

far no one stepped up (me included) to fix this situation.

Jul 12, 2011, 2:57:06 PM7/12/11

to sage-devel

Hi Martin,

On 12 Jul., 19:48, Martin Albrecht <martinralbre...@googlemail.com>

wrote:

currently done in Sage.

> It's a shame we are not using the libraries we ship to full potential but so

> far no one stepped up (me included) to fix this situation.

OK, but how much work would be needed? It appears to me that there is

only very little overhead in calling the linbox routines (via

sage.libs.linbox.linbox), since the data of a Matrix_modn_dense is

already stored in the same format used by linbox (namely mod_int **).

In particular, I wounder about your statement that the memory

consumption would increase. Is that because of internal linbox

computations?

Here is how linbox is wrapped (see

sage.matrix.matrix_modn_dense.Matrix_modn_dense._multiply_linbox):

...

self._init_linbox()

sig_on()

linbox.matrix_matrix_multiply(ans._matrix, B._matrix,

B._nrows, B._ncols)

sig_off()

self._init_linbox() seems rather light-weight, and

linbox.matrix_multiply is also a very thin wrapper of the linbox

routines. Similarly, one proceeds with other routines (computing

echelon form and so on)

Actually I think it would even be easy enough to call the linbox

routines directly (without using the sage.libs.linbox.linbox wrapper,

thus avoiding all overhead). Since I need fast matrix arithmetics over

finite fields, I have some motivation to rewrite _mul_ accordingly.

But one question: Can linbox deal with non-prime finite fields?

Cheers,

Simon

On 12 Jul., 19:48, Martin Albrecht <martinralbre...@googlemail.com>

wrote:

> As far as I know it's the fastest implementation for medium size primes (16

> bits or so). For very small primes one can do better

But actually linbox is already much faster over GF(3) than what is
> bits or so). For very small primes one can do better

currently done in Sage.

> It's a shame we are not using the libraries we ship to full potential but so

> far no one stepped up (me included) to fix this situation.

only very little overhead in calling the linbox routines (via

sage.libs.linbox.linbox), since the data of a Matrix_modn_dense is

already stored in the same format used by linbox (namely mod_int **).

In particular, I wounder about your statement that the memory

consumption would increase. Is that because of internal linbox

computations?

Here is how linbox is wrapped (see

sage.matrix.matrix_modn_dense.Matrix_modn_dense._multiply_linbox):

...

self._init_linbox()

sig_on()

linbox.matrix_matrix_multiply(ans._matrix, B._matrix,

B._nrows, B._ncols)

sig_off()

self._init_linbox() seems rather light-weight, and

linbox.matrix_multiply is also a very thin wrapper of the linbox

routines. Similarly, one proceeds with other routines (computing

echelon form and so on)

Actually I think it would even be easy enough to call the linbox

routines directly (without using the sage.libs.linbox.linbox wrapper,

thus avoiding all overhead). Since I need fast matrix arithmetics over

finite fields, I have some motivation to rewrite _mul_ accordingly.

But one question: Can linbox deal with non-prime finite fields?

Cheers,

Simon

Jul 12, 2011, 3:09:18 PM7/12/11

to sage-...@googlegroups.com, Clement Pernet, Burcin Erocal

Hi Simon,

> wrote:

> > As far as I know it's the fastest implementation for medium size primes

> > (16 bits or so). For very small primes one can do better

>

> But actually linbox is already much faster over GF(3) than what is

> currently done in Sage.

Yep, it was just meant as a statement about the viability of the chosen

strategy (using ATLAS)

> > It's a shame we are not using the libraries we ship to full potential but

> > so far no one stepped up (me included) to fix this situation.

>

> OK, but how much work would be needed?

It depends on how far you want to go:

1) Only switch multiplication over to LinBox, that's probably easy to do

2) Switch everything to LinBox of FFLAS/FFPACK directly. This probably needs

some hacking around.

> It appears to me that there is

> only very little overhead in calling the linbox routines (via

> sage.libs.linbox.linbox), since the data of a Matrix_modn_dense is

> already stored in the same format used by linbox (namely mod_int **).

> In particular, I wounder about your statement that the memory

> consumption would increase. Is that because of internal linbox

> computations?

> Here is how linbox is wrapped (see

> sage.matrix.matrix_modn_dense.Matrix_modn_dense._multiply_linbox):

> ...

> self._init_linbox()

> sig_on()

> linbox.matrix_matrix_multiply(ans._matrix, B._matrix,

> B._nrows, B._ncols)

> sig_off()

>

> self._init_linbox() seems rather light-weight, and

> linbox.matrix_multiply is also a very thin wrapper of the linbox

> routines. Similarly, one proceeds with other routines (computing

> echelon form and so on)

Actually, this is how we are wrapping our wrapper for LinBox which is part of

the LinBox SPKG. This is where the actual heavy lifting is done such as

converting ints to doubles etc. So what you see in Sage gives you a misleading

impression.

Btw. the reason why this wrapper exists is that LinBox takes so long to

compile and takes so much memory. A harmless doctest change thus triggered a

long compilation. Hence we outsourced it.

> Actually I think it would even be easy enough to call the linbox

> routines directly (without using the sage.libs.linbox.linbox wrapper,

> thus avoiding all overhead). Since I need fast matrix arithmetics over

> finite fields, I have some motivation to rewrite _mul_ accordingly.

If one is serious about this then one should probably use the FFLAS/FFPACK

low-level routines directly (Burcin might have some old patches for this, he's

the last person I know of who tried) since that avoids compiling all thee

layers of C++ in LinBox.

> But one question: Can linbox deal with non-prime finite fields?

They had some experimental code at some point IIRC but nothing for general

use.

Jul 12, 2011, 5:12:18 PM7/12/11

to sage-devel

Hi Martin,

On 12 Jul., 21:09, Martin Albrecht <martinralbre...@googlemail.com>

wrote:

> > But one question: Can linbox deal with non-prime finite fields?

>

> They had some experimental code at some point IIRC but nothing for general

> use.

What a pity - I need non-prime fields as well.

But perhaps it is feasible to create a sub-class Matrix_modp_dense of

Matrix_modn_dense for matrices over finite prime fields -- overriding

_mul_ etc. to use the functions from sage.libs.linbox.linbox. In that

way, one could benefit from the LinBox speed at least to some extent.

I suppose that fits into the scope of #8096.

Best regards,

Simon

On 12 Jul., 21:09, Martin Albrecht <martinralbre...@googlemail.com>

wrote:

> If one is serious about this then one should probably use the FFLAS/FFPACK

> low-level routines directly (Burcin might have some old patches for this, he's

> the last person I know of who tried) since that avoids compiling all thee

> layers of C++ in LinBox.

That would probably be beyond my scope.
> low-level routines directly (Burcin might have some old patches for this, he's

> the last person I know of who tried) since that avoids compiling all thee

> layers of C++ in LinBox.

> > But one question: Can linbox deal with non-prime finite fields?

>

> They had some experimental code at some point IIRC but nothing for general

> use.

But perhaps it is feasible to create a sub-class Matrix_modp_dense of

Matrix_modn_dense for matrices over finite prime fields -- overriding

_mul_ etc. to use the functions from sage.libs.linbox.linbox. In that

way, one could benefit from the LinBox speed at least to some extent.

I suppose that fits into the scope of #8096.

Best regards,

Simon

Jul 12, 2011, 5:35:51 PM7/12/11

to sage-...@googlegroups.com

On Tue, 12 Jul 2011 14:12:18 -0700 (PDT)

Simon King <simon...@uni-jena.de> wrote:

Simon King <simon...@uni-jena.de> wrote:

> On 12 Jul., 21:09, Martin Albrecht <martinralbre...@googlemail.com>

> wrote:

> > If one is serious about this then one should probably use the

> > FFLAS/FFPACK low-level routines directly (Burcin might have some

> > old patches for this, he's the last person I know of who tried)

> > since that avoids compiling all thee layers of C++ in LinBox.

>

> That would probably be beyond my scope.

At SD16, Clement and I looked into wrapping linbox directly. The speeds

were promising, but we never took the time to fix pickling, etc.

The patch is most likely unusable by now:

http://sage.math.washington.edu/home/burcin/dense_ctypes.patch

I can quickly redo the work to create a template that handles the short

and long ints as special cases if there is interest. The linbox calls

in that patch can also be much better now that Cython has C++ support.

Cheers,

Burcin

Jul 13, 2011, 5:48:08 AM7/13/11

to sage-devel

Hi!

On 12 Jul., 23:35, Burcin Erocal <bur...@erocal.org> wrote:

> On Tue, 12 Jul 2011 14:12:18 -0700 (PDT)

fields!

However, there is one complication: In my current project (computing

Ext-Algebras of basic algebras), I definitely need finite non-prime

fields, and the current implementation in Sage for that case is

definitely too slow. I just did an example of multiplying two 500x500

matrices over GF(125). Sage took 11.24 seconds, MeatAxe (with some

improvements that I implemented yesterday) took 0.2 seconds.

If LinBox really can not "officially" deal with non-prime fields, I

should probably stick with MeatAxe then (even though, in the small

version, it only supports fields of size <256).

My current MeatAxe wrapper (in my optional cohomology spkg) is based

on an old MeatAxe version (2.2.4). Perhaps I could take the most

recent version (2.4.13), modify my wrapper accordingly, and contribute

it to Sage. The MeatAxe licence is good enough for Sage, isn't it? But

is MeatAxe eligible to become a standard package? Currently, there

only is an experimental meataxe-2.4.3.spkg.

Correct me if the following is nonsense:

- The MeatAxe sources should go into an spkg. Question: How should I

deal with my modifications to the original MeatAxe sources? Do I

remember correctly that the spkg should contain the unmodified

sources, together with some automatically applied patches that are

stored in a sub-directory patches/?

- The spkg should create a library mtx.a, that should go into, say,

SAGE_ROOT/local/lib/mtx-2.4.13/, so that my wrapper can easily link

against it.

Best regards,

Simon

On 12 Jul., 23:35, Burcin Erocal <bur...@erocal.org> wrote:

> On Tue, 12 Jul 2011 14:12:18 -0700 (PDT)

> At SD16, Clement and I looked into wrapping linbox directly. The speeds

> were promising, but we never took the time to fix pickling, etc.

I, for one, would appreciate fast matrix multiplication over finite
> were promising, but we never took the time to fix pickling, etc.

fields!

However, there is one complication: In my current project (computing

Ext-Algebras of basic algebras), I definitely need finite non-prime

fields, and the current implementation in Sage for that case is

definitely too slow. I just did an example of multiplying two 500x500

matrices over GF(125). Sage took 11.24 seconds, MeatAxe (with some

improvements that I implemented yesterday) took 0.2 seconds.

If LinBox really can not "officially" deal with non-prime fields, I

should probably stick with MeatAxe then (even though, in the small

version, it only supports fields of size <256).

My current MeatAxe wrapper (in my optional cohomology spkg) is based

on an old MeatAxe version (2.2.4). Perhaps I could take the most

recent version (2.4.13), modify my wrapper accordingly, and contribute

it to Sage. The MeatAxe licence is good enough for Sage, isn't it? But

is MeatAxe eligible to become a standard package? Currently, there

only is an experimental meataxe-2.4.3.spkg.

Correct me if the following is nonsense:

- The MeatAxe sources should go into an spkg. Question: How should I

deal with my modifications to the original MeatAxe sources? Do I

remember correctly that the spkg should contain the unmodified

sources, together with some automatically applied patches that are

stored in a sub-directory patches/?

- The spkg should create a library mtx.a, that should go into, say,

SAGE_ROOT/local/lib/mtx-2.4.13/, so that my wrapper can easily link

against it.

Best regards,

Simon

Jul 13, 2011, 6:11:06 AM7/13/11

to sage-...@googlegroups.com

Hi,

> I, for one, would appreciate fast matrix multiplication over finite

> fields!

>

> However, there is one complication: In my current project (computing

> Ext-Algebras of basic algebras), I definitely need finite non-prime

> fields, and the current implementation in Sage for that case is

> definitely too slow. I just did an example of multiplying two 500x500

> matrices over GF(125). Sage took 11.24 seconds, MeatAxe (with some

> improvements that I implemented yesterday) took 0.2 seconds.

Do you know Tom Boothby's and Robert Bradshaw's excellent paper

Bitslicing and the Method of Four Russians Over Larger Finite Fields

http://arxiv.org/abs/0901.1413

I just took a naive version of that for a spin to see how well it compares:

# We represent matrices over polynomials as polynomials over matrices

sage: P.<A2,A1,A0,B2,B1,B0> = PolynomialRing(GF(5))

sage: R.<x> = P[]

sage: Q.<x> = R.quo(x^3 + 3*x + 3)

sage: f = (A2*x^2+A1*x+A0)

sage: g = (A2*x^2+A1*x+A0)

# so this is what we need to compute:

sage: f*g

(2*A2^2 + A1^2 + 2*A2*A0)*x^2 + (2*A2^2 - A2*A1 + 2*A1*A0)*x - A2*A1 + A0^2

# here, one would actually apply a Karatsuba to get the number of mults down,

# but I was too lazy and only did this (fingers crossed I didn't introduce a

# bug)

sage: def my_mul(A2,A1,A0, B2,B1,B0):

... A22 = A2._multiply_linbox(A2)

... A2A1 = A2._multiply_linbox(A1)

... A22_2 = 2*A22

... C2 = (A22_2 + A1._multiply_linbox(A1) + 2*A2._multiply_linbox(A0))

... C1 = (A22_2 - A2A1 + 2*A1._multiply_linbox(A0))

... C0 = - A2A1 + A0._multiply_linbox(A2)

... return C2,C1,C0

sage: A = [random_matrix(GF(5),500,500) for _ in range(3)]

sage: B = [random_matrix(GF(5),500,500) for _ in range(3)]

sage: %time _ =my_mul(*(A+B))

CPU time: 0.22 s, Wall time: 0.22 s

sage: A = [random_matrix(GF(5),2000,2000) for _ in range(3)]

sage: B = [random_matrix(GF(5),2000,2000) for _ in range(3)]

sage: %time _ =my_mul(*(A+B))

CPU time: 10.23 s, Wall time: 10.57 s

So even with all the Python overhead (and modulo bugs which I may have

introduced) this approach seems to be comparable to whatever Meataxe is doing.

I'd actually guess it gets faster than MeatAxe as dimensions grow since AFAIK

MeatAxe doesn't do matrix multiplication asymptotically fast?

> If LinBox really can not "officially" deal with non-prime fields, I

> should probably stick with MeatAxe then (even though, in the small

> version, it only supports fields of size <256).

>

> My current MeatAxe wrapper (in my optional cohomology spkg) is based

> on an old MeatAxe version (2.2.4). Perhaps I could take the most

> recent version (2.4.13), modify my wrapper accordingly, and contribute

> it to Sage. The MeatAxe licence is good enough for Sage, isn't it? But

> is MeatAxe eligible to become a standard package? Currently, there

> only is an experimental meataxe-2.4.3.spkg.

>

> Correct me if the following is nonsense:

> - The MeatAxe sources should go into an spkg. Question: How should I

> deal with my modifications to the original MeatAxe sources? Do I

> remember correctly that the spkg should contain the unmodified

> sources, together with some automatically applied patches that are

> stored in a sub-directory patches/?

> - The spkg should create a library mtx.a, that should go into, say,

> SAGE_ROOT/local/lib/mtx-2.4.13/, so that my wrapper can easily link

> against it.

>

> Best regards,

> Simon

Cheers,

Jul 13, 2011, 7:23:47 AM7/13/11

to sage-devel

Hi Martin,

On 13 Jul., 12:11, Martin Albrecht <martinralbre...@googlemail.com>

wrote:

> sage: P.<A2,A1,A0,B2,B1,B0> = PolynomialRing(GF(5))

> sage: R.<x> = P[]

> sage: Q.<x> = R.quo(x^3 + 3*x + 3)

>

> sage: f = (A2*x^2+A1*x+A0)

> sage: g = (A2*x^2+A1*x+A0)

>

> # so this is what we need to compute:

>

> sage: f*g

> (2*A2^2 + A1^2 + 2*A2*A0)*x^2 + (2*A2^2 - A2*A1 + 2*A1*A0)*x - A2*A1 + A0^2

That is the same as f*f. Did you intend to do

sage: g = (B2*x^2+B1*x+B0)

sage: f*g

(2*A2*B2 + A0*B2 + A1*B1 + A2*B0)*x^2 + (2*A2*B2 + 2*A1*B2 + 2*A2*B1 +

A0*B1 + A1*B0)*x + 2*A1*B2 + 2*A2*B1 + A0*B0

?

> sage: def my_mul(A2,A1,A0, B2,B1,B0):

> ... A22 = A2._multiply_linbox(A2)

> ... A2A1 = A2._multiply_linbox(A1)

> ... A22_2 = 2*A22

> ... C2 = (A22_2 + A1._multiply_linbox(A1) + 2*A2._multiply_linbox(A0))

> ... C1 = (A22_2 - A2A1 + 2*A1._multiply_linbox(A0))

> ... C0 = - A2A1 + A0._multiply_linbox(A2)

> ... return C2,C1,C0

B2,B1,B0 are not used, it just computes the square of the first

argument.

f*g from above yields

sage: def my_mul(A2,A1,A0, B2,B1,B0):

....: A0B0 = A0._multiply_linbox(B0)

....: A0B1 = A0._multiply_linbox(B1)

....: A0B2 = A0._multiply_linbox(B2)

....: A1B0 = A1._multiply_linbox(B0)

....: A1B1 = A1._multiply_linbox(B1)

....: A1B2 = A1._multiply_linbox(B2)*2

....: A2B0 = A2._multiply_linbox(B0)

....: A2B1 = A2._multiply_linbox(B1)*2

....: A2B2 = A2._multiply_linbox(B2)*2

....: C2 = A2B2+A0B2+A1B1+A2B0

....: C1 = A2B2+A1B2+A2B1+A0B1+A1B0

....: C0 = A1B2+A2B1+A0B0

....: return C2,C1,C0

....:

Wall time: 0.48 s

Wall time: 15.51 s

It is a bit slower than MeatAxe (for 2000x2000 over GF(125), it needs

11.68 seconds on my computer), but I think that's not bad for the

first attempt.

Best regards,

Simon

On 13 Jul., 12:11, Martin Albrecht <martinralbre...@googlemail.com>

wrote:

> Do you know Tom Boothby's and Robert Bradshaw's excellent paper

>

> Bitslicing and the Method of Four Russians Over Larger Finite Fields

>

> http://arxiv.org/abs/0901.1413

Thank you for your hint!
>

> Bitslicing and the Method of Four Russians Over Larger Finite Fields

>

> http://arxiv.org/abs/0901.1413

> sage: P.<A2,A1,A0,B2,B1,B0> = PolynomialRing(GF(5))

> sage: R.<x> = P[]

> sage: Q.<x> = R.quo(x^3 + 3*x + 3)

>

> sage: f = (A2*x^2+A1*x+A0)

> sage: g = (A2*x^2+A1*x+A0)

>

> # so this is what we need to compute:

>

> sage: f*g

> (2*A2^2 + A1^2 + 2*A2*A0)*x^2 + (2*A2^2 - A2*A1 + 2*A1*A0)*x - A2*A1 + A0^2

sage: g = (B2*x^2+B1*x+B0)

sage: f*g

(2*A2*B2 + A0*B2 + A1*B1 + A2*B0)*x^2 + (2*A2*B2 + 2*A1*B2 + 2*A2*B1 +

A0*B1 + A1*B0)*x + 2*A1*B2 + 2*A2*B1 + A0*B0

?

> sage: def my_mul(A2,A1,A0, B2,B1,B0):

> ... A22 = A2._multiply_linbox(A2)

> ... A2A1 = A2._multiply_linbox(A1)

> ... A22_2 = 2*A22

> ... C2 = (A22_2 + A1._multiply_linbox(A1) + 2*A2._multiply_linbox(A0))

> ... C1 = (A22_2 - A2A1 + 2*A1._multiply_linbox(A0))

> ... C0 = - A2A1 + A0._multiply_linbox(A2)

> ... return C2,C1,C0

argument.

f*g from above yields

sage: def my_mul(A2,A1,A0, B2,B1,B0):

....: A0B0 = A0._multiply_linbox(B0)

....: A0B1 = A0._multiply_linbox(B1)

....: A0B2 = A0._multiply_linbox(B2)

....: A1B0 = A1._multiply_linbox(B0)

....: A1B1 = A1._multiply_linbox(B1)

....: A1B2 = A1._multiply_linbox(B2)*2

....: A2B0 = A2._multiply_linbox(B0)

....: A2B1 = A2._multiply_linbox(B1)*2

....: A2B2 = A2._multiply_linbox(B2)*2

....: C2 = A2B2+A0B2+A1B1+A2B0

....: C1 = A2B2+A1B2+A2B1+A0B1+A1B0

....: C0 = A1B2+A2B1+A0B0

....: return C2,C1,C0

....:

sage: A = [random_matrix(GF(5),500,500) for _ in range(3)]

sage: B = [random_matrix(GF(5),500,500) for _ in range(3)]

sage: %time _ =my_mul(*(A+B))

CPU times: user 0.42 s, sys: 0.05 s, total: 0.47 s
sage: B = [random_matrix(GF(5),500,500) for _ in range(3)]

sage: %time _ =my_mul(*(A+B))

Wall time: 0.48 s

sage: A = [random_matrix(GF(5),2000,2000) for _ in range(3)]

sage: B = [random_matrix(GF(5),2000,2000) for _ in range(3)]

sage: %time _ =my_mul(*(A+B))

CPU times: user 15.08 s, sys: 0.39 s, total: 15.46 s
sage: B = [random_matrix(GF(5),2000,2000) for _ in range(3)]

sage: %time _ =my_mul(*(A+B))

Wall time: 15.51 s

It is a bit slower than MeatAxe (for 2000x2000 over GF(125), it needs

11.68 seconds on my computer), but I think that's not bad for the

first attempt.

Best regards,

Simon

Jul 13, 2011, 7:50:31 AM7/13/11

to sage-...@googlegroups.com

> > sage: P.<A2,A1,A0,B2,B1,B0> = PolynomialRing(GF(5))

> > sage: R.<x> = P[]

> > sage: Q.<x> = R.quo(x^3 + 3*x + 3)

> >

> > sage: f = (A2*x^2+A1*x+A0)

> > sage: g = (A2*x^2+A1*x+A0)

> >

> > # so this is what we need to compute:

> >

> > sage: f*g

> > (2*A2^2 + A1^2 + 2*A2*A0)*x^2 + (2*A2^2 - A2*A1 + 2*A1*A0)*x - A2*A1 +

> > A0^2

>

> That is the same as f*f. Did you intend to do

> sage: g = (B2*x^2+B1*x+B0)

> sage: f*g

> (2*A2*B2 + A0*B2 + A1*B1 + A2*B0)*x^2 + (2*A2*B2 + 2*A1*B2 + 2*A2*B1 +

> A0*B1 + A1*B0)*x + 2*A1*B2 + 2*A2*B1 + A0*B0

> ?

> > sage: R.<x> = P[]

> > sage: Q.<x> = R.quo(x^3 + 3*x + 3)

> >

> > sage: f = (A2*x^2+A1*x+A0)

> > sage: g = (A2*x^2+A1*x+A0)

> >

> > # so this is what we need to compute:

> >

> > sage: f*g

> > (2*A2^2 + A1^2 + 2*A2*A0)*x^2 + (2*A2^2 - A2*A1 + 2*A1*A0)*x - A2*A1 +

> > A0^2

>

> That is the same as f*f. Did you intend to do

> sage: g = (B2*x^2+B1*x+B0)

> sage: f*g

> (2*A2*B2 + A0*B2 + A1*B1 + A2*B0)*x^2 + (2*A2*B2 + 2*A1*B2 + 2*A2*B1 +

> A0*B1 + A1*B0)*x + 2*A1*B2 + 2*A2*B1 + A0*B0

> ?

Ouch, yes.

Note that these timings are for the school book multiplication (i.e. 9 = 3^2

multiplications) instead of Karatsuba-style, Python overhead and Sage <=>

LinBox overhead.

It seems one can bring this down to 7 multiplications:

Btw. Magma does it in 4.37 seconds on my computer, to get a sense of what is

the state-of-the-art.

Jul 13, 2011, 8:40:27 AM7/13/11

to sage-devel

Hi Martin,

On 13 Jul., 13:50, Martin Albrecht <martinralbre...@googlemail.com>

wrote:

most naive, given the basic idea.

> Btw. Magma does it in 4.37 seconds on my computer, to get a sense of what is

> the state-of-the-art.

You mean for 2000x2000 over GF(125)? So, it is a long way to go, even

for MeatAxe. BTW, Sage's current implementation needs 906.69 s for a

single multiplication.

If I understand correctly, the basic idea can be easily automatised:

- The given base field F (=GF(125), for instance) determines a

(Conway) polynomial p of degree d.

- One starts with a polynomial ring R over the corresponding prime

field F0 and 2*d variables, constructs the polynomial ring in x over

R, and mods out p. The result is a quotient ring Q.

- Multiplication of two "general" elements of Q, represented by degree-

(d-1) polynomials whose coefficients are the 2*d variables from R,

results in a d-tuple T of elements of R.

- Matrices over F are represented by d-tuples of matrices over F0, and

multiplication is obtained by inserting the elements of these d-tuples

into the elements of T.

But is it possible to *automatically* find a Karatsuba- or Toom-Cook-

type formula that reduces the number of products that appear in T?

An example from the paper: Over GF(4), the multiplication of A=A0+xA1

with B=B0+xB1 (using p=x^2+x+1) would naively be

(A0*B0+A1*B1) + x*(A1*B1+A1*B0+A0*B1)

But using Karatsuba, it becomes

(A0*B0+A1*B1) + x*((A0+A1)*(B0+B1)+A0*B0),

which saves one product.

Is there an automated procedure that finds the second formula, given

the first? If not, how could one do better than with schoolboy

multiplication, in the general case?

Best regards,

Simon

On 13 Jul., 13:50, Martin Albrecht <martinralbre...@googlemail.com>

wrote:

> Note that these timings are for the school book multiplication (i.e. 9 = 3^2

> multiplications) instead of Karatsuba-style, Python overhead and Sage <=>

> LinBox overhead.

Sure. As you said in your previous post, that implementation is the
> multiplications) instead of Karatsuba-style, Python overhead and Sage <=>

> LinBox overhead.

most naive, given the basic idea.

> Btw. Magma does it in 4.37 seconds on my computer, to get a sense of what is

> the state-of-the-art.

for MeatAxe. BTW, Sage's current implementation needs 906.69 s for a

single multiplication.

If I understand correctly, the basic idea can be easily automatised:

- The given base field F (=GF(125), for instance) determines a

(Conway) polynomial p of degree d.

- One starts with a polynomial ring R over the corresponding prime

field F0 and 2*d variables, constructs the polynomial ring in x over

R, and mods out p. The result is a quotient ring Q.

- Multiplication of two "general" elements of Q, represented by degree-

(d-1) polynomials whose coefficients are the 2*d variables from R,

results in a d-tuple T of elements of R.

- Matrices over F are represented by d-tuples of matrices over F0, and

multiplication is obtained by inserting the elements of these d-tuples

into the elements of T.

But is it possible to *automatically* find a Karatsuba- or Toom-Cook-

type formula that reduces the number of products that appear in T?

An example from the paper: Over GF(4), the multiplication of A=A0+xA1

with B=B0+xB1 (using p=x^2+x+1) would naively be

(A0*B0+A1*B1) + x*(A1*B1+A1*B0+A0*B1)

But using Karatsuba, it becomes

(A0*B0+A1*B1) + x*((A0+A1)*(B0+B1)+A0*B0),

which saves one product.

Is there an automated procedure that finds the second formula, given

the first? If not, how could one do better than with schoolboy

multiplication, in the general case?

Best regards,

Simon

Jul 13, 2011, 9:03:09 AM7/13/11

to sage-...@googlegroups.com

Hi,

> > Btw. Magma does it in 4.37 seconds on my computer, to get a sense of what

> > is the state-of-the-art.

>

> You mean for 2000x2000 over GF(125)? So, it is a long way to go, even

> for MeatAxe.

Yes, and it should get worse as you increase the degree or did they implement

Strassen in MeatAxe by now?

> BTW, Sage's current implementation needs 906.69 s for a

> single multiplication.

We suck at this it seems :)

> If I understand correctly, the basic idea can be easily automatised:

> - The given base field F (=GF(125), for instance) determines a

> (Conway) polynomial p of degree d.

> - One starts with a polynomial ring R over the corresponding prime

> field F0 and 2*d variables, constructs the polynomial ring in x over

> R, and mods out p. The result is a quotient ring Q.

> - Multiplication of two "general" elements of Q, represented by degree-

> (d-1) polynomials whose coefficients are the 2*d variables from R,

> results in a d-tuple T of elements of R.

> - Matrices over F are represented by d-tuples of matrices over F0, and

> multiplication is obtained by inserting the elements of these d-tuples

> into the elements of T.

Yep & it would be much faster already than what we have currently.

> But is it possible to *automatically* find a Karatsuba- or Toom-Cook-

> type formula that reduces the number of products that appear in T?

>

> An example from the paper: Over GF(4), the multiplication of A=A0+xA1

> with B=B0+xB1 (using p=x^2+x+1) would naively be

> (A0*B0+A1*B1) + x*(A1*B1+A1*B0+A0*B1)

> But using Karatsuba, it becomes

> (A0*B0+A1*B1) + x*((A0+A1)*(B0+B1)+A0*B0),

> which saves one product.

>

> Is there an automated procedure that finds the second formula, given

> the first? If not, how could one do better than with schoolboy

> multiplication, in the general case?

Well, the paper does search for Karatsuba-style formulas automatically, but

it's a non-trivial (brute-force-ish) computation. So far, there does not seem

to be anything better than this. But other people would know better than me.

Jul 13, 2011, 9:11:03 AM7/13/11

to sage-devel

Hi Martin,

On 13 Jul., 15:03, Martin Albrecht <martinralbre...@googlemail.com>

wrote:

According to the Boothby-Bradshaw article you were referring to, their

ideas were supposed to be implemented in Sage. Is that done, yet? I

wouldn't like to double the work...

Best regards,

Simon

On 13 Jul., 15:03, Martin Albrecht <martinralbre...@googlemail.com>

wrote:

> > BTW, Sage's current implementation needs 906.69 s for a

> > single multiplication.

>

> We suck at this it seems :)

Totally.
> > single multiplication.

>

> We suck at this it seems :)

According to the Boothby-Bradshaw article you were referring to, their

ideas were supposed to be implemented in Sage. Is that done, yet? I

wouldn't like to double the work...

Best regards,

Simon

Jul 13, 2011, 9:25:25 AM7/13/11

to sage-devel

Hi,

Most questions about finite fields matmul have been answered, in

short:

- one could quickly fix the pb that sage does not use linbox by

default (I'm surprised by this fact)

- in the current state, operands are converted from ints to double,

hence a overhead in both time

and memory.

- Burcin & I started to address this by rewriting the matrix-modn-

dense class using floating point

representation: a prototype code was working but we wen't into

troubles when polishing it and making

it work with the coercion system. Unfortunately I never found time to

get back to it since then.

- Further timing improvements are expected by using floats instead of

doubles for tiny fields (eg

roughly GF(p<100)) with no pain (already available in LinBox)

- Non prime fields are supported in LinBox, as long as their

cardinality remains small (LinBox

implementation has to be improved following http://hal.archives-ouvertes.fr/hal-00315772

Theorem 2)

but the wrapping in sage is missing. So I think it might probably

be simpler to wrapp it rather

than creating a new spkg with MeatAxe.

Clément

PS: weird, my reply by email has been rejected and I'm now posting via

the web interface.

Most questions about finite fields matmul have been answered, in

short:

- one could quickly fix the pb that sage does not use linbox by

default (I'm surprised by this fact)

- in the current state, operands are converted from ints to double,

hence a overhead in both time

and memory.

- Burcin & I started to address this by rewriting the matrix-modn-

dense class using floating point

representation: a prototype code was working but we wen't into

troubles when polishing it and making

it work with the coercion system. Unfortunately I never found time to

get back to it since then.

- Further timing improvements are expected by using floats instead of

doubles for tiny fields (eg

roughly GF(p<100)) with no pain (already available in LinBox)

- Non prime fields are supported in LinBox, as long as their

cardinality remains small (LinBox

implementation has to be improved following http://hal.archives-ouvertes.fr/hal-00315772

Theorem 2)

but the wrapping in sage is missing. So I think it might probably

be simpler to wrapp it rather

than creating a new spkg with MeatAxe.

Clément

PS: weird, my reply by email has been rejected and I'm now posting via

the web interface.

Jul 13, 2011, 10:00:34 AM7/13/11

to sage-devel

Hi!

On the one hand, I need multiplication of square matrices over finite

non-prime fields. From the things readily available in Sage (+ my

MeatAxe wrapper), MeatAxe is best, the native Sage implementation

sucks, and the trick described in the Boothby-Bradshaw paper (using

LinBox for the prime field under the hood) is almost as good as

MeatAxe even in naive implementation.

On the other hand, I need multiplication of square matrices with

vectors, again over non-prime fields. Here, Meataxe is clearly best.

Over GF(125), multiplication of a 2000x2000 matrix with a 2000x1

matrix takes 24.6 ms, native Sage needs 327 ms, and the Boothby-

Bradshaw trick needs 109 ms. The problem is that LinBox appears to be

rather slow in multiplying a 2000x2000 matrix with a 2000x1 matrix

over GF(5) -- slower than native Sage, actually.

But I suppose that there is a special LinBox function for

multiplication with a vector. How is that function called? Is it

wrapped?

Things are getting confusing. I need a break.

Best regards,

Simon

On the one hand, I need multiplication of square matrices over finite

non-prime fields. From the things readily available in Sage (+ my

MeatAxe wrapper), MeatAxe is best, the native Sage implementation

sucks, and the trick described in the Boothby-Bradshaw paper (using

LinBox for the prime field under the hood) is almost as good as

MeatAxe even in naive implementation.

On the other hand, I need multiplication of square matrices with

vectors, again over non-prime fields. Here, Meataxe is clearly best.

Over GF(125), multiplication of a 2000x2000 matrix with a 2000x1

matrix takes 24.6 ms, native Sage needs 327 ms, and the Boothby-

Bradshaw trick needs 109 ms. The problem is that LinBox appears to be

rather slow in multiplying a 2000x2000 matrix with a 2000x1 matrix

over GF(5) -- slower than native Sage, actually.

But I suppose that there is a special LinBox function for

multiplication with a vector. How is that function called? Is it

wrapped?

Things are getting confusing. I need a break.

Best regards,

Simon

Jul 13, 2011, 12:29:19 PM7/13/11

to sage-devel

Hi Clement,

On 13 Jul., 15:25, Clement Pernet <clement.per...@gmail.com> wrote:

> ...

correctly, finishing the LinBox wrapper requires C++ knowledge, which

I don't really have. But I do have a wrapper for an old (but tuned)

version of MeatAxe, and I do have a wrapper for a more recent (but

untuned) version of MeatAxe. The former is part of my optional

cohomology-spkg.

Currently, it seems that in that way I would get most easily a good

performance for the functions that I need:

1. Matrix times Matrix:

MeatAxe is only slightly slower over prime fields than LinBox.

Over non-prime fields, it is faster than a naive implementation of

the Boothby-Bradshaw trick.

2. Matrix times column:

Over prime fields, the native Sage implementation is clearly

fastest, MeatAxe is slower, LinBox is VERY slow.

Over non-prime fields, MeatAxe is fastest, the naive implementation

of the "trick" is 5 times slower.

3. Row times matrix:

MeatAxe is clearly fastest both over prime and over non-prime

fields -- unless someone has already wrapped a specialised

multiplication command of LinBox??

What I mainly need is 1. and 3., both over prime and over non-prime

fields. It seems to me that extending my MeatAxe wrapper is (for me)

easier than extending the LinBox wrapper, and gives reasonably good

performance. I'll go for it.

QUESTION:

According to their web site, "The MeatAxe should run on most UNIX

platforms including Linux, NetBSD, Ultrix, Solaris, HP-UX, and on

Windows NT." So, a MeatAxe wrapper might actually be (eventually) a

candidate for a standard package, isn't it?

OTOH, the current MeatAxe licence is GPL 2. Does that disqualify

MeatAxe? Would GPL 2+ be required?

> PS: weird, my reply by email has been rejected and I'm now posting via

> the web interface.

I do that all the time...

Cheers,

Simon

On 13 Jul., 15:25, Clement Pernet <clement.per...@gmail.com> wrote:

> ...

> but the wrapping in sage is missing. So I think it might probably

> be simpler to wrapp it rather

> than creating a new spkg with MeatAxe.

I am not so sure if that would be easier FOR ME. If I understand
> be simpler to wrapp it rather

> than creating a new spkg with MeatAxe.

correctly, finishing the LinBox wrapper requires C++ knowledge, which

I don't really have. But I do have a wrapper for an old (but tuned)

version of MeatAxe, and I do have a wrapper for a more recent (but

untuned) version of MeatAxe. The former is part of my optional

cohomology-spkg.

Currently, it seems that in that way I would get most easily a good

performance for the functions that I need:

1. Matrix times Matrix:

MeatAxe is only slightly slower over prime fields than LinBox.

Over non-prime fields, it is faster than a naive implementation of

the Boothby-Bradshaw trick.

2. Matrix times column:

Over prime fields, the native Sage implementation is clearly

fastest, MeatAxe is slower, LinBox is VERY slow.

Over non-prime fields, MeatAxe is fastest, the naive implementation

of the "trick" is 5 times slower.

3. Row times matrix:

MeatAxe is clearly fastest both over prime and over non-prime

fields -- unless someone has already wrapped a specialised

multiplication command of LinBox??

What I mainly need is 1. and 3., both over prime and over non-prime

fields. It seems to me that extending my MeatAxe wrapper is (for me)

easier than extending the LinBox wrapper, and gives reasonably good

performance. I'll go for it.

QUESTION:

According to their web site, "The MeatAxe should run on most UNIX

platforms including Linux, NetBSD, Ultrix, Solaris, HP-UX, and on

Windows NT." So, a MeatAxe wrapper might actually be (eventually) a

candidate for a standard package, isn't it?

OTOH, the current MeatAxe licence is GPL 2. Does that disqualify

MeatAxe? Would GPL 2+ be required?

> PS: weird, my reply by email has been rejected and I'm now posting via

> the web interface.

Cheers,

Simon

Jul 13, 2011, 11:55:41 PM7/13/11

to sage-...@googlegroups.com

On Wed, Jul 13, 2011 at 6:29 PM, Simon King <simon...@uni-jena.de> wrote:

> QUESTION:

> According to their web site, "The MeatAxe should run on most UNIX

> platforms including Linux, NetBSD, Ultrix, Solaris, HP-UX, and on

> Windows NT." So, a MeatAxe wrapper might actually be (eventually) a

> candidate for a standard package, isn't it?

> OTOH, the current MeatAxe licence is GPL 2.

> Does that disqualify MeatAxe?

> QUESTION:

> According to their web site, "The MeatAxe should run on most UNIX

> platforms including Linux, NetBSD, Ultrix, Solaris, HP-UX, and on

> Windows NT." So, a MeatAxe wrapper might actually be (eventually) a

> candidate for a standard package, isn't it?

> OTOH, the current MeatAxe licence is GPL 2.

> Does that disqualify MeatAxe?

Yes, it disqualifies it.

> Would GPL 2+ be required?

Yes, that would be required.

I think people will have some issues with including an spkg that

replicates functionality available probably in code already in Sage,

with sufficient work. Every single spkg that is added to sage

increases the maintenance work of Sage a lot. Removing spkg's later

is very hard.

-- William

Jul 14, 2011, 3:21:04 AM7/14/11

to sage-devel

Hi William,

On 14 Jul., 05:55, William Stein <wst...@gmail.com> wrote:

> I think people will have some issues with including an spkg that

> replicates functionality available probably in code already in Sage,

> with sufficient work.

From my perspective, your last line is decisive: I would probably not

easily be able to make Sage full use of LinBox - my C++ knowledge is

rather limited, and I know nothing about LinBox. So, in order to use

that for my project, I would have to wait for other people doing the

work.

Moreover, my benchmarks seem to indicate that *for the purpose of my

project* MeatAxe performs relatively well (unfortunately not compared

with Magma, as it seems).

> Every single spkg that is added to sage

> increases the maintenance work of Sage a lot. Removing spkg's later

> is very hard.

OK. Then I'll probably proceed as I did with my old cohomology spkg:

I'll use MeatAxe, but not as a separate spkg.

Cheers,

Simon

On 14 Jul., 05:55, William Stein <wst...@gmail.com> wrote:

> I think people will have some issues with including an spkg that

> replicates functionality available probably in code already in Sage,

> with sufficient work.

easily be able to make Sage full use of LinBox - my C++ knowledge is

rather limited, and I know nothing about LinBox. So, in order to use

that for my project, I would have to wait for other people doing the

work.

Moreover, my benchmarks seem to indicate that *for the purpose of my

project* MeatAxe performs relatively well (unfortunately not compared

with Magma, as it seems).

> Every single spkg that is added to sage

> increases the maintenance work of Sage a lot. Removing spkg's later

> is very hard.

I'll use MeatAxe, but not as a separate spkg.

Cheers,

Simon

Jul 14, 2011, 3:28:00 AM7/14/11

to sage-devel

PS:

the MeatAxe executables, but does not provide the MeatAxe C-library

(which I would need) or a wrapper.

> Every single spkg that is added to sage

> increases the maintenance work of Sage a lot. Removing spkg's later

> is very hard.

Note that there already is an experimental MeatAxe spkg. It provides
> increases the maintenance work of Sage a lot. Removing spkg's later

> is very hard.

the MeatAxe executables, but does not provide the MeatAxe C-library

(which I would need) or a wrapper.

Jul 14, 2011, 4:59:58 AM7/14/11

to sage-...@googlegroups.com

The LinBox matrix-vector multiplication is

vectorMul (Vector1 &w, const Matrix &A, const Vector2 &v) // w <- Av

vectorAxpyin (Vector1 &y, const Matrix &A, const Vector2 &x) // y <- y+Ax

I don't see a rowvector-matrix mul routine, but then that must be avoided in C storage order (row-major).

Jul 14, 2011, 7:26:43 AM7/14/11

to sage-...@googlegroups.com

> 2. Matrix times column:

> Over prime fields, the native Sage implementation is clearly

> fastest, MeatAxe is slower, LinBox is VERY slow.

> Over non-prime fields, MeatAxe is fastest, the naive implementation

> of the "trick" is 5 times slower.

>

> 3. Row times matrix:

> MeatAxe is clearly fastest both over prime and over non-prime

> fields -- unless someone has already wrapped a specialised

> multiplication command of LinBox??

> Over prime fields, the native Sage implementation is clearly

> fastest, MeatAxe is slower, LinBox is VERY slow.

> Over non-prime fields, MeatAxe is fastest, the naive implementation

> of the "trick" is 5 times slower.

>

> 3. Row times matrix:

> MeatAxe is clearly fastest both over prime and over non-prime

> fields -- unless someone has already wrapped a specialised

> multiplication command of LinBox??

Note that these operations are quadratic just as the conversion Sage =>

LinBox. Hence, I'm not that surprised that our LinBoX wrapper performs poorly

on those.

Jul 14, 2011, 7:53:16 AM7/14/11

to sage-devel

Hi Martin,

On 14 Jul., 13:26, Martin Albrecht <martinralbre...@googlemail.com>

wrote:

sage.matrix.matrix_modn_dense was that Sage stores the matrix data in

LinBox format anyway. So, did I get the wrong impression?

Cheers,

Simon

On 14 Jul., 13:26, Martin Albrecht <martinralbre...@googlemail.com>

wrote:

> Note that these operations are quadratic just as the conversion Sage =>

> LinBox.

I am a bit surprised. My impression from reading
> LinBox.

sage.matrix.matrix_modn_dense was that Sage stores the matrix data in

LinBox format anyway. So, did I get the wrong impression?

Cheers,

Simon

Jul 14, 2011, 8:10:20 AM7/14/11

to sage-...@googlegroups.com

Hi,

linbox.matrix_matrix_multiply

calls

linbox_modn_dense_matrix_matrix_multiply

from the LinBox SPKG (src/interfaces/sage/linbox.C)

which calls

static DenseMatrix<ModInt> linbox_new_modn_matrix(mod_int modulus, mod_int**

matrix, size_t nrows, si\

ze_t ncols) {

ModInt F((double)modulus);

DenseMatrix<ModInt> A ( F, nrows, ncols);

size_t i, j, k;

for (i=0; i < nrows; i++) {

for (j=0; j < ncols; j++) {

A.setEntry(i, j, (double)matrix[i][j]);

}

}

return A;

};

which copies the matrix to a new matrix A.

Jul 14, 2011, 10:37:33 AM7/14/11

to sage-devel

Hi Martin,

On 14 Jul., 14:10, Martin Albrecht <martinralbre...@googlemail.com>

wrote:

format in the first place.

Perhaps it would be best to study http://sage.math.washington.edu/home/burcin/dense_ctypes.patch

If really coercion is the only missing bit then it is worth to give it

a try.

Cheers,

Simon

On 14 Jul., 14:10, Martin Albrecht <martinralbre...@googlemail.com>

wrote:

> Hi,

>

> linbox.matrix_matrix_multiply

>

> calls

>

> linbox_modn_dense_matrix_matrix_multiply

>

> from the LinBox SPKG (src/interfaces/sage/linbox.C)

>

> ...
>

> linbox.matrix_matrix_multiply

>

> calls

>

> linbox_modn_dense_matrix_matrix_multiply

>

> from the LinBox SPKG (src/interfaces/sage/linbox.C)

>

>

> which copies the matrix to a new matrix A.

I see. So, the plan for a proper wrapper would be to use Linbox's data
> which copies the matrix to a new matrix A.

format in the first place.

Perhaps it would be best to study http://sage.math.washington.edu/home/burcin/dense_ctypes.patch

If really coercion is the only missing bit then it is worth to give it

a try.

Cheers,

Simon

Jul 14, 2011, 10:40:24 AM7/14/11

to sage-devel

PS:

On 14 Jul., 16:37, Simon King <simon.k...@uni-jena.de> wrote:

> Perhaps it would be best to study http://sage.math.washington.edu/home/burcin/dense_ctypes.patch

Is there a ticket where that patch is dealt with?

On 14 Jul., 16:37, Simon King <simon.k...@uni-jena.de> wrote:

> Perhaps it would be best to study http://sage.math.washington.edu/home/burcin/dense_ctypes.patch

Jul 18, 2011, 5:37:40 AM7/18/11

to sage-...@googlegroups.com

Hi Simon,

AFAIK, there are 3 tickets on trac: 4260, 4968, 4258

We could merge these and close 2 of them as duplicates.

When I get a chance I will take a look at redoing the template

structure in the dense_ctypes.patch. I am back to normal schedule

only today, so this will probably happen sometime tomorrow.

Perhaps you can help with the pickling, coercion, etc. problems once I

get the linbox wrapper working. ;)

Cheers,

Burcin

Jul 18, 2011, 6:05:55 AM7/18/11

to sage-...@googlegroups.com

> When I get a chance I will take a look at redoing the template

> structure in the dense_ctypes.patch. I am back to normal schedule

> only today, so this will probably happen sometime tomorrow.

>

> Perhaps you can help with the pickling, coercion, etc. problems once I

> get the linbox wrapper working. ;)

> structure in the dense_ctypes.patch. I am back to normal schedule

> only today, so this will probably happen sometime tomorrow.

>

> Perhaps you can help with the pickling, coercion, etc. problems once I

> get the linbox wrapper working. ;)

Cool! My plan was to work on this during the upcoming Sage Days in Seattle at

the end of August. If you have a working prototype by then I'll try to finish

it off :)

Jul 18, 2011, 6:07:35 AM7/18/11

to sage-devel

Hi Burcin,

On 18 Jul., 11:37, Burcin Erocal <bur...@erocal.org> wrote:

> AFAIK, there are 3 tickets on trac: 4260, 4968, 4258

Thank you!

> Perhaps you can help with the pickling, coercion, etc. problems once I

> get the linbox wrapper working. ;)

I can at least try.

In the meantime, I work on improving MeatAxe. For example, they don't

use Strassen multiplication. So, it is tempting for me to see whether

MeatAxe can benefit from the stuff in sage.matrix.strassen. For my

application (project on computation of Ext algebras), I will try to

write everything generically, so that eventually both MeatAxe and Sage/

Linbox matrices will work.

Cheers,

Simon

On 18 Jul., 11:37, Burcin Erocal <bur...@erocal.org> wrote:

> AFAIK, there are 3 tickets on trac: 4260, 4968, 4258

> Perhaps you can help with the pickling, coercion, etc. problems once I

> get the linbox wrapper working. ;)

In the meantime, I work on improving MeatAxe. For example, they don't

use Strassen multiplication. So, it is tempting for me to see whether

MeatAxe can benefit from the stuff in sage.matrix.strassen. For my

application (project on computation of Ext algebras), I will try to

write everything generically, so that eventually both MeatAxe and Sage/

Linbox matrices will work.

Cheers,

Simon

Jul 18, 2011, 6:15:33 AM7/18/11

to sage-...@googlegroups.com

Hi,

I am new to sage-devel. Sorry if I asked questions that you perhaps discussed in

the the past.

Why do you wan't to use Strassen for Matrix Multiplication? This algorithms is

not as stable as naive algorithm (if one uses floating point numbers, if you use

it for symbol computations that is no problem). It is also not very fast for

small matrices. And if you don't use the in-memory version of the algorithm it

has extremly memory requirements.

Why not Winograd's method. Or something like Tiling or Loop-Unrolling?

Again, sorry if you discussed this in the past.

Best wishes,

Ivo.

Jul 18, 2011, 6:32:33 AM7/18/11

to sage-...@googlegroups.com

On Monday 18 July 2011, Ivo Hedtke wrote:

> Hi,

> Hi,

Hi Ivo,

> I am new to sage-devel. Sorry if I asked questions that you perhaps

> discussed in the the past.

>

> Why do you wan't to use Strassen for Matrix Multiplication? This algorithms

> is not as stable as naive algorithm (if one uses floating point numbers,

> if you use it for symbol computations that is no problem).

We are discussing using it over finite fields and hence there is no stability

issue.

> It is also not very fast for small matrices.

That's why one usually implements a cutoff: below a certain dimension naive or

or some other multiplication is used: both when the input matrix is small and

when the recursion reaches the small dimension.

> And if you don't use the in-memory version

> of the algorithm it has extremly memory requirements.

The memory overhead is 2/3 n^2 if you set it up right.

> Why not Winograd's method.

Usually, when people talk about Strassen they mean Strassen-Winograd, is that

what you mean? It just reduces the number of additions.

> Or something like Tiling or Loop-Unrolling?

Of course, one implements the base case as efficiently as possible. In

LinBox's case it's outsourced to ATLAS, in other cases there are dedicated

implementations. While these implementation tricks are crucial for good

performance, they won't beat Strassen-Winograd since they are asymptotically

slower: n^3 vs. n^2.807.

Jul 18, 2011, 6:42:27 AM7/18/11

to sage-devel

Hi Ivo,

On 18 Jul., 12:15, Ivo Hedtke <hed...@me.com> wrote:

>

> Why do you wan't to use Strassen for Matrix Multiplication? This algorithms is

> not as stable as naive algorithm (if one uses floating point numbers, if you use

> it for symbol computations that is no problem).

I work over finite fields.

> It is also not very fast for

> small matrices.

That's why there is a cutoff.

> Why not Winograd's method.

Practical answer: Since I found that there is a framework for the

Strassen multiplication in Sage (see sage.matrix.strassen), but I

don't know whether a similar support exists for the Coppersmith-

Winograd algorithm.

search_src('Winograd') told me that the multiplication algorithm used

in sage.matrix.matrix_mod2_dense is called "Strassen-Winograd", but

the asymptotics seems to coincide with what I know as Strassen

algorithm.

> Or something like Tiling or Loop-Unrolling?

Would that change the asymptotics?

Cheers,

Simon

On 18 Jul., 12:15, Ivo Hedtke <hed...@me.com> wrote:

>

> Why do you wan't to use Strassen for Matrix Multiplication? This algorithms is

> not as stable as naive algorithm (if one uses floating point numbers, if you use

> it for symbol computations that is no problem).

> It is also not very fast for

> small matrices.

> Why not Winograd's method.

Strassen multiplication in Sage (see sage.matrix.strassen), but I

don't know whether a similar support exists for the Coppersmith-

Winograd algorithm.

search_src('Winograd') told me that the multiplication algorithm used

in sage.matrix.matrix_mod2_dense is called "Strassen-Winograd", but

the asymptotics seems to coincide with what I know as Strassen

algorithm.

> Or something like Tiling or Loop-Unrolling?

Cheers,

Simon

Jul 18, 2011, 6:44:52 AM7/18/11

to sage-devel

Hi Martin,

On 18 Jul., 12:32, Martin Albrecht <martinralbre...@googlemail.com>

wrote:

to Wikipedia, *is* faster than Strassen-Winograd: n^2.376 versus

n^2.807.

Cheers,

Simon

On 18 Jul., 12:32, Martin Albrecht <martinralbre...@googlemail.com>

wrote:

> > Why not Winograd's method.

>

> Usually, when people talk about Strassen they mean Strassen-Winograd, is that

> what you mean? It just reduces the number of additions.

My guess is that Ivo refers to Coppersmith-Winograd, which, according
>

> Usually, when people talk about Strassen they mean Strassen-Winograd, is that

> what you mean? It just reduces the number of additions.

to Wikipedia, *is* faster than Strassen-Winograd: n^2.376 versus

n^2.807.

Cheers,

Simon

Jul 18, 2011, 6:46:55 AM7/18/11

to sage-devel

PS:

On 18 Jul., 12:44, Simon King <simon.k...@uni-jena.de> wrote:

> > Usually, when people talk about Strassen they mean Strassen-Winograd, is that

> > what you mean? It just reduces the number of additions.

>

> My guess is that Ivo refers to Coppersmith-Winograd, which, according

> to Wikipedia, *is* faster than Strassen-Winograd: n^2.376 versus

> n^2.807.

... but only pays off for matrix sizes too big to be practical. I

guess that's why it is not in Sage.

On 18 Jul., 12:44, Simon King <simon.k...@uni-jena.de> wrote:

> > Usually, when people talk about Strassen they mean Strassen-Winograd, is that

> > what you mean? It just reduces the number of additions.

>

> My guess is that Ivo refers to Coppersmith-Winograd, which, according

> to Wikipedia, *is* faster than Strassen-Winograd: n^2.376 versus

> n^2.807.

guess that's why it is not in Sage.

Jul 18, 2011, 7:32:11 AM7/18/11

to sage-...@googlegroups.com

Hi Simon and Martin,

I know Strassen-Winograd. Strassen uses 18 additions, Strassen-Winograd only 15, which is optimal for 7 multiplications ;-)

With the in-memory variant I mean: "Boyer, Dumans, Pernet and Zhou: Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm. International Symposium on Symbolic and Algebraic Computation 2009."

This needs only a constant number of auxiliary variables, i think.

OK. If you work on finite fields, there is no stability issue.

Loop Unrolling or Tilling doesn't change the asymptotic complexity. But we can use Knowledge about the computer architecture (L1 Cache, etc), to speed up the computations by a factor of 2.

Please remember that the parameters for Strassens algorightm from his original paper are far away from optimal. I mean that the original method only was build for matrices of the size m*2^k. Strassen said something like: If you ant to calculate the product of two n x n matrices for an arbitrary n, choose m=f(n) and k=g(n) for some f and g (see his paper from 1969). These f and g are NOT a good choice. There are better ones from R. Probert and others.

Best wishes,

Ivo.

PS:

On 18 Jul., 12:44, Simon King <simon.k...@uni-jena.de> wrote:

> > Usually, when people talk about Strassen they mean Strassen-Winograd, is that

> > what you mean? It just reduces the number of additions.

>

> My guess is that Ivo refers to Coppersmith-Winograd, which, according

> to Wikipedia, *is* faster than Strassen-Winograd: n^2.376 versus

> n^2807.

... but only pays off for matrix sizes too big to be practical. I

guess that's why it is not in Sage.

--

To post to this group, send an email to sage-...@googlegroups.com

To unsubscribe from this group, send an email to sage-devel+...@googlegroups.com

For more options, visit this group at http://groups.google.com/group/sage-devel

URL: http://www.sagemath.org

Jul 18, 2011, 8:09:47 AM7/18/11

to sage-devel

Hi Ivo,

On 18 Jul., 13:32, hedtke <hed...@me.com> wrote:

> I know Strassen-Winograd. Strassen uses 18 additions, Strassen-Winograd only 15, which is optimal for 7 multiplications ;-)

Looking at sage.matrix.strassen.strassen_multiply_window, I find 15

additions/subtractions. So, it seems that Strassen==Strassen-Winograd

in Sage.

> With the in-memory variant I mean: "Boyer, Dumans, Pernet and Zhou: Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm. International Symposium on Symbolic and Algebraic Computation 2009."

I am not so sure about the memory efficiency of the implementation,

though. One comment says

# todo: we can probably save some memory in these

# operations by reusing some of the buffers, if we interleave

# these additions with the multiplications (below)

Moreover, the algorithm is called recursively, which probably means

that there are a lot of temporary variables stored in memory.

Thank you for the pointer to a memory efficient version!

Best regards,

Simon

On 18 Jul., 13:32, hedtke <hed...@me.com> wrote:

> I know Strassen-Winograd. Strassen uses 18 additions, Strassen-Winograd only 15, which is optimal for 7 multiplications ;-)

additions/subtractions. So, it seems that Strassen==Strassen-Winograd

in Sage.

> With the in-memory variant I mean: "Boyer, Dumans, Pernet and Zhou: Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm. International Symposium on Symbolic and Algebraic Computation 2009."

though. One comment says

# todo: we can probably save some memory in these

# operations by reusing some of the buffers, if we interleave

# these additions with the multiplications (below)

Moreover, the algorithm is called recursively, which probably means

that there are a lot of temporary variables stored in memory.

Thank you for the pointer to a memory efficient version!

Best regards,

Simon