IIRC, this dates back to the time when we didn't ship ATLAS, i.e. LinBox could
have been built against some reference implementation BLAS (== slow).
It wasn't touched afterwards because people tried to move Matrix_modn_dense
over to LinBox as the native format (again, IIRC) and these projects stalled.
As far as I can tell the only downside of switching to LinBox is the increased
memory consumption.
Cheers,
Martin
Yes. It uses FFLAS/FFPACK for dense matrices over GF(p), which implements
Strassen on top of the numerical routines:
http://ljk.imag.fr/membres/Jean-Guillaume.Dumas/FFLAS/
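The underlying trick, in a rough sketch of my own (illustration only, not
FFLAS code): do the product with a numerical, BLAS-backed routine in double
precision and only reduce mod p afterwards, which is exact as long as no
entry of the exact integer product exceeds the 53-bit double mantissa:

import numpy as np

def mul_mod_p(A, B, p):
    # illustration of the idea only: multiply over GF(p) via a
    # double-precision BLAS product, then reduce; exact while
    # k*(p-1)^2 < 2^53, where k is the inner dimension
    C = np.dot(A.astype(np.float64), B.astype(np.float64))
    return np.remainder(C, p).astype(np.int64)

FFLAS then adds blocking, delayed reductions and Strassen on top of this.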
As far as I know it's the fastest implementation for medium-sized primes (16
bits or so). For very small primes one can do better, either
1) by bit-slicing:
http://arxiv.org/abs/0901.1413
or 2) by bit-packing:
http://wiki.sagemath.org/days10/JGDumasTalk
http://wiki.sagemath.org/days10?action=AttachFile&do=view&target=Dumas_SD10_talk.pdf
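To give the flavour of 2) (a toy example of mine, not the actual scheme from
the slides): store several small-prime entries per machine word and let a
single word operation act on all of them at once, e.g. over GF(5):

def pack(entries, radix=2**20):
    # one GF(5) entry per radix-digit of a single big integer
    w = 0
    for e in reversed(entries):
        w = w * radix + int(e)
    return w

def scale(w, c, n, radix=2**20):
    # multiply all packed entries by c with ONE integer multiplication,
    # then unpack and reduce mod 5; valid while c * 4 < radix
    w = w * c
    return [(w // radix**i) % radix % 5 for i in range(n)]

For instance scale(pack([1, 2, 3, 4]), 3, 4) gives [3, 1, 4, 2].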
Btw., the main author of FFLAS/FFPACK, Clément Pernet, was the first Sage
postdoc at UW.
It's a shame we are not using the libraries we ship to their full potential,
but so far no one (me included) has stepped up to fix this situation.
> > As far as I know it's the fastest implementation for medium size primes
> > (16 bits or so). For very small primes one can do better
>
> But actually linbox is already much faster over GF(3) than what is
> currently done in Sage.
Yep, it was just meant as a statement about the viability of the chosen
strategy (building on ATLAS).
> > It's a shame we are not using the libraries we ship to their full
> > potential, but so far no one (me included) has stepped up to fix this
> > situation.
>
> OK, but how much work would be needed?
It depends on how far you want to go:
1) Only switch multiplication over to LinBox; that's probably easy to do (see
the sketch below).
2) Switch everything to LinBox or FFLAS/FFPACK directly. This probably needs
some hacking around.
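For 1), the crude version would be little more than this (an untested sketch;
real code would want a cutoff below which the current implementation is kept,
since for small matrices the conversion overhead dominates):

def _mul_(self, right):
    # untested: route Matrix_modn_dense multiplication through the
    # existing LinBox path
    return self._multiply_linbox(right)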
> It appears to me that there is
> only very little overhead in calling the linbox routines (via
> sage.libs.linbox.linbox), since the data of a Matrix_modn_dense is
> already stored in the same format used by linbox (namely mod_int **).
> In particular, I wonder about your statement that the memory
> consumption would increase. Is that because of internal linbox
> computations?
> Here is how linbox is wrapped (see
> sage.matrix.matrix_modn_dense.Matrix_modn_dense._multiply_linbox):
> ...
> self._init_linbox()
> sig_on()
> linbox.matrix_matrix_multiply(ans._matrix, B._matrix,
> B._nrows, B._ncols)
> sig_off()
>
> self._init_linbox() seems rather light-weight, and
> linbox.matrix_matrix_multiply is also a very thin wrapper of the linbox
> routines. Similarly, one proceeds with other routines (computing
> echelon form and so on).
Actually, this is how we wrap our own wrapper for LinBox, which is part of
the LinBox SPKG. That wrapper is where the actual heavy lifting is done, such
as converting ints to doubles etc. So what you see in Sage gives you a
misleading impression.
Btw., the reason this wrapper exists is that LinBox takes so long to compile
and needs so much memory: a harmless doctest change thus triggered a long
recompilation. Hence we outsourced it.
> Actually I think it would even be easy enough to call the linbox
> routines directly (without using the sage.libs.linbox.linbox wrapper,
> thus avoiding all overhead). Since I need fast matrix arithmetic over
> finite fields, I have some motivation to rewrite _mul_ accordingly.
If one is serious about this then one should probably use the FFLAS/FFPACK
low-level routines directly (Burcin might have some old patches for this;
he's the last person I know of who tried), since that avoids compiling all
three layers of C++ in LinBox.
> But one question: Can linbox deal with non-prime finite fields?
They had some experimental code at some point IIRC but nothing for general
use.
> On 12 Jul., 21:09, Martin Albrecht <martinralbre...@googlemail.com>
> wrote:
> > If one is serious about this then one should probably use the
> > FFLAS/FFPACK low-level routines directly (Burcin might have some
> > old patches for this; he's the last person I know of who tried),
> > since that avoids compiling all three layers of C++ in LinBox.
>
> That would probably be beyond my scope.
At SD16, Clément and I looked into wrapping LinBox directly. The speeds
were promising, but we never took the time to fix pickling, etc.
The patch is most likely unusable by now:
http://sage.math.washington.edu/home/burcin/dense_ctypes.patch
I can quickly redo the work to create a template that handles the short
and long ints as special cases if there is interest. The LinBox calls
in that patch can also be done much better now that Cython has C++ support.
Cheers,
Burcin
> I, for one, would appreciate fast matrix multiplication over finite
> fields!
>
> However, there is one complication: In my current project (computing
> Ext-Algebras of basic algebras), I definitely need finite non-prime
> fields, and the current implementation in Sage for that case is
> definitely too slow. I just did an example of multiplying two 500x500
> matrices over GF(125). Sage took 11.24 seconds, MeatAxe (with some
> improvements that I implemented yesterday) took 0.2 seconds.
Do you know Tom Boothby and Robert Bradshaw's excellent paper
"Bitslicing and the Method of Four Russians Over Larger Finite Fields"?
http://arxiv.org/abs/0901.1413
I just took a naive version of that for a spin to see how well it compares:
# We represent matrices over polynomials as polynomials over matrices
sage: P.<A2,A1,A0,B2,B1,B0> = PolynomialRing(GF(5))
sage: R.<x> = P[]
sage: Q.<x> = R.quo(x^3 + 3*x + 3)
sage: f = A2*x^2 + A1*x + A0
sage: g = B2*x^2 + B1*x + B0
# so this is what we need to compute:
sage: f*g
(2*A2*B2 + A0*B2 + A1*B1 + A2*B0)*x^2 + (2*A2*B2 + 2*A1*B2 + 2*A2*B1 + A0*B1 + A1*B0)*x + 2*A1*B2 + 2*A2*B1 + A0*B0
# here, one would actually apply Karatsuba to get the number of mults down,
# but I was too lazy and only did the schoolbook version (fingers crossed I
# didn't introduce a bug)
sage: def my_mul(A2,A1,A0, B2,B1,B0):
...     A2B2_2 = 2*A2._multiply_linbox(B2)
...     A2B1_2 = 2*A2._multiply_linbox(B1)
...     A1B2_2 = 2*A1._multiply_linbox(B2)
...     C2 = A2B2_2 + A0._multiply_linbox(B2) + A1._multiply_linbox(B1) + A2._multiply_linbox(B0)
...     C1 = A2B2_2 + A1B2_2 + A2B1_2 + A0._multiply_linbox(B1) + A1._multiply_linbox(B0)
...     C0 = A1B2_2 + A2B1_2 + A0._multiply_linbox(B0)
...     return C2, C1, C0
sage: A = [random_matrix(GF(5), 500, 500) for _ in range(3)]
sage: B = [random_matrix(GF(5), 500, 500) for _ in range(3)]
sage: %time _ = my_mul(*(A+B))
CPU time: 0.22 s, Wall time: 0.22 s
sage: A = [random_matrix(GF(5), 2000, 2000) for _ in range(3)]
sage: B = [random_matrix(GF(5), 2000, 2000) for _ in range(3)]
sage: %time _ = my_mul(*(A+B))
CPU time: 10.23 s, Wall time: 10.57 s
So even with all the Python overhead (and modulo bugs which I may have
introduced) this approach seems to be comparable to whatever MeatAxe is doing.
I'd actually guess it gets faster than MeatAxe as dimensions grow, since AFAIK
MeatAxe doesn't do matrix multiplication asymptotically fast?
> If LinBox really can not "officially" deal with non-prime fields, I
> should probably stick with MeatAxe then (even though, in the small
> version, it only supports fields of size <256).
>
> My current MeatAxe wrapper (in my optional cohomology spkg) is based
> on an old MeatAxe version (2.2.4). Perhaps I could take the most
> recent version (2.4.13), modify my wrapper accordingly, and contribute
> it to Sage. The MeatAxe licence is good enough for Sage, isn't it? But
> is MeatAxe eligible to become a standard package? Currently, there
> only is an experimental meataxe-2.4.3.spkg.
>
> Correct me if the following is nonsense:
> - The MeatAxe sources should go into an spkg. Question: How should I
> deal with my modifications to the original MeatAxe sources? Do I
> remember correctly that the spkg should contain the unmodified
> sources, together with some automatically applied patches that are
> stored in a sub-directory patches/?
> - The spkg should create a library mtx.a, that should go into, say,
> SAGE_ROOT/local/lib/mtx-2.4.13/, so that my wrapper can easily link
> against it.
>
> Best regards,
> Simon
Cheers,
Martin
Ouch, yes.
Note that these timings are for schoolbook multiplication (i.e. 9 = 3^2
multiplications) rather than a Karatsuba-style formula, and they include the
Python overhead and the Sage <=> LinBox conversion overhead.
It seems one can bring this down to 7 multiplications:
Btw., Magma does it in 4.37 seconds on my computer, to give a sense of the
state of the art.
> > Btw., Magma does it in 4.37 seconds on my computer, to give a sense of
> > the state of the art.
>
> You mean for 2000x2000 over GF(125)? So, it is a long way to go, even
> for MeatAxe.
Yes, and it should get worse as you increase the degree. Or did they
implement Strassen in MeatAxe by now?
> BTW, Sage's current implementation needs 906.69 s for a
> single multiplication.
We suck at this, it seems :)
> If I understand correctly, the basic idea can be easily automatised:
> - The given base field F (=GF(125), for instance) determines a
> (Conway) polynomial p of degree d.
> - One starts with a polynomial ring R over the corresponding prime
> field F0 in 2*d variables, constructs the polynomial ring in x over
> R, and mods out p. The result is a quotient ring Q.
> - Multiplication of two "general" elements of Q, represented by degree-
> (d-1) polynomials whose coefficients are the 2*d variables from R,
> results in a d-tuple T of elements of R.
> - Matrices over F are represented by d-tuples of matrices over F0, and
> multiplication is obtained by inserting the elements of these d-tuples
> into the elements of T.
Yep, and it would already be much faster than what we have currently.
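Untested, but the whole recipe fits in a few lines (mul_formula is a made-up
name, and A here is A0 + A1*x + ..., i.e. coefficients indexed by degree):

sage: def mul_formula(q):
...     F = GF(q); d = F.degree(); p = F.polynomial()
...     P = PolynomialRing(F.prime_subfield(),
...             ['A%d'%i for i in range(d)] + ['B%d'%i for i in range(d)])
...     R.<x> = P[]
...     Q = R.quo(sum(R(c)*x^i for i, c in enumerate(p.list())))
...     xbar = Q.gen()
...     A = sum(P.gen(i)*xbar^i for i in range(d))
...     B = sum(P.gen(d + i)*xbar^i for i in range(d))
...     return (A*B).lift().list()   # the d-tuple T of coefficients
sage: mul_formula(4)   # matches the GF(4) example below
[A0*B0 + A1*B1, A1*B0 + A0*B1 + A1*B1]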
> But is it possible to *automatically* find a Karatsuba- or Toom-Cook-
> type formula that reduces the number of products that appear in T?
>
> An example from the paper: Over GF(4), the multiplication of A=A0+xA1
> with B=B0+xB1 (using p=x^2+x+1) would naively be
> (A0*B0+A1*B1) + x*(A1*B1+A1*B0+A0*B1)
> But using Karatsuba, it becomes
> (A0*B0+A1*B1) + x*((A0+A1)*(B0+B1)+A0*B0),
> which saves one product.
>
> Is there an automated procedure that finds the second formula, given
> the first? If not, how could one do better than with schoolboy
> multiplication, in the general case?
Well, the paper does search for Karatsuba-style formulas automatically, but
it's a non-trivial (brute-force-ish) computation. So far, there does not seem
to be anything better than this. But other people would know better than I do.
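(At least verifying a candidate formula mechanically is trivial once it has
been found; for the GF(4) example above:

sage: P.<A0,A1,B0,B1> = PolynomialRing(GF(2))
sage: (A0 + A1)*(B0 + B1) + A0*B0 == A1*B1 + A1*B0 + A0*B1
True

so a search only has to generate candidates, not check them by hand.)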
Yes, it disqualifies it.
> Would GPL 2+ be required?
Yes, that would be required.
I think people will have some issues with including an spkg that
replicates functionality which (with sufficient work) is probably already
available in Sage. Every single spkg that is added to Sage increases the
maintenance work of Sage a lot. Removing spkgs later is very hard.
-- William
Note that these operations are quadratic, just like the conversion Sage =>
LinBox. Hence, I'm not that surprised that our LinBox wrapper performs poorly
on those.
Hi,
linbox.matrix_matrix_multiply
calls
linbox_modn_dense_matrix_matrix_multiply
from the LinBox SPKG (src/interfaces/sage/linbox.C)
which calls
static DenseMatrix<ModInt> linbox_new_modn_matrix(mod_int modulus,
                                                  mod_int** matrix,
                                                  size_t nrows, size_t ncols) {
    ModInt F((double)modulus);
    DenseMatrix<ModInt> A(F, nrows, ncols);
    size_t i, j;
    for (i = 0; i < nrows; i++) {
        for (j = 0; j < ncols; j++) {
            A.setEntry(i, j, (double)matrix[i][j]);
        }
    }
    return A;
}
which copies the matrix to a new matrix A.
AFAIK, there are 3 tickets on trac: 4260, 4968, 4258
We could merge these and close 2 of them as duplicates.
When I get a chance I will take a look at redoing the template
structure in the dense_ctypes.patch. I am back to my normal schedule
only today, so this will probably happen sometime tomorrow.
Perhaps you can help with the pickling, coercion, etc. problems once I
get the LinBox wrapper working. ;)
Cheers,
Burcin
Cool! My plan was to work on this during the upcoming Sage Days in Seattle at
the end of August. If you have a working prototype by then I'll try to finish
it off :)
I am new to sage-devel, so sorry if I am asking questions that you have
perhaps discussed in the past.
Why do you want to use Strassen for matrix multiplication? This algorithm is
not as stable as the naive algorithm when one uses floating-point numbers (if
you use it for symbolic computations that is no problem). It is also not very
fast for small matrices. And if you don't use the in-place version of the
algorithm, it has extreme memory requirements.
Why not Winograd's method, or something like tiling or loop unrolling?
Again, sorry if you discussed this in the past.
Best wishes,
Ivo.