IIRC, this dates back to the time when we didn't ship ATLAS, i.e. LinBox could
have been built against some reference implementation BLAS (== slow).
It wasn't touched afterwards because people tried to move Matrix_modn_dense
over to LinBox as the native format (again, IIRC) and these projects stalled.
As far as I can tell the only downside of switching to LinBox is the increased
memory consumption.
Cheers,
Martin
--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF
_www: http://martinralbrecht.wordpress.com/
_jab: martinr...@jabber.ccc.de
Yes. It uses FFLAS/FFPACK for dense matrices over GF(p), which, on top of the
numerical routines, implements Strassen:
http://ljk.imag.fr/membres/Jean-Guillaume.Dumas/FFLAS/
As far as I know it's the fastest implementation for medium size primes (16
bits or so). For very small primes one can do better
1) by bit-slicing:
http://arxiv.org/abs/0901.1413
or 2) bit-packing:
http://wiki.sagemath.org/days10/JGDumasTalk
http://wiki.sagemath.org/days10?action=AttachFile&do=view&target=Dumas_SD10_talk.pdf
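As a toy illustration of the bit-packing idea (my own minimal sketch, not the
scheme from the slides): several small entries live in disjoint bit fields of
one machine word, so a single word-wide addition adds all of them at once, and
the reduction mod p is delayed until a field could overflow.

```python
# Toy bit-packing with delayed reduction over GF(5) (hypothetical demo code).
P = 5          # the small prime
FIELD = 8      # bits per packed entry
LANES = 8      # entries per 64-bit word

def pack(vals):
    """Pack LANES entries (each < P) into one integer."""
    assert len(vals) == LANES and all(0 <= v < P for v in vals)
    w = 0
    for i, v in enumerate(vals):
        w |= v << (FIELD * i)
    return w

def unpack(w):
    """Extract the LANES fields and reduce each mod P."""
    mask = (1 << FIELD) - 1
    return [((w >> (FIELD * i)) & mask) % P for i in range(LANES)]

# With 8-bit fields and entries < 5 we can accumulate up to
# (2**FIELD - 1) // (P - 1) = 63 packed words before a field overflows.
vecs = [[(i * j + j) % P for i in range(LANES)] for j in range(50)]
acc = 0
for v in vecs:
    acc += pack(v)          # one addition handles all 8 lanes
result = unpack(acc)

expected = [sum(v[i] for v in vecs) % P for i in range(LANES)]
assert result == expected
```

The real win in the talk comes from doing this with SIMD-width words and
tuned reduction schedules; the sketch only shows why word-wide adds with
delayed reduction are correct.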
Btw. the main author of FFLAS/FFPACK, Clément Pernet, was the first Sage
postdoc at UW.
It's a shame we are not using the libraries we ship to their full potential,
but so far no one has stepped up (me included) to fix this situation.
> wrote:
> > As far as I know it's the fastest implementation for medium size primes
> > (16 bits or so). For very small primes one can do better
>
> But actually linbox is already much faster over GF(3) than what is
> currently done in Sage.
Yep, it was just meant as a statement about the viability of the chosen
strategy (using ATLAS).
> > It's a shame we are not using the libraries we ship to full potential but
> > so far no one stepped up (me included) to fix this situation.
>
> OK, but how much work would be needed?
It depends on how far you want to go:
1) Only switch multiplication over to LinBox; that's probably easy to do.
2) Switch everything to LinBox or FFLAS/FFPACK directly. This probably needs
some hacking around.
> It appears to me that there is
> only very little overhead in calling the linbox routines (via
> sage.libs.linbox.linbox), since the data of a Matrix_modn_dense is
> already stored in the same format used by linbox (namely mod_int **).
> In particular, I wonder about your statement that the memory
> consumption would increase. Is that because of internal linbox
> computations?
> Here is how linbox is wrapped (see
> sage.matrix.matrix_modn_dense.Matrix_modn_dense._multiply_linbox):
> ...
> self._init_linbox()
> sig_on()
> linbox.matrix_matrix_multiply(ans._matrix, B._matrix,
> B._nrows, B._ncols)
> sig_off()
>
> self._init_linbox() seems rather light-weight, and
> linbox.matrix_multiply is also a very thin wrapper of the linbox
> routines. Similarly, one proceeds with other routines (computing
> echelon form and so on)
Actually, this is how we wrap our own wrapper for LinBox, which is part of the
LinBox SPKG. That is where the actual heavy lifting is done, such as
converting ints to doubles etc. So what you see in Sage gives you a misleading
impression.
Btw. the reason this wrapper exists is that LinBox takes so long and so much
memory to compile. A harmless doctest change would thus trigger a long
recompilation, hence we outsourced it.
> Actually I think it would even be easy enough to call the linbox
> routines directly (without using the sage.libs.linbox.linbox wrapper,
> thus avoiding all overhead). Since I need fast matrix arithmetics over
> finite fields, I have some motivation to rewrite _mul_ accordingly.
If one is serious about this then one should probably use the FFLAS/FFPACK
low-level routines directly (Burcin might have some old patches for this; he's
the last person I know of who tried), since that avoids compiling all three
layers of C++ in LinBox.
> But one question: Can linbox deal with non-prime finite fields?
They had some experimental code at some point IIRC but nothing for general
use.
> On 12 Jul., 21:09, Martin Albrecht <martinralbre...@googlemail.com>
> wrote:
> > If one is serious about this then one should probably use the
> > FFLAS/FFPACK low-level routines directly (Burcin might have some
> > old patches for this, he's the last person I know of who tried)
> > since that avoids compiling all three layers of C++ in LinBox.
>
> That would probably be beyond my scope.
At SD16, Clement and I looked into wrapping linbox directly. The speeds
were promising, but we never took the time to fix pickling, etc.
The patch is most likely unusable by now:
http://sage.math.washington.edu/home/burcin/dense_ctypes.patch
I can quickly redo the work to create a template that handles the short
and long ints as special cases if there is interest. The linbox calls
in that patch can also be much better now that Cython has C++ support.
Cheers,
Burcin
> I, for one, would appreciate fast matrix multiplication over finite
> fields!
>
> However, there is one complication: In my current project (computing
> Ext-Algebras of basic algebras), I definitely need finite non-prime
> fields, and the current implementation in Sage for that case is
> definitely too slow. I just did an example of multiplying two 500x500
> matrices over GF(125). Sage took 11.24 seconds, MeatAxe (with some
> improvements that I implemented yesterday) took 0.2 seconds.
Do you know Tom Boothby's and Robert Bradshaw's excellent paper
Bitslicing and the Method of Four Russians Over Larger Finite Fields
http://arxiv.org/abs/0901.1413
I just took a naive version of that for a spin to see how well it compares:
# We represent matrices over polynomials as polynomials over matrices
sage: P.<A2,A1,A0,B2,B1,B0> = PolynomialRing(GF(5))
sage: R.<x> = P[]
sage: Q.<x> = R.quo(x^3 + 3*x + 3)
sage: f = (A2*x^2+A1*x+A0)
sage: g = (A2*x^2+A1*x+A0)
# so this is what we need to compute:
sage: f*g
(2*A2^2 + A1^2 + 2*A2*A0)*x^2 + (2*A2^2 - A2*A1 + 2*A1*A0)*x - A2*A1 + A0^2
# here, one would actually apply a Karatsuba to get the number of mults down,
# but I was too lazy and only did this (fingers crossed I didn't introduce a
# bug)
sage: def my_mul(A2,A1,A0, B2,B1,B0):
... A22 = A2._multiply_linbox(A2)
... A2A1 = A2._multiply_linbox(A1)
... A22_2 = 2*A22
... C2 = (A22_2 + A1._multiply_linbox(A1) + 2*A2._multiply_linbox(A0))
... C1 = (A22_2 - A2A1 + 2*A1._multiply_linbox(A0))
... C0 = - A2A1 + A0._multiply_linbox(A0)
... return C2,C1,C0
sage: A = [random_matrix(GF(5),500,500) for _ in range(3)]
sage: B = [random_matrix(GF(5),500,500) for _ in range(3)]
sage: %time _ =my_mul(*(A+B))
CPU time: 0.22 s, Wall time: 0.22 s
sage: A = [random_matrix(GF(5),2000,2000) for _ in range(3)]
sage: B = [random_matrix(GF(5),2000,2000) for _ in range(3)]
sage: %time _ =my_mul(*(A+B))
CPU time: 10.23 s, Wall time: 10.57 s
So even with all the Python overhead (and modulo bugs which I may have
introduced) this approach seems to be comparable to whatever Meataxe is doing.
I'd actually guess it gets faster than MeatAxe as dimensions grow since AFAIK
MeatAxe doesn't do matrix multiplication asymptotically fast?
> If LinBox really can not "officially" deal with non-prime fields, I
> should probably stick with MeatAxe then (even though, in the small
> version, it only supports fields of size <256).
>
> My current MeatAxe wrapper (in my optional cohomology spkg) is based
> on an old MeatAxe version (2.2.4). Perhaps I could take the most
> recent version (2.4.13), modify my wrapper accordingly, and contribute
> it to Sage. The MeatAxe licence is good enough for Sage, isn't it? But
> is MeatAxe eligible to become a standard package? Currently, there
> only is an experimental meataxe-2.4.3.spkg.
>
> Correct me if the following is nonsense:
> - The MeatAxe sources should go into an spkg. Question: How should I
> deal with my modifications to the original MeatAxe sources? Do I
> remember correctly that the spkg should contain the unmodified
> sources, together with some automatically applied patches that are
> stored in a sub-directory patches/?
> - The spkg should create a library mtx.a, that should go into, say,
> SAGE_ROOT/local/lib/mtx-2.4.13/, so that my wrapper can easily link
> against it.
>
> Best regards,
> Simon
Cheers,
Ouch, yes.
Note that these timings are for schoolbook multiplication (i.e. 9 = 3^2
multiplications) rather than Karatsuba-style, and include Python overhead and
Sage <=> LinBox conversion overhead.
It seems one can bring this down to 7 multiplications.
Btw. Magma does it in 4.37 seconds on my computer, to get a sense of what is
the state-of-the-art.
> > Btw. Magma does it in 4.37 seconds on my computer, to get a sense of what
> > is the state-of-the-art.
>
> You mean for 2000x2000 over GF(125)? So, it is a long way to go, even
> for MeatAxe.
Yes, and it should get worse as you increase the degree. Or did they implement
Strassen in MeatAxe by now?
> BTW, Sage's current implementation needs 906.69 s for a
> single multiplication.
We suck at this, it seems :)
> If I understand correctly, the basic idea can be easily automatised:
> - The given base field F (=GF(125), for instance) determines a
> (Conway) polynomial p of degree d.
> - One starts with a polynomial ring R over the corresponding prime
> field F0 and 2*d variables, constructs the polynomial ring in x over
> R, and mods out p. The result is a quotient ring Q.
> - Multiplication of two "general" elements of Q, represented by degree-
> (d-1) polynomials whose coefficients are the 2*d variables from R,
> results in a d-tuple T of elements of R.
> - Matrices over F are represented by d-tuples of matrices over F0, and
> multiplication is obtained by inserting the elements of these d-tuples
> into the elements of T.
Yep, and it would already be much faster than what we have currently.
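The recipe above can be sketched with plain numpy. The following toy version
(my own names and layout, schoolbook products, no Karatsuba) uses
GF(125) = GF(5)[x]/(x^3 + 3x + 3) as in the session earlier, with the
reduction x^3 = 2x + 2 mod 5:

```python
import numpy as np

P = 5  # characteristic; modulus x^3 + 3x + 3 gives x^3 = 2x + 2 (mod 5)

def gf125_matmul(A, B):
    """Multiply matrices over GF(125), each represented as a 3-tuple
    (M0, M1, M2) of numpy integer arrays mod 5 standing for
    M0 + M1*x + M2*x^2."""
    n = A[0].shape[0]
    # schoolbook polynomial product: coefficients of x^0 .. x^4
    C = [np.zeros((n, n), dtype=np.int64) for _ in range(5)]
    for i in range(3):
        for j in range(3):
            C[i + j] = (C[i + j] + A[i] @ B[j]) % P
    # reduce using x^3 = 2x + 2 and x^4 = 2x^2 + 2x
    C[2] = (C[2] + 2 * C[4]) % P
    C[1] = (C[1] + 2 * C[4] + 2 * C[3]) % P
    C[0] = (C[0] + 2 * C[3]) % P
    return (C[0], C[1], C[2])

# sanity check with 1x1 matrices, i.e. plain GF(125) arithmetic:
# any nonzero element satisfies a^124 = 1 since |GF(125)^*| = 124
one = tuple(np.array([[v]]) for v in (1, 0, 0))
a = tuple(np.array([[v]]) for v in (2, 1, 0))  # the element 2 + x
acc = one
for _ in range(124):
    acc = gf125_matmul(acc, a)
assert [int(m[0, 0]) for m in acc] == [1, 0, 0]
```

Replacing the `@`-products with fast mod-p matrix multiplications (LinBox,
FFLAS) is exactly the point of the construction; the polynomial bookkeeping
around them is quadratic and cheap.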
> But is it possible to *automatically* find a Karatsuba- or Toom-Cook-
> type formula that reduces the number of products that appear in T?
>
> An example from the paper: Over GF(4), the multiplication of A=A0+xA1
> with B=B0+xB1 (using p=x^2+x+1) would naively be
> (A0*B0+A1*B1) + x*(A1*B1+A1*B0+A0*B1)
> But using Karatsuba, it becomes
> (A0*B0+A1*B1) + x*((A0+A1)*(B0+B1)+A0*B0),
> which saves one product.
>
> Is there an automated procedure that finds the second formula, given
> the first? If not, how could one do better than with schoolboy
> multiplication, in the general case?
Well, the paper does search for Karatsuba-style formulas automatically, but
it's a non-trivial (brute-force-ish) computation. So far, there does not seem
to be anything better than this. But other people would know better than me.
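For what it's worth, the GF(4) identity above can be checked mechanically with
random matrices over GF(2) (a quick sanity check of the formulas, not the
search procedure from the paper). Note Karatsuba stays valid even though
matrices don't commute, since (A0+A1)(B0+B1) expands without reordering
factors:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
A0, A1, B0, B1 = (rng.integers(0, 2, (n, n)) for _ in range(4))

def m(X, Y):           # matrix product over GF(2)
    return (X @ Y) % 2

# naive formula: 4 products
naive_c0 = (m(A0, B0) + m(A1, B1)) % 2
naive_c1 = (m(A1, B1) + m(A1, B0) + m(A0, B1)) % 2

# Karatsuba: 3 products
P0 = m(A0, B0)
P1 = m(A1, B1)
P2 = m((A0 + A1) % 2, (B0 + B1) % 2)
kar_c0 = (P0 + P1) % 2
kar_c1 = (P2 + P0) % 2   # = A0*B1 + A1*B0 + A1*B1 over GF(2)

assert (naive_c0 == kar_c0).all() and (naive_c1 == kar_c1).all()
```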
Yes, it disqualifies it.
> Would GPL 2+ be required?
Yes, that would be required.
I think people will have some issues with including an spkg that replicates
functionality that is probably already available, with sufficient work, in
code that is in Sage. Every single spkg added to Sage increases the
maintenance burden considerably. Removing spkgs later is very hard.
-- William
Note that these operations are quadratic, just like the conversion Sage =>
LinBox. Hence, I'm not that surprised that our LinBox wrapper performs poorly
on those.
Hi,
linbox.matrix_matrix_multiply
calls
linbox_modn_dense_matrix_matrix_multiply
from the LinBox SPKG (src/interfaces/sage/linbox.C)
which calls
static DenseMatrix<ModInt> linbox_new_modn_matrix(mod_int modulus,
                                                  mod_int** matrix,
                                                  size_t nrows, size_t ncols) {
    ModInt F((double)modulus);
    DenseMatrix<ModInt> A(F, nrows, ncols);
    size_t i, j, k;
    for (i = 0; i < nrows; i++) {
        for (j = 0; j < ncols; j++) {
            A.setEntry(i, j, (double)matrix[i][j]);
        }
    }
    return A;
};
which copies the matrix to a new matrix A.
AFAIK, there are 3 tickets on trac: 4260, 4968, 4258
We could merge these and close 2 of them as duplicates.
When I get a chance I will take a look at redoing the template
structure in the dense_ctypes.patch. I am back to normal schedule
only today, so this will probably happen sometime tomorrow.
Perhaps you can help with the pickling, coercion, etc. problems once I
get the linbox wrapper working. ;)
Cheers,
Burcin
Cool! My plan was to work on this during the upcoming Sage Days in Seattle at
the end of August. If you have a working prototype by then I'll try to finish
it off :)
I am new to sage-devel. Sorry if I am asking questions that were perhaps
discussed in the past.
Why do you want to use Strassen for matrix multiplication? This algorithm is
not as stable as the naive algorithm (if one uses floating-point numbers; if
you use it for symbolic computations that is no problem). It is also not very
fast for small matrices. And if you don't use the in-memory version of the
algorithm it has extreme memory requirements.
Why not Winograd's method? Or something like tiling or loop unrolling?
Again, sorry if you discussed this in the past.
Best wishes,
Ivo.
Hi Ivo,
> I am new to sage-devel. Sorry if I asked questions that you perhaps
> discussed in the the past.
>
> Why do you want to use Strassen for matrix multiplication? This algorithm
> is not as stable as the naive algorithm (if one uses floating-point numbers;
> if you use it for symbolic computations that is no problem).
We are discussing using it over finite fields and hence there is no stability
issue.
> It is also not very fast for small matrices.
That's why one usually implements a cutoff: below a certain dimension, naive
or some other multiplication is used, both when the input matrix is small and
when the recursion reaches that dimension.
> And if you don't use the in-memory version
> of the algorithm it has extreme memory requirements.
The memory overhead is 2/3 n^2 if you set it up right.
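The 2/3 n^2 figure is just a geometric series: if each level of the recursion
is arranged to use two scratch matrices of size (n/2)^2, the levels shrink by
a factor of 4, so the total extra memory is 2*(n^2/4)*(1 + 1/4 + 1/16 + ...)
= (2/3)*n^2. A quick numeric check:

```python
# Sum the scratch space over the recursion levels of Strassen-Winograd,
# assuming two temporaries of size (n/2^k)^2 at level k.
n = 1024
total = sum(2 * (n // 2**k) ** 2 for k in range(1, 11))
assert abs(total - (2 / 3) * n**2) / n**2 < 1e-5
```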
> Why not Winograd's method.
Usually, when people talk about Strassen they mean Strassen-Winograd; is that
what you mean? It just reduces the number of additions.
> Or something like Tiling or Loop-Unrolling?
Of course, one implements the base case as efficiently as possible. In
LinBox's case it's outsourced to ATLAS; in other cases there are dedicated
implementations. While these implementation tricks are crucial for good
performance, they won't beat Strassen-Winograd, since they are asymptotically
slower: n^3 vs. n^2.807.
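For illustration, here is a minimal sketch of Strassen with a
naive-multiplication cutoff over GF(p) (my own toy code: classic 7-product
Strassen on power-of-two sizes; FFLAS uses the Winograd variant on top of
floating-point BLAS, and real code handles odd dimensions by peeling or
padding):

```python
import numpy as np

P = 5          # work over GF(5)
CUTOFF = 32    # below this dimension, fall back to the classical product

def strassen_mod(A, B):
    """Strassen multiplication mod P; n must be a power of two here."""
    n = A.shape[0]
    if n <= CUTOFF:
        return (A @ B) % P
    h = n // 2
    a, b, c, d = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    e, f, g, k = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    p1 = strassen_mod(a, (f - k) % P)              # a(f - k)
    p2 = strassen_mod((a + b) % P, k)              # (a + b)k
    p3 = strassen_mod((c + d) % P, e)              # (c + d)e
    p4 = strassen_mod(d, (g - e) % P)              # d(g - e)
    p5 = strassen_mod((a + d) % P, (e + k) % P)    # (a + d)(e + k)
    p6 = strassen_mod((b - d) % P, (g + k) % P)    # (b - d)(g + k)
    p7 = strassen_mod((a - c) % P, (e + f) % P)    # (a - c)(e + f)
    C = np.empty((n, n), dtype=A.dtype)
    C[:h, :h] = (p5 + p4 - p2 + p6) % P
    C[:h, h:] = (p1 + p2) % P
    C[h:, :h] = (p3 + p4) % P
    C[h:, h:] = (p1 + p5 - p3 - p7) % P
    return C

rng = np.random.default_rng(1)
A = rng.integers(0, P, (128, 128))
B = rng.integers(0, P, (128, 128))
assert (strassen_mod(A, B) == (A @ B) % P).all()
```

The cutoff is exactly the tuning knob discussed above: the recursion only
pays off once the subproblems are big enough that 7 multiplications beat 8.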
PS:
On 18 Jul., 12:44, Simon King <simon.k...@uni-jena.de> wrote:
> > Usually, when people talk about Strassen they mean Strassen-Winograd, is that
> > what you mean? It just reduces the number of additions.
>
> My guess is that Ivo refers to Coppersmith-Winograd, which, according
> to Wikipedia, *is* faster than Strassen-Winograd: n^2.376 versus
> n^2.807.
... but only pays off for matrix sizes too big to be practical. I
guess that's why it is not in Sage.
--
To post to this group, send an email to sage-...@googlegroups.com
To unsubscribe from this group, send an email to sage-devel+...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org
That would be a nice piece of work. Good luck and have fun ;-)
Please test the following once your implementation is ready: try your Strassen implementation with only 1 step of recursion (using the naive algorithm below that), and a second version with only 3 steps of recursion. This bounds the memory requirements and gives a speed-up. Stopping the recursion early is a known heuristic, but I don't know whether it is better than the cutoff parameters found by the benchmark.
By the way: With "Winograd's method" I mean the one from Section 2.3 (page 10) from http://arxiv.org/abs/1106.1347 .
But please note that the survey article mentioned above is the first draft. It is not complete and the implementations are not optimized.
Best wishes, Ivo.
wow, I am impressed.
If the cutoff-parameter is determined by the benchmark: Yes, submit it!
In the other case I see the problem of how to define the cutoff in a good way. In this case the method could be significantly slower.
I look forward to review your patch.
Best wishes from Halle to Jena,
Ivo
Am 18.07.2011 um 18:36 schrieb Simon King:
> Anyway, I think Strassen multiplication should become the default for
> Matrix_modn_dense.
>
> What do people think?
I am no fan of Strassen as the standard method.
I usually work with very large matrices, and at that size the memory issue is very important. I think it would be better to keep the current standard method and offer Strassen only as an additional method. Then one can use Strassen if one knows what one is doing.
Ivo.
it's the last day of Sage Days 32 and I pushed Burcin's and Clément's patch far
enough to have all doctests pass on sage.math, cf.
http://trac.sagemath.org/sage_trac/ticket/4260
With respect to performance the difference is like night and day:
== Night (without #4260) ==
sage: A = random_matrix(GF(previous_prime(2^23)),1000,1000)
sage: B = random_matrix(GF(previous_prime(2^23)),1000,1000)
sage: %time A*B
CPU times: user 326.12 s, sys: 1.27 s, total: 327.39 s
Wall time: 328.79 s
1000 x 1000 dense matrix over Finite Field of size 8388593
sage: %time A.echelonize()
CPU times: user 92.59 s, sys: 0.64 s, total: 93.23 s
Wall time: 93.67 s
== Day (with #4260) ==
sage: A = random_matrix(GF(previous_prime(2^23)),1000,1000)
sage: B = random_matrix(GF(previous_prime(2^23)),1000,1000)
sage: %time A*B
CPU times: user 0.26 s, sys: 0.02 s, total: 0.28 s
Wall time: 0.28 s
1000 x 1000 dense matrix over Finite Field of size 8388593
sage: %time A.echelonize()
CPU times: user 0.41 s, sys: 0.00 s, total: 0.41 s
Wall time: 0.42 s
Rob Beezer (sitting next to me :)) is currently looking at the patch, adding
more doctests/documentation and stress testing the whole thing. It would be
great if people could give #4260 a spin on different platforms to check
whether it works for them. Especially
- big endian machines to test whether pickling works
- 32-bit machines and
- non-Linux machines
would be greatly appreciated. Also, people who care deeply about matrices over
ZZ and QQ should give it a spin and check for performance regressions or
improvements (because of the multi-modular stuff). There shouldn't be any, but who
knows.
Cheers,
Martin
PS: After SD32 I probably won't have time to push this patch much further, so
if you care about switching from very slow to very, very fast for operations
with dense matrices mod n (n < 2^23) then feel free to take charge.
--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF
_www: http://martinralbrecht.wordpress.com/
_jab: martinr...@jabber.ccc.de
_www: http://www.informatik.uni-bremen.de/~malb