# efficient determinant of matrix over polynomial ring

41 views

### Martin Albrecht

Sep 9, 2008, 8:33:04 AM9/9/08
to Sage Development, phil
Hi there,

I'm putting this thread on [sage-devel] because it isn't a support question
anymore. First, some background:

On Friday 05 September 2008, phil wrote on [sage-support]:
> I have a matrix that is composed of multivariant polynomial
> entries. I want to compute its determinant. The problem is that it
> is very slow or runs out of memory. For example,
> R.<x,y> = QQ[]
> C = random_matrix(R,10,10)
> Cdet = C.determinant() # this line takes a long time
>
> If you have more variables, it will run out of memory instead (on a 32
> bit installation).
>
> Is there a more efficient way to do this? Would using symbolic
> expressions then coercing back to the polynomial ring be better?

My reply was on [sage-support] too:

> Here's a workaround:
>
> sage: R.<x,y> = QQ[]
> sage: C = random_matrix(R,8,8)
> sage: %time d = C.determinant()
> CPU times: user 2.64 s, sys: 0.00 s, total: 2.65 s
> Wall time: 2.67 s
>
> sage: %time d2 = R(C._singular_().det())
> CPU times: user 0.04 s, sys: 0.01 s, total: 0.05 s
> Wall time: 0.15 s
>
> sage: d2 == d
> True

and eventually I provided a patch which is going to be in 3.1.2.

I also wrote on [sage-support]:

> One thing that really puzzles me is that Magma doesn't seem to scale well
> w.r.t. to this particular benchmark.
>
> sage: R.<x,y> = QQ[]
> sage: C = random_matrix(R,10,10)
>
> sage: %time d = C.determinant() # going to be in 3.1.2
> CPU times: user 0.34 s, sys: 0.00 s, total: 0.34 s
> Wall time: 0.34 s
>
> sage: CM = magma(C)
> sage: t = magma.cputime(); d2 = CM.Determinant(); magma.cputime(t)
> 0.59999999999999998
>
> sage: C = random_matrix(R,14,14)
> sage: %time d = C.determinant()
> CPU times: user 2.58 s, sys: 0.00 s, total: 2.58 s
> Wall time: 2.60 s
> sage: CM = magma(C)
> sage: t = magma.cputime(); d2 = CM.Determinant(); magma.cputime(t)
> 27.84 # note that Magma also eats lots and lots of memory
>
> sage: C = random_matrix(R,15,15)
> sage: %time d = C.determinant()
> CPU times: user 4.49 s, sys: 0.00 s, total: 4.49 s
> Wall time: 4.55 s
> sage: CM = magma(C)
> sage: t = magma.cputime(); d2 = CM.Determinant(); magma.cputime(t)
> 68.590000000000003
>
> sage: C = random_matrix(R,16,16)
> sage: %time d = C.determinant()
> CPU times: user 6.98 s, sys: 0.00 s, total: 6.98 s
> Wall time: 7.00 s
> sage: CM = magma(C)
> sage: t = magma.cputime(); d2 = CM.Determinant(); magma.cputime(t)
> 168.41
> sage: magma(d) == d2
> True
>
> sage: R.<x,y> = GF(32003)[]
> sage: C = random_matrix(R,16,16)
> sage: %time d = C.determinant()
> CPU times: user 0.78 s, sys: 0.00 s, total: 0.78 s
> Wall time: 0.92 s
> sage: CM = magma(C)
> sage: t = magma.cputime(); d2 = CM.Determinant(); magma.cputime(t)
> 64.920000000000002
> sage: magma(d) == d2
> True
>
> So I wonder if Singular's implementations are just really good or if Magma
> is just particularly bad for this particular benchmark. I have no feeling
> how fast these things should be.

an finally on Tuesday 09 September 2008, phil wrote on [sage-support]

> Thanks for the tip.  After making that change, Sage no longer crashes
> with an out of memory error on 64 bit Debian. However, I may need to
> make additional changes. The computation is still going after 3
> days. Memory usage is slowly increasing and is now at 6.4 GB.
>
> The entries of the matrix are composed of coefficients extracted from
> some matrix operations on 4 3x3 matrices so there are 38 variables in
> the polynomial ring. Specifically, if anyone is wondering, I am
> trying to compute left hand side of equation 7 in "Five point motion
> estimation made easy" by Li and Hartley (http://users.rsise.anu.edu.au/
> The solver given on Li's webpage using Maple within Matlab to compute
> the determinant at runtime after the coefficients are given as numbers
> in the problem. I want to pre-compute the determinant with the
> coefficients specified by constants. That way at run time all you
> need to do is evaluate the expression replacing the constants with the
> numerical values.

Thoughts?
Martin

--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_www: http://www.informatik.uni-bremen.de/~malb
_jab: martinr...@jabber.ccc.de

### parisse

Sep 10, 2008, 3:17:44 AM9/10/08
to sage-devel
> Thoughts?
> Martin

It's most probably the algorithm used. I have observed that computing
a determinant with multivariate polynomial coefficients is most of the
time faster if you expand minors 2x3 then 3x3 ... up to nxn using
previously computed minors, avoiding divisions. If the coefficients of
the polynomials are rationals with denominators having a not too large
lcm, it is also a good idea to multiply the matrix by the lcm before
computing the det.
If the matrix is not sparse, Fadeev algorithm or other algorithms may
be better.
What is the algorithm used by Singular?

### Martin Albrecht

Sep 10, 2008, 5:16:46 AM9/10/08
Hi parisse,

I agree that this is a question of which algorithm is used.

Singular uses two algorithms:

poly smCallDet(ideal I)

which uses Bareiss

and

poly singclap_det(matrix m)

which seems to be a multi-modular approach. The heuristic to choose between
the two is implemented in smCheckDet.

How fast ist Giac for this particular computation?

Cheers,

### parisse

Sep 10, 2008, 12:54:55 PM9/10/08
to sage-devel

On Sep 10, 11:16 am, Martin Albrecht <m...@informatik.uni-bremen.de>
wrote:
> Hi parisse,
>
> I agree that this is a question of which algorithm is used.
>
> Singular uses two algorithms:
>
> poly smCallDet(ideal I)
>
> which uses Bareiss
>
> and
>
> poly singclap_det(matrix m)
>
> which seems to be a multi-modular approach. The heuristic to choose between
> the two is implemented in smCheckDet.
>
> How fast ist Giac for this particular computation?
>

I did not really test, since I don't know how to print sage matrices
using compatible (e.g. maple) syntax. I made just the test with a 8x8
matrix, where Bareiss and minor expansion are comparable (around 0.25s
on sage which means slower than singular). I guess, multi-modular
approach is the best for dense matrices.
Did you check how fast singular is for the Lewis-Wester determinants?

### parisse

Sep 10, 2008, 1:57:24 PM9/10/08
to sage-devel
BTW, one can define a random matrix somewhat like those generated by
sage inside giac:
f():=randpoly(1,x)*randpoly(1,y)/rand(100)
A:=ranm(8,8,f)
det(A) (Bareiss)
det_minor(A) (minor expansion)

### Martin Albrecht

Sep 11, 2008, 5:41:37 AM9/11/08
> Did you check how fast singular is for the Lewis-Wester determinants?

Are these some kind of benchmark(et)ing examples? Where can I find the input
matrices? Sorry, never heard that name before ... which isn't that
report on [sage-support]. :-)

### parisse

Sep 11, 2008, 6:48:59 AM9/11/08
to sage-devel
> f():=randpoly(1,x)*randpoly(1,y)/rand(100)

Actually it should be f():=randpoly(1,x)*randpoly(1,y)/(1+rand(100)),
otherwise you would get sometimes infinity. It is not quite
equivalent, however I could manage to make some comparisons using sage
C=random_matrix() and maxima(C) to import C into giac.
For dense matrices, Bareiss seems to be faster than minor expansion,
unlike for sparse matrices like those of the Lewis-Wester test.
For 2 variables, singular is faster than giac, around 4* (but I'm
unsure how I should interpret the CPU time total vs Wall time).
Using interpolation is probably the best algorithm in this situation
because you make the multiplication and division by the previous pivot
on already reduced expression and you interpolate the final smaller
result. I'll try to program it to see how faster it is.

On 11 sep, 11:41, Martin Albrecht <m...@informatik.uni-bremen.de>
wrote:
> > Did you check how fast singular is for the Lewis-Wester determinants?
>
> Are these some kind of benchmark(et)ing examples? Where can I find the input
> matrices? Sorry, never heard that name before ... which isn't that
> report on [sage-support]. :-)
>

I meant O1 and O2 of
http://home.bway.net/lewis/calatex.html

### Martin Albrecht

Sep 11, 2008, 7:40:53 AM9/11/08
On Thursday 11 September 2008, Martin Albrecht wrote:
> > Did you check how fast singular is for the Lewis-Wester determinants?
>
> Are these some kind of benchmark(et)ing examples? Where can I find the
> input matrices? Sorry, never heard that name before ... which isn't that
> report on [sage-support]. :-)
>
> Cheers,
> Martin

Okay, found it:

# the original benchmark is over ZZ, but that's really slow in Sage
---------------
P.<x1,x2,x3,x4,x5> = PolynomialRing(QQ, 5)
M = MatrixSpace(P, 26)

w = [ [ 1, 1, 1, 7, x4, 12, x3, 17, x2, 22, x1 ],
[ 2, 2, 1, 8, x4, 13, x3, 18, x2, 23, x1 ],
[ 3, 3, 1, 9, x4, 14, x3, 19, x2, 24, x1 ],
[ 4, 4, 1, 10, x4, 15, x3, 20, x2, 25, x1 ],
[ 5, 5, 1, 26, 1, 1, 0, 1, 0, 1, 0 ],
[ 6, 2, x5, 6, 1, 12, x3, 17, x2, 22, x1 ],
[ 7, 3, x5, 7, 1, 13, x3, 18, x2, 23, x1 ],
[ 8, 4, x5, 8, 1, 14, x3, 19, x2, 24, x1 ],
[ 9, 5, x5, 9, 1, 15, x3, 20, x2, 25, x1 ],
[10, 10, 1, 26, 1, 1, 0, 1, 0, 1, 0 ],
[11, 2, x5, 7, x4, 11, 1, 17, x2, 22, x1 ],
[12, 3, x5, 8, x4, 12, 1, 18, x2, 23, x1 ],
[13, 4, x5, 9, x4, 13, 1, 19, x2, 24, x1 ],
[14, 5, x5, 10, x4, 14, 1, 20, x2, 25, x1 ],
[15, 15, 1, 26, 1, 1, 0, 1, 0, 1, 0 ],
[16, 2, x5, 7, x4, 12, x3, 16, 1, 22, x1 ],
[17, 3, x5, 8, x4, 13, x3, 17, 1, 23, x1 ],
[18, 4, x5, 9, x4, 14, x3, 18, 1, 24, x1 ],
[19, 5, x5, 10, x4, 15, x3, 19, 1, 25, x1 ],
[20, 20, 1, 26, 1, 1, 0, 1, 0, 1, 0 ],
[21, 2, x5, 7, x4, 12, x3, 17, x2, 21, 1 ],
[22, 3, x5, 8, x4, 13, x3, 18, x2, 22, 1 ],
[23, 4, x5, 9, x4, 14, x3, 19, x2, 23, 1 ],
[24, 5, x5, 10, x4, 15, x3, 20, x2, 24, 1 ],
[25, 25, 1, 26, 1, 1, 0, 1, 0, 1, 0 ],
[26, 1, x5, 6, x4, 11, x3, 16, x2, 21, x1 ] ]

m = M.matrix()

for i in range(0,26):
for j in range(0,5):
m[i, (w[i][2*j+1])-1] = w[i][2*j+2]

tinit = cputime()
qqq = m.determinant()
print "M1 =", cputime(tinit), "(SAGE)";

del P, M, w, m, qqq
---------------
This takes 0.001 seconds on my notebook, Magma is in the same ballpark, i.e.
the example is way too small for a comparison.

M2 (again, over QQ) takes 2.2 seconds in Sage 3.1.2.rc2 and 2.02 seconds in
Magma (over ZZ).

Cheers,
Martin

### parisse

Sep 11, 2008, 10:16:14 AM9/11/08
to sage-devel
> Okay, found it:
>

That's the M benchmarks (the matrices are dense), could you try O1,
O2?
(at least the det and gcd parts).
I guess you had to turn to QQ to call singular via your patch,
correct?

### Martin Albrecht

Sep 11, 2008, 10:36:17 AM9/11/08
On Thursday 11 September 2008, parisse wrote:
> > Okay, found it:
>
> That's the M benchmarks (the matrices are dense), could you try O1,
> O2?

It turns out, here Sage/Singular is doing worse than Magma:

O1 (det1) = 0.720 (MAGMA)
O1 (det2) = 1.550 (MAGMA)
O1 = 2.420 (MAGMA)
O2 = 2.110 (MAGMA)
factor = 1.180 (MAGMA)

O1 (det1) = 41.457698 (Sage)
O1 (det2) = 58.837056 (Sage)
O1 = 78.369086 (Sage)
O2 = 4.439325 (Sage)
factor = ????? (didn't wait till it finished)

Thanks for bringing those benchmarks to my attention, it really helps me to
get an idea on how these things are supposed to behave.

> (at least the det and gcd parts).
> I guess you had to turn to QQ to call singular via your patch,
> correct?

Yep.

### rjf

Sep 11, 2008, 11:42:19 AM9/11/08
to sage-devel
Unless there is something I'm missing here, it seems to me this is a
classic problem that has been explored in the literature using several
approaches.
Depending on the sparsity of the entries, (sparse as polynomials) or
sparsity of the matrix (as zero entries) and size of coefficients, it
could be
possible to expand by minors, use a modular method, use one of several
versions of Gaussian elimination (eliminating some divisions that
might seem standard), or some graph-based method.

The answer might be so huge that it won't fit in memory, and that
certainly bears some analysis before you try to make it "faster".
RJF

### parisse

Sep 12, 2008, 2:44:23 AM9/12/08
to sage-devel

On 11 sep, 16:36, Martin Albrecht <m...@informatik.uni-bremen.de>
wrote:
> On Thursday 11 September 2008, parisse wrote:
>
> > > Okay, found it:
>
> > That's the M benchmarks (the matrices are dense), could you try O1,
> > O2?
>

Oops, M are indeed sparse.

> It turns out, here Sage/Singular is doing worse than Magma:
>
> O1 (det1) =  0.720 (MAGMA)
> O1 (det2) =  1.550 (MAGMA)
> O1 =  2.420 (MAGMA)
> O2 =  2.110 (MAGMA)
> factor =  1.180 (MAGMA)
>
> O1 (det1) =  41.457698 (Sage)
> O1 (det2) =  58.837056 (Sage)
> O1 =  78.369086 (Sage)
> O2 =  4.439325 (Sage)
> factor = ????? (didn't wait till it finished)
>

For giac on sage, I get around 5.5 s for each determinant (det_minor),
then 8s and 3s. for the gcd and 4.4s for factor.
M1 takes 0.006s and M2 13.7s (Bareiss).
Not bad but there is still room for improvements.

> Thanks for bringing those benchmarks to my attention, it really helps me to
> get an idea on how these things are supposed to behave.
>

It's indeed intersting to compare and I could even fix a small bug in
polynomial multivariate mult optimizations trying 4 variables random
matrix det.

### parisse

Sep 12, 2008, 2:48:01 AM9/12/08
to sage-devel

On 11 sep, 17:42, rjf <fate...@gmail.com> wrote:
> Unless there is something I'm missing here, it seems to me this is a
> classic problem that has been explored in the literature using several
> approaches.

Correct, but we never said it was new :-)

> Depending on the sparsity of the entries, (sparse as polynomials)  or
> sparsity of the matrix (as zero entries) and size of coefficients, it
> could be
> possible to expand by minors, use a modular method, use one of several
> versions of Gaussian elimination (eliminating some divisions that
> might seem standard), or some graph-based method.
>
> The answer might be so huge that it won't fit in memory, and that
> certainly bears some analysis before you try to make it "faster".

Is there somewhere a detailled analysis on what algorithm should be
taken? I mean, should the CAS determine what algorithm to use, or
should it be left to the user, or should we launch several algorithms
in parallel and the first one finishing would kill the others or a
combination?

### rjf

Sep 15, 2008, 1:05:30 AM9/15/08
to sage-devel

On Sep 11, 11:48 pm, parisse <bernard.pari...@ujf-grenoble.fr> wrote:
> On 11 sep, 17:42, rjf <fate...@gmail.com> wrote:
>
> > Unless there is something I'm missing here, it seems to me this is a
> > classic problem that has been explored in the literature using several
> > approaches.
>
> Correct, but we never said it was new :-)
>
> > Depending on the sparsity of the entries, (sparse as polynomials) or
> > sparsity of the matrix (as zero entries) and size of coefficients, it
> > could be
> > possible to expand by minors, use a modular method, use one of several
> > versions of Gaussian elimination (eliminating some divisions that
> > might seem standard), or some graph-based method.
>
> > The answer might be so huge that it won't fit in memory, and that
> > certainly bears some analysis before you try to make it "faster".
>
> Is there somewhere a detailled analysis on what algorithm should be
> taken?

Probably not. Such an analysis could not depend just on O() and other
complexity measures favored by theoreticians. It would probably be
specific to particular implementations and perhaps even particular
hardware platforms. An analysis of this nature would necessarily
be incomplete (because it would leave out some platforms, etc.)
Even if the results showed that one method was 10X or 100X better
and worthy of serious consideration, such a paper would likely
be rejected by the theoreticians who now populate program committees
and editorial boards which should be publishing CAS related
papers on such topics.

>I mean, should the CAS determine what algorithm to use,

Yes, it should.

or
> should it be left to the user,

Macsyma has at least 3 determinant algorithms, each of which is
superior for some input.

>or should we launch several algorithms
> in parallel and the first one finishing would kill the others or a
> co>mbination?

This idea has been proposed at various times for various tasks since
(at least) 1968 or so.
Back then it was impractical because the worst algorithm would use up
all the memory
and all the processes would fail. Maybe with multiple CPUs and 1,000X
more memory
there would be some different experiments worth trying out. But in the
past it has
usually been left to the user to pick the right algorithm, or
occasionally try some heuristics
to pick a plausible one. Most often, I think, the "best" algorithm
is used and it is
set up with some self-checking (hmm, I'm not finding the solution
this way, maybe I should
give that other method a chance...)

What other problems? GCD, Integration (numeric, symbolic), limit,
zero-equivalence, probably others.
RJF

### parisse

Sep 15, 2008, 9:53:19 AM9/15/08
to sage-devel

> Probably not. Such an analysis could not depend just on O() and other
> complexity measures favored by theoreticians. It would probably be
> specific to particular implementations and perhaps even particular
> hardware platforms.  An analysis of this nature would necessarily
> be incomplete (because it would leave out some platforms, etc.)
> Even if the results showed that one method was 10X or 100X better
> and worthy of serious consideration, such a paper would likely
> be rejected by the theoreticians who now populate program committees
> and editorial boards which should be publishing CAS related
> papers on such topics.
>

That's unfortunate:-( We have to make our own opinion. I thought that
interpolation would be nice, but it appears that it is only if the
bound for the degree of the determinant with respect to the
interpolation variable is not too large compared to the actual degree
of the determinant (e.g. for a char poly). Otherwise you would
interpolate too much, unless you accept a probabilistic answer.

> >or should we launch several algorithms
> > in parallel and the first one finishing would kill the others or a
> > co>mbination?
>
> This idea has been proposed at various times for various tasks since
> (at least) 1968 or so.
> Back then it was impractical because the worst algorithm would use up
> all the memory
> and all the processes would fail.

Isn't there some hope that the fastest algorithm would kill the slower
ones before they eat all the memory? Or maybe these algorithms should
detect themselves that they require too much memory.

### William Stein

Sep 15, 2008, 4:42:28 PM9/15/08

Flamebait. Discouraging. Useless, etc.

Bernard, I think your suggestion to more and more try actually
using multiple algorithms in parallel and terminating the slower
one should be pursued further. To what extent have you tried
this already with giac, and how has it gone?

William

### mhampton

Sep 15, 2008, 5:29:58 PM9/15/08
to sage-devel

On Sep 15, 2:42 pm, "William Stein" <wst...@gmail.com> wrote:
> Bernard, I think your suggestion to more and more try actually
> using multiple algorithms in parallel and terminating the slower
> one should be pursued further. To what extent have you tried
> this already with giac, and how has it gone?

I've also been thinking about something like this for convex hull
algorithms. There are at least three different algorithms for convex
hulls, and for different problems you can get any ordering of
efficiency you want for those three. But its very hard to determine
ahead of time what that ordering will be. Currently Sage uses cddlib
and its "double description" algorithm, and I made an optional package
for lrs (linear reverse search) that I hope to integrate as an
optional method soon (I have unfortunately been busy with other
things). Polymake has those two, plus its own "beneath and beyond"
method. One interesting thing is that the lrs algorithm is guaranteed
to use a limited amount of memory, and in practice that bound tends to
be pretty low, so its not very expensive to throw it in.

Anyway, it seems like a very general approach that could be used in
many contexts (perhaps Groebner bases?). It would be nice to have
framework/template for such things. As we get more and more cores in
standard machines this seems increasingly practical.

M. Hampton

### rjf

Sep 15, 2008, 6:09:33 PM9/15/08
to sage-devel

On Sep 15, 2:29 pm, mhampton <hampto...@gmail.com> wrote:
...

As we get more and more cores in
> standard machines this seems increasingly practical.
>
> M. Hampton

I have been told that there are likely to be severe issues with memory
interference for
many potential applications. Something to keep in mind when coming up
with
algorithms.

### William Stein

Sep 15, 2008, 6:14:20 PM9/15/08

What is "memory interference"?

William

>
> >
>

--
William Stein
Associate Professor of Mathematics
University of Washington
http://wstein.org

### rjf

Sep 15, 2008, 6:23:19 PM9/15/08
to sage-devel

In order to determine which of several algorithms is slowest, one
could try running each of them
sequentially, keeping track of the time. Then the times can be
compared. Presumably you
don't want to do this.

To determine which of N algorithms is fastest by running them in
parallel until one completes
is likely to take several times longer than if you were to choose the
time. Even if you have N processors, you will have some interference
regarding memory. It is
possible but unlikely that you would run without interference on an n-
core machine such as
is being sold for PCs today.

So this suggests that if you have 4 potential algorithms, and running
them together will take
2.5T, where T is the time for the fastest algorithm, then a pre-
computation of some sort that takes
less than time (say) 1.5T that correctly predicts the winner, would
overall save time.

The details may differ somewhat, but it boils down to this: if you
are faced with a substantial
calculation and you are tempted to do this "in parallel" , an
alternative might be to spend time comparable
what you suppose might be taken by the fastest algorithm, to try to
figure out which that algorithm might be.

But, as they sometimes say, a few weeks hacking on the computer can
save an hour in the library.

### root

Sep 16, 2008, 1:15:03 AM9/16/08
> Anyway, it seems like a very general approach that could be used in
> many contexts (perhaps Groebner bases?). It would be nice to have
> framework/template for such things. As we get more and more cores in
> standard machines this seems increasingly practical.

The user interface to Magnus is structured to use multiple procedures
at the same time. Magnus is a special CAS for infinite group theory.
Most of the procedures are not guaranteed to terminate so the idea
used was to run the procedures in parallel. The interface allows you
to choose from the applicable procedures and to specify the percentage
of CPU to give to each procedure. If one procedure does succeed it
"poisons" the other procedures. Magnus is available on Sourceforge.
You might find some interesting ideas from its unique lab workbench
front end (known as the zero-learning curve interface) by Gilbert
Baumslag.

A similar mechanism could be using the Sage notebook.

Tim Daly

### parisse

Sep 16, 2008, 8:07:10 AM9/16/08
to sage-devel
>
> Flamebait.  Discouraging. Useless, etc.
>
> Bernard, I think your suggestion to more and more try actually
> using multiple algorithms in parallel and terminating the slower
> one should be pursued further.   To what extent have you tried
> this already with giac, and how has it gone?
>

There is currently no algorithm of this kind in giac. I thought about
it for gcd but never implemented anything, there are some technical
obstructions like the fact that some multivariate gcd algorithms are
recursive and applying parallel algorithms in a recursive framework is
probably not a good idea.
There is some infrastructure support to do this at the user level
(context pointers for preventing different threads to modify shared
global context information, like variables content or assumptions). It
should be easy to add a maximum memory allowed for a context in a
thread (if it's not possible at the libpthread level). Then I must
find a way to give control for the user, something like
background(a,det(m),1e9,1e4)
would launch a thread with maximal memory allowed 1e9 bytes, maximal
time 1e4s, that should compute the det of m and once finished store