MatrixSpace.zero_matrix


Martin Albrecht

Jul 10, 2011, 10:21:52 AM
to Sage Development
Hi there,

currently, MatrixSpace's zero_matrix command is implemented as follows:

    @cached_method
    def zero_matrix(self):
        res = self.__matrix_class(self, 0, coerce=False, copy=False)
        res.set_immutable()
        return res

which means that whenever one calls matrix(K,m,n) for the first time, it
creates two matrices, which is a very bad idea when working with big
matrices (== RAM full).

I found

http://trac.sagemath.org/sage_trac/ticket/8276

and a few discussions about it on [sage-devel]. However, all those discussions
seem to assume that caching zero_matrix() is a good idea. Hence, I wonder
whether I just missed the discussion of the reason for this?

In any case, there should at least be an option to disable this (say, above
certain dimensions).
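[Editorial aside: the cache Martin describes can be mimicked in plain Python. The sketch below is a made-up stand-in (ToyMatrixSpace is not Sage's class), using functools.lru_cache in place of Sage's @cached_method, to show that the cached zero object stays alive for the lifetime of the space whether or not it is needed again.]

```python
from functools import lru_cache

# Toy stand-in for Sage's MatrixSpace with @cached_method: once computed,
# the "zero matrix" (here just a list of zero rows) is held by the cache
# for the lifetime of the space, even if the caller never needs it again.
class ToyMatrixSpace:
    def __init__(self, nrows, ncols):
        self.nrows, self.ncols = nrows, ncols

    @lru_cache(maxsize=None)  # analogous to Sage's @cached_method
    def zero_matrix(self):
        return [[0] * self.ncols for _ in range(self.nrows)]
```

For a big space, that cached object is exactly the "second matrix" occupying RAM.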

Cheers,
Martin


--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF
_www: http://martinralbrecht.wordpress.com/
_jab: martinr...@jabber.ccc.de

Simon King

Jul 10, 2011, 12:45:38 PM
to sage-devel
Hi Martin,

On 10 Jul., 16:21, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> which means that whenever on calls matrix(K,m,n) for the first time, it
> creates two matrices which is a very very bad idea when working with big
> matrices (== RAM full)

I searched the code of the "matrix" function, and the "zero_matrix"
method is not mentioned there.

But I found something in the __call__ method of a matrix space:
if there are no entries given, then the resulting empty matrix is
created by copying zero_matrix, rather than by creating a new zero matrix
from scratch. It is stated that copying is faster than creating from
scratch (and I recall that I verified that claim a while ago).

Hence, indeed, if you create an empty matrix by matrix(K,m,n) or by
MatrixSpace(K,m,n)() then zero_matrix is called and the (cached)
result is copied.

> and a few discussions about it on [sage-devel]. However, all those discussions
> seem to assume that caching zero_matrix() is a good idea. Hence, I wonder if I
> just missed the discussion or the reason on this?

I generally believe that one_element and zero_element of a
multiplicative (resp. additive) monoid should be cached, and thus
zero_matrix as well -- unless there is a good reason not to do so.

Of course, memory consumption could be a good reason.

> In any case, there should at least be an option to disable this (say after
> certain dimensions)?

There are ways to create an empty matrix from scratch, without
calling zero_matrix first (namely by calling the matrix class). So, if
you have an application where RAM is an issue and any additional
matrix is one matrix too many, then you can easily take care of it.

And if you have many matrix computations, then you presumably have
many matrices in RAM anyway. So, just one additional matrix is
unlikely to hurt much.

Hence, I find it reasonable that the default implementation cares more
about speed than about RAM.

Providing an option might be nice, though. But I think it is essential
that there is no time penalty for small matrices. Namely, checking
whether nrows and ncols exceed a threshold whenever a matrix is
created might already be quite noticeable in computations involving
small matrices.

Cheers,
Simon

Martin Albrecht

Jul 10, 2011, 2:38:19 PM
to sage-...@googlegroups.com
Hi Simon,

> There are ways to create an empty matrix from scratch, whithout
> calling zero_matrix first (namely by calling the matrix class).

Well, it's not very straightforward, but I agree there is a way (which I
hadn't thought of before):

sage: from sage.matrix.matrix_mod2_dense import Matrix_mod2_dense
sage: MS = MatrixSpace(GF(2),64000,64000)
sage: A = Matrix_mod2_dense(MS,None,True,True)

Btw., the speed argument does not seem to hold for dense matrices over GF(2):

sage: MS = MatrixSpace(GF(2),10000,10000)
sage: %timeit A = Matrix_mod2_dense(MS,None,True,True)
125 loops, best of 3: 2.25 ms per loop
sage: %timeit _ = copy(A)
125 loops, best of 3: 6.37 ms per loop

That makes sense, because M4RI always returns a zero matrix on allocation, so
copy() is an alloc + memcpy while creating from scratch is just an alloc.

I guess one approach could be a class attribute which determines whether a
cache should be used or not (i.e., per type). Another would decide based on
some dimension threshold (I don't know which, since that depends heavily on
the implementation).
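[Editorial aside: the per-type toggle could look roughly like the plain-Python sketch below. All names (ToySpace, _cache_zero, ToyGF2Space) are hypothetical, not Sage API.]

```python
# Hypothetical sketch of a class-attribute toggle: a per-type flag decides
# whether a space caches its zero matrix.  Not Sage's actual API.
class ToySpace:
    _cache_zero = True  # subclasses for cheap-to-create types set this to False

    def __init__(self, nrows, ncols):
        self.nrows, self.ncols = nrows, ncols
        self._zero = None

    def _new_zero(self):
        # stand-in for the matrix class constructor
        return [[0] * self.ncols for _ in range(self.nrows)]

    def zero_matrix(self):
        if not self._cache_zero:
            return self._new_zero()        # no cache: fresh matrix every call
        if self._zero is None:
            self._zero = self._new_zero()  # cache on first use
        return self._zero


class ToyGF2Space(ToySpace):
    # e.g. M4RI-backed matrices, where creation from scratch is cheap
    _cache_zero = False
```

The flag costs one attribute lookup per call, and a subclass overrides it without any per-call dimension check.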

Simon King

Jul 10, 2011, 3:05:31 PM
to sage-devel
Hi Martin,

On 10 Jul., 20:38, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> Btw. the speed argument does not seem to hold true for dense over GF(2):
>
> sage: MS = MatrixSpace(GF(2),10000,10000)
> sage: %timeit A = Matrix_mod2_dense(MS,None,True,True)
> 125 loops, best of 3: 2.25 ms per loop
> sage: %timeit _ = copy(A)
> 125 loops, best of 3: 6.37 ms per loop
>
> Which makes sense because we always return a zero matrix when we allocate in
> M4RI, so copy() is alloc + memcpy while just creating it is just an alloc.

But it DOES hold for dense matrices over GF(3):

sage: from sage.matrix.matrix_modn_dense import Matrix_modn_dense
sage: MS = MatrixSpace(GF(3),100,100)
sage: M = Matrix_modn_dense(MS, None,True,True)
sage: timeit("M = Matrix_modn_dense(MS, None,True,True)", number=10000)
10000 loops, best of 3: 25.8 µs per loop
sage: timeit("N = copy(M)", number=10000)
10000 loops, best of 3: 13 µs per loop

> I guess one approach could be a class attribute which determines whether a
> cache should be used or not (i.e. by type). Another one would decide based on  
> some dimension (I don't know which since that depends on the implementation
> heavily).

If copying an M4RI matrix is systematically slower than creation from
scratch, then I suggest changing one line in the __call__ method of
MatrixSpace_generic. Currently, it says:

    if entries is None or entries == 0:
        if self.__is_sparse: # faster to create a new one than copy
            return self.__matrix_class(self, {}, coerce=coerce, copy=copy)
        else:
            return self.zero_matrix().__copy__()

You could try what happens (speed-wise, both over GF(2) and in the
general case) if you do something like

    ...
    if self.__is_sparse:
        ...
    else:
        C = self.__matrix_class
        if C == sage.matrix.matrix_mod2_dense.Matrix_mod2_dense:
            return C(self, None, coerce=coerce, copy=copy)
        else:
            return self.zero_matrix().__copy__()

Cheers,
Simon

Robert Goss

Jul 10, 2011, 3:20:04 PM
to sage-...@googlegroups.com
Simon,


> But it DOES hold for dense matrices over GF(3):
>
> sage: from sage.matrix.matrix_modn_dense import Matrix_modn_dense
> sage: MS = MatrixSpace(GF(3),100,100)
> sage: M = Matrix_modn_dense(MS, None,True,True)
> sage: timeit("M = Matrix_modn_dense(MS, None,True,True)", number=10000)
> 10000 loops, best of 3: 25.8 µs per loop
> sage: timeit("N = copy(M)", number=10000)
> 10000 loops, best of 3: 13 µs per loop
>

I had a go at this and found that for larger matrices it doesn't seem to
hold (well, for me at least):

sage: from sage.matrix.matrix_modn_dense import Matrix_modn_dense
sage: MS = MatrixSpace(GF(3),100,100)
sage: M = Matrix_modn_dense(MS, None,True,True)

sage: timeit("M = Matrix_modn_dense(MS, None,True,True)",number=100)
100 loops, best of 3: 373 ms per loop
sage: timeit("N = copy(M)", number=10000)
100 loops, best of 3: 402 ms per loop

with copying becoming more expensive the larger the matrices.

Robert Goss

Simon King

Jul 10, 2011, 3:21:21 PM
to sage-devel
PS:

I just had an alternative idea that might be a little faster and more
flexible.

Let MS be a matrix space.

Note that it only (or at least mainly) depends on MS.__matrix_class
whether it is faster to copy or faster to create from scratch.

Hence, why not have a static method of MS.__matrix_class that creates
the empty matrix for you?

In other words, there could be a static method for matrix classes,
called create_zero. By default, it would be
    @staticmethod
    def create_zero(MS):
        return MS.zero_matrix().__copy__()

but for sparse matrix classes and for
sage.matrix.matrix_mod2_dense.Matrix_mod2_dense, it would be overridden
to create the zero matrix of MS from scratch.

Then, in MatrixSpace_generic.__call__, you would have:
    if entries is None or entries == 0:
        return self.__matrix_class.create_zero(self)
    ...

That should be both fast and flexible.
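[Editorial aside: the proposal can be sketched in ordinary Python. Space, DenseMatrix and Mod2Matrix below are illustrative stand-ins for MatrixSpace_generic and the matrix classes, not Sage's actual names.]

```python
import copy

# Sketch of the create_zero dispatch: the matrix class itself decides
# whether an empty matrix is built by copying the cached zero or from scratch.
class Space:
    def __init__(self, matrix_class, nrows, ncols):
        self.matrix_class = matrix_class
        self.nrows, self.ncols = nrows, ncols
        self._zero = None

    def zero_matrix(self):
        if self._zero is None:           # stand-in for @cached_method
            self._zero = self.matrix_class(self)
        return self._zero

    def __call__(self):
        # the single dispatch line proposed for MatrixSpace_generic.__call__
        return self.matrix_class.create_zero(self)


class DenseMatrix:
    def __init__(self, space):
        self.rows = [[0] * space.ncols for _ in range(space.nrows)]

    @staticmethod
    def create_zero(MS):
        # default: copying the cached zero is assumed faster
        return copy.deepcopy(MS.zero_matrix())


class Mod2Matrix(DenseMatrix):
    @staticmethod
    def create_zero(MS):
        # override for types where creation from scratch is cheaper
        return Mod2Matrix(MS)
```

The dispatch cost is one attribute lookup on the class, with no per-call dimension test.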

Cheers,
Simon

Simon King

Jul 10, 2011, 3:27:16 PM
to sage-devel
Hi Robert,

On 10 Jul., 21:20, "Robert Goss" <goss.rob...@gmail.com> wrote:
> I had a go at this and found that for larger matrices it doesnt seem to  
> hold (well for me at least)
>
> sage: from sage.matrix.matrix_modn_dense import Matrix_modn_dense
> sage: MS = MatrixSpace(GF(3),100,100)
> sage: M = Matrix_modn_dense(MS, None,True,True)
> sage: timeit("M = Matrix_modn_dense(MS, None,True,True)",number=100)
> 100 loops, best of 3: 373 ms per loop
> sage: timeit("N = copy(M)", number=10000)
> 100 loops, best of 3: 402 ms per loop
>
> with copying becoming more expensive the larger the matrices.

You have 100x100 matrices in both cases, the same as in my example.
But what you do here is to use a different number of loops in the
"timeit" function -- you only make 100 runs, and that is not reliable
timing.

In fact, that is why I chose number=10000. The default would be 625,
but even with 625 runs I observed huge differences in the timings.

Cheers,
Simon

Robert Goss

Jul 10, 2011, 3:55:35 PM
to sage-...@googlegroups.com
Hi Simon,


> You have 100x100 matrices in both cases, the same as in my example.

No, unfortunately I made a mistake when moving code between Sage and my
email client.

> But what you do here is to use a different number of loops in the
> "timeit" function -- you only make 100 runs, and that is not reliable
> timing.
> In fact, that is why I chose number=10000. The default would be 625,
> but even with 625 runs I observed huge differences in the timings.
>

I had tried to increase the number of loops; my main issue is that the inner
part of the loop takes about half a second, so I didn't want to set the
number too high.

Having another go with 625 loops gives me:

sage: MS = MatrixSpace(GF(3),10000,10000)
sage: M = Matrix_modn_dense(MS, None,True,True)
sage: timeit("M = Matrix_modn_dense(MS, None,True,True)", number=625)
625 loops, best of 3: 374 ms per loop
sage: timeit("N = copy(M)", number=625)
100 loops, best of 3: 401 ms per loop

sage: MS = MatrixSpace(GF(3),100,100)
sage: M = Matrix_modn_dense(MS, None,True,True)
sage: timeit("M = Matrix_modn_dense(MS, None,True,True)", number=625)
625 loops, best of 3: 19 µs per loop
sage: timeit("N = copy(M)", number=625)
625 loops, best of 3: 16.2 µs per loop


> Cheers,
> Simon
>

Thanks for your comments,

Robert


--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

Simon King

Jul 11, 2011, 1:29:58 AM
to sage-devel
Hi Martin,

On 10 Jul., 20:38, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> Which makes sense because we always return a zero matrix when we allocate in
> M4RI, so copy() is alloc + memcpy while just creating it is just an alloc.

How does that work? I learnt the hard way that allocating memory does not
necessarily mean that the memory is initialised with zeroes. Perhaps the
other matrix classes can benefit from M4RI's way of creating initialised
memory?

> I guess one approach could be a class attribute which determines whether a
> cache should be used or not (i.e. by type). Another one would decide based on  
> some dimension (I don't know which since that depends on the implementation
> heavily).

According to Robert, the fastest way to create an empty matrix depends
not only on the class but also on the dimensions. My main concern is that
an elaborate test, repeated each time a zero matrix is created, would
considerably slow things down. Therefore, my first suggestion was to
introduce a static method of the matrix classes (see post above).

But I forgot that the test actually needs to be done only once for
each matrix class: It is an ideal candidate for a lazy attribute of
MatrixSpace_generic! Hence, we may do something like
    @lazy_attribute
    def _copy_zero(self):
        if self.__is_sparse:
            return False
        if self.__matrix_class == sage.matrix.matrix_mod2_dense.Matrix_mod2_dense:
            return False
        if self.__matrix_class == sage.matrix.matrix_modn_dense.Matrix_modn_dense:
            if self._nrows > ... and self._ncols > ...:
                return False
            else:
                return True
        ...<do something for matrices over ZZ and so on>...
        return True

    def __call__(self, ...):
        ...
        if entries is None or entries == 0:
            if self._copy_zero: # faster to copy than to create a new one
                return self.zero_matrix().__copy__()
            else:
                return self.__matrix_class(self, {}, coerce=coerce, copy=copy)
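[Editorial aside: for readers unfamiliar with @lazy_attribute, here is a minimal plain-Python analogue showing why the criterion runs only once per matrix space. The descriptor below is a sketch, not Sage's implementation.]

```python
# Minimal analogue of Sage's @lazy_attribute: the wrapped function runs
# once on first access, then the stored instance attribute shadows the
# descriptor, so later accesses are plain attribute lookups.
class lazy_attribute:
    def __init__(self, f):
        self.f = f
        self.__name__ = f.__name__

    def __get__(self, obj, cls):
        if obj is None:
            return self
        value = self.f(obj)
        setattr(obj, self.__name__, value)  # cache on the instance
        return value


class Space:
    calls = 0  # counts how often the criterion is actually evaluated

    def __init__(self, sparse):
        self.sparse = sparse

    @lazy_attribute
    def _copy_zero(self):
        Space.calls += 1
        return not self.sparse  # toy criterion standing in for the real one
```

This works because lazy_attribute defines only __get__ (a non-data descriptor), so the instance attribute set by setattr takes precedence on every later lookup.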

Cheers,
Simon

Martin Albrecht

Jul 11, 2011, 6:58:43 AM
to sage-...@googlegroups.com
On Monday 11 July 2011, Simon King wrote:
> Hi Martin,

Hi Simon,



> On 10 Jul., 20:38, Martin Albrecht <martinralbre...@googlemail.com>
>
> wrote:
> > Which makes sense because we always return a zero matrix when we allocate
> > in M4RI, so copy() is alloc + memcpy while just creating it is just an
> > alloc.
>
> How does that work? I learnt the hard way: Allocating memory does not
> necessarily mean that the memory is initialised by zeroes. Perhaps the
> other matrix classes can benefit from M4RI's way to create initialised
> memory?

We are not doing anything special: we either call calloc, which zeroes the
memory, or we memset() the memory to zero, which should still be faster than
a memcpy().

Well, you are still adding one conditional jump (if self._copy_zero), but I'll
give this approach a try.

Simon King

Jul 11, 2011, 10:51:08 AM
to sage-devel
Hi Martin,

On 11 Jul., 12:58, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> We are not doing anything special except for calling calloc which zeros the
> memory or we memset() the memory to zero which should still be faster than a
> memcpy().

I see. Perhaps one should check whether it would work for other
matrices as well.

> Well, you are still adding one conditional jump (if self._copy_zero), but I'll
> give this approach a try.

No, I am not *adding* a conditional jump. I am *replacing*

    if self.__is_sparse:
        <bla 1>
    else:
        <bla 2>

by

    if self._copy_zero:
        <bla 2>
    else:
        <bla 1>

The advantage is that self._copy_zero (being a lazy attribute) relies
on a better criterion (perhaps taking into account the base ring and
the matrix dimension) than mere sparseness.

Cheers,
Simon

Simon King

Jul 11, 2011, 2:04:01 PM
to sage-devel
Hi Martin,

On 11 Jul., 12:58, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> We are not doing anything special except for calling calloc which zeros the
> memory or we memset() the memory to zero which should still be faster than a
> memcpy().

Matrix_modn_dense uses sage_malloc instead. If I understand correctly,
it won't initialize the memory, right?

I also searched if there exists sage_calloc, but apparently it is a
missing (though requested) feature:
sage: search_src("sage_calloc")
plot/plot3d/index_face_set.pyx:285:        memset(point_map, 0, sizeof(int) * self.vcount) # TODO: sage_calloc

Would it be difficult to provide sage_calloc? Or would calloc itself
be just fine?

Cheers,
Simon

Martin Albrecht

Jul 11, 2011, 2:23:21 PM
to sage-...@googlegroups.com
> Would it be difficult to provide sage_calloc? Or would calloc itself
> be just fine?

No, all that is needed is that (c|m)alloc and free match, hence the definition
of sage_malloc and sage_free. But one could just call sage_malloc() +
memset(); there shouldn't be much overhead compared with calloc().
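[Editorial aside: as a quick sanity check of the calloc claim, this POSIX-only Python sketch (it assumes a libc reachable via ctypes.CDLL(None), as on Linux/macOS) verifies that calloc returns zero-initialized memory.]

```python
import ctypes

# POSIX-only: load the C library and declare the calloc/free signatures.
libc = ctypes.CDLL(None)
libc.calloc.restype = ctypes.c_void_p
libc.calloc.argtypes = [ctypes.c_size_t, ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

n = 1 << 16
p = libc.calloc(n, 1)         # n elements of 1 byte each
buf = ctypes.string_at(p, n)  # copy the raw bytes out
libc.free(p)
assert buf == b"\x00" * n     # calloc guarantees zero-initialization
```

Whether calloc beats malloc + memset for a given size is platform-dependent, as discussed below in the thread.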

Simon King

Jul 11, 2011, 2:40:18 PM
to sage-devel
Hi Martin,

On 11 Jul., 20:23, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> > Would it be difficult to provide sage_calloc? Or would calloc itself
> > be just fine?
>
> No, all that is needed is that (c|m)alloc and free match, hence the definition
> of sage_malloc and sage_free. But one could just call sage_malloc() +
> memset(), there shouldn't be much overhead compared with calloc().

I found (in c_lib/include/memory.h) that sage_calloc is actually
defined, but it is nowhere used in Sage. I guess the creation of a
null matrix could be a good first use case.

Cheers,
Simon

Willem Jan Palenstijn

Jul 11, 2011, 2:48:25 PM
to sage-...@googlegroups.com
On Mon, Jul 11, 2011 at 07:23:21PM +0100, Martin Albrecht wrote:
> > Would it be difficult to provide sage_calloc? Or would calloc itself
> > be just fine?
>
> No, all that is needed is that (c|m)alloc and free match, hence the
> definition of sage_malloc and sage_free. But one could just call
> sage_malloc() + memset(), there shouldn't be much overhead compared with
> calloc().

Depending on the platform (OS+libc), calloc will be _much_ faster for large
blocks, especially if the memory isn't written to afterwards, such as for a
zero matrix. (With some page-table trickery it can get away with having only
one zeroed page of memory, using copy-on-write.)

(I have to admit I haven't really followed this discussion, but since the
thread title mentions zero_matrix, this may be relevant.)

-Willem Jan

Simon King

Jul 11, 2011, 5:08:36 PM
to sage-devel
Hi!

I changed sage_malloc into sage_calloc when an empty matrix is to be
created. However, the last matrix entry is then usually not
initialized to zero. No idea why.

Cheers,
Simon


Simon King

Jul 11, 2011, 5:12:53 PM
to sage-devel
On 11 Jul., 23:08, Simon King <simon.k...@uni-jena.de> wrote:
> I changed sage_malloc into sage_calloc when an empty matrix is to be
> created. However, the last matrix entry is then usually not
> initialized to zero. No idea why.

Sorry, my mistake. I misinterpreted the first argument of calloc. Now
it seems to work, and I'll do tests and timings tomorrow.

Cheers,
Simon

Simon King

Jul 11, 2011, 5:20:38 PM
to sage-devel
First timing looks good.

This is the same example as above, which (in unpatched sage) was 25.8
µs per loop for the creation of a new zero matrix and 13 µs per loop
for copying an existing zero matrix.

With calloc instead of malloc, I obtain:

sage: from sage.matrix.matrix_modn_dense import Matrix_modn_dense
sage: MS = MatrixSpace(GF(3),100,100)
sage: M = Matrix_modn_dense(MS, None,True,True)
sage: timeit("M = Matrix_modn_dense(MS, None,True,True)", number=10000)
10000 loops, best of 3: 9.56 µs per loop

So, it seems to help.

If tests pass, I'll open a ticket and post a patch tomorrow.

Best regards,
Simon

Martin Albrecht

Jul 11, 2011, 5:22:09 PM
to sage-...@googlegroups.com

Hi Simon, I created http://trac.sagemath.org/sage_trac/ticket/11589 which has a patch attached.

Cheers,
Martin


Simon King

Jul 11, 2011, 5:25:28 PM
to sage-devel
Hi Martin,

On 11 Jul., 23:22, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> Hi Simon, I created http://trac.sagemath.org/sage_trac/ticket/11589 which
> has a patch attached.

Yep, I just noticed.

So, further discussion should take place there.

Cheers,
Simon