Constructor of a very large matrix doesn't copy entries correctly

68 views
Skip to first unread message

Anton Mellit

unread,
Dec 1, 2021, 11:39:47 AM12/1/21
to sage-devel
When I create a very large matrix (the threshold may be 2^32 entries) from a list of vectors, the elements are not copied over correctly.
Here is a "small" example:

FF = GF(next_prime(1000000))
M = 70000
N = 80000
v1 = vector(FF, [FF(k) for k in range(N)])
v2 = vector(FF, [N] * N)
vecs = [v1 + v2*FF(i) for i in range(M)]
mat = matrix(FF, M, N, vecs)
mat[0] == vecs[0]

the last line returns "false".
I understand the bug is not so easy to reproduce: the code above runs for about 40 minutes and needs about 50G of RAM. In fact, it is also strange that it takes so long: all the lines until "mat = matrix(FF, M, N, vecs)" take only about a minute. It shouldn't take so much more time to simply copy a bunch of vectors than it took to compute them.

Sage version: 9.3 running on a 64 bit linux system


Dima Pasechnik

unread,
Dec 1, 2021, 2:56:16 PM12/1/21
to sage-devel
Matrices in Sage are not just array, they come with parents - vector spaces of matrices of this size.
Perhaps it's where the time bottleneck is - in creating the parent, that code wasn't obviously optimised for these kinds of dimensions.

As to the error... Hard to say, one needs to look at the code (and find a machine to reproduce the example). 

Dima


> Sage version: 9.3 running on a 64 bit linux system
>
>
> --
> You received this message because you are subscribed to the Google Groups "sage-devel" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/sage-devel/d06f39c6-f066-4e83-a2c1-2bbcba345f36n%40googlegroups.com.

jonatha...@googlemail.com

unread,
Dec 2, 2021, 6:17:51 AM12/2/21
to sage-devel
FF = GF(next_prime(1000000))
M = Matrix(FF, [[1, 2], [3, 4]])
M.__init__??

reveals that the indices are casted into a long. This might be the problem.

Using Py_ssize_t or size_t might have been a better choice.

Jonathan

Dima Pasechnik

unread,
Dec 2, 2021, 6:31:40 AM12/2/21
to sage-devel
On Thu, Dec 2, 2021 at 11:17 AM 'jonatha...@googlemail.com' via
sage-devel <sage-...@googlegroups.com> wrote:
>
> FF = GF(next_prime(1000000))
> M = Matrix(FF, [[1, 2], [3, 4]])
> M.__init__??
>
> reveals that the indices are casted into a long. This might be the problem.

right, seems to be the right place

sage: M*N
5600000000
sage: 2^32 # long is 4 bytes, i.e. maximal value it holds is 2^32-1
4294967296

It's probably not the only place where <long> is used for indexing of
matrices (which are stuffed into 1-dim. arrays)

Dima
> To view this discussion on the web visit https://groups.google.com/d/msgid/sage-devel/9c38f81a-f471-41e1-a36f-07b618232f47n%40googlegroups.com.

Dima Pasechnik

unread,
Dec 2, 2021, 3:56:06 PM12/2/21
to sage-devel
I've opened https://trac.sagemath.org/ticket/32961 to deal with this.
At least we should check that the number of entries can be
handled by the current implementation.
Reply all
Reply to author
Forward
0 new messages