Finite field matrix speeds

Stefan

unread,

May 20, 2013, 3:04:14 PM5/20/13

to sage-...@googlegroups.com

Hi all,

On trac:7477, Volker Braun asked about slowness of Sage's finite field matrices. I put together a small comparison between a plain double array of GF(7) elements (B), Sage's matrices (A), and the GenericMatrix class used internally by the (to-be-included-in-Sage) matroids package (C). The worksheet is

http://www.math.princeton.edu/~svanzwam/MatrixSpeedTests.sws

For the impatient: the key methods are these (Xs has 10000 coordinate pairs, Vs has 10000 elements from GF(7)):

{{{

%cython

from sage.matroids.lean_matrix cimport GenericMatrix

from sage.matrix.matrix2 cimport Matrix

cpdef setA(A, X, V, m):

AA = <Matrix> A

for i in xrange(m):

AA.set_unsafe(X[i][0], X[i][1], AA.get_unsafe(X[i][0], X[i][1]) + V[i])

cpdef setB(B, X, V, m):

for i in xrange(m):

B[X[i][0]][X[i][1]] = B[X[i][0]][X[i][1]] + V[i]

cpdef setC(C, X, V, m):

CC = <GenericMatrix> C

for i in xrange(m):

CC.set_unsafe(X[i][0], X[i][1], CC.get_unsafe(X[i][0], X[i][1]) + V[i])

}}}

Here's the test:

{{{

timeit('setA(A, Xs, Vs, 10000)')

timeit('setB(B, Xs, Vs, 10000)')

timeit('setC(C, Xs, Vs, 10000)')

}}}

and here are the timings:

{{{

25 loops, best of 3: 34.3 ms per loop
625 loops, best of 3: 1.5 ms per loop
625 loops, best of 3: 1.12 ms per loop

}}}

Surely a 20-fold slowdown for Sage's matrix structure is unacceptable?

--Stefan.

Simon King

unread,

May 20, 2013, 4:46:49 PM5/20/13

to sage-...@googlegroups.com

On 2013-05-20, Stefan <stefan...@gmail.com> wrote:
> For the impatient: the key methods are these (Xs has 10000 coordinate
> pairs, Vs has 10000 elements from GF(7)):
>
> {{{
> %cython
> from sage.matroids.lean_matrix cimport GenericMatrix
> from sage.matrix.matrix2 cimport Matrix
>
> cpdef setA(A, X, V, m):
> AA = <Matrix> A
> for i in xrange(m):
> AA.set_unsafe(X[i][0], X[i][1], AA.get_unsafe(X[i][0], X[i][1]) +
> V[i])
>
> cpdef setB(B, X, V, m):
> for i in xrange(m):
> B[X[i][0]][X[i][1]] = B[X[i][0]][X[i][1]] + V[i]
>
> cpdef setC(C, X, V, m):
> CC = <GenericMatrix> C
> for i in xrange(m):
> CC.set_unsafe(X[i][0], X[i][1], CC.get_unsafe(X[i][0], X[i][1]) +
> V[i])
> }}}
>

> ...

>
> 25 loops, best of 3: 34.3 ms per loop
> 625 loops, best of 3: 1.5 ms per loop
> 625 loops, best of 3: 1.12 ms per loop
>
> }}}
>
>
> Surely a 20-fold slowdown for Sage's matrix structure is unacceptable?

I think it would be a mistake to restrict the benchmarks to setting/getting
items of the matrix. Lists (your example B, isn't it?) do not provide matrix
multiplication, computation of kernels, etc. How does GenericMatrix compete
with Linbox (that's the backend Sage uses for matrices over GF(7), if I
recall correctly) in classical linear algebra problems?

That said, your timings indicate that the communication between Sage and
Linbox can probably be improved.

Best regards,
Simon

Volker Braun

unread,

May 20, 2013, 5:07:02 PM5/20/13

to sage-...@googlegroups.com

The slow part seems to be get_unsafe, which is constructing a new GF(7) element each time. This could probably be fixed by providing an interface where you provide the output GF(7) object, or by adding a specialized matrix method to increment certain values from a dictionary {(i,j):increment} or so. Is it really crucial to increment certain values in matrices, or are you just using that as a testcase? And if yes, what is it that you really want to do?

Volker Braun

unread,

May 20, 2013, 6:20:08 PM5/20/13

to sage-...@googlegroups.com

Actually, this seems pretty bad: The matrix_modn_dense_float limits itself to modulus up to 256, but the C mod_int type is still unsigned long. This means that all <mod_int> casts go through PyLongObject since a PyInt is only long enough for a signed(!) long.

Volker Braun

unread,

May 20, 2013, 6:46:59 PM5/20/13

to sage-...@googlegroups.com

Here are some timings to illustrate the problem:

--------------------------------------

ctypedef unsigned long mod_int # this is how we define it

cpdef caster_slow():

cdef int i

foo = None

for i in range(10000000):

foo = <mod_int>(1.0)

cpdef caster_fast():

cdef int i

foo = None

for i in range(10000000):

foo = <int>(1.0)

--------------------------------------

Then we get

sage: timeit('caster_slow()')

5 loops, best of 3: 137 ms per loop

sage: timeit('caster_fast()')

25 loops, best of 3: 34.4 ms per loop

This accounts for the majority of the slowdown in getting matrix entries out of matrix_modn_dense_float.

Should we just switch to signed long? This wastes one bit (factor of 2) of maximum modulus length.... Or provide extra cdef apis to get/set the C unsigned long without conversion to a PyInt?

Volker Braun

unread,

May 22, 2013, 4:11:54 PM5/22/13

to sage-...@googlegroups.com

I posted a patch at http://trac.sagemath.org/14627 (needs review)

With the patch there is no measurable performance difference between setA and setB

Reply all

Reply to author

Forward