Improving a Sage program that is heavy on matrix multiplication


Jernej

Dec 3, 2014, 4:33:46 PM12/3/14
to sage-s...@googlegroups.com, Nathann Cohen, Nino Bašić, Nejc Trdin
Dear sage-support,

I have run into a performance bottleneck in one of my Sage programs. I would like to share the relevant problem here in the hope that someone has a constructive suggestion for optimizing the given program.

I am given an n x n (0,1) matrix C, where n < 20. C has up to 30% nonzero entries.

I define the inner product <u,v> = u (2I - C) v^t and need to compute the set S of all (0,1) n-vectors u such that <u,u> = 2 and <u,j> = -1, where j is the all-ones vector. Finally, I need to record all pairs u, v from S such that <u,v> = -1 or 0.
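For concreteness, the two conditions can be checked with plain numpy like this (a sketch only: the matrix C below is a toy stand-in, not one of the actual inputs):

```python
import numpy as np

def inner(u, v, M):
    """<u, v> = u M v^T for row vectors u and v, with M = 2I - C."""
    return u @ M @ v

n = 4
C = np.zeros((n, n), dtype=np.int64)   # toy stand-in, not an actual input
M = 2 * np.eye(n, dtype=np.int64) - C
u = np.array([1, 0, 0, 0])
j = np.ones(n, dtype=np.int64)         # the all-ones vector

print(inner(u, u, M))   # -> 2, so u meets the first condition here
print(inner(u, j, M))   # -> 2 with this toy C; the search keeps only u where this is -1
```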

The one optimization I use is to cache the product u*(2I-C). Other than that I do not see any other way to optimize the program, so it looks like this:

====
global vectors
field = RR # this looks like the fastest option

def init_vectors(n):
    global vectors
    vectors = [matrix(field, b) for b in product(*[[0, 1]]*n)]

def foo(C, output):
    global vectors

    n = C.nrows()
    # we compute the inverse in QQ just to gain a bit of numerical stability
    D = (2*identity_matrix(n) - C).inverse()
    D = Matrix(field, D)

    j = matrix(field, tuple([1 for _ in xrange(n)]))
    cur = 1

    vec2int = {}
    cache = {}

    for bm in vectors:
        val = bm*D
        if abs((val*bm.transpose())[0,0] - 2) <= 0.000001 and abs((val*j.transpose())[0,0] + 1) <= 0.000001:
            vec2int[cur] = bm
            # join the entries; str() on the bare generator would write its repr
            output.write(' '.join(str(el) for el in bm[0]) + '\n')
            cache[cur] = val
            cur += 1

    for i in xrange(1, cur):
        for j in xrange(i+1, cur):
            iv = (cache[i]*vec2int[j].transpose())[0,0]
            if abs(iv) <= 0.000001 or abs(iv + 1) <= 0.000001:
                output.write(str(i) + ' ' + str(j) + '\n')
====

For convenience I have attached the result of %prun on one instance of the problem. From the output of prun I suppose it would make sense to cache the transposes of the vectors as well, though I am not sure this will make a crucial difference.

On average it takes 200 s to process one matrix, and given that I have millions of such matrices, it would take years to process all the input.

Other than rewriting the whole thing in C I currently do not see a viable solution. Hence I am wondering: do you happen to see any clever optimization? Is there a way to compute the named product more efficiently?

Best,

Jernej
prof.out

Nils Bruin

Dec 3, 2014, 5:14:38 PM12/3/14
to sage-s...@googlegroups.com, nathan...@gmail.com, nino....@gmail.com, nejc....@gmail.com
On Wednesday, December 3, 2014 1:33:46 PM UTC-8, Jernej wrote:

    for i in xrange(1, cur):
        for j in xrange(i+1, cur):
            iv = (cache[i]*vec2int[j].transpose())[0,0]

It looks like you should rewrite this loop so that j is the outer variable; then you can pull
    vec2int[j].transpose()
out to the outer loop. No need to cache it.
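Concretely, the reordering looks like this (a runnable sketch using random numpy stand-ins for the Sage objects; vec2int and cache below are illustrative, not the real vectors):

```python
import numpy as np

# Random stand-ins for the Sage objects in the original loops.
rng = np.random.default_rng(1)
n, cur = 5, 8
vec2int = {i: rng.integers(0, 2, size=(1, n)).astype(float)
           for i in range(1, cur)}
cache = {i: vec2int[i].copy() for i in range(1, cur)}

pairs = []
for j in range(2, cur):
    vt = vec2int[j].T                  # transposed once per outer iteration
    for i in range(1, j):              # i < j visits the same pairs as before
        iv = (cache[i] @ vt)[0, 0]
        pairs.append((i, j, iv))
```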
 
In any case, your profiling data indicates:

        1   85.572   85.572  113.024  113.024 <string>:9(constructGraph_fast)
  3733128   10.413    0.000   10.413    0.000 matrix_space.py:145(__classcall__)
  1604418    9.434    0.000   19.921    0.000 {method 'transpose' of 'sage.matrix.matrix_dense.Matrix_dense' objects}
  1604418    3.186    0.000    4.174    0.000 {method '__copy__' of 'sage.matrix.matrix_generic_dense.Matrix_generic_dense' objects}

so most time is spent in "constructGraph_fast", which I guess is your program itself. The transpose is noticeable but hardly accounts for the majority of the time. The matrix-space construction is incurred twice: once for the transpose and once for computing the product. By pulling the transpose into the outer loop, that cost should roughly be cut in half (and you would only perform about the square root of the number of transposes).

It looks like most time is lost in basic Python interpretation and shuffling elements around. If you make one big matrix out of the cache[i] vectors, you can eliminate the loop over i. The only thing left would be inspecting the entries, for which you can then write a simple Cython routine (which should be lightning fast).
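The "one big matrix" idea can be sketched as follows (numpy used for illustration; V stands for the accepted vectors stacked as rows and D for the cached inverse, both filled with placeholder data here):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 10
V = rng.integers(0, 2, size=(k, n)).astype(float)  # stand-in for the accepted vectors
D = np.eye(n)                                      # stand-in for the real D

W = V @ D        # all cached products u*D in one multiplication
G = W @ V.T      # G[i, j] = <u_i, u_j>: every pairwise product at once

# only the element inspection is left as an explicit (Cython-friendly) loop
pairs = [(i, j) for i in range(k) for j in range(i + 1, k)
         if abs(G[i, j]) <= 1e-6 or abs(G[i, j] + 1) <= 1e-6]
```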

Nathann Cohen

Dec 4, 2014, 1:27:21 AM12/4/14
to Nils Bruin, Sage Support, nino....@gmail.com, nejc....@gmail.com
Also, I do not understand why you have so many expressions like:

(val*bm.transpose())[0,0]

If you have performance problems, do not compute a whole matrix if you
are only interested in its [0,0] coordinate O_o

You call j.transpose() repeatedly. Store jt = j.transpose() and use it.

Store and use bm.transpose() instead of bm. You transpose bm when you
use it in the first loop, and you transpose it again in
vec2int[j].transpose().

Also, given that the profiling does not say much about where 99% of the
computation is going, try to run your code without the .write() calls
(and their internal str() calls). Make it run without storing anything
and see how it goes. Perhaps you should store the results directly as
Sage objects using "save" rather than as strings... No idea, you need to
test things ^^;

Nathann

Jori Mantysalo

Dec 4, 2014, 1:33:13 AM12/4/14
to sage-s...@googlegroups.com
On Wed, 3 Dec 2014, Jernej wrote:

> field = RR # this looks like the fastest option
>     D = Matrix(field,D)

I have used scipy when doing matrix arithmetic, i.e.

import scipy
M=scipy.matrix(. . .)

--
Jori Mäntysalo

Jernej

Dec 4, 2014, 1:37:03 AM12/4/14
to sage-s...@googlegroups.com, nbr...@sfu.ca, nino....@gmail.com, nejc....@gmail.com


On Thursday, 4 December 2014 07:27:21 UTC+1, Nathann Cohen wrote:
Also, I do not understand why you have so many expressions like:

(val*bm.transpose())[0,0]
val*bm.transpose() is actually a number, but the way Sage handles it is awkward:

sage: M = Matrix(RR,[[1],[1]])
sage: Matrix(RR,[[-1,0]])*M
[-1.00000000000000]
sage: abs(Matrix(RR,[[-1,0]])*M)
-1.00000000000000 # some norm of the matrix??

so I either have to do abs(abs(...)) or access the (only) element of the matrix.
 

Thanks for the suggestions, I'll check them out.

Vincent Delecroix

Dec 4, 2014, 2:32:42 AM12/4/14
to sage-s...@googlegroups.com
2014-12-04 7:37 UTC+01:00, Jernej <azi.s...@gmail.com>:
> val*bm.transpose() is actually a number but the way Sage handles it is
> awkward:
>
> sage: M = Matrix(RR,[[1],[1]])
> sage: Matrix(RR,[[-1,0]])*M
> [-1.00000000000000]
> sage: abs(Matrix(RR,[[-1,0]])*M)
> -1.00000000000000 # some norm of the matrix??

For this very particular problem, you can reproduce it without any
multiplication

sage: M = matrix(RR, [[-1]])
sage: abs(M)
-1.00000000000000

So the problem is with abs(M). The reason is that abs(M) is calling
the method M.__abs__(). The latter one is just a shortcut for the
determinant. I really do not understand why and it looks like a bug to
me.

For comparison, in scipy the same function just returns the matrix
where each entry is replaced with its absolute value, which is much
more natural.

I opened a ticket for that:
http://trac.sagemath.org/ticket/17443
It should be corrected in the next stable release.

Thanks for the report
Vincent

Simon King

Dec 4, 2014, 3:54:59 AM12/4/14
to sage-s...@googlegroups.com
Hi Vincent,

On 2014-12-04, Vincent Delecroix <20100.d...@gmail.com> wrote:
> sage: M = matrix(RR, [[-1]])
> sage: abs(M)
> -1.00000000000000
>
> So the problem is with abs(M). The reason is that abs(M) is calling
> the method M.__abs__(). The latter one is just a shortcut for the
> determinant. I really do not understand why and it looks like a bug to
> me.

No, see discussion on sage-devel.

> For comparison, in scipy the same function just return the matrix
> where each entry is replace with its absolute value. Which is much
> more natural...

... which *is* a bug, IMHO.

> I opened a ticket for that:
> http://trac.sagemath.org/ticket/17443
> It should be corrected in the next stable release.

Better not. Expecting abs(M) to return the matrix formed by the absolute
values of the entries of M is a misuse. Generally, matrices in Sage are
not considered arrays, so element-wise operations do not seem natural.

The standard notation for abs(x) is |x|, and if M is a matrix then it is
relatively common to write |M| for its determinant, which justifies the
use of abs(M) for that purpose (although M.det() would be clearer).

Best regards,
Simon


Jernej

Dec 7, 2014, 7:17:53 AM12/7/14
to sage-s...@googlegroups.com, nathan...@gmail.com, nino....@gmail.com, nejc....@gmail.com
Just for fun I wrote an equivalent program in C and tested both the Sage function and the C program on one instance of the problem.

===
$ time ./a.out test

real    0m0.366s
user    0m0.306s
sys     0m0.060s
===

And Sage

===
$ time sage test.sage test

real    2m6.285s
user    2m5.256s
sys     0m0.858s
===

I thought that might be interesting to you as well.

Best,

Jernej

Nathann Cohen

Dec 8, 2014, 2:06:07 AM12/8/14
to Jernej, Sage Support, Nino Bašić, Nejc Trdin
> Just for fun I wrote an equivalent program in C and tested the Sage function
> and an equivalent C program on 1 instance of the problem.

Honestly, I am not surprised: what you do in this code can be done with
elementary processor operations, and Python is definitely not the best
language for that!

If rewriting this thing in C has told you how the current Python code in
Sage should be rewritten, however... ;-)

Nathann

Dima Pasechnik

Dec 9, 2014, 8:59:05 AM12/9/14
to sage-s...@googlegroups.com
On 2014-12-07, Jernej <azi.s...@gmail.com> wrote:
<u,j> = -1 is a linear condition on u.
Shouldn't you pick u only from the corresponding hyperplane?

>>
>> ====
>> global vectors
>> field = RR # this looks like the fastest option
RDF should be faster, IMHO.
You could use Cython instead of C, and numpy arrays instead of Sage
matrices (numpy arrays interface quite well with Cython).
Alas, plain Sage matrices are currently very slow.

As your matrices are quite small, you might alternatively experiment
with computing modulo p, for p a prime of order bigger than n^2.
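A minimal sketch of the mod-p idea in plain Python (the helper inv_mod_matrix and the 2x2 example matrix are illustrative, not from the original program; pow(x, -1, p) needs Python >= 3.8):

```python
def inv_mod_matrix(A, p):
    """Invert an n x n integer matrix modulo a prime p by Gauss-Jordan."""
    n = len(A)
    # augmented matrix [A | I], reduced over GF(p)
    M = [[A[i][j] % p for j in range(n)] + [int(i == j) for j in range(n)]
         for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col])  # nonzero pivot
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, p)                      # modular inverse
        M[col] = [x * inv % p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(a - f * b) % p for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

p = 10**9 + 7                 # any prime comfortably larger than n^2
A = [[2, -1], [0, 2]]         # 2I - C for a tiny example C
Ainv = inv_mod_matrix(A, p)   # exact inverse over GF(p), no rounding error
```

The equalities <u,u> = 2 and <u,j> = -1 then become exact congruences mod p, so the floating-point tolerances disappear.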

