[Numpy-discussion] Matrix dot product over an axis (for a 3d array/list of matrices)


Emmanuel Bengio

Jul 15, 2010, 12:38:57 PM7/15/10
to numpy-di...@scipy.org
Hello,

I have a list of 4x4 transformation matrices that I want to "dot with" another list of the same size (elementwise).
Making a for loop that calculates the dot product of each pair is extremely slow.
I suspect it's because I have thousands of matrices, and with a Python for loop there is a lot of Python overhead.

I do something like this:
>> for a,b in izip(Rot,Trans):
>>     c.append(numpy.dot(a,b))

Is there a way to do this in one instruction?
Or is there a way to do this all using weave.inline?

--
Emmanuel

John Salvatier

Jul 15, 2010, 12:45:21 PM7/15/10
to Discussion of Numerical Python
Could you place all Rot's into the same array and all the Trans's into the same array? If you have the first index of each array refer to which matrix it is, numpy.dot should work fine, since numpy.dot does the dot product over the second-to-last and last indexes. http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html
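Something along these lines would do the stacking (just a rough sketch; it assumes every matrix in Rot and Trans is 4x4, with the names taken from the original post):

import numpy
R = numpy.array(Rot)    # stacks the list into one array of shape (N, 4, 4)
T = numpy.array(Trans)  # likewise, shape (N, 4, 4)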

John



Keith Goodman

Jul 15, 2010, 12:45:32 PM7/15/10
to Discussion of Numerical Python
On Thu, Jul 15, 2010 at 9:38 AM, Emmanuel Bengio <ben...@gmail.com> wrote:
>

How about using a list comprehension? And setting dot = numpy.dot. Would
initializing the c list first help?

Keith Goodman

Jul 15, 2010, 12:54:04 PM7/15/10
to Discussion of Numerical Python
On Thu, Jul 15, 2010 at 9:45 AM, Keith Goodman <kwgo...@gmail.com> wrote:
> On Thu, Jul 15, 2010 at 9:38 AM, Emmanuel Bengio <ben...@gmail.com> wrote:
>>
>> Hello,
>>
>> I have a list of 4x4 transformation matrices that I want to "dot with" another list of the same size (elementwise).
>> Making a for loop that calculates the dot product of each pair is extremely slow.
>> I suspect it's because I have thousands of matrices, and with a Python for loop there is a lot of Python overhead.
>>
>> I do something like this:
>> >> for a,b in izip(Rot,Trans):
>> >>     c.append(numpy.dot(a,b))
>>
>> Is there a way to do this in one instruction?
>> Or is there a way to do this all using weave.inline?
>
> How about using a list comprehension? And setting dot = numpy.dot. Would
> initializing the c list first help?

Doesn't buy much:

>> from itertools import izip
>> import numpy as np
>>
>> def forloop(a, b):
.....:     c = []
.....:     for x, y in izip(a, b):
.....:         c.append(np.dot(x, y))
.....:     return c
.....:
>> a = [np.random.rand(4,4) for i in range(10000)]
>> b = [np.random.rand(4,4) for i in range(10000)]
>>
>> timeit forloop(a, b)
10 loops, best of 3: 21.2 ms per loop
>>
>> dot = np.dot
>> timeit [dot(x, y) for x, y in izip(a, b)]
100 loops, best of 3: 19.2 ms per loop

Charles R Harris

Jul 15, 2010, 12:58:54 PM7/15/10
to Discussion of Numerical Python

Yes, there is a trick for this using a multiply with properly placed newaxis followed by a sum. It uses more memory but for stacks of small arrays that shouldn't matter. See the post here

Chuck

Emmanuel Bengio

Jul 15, 2010, 1:32:50 PM7/15/10
to Discussion of Numerical Python
>Could you place all Rot's into the same array and all the Trans's into the same array?
Well, I guess I can, since they're all the same size; I would just have to do numpy.array(a). But the result of the dot product of two 3d arrays is most unexpected:
>>> a = numpy.ones((10,4,4))
>>> b = numpy.ones((10,4,4))
>>> c = numpy.dot(a,b)
>>> c.shape
(10, 4, 10, 4) #Hmm, not what a newbie expects D:


>Yes, there is a trick for this using a multiply with properly placed newaxis followed by a sum. It uses more memory but for stacks of small arrays that shouldn't matter. See the post here
Hmm, I'm not sure I understand what is being done there.
--
Emmanuel

Charles R Harris

Jul 15, 2010, 1:46:48 PM7/15/10
to Discussion of Numerical Python
On Thu, Jul 15, 2010 at 11:32 AM, Emmanuel Bengio <ben...@gmail.com> wrote:
> > Could you place all Rot's into the same array and all the Trans's into the same array?
> Well, I guess I can, since they're all the same size; I would just have to do numpy.array(a). But the result of the dot product of two 3d arrays is most unexpected:
> >>> a = numpy.ones((10,4,4))
> >>> b = numpy.ones((10,4,4))
> >>> c = numpy.dot(a,b)
> >>> c.shape
> (10, 4, 10, 4) #Hmm, not what a newbie expects D:
>
> > Yes, there is a trick for this using a multiply with properly placed newaxis followed by a sum. It uses more memory but for stacks of small arrays that shouldn't matter. See the post here
> Hmm, I'm not sure I understand what is being done there.



It's just matrix multiply considered as a sum of the outer products of column and row vectors, i.e., the outer product of the first column of the first matrix and the first row of the second matrix, plus the outer product of the second column of the first matrix and the second row of the second matrix, etc. That form is easily adapted to stacks of matrices and is sometimes used on vector architectures; see Golub and Van Loan.
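For a single pair of matrices the identity looks roughly like this (just a sketch, with A and B standing in for one 4x4 pair):

import numpy
A = numpy.random.rand(4, 4)
B = numpy.random.rand(4, 4)
# numpy.dot(A, B) written as a sum of outer products: column k of A times row k of B
C = sum(numpy.outer(A[:, k], B[k, :]) for k in range(4))
print(numpy.allclose(C, numpy.dot(A, B)))   # True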

In [6]: a = ones((10,4,4))

In [7]: b = ones((10,4,4))

In [8]: sum(a[...,:,:,newaxis]*b[...,newaxis,:,:], axis=-2)
Out[8]:
array([[[ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.]],

       [[ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.]],

       [[ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.]],

       [[ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.]],

       [[ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.]],

       [[ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.]],

       [[ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.]],

       [[ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.]],

       [[ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.]],

       [[ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.],
        [ 4.,  4.,  4.,  4.]]])

<snip>

Chuck

Emmanuel Bengio

Jul 15, 2010, 2:00:46 PM7/15/10
to Discussion of Numerical Python
Ok I get it. Thanks!

Numpy syntax that works for me:
numpy.sum(a[:,:,:,numpy.newaxis]*b[:,numpy.newaxis,:,:],axis=-2)
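A quick check (just a sketch, with a and b being the stacked (N, 4, 4) arrays from earlier in the thread) that this matches the per-matrix dot:

c1 = numpy.sum(a[:,:,:,numpy.newaxis]*b[:,numpy.newaxis,:,:], axis=-2)
c2 = numpy.array([numpy.dot(x, y) for x, y in zip(a, b)])
print(numpy.allclose(c1, c2))   # True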





--
Emmanuel

Charles R Harris

Jul 15, 2010, 2:09:57 PM7/15/10
to Discussion of Numerical Python
On Thu, Jul 15, 2010 at 12:00 PM, Emmanuel Bengio <ben...@gmail.com> wrote:
> Ok I get it. Thanks!
>
> Numpy syntax that works for me:
> numpy.sum(a[:,:,:,numpy.newaxis]*b[:,numpy.newaxis,:,:],axis=-2)


The leading "..." gives the same thing, but iterates over all the leading indices in case you want multidimensional arrays of matrices ;) You can also use the sum method, which might be a bit more economical:

(a[:,:,:,numpy.newaxis]*b[:,numpy.newaxis,:,:]).sum(axis=-2)

How do the execution times compare?
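One way to time it, as a rough sketch (reusing forloop and the 10000-matrix lists a and b from Keith's message, stacked into arrays for the broadcasting version; actual numbers will vary by machine):

>> aa = np.array(a)
>> bb = np.array(b)
>> timeit forloop(a, b)
>> timeit (aa[:,:,:,np.newaxis]*bb[:,np.newaxis,:,:]).sum(axis=-2)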

<snip>

Chuck

Emmanuel Bengio

Jul 15, 2010, 2:30:44 PM7/15/10
to Discussion of Numerical Python
I get about 60% of the original execution time for just about any stack size.





--
Emmanuel

David Warde-Farley

Jul 15, 2010, 4:31:14 PM7/15/10
to Discussion of Numerical Python, numpy-di...@scipy.org
On 2010-07-15, at 12:38 PM, Emmanuel Bengio <ben...@gmail.com> wrote:

> Hello,
>
> I have a list of 4x4 transformation matrices that I want to "dot with" another list of the same size (elementwise).
> Making a for loop that calculates the dot product of each pair is extremely slow.
> I suspect it's because I have thousands of matrices, and with a Python for loop there is a lot of Python overhead.
>
> I do something like this:
> >> for a,b in izip(Rot,Trans):
> >>     c.append(numpy.dot(a,b))

If you need/want more speed than the solution Chuck proposed, you should check out Cython and Tokyo. Cython lets you write loops that execute at C speed, whereas Tokyo provides a Cython-level wrapper for BLAS (no need to go through Python code to call NumPy). Tokyo was designed for exactly your use case: lots of matrix multiplies with relatively small matrices, where you start noticing the Python overhead.

David

David Warde-Farley

Jul 15, 2010, 6:28:39 PM7/15/10
to Discussion of Numerical Python
On 2010-07-15, at 4:31 PM, David Warde-Farley wrote:

> If you need/want more speed than the solution Chuck proposed, you should check out Cython and Tokyo. Cython lets you write loops that execute at C speed, whereas Tokyo provides a Cython-level wrapper for BLAS (no need to go through Python code to call NumPy). Tokyo was designed for exactly your use case: lots of matrix multiplies with relatively small matrices, where you start noticing the Python overhead.

It occurred to me I neglected to provide a link (cursed iPhone):

http://www.vetta.org/2009/09/tokyo-a-cython-blas-wrapper-for-fast-matrix-math/

Charles R Harris

Jul 15, 2010, 7:44:53 PM7/15/10
to Discussion of Numerical Python
On Thu, Jul 15, 2010 at 4:28 PM, David Warde-Farley <d...@cs.toronto.edu> wrote:
> On 2010-07-15, at 4:31 PM, David Warde-Farley wrote:
>
> > If you need/want more speed than the solution Chuck proposed, you should check out Cython and Tokyo. Cython lets you write loops that execute at C speed, whereas Tokyo provides a Cython-level wrapper for BLAS (no need to go through Python code to call NumPy). Tokyo was designed for exactly your use case: lots of matrix multiplies with relatively small matrices, where you start noticing the Python overhead.
>
> It occurred to me I neglected to provide a link (cursed iPhone):
>
> http://www.vetta.org/2009/09/tokyo-a-cython-blas-wrapper-for-fast-matrix-math/


For speed I'd go straight to C and avoid BLAS since the matrices are so small. There might also be a cache advantage to copying the non-contiguous columns of the rhs to the stack.

Chuck
