optimizing calls to numpy.dot?

373 views
Skip to first unread message

Skipper Seabold

unread,
Dec 5, 2010, 9:16:19 PM12/5/10
to cython...@googlegroups.com
Hi all,

I am new to Cython and my knowledge of C is pretty rudimentary.  I have attached two
scripts one pure python implementation kalman_python and one is the best I have been
able to do with cython kalman_cython.  The Cython version is still not as fast as I would
think it could be.

I suspect that much time is wasted in the slicing and the calls to dot.  I imagine that I can
avoid the Python call overhead, but what is the easiest way?  I started to try and include
dgemm.c but was hoping there was an easier way, perhaps using the numpy C-api?

I am compiling the Cython version like this with gcc 4.4.3

cython -a kalman_cython.pyx
gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.6 -I/usr/local/lib/python2.6/dist-packages/numpy/core/include/ -o kalman_cython.so kalman_cython.c


Inlined Cython version
--------------------------------

from numpy import dot, identity, kron, zeros, pi, exp, eye, abs, empty, sum
from numpy cimport ndarray, float64_t
from numpy.linalg import inv

ctypedef float64_t DOUBLE

cdef extern from "math.h":
   double log(double x)
   double pow(double x, double y)

   
def loglike_cython(int k,int p, int q,int r,int nobs,
       ndarray[DOUBLE, ndim=2] Z_mat,ndarray[DOUBLE, ndim=2] T_mat,
       ndarray[DOUBLE, ndim=2] R_mat, ndarray[DOUBLE, ndim=1] y):
   cdef int i
   m = Z_mat.shape[1]

   loglikelihood = zeros((1,1))
   alpha = zeros((m,1)) # if constant (I-T)**-1 * c
   P = dot(inv(identity(pow(m,2))-kron(T_mat,T_mat)),
                   dot(R_mat,R_mat.T).ravel('F')).reshape(r,r,order='F')
   F = zeros((nobs,1))
   v = zeros((nobs,1))
   cdef DOUBLE sigma2
   for i in xrange(int(nobs)):
       # Predict
       v_mat = y[i] - dot(Z_mat,alpha) # one-step forecast error
       v[i] = v_mat
       F_mat = dot(dot(Z_mat, P), Z_mat.T)
       F[i] = F_mat
       Finv = 1./F_mat # always scalar for univariate series
       K = dot(dot(dot(T_mat,P),Z_mat.T),Finv) # Kalman Gain Matrix
       # update state
       alpha = dot(T_mat, alpha) + dot(K,v_mat)
       L = T_mat - dot(K,Z_mat)
       P = dot(dot(T_mat, P), L.T) + dot(R_mat, R_mat.T)
       loglikelihood += log(F_mat.item())

   sigma2 = 1./nobs * sum(v**2 / F)
   loglike = -.5 *(loglikelihood + nobs*log(sigma2))
   loglike -= nobs/2. * (log(2*pi) + 1)
   return sigma2, loglike



A profile run of the python version
------------------------------------------------
Sun Dec  5 21:04:49 2010    Profile.prof
                                                                                                                                                                                                              
         1200093 function calls in 5.399 CPU seconds                                                                                                                                                          
                                                                                                                                                                                                              
   Ordered by: internal time                                                                                                                                                                                  
                                                                                                                                                                                                              
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)                                                                                                                                       
        1    3.520    3.520    5.399    5.399 kalman_python.py:5(loglike_python)                                                                                                                              
  1200002    1.878    0.000    1.878    0.000 {numpy.core._dotblas.dot}                                                                                                                                       
        6    0.001    0.000    0.001    0.000 {numpy.core.multiarray.zeros}                                                                                                                                   
        1    0.000    0.000    0.000    0.000 {method 'sum' of 'numpy.ndarray' objects}                                                                                                                       
        1    0.000    0.000    0.000    0.000 numeric.py:789(outer)                                                                                                                                           
        1    0.000    0.000    0.000    0.000 {numpy.linalg.lapack_lite.dgesv}                                                                                                                                
        2    0.000    0.000    0.000    0.000 numeric.py:1875(identity)                                                                                                                                       
        1    0.000    0.000    0.000    0.000 shape_base.py:665(kron)                                                                                                                                         
        2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}                                                                                                                             
        1    0.000    0.000    0.000    0.000 linalg.py:244(solve)                                                                                                                                            
        1    0.000    0.000    0.000    0.000 shape_base.py:639(get_array_prepare)                                                                                                                            
        1    0.000    0.000    0.000    0.000 fromnumeric.py:1353(sum)                                                                                                                                        
        1    0.000    0.000    0.000    0.000 linalg.py:99(_commonType)                                                                                                                                       
        1    0.000    0.000    5.399    5.399 <string>:1(<module>)                                                                                                                                            
        7    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}                                                                                                                                   
        1    0.000    0.000    0.000    0.000 linalg.py:404(inv)                                                                                                                                              
        2    0.000    0.000    0.000    0.000 {numpy.core.multiarray._fastCopyAndTranspose}                                                                                                                   
        3    0.000    0.000    0.000    0.000 linalg.py:66(_makearray)                                                                                                                                        
        1    0.000    0.000    0.000    0.000 {isinstance}                                                                                                                                                    
        1    0.000    0.000    0.000    0.000 linalg.py:127(_to_native_byte_order)                                                                                                                            
        2    0.000    0.000    0.000    0.000 {method 'sort' of 'list' objects}                                                                                                                               
        3    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}                                                                                                                     
        2    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}                                                                                                                   
        5    0.000    0.000    0.000    0.000 numeric.py:216(asarray)                                                                                                                                         
        1    0.000    0.000    0.000    0.000 shape_base.py:652(get_array_wrap)                                                                                                                               
        2    0.000    0.000    0.000    0.000 linalg.py:84(_realType)                                                                                                                                         
        1    0.000    0.000    0.000    0.000 linalg.py:139(_fastCopyAndTranspose)                                                                                                                            
        1    0.000    0.000    0.000    0.000 numeric.py:286(asanyarray)                                                                                                                                      
        1    0.000    0.000    0.000    0.000 linalg.py:157(_assertSquareness)                                                                                                                                
        3    0.000    0.000    0.000    0.000 linalg.py:71(isComplexType)                                                                                                                                     
        1    0.000    0.000    0.000    0.000 {method 'astype' of 'numpy.ndarray' objects}                                                                                                                    
        4    0.000    0.000    0.000    0.000 {hasattr}                                                                                                                                                       
        7    0.000    0.000    0.000    0.000 {getattr}
        2    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'transpose' of 'numpy.ndarray' objects}
        5    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {max}
        5    0.000    0.000    0.000    0.000 {issubclass}
        3    0.000    0.000    0.000    0.000 {method '__array_prepare__' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 {min}
        1    0.000    0.000    0.000    0.000 {method '__array_wrap__' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 linalg.py:151(_assertRank2)



A profile of the cython version
-------------------------------------------
Sun Dec  5 21:03:46 2010    Profile.prof

        86 function calls in 4.737 CPU seconds

  Ordered by: internal time

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       1    4.737    4.737    4.737    4.737 {kalman_cython.loglike_cython}
       1    0.000    0.000    0.000    0.000 {method 'sum' of 'numpy.ndarray' objects}
       1    0.000    0.000    0.000    0.000 {numpy.linalg.lapack_lite.dgesv}
       2    0.000    0.000    0.000    0.000 numeric.py:1875(identity)
       1    0.000    0.000    0.000    0.000 numeric.py:789(outer)
       2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
       1    0.000    0.000    0.000    0.000 shape_base.py:665(kron)
       1    0.000    0.000    0.000    0.000 linalg.py:244(solve)
       1    0.000    0.000    0.000    0.000 shape_base.py:639(get_array_prepare)
       1    0.000    0.000    0.000    0.000 fromnumeric.py:1353(sum)
       3    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
       1    0.000    0.000    0.000    0.000 linalg.py:99(_commonType)
       7    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
       1    0.000    0.000    0.000    0.000 linalg.py:404(inv)
       1    0.000    0.000    4.737    4.737 <string>:1(<module>)
       5    0.000    0.000    0.000    0.000 numeric.py:216(asarray)
       3    0.000    0.000    0.000    0.000 linalg.py:66(_makearray)
       1    0.000    0.000    0.000    0.000 {isinstance}
       1    0.000    0.000    0.000    0.000 linalg.py:139(_fastCopyAndTranspose)
       1    0.000    0.000    0.000    0.000 linalg.py:127(_to_native_byte_order)
       1    0.000    0.000    0.000    0.000 shape_base.py:652(get_array_wrap)
       2    0.000    0.000    0.000    0.000 {method 'sort' of 'list' objects}
       1    0.000    0.000    0.000    0.000 linalg.py:157(_assertSquareness)
       2    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
       1    0.000    0.000    0.000    0.000 numeric.py:286(asanyarray)
       2    0.000    0.000    0.000    0.000 {numpy.core.multiarray._fastCopyAndTranspose}
       1    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
       7    0.000    0.000    0.000    0.000 {getattr}
       2    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
       1    0.000    0.000    0.000    0.000 {method 'astype' of 'numpy.ndarray' objects}
       2    0.000    0.000    0.000    0.000 linalg.py:84(_realType)
       5    0.000    0.000    0.000    0.000 {issubclass}
       5    0.000    0.000    0.000    0.000 {len}
       1    0.000    0.000    0.000    0.000 {max}
       1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       1    0.000    0.000    0.000    0.000 {method 'transpose' of 'numpy.ndarray' objects}
       3    0.000    0.000    0.000    0.000 linalg.py:71(isComplexType)
       3    0.000    0.000    0.000    0.000 {method '__array_prepare__' of 'numpy.ndarray' objects}
       1    0.000    0.000    0.000    0.000 {min}
       1    0.000    0.000    0.000    0.000 {method '__array_wrap__' of 'numpy.ndarray' objects}
       4    0.000    0.000    0.000    0.000 {hasattr}
       2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
       1    0.000    0.000    0.000    0.000 linalg.py:151(_assertRank2)

Thanks,

Skipper
y.txt
kalman_cython.pyx
kalman_python.py
profile_cython.py
profile_python.py

Robert Kern

unread,
Dec 6, 2010, 11:52:31 AM12/6/10
to cython...@googlegroups.com
On 12/5/10 8:16 PM, Skipper Seabold wrote:
> Hi all,
>
> I am new to Cython and my knowledge of C is pretty rudimentary. I have attached
> two
> scripts one pure python implementation kalman_python and one is the best I have
> been
> able to do with cython kalman_cython. The Cython version is still not as fast
> as I would
> think it could be.
>
> I suspect that much time is wasted in the slicing and the calls to dot. I
> imagine that I can
> avoid the Python call overhead, but what is the easiest way? I started to try
> and include
> dgemm.c but was hoping there was an easier way, perhaps using the numpy C-api?

For small matrices inside tight loops, numpy.dot() and even dgemm() is overkill.
It is straightforward to write it yourself.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Skipper Seabold

unread,
Dec 6, 2010, 11:58:02 AM12/6/10
to cython...@googlegroups.com
On Mon, Dec 6, 2010 at 11:52 AM, Robert Kern <rober...@gmail.com> wrote:
> On 12/5/10 8:16 PM, Skipper Seabold wrote:
>>
>> Hi all,
>>
>> I am new to Cython and my knowledge of C is pretty rudimentary.  I have
>> attached
>> two
>> scripts one pure python implementation kalman_python and one is the best I
>> have
>> been
>> able to do with cython kalman_cython.  The Cython version is still not as
>> fast
>> as I would
>> think it could be.
>>
>> I suspect that much time is wasted in the slicing and the calls to dot.  I
>> imagine that I can
>> avoid the Python call overhead, but what is the easiest way?  I started to
>> try
>> and include
>> dgemm.c but was hoping there was an easier way, perhaps using the numpy
>> C-api?
>
> For small matrices inside tight loops, numpy.dot() and even dgemm() is
> overkill. It is straightforward to write it yourself.
>

Thanks, Robert. That's what I figured. I shouldn't ever have
dimensions bigger than 10x10 or so, so I'll try that.

Skipper

Reply all
Reply to author
Forward
0 new messages