Re: [cython-users] Cython newbie: low performance gain in recursive code (fully typed)

43 views
Skip to first unread message

Robert Bradshaw

unread,
May 23, 2013, 5:18:56 PM5/23/13
to cython...@googlegroups.com
On Mon, May 20, 2013 at 11:51 AM, wieland <wieland...@gmx.net> wrote:
> Dear all,
> today I gave Cython a try to speed up some performance critical code of mine
> that has lots of potential:
>
> for i in arange(0,dim)[::-1]:
>
> HY = dot(H[i,i:],Y[i:,:])
>
> for j in arange(0,dim)[::-1]:
>
> Y[i,j] = (T[i,j] + dot(HY[j:],HH[j:,j]))/(1-H[i,i]*HH[j,j])
>
> HY[j] += H[i,i]*Y[i,j]
>
>
> return Y
>
>
> This code is very simple but hard to vectorize any further due to the
> recursive interplay between Y and HY. Now, since it is hard to vectorize I
> thought it should have lots of potential once written in C. So I turned to
> Cython and wrote the following .pyx-file:
>
> #cython: boundscheck=False
>
> #cython: wraparound=False
>
>
> import numpy as cnp
>
> cimport numpy as cnp
>
>
> def c_schur_method(cnp.ndarray[double complex, ndim=2] H ,
> cnp.ndarray[double complex, ndim=2] HH, cnp.ndarray[double complex, ndim=2]
> T):
>
> cdef unsigned int i, ii, j, jj
>
> cdef int dim = H.shape[0]
>
> cdef cnp.ndarray[cnp.complex128_t, ndim=1] HY = cnp.zeros(dim,
> dtype=cnp.complex128)
>
> cdef cnp.ndarray[cnp.complex128_t, ndim=2] Y = cnp.zeros((dim,dim),
> dtype=cnp.complex128)
>
>
> for ii in xrange(1,dim+1):
>
> i = dim - ii
>
> HY = cnp.dot(H[i,i:],Y[i:,:])
>
> for jj in xrange(1,dim+1):
>
> j = dim - jj
>
> Y[i,j] = (T[i,j] + cnp.dot(HY[j:],HH[j:,j]))/(1-H[i,i]*HH[j,j])
>
> HY[j] = HY[j] + H[i,i]*Y[i,j]
>
>
> return Y
>
>
> (the weird i vs ii is due to some bug - xrange(dim-1,-1,-1) did not work in
> Cython).

That's because you're trying to use unsigned ints.

> The setup.py file looks like this
>
> from distutils.core import setup
>
> from distutils.extension import Extension
>
> from Cython.Distutils import build_ext
>
>
> import numpy
>
>
> ext = Extension("c_lyapunov_solver", ["c_lyapunov_solver.pyx"],
>
> include_dirs = [numpy.get_include()])
>
>
> setup(ext_modules=[ext],
>
> cmdclass = {'build_ext': build_ext})
>
>
> and I use python setup.py build_ext --inplace to compile the code. The
> speedup of Cython vs Numpy is only about 30% - on 200x200 matrices it's
> 0.89s for Python and 0.58s for Cython (on average).
>
>
> Since I am very new to Cython I might just have overlooked something and
> maybe someone can help me out!? Your help is really appreciated!

You might want to try running cython -a c_lyapunov_solver.pyx to get
an annotated .html file. However, I bet the expensive part is
cnp.dot(...). Try unrolling that for a big speedup.

- Robert

Ghislain Vaillant

unread,
May 24, 2013, 4:18:42 AM5/24/13
to cython...@googlegroups.com
(the weird i vs ii is due to some bug - xrange(dim-1,-1,-1) did not work in Cython)

I would simply drop the unsigned, specially because dim is declared signed and use for computation of i, ii, j and jj.

I agree with Robert, your call to cnp.dot is likely to remain expensive.
Reply all
Reply to author
Forward
0 new messages