Cython 0.20 doesn't speed up with boundscheck(False)

Franco Nicolas Bellomo

Mar 28, 2015, 2:18:29 PM
to cython...@googlegroups.com
Hi! I'm taking my first steps with Cython and I have several questions. The idea is to solve a heat transfer problem [0]. My first attempt used pure NumPy, but it is too slow for what I need. I'm also using Cython so that I can use OpenMP.

This is my setup.py:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy as np

ext_modules = [Extension("explicit_cython2", ["explicit_cython2.pyx"])]

setup(
  name = 'Explicit method using Cython',
  cmdclass = {'build_ext': build_ext},
  include_dirs = [np.get_include()],
  ext_modules = ext_modules
)

And this is my cython code (explicit_cython2.pyx):

import cython
import numpy as np
cimport numpy as np
DTYPE = np.int
ctypedef np.int_t DTYPE_t

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef double [:,:] explicit_cython(np.ndarray[np.float_t,ndim=1] u, float kappa, float dt, float dz, np.ndarray[np.float_t,ndim=1] term_const,
                    unsigned int nz, np.ndarray[long,ndim=1] plot_time):
    '''Cython version of explicit method'''
   
    #Defining C types
    cdef unsigned int i, k, j
    cdef unsigned int len_plot = len(plot_time) - 1
    cdef float lamnda = kappa*dt/dz**2
   
    # Memoryview on a NumPy array
    cdef double [:] u_view = u
    cdef double [:] un_view = u
    cdef double [:] const_view = term_const
    cdef double [:,:] uOut_view = np.zeros([len_plot + 1, nz])
    cdef long [:] plot_view = plot_time

    uOut_view[0] = u_view
   
    for i in range(len_plot):
        for k in range(plot_view[i], plot_view[i+1]):
            un_view = u_view  # NOTE: rebinds the view, no data is copied; both views share u's buffer
            for j in range(1, nz-1):
                u_view[j] = un_view[j] + lamnda*(un_view[j+1] - 2*un_view[j] + un_view[j-1]) + const_view[j]
        uOut_view[i+1] = u_view
 
    return uOut_view

I have some questions:

# Why do I get no time difference when I set `boundscheck` and `wraparound` to False?

# Is the setup I implemented correct? I ask because the Cython docs use `from Cython.Build import cythonize`.

# This code is faster than the NumPy implementation but slower than Numba. Can you think of any other improvements that could be implemented?

Thank you very much for your help.

Best regards,
Franco


[0] This is the mathematical problem; I apologize that it is written in Spanish.
http://nbviewer.ipython.org/github/pewen/transferencia_calor/blob/master/transporte_calor_metodo_explicito.ipynb

Robert Bradshaw

Apr 1, 2015, 1:17:28 AM
to cython...@googlegroups.com
> # Why I get no time difference when I put `boundscheck` and `wraparound` to
> False?

C compiler and modern processor branch prediction is pretty good these days...

> # Is the setup I implemented correct? Because I saw the cython doc using
> `from Cython.Build import cythonize`

That works, but the cythonize setup is preferable.
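
For reference, a minimal sketch of what a `cythonize`-based setup.py could look like for this module. It assumes setuptools is installed and reuses the module and file names from the original post. The `compiler_directives` entry is optional and only shows one way to set `boundscheck`/`wraparound` for the whole module instead of per function.

```python
from setuptools import setup, Extension
from Cython.Build import cythonize
import numpy as np

extensions = [
    Extension(
        "explicit_cython2",
        ["explicit_cython2.pyx"],
        # NumPy headers are needed because the .pyx does `cimport numpy`.
        include_dirs=[np.get_include()],
    )
]

setup(
    name="Explicit method using Cython",
    ext_modules=cythonize(
        extensions,
        # Optional (assumption: you want these directives module-wide).
        compiler_directives={"boundscheck": False, "wraparound": False},
    ),
)
```

It is built the same way as before, e.g. `python setup.py build_ext --inplace`.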

> # This code is faster than the implementation of numpy but slower than
> numba. Can you think of any other improvement that can be implemented?

If you declare (and hence require) your arrays to be contiguous, that
might help: "cdef double [::1] u_view = ..." etc. This avoids multiplying
by the runtime "stride" value (effectively 1 here) on every element access.
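
To make the "stride" remark concrete, here is a minimal sketch that uses only Python's built-in `memoryview` and the `array` module as stand-ins for the NumPy array and the Cython typed memoryview (the names `buf`, `packed` and `every_other` are made up for illustration). The buffer protocol counts strides in bytes, so a packed array of doubles has a stride of one item size, i.e. one element, while a sliced view has a larger stride and is no longer contiguous. Declaring `double[::1]` promises Cython that the first case always holds; as far as I know, a non-contiguous array passed to such a view is rejected at runtime rather than silently copied.

```python
from array import array

# Stand-in for the 1-D float64 array `u` (hypothetical sample values).
buf = array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

packed = memoryview(buf)       # elements sit back to back in memory
every_other = packed[::2]      # strided view over elements 0, 2, 4

# Strides are reported in bytes: one C double per step when packed,
# two doubles per step for the sliced view.
print(packed.itemsize, packed.strides, packed.c_contiguous)
print(every_other.strides, every_other.c_contiguous)
```

In the general `double[:]` case the address of element j has to be computed from a stride that is only known at runtime; with `double[::1]` the compiler can treat the data as a plain C array.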


Sturla Molden

Apr 1, 2015, 2:15:31 PM
to cython...@googlegroups.com
Robert Bradshaw <robe...@gmail.com> wrote:

>> # Why I get no time difference when I put `boundscheck` and `wraparound` to
>> False?
>
> C compiler and modern processor branch prediction is pretty good these days...

This is also why the old notion that "Fortran is faster than C" is no
longer true. And this is also why the "restrict" keyword in C99 almost
never has an effect. Ditto for __builtin_expect and likely/unlikely
macros. Modern CPUs with long pipelines and branch prediction have evolved
to be good at running C and Java code. That takes care of things like
bounds checking. We only get an effect on the runtime if the bounds check
fails and the pipeline must be flushed to run the unexpected branch. There
is a lot of stuff that mattered 10 years ago which has no consequence
today. That is nice, yes.

Sturla
