Calling functions within a prange block causes a segfault

Jacob Biesinger

unread,

Oct 22, 2011, 11:05:50 AM10/22/11

to cython...@googlegroups.com

Hi Cythoners,

Perhaps this has been asked before but I didn't find it in the archives. I have some code within a prange block that calls a simple function to aid in the aggregation but every time it is called, I get a segfault. I'm not very familiar with OpenMP-- perhaps this is not allowed?

Here's some code to repro:

cdef double sum_elements(np.ndarray[np.double_t, ndim=1] X) nogil:

cdef double total = 0.

cdef Py_ssize_t T = X.shape[0]

cdef Py_ssize_t t

for t in xrange(T):

total += X[t]

return total

cpdef test_parallel(np.ndarray[np.double_t, ndim=1] X):

cdef Py_ssize_t T = X.shape[0]

cdef double total = 0.

cdef Py_ssize_t t

for t in prange(T, nogil=True):

total += sum_elements(X)

print total

And to run the code:

$ python -c "import scipy; import mytest; my_test.test_parallel(scipy.ones(5, dtype=scipy.float64))"

Running Intel-based 64bit Ubuntu 11.04 with Cython 0.15.1 and python 2.7.1 and gcc 4.5.2.

Thanks for your eyes on my code!

--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine

mark florisson

unread,

Oct 22, 2011, 11:36:25 AM10/22/11

to cython...@googlegroups.com

On 22 October 2011 16:05, Jacob Biesinger <jake.bi...@gmail.com> wrote:
> Hi Cythoners,
> Perhaps this has been asked before but I didn't find it in the archives. I
> have some code within a prange block that calls a simple function to aid in
> the aggregation but every time it is called, I get a segfault. I'm not very
> familiar with OpenMP-- perhaps this is not allowed?
> Here's some code to repro:
> cdef double sum_elements(np.ndarray[np.double_t, ndim=1] X) nogil:

Basically, the buffer syntax (np.ndarray[...]) is not allowed for cdef
nogil functions (it tries to acquire a buffer without the GIL, which
is invalid). The compiler should give an error for this code, but
unfortunately it doesn't (this is a known bug).

Instead you could try out memoryviews, which will be new in the
upcoming Cython release. For that you'd have to get a copy of Cython
on github: https://github.com/cython/cython

You basically have to write

cdef double sum_elements(double[:] X) nogil:
...

and the same goes for your test_parallel function. If your buffers are
contiguous you could specify 'double[::1]', which is the same as
'mode="c"' in the current buffer support. The buffers can also be
sliced, transposed and some other tricks can be done without the GIL.
The change from one syntax to another is unfortunate, but it basically
looks cleaner and is more powerful. The older syntax will be
deprecated (in the upcoming release or one or two releases thereafter)
and maybe even removed eventually.

Jacob Biesinger

unread,

Oct 22, 2011, 12:41:03 PM10/22/11

to cython...@googlegroups.com

How would you specify multidimensional arrays in this syntax? Something like

cdef double sum_elements(double[:,:,:] X) nogil:

for a 3 dimensional array? The updated syntax is certainly more succinct (a good thing IMHO). I'll go grab the source and give it a shot. Is there any additional documentation on this? Or is the source readable enough to figure out?

It seems another option would be to inline the function completely, but perhaps that also tries to create the new buffer? inlining segfaults as well on my machine.

mark florisson

unread,

Oct 22, 2011, 12:50:13 PM10/22/11

to cython...@googlegroups.com

Indeed. You can find the hudson documentation here:
https://sage.math.washington.edu:8091/hudson/job/cython-docs/doclinks/1/src/userguide/memoryviews.html
. You can also build the documentation yourself in the docs
subdirectory by typing 'make html'.

> The updated syntax is certainly more succinct (a
> good thing IMHO). I'll go grab the source and give it a shot. Is there any
> additional documentation on this? Or is the source readable enough to
> figure out?
> It seems another option would be to inline the function completely, but
> perhaps that also tries to create the new buffer? inlining segfaults as
> well on my machine.

Do you have an example? It shouldn't segfault. The buffer syntax also
shouldn't create two buffers (in fact, they acquire buffers, they
don't allocate them, although such functionality is also added).

Jacob Biesinger

unread,

Oct 22, 2011, 2:10:19 PM10/22/11

to cython...@googlegroups.com

Thanks for that.

> The updated syntax is certainly more succinct (a
> good thing IMHO). I'll go grab the source and give it a shot. Is there any
> additional documentation on this? Or is the source readable enough to
> figure out?
> It seems another option would be to inline the function completely, but
> perhaps that also tries to create the new buffer? inlining segfaults as
> well on my machine.

Do you have an example? It shouldn't segfault. The buffer syntax also
shouldn't create two buffers (in fact, they acquire buffers, they
don't allocate them, although such functionality is also added).

Maybe I'm using it wrong. I was under the impression from the docs that to inline a function, you just add the inline keyword to the function header:

cdef inline double sum_elements(np.ndarray[np.double_t, ndim=1] X) nogil:

though according to http://groups.google.com/group/cython-users/browse_thread/thread/8fc8686315d7f3fe perhaps the inline doesn't actually happen for np.ndarray's. According to my tests, this bug *looks* fixed with the new memoryviews:

Inlining the summation with the memory view syntax makes it run 25-50% faster:

**** With ndarray[] syntax ****

In [4]: %timeit test.test_parallel(s)

10000 loops, best of 3: 221 us per loop

In [5]: %timeit test.test_parallel_inlined_sum_elements(s)

1000 loops, best of 3: 198 us per loop

**** With memoryview syntax ****

In [5]: %timeit test.test_parallel(s)

10000 loops, best of 3: 216 us per loop

In [6]: %timeit test.test_parallel_inlined_sum_elements(s)

1000 loops, best of 3: 130 us per loop

On a different note, in the memory view syntax, the array typing seems a bit more ambiguous. I have some numpy boolean arrays which are represented as np.ndarray[np.int8_t, ndim=2] in the previous syntax. I can now use char [:,:] to get the striding right, but doesn't the size of char (or int or whatever) depend on the system (32bit vs 64)? A chart with the 1:1 correspondence of each C type with each numpy type might be helpful, if there is such a mapping.

Thanks for the quick responses!

mark florisson

unread,

Oct 22, 2011, 2:22:40 PM10/22/11

to cython...@googlegroups.com

Wow, that's a large difference between inlining and not inlining.

> On a different note, in the memory view syntax, the array typing seems a bit
> more ambiguous. I have some numpy boolean arrays which are represented as
> np.ndarray[np.int8_t, ndim=2] in the previous syntax. I can now use char
> [:,:] to get the striding right, but doesn't the size of char (or int or
> whatever) depend on the system (32bit vs 64)? A chart with the 1:1
> correspondence of each C type with each numpy type might be helpful, if
> there is such a mapping.
> Thanks for the quick responses!

You can still use your numpy types as you were doing before:
'np.int8_t[:, :] myarray' will work just fine.

Jacob Biesinger

unread,

Oct 22, 2011, 2:25:22 PM10/22/11

to cython...@googlegroups.com

> **** With ndarray[] syntax ****
> In [4]: %timeit test.test_parallel(s)
> 10000 loops, best of 3: 221 us per loop
> In [5]: %timeit test.test_parallel_inlined_sum_elements(s)
> 1000 loops, best of 3: 198 us per loop
> **** With memoryview syntax ****
> In [5]: %timeit test.test_parallel(s)
> 10000 loops, best of 3: 216 us per loop
> In [6]: %timeit test.test_parallel_inlined_sum_elements(s)
> 1000 loops, best of 3: 130 us per loop
>

Wow, that's a large difference between inlining and not inlining.

Yeah, thanks to the cython devs for that syntax sugar!

> On a different note, in the memory view syntax, the array typing seems a bit
> more ambiguous. I have some numpy boolean arrays which are represented as
> np.ndarray[np.int8_t, ndim=2] in the previous syntax. I can now use char
> [:,:] to get the striding right, but doesn't the size of char (or int or
> whatever) depend on the system (32bit vs 64)? A chart with the 1:1
> correspondence of each C type with each numpy type might be helpful, if
> there is such a mapping.
> Thanks for the quick responses!

You can still use your numpy types as you were doing before:
'np.int8_t[:, :] myarray' will work just fine.

Ah, of course. Thanks!

mark florisson

unread,

Oct 22, 2011, 2:46:43 PM10/22/11

to cython...@googlegroups.com

On 22 October 2011 16:36, mark florisson <markflo...@gmail.com> wrote:
> On 22 October 2011 16:05, Jacob Biesinger <jake.bi...@gmail.com> wrote:
>> Hi Cythoners,
>> Perhaps this has been asked before but I didn't find it in the archives. I
>> have some code within a prange block that calls a simple function to aid in
>> the aggregation but every time it is called, I get a segfault. I'm not very
>> familiar with OpenMP-- perhaps this is not allowed?
>> Here's some code to repro:
>> cdef double sum_elements(np.ndarray[np.double_t, ndim=1] X) nogil:
>
> Basically, the buffer syntax (np.ndarray[...]) is not allowed for cdef
> nogil functions (it tries to acquire a buffer without the GIL, which
> is invalid). The compiler should give an error for this code, but
> unfortunately it doesn't (this is a known bug).

This should be fixed now:
https://github.com/markflorisson88/cython/commit/5008e863c4a035fe3b29d1389be5acd9367d0890

Jacob Biesinger

unread,

Oct 22, 2011, 2:53:14 PM10/22/11

to cython...@googlegroups.com

You can still use your numpy types as you were doing before:
'np.int8_t[:, :] myarray' will work just fine.

Ah, of course. Thanks!

Can memoryviews be used as if they were np.ndarrays for slicing? I expected:

cdef np.double_t[:,:] phi = np.ones((T,K))

phi[:] = 0.

to fill the entire array with zeros, but instead I get a compile-time error:

phi[:] = 0.

^

------------------------------------------------------------

vb_mf.pyx:101:19: Cannot convert 'double' to memoryviewslice

phi[:,:] = 0. also doesn't work. Perhaps a cast to a numpy array?

numpy_array = np.asarray(<np.double_t[:T,:K]> phi)

numpy_array[:] = 0.

Nope. That gives me:

tmp = np.asarray(<np.double_t[:T,:K]> phi)

^

------------------------------------------------------------

vb_mf.pyx:102:49: Can only create cython.array from pointer

Jacob Biesinger

unread,

Oct 22, 2011, 2:55:11 PM10/22/11

to cython...@googlegroups.com

On Sat, Oct 22, 2011 at 11:46 AM, mark florisson <markflo...@gmail.com> wrote:

On 22 October 2011 16:36, mark florisson <markflo...@gmail.com> wrote:
> On 22 October 2011 16:05, Jacob Biesinger <jake.bi...@gmail.com> wrote:
>> Hi Cythoners,
>> Perhaps this has been asked before but I didn't find it in the archives. I
>> have some code within a prange block that calls a simple function to aid in
>> the aggregation but every time it is called, I get a segfault. I'm not very
>> familiar with OpenMP-- perhaps this is not allowed?
>> Here's some code to repro:
>> cdef double sum_elements(np.ndarray[np.double_t, ndim=1] X) nogil:
>
> Basically, the buffer syntax (np.ndarray[...]) is not allowed for cdef
> nogil functions (it tries to acquire a buffer without the GIL, which
> is invalid). The compiler should give an error for this code, but
> unfortunately it doesn't (this is a known bug).

This should be fixed now:
https://github.com/markflorisson88/cython/commit/5008e863c4a035fe3b29d1389be5acd9367d0890

From the warning messages there, I take it that using memoryviews is more efficient than ndarray buffers?

mark florisson

unread,

Oct 22, 2011, 3:08:46 PM10/22/11

to cython...@googlegroups.com

On 22 October 2011 19:53, Jacob Biesinger <jake.bi...@gmail.com> wrote:
>>> You can still use your numpy types as you were doing before:
>>> 'np.int8_t[:, :] myarray' will work just fine.
>>
>> Ah, of course. Thanks!
>>
>
> Can memoryviews be used as if they were np.ndarrays for slicing? I
> expected:
> cdef np.double_t[:,:] phi = np.ones((T,K))
> phi[:] = 0.
> to fill the entire array with zeros, but instead I get a compile-time error:
> phi[:] = 0.
> ^
> ------------------------------------------------------------
> vb_mf.pyx:101:19: Cannot convert 'double' to memoryviewslice

You can slice typed memoryviews, but the problem is that slice
assignment (and vector operations) are not supported yet. You can get
your original object back though, using myarray.base. However, it may
have been sliced in the meantime, so depending on what you're doing
that may not be valid.

So usually its easier to initialize the array first using numpy vector
assignment, and then acquiring a typed memoryview.

Slice assignment will definitely be supported in the future, it's just
that someone needs to find the time. Of course, contributions are most
welcome. The good thing with typed memoryviews is that you can do
(basically) everything without the GIL: passing around, transposing,
indexing, slicing, etc. Of course we're not going to re-implement all
of numpy here :). It would be great if NumPy had a nogil API that
worked on raw pointers + stride info, though.

> phi[:,:] = 0. also doesn't work. Perhaps a cast to a numpy array?
>
> numpy_array = np.asarray(<np.double_t[:T,:K]> phi)
> numpy_array[:] = 0.
> Nope. That gives me:
> tmp = np.asarray(<np.double_t[:T,:K]> phi)
> ^
> ------------------------------------------------------------
> vb_mf.pyx:102:49: Can only create cython.array from pointer
>
>

Yes, this kind of casting syntax is useful to convert C data (i.e., C
pointers or arrays) to memoryviews, which can be coerced to
numpy.ndarrays (all without copying any data).

mark florisson

unread,

Oct 22, 2011, 3:14:46 PM10/22/11

to cython...@googlegroups.com

It depends on how you're using them. Passing them around should
typically be cheaper as it doesn't reacquire the buffers. Indexing
should have the same speed. Slicing (and indexing with x < ndim
dimensions) should be faster as it won't go through NumPy.

The most important thing about memoryslices is that they are more
elegant to write, and that a lot more functionality is available
without the GIL. This seems to be especially important for
cython.parallel users.

Jacob Biesinger

unread,

Oct 22, 2011, 10:36:33 PM10/22/11

to cython...@googlegroups.com

Is the plan to include element-wise operations, s.t. I could write something like:

phi[t,:] /= X[i,t,:] + Y[:,k]

without the GIL? That would be very powerful. On my current project, a lot of time has gone into rewriting numpy slices and element-wise operations as ugly for loops. How involved do you think implementing this would be?

I'm also following the discussion over on pypy about the micronumpy port-- they are trying to tackle similar issues. I'm Excited for faster numerical python! I wonder if cython's upcoming python backend will somehow support these memoryviews as well? Any hope of that making pypy's numpy port a bit easier/more comprehensive?

Thanks again.

mark florisson

unread,

Oct 23, 2011, 6:10:51 AM10/23/11

to cython...@googlegroups.com

Yes, and preferably it will use SIMD whenever it can, and will avoid
temporary buffers.

> On my current project, a lot
> of time has gone into rewriting numpy slices and element-wise operations as
> ugly for loops. How involved do you think implementing this would be?

Implementing just that (without any regard for SIMD) wouldn't be so
hard, the memoryview infrastructure is already there along with a lot
of functionality. It would take time, of course, but you can probably
do it in a weekend.

> I'm also following the discussion over on pypy about the micronumpy port--
> they are trying to tackle similar issues. I'm Excited for faster numerical
> python! I wonder if cython's upcoming python backend will somehow support
> these memoryviews as well?

Probably not, in fact, I'm not even sure how pypy handles the buffer
interface (I think they're still working on python 2.7, which
introduced the buffer interface?).

> Any hope of that making pypy's numpy port a bit
> easier/more comprehensive?
> Thanks again.
>

Hmm, as much as that would make things easier for a lot of people,
it's really not what they want. They want users to write their normal
Python NumPy code, and to make it fast with their JIT. For that
they're reimplementing it in RPython I believe, and they don't want to
just call into the NumPy C API or into C libraries, because the JIT
cannot reach there.

Our goals share some similarity but they are different. We only have
to re-implement very selective and simple parts of NumPy, such as
indexing, slicing etc, and it will make most things fast for many
users. This does mean that you cannot use NumPy functionality without
(expensive) coercion back to NumPy if you have the GIL. If you don't
have the GIL you can't do it at all without reacquiring the GIL (which
can be very expensive).

For pypy this is not good enough (or they believe it isn't), so they
have to go through a more involved process. I really don't think
Cython's pypy backend does (or would do if fully complete) what they
want. At some point if plain NumPy runs on PyPy, you might be able to
use ctypes memoryviews. None of that is currently implemented though,
and this situation may just lead people to choose between Cython and
PyPy. How the rest of the scientific libraries are going to be ported
or interface with PyPy's NumPy remains an open question.

mark florisson

unread,

Oct 23, 2011, 6:13:50 AM10/23/11

to cython...@googlegroups.com

BTW, if you're interested in that, raise an issue on cython-dev and we
can give you pointers. Don't feel obligated though :)

Reply all

Reply to author

Forward