Best Practices for passing numpy data pointer to C?


Chris Barker

Jun 27, 2012, 4:23:13 PM
to cython-users
Hi folks,

We need to be able to pass the data pointer from a numpy array to C --
so that the data can be modified in place, and the changes seen in
the numpy array, without any data copying.

I found this thread on the list:

https://groups.google.com/forum/?fromgroups#!topic/cython-users/VW_AH2HEFfU

but I'm still a bit confused about best practices.

I've got two ways working now -- one is using the numpy array's data
member, and the other the address of the first element:

c_multiply (<double*> input.data, value, m, n)

c_multiply (&input[0,0], value, m, n)

In both cases, the array has been cdef'd to:

np.ndarray[double, ndim=2, mode="c"]


Why might I choose one over the other? Or is there an even better way?

I've enclosed a complete working example, with setup.py, etc., if
anyone wants to take a look or try another approach. The Cython
code is also below:

I'll post the final result on the Wiki -- I think this would be
helpful to have there.

-Chris


import numpy as np
cimport numpy as np

cdef extern void c_multiply(double* array, double value, int m, int n)

def multiply(np.ndarray[double, ndim=2, mode="c"] input not None, double value):
    """
    multiply the array by the scalar, in place, passing the .data member to C
    """
    cdef int m, n

    m, n = input.shape[0], input.shape[1]

    c_multiply(<double*> input.data, value, m, n)

    return None

def multiply2(np.ndarray[double, ndim=2, mode="c"] input not None,
              double value):
    """
    same thing, but using the address of the first element instead of the
    .data member
    """
    cdef int m, n

    m, n = input.shape[0], input.shape[1]

    c_multiply(&input[0,0], value, m, n)

    return None



--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris....@noaa.gov
Cython-numpy_test.zip

Jérôme Kieffer

Jun 28, 2012, 1:39:24 AM
to cython...@googlegroups.com
On Wed, 27 Jun 2012 13:23:13 -0700
Chris Barker <chris....@noaa.gov> wrote:


> I've got two ways working now -- one is using the numpy array's data
> member, and the other the address of the first element:
>
> c_multiply (<double*> input.data, value, m, n)
>
> c_multiply (&input[0,0], value, m, n)

I (personally) prefer the first one ... but I would be interested in the answer of the gurus.


--
Jérôme Kieffer <goo...@terre-adelie.org>

Gregor Thalhammer

Jun 28, 2012, 3:54:11 AM
to cython...@googlegroups.com

On 28.6.2012 at 07:39, Jérôme Kieffer wrote:

> On Wed, 27 Jun 2012 13:23:13 -0700
> Chris Barker <chris....@noaa.gov> wrote:
>
>
>> I've got two ways working now -- one is using the numpy array's data
>> member, and the other the address of the first element:
>>
>> c_multiply (<double*> input.data, value, m, n)
>>
>> c_multiply (&input[0,0], value, m, n)
>
> I (personally) prefer the first one ... but I would be interested in the answer of the gurus.
>
I am not a guru either, but I also think that the first one should be preferred. However, since you require a C contiguous array, both methods are equivalent in your example. In case you forget about this, using the .data attribute is safer:

>>> np.arange(3)[::-1].data
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: cannot get single-segment buffer for discontiguous array

(this applies to pure Python; does this also happen with Cython?)

Gregor

Robert Bradshaw

Jun 28, 2012, 4:20:43 AM
to cython...@googlegroups.com
On Thu, Jun 28, 2012 at 12:54 AM, Gregor Thalhammer
<gregor.t...@gmail.com> wrote:
>
> On 28.6.2012 at 07:39, Jérôme Kieffer wrote:
>
>> On Wed, 27 Jun 2012 13:23:13 -0700
>> Chris Barker <chris....@noaa.gov> wrote:
>>
>>
>>> I've got two ways working now -- one is using the numpy array's data
>>> member, and the other the address of the first element:
>>>
>>> c_multiply (<double*> input.data, value, m, n)
>>>
>>> c_multiply (&input[0,0], value, m, n)
>>
>> I (personally) prefer the first one ... but I would be interested in the answer of the gurus.
>>
> I am not a guru either, but I also think that the first one should be preferred.

It's a matter of taste. There are cases where &input[0,0] would work
but <double*>input.data would not, e.g. if input is offset in the major
dimension. For 1-d slices, I'd say writing &input[n,m] is preferable to
doing pointer arithmetic, but if you know you're OK using .data in
your case then that's fine.

> However, since you require a C contiguous array, both methods are equivalent in your example. In case you forget about this, using the .data attribute is safer:
>
>>>> np.arange(3)[::-1].data
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> AttributeError: cannot get single-segment buffer for discontiguous array
>
> (this applies to pure python, does this also happen with cython?)

No, I don't think it does.

- Robert

mark florisson

Jun 28, 2012, 6:05:31 AM
to cython...@googlegroups.com
On 28 June 2012 09:20, Robert Bradshaw <robe...@gmail.com> wrote:
> On Thu, Jun 28, 2012 at 12:54 AM, Gregor Thalhammer
> <gregor.t...@gmail.com> wrote:
>>
>> On 28.6.2012 at 07:39, Jérôme Kieffer wrote:
>>
>>> On Wed, 27 Jun 2012 13:23:13 -0700
>>> Chris Barker <chris....@noaa.gov> wrote:
>>>
>>>
>>>> I've got two ways working now -- one is using the numpy array's data
>>>> member, and the other the address of the first element:
>>>>
>>>> c_multiply (<double*> input.data, value, m, n)
>>>>
>>>> c_multiply (&input[0,0], value, m, n)
>>>
>>> I (personally) prefer the first one ... but I would be interested in the answer of the gurus.
>>>
>> I am not a guru either, but I also think that the first one should be preferred.
>
> It's a matter of taste. There are cases where &input[0,0] would work
> but <double*>input.data would not, e.g. if input is offset in the major
> dimension. For 1-d slices, I'd say writing &input[n,m] is preferable to
> doing pointer arithmetic, but if you know you're OK using .data in
> your case then that's fine.

I think &input[0, 0] should still be preferred, as accessing the .data
attribute is deprecated in numpy and the rewrite to PyArray_DATA() not
yet merged. Taking the pointer to the first element is also more
consistent with memoryviews.
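For instance, a minimal sketch (not from the original example) of the same
call with a typed memoryview -- a memoryview has no .data attribute, so
taking the address of the first element is the idiom that carries over:

def multiply_mv(double[:, ::1] input not None, double value):
    # same C call as before; &input[0, 0] is the only way to get the pointer
    c_multiply(&input[0, 0], value,
               <int> input.shape[0], <int> input.shape[1])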

Chris Barker

Jun 28, 2012, 1:20:45 PM
to cython...@googlegroups.com
On Thu, Jun 28, 2012 at 3:05 AM, mark florisson wrote:
> I think &input[0, 0] should still be preferred, as accessing the .data
> attribute is deprecated in numpy and the rewrite to PyArray_DATA() not
> yet merged. Taking the pointer to the first element is also more
> consistent with memoryviews.

Sounds compelling to me.

I've put up a page on the Wiki:

http://wiki.cython.org/tutorials/NumpyPointerToC

and linked to it from:

http://wiki.cython.org/tutorials/numpy

I can't edit the main page, but maybe someone would like to put a link
there, too. Or maybe it's time to collect links to all the
numpy-specific pages in one place.

Please go in and take a look, and improve it if you like -- in
particular the little C function -- I'm a pretty pathetic C coder --
it's not the point of the tutorial, but if it doesn't follow best
practices, it might be good to clean it up.

-Chris

Sturla Molden

Jun 29, 2012, 10:07:16 AM
to cython...@googlegroups.com
I'd like to point out that

&input[0,0]

is consistent with the standard way of casting C++ std::vector<T> to T*.
The syntax &array[0] is seen in all C++ code needing to provide a data
pointer from STL and Boost containers to C functions. So this syntax is
not special to Cython at all. It means that Cython and C++ will call C
in the same way.

Also observe that input(0,0) is how we pass a Fortran 77 array as a
pointer to C. In a subroutine call statement, input(0,0) in Fortran 77
means the same as &input[0,0] does in Cython.

Sturla



On 28.06.2012 12:05, mark florisson wrote:
> On 28 June 2012 09:20, Robert Bradshaw <robe...@gmail.com> wrote:
>> On Thu, Jun 28, 2012 at 12:54 AM, Gregor Thalhammer
>> <gregor.t...@gmail.com> wrote:
>>>
>>> On 28.6.2012 at 07:39, Jérôme Kieffer wrote:
>>>
>>>> On Wed, 27 Jun 2012 13:23:13 -0700
>>>> Chris Barker <chris....@noaa.gov> wrote:
>>>>
>>>>
>>>>> I've got two ways working now -- one is using the numpy array's data
>>>>> member, and the other the address of the first element:
>>>>>
>>>>> c_multiply (<double*> input.data, value, m, n)
>>>>>
>>>>> c_multiply (&input[0,0], value, m, n)
>>>>
>>>> I (personally) prefer the first one ... but I would be interested in the answer of the gurus.
>>>>
>>> I am not a guru either, but I also think that the first one should be preferred.
>>
>> It's a matter of taste. There are cases where &input[0,0] would work
>> but <double*>input.data would not, e.g. if input is offset in the major
>> dimension. For 1-d slices, I'd say writing &input[n,m] is preferable to
>> doing pointer arithmetic, but if you know you're OK using .data in
>> your case then that's fine.
>
> I think &input[0, 0] should still be preferred, as accessing the .data
> attribute is deprecated in numpy and the rewrite to PyArray_DATA() not
> yet merged.

Sturla Molden

Jun 29, 2012, 10:18:19 AM
to cython...@googlegroups.com
On 29.06.2012 16:07, Sturla Molden wrote:
> I'd like to point out that
>
> &input[0,0]
>
> is consistent with the standard way of casting C++ std::vector<T> to T*.
> The syntax &array[0] is seen in all C++ code needing to provide a data
> pointer from STL and Boost containers to C functions. So this syntax is
> not special to Cython at all. It means that Cython and C++ will call C
> in the same way.

The only exception is std::string, which has a special method for
casting to char* to ensure the character array is NUL-terminated.

The same issue is present in Cython as well, if we need to convert a
numpy array (e.g. of numpy.uint8) or a memoryview to a C character
array. A method like .c_str() might be useful in Cython as well.

Sturla



Chris Barker

Jun 29, 2012, 1:04:21 PM
to cython...@googlegroups.com
On Fri, Jun 29, 2012 at 7:07 AM, Sturla Molden <sturla...@yahoo.no> wrote:
> I'd like to point out that
>
>   &input[0,0]
>
> is consistent with the standard way of casting C++ std::vector<T> to T*.
> The syntax &array[0] is seen in all C++ code needing to provide a data
> pointer from STL and Boost containers to C functions. So this syntax is
> not special to Cython at all. It means that Cython and C++ will call C
> in the same way.

Thanks -- another good reason. I do note that &input[0,0] does
generate a fair bit more code than input.data:

/* "multiply.pyx":34
 *     m, n = input.shape[0], input.shape[1]
 *
 *     c_multiply (&input[0,0], value, m, n)             # <<<<<<<<<<<<<<
 *
 *     return None
 */
__pyx_t_3 = 0;
__pyx_t_4 = 0;
__pyx_t_5 = -1;
if (__pyx_t_3 < 0) {
    __pyx_t_3 += __pyx_bshape_0_input;
    if (unlikely(__pyx_t_3 < 0)) __pyx_t_5 = 0;
} else if (unlikely(__pyx_t_3 >= __pyx_bshape_0_input)) __pyx_t_5 = 0;
if (__pyx_t_4 < 0) {
    __pyx_t_4 += __pyx_bshape_1_input;
    if (unlikely(__pyx_t_4 < 0)) __pyx_t_5 = 1;
} else if (unlikely(__pyx_t_4 >= __pyx_bshape_1_input)) __pyx_t_5 = 1;
if (unlikely(__pyx_t_5 != -1)) {
    __Pyx_RaiseBufferIndexError(__pyx_t_5);
    {__pyx_filename = __pyx_f[0]; __pyx_lineno = 34; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
}
c_multiply((&(*__Pyx_BufPtrCContig2d(double *, __pyx_bstruct_input.buf,
    __pyx_t_3, __pyx_bstride_0_input, __pyx_t_4, __pyx_bstride_1_input))),
    __pyx_v_value, __pyx_v_m, __pyx_v_n);


vs.:

c_multiply(((double *)__pyx_v_input->data), __pyx_v_value, __pyx_v_m, __pyx_v_n);

(i.e. no extra code)

I doubt it's a performance hit that matters, but it was interesting.
If you turn boundscheck off, it gets a fair bit smaller:

__pyx_t_3 = 0;
__pyx_t_4 = 0;
if (__pyx_t_3 < 0) __pyx_t_3 += __pyx_bshape_0_input;
if (__pyx_t_4 < 0) __pyx_t_4 += __pyx_bshape_1_input;
c_multiply((&(*__Pyx_BufPtrCContig2d(double *, __pyx_bstruct_input.buf,
    __pyx_t_3, __pyx_bstride_0_input, __pyx_t_4, __pyx_bstride_1_input))),
    __pyx_v_value, __pyx_v_m, __pyx_v_n);

Though I'm a little confused by this code -- it appears to be setting
a couple of values to 0, then checking if they are less than zero --
with boundscheck off, why is this still there?

Probably not worth special-casing, but in the case of a hard-coded
[0,0], wouldn't we always know that it's OK -- and there would never
be a need for any bounds checking or offset computing of any sort?

-Thanks,
-Chris

Robert Bradshaw

Jun 29, 2012, 2:00:11 PM
to cython...@googlegroups.com
Yes, we do often make the assumption that "useless" code like this is
optimized away by the C compiler, allowing us to emit generic code
(rather than creating a big suite of special cases in our code).

- Robert

Sturla Molden

Jun 29, 2012, 7:04:15 PM
to cython...@googlegroups.com
On Fri, 29 Jun 2012 10:04:21 -0700, Chris Barker
<chris....@noaa.gov> wrote:

> I doubt it's a performance hit that matters, but it was interesting.
> If you turn boundscheck off, it gets a fair bit smaller:
>
> __pyx_t_3 = 0;
> __pyx_t_4 = 0;
> if (__pyx_t_3 < 0) __pyx_t_3 += __pyx_bshape_0_input;
> if (__pyx_t_4 < 0) __pyx_t_4 += __pyx_bshape_1_input;
> c_multiply((&(*__Pyx_BufPtrCContig2d(double *,
> __pyx_bstruct_input.buf, __pyx_t_3, __pyx_bstride_0_input, __pyx_t_4,
> __pyx_bstride_1_input))), __pyx_v_value, __pyx_v_m, __pyx_v_n);
>
> Though I'm a little confused by this code -- it appears to be setting
> a couple of values to 0, then checking if they are less than zero --
> with boundscheck off, why is this still there?

It is there because you did not turn off 'wraparound'. That is,
array[-n] is valid Python indexing. That is what the checks for less
than zero do. To optimize them away from the C code, use
@cython.wraparound(False). In practice, all speed-critical numerical
code written in Cython should turn off both 'boundscheck' and
'wraparound' where needed. When they are both turned off, the C compiler
will optimize away the rest of the overhead.

Turning off boundscheck means that some errors can go silent. Turning
off wraparound will break compatibility with Python. I prefer to limit
these optimizations as much as I can, and therefore only use them as
decorators and never as global compiler flags to Cython.
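For example, a minimal sketch of the decorator form, reusing the multiply
example from the start of this thread:

cimport cython

@cython.boundscheck(False)  # skip the index-in-range checks
@cython.wraparound(False)   # assume no negative (wraparound) indexes
def multiply3(np.ndarray[double, ndim=2, mode="c"] input not None,
              double value):
    cdef int m = input.shape[0], n = input.shape[1]
    c_multiply(&input[0, 0], value, m, n)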

Another issue is that this extra overhead hardly matters in the
majority of numerical Cython code, as the heavy lifting should be
deferred to libraries like BLAS, LAPACK and FFTW. Only when the
computational kernel is written in Cython do these optimizations matter.

Sturla

Sturla Molden

Jun 29, 2012, 7:19:16 PM
to cython...@googlegroups.com
On Sat, 30 Jun 2012 01:04:15 +0200, Sturla Molden
<sturla...@yahoo.no> wrote:

>> __pyx_t_3 = 0;
>> __pyx_t_4 = 0;
>> if (__pyx_t_3 < 0) __pyx_t_3 += __pyx_bshape_0_input;
>> if (__pyx_t_4 < 0) __pyx_t_4 += __pyx_bshape_1_input;
>> c_multiply((&(*__Pyx_BufPtrCContig2d(double *,
>> __pyx_bstruct_input.buf, __pyx_t_3, __pyx_bstride_0_input,
>> __pyx_t_4,
>> __pyx_bstride_1_input))), __pyx_v_value, __pyx_v_m, __pyx_v_n);
>>
>> Though I'm a little confused by this code -- it appears to be
>> setting
>> a couple of values to 0, then checking if they are less than zero --
>> with boundscheck off, why is this still there?
>
> It is there because you did not turn off 'wraparound'. That is,
> array[-n] is valid Python indexing. That is what the checks for less
> than zero do.

Just to make this clear: they are initially set to zero because you
indexed &input[0,0]. So here the C compiler will optimize away the tests
for negative values. But if you instead wrote something like &input[i,j],
this optimization might not be possible -- it depends on what the C
compiler can infer. So to turn the checks off manually, tell Cython to
turn off wraparound (i.e. all indexes are assumed to be non-negative).

Sturla

Chris Barker

Jul 2, 2012, 2:30:31 PM
to cython...@googlegroups.com
On Fri, Jun 29, 2012 at 4:04 PM, Sturla Molden <sturla...@yahoo.no> wrote:
> It is there because you did not turn off 'wraparound'.

Good point -- forgot about that one. I've added both to the Wiki page.

Chris Barker

Jul 9, 2012, 12:06:53 PM
to cython...@googlegroups.com
On Mon, Jul 9, 2012 at 6:01 AM, Larry Eisenman <eisen...@gmail.com> wrote:

> Is there some reason that you did not pass your input array through
> np.ascontiguousarray() as Sturla did in the thread you originally cited?

IIUC, ascontiguousarray() will pass through a C-contiguous array, but
make a copy if it needs to in order to give you one. So in this case,
the goal is to have the array altered in place -- that won't happen
if a copy is made first -- the copy will be altered instead.
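A quick interactive check of that behavior:

>>> import numpy as np
>>> a = np.ones((4, 5))
>>> np.ascontiguousarray(a) is a      # already C-contiguous: passed through
True
>>> b = a[:, ::2]                     # non-contiguous view
>>> np.ascontiguousarray(b) is b      # a copy was made
False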

by declaring the input: np.ndarray[double, ndim=2, mode="c"]

Cython will check for C-contiguous on input I think -- I hope that's
right! If not, how do I do that check?

Sturla Molden

Jul 13, 2012, 11:54:59 AM
to cython...@googlegroups.com
On 12.07.2012 20:45, Larry Eisenman wrote:

> After poking through some of the numpy docs, it seems that the default
> for almost all ndarray creation routines is C-contiguous. Fortran style
> arrays would need to be explicitly created as such. Non-contiguous
> arrays may also be possible but it's not clear to me how (much less why)
> to create one.
>
> If you edit your test code to explicitly create a Fortran style array (a
> = np.arange(12, dtype=np.float64).reshape((3,4), order='F')), your test
> code gives the following error when calling multiply.multiply(a, 3):
>
> ValueError: ndarray is not C-contiguous
>
> Your function definition seems to produce code testing that the passed
> array is C-contiguous.
>
> I would definitely like to see your example added to the top level of
> the Wiki so it is more visible or perhaps even to the tutorial section
> in the Cython docs.


You seem to have a problem understanding NumPy, which is not a Cython
issue. NumPy does what it does, independently of Cython. We were
discussing how to access the contents of a NumPy array, not how to use
NumPy.

This might give you a hint:

Create arrays in C order:
np.empty((m,n))
np.zeros((m,n))
np.ones((m,n))
np.ascontiguousarray(obj)
array.copy() or array.copy(order="C")
np.asarray(obj,order="C")
array.reshape((m,n))
array.reshape((m,n),order="C")

Create arrays in Fortran order:
np.empty((m,n), order="F")
np.zeros((m,n), order="F")
np.ones((m,n), order="F")
np.asfortranarray(obj)
array.copy(order="F")
np.asarray(obj,order="F")
array.reshape((4,4),order="F")

Create arrays in any order:
np.asarray(obj,order="A")
array.copy(order="A")
array.reshape((4,4),order="A")

np.asarray(obj) might return Fortran order (contrary to the docs)
array.T swaps between C and Fortran order

Transpose from C order:
array.T # fortran order, no copy
array.T.copy() # copy, C order, binary transpose
array.T.copy(order="F") # copy, Fortran order, no binary transpose

Transpose from Fortran order:
array.T # C order, no copy
array.T.copy() # C order, no binary transpose
array.T.copy(order="F") # copy, Fortran order, binary transpose

Note that there are other options too, in which NumPy arrays are neither
C nor Fortran contiguous.

array.reshape(shape) usually preserves element order.


Sturla

Chris Barker

Jul 13, 2012, 12:10:09 PM
to cython...@googlegroups.com
On Fri, Jul 13, 2012 at 8:54 AM, Sturla Molden <sturla...@yahoo.no> wrote:

>> After poking through some of the numpy docs, it seems that the default
>> for almost all ndarray creation routines is C-contiguous. Fortran style
>> arrays would need to be explicitly created as such.

yup.

>> Non-contiguous
>> arrays may also be possible but it's not clear to me how (much less why)
>> to create one.

the common way is slicing -- slicing numpy arrays does not make a copy
of the underlying data, and if the slice is not a contiguous chunk of
memory, you get a non-contiguous array:

In [112]: a = np.ones((4,5))

In [113]: a.flags
Out[113]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [114]: b = a[:, 2]

In [115]: b.flags
Out[115]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

>> If you edit your test code to explicitly create a Fortran style array (a
>> = np.arange(12, dtype=np.float64).reshape((3,4), order='F')), your test
>> code gives the following error when calling multiply.multiply(a, 3):
>>
>> ValueError: ndarray is not C-contiguous
>>
>> Your function definition seems to produce code testing that the passed
>> array is C-contiguous.

exactly -- numpy arrays provide a very nice high-level abstraction
around a block of data in memory -- you can work with them in Python
with little concern for type, order, contiguousness (is that a word?),
etc. However, the point of this example is to pass a pointer to that
block of data to C or C++ -- in that case, the data in memory
darn well better be arranged as expected.

One of the great things about Cython and its numpy support is that I
can do that with a simple declaration.

>> I would definitely like to see your example added to the top level of
>> the Wiki so it is more visible or perhaps even to the tutorial section
>> in the Cython docs.

It's in the Wiki -- and I've put in references to it at a couple of
spots -- I'm not sure it belongs at the top level, but I don't think I
have write permission to that anyway.

I do think we could use a "numpy examples" page or something like
that, where we could reference these sorts of things.

Sturla Molden

Jul 13, 2012, 12:16:14 PM
to cython...@googlegroups.com
On 12.07.2012 20:45, Larry Eisenman wrote:

> If you edit your test code to explicitly create a Fortran style array (a
> = np.arange(12, dtype=np.float64).reshape((3,4), order='F')), your test
> code gives the following error when calling multiply.multiply(a, 3):
>
> ValueError: ndarray is not C-contiguous
>
> Your function definition seems to produce code testing that the passed
> array is C-contiguous.

As it should, otherwise data access gets messed up. In Cython you must
declare arrays to be either C or Fortran contiguous, or in any order.

Which means we should preferably code like this:

cdef np.ndarray[np.float64_t, ndim=2, mode="c"] carray
cdef np.ndarray[np.float64_t, ndim=2, mode="fortran"] farray
cdef np.ndarray[np.float64_t, ndim=2] any_order_array
carray = np.ascontiguousarray(some_numpy_array, dtype=np.float64)
farray = np.asfortranarray(some_numpy_array, dtype=np.float64)
any_order_array = np.asarray(some_numpy_array, dtype=np.float64)

Or with memoryviews:

cdef np.float64_t[::,::1] carray
cdef np.float64_t[::1,::] farray
cdef np.float64_t[::,::] any_order_array
carray = np.ascontiguousarray(some_numpy_array, dtype=np.float64)
farray = np.asfortranarray(some_numpy_array, dtype=np.float64)
any_order_array = np.asarray(some_numpy_array, dtype=np.float64)

Or raw C pointer:

cdef np.float64_t *data
data = <np.float64_t *> cpython.PyLong_AsVoidPtr(
    some_numpy_array.__array_interface__['data'][0])

cdef np.ndarray any_order_array
cdef np.float64_t *data
any_order_array = some_numpy_array
data = <np.float64_t *> np.PyArray_DATA(any_order_array)



Sturla




Sturla Molden

Jul 16, 2012, 1:49:18 PM
to cython...@googlegroups.com
On 09.07.2012 18:06, Chris Barker wrote:
> IIUC, ascontiguousarray() will pass through a C-contiguous array, but
> make a copy if it needs to in order to give you one. So in this case,
> the goal is to have the array altered in place -- that won't happen
> if a copy is made first -- the copy will be altered instead.

Which is why my Cython code that wraps in-place modification of an array
in C or Fortran does this:

import numpy as np
cimport numpy as np

cdef extern from *:
    void foobar(double*)

def modify_inplace(object array, inplace=True):
    cdef object tmp
    cdef double[::,::1] arrayview
    assert(type(array) is np.ndarray)
    tmp = np.ascontiguousarray(array, dtype=np.float64)
    if (not inplace) and (tmp is array):
        tmp = tmp.copy()
    arrayview = tmp
    foobar(&arrayview[0,0])  # C function
    if tmp is not array:
        array[...] = tmp[...]

Or this:


def modify_inplace(object array_like, allow_inplace=True):
    cdef object tmp
    cdef double[::,::1] arrayview
    tmp = np.ascontiguousarray(array_like, dtype=np.float64)
    if (not allow_inplace) and (tmp is array_like):
        tmp = tmp.copy()
    arrayview = tmp
    foobar(&arrayview[0,0])  # C function
    return tmp


The latter allows any array-like object as input. But this function must e.g. be called like this:

a = modify_inplace(a)


> by declaring the input: np.ndarray[double, ndim=2, mode="c"]
>
> Cython will check for C-contiguous on input I think -- I hope that's
> right! If not, how do I do that check?



It will do the check depending on which keyword arguments you pass to
np.ndarray.


Sturla


Nader Al-Naji

Jul 24, 2012, 2:11:48 PM
to cython...@googlegroups.com
What if the array is empty? The following code results in an IndexError:

print 'This is going to be out of bounds.'
cdef np.ndarray[long, ndim=2, mode="c"] two_d_arr = np.array([[],[]],
dtype=int)
print two_d_arr
printf("pointer to null data: %u\n", <unsigned long>two_d_arr.data)
cdef long* data_ptr = <long*>(&two_d_arr[0,0])

Will there ever be an alternative to using the .data member that doesn't have
this problem?


Robert Bradshaw

Jul 24, 2012, 2:44:35 PM
to cython...@googlegroups.com
That is a good point. Of course .data won't always work if it's an
empty view of a larger array either.

- Robert

Chris Barker

Jul 24, 2012, 3:35:19 PM
to cython...@googlegroups.com
On Tue, Jul 24, 2012 at 11:11 AM, Nader Al-Naji <nb...@princeton.edu> wrote:
> What if the array is empty?

I'm not sure there is any way around checking for an empty array --
indeed, you need to check other things about the array anyway (though
Cython does take care of that for you)

> The following code results in an IndexError:
>
> print 'This is going to be out of bounds.'
> cdef np.ndarray[long, ndim=2, mode="c"] two_d_arr = np.array([[],[]],
> dtype=int)
> print two_d_arr
> printf("pointer to null data: %u\n", <unsigned long>two_d_arr.data)

what is arr.data for an empty array?

> cdef long* data_ptr = <long*>(&two_d_arr[0,0])

Actually raising an error is the right thing to do here, in most
cases, but if boundscheck is False, I suspect it would move right on
through -- but what would that address be? the same as the value of
.data, I suspect.

then, in the C code it's passed to, there is likely to be some sort
of loop through the array, and being of zero size, that code would be a
no-op, so it may all be OK (depending on what .data is...)

Indeed -- it seems to work fine with the example I put in the Wiki --
i.e. it's a no-op, but nothing crashes or anything.

-Chris

Robert Bradshaw

Jul 26, 2012, 12:41:45 AM
to cython...@googlegroups.com
On Tue, Jul 24, 2012 at 12:35 PM, Chris Barker <chris....@noaa.gov> wrote:
> On Tue, Jul 24, 2012 at 11:11 AM, Nader Al-Naji <nb...@princeton.edu> wrote:
>> What if the array is empty?
>
> I'm not sure there is any way around checking for an empty array --
> indeed, you need to check other things about the array anyway (though
> Cython does take care of that for you)
>
>> The following code results in an IndexError:
>>
>> print 'This is going to be out of bounds.'
>> cdef np.ndarray[long, ndim=2, mode="c"] two_d_arr = np.array([[],[]],
>> dtype=int)
>> print two_d_arr
>> printf("pointer to null data: %u\n", <unsigned long>two_d_arr.data)
>
> what is arr.data for an empty array?
>
>> cdef long* data_ptr = <long*>(&two_d_arr[0,0])
>
> Actually raising an error is the right thing to do here, in most
> cases, but if boundscheck is False, I suspect it would move right on
> through -- but what would that address be? the same as the value of
> .data, I suspect.
>
> then, in the C code it's passed to, there is likely to be some sort
> of loop through the array, and being of zero size, that code would be a
> no-op, so it may all be OK (depending on what .data is...)
>
> Indeed -- it seems to work fine with the example I put in the Wiki --
> i.e. it's a no-op, but nothing crashes or anything.

I think raising an error if boundscheck is True is the bad behavior,
but you're right, if it's False it should just work (even if it's
offset from .data), though the actual value of the pointer shouldn't
matter in this corner case as it should never be dereferenced.

- Robert

Chris Barker

Jul 26, 2012, 2:12:02 PM
to cython...@googlegroups.com
On Wed, Jul 25, 2012 at 9:41 PM, Robert Bradshaw <robe...@gmail.com> wrote:

>>> The following code results in an IndexError:
>>>
>>> print 'This is going to be out of bounds.'
>>> cdef np.ndarray[long, ndim=2, mode="c"] two_d_arr = np.array([[],[]],
>>> dtype=int)
>>> print two_d_arr
>>> printf("pointer to null data: %u\n", <unsigned long>two_d_arr.data)
>>
>> what is arr.data for an empty array?
>>
>>> cdef long* data_ptr = <long*>(&two_d_arr[0,0])

> I think raising an error if boundscheck is True is the bad behavior,

why? That's how it works in Python:

In [6]: arr = np.ones( (0,0) )

In [7]: arr
Out[7]: array([], shape=(0, 0), dtype=float64)

In [8]: arr[0,0]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/Users/chris.barker/<ipython-input-8-14d8860cb090> in <module>()
----> 1 arr[0,0]

IndexError: index (0) out of range (0<=index<0) in dimension 0

I think the problem here is that arr[0] kind of means two things:

"get me the zeroth item"

and

"get me the pointer to the beginning of the data array"

In most cases, those are the same thing, but not in this case.

Which is why I suppose we really should have a canonical way to get
that pointer -- thus arr.data, but it has its problems, as well.

Robert Bradshaw

Jul 26, 2012, 2:28:20 PM
to cython...@googlegroups.com
On Thu, Jul 26, 2012 at 11:12 AM, Chris Barker <chris....@noaa.gov> wrote:
> On Wed, Jul 25, 2012 at 9:41 PM, Robert Bradshaw <robe...@gmail.com> wrote:
>
>>>> The following code results in an IndexError:
>>>>
>>>> print 'This is going to be out of bounds.'
>>>> cdef np.ndarray[long, ndim=2, mode="c"] two_d_arr = np.array([[],[]],
>>>> dtype=int)
>>>> print two_d_arr
>>>> printf("pointer to null data: %u\n", <unsigned long>two_d_arr.data)
>>>
>>> what is arr.data for an empty array?
>>>
>>>> cdef long* data_ptr = <long*>(&two_d_arr[0,0])
>
>> I think raising an error if boundscheck is True is the bad behavior,
>
> why?

I didn't mean to imply it's incorrect (it isn't). It's just
unfortunate for this hack.

> That's how it works in Python:
>
> In [6]: arr = np.ones( (0,0) )
>
> In [7]: arr
> Out[7]: array([], shape=(0, 0), dtype=float64)
>
> In [8]: arr[0,0]
> ---------------------------------------------------------------------------
> IndexError Traceback (most recent call last)
> /Users/chris.barker/<ipython-input-8-14d8860cb090> in <module>()
> ----> 1 arr[0,0]
>
> IndexError: index (0) out of range (0<=index<0) in dimension 0
>
> I think the problem here is that arr[0] kind of means two things:
>
> "get me the zeroth item"
>
> and
>
> "get me the pointer to the beginning of the data array"
>
> In most cases, those are the same thing, but not in this case.
>
> Which is why I suppose we really should have a canonical way to get
> that pointer -- thus arr.data, but it has its problems, as well.

Yep. What we really want is a pointer to such an array also assuring
that the data is strided and shaped as expected.

- Robert

Sturla Molden

Jul 26, 2012, 2:59:40 PM
to cython...@googlegroups.com
On 26.07.2012 20:12, Chris Barker wrote:
> Which is why I suppose we really should have a canonical way to get
> that pointer -- thus arr.data, but it has its problems, as well.

That is what

<dtype_t*> np.PyArray_DATA(arr)

does.

NEVER use "arr.data", as it depends on the layout of PyArrayObject,
which is due to change.

Always use the macros in the NumPy C API to access the fields in
np.ndarray directly:

np.PyArray_DATA
np.PyArray_SHAPE
np.PyArray_NDIM
np.PyArray_STRIDES

(which is also what Cython does behind the scenes.)
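For example, a minimal sketch (the function name is just for illustration)
of going through the macro from Cython:

cimport numpy as np
np.import_array()  # initialize the NumPy C API before using PyArray_* calls

cdef inline double* data_pointer(np.ndarray arr):
    # same address as the zeroth element, without touching .data
    return <double*> np.PyArray_DATA(arr)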


Sturla




Chris Barker

Jul 26, 2012, 3:07:41 PM
to cython...@googlegroups.com
On Thu, Jul 26, 2012 at 11:59 AM, Sturla Molden <sturla...@yahoo.no> wrote:
> On 26.07.2012 20:12, Chris Barker wrote:
>
>> Which is why I suppose we really should have a canonical way to get
>> that pointer -- thus arr.data, but it has its problems, as well.
>
> That is what
>
> <dtype_t*> np.PyArray_DATA(arr)
>
> does.

But does that give you the address of the zeroth element? or the
address of the beginning of the data block? -- which I understand may
not be the same, say for an array that is a slice of another array.


i.e should I update that Wiki page with

<dtype_t*> np.PyArray_DATA(arr)

instead of

&arr[0]

mark florisson

Jul 26, 2012, 3:56:58 PM
to cython...@googlegroups.com
On 26 July 2012 20:07, Chris Barker <chris....@noaa.gov> wrote:
> On Thu, Jul 26, 2012 at 11:59 AM, Sturla Molden <sturla...@yahoo.no> wrote:
>> On 26.07.2012 20:12, Chris Barker wrote:
>>
>>> Which is why I suppose we really should have a canonical way to get
>>> that pointer -- thus arr.data, but it has its problems, as well.
>>
>> That is what
>>
>> <dtype_t*> np.PyArray_DATA(arr)
>>
>> does.
>
> But does that give you the address of the zeroth element? or the
> address of the beginning of the data block? -- which I understand may
> not be the same, say for an array that is a slice of another array.
>

Those will always be the same. When you slice an array that changes
the starting element in some dimension, the data pointer is moved for
the new view.
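You can see the moved pointer from plain Python via the array interface
(the same attribute used earlier in this thread):

>>> import numpy as np
>>> a = np.arange(10.0)
>>> b = a[3:]                              # a view; no data is copied
>>> p0 = a.__array_interface__['data'][0]
>>> p3 = b.__array_interface__['data'][0]
>>> p3 - p0                                # pointer moved by 3 * 8 bytes
24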

Sturla Molden

Jul 26, 2012, 7:37:44 PM
to cython...@googlegroups.com
On 26.07.2012 21:07, Chris Barker wrote:
>
> That is what
>
> <dtype_t*> np.PyArray_DATA(arr)
>
> does.
>
> But does that give you the address of the zeroth element?

Yes.


> or the
> address of the beginning of the data block?

Not always.



>
> i.e should I update that Wiki page with
>
> <dtype_t*> np.PyArray_DATA(arr)
>
> instead of
>
> &arr[0]
>

No.


arr[i,j,k] in Cython is this in C:

*((dtype_t*)((char*)PyArray_DATA(arr)
             + i*PyArray_STRIDES(arr)[0]
             + j*PyArray_STRIDES(arr)[1]
             + k*PyArray_STRIDES(arr)[2]))

Setting i,j,k to zero you can see that PyArray_DATA(arr)
is the address of the zeroth element in array arr.



Sturla


Sturla Molden

Jul 26, 2012, 7:49:31 PM
to cython...@googlegroups.com
On 27.07.2012 01:37, Sturla Molden wrote:
> On 26.07.2012 21:07, Chris Barker wrote:
>>
>> That is what
>>
>> <dtype_t*> np.PyArray_DATA(arr)
>>
>> does.
>>
>> But does that give you the address of the zeroth element?
>
> Yes.
>

Strictly speaking it gives the address of the "zeroth byte of the zeroth
element". I.e. it gives you a char*, and the strides along each axis are
in number of characters, not number of elements. That is why we do
pointer arithmetics on strides first, and then cast to a pointer of the
correct type. If we have certain knowledge about the strides of the
array, e.g. that it is contiguous in C order, we can avoid this
character fiddling and use pointer arithmetics directly on elements
instead. That is why specifying mode="c" and e.g. ndim=2 in the cdef of
the ndarray will speed things up considerably.
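A sketch of that character fiddling spelled out in Cython (the helper name
is just for illustration; assumes a 2-d float64 array):

cimport numpy as np

cdef double get2d(np.ndarray arr, Py_ssize_t i, Py_ssize_t j):
    # strides are in bytes (chars), so do the arithmetic on a char* first
    cdef char* base = <char*> np.PyArray_DATA(arr)
    cdef np.npy_intp* strides = np.PyArray_STRIDES(arr)
    return (<double*> (base + i * strides[0] + j * strides[1]))[0]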

Sturla






Chris Barker

Jul 27, 2012, 2:06:52 PM
to cython...@googlegroups.com
On Fri, Jul 27, 2012 at 10:45 AM, Nader Al-Naji <iamno...@gmail.com> wrote:

> <dtype_t*> np.PyArray_DATA(arr)
>
> is the right way to go. It is superior to the &a[0] solution because it
> doesn't require bounds checking, and it is superior to the .data solution

> 2) BONUS: it works even on
> arrays that haven't been statically casted to a Cython np.ndarray[...]. To
> fully understand what I mean by (2), take a look at the following example:
>
> a = np.arange(10, dtype=int)
>
> # This works even if a hasn't been casted yet.
> fprintf(stderr, "%lu\n", <size_t>np.PyArray_DATA(a))

what happens if a is not a numpy array? If you create it in your
Cython code, you can be confident that it is, but if it's passed in,
who knows? and if you have to check, why not cast it?

> Not having to cast the array before extracting the buffer is useful if one
> knows the size of the elements one is dealing with, but not necessarily the
> type.

you can cast it to a ndarray with unspecified type -- I suspect arr[0]
will work there, too.

> So I think Sturla's solution is what we should use in the future. It has the
> same semantics as .data, which I think a lot of people really like, and it
> won't be compromised if numpy changes.

well, as Sturla points out, Cython essentially creates the same code
under the hood if you index the zeroth element anyway. Maybe it's just
aesthetics, but I like the more Pythonic looking: &arr[0]

-Chris




>
> Anyone have any other reasons why Sturla's solution might not be a good
> idea?
>
> Thanks,
> Nader

Sturla Molden

Jul 27, 2012, 2:42:50 PM
to cython...@googlegroups.com
On 27.07.2012 20:06, Chris Barker wrote:
> 2) BONUS: it works even on
> arrays that haven't been statically casted to a Cython np.ndarray[...]. To
> fully understand what I mean by (2), take a look at the following example:
>
> a = np.arange(10, dtype=int)
>
> # This works even if a hasn't been casted yet.
> fprintf(stderr, "%lu\n", <size_t>np.PyArray_DATA(a))
> what happens if a is not a numpy array?

np.PyArray_DATA is a C macro that does not check anything.

> well, as Sturla points out, Cython essentially creates the same code
> under the hood if you index the zeroth element anyway. Maybe it's just
> aesthetics, but I like the more Pythonic looking: &arr[0] -Chris

That's what Cython does, yes. You can see the code in
Cython/Includes/numpy.pxd, i.e. the function __getbuffer__ in np.ndarray.

Two other benefits of &arr[0] are that the code will be easier to port
to typed memoryviews and that it uses the same coding style as C++ code
passing the buffer of a std::vector to C (or even Cython passing
libcpp.vector.vector to C.) In all cases, &arr[0] is what we use.
&arr[0] is also shorter to write than <dtype_t*>np.PyArray_DATA(arr). To
get rid of the bounds checking (which hardly matters here), use
@cython.boundscheck(False) and @cython.wraparound(False). Using &arr[0]
will also provide better run-time error checking, such as ensuring the
buffer is contiguous if that is required by the C code.

I do not prefer <dtype_t*>np.PyArray_DATA(arr) over &arr[0], but this is
a matter of taste I guess. I just want to discourage
"<dtype_t*>arr.data", which is soon due to break.


Sturla
Dag Sverre Seljebotn

Jul 27, 2012, 3:35:40 PM
to cython...@googlegroups.com
(This isn't of consequence to the original poster and sort of goes off topic, I just thought Sturla may be interested:)

Actually there's a half-baked patch to turn arr.data into PyArray_DATA for np.ndarray. The logic is that arr.shape is so much used, and we don't want that to break, so we'll throw in some hacks to change that to using a macro, and then we may as well throw in 'data' too.

I think this is all pending a decision about how closely the new memoryviews should emulate the current ndarray/buffer syntax (i.e. can you access the underlying object of a memoryview transparently), and we don't seem to have the energy to discuss that yet.

Dag




--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Sturla Molden

Jul 27, 2012, 6:12:32 PM
to cython...@googlegroups.com
On 27.07.2012 21:35, Dag Sverre Seljebotn wrote:
> Actually there's a half-baked patch to turn arr.data into PyArray_DATA for np.ndarray. The lgoic is that arr.shape is so much used, and we don't want that to break, so we'll throw in some hacks to change that to using a macro, and then we may as well throw in 'data' too.

That might prevent some code from breaking when NumPy completes the
transition to PyArray_DATA...

>
> I think this is all pending a decision about how closely the new memoryviews should emulate the current ndarray/buffer syntax (i.e. can you access the underlying object of a memoryview transparently), and we don't seem to have the energy to discuss that yet.
>

I recently suggested using typed memoryviews in SciPy
(scipy.spatial.cKDTree to begin with), but it got clubbed down for not
being compatible with Python 2.4... We finally settled on using
PyArray_DATA and plain old C style pointer arithmetics. But granted, the
code would have looked much better with memoryviews and multidimensional
arrays :(

Sturla

Jake Vanderplas

Jul 27, 2012, 6:38:01 PM
to cython...@googlegroups.com
Hello,


On Fri, Jul 27, 2012 at 3:12 PM, Sturla Molden <sturla...@yahoo.no> wrote:

I recently suggested using typed memoryviews in SciPy (scipy.spatial.cKDTree to begin with), but it got clubbed down for not being compatible with Python 2.4... We finally settled on using PyArray_DATA and plain old C style pointer arithmetics. But granted, the code would have looked much better with memoryviews and multidimensional arrays :(

Sturla

I've been playing around with some similar code in scikit-learn which needs to pass around views of arrays (the ball tree).  Trying out some quick benchmarks, it looks like memoryviews are pretty comparable to pointer arithmetic for individual memory access, but lead to some significant overhead when passing around slices as you'd need to in ckdtree.  For my application, I've settled on simply using pointers for the sake of speed.

I prepared some quick-and-dirty benchmarks of the behavior I need at https://github.com/jakevdp/memview_benchmarks/ -- I'd be interested if people more familiar with memory-views could take a look and let me know if I'm missing anything there.
   Jake

Sturla Molden

Jul 27, 2012, 7:34:20 PM
to cython...@googlegroups.com
On Fri, 27 Jul 2012 15:38:01 -0700, Jake Vanderplas <jak...@gmail.com>
wrote:

> I've been playing around with some similar code in scikit-learn which
> needs to pass around views of arrays (the ball tree).  Trying out
> some quick benchmarks, it looks like memoryviews are pretty
> comparable
> to pointer arithmetic for individual memory access, but lead to some
> significant overhead when passing around slices as you'd need to in
> ckdtree.  For my application, I've settled on simply using pointers
> for the sake of speed.
>
> I prepared some quick-and-dirty benchmarks of the behavior I need at
> https://github.com/jakevdp/memview_benchmarks/ [2] -- I'd be
> interested
> if people more familiar with memory-views could take a look and let
> me
> know if I'm missing anything there.

Try to turn off bounds checking and wraparound for slice_func and see
if this changes the timings.

cimport cython

@cython.wraparound(False)
@cython.boundscheck(False)
cdef inline void slice_func(...):
     whatever


Sturla





Sturla Molden

Jul 27, 2012, 7:44:03 PM
to cython...@googlegroups.com
On Fri, 27 Jul 2012 15:38:01 -0700, Jake Vanderplas <jak...@gmail.com>
wrote:


> I prepared some quick-and-dirty benchmarks of the behavior I need at
> https://github.com/jakevdp/memview_benchmarks/ [2] -- I'd be
> interested
> if people more familiar with memory-views could take a look and let
> me
> know if I'm missing anything there.
>    Jake

You also forgot to declare M as np.intp in compute_distances, which makes
it a Python object.

You might also want to turn on some compiler optimization in setup.py,
e.g. -O2, as Cython often generates C code that needs the C compiler to
optimize in order to be efficient.



Sturla

mark florisson

Jul 28, 2012, 8:06:53 AM
to cython...@googlegroups.com
Indeed. You probably will also get much better performance by not
slicing, i.e. pass the full 2D arrays plus the i and j indices for the
first dimension into the function, and perform full indexing (an index
for each dimension), as in the sketch below.
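A minimal sketch of that pattern (names assumed, not taken from Jake's
benchmark):

cimport cython
from libc.math cimport sqrt

@cython.boundscheck(False)
@cython.wraparound(False)
cdef double row_distance(double[:, ::1] X, Py_ssize_t i, Py_ssize_t j) nogil:
    # full indexing on the 2-d view; no temporary 1-d slices are created
    cdef Py_ssize_t k
    cdef double tmp, d = 0.0
    for k in range(X.shape[1]):
        tmp = X[i, k] - X[j, k]
        d += tmp * tmp
    return sqrt(d)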

mark florisson

Jul 28, 2012, 8:08:39 AM
to cython...@googlegroups.com
(Basically slicing often involves function calls as well as
introducing a temporary memoryview slice, which is cleaned up for
every iteration (which happens in a thread-safe manner, using atomics
or locks)).

Sturla Molden

Jul 28, 2012, 10:40:38 AM
to cython...@googlegroups.com

> I prepared some quick-and-dirty benchmarks of the behavior I need at
> https://github.com/jakevdp/memview_benchmarks/ -- I'd be interested if
> people more familiar with memory-views could take a look and let me
> know if I'm missing anything there.
> Jake


I took the liberty of updating your benchmarks (see attachment). For
example, I noticed that GCC was clever enough to optimize out all the
loops in your pointer_arith.pyx...

Here are the timings I got from the updated version in the attachment. I
think this gives the correct picture:

D:\memview-benchmarks\new>python runme.py
numpy_only: 6.86 sec
cythonized_numpy: 5.74 sec
cythonized_numpy_2: 10.4 sec
cythonized_numpy_2b: 6.25 sec
cythonized_numpy_3: 2.43 sec
cythonized_numpy_4: 1.78 sec
pointer_arith: 1.79 sec
memview: 1.86 sec

There is a table in the attached PDF that should be easier to read.

The overhead from the numpy versions comes from slicing the ndarray. In
comparison, slicing the memoryview has a very small overhead. If we
slice the ndarray in Cython, this is not much better than just using
plain numpy in Python. But if we use memoryviews, slicing is just a
little bit slower than using C style pointer arithmetics.

And consider this: Numerical code using array slicing in Fortran90 with
gfortran is often 2x slower than the same code using pointer arithmetics
in C with GCC. At least in my experience (Fortran 77 is another matter.)

If you wonder why using np.dot was faster than writing out the loop in
Cython, that is due to Intel MKL in Enthought.

Conclusion:

Memoryviews are extremely fast, comparable to pointer arithmetics in C.

Now we need a real benchmark, e.g. some linear algebra solver or an FFT.
Something like Scimark perhaps. Cython vs. C vs. Fortran 90.

Sturla
memview-benchmarks.zip
benchmark.pdf

Sturla Molden

Jul 28, 2012, 11:18:57 AM
to cython...@googlegroups.com
I found another issue: the memoryview slices were not declared
contiguous. This reduced the runtime from 1.86 to 1.83 seconds. That
puts the overhead from using memoryview slices at 2.2% compared to raw C
pointer arithmetic. The benchmark creates two million memoryview slices
and computes one million dot products, each with vector lengths of 1000.
I am more than willing to accept those 2.2% to avoid those pesky
pointers, but it remains to be seen how memoryviews perform on a more
realistic problem.

Sturla
memview.pyx

Jake Vanderplas

Jul 28, 2012, 11:39:21 AM
to cython...@googlegroups.com
Sturla,
Thanks for looking at this.  I'm still learning the details of optimizing memviews - these are very impressive benchmarks!  I've updated my github repository with your changes:
https://github.com/jakevdp/memview_benchmarks
Thanks
   Jake

Sturla Molden

Jul 28, 2012, 11:57:26 AM
to cython...@googlegroups.com
I finally managed to make git/github work...
https://github.com/sturlamolden/memview_benchmarks

You got a pull request. I'm not sure if you already updated your code.

I'm very happy with the speed of memoryviews too, particularly slicing. The slowness of slicing np.ndarray was the reason I never could use Cython+NumPy instead of Fortran 95.

I now want to see a more realistic benchmark. I'm not sure if porting Scimark will be too much work. Preferably I want to compare these on a set of real-world problems:

Python
Python with NumPy
C
C++ using STL
Fortran 77
Fortran 95 
Cython with memoryviews
Java (perhaps)
C#.NET (perhaps)
MATLAB (perhaps)

Or perhaps we could use the Debian shootout?


Sturla

Bradley Froehle

Jul 30, 2012, 11:33:27 AM
to cython...@googlegroups.com
Can somebody clarify what is meant by the semantics not matching?  I apologize if this was covered earlier in the thread... I must have missed it.

Also, in Numpy 1.6, arr.data and PyArray_DATA are equivalent (up to a cast to the data type):
    #define PyArray_DATA(obj) ((void *)(((PyArrayObject *)(obj))->data))

I think the best approach is for Cython to just translate arr.data into PyArray_DATA(arr) in the generated C code.
 
It seems like there are shortcomings to all the major approaches:
  • arr.data
    • good: intent is clear (but the semantics don't actually match)
    • bad: will break at some point, doesn't handle offsets properly
  • &arr[0]
    • good: intent is mostly clear, follows a common C++ idiom
    • bad: requires completely disabling bounds checking for the entire function to handle 0-length arrays
  • PyArray_DATA(arr)
    • good: has proper semantics
    • bad: verbose and unpythonic
Would it make sense to enhance numpy.pxd to provide a new property that acts like PyArray_DATA(arr) but looks pythonic (we might call it zeroth_elem, first_elem, or something like that)?
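Something along these lines, say (a hypothetical sketch only -- zeroth_elem
is not an existing Cython or numpy API):

cimport numpy as np

cdef inline void* zeroth_elem(np.ndarray arr):
    # hypothetical: same semantics as PyArray_DATA(arr), friendlier name
    return np.PyArray_DATA(arr)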

Gerald Dalley

Jul 30, 2012, 11:53:05 AM
to cython...@googlegroups.com
On Mon, Jul 30, 2012 at 11:33 AM, Bradley Froehle <brad.f...@gmail.com> wrote:
Can somebody clarify what is meant by the semantics not matching?  I apologize if this was covered earlier in the thread... I must have missed it.

By mismatched semantics, I just meant that apparently arr.data doesn't always refer to the first element of the array.  Sometimes arr.data != &arr[0].
 
Also, in Numpy 1.6, arr.data and PyArray_DATA are equivalent (up to a cast to the data type):
    #define PyArray_DATA(obj) ((void *)(((PyArrayObject *)(obj))->data))

I think the best approach is for Cython to just translate arr.data into PyArray_DATA(arr) in the generated C code.

Agreed.

--
--Gerald Dalley
  dal...@ieee.org

Sturla Molden

Jul 30, 2012, 1:16:10 PM
to cython...@googlegroups.com
On 30.07.2012 17:33, Bradley Froehle wrote:

> Also, in Numpy 1.6, arr.data and PyArray_DATA are equivalent (up to a
> cast to the data type):
> #define PyArray_DATA(obj) ((void *)(((PyArrayObject *)(obj))->data))

That is due to change.


> I think the best approach is for Cython to just translate arr.data into
> PyArray_DATA(arr) in the generated C code.
>
> It seems like there are shortcomings to all the major approaches:
>
> * arr.data
>     o good: intent is clear (but the semantics don't actually match)
>     o bad: will break at some point, doesn't handle offsets properly
> * &arr[0]
>     o good: intent is mostly clear, follows a common C++ idiom
>     o bad: requires completely disabling bounds checking for the
>       entire function to handle 0-length arrays

No, it does not require bounds checking to be turned off. But it will do
a bounds check if you don't.



> * PyArray_DATA(arr)
>     o good: has proper semantics
>     o bad: verbose and unpythonic
>
> Would it make sense to enhance numpy.pxd to provide a new property
> that acts like PyArray_DATA(arr) but looks pythonic (we might call
> it zeroth_elem, first_elem, or something like that)?


In my opinion we should use memoryviews instead, unless compatibility
with Python 2.4 is absolutely required. We are discussing how to use an
older, half-broken (useless for anything but trivial code), and soon
deprecated interface to solve a problem that memoryviews get right.

http://docs.cython.org/src/userguide/memoryviews.html

The most important differences:

o Memoryviews are much faster than np.ndarray and expose all the PEP
3118 attributes. Particularly slicing and using them as function
arguments are faster (like Fortran arrays). You will notice the
difference if your code has array slicing or function calls in it.
np.ndarray is only fast and useful if you can inline everything into one
big function, and you never generate an array slice. So for everything
except trivial use-cases, typed memoryviews are what we should use.

o Cython arrays (cython.view.array) have an attribute .data which will
do what you want. They behave like NumPy arrays except they are much
faster.

o Memoryviews have a shorter syntax.

o NumPy arrays can be converted to memoryviews and Cython arrays
automatically.

o This will take build and runtime dependency on NumPy away. And yet we
can use as much of NumPy's Python API as we want.

o np.ndarray might even be deprecated when Cython reaches 1.0, so using
it should be discouraged. I'd even go so far as to suggest that all
instructions on using np.ndarray be removed from the Wiki and replaced
with memoryviews.

o Memoryviews work with all PEP 3118 compliant buffers, including NumPy.


Sturla
Sturla Molden

Jul 30, 2012, 1:22:13 PM
to cython...@googlegroups.com
On 30.07.2012 19:16, Sturla Molden wrote:

>> o good: intent is mostly clear, follows a common C++ idiom
>> o bad: requires completely disabling bounds checking for the
>> entire function to handle 0-length arrays
>
> No, it does not require bounds checking to be turned off. But it will do
> a bounds check if you don't.

Sorry, let me rephrase this:

Bounds checking will prevent you from passing a dangling data pointer to
C if you have a 0-length array. I.e. you get an informative Python
exception (IndexError) with a traceback instead of a segfault -- which
can be nice I guess.

Sturla

Sturla Molden

Jul 30, 2012, 4:37:12 PM
to cython...@googlegroups.com
Here is an example of using memoryviews:

https://github.com/jakevdp/memview_benchmarks/blob/master/memview.pyx

The extra overhead here is just 2.2% compared to plain C pointer
arithmetics (using -O2) when a 1000 x 1000 array is passed to "compute
distances".

As we discussed in the same thread, doing this with np.ndarray is just
too slow:

https://github.com/jakevdp/memview_benchmarks/blob/master/cythonized_numpy_2b.pyx

Anything but trivial functions will need slicing and function calls. You
can see here how memoryviews allow us to code in "Fortran style" with
arrays instead of using C style pointer arithmetic.

Pointer arithmetic is still not too bad in this simple case, but it can
be much more complicated (but why bother?):

https://github.com/jakevdp/memview_benchmarks/blob/master/pointer_arith.pyx

So I suggest one of these idioms for calling C from Cython:

cdef double[:,::1] array = numpy_array
cdef Py_ssize_t m, n
m = array.shape[0]
n = array.shape[1]
foobar(m, n, &array[0,0])

cdef view.array array = <double[:,::1]> numpy_array
cdef Py_ssize_t m, n
m = array.shape[0]
n = array.shape[1]
foobar(m, n, <double*> array.data)


Sturla

Chris Barker

Aug 6, 2012, 1:28:10 PM
to cython...@googlegroups.com
On Mon, Jul 30, 2012 at 1:37 PM, Sturla Molden <sturla...@yahoo.no> wrote:

> So I suggest one of these idioms for calling C from Cython:

> cdef double[:,::1] array = numpy_array
> cdef Py_ssize_t m, n
> m = array.shape[0]
> n = array.shape[1]
> foobar(m, n, &array[0,0])
>
> cdef view.array array = <double[:,::1]> numpy_array
> cdef Py_ssize_t m, n
> m = array.shape[0]
> n = array.shape[1]
> foobar(m, n, <double*> array.data)

The only difference is the first line: will they generate the same C
code? -- why would one choose one or the other?

(there should be one obvious way to do it...)

I'll probably go update the wiki page, but to make sure I'm clear:

Using a view.array will build a Cython "memoryview", using the
extended buffer interface -- so you can pass in any Python object that
supports PEP 3118 buffers (does anything other than numpy support that
now??)

This is a lightweight way to both get a pointer to the data, and to
have a way, in Cython, to slice and dice the data.

Will this check for C-contiguous and raise an error if not?

-Chris

mark florisson

Aug 6, 2012, 5:12:01 PM
to cython...@googlegroups.com
On 6 August 2012 18:28, Chris Barker <chris....@noaa.gov> wrote:
> On Mon, Jul 30, 2012 at 1:37 PM, Sturla Molden <sturla...@yahoo.no> wrote:
>
>> So I suggest one of these idioms for calling C from Cython:
>
>> cdef double[:,::1] array = numpy_array
>> cdef Py_ssize_t m, n
>> m = array.shape[0]
>> n = array.shape[1]
>> foobar(m, n, &array[0,0])
>>
>> cdef view.array array = <double[:,::1]> numpy_array
>> cdef Py_ssize_t m, n
>> m = array.shape[0]
>> n = array.shape[1]
>> foobar(m, n, <double*> array.data)
>
> The only difference is the first line: will they generate the same C
> code? -- why would one choose one or the other?
>
> (there should be one obvious way to do it...)
>
> I'll probably go update the wiki page, but to make sure I'm clear:
>
> Using a view.array will build a Cython "memoryview", using the
> extended buffer interface -- so you can pass in any Python object that
> supports PEP 3118 buffers (does anything other than numpy support that
> now??)

Not entirely. A cython.view.array is an array in its own right, with its
own memory. In other words, it's a (lousy) replacement for numpy. It's
useful for users when they have memory allocated from somewhere and
want to turn it into a numpy array or memoryview. The array is more of
an intermediate thing, usually created implicitly by casting a
pointer: <double[:10:1, :10]> p creates a Fortran-contiguous cython
array view, of shape (10, 10), on the memory.
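
A minimal sketch of that pointer-cast idiom (assuming you manage the
memory yourself; demo is a made-up name):

from libc.stdlib cimport malloc, free

def demo():
    cdef double* p = <double*> malloc(10 * 10 * sizeof(double))
    if p == NULL:
        raise MemoryError()
    # Fortran-contiguous 10 x 10 view on the raw memory:
    cdef double[::1, :] v = <double[:10:1, :10]> p
    try:
        v[0, 0] = 1.0   # ordinary "NumPy style" indexing
    finally:
        free(p)         # the view does not own the memory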

> This is a lightweight way to both get a pointer to the data, and to
> have a way, in Cython, to slice and dice the data.
>
> Will this check for C-contiguity and raise an error if not?

Yes, buffers are always validated at runtime when an object is assigned
to them. Assigning memoryviews to each other with mismatched types is an
error, e.g.

cdef double[:, ::1] array = numpy_array # runtime check
cdef double[:, ::1] array2 = array # compile-time check
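
To make the runtime check concrete, a small sketch of what the caller
sees (check is an illustrative name):

import numpy as np

def check(arr):
    cdef double[:, ::1] v = arr   # raises ValueError on dtype/contiguity mismatch
    return (v.shape[0], v.shape[1])

# check(np.zeros((3, 4)))          -> (3, 4)
# check(np.zeros((3, 4))[:, ::2])  -> ValueError (not C-contiguous)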

Sturla Molden

unread,
Aug 6, 2012, 8:03:21 PM8/6/12
to cython...@googlegroups.com
On 06.08.2012 19:28, Chris Barker wrote:
>
>> cdef double[:,::1] array = numpy_array
>> cdef Py_ssize_t m, n
>> m = array.shape[0]
>> n = array.shape[1]
>> foobar(m, n, &array[0,0])
>>
>> cdef view.array array = <double[:,::1]> numpy_array
>> cdef Py_ssize_t m, n
>> m = array.shape[0]
>> n = array.shape[1]
>> foobar(m, n, <double*> array.data)
>
> The only difference is that first line: will they generate the same C
> code? -- why would one choose on or the other?

The last line is also different ;-)

The first creates a typed memoryview of the NumPy array. That is, it uses
the NumPy array as a PEP 3118 buffer directly from Cython. No auxiliary
Python object is created.

The second creates an auxiliary Cython array that shares memory with the
NumPy array. The Cython array is a Python object with a PEP 3118 buffer.
Unlike np.ndarray, it has a valid .data attribute that will not break.

If you dislike the &array[0,0] syntax, pick the latter. If you don't want
an auxiliary Python object, pick the former.


>
>
> (there should be one obvious way to do it...)
>
> I'll probably go update the wiki page, but to make sure I'm clear:
>
> Using a view.array will build a Cython "memoryview", using the
> extended buffer interface -- so you can pass in any Python object that
> supports PEP 3118 buffers (does anything other than numpy support that
> now??)

Typed memoryviews will give you convenient and fast data access and fast
array slicing in Cython. cython.view.array can e.g. be used if you need
to send a C pointer as a Python PEP 3118 object to some Python routine.

Thus they are not exactly the same, but "almost".

Typical use cases:

double[:,:] or double[:,::1]:
- You have a C pointer, you want fast "NumPy style" indexing and slicing.
- You have a NumPy array, you want fast indexing and fast slicing.
- You are considering using Fortran because of its array syntax.

double[::1,:]:
- You need to interface with Fortran code.
- You are porting old Fortran code line-by-line.
- You are working with OpenGL.

cython.view.array:
- You have a C pointer and need a Python PEP 3118 object.
- You want a safe replacement for malloc (one that is faster than NumPy).
- You are porting old NumPy/Cython code that uses the .data attribute everywhere.
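
For the malloc-replacement use case, a rough sketch (work, scratch and
buf are illustrative names; the constructor arguments follow
cython.view.array):

from cython cimport view

def work():
    # The array allocates and owns its memory; it is freed with the object.
    cdef view.array scratch = view.array(shape=(1000,), itemsize=sizeof(double),
                                         format="d", mode="c")
    cdef double[::1] buf = scratch   # fast typed access
    buf[0] = 42.0
    return buf[0]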


Also beware of this:

Common contra-indications for using double[::1,:] or cython.view.array:
- You need to interface with BLAS. Use cblas instead.
- You need to interface with LAPACK. Use LAPACKE instead.
- You need to allocate WORK arrays for LAPACK. Use LAPACKE instead.
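
For the BLAS case, the cblas route looks roughly like this (assumes a
cblas.h on your include path and linking against a CBLAS library; ddot
is an illustrative wrapper, the declaration follows the standard CBLAS
signature):

cdef extern from "cblas.h":
    double cblas_ddot(int n, double* x, int incx, double* y, int incy)

def ddot(double[::1] x, double[::1] y):
    if x.shape[0] != y.shape[0]:
        raise ValueError("length mismatch")
    return cblas_ddot(x.shape[0], &x[0], 1, &y[0], 1)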



Sturla

Chris Barker

unread,
Aug 7, 2012, 3:01:55 PM8/7/12
to cython...@googlegroups.com
On Mon, Aug 6, 2012 at 5:03 PM, Sturla Molden

>>> cdef double[:,::1] array = numpy_array
>>> cdef Py_ssize_t m, n
>>> m = array.shape[0]
>>> n = array.shape[1]
>>> foobar(m, n, &array[0,0])
>>>
>>> cdef view.array array = <double[:,::1]> numpy_array
>>> cdef Py_ssize_t m, n
>>> m = array.shape[0]
>>> n = array.shape[1]
>>> foobar(m, n, <double*> array.data)

> The first creates a typed memoryview of the NumPy array. That is, using the
> NumPy array as a PEP 3118 buffer directly from Cython. No auxiliary Python
> object is created.

OK -- the original start of this post was how best to essentially pass
through a numpy data buffer to C/C++ code (or Fortran, I suppose).

So it looks like:

cdef double[:,::1] array = numpy_array

is very lightweight, so it may be a good way to go, and it supports
other PEP 3118 buffer objects, which seems a good idea.

Should I change the recommendation in the Wiki?

http://wiki.cython.org/tutorials/NumpyPointerToC

-Chris

Prashant

unread,
Aug 11, 2012, 4:12:08 AM8/11/12
to cython...@googlegroups.com
Hi,

Python 2.6.6
GCC 3.4.5
Cython 0.15.1
XP 32

http://wiki.cython.org/tutorials/NumpyPointerToC

I tried building the extension using the code there, but it seems multiply.pyx is missing some code.

@cython.boundscheck(False)
^
multiply.pyx:18:0: undeclared name not builtin: cython

This is because you are not cimporting cython. After adding "cimport cython" (the minimal pattern is sketched at the end of this message), the next error comes at link time:

multiply.o:multiply.c:(.text+0x5c3): undefined reference to `_imp__c_multiply'

Any pointers? I am not much into cython, just started digging.
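
For reference, the pattern that needs the cimport (scaled() is a made-up
stand-in, not the wiki code):

cimport cython   # without this, @cython.boundscheck fails to compile

@cython.boundscheck(False)
def scaled(double[:] x, double value):
    cdef Py_ssize_t i
    for i in range(x.shape[0]):
        x[i] *= value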

Regards

Prashant


Chris Barker

unread,
Aug 20, 2012, 7:02:09 PM8/20/12
to cython...@googlegroups.com
On Sat, Aug 11, 2012 at 1:12 AM, Prashant <anima...@gmail.com> wrote:

> Python 2.6.6
> GCC 3.4.5
> Cython 0.15.1

I'm running Cython 0.16 -- that may be the issue

> XP 32

I haven't tested this code with gcc on Windows -- are you successfully
compiling other stuff?


> http://wiki.cython.org/tutorials/NumpyPointerToC

> multiply.pyx:18:0: undeclared name not builtin: cython
>
> This is because you are not importing cython. After adding "cimport cython",

sorry about that -- I"ve updated the wiki -- I thought I cut&pasted
from a working version...

> next error is in generating "multiply.c" file.
>
> multiply.o:multiply.c:(.text+0x5c3): undefined reference to
> `_imp__c_multiply'
>
> Any pointers? I am not much into cython, just started digging.

try the latest Cython -- then report back.

-Chris

Prashant

unread,
Aug 25, 2012, 1:26:56 AM8/25/12
to cython...@googlegroups.com
This is with cython 0.16

running build_ext
cythoning multiply.pyx to multiply.c
building 'multiply' extension
D:\MinGW\bin\gcc.exe -mno-cygwin -mdll -O -Wall -IC:\Python26\lib\site-packages\numpy\core\include -IC:\Python26\include -IC:\Python26\PC -c multiply.c -o build\temp.win32-2.6\Release\multiply.o
multiply.c: In function `__Pyx_GetBuffer':
multiply.c:4353: warning: unused variable `getbuffer_cobj'
multiply.c: In function `__Pyx_ReleaseBuffer':
multiply.c:4393: warning: unused variable `releasebuffer_cobj'
multiply.c: At top level:
C:/Python26/lib/site-packages/numpy/core/include/numpy/__multiarray_api.h:1533:
warning: '_import_array' defined but not used
C:/Python26/lib/site-packages/numpy/core/include/numpy/__ufunc_api.h:227: warning: '_import_umath' defined but not used
D:\MinGW\bin\gcc.exe -mno-cygwin -mdll -O -Wall -IC:\Python26\lib\site-packages\numpy\core\include -IC:\Python26\include -IC:\Python26\PC -c c_multiply.c -o build\temp.win32-2.6\Release\c_multiply.o
c_multiply.c:22:2: warning: no newline at end of file
writing build\temp.win32-2.6\Release\multiply.def
D:\MinGW\bin\gcc.exe -mno-cygwin -shared -s build\temp.win32-2.6\Release\multiply.o build\temp.win32-2.6\Release\c_multiply.o build\temp.win32-2.6\Release\multiply.def -LC:\Python26\libs -LC:\Python26\PCbuild -lpython26 -lmsvcr90 -o D:\Projects\cython\multiply.pyd
build\temp.win32-2.6\Release\multiply.o:multiply.c:(.text+0x6ff): undefined reference to `_imp__c_multiply'
collect2: ld returned 1 exit status
error: command 'gcc' failed with exit status 1

Chris Barker

unread,
Aug 27, 2012, 3:54:59 PM8/27/12
to cython...@googlegroups.com
Something weird is going on.

I've enclosed my copy of the code, setup.py, and test file, so we can at
least make sure we're working with the same code.

Below is what I get on my Mac -- but it works OK for me on Windows, too:

Python 2.7, Cython 0.16, MSVC 2008 compiler

It looks like you're using MinGW, but I'd expect that to work -- what
else is different?

Python 3 maybe?

-Chris

## My results:

$ python setup.py build_ext --inplace
running build_ext
cythoning multiply.pyx to multiply.c
building 'multiply' extension
creating build
creating build/temp.macosx-10.3-fat-2.7
gcc-4.0 -fno-strict-aliasing -fno-common -dynamic -isysroot
/Developer/SDKs/MacOSX10.4u.sdk -arch ppc -arch i386 -g -O2 -DNDEBUG
-g -O3 -I/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/include
-I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7
-c multiply.c -o build/temp.macosx-10.3-fat-2.7/multiply.o
gcc-4.0 -fno-strict-aliasing -fno-common -dynamic -isysroot
/Developer/SDKs/MacOSX10.4u.sdk -arch ppc -arch i386 -g -O2 -DNDEBUG
-g -O3 -I/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/include
-I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7
-c c_multiply.c -o build/temp.macosx-10.3-fat-2.7/c_multiply.o
gcc-4.0 -bundle -undefined dynamic_lookup -arch ppc -arch i386
-isysroot /Developer/SDKs/MacOSX10.4u.sdk -isysroot
/Developer/SDKs/MacOSX10.4u.sdk -g
build/temp.macosx-10.3-fat-2.7/multiply.o
build/temp.macosx-10.3-fat-2.7/c_multiply.o -o
/Users/chris.barker/HAZMAT/GNOME-dev/GNOME-GIT/experiments/Cython-numpy_test/multiply.so
orrw-lm-1689049:Cython-numpy_test chris.barker$ py.test test_multiply.py
========================================================== test session starts ===========================================================
platform darwin -- Python 2.7.3 -- pytest-2.2.3
collected 4 items

test_multiply.py ....

======================================================== 4 passed in 0.16 seconds ========================================================
orrw-lm-1689049:Cython-numpy_test chris.barker$
Cython-numpy_test.zip