Best Practices for passing numpy data pointer to C ?
From: Robert Bradshaw <rober...@gmail.com>
Date: Thu, 26 Jul 2012 11:28:20 -0700
Local: Thurs, Jul 26 2012 2:28 pm
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?

I didn't mean to imply it's incorrect (it isn't). It's just
unfortunate for this hack.

Yep. What we really want is a pointer to such an array also assuring
that the data is strided and striped as expected.

- Robert


 
From: Sturla Molden <sturlamol...@yahoo.no>
Date: Thu, 26 Jul 2012 20:59:40 +0200
Local: Thurs, Jul 26 2012 2:59 pm
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?
On 26.07.2012 20:12, Chris Barker wrote:

> Which is why I suppose we really should have a canonical way to get
> that pointer -- thus arr.data, but it has its problems, as well.

That is what

<dtype_t*> np.PyArray_DATA(arr)

does.

NEVER use "arr.data", as it depends on the layout of PyArrayObject,
which is due to change.

Always use the macros in the NumPy C API to access the fields in
np.ndarray directly:

np.PyArray_DATA
np.PyArray_SHAPE
np.PyArray_NDIM
np.PyArray_STRIDES

(which is also what Cython does behind the scenes.)

Sturla


 
From: Chris Barker <chris.bar...@noaa.gov>
Date: Thu, 26 Jul 2012 12:07:41 -0700
Local: Thurs, Jul 26 2012 3:07 pm
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?

On Thu, Jul 26, 2012 at 11:59 AM, Sturla Molden <sturlamol...@yahoo.no> wrote:
> On 26.07.2012 20:12, Chris Barker wrote:

>> Which is why I suppose we really should have a canonical way to get
>> that pointer -- thus arr.data, but it has its problems, as well.

> That is what

> <dtype_t*> np.PyArray_DATA(arr)

> does.

But does that give you the address of the zeroth element? or the
address of the beginning of the data block? -- which I understand may
not be the case, say for an array that is a slice of another array.

i.e should I update that Wiki page with

<dtype_t*> np.PyArray_DATA(arr)

instead of

&arr[0]

-Chris

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Bar...@noaa.gov


 
From: mark florisson <markflorisso...@gmail.com>
Date: Thu, 26 Jul 2012 20:56:58 +0100
Local: Thurs, Jul 26 2012 3:56 pm
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?
On 26 July 2012 20:07, Chris Barker <chris.bar...@noaa.gov> wrote:

Those will always be the same. When you slice an array that changes
the starting element in some dimension, the data pointer is moved for
the new view.
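
A rough way to watch the pointer move, sketched with the Python standard library only. Here `array.array` stands in for the ndarray and the built-in `memoryview` for the sliced view; both are stand-ins chosen for illustration, not the NumPy C API, and `address_of` is a hypothetical helper.

```python
import array
import ctypes


def address_of(view):
    """Address of the first byte exposed by a writable, contiguous buffer."""
    return ctypes.addressof(ctypes.c_char.from_buffer(view))


base = array.array("d", range(10))   # ten doubles in one memory block
whole = memoryview(base)
tail = whole[3:]                     # a view whose element 0 is base[3]

base_addr = address_of(whole)
tail_addr = address_of(tail)

# The sliced view carries its own data pointer, moved forward by
# three items from the start of the underlying block.
offset = tail_addr - base_addr
print(offset, offset // base.itemsize)
```

So for the view, "address of the zeroth element" and "data pointer" coincide, while the start of the underlying block is somewhere else.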


 
From: Sturla Molden <sturlamol...@yahoo.no>
Date: Fri, 27 Jul 2012 01:37:44 +0200
Local: Thurs, Jul 26 2012 7:37 pm
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?
On 26.07.2012 21:07, Chris Barker wrote:

> That is what

> <dtype_t*>  np.PyArray_DATA(arr)

> does.

> But does that give you the address of the zeroth element?

Yes.

> or the
> address of the beginning of the data block?

Not always.

> i.e should I update that Wiki page with

> <dtype_t*>  np.PyArray_DATA(arr)

> instead of

> &arr[0]

No.

arr[i,j,k] in Cython is this in C:

*((dtype_t*)((char*)PyArray_DATA(arr)
             + i*PyArray_STRIDES(arr)[0]
             + j*PyArray_STRIDES(arr)[1]
             + k*PyArray_STRIDES(arr)[2]))

Setting i, j, k to zero, you can see that PyArray_DATA(arr)
is the address of the zeroth element in array arr.
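
The same byte-offset arithmetic can be replayed in plain Python as a sanity check. This is only a sketch: the 2x3x4 shape, the C-order strides and the `element` helper are assumptions made for illustration, and `struct.unpack_from` stands in for the C cast.

```python
import struct

ITEM = struct.calcsize("i")          # bytes per element (native C int)
shape = (2, 3, 4)

# C-order strides in BYTES, the unit used by the stride arithmetic above.
strides = (shape[1] * shape[2] * ITEM, shape[2] * ITEM, ITEM)

# Raw block holding 0, 1, 2, ... 23 as native C ints.
count = shape[0] * shape[1] * shape[2]
data = struct.pack(f"{count}i", *range(count))


def element(i, j, k):
    """Read arr[i, j, k] from the raw bytes via byte-offset arithmetic."""
    offset = i * strides[0] + j * strides[1] + k * strides[2]
    return struct.unpack_from("i", data, offset)[0]


print(element(0, 0, 0), element(1, 2, 3))
```

With all three indices at zero the offset is zero, so the read starts at the very first byte of the block.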

Sturla


 
From: Sturla Molden <sturlamol...@yahoo.no>
Date: Fri, 27 Jul 2012 01:49:31 +0200
Local: Thurs, Jul 26 2012 7:49 pm
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?
On 27.07.2012 01:37, Sturla Molden wrote:

> On 26.07.2012 21:07, Chris Barker wrote:

>> That is what

>> <dtype_t*>  np.PyArray_DATA(arr)

>> does.

>> But does that give you the address of the zeroth element?

> Yes.

Strictly speaking it gives the address of the "zeroth byte of the zeroth
element". I.e. the underlying data pointer is a char*, and the strides along
each axis are in number of chars (bytes), not number of elements. That is why
we do pointer arithmetic on strides first, and then cast to a pointer of the
correct type. If we have certain knowledge about the strides of the
array, e.g. that it is contiguous in C order, we can avoid this
byte fiddling and use pointer arithmetic directly on elements
instead. That is why specifying mode="c" and e.g. ndim=2 in the cdef of
the ndarray will speed things up considerably.
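
To make the bytes-versus-elements distinction concrete, here is a small sketch using the built-in `memoryview` as a stand-in for a C-contiguous 2-D array. The 2x3 shape and both helper names are assumptions for illustration; nothing here is Cython's generated code.

```python
import array

flat = array.array("d", range(6))                     # six doubles, one block
grid = memoryview(flat).cast("B").cast("d", shape=[2, 3])

rows, cols = grid.shape
byte_strides = grid.strides                           # reported in bytes


def by_bytes(i, j):
    # General case: accumulate a byte offset, then convert to an item index.
    return flat[(i * byte_strides[0] + j * byte_strides[1]) // grid.itemsize]


def by_elements(i, j):
    # C-contiguous shortcut: plain element arithmetic, no strides involved.
    return flat[i * cols + j]


print(byte_strides, by_bytes(1, 2), by_elements(1, 2))
```

Both helpers reach the same item; the second one is only valid because the layout is known to be C-contiguous.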

Sturla


 
From: Nader Al-Naji <iamnotna...@gmail.com>
Date: Fri, 27 Jul 2012 10:45:07 -0700 (PDT)
Local: Fri, Jul 27 2012 1:45 pm
Subject: Re: Best Practices for passing numpy data pointer to C ?

Great; thanks for the response! This is my first time posting here so it's
good to check back and see an involved discussion on the question I had--
namely, how to extract the buffer of a numpy array in Cython.

After reading through all the responses, it appears as though Sturla's
solution:

<dtype_t*>  np.PyArray_DATA(arr)  

is the right way to go. It is superior to the &a[0] solution because it
doesn't require bounds checking, and it is superior to the .data solution
because 1) it won't break as numpy changes and 2) BONUS: it works even on
arrays that haven't been statically cast to a Cython np.ndarray[...]. To
fully understand what I mean by (2), take a look at the following example:

    # Assumes: import numpy as np; cimport numpy as np;
    #          from libc.stdio cimport fprintf, stderr
    a = np.arange(10, dtype=int)

    # This works even if a hasn't been cast yet.
    fprintf(stderr, "%lu\n", <size_t>np.PyArray_DATA(a))

    # These cause errors if the array is not cast, because a[0] and a.data
    # return Python objects:
    #    cdef size_t broken_ptr1 = &a[0]
    #    cdef size_t broken_ptr2 = a.data

    cdef np.ndarray[long, ndim=1, mode="c"] x = a

    # After casting, everything works and is consistent.
    cdef size_t buf_ptr1 = <size_t> np.PyArray_DATA(x)
    cdef size_t buf_ptr2 = <size_t> &x[0]
    cdef size_t buf_ptr3 = <size_t> x.data

    fprintf(stderr, "%lu %lu %lu\n", buf_ptr1, buf_ptr2, buf_ptr3)

    # Output for an example run-- all addresses are the same, which is what
    # we want:
    #
    # 27354544
    # 27354544 27354544 27354544

Not having to cast the array before extracting the buffer is useful if one
knows the size of the elements one is dealing with, but not necessarily the
type. In particular, I think it's useful if one wants to manipulate numpy
arrays of strings, which Cython doesn't support-- though I'm not sure about
this.

So I think Sturla's solution is what we should use in the future. It has
the same semantics as .data, which I think a lot of people really like, and
it won't be compromised if numpy changes.

Anyone have any other reasons why Sturla's solution might not be a good
idea?

Thanks,
Nader


 
From: Chris Barker <chris.bar...@noaa.gov>
Date: Fri, 27 Jul 2012 11:06:52 -0700
Local: Fri, Jul 27 2012 2:06 pm
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?

On Fri, Jul 27, 2012 at 10:45 AM, Nader Al-Naji <iamnotna...@gmail.com> wrote:
> <dtype_t*>  np.PyArray_DATA(arr)

> is the right way to go. It is superior to the &a[0] solution because it
> doesn't require bounds checking, and it is superior to the .data solution
> 2) BONUS: it works even on
> arrays that haven't been statically casted to a Cython np.ndarray[...]. To
> fully understand what I mean by (2), take a look at the following example:

>     a = np.arange(10, dtype=int)

>     # This works even if a hasn't been casted yet.
>     fprintf(stderr, "%lu %lu\n", <size_t>np.PyArray_DATA(a))

what happens if a is not a numpy array? If you create it in your
Cython code, you can be confident that it is, but if it's passed in,
who knows? and if you have to check, why not cast it?

> Not having to cast the array before extracting the buffer is useful if one
> knows the size of the elements one is dealing with, but not necessarily the
> type.

you can cast it to a ndarray with unspecified type -- I suspect arr[0]
will work there, too.

> So I think Sturla's solution is what we should use in the future. It has the
> same semantics as .data, which I think a lot of people really like, and it
> won't be compromised if numpy changes.

well, as Sturla points out, Cython essentially creates the same code
under the hood if you index the zeroth element anyway. Maybe it's just
aesthetics, but I like the more pythonic looking: &arr[0]

-Chris

> Anyone have any other reasons why Sturla's solution might not be a good
> idea?

> Thanks,
> Nader



 
From: Sturla Molden <sturlamol...@yahoo.no>
Date: Fri, 27 Jul 2012 20:42:50 +0200
Local: Fri, Jul 27 2012 2:42 pm
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?
On 27.07.2012 20:06, Chris Barker wrote:

> 2) BONUS: it works even on
> arrays that haven't been statically casted to a Cython np.ndarray[...]. To
> fully understand what I mean by (2), take a look at the following example:

>      a = np.arange(10, dtype=int)

>      # This works even if a hasn't been casted yet.
>      fprintf(stderr, "%lu %lu\n",<size_t>np.PyArray_DATA(a))
> what happens if a is not a numpy array?

np.PyArray_DATA is a C macro that does not check anything.

> well, as Sturla points out, Cython essentially creates the same code
> under the hood if you index the zeroth element anyway. Maybe it's just
> aesthetics, but I like the more pythonic looking: &arr[0]

That's what Cython does, yes. You can see the code in
Cython/Includes/numpy.pxd, i.e. the function __getbuffer__ in np.ndarray.

Two other benefits of &arr[0] are that the code will be easier to port
to typed memoryviews and that it uses the same coding style as C++ code
passing the buffer of a std::vector to C (or even Cython passing
libcpp.vector.vector to C.) In all cases, &arr[0] is what we use.
&arr[0] is also shorter to write than <dtype_t*>np.PyArray_DATA(arr). To
get rid of the bounds checking (which hardly matters here), use
@cython.boundscheck(False) and @cython.wraparound(False).  Using &arr[0]
will also provide better run-time error checking, such as ensuring the
buffer is contiguous if that is required by the C code.
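
The kind of run-time checking meant here can be sketched in plain Python. `checked_address` is a hypothetical helper, and the built-in `memoryview` plus `ctypes` stand in for what a typed buffer declaration checks in Cython; treat it as an illustration of the idea rather than of Cython's actual generated checks.

```python
import array
import ctypes


def checked_address(buf, expected_format):
    """Return the address of element 0, or raise instead of giving C a bad pointer."""
    view = memoryview(buf)                  # TypeError if buf exposes no buffer
    if view.format != expected_format:
        raise TypeError(f"expected format {expected_format!r}, got {view.format!r}")
    if not view.c_contiguous:
        raise ValueError("buffer is not C-contiguous")
    if view.nbytes == 0:
        raise IndexError("buffer is empty, there is no element 0")
    if view.readonly:
        raise TypeError("buffer is read-only")
    return ctypes.addressof(ctypes.c_char.from_buffer(view))


values = array.array("d", [1.0, 2.0, 3.0, 4.0])
addr = checked_address(values, "d")
print(hex(addr))
```

A strided view such as `memoryview(values)[::2]`, an empty array, or an object that is not a buffer at all each produce a Python exception here, which is the behaviour an unchecked macro cannot offer.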

I do not prefer <dtype_t*>np.PyArray_DATA(arr) over &arr[0], but this is
a matter of taste I guess. I just want to discourage
"<dtype_t*>arr.data", which is soon due to break.

Sturla


 
From: Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no>
Date: Fri, 27 Jul 2012 21:35:40 +0200
Local: Fri, Jul 27 2012 3:35 pm
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?

(This isn't of consequence to the original poster and sort of goes off topic, I just thought Sturla may be interested:)

Actually there's a half-baked patch to turn arr.data into PyArray_DATA for np.ndarray. The logic is that arr.shape is so much used, and we don't want that to break, so we'll throw in some hacks to change that to using a macro, and then we may as well throw in 'data' too.

I think this is all pending a decision about how closely the new memoryviews should emulate the current ndarray/buffer syntax (i.e. can you access the underlying object of a memoryview transparently), and we don't seem to have the energy to discuss that yet.

Dag

>Sturla


 
From: Sturla Molden <sturlamol...@yahoo.no>
Date: Sat, 28 Jul 2012 00:12:32 +0200
Local: Fri, Jul 27 2012 6:12 pm
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?
On 27.07.2012 21:35, Dag Sverre Seljebotn wrote:

> Actually there's a half-baked patch to turn arr.data into PyArray_DATA for np.ndarray. The logic is that arr.shape is so much used, and we don't want that to break, so we'll throw in some hacks to change that to using a macro, and then we may as well throw in 'data' too.

That might prevent some code from breaking when NumPy completes the
transition to PyArray_DATA...

> I think this is all pending a decision about how closely the new memoryviews should emulate the current ndarray/buffer syntax (i.e. can you access the underlying object of a memoryview transparently), and we don't seem to have the energy to discuss that yet.

I recently suggested using typed memoryviews in SciPy
(scipy.spatial.cKDTree to begin with), but it got clubbed down for not
being compatible with Python 2.4... We finally settled on using
PyArray_DATA and plain old C-style pointer arithmetic. But granted, the
code would have looked much better with memoryviews and multidimensional
arrays :(

Sturla


 
From: Jake Vanderplas <jake...@gmail.com>
Date: Fri, 27 Jul 2012 15:38:01 -0700
Local: Fri, Jul 27 2012 6:38 pm
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?

Hello,

On Fri, Jul 27, 2012 at 3:12 PM, Sturla Molden <sturlamol...@yahoo.no> wrote:

> I recently suggested using typed memoryviews in SciPy
> (scipy.spatial.cKDTree to begin with), but it got clubbed down for not
> being compatible with Python 2.4... We finally settled on using
> PyArray_DATA and plain old C style pointer arithmetics. But granted, the
> code would have looked much better with memoryviews and multidimensional
> arrays :(

> Sturla

I've been playing around with some similar code in scikit-learn which needs
to pass around views of arrays (the ball tree).  Trying out some quick
benchmarks, it looks like memoryviews are pretty comparable to pointer
arithmetic for individual memory access, but lead to some significant
overhead when passing around slices as you'd need to in ckdtree.  For my
application, I've settled on simply using pointers for the sake of speed.

I prepared some quick-and-dirty benchmarks of the behavior I need at
https://github.com/jakevdp/memview_benchmarks/ -- I'd be interested if
people more familiar with memory-views could take a look and let me know if
I'm missing anything there.
   Jake


 
From: Sturla Molden <sturlamol...@yahoo.no>
Date: Sat, 28 Jul 2012 01:34:20 +0200
Local: Fri, Jul 27 2012 7:34 pm
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?
 On Fri, 27 Jul 2012 15:38:01 -0700, Jake Vanderplas <jake...@gmail.com> wrote:

 Try to turn off bounds checking and wraparound for slice_func and see
 if this changes the timings.

 cimport cython

 @cython.wraparound(False)
 @cython.boundscheck(False)
 cdef inline void slice_func(...):
      whatever

 Sturla


 
From: Sturla Molden <sturlamol...@yahoo.no>
Date: Sat, 28 Jul 2012 01:44:03 +0200
Local: Fri, Jul 27 2012 7:44 pm
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?
 On Fri, 27 Jul 2012 15:38:01 -0700, Jake Vanderplas <jake...@gmail.com> wrote:

> I prepared some quick-and-dirty benchmarks of the behavior I need at
> https://github.com/jakevdp/memview_benchmarks/ -- I'd be interested if
> people more familiar with memory-views could take a look and let me
> know if I'm missing anything there.
>    Jake

 You also forgot to declare M as np.intp_t in compute_distances, which
 makes it a Python object.

 You might also want to turn on some compiler optimization in setup.py,
 e.g. -O2, as Cython often generates C code that needs the C compiler to
 optimize in order to be efficient.

 Sturla


 
From: mark florisson <markflorisso...@gmail.com>
Date: Sat, 28 Jul 2012 13:06:53 +0100
Local: Sat, Jul 28 2012 8:06 am
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?
On 28 July 2012 00:34, Sturla Molden <sturlamol...@yahoo.no> wrote:

Indeed. You probably will also get much better performance by not
slicing, i.e. pass in the slices as 2D arrays and pass in the i and j
indices for the first dimension into the function, and perform full
indexing (an index for each dimension).
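
The difference between the two call styles can be sketched in plain Python. Built-in `memoryview` objects stand in for Cython typed memoryviews here (an assumption, and the timings are in no way comparable), and both function names are hypothetical.

```python
import array


def dot_rows_sliced(flat, ncols, i, j):
    """Slice out two row views first; each slice is a temporary view object."""
    a = flat[i * ncols:(i + 1) * ncols]
    b = flat[j * ncols:(j + 1) * ncols]
    return sum(x * y for x, y in zip(a, b))


def dot_rows_indexed(flat, ncols, i, j):
    """Pass the whole buffer plus the row indices and index every item fully."""
    total = 0.0
    for k in range(ncols):
        total += flat[i * ncols + k] * flat[j * ncols + k]
    return total


# A 2x3 array stored row by row: [[0, 1, 2], [3, 4, 5]].
flat = memoryview(array.array("d", range(6)))
print(dot_rows_sliced(flat, 3, 0, 1), dot_rows_indexed(flat, 3, 0, 1))
```

Both give the same result; the second form never creates the per-call temporary views.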

 
From: mark florisson <markflorisso...@gmail.com>
Date: Sat, 28 Jul 2012 13:08:39 +0100
Local: Sat, Jul 28 2012 8:08 am
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?
On 28 July 2012 13:06, mark florisson <markflorisso...@gmail.com> wrote:

(Basically slicing often involves function calls as well as
introducing a temporary memoryview slice, which is cleaned up for
every iteration (which happens in a thread-safe manner, using atomics
or locks)).

 
From: Sturla Molden <sturlamol...@yahoo.no>
Date: Sat, 28 Jul 2012 16:40:38 +0200
Local: Sat, Jul 28 2012 10:40 am
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?

> I prepared some quick-and-dirty benchmarks of the behavior I need at
> https://github.com/jakevdp/memview_benchmarks/ -- I'd be interested if
> people more familiar with memory-views could take a look and let me
> know if I'm missing anything there.
>    Jake

I took the liberty of updating your benchmarks (see attachment).  For
example, I noticed that GCC was clever enough to optimize out all the
loops in your pointer_arith.pyx...

Here are the timings I got from the updated version in the attachment. I
think this gives the correct picture:

D:\memview-benchmarks\new>python runme.py
numpy_only: 6.86 sec
cythonized_numpy: 5.74 sec
cythonized_numpy_2: 10.4 sec
cythonized_numpy_2b: 6.25 sec
cythonized_numpy_3: 2.43 sec
cythonized_numpy_4: 1.78 sec
pointer_arith: 1.79 sec
memview: 1.86 sec

There is a table in the attached PDF that should be easier to read.

The overhead in the numpy versions comes from slicing the ndarray. In
comparison, slicing the memoryview has a very small overhead. If we
slice the ndarray in Cython, this is not much better than just using
plain numpy in Python. But if we use memoryviews, slicing is just a
little bit slower than using C-style pointer arithmetic.

And consider this: Numerical code using array slicing in Fortran 90 with
gfortran is often 2x slower than the same code using pointer arithmetic
in C with GCC. At least in my experience (Fortran 77 is another matter.)

If you wonder why using np.dot was faster than writing out the loop in
Cython, that is due to Intel MKL in Enthought.

Conclusion:

Memoryviews are extremely fast, comparable to pointer arithmetic in C.

Now we need a real benchmark, e.g. some linear algebra solver or an FFT.
Something like Scimark perhaps. Cython vs. C vs. Fortran 90.

Sturla

Attachments: memview-benchmarks.zip (5K), benchmark.pdf (232K)

 
From: Sturla Molden <sturlamol...@yahoo.no>
Date: Sat, 28 Jul 2012 17:18:57 +0200
Local: Sat, Jul 28 2012 11:18 am
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?

I found another issue: the memoryview slices were not declared
contiguous. Fixing this reduced the runtime from 1.86 to 1.83 seconds. That
puts the overhead of using memoryview slices at 2.2% compared to raw C
pointer arithmetic (1.79 seconds). The benchmark creates two million
memoryview slices and computes one million dot products, each with vector
lengths of 1000. I am more than willing to accept those 2.2% to avoid those
pesky pointers, but it remains to be seen how memoryviews perform on a more
realistic problem.
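
For reference, the percentage follows directly from the timings reported earlier in the thread (1.79 s for pointer_arith, 1.86 s and then 1.83 s for memview). The helper name below is made up for this worked example.

```python
pointer_arith = 1.79     # seconds, raw C pointer arithmetic
memview_before = 1.86    # seconds, slices not declared contiguous
memview_after = 1.83     # seconds, slices declared contiguous


def overhead_percent(t, baseline):
    """Relative slowdown of t against baseline, in percent."""
    return (t - baseline) / baseline * 100.0


before = overhead_percent(memview_before, pointer_arith)
after = overhead_percent(memview_after, pointer_arith)
print(f"{before:.1f}% -> {after:.1f}%")
```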

Sturla

On 28.07.2012 16:40, Sturla Molden wrote:

Attachment: memview.pyx (< 1K)

 
From: Jake Vanderplas <jake...@gmail.com>
Date: Sat, 28 Jul 2012 08:39:21 -0700
Local: Sat, Jul 28 2012 11:39 am
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?

Sturla,
Thanks for looking at this.  I'm still learning the details of optimizing
memviews - these are very impressive benchmarks!  I've updated my github
repository with your changes:
https://github.com/jakevdp/memview_benchmarks
Thanks
   Jake

On Sat, Jul 28, 2012 at 8:18 AM, Sturla Molden <sturlamol...@yahoo.no> wrote:


 
From: Sturla Molden <sturlamol...@yahoo.no>
Date: Sat, 28 Jul 2012 17:57:26 +0200
Local: Sat, Jul 28 2012 11:57 am
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?

I finally managed to make git/github work...
https://github.com/sturlamolden/memview_benchmarks

You got a pull request. I'm not sure if you already updated your code.

I'm very happy with the speed of memoryviews too, particularly slicing.
The slowness of slicing np.ndarray was the reason I never could use
Cython+NumPy instead of Fortran 95.

I now want to see a more realistic benchmark. I'm not sure if porting
Scimark will be too much work. I want preferably to compare these on a
set of real-world problems:

Python
Python with NumPy
C
C++ using STL
Fortran 77
Fortran 95
Cython with memoryviews
Java (perhaps)
C#.NET (perhaps)
MATLAB (perhaps)

Or perhaps we could use the Debian shootout?

Sturla

On 28.07.2012 17:39, Jake Vanderplas wrote:


 
From: Gerald Dalley <gerald.dal...@gmail.com>
Date: Mon, 30 Jul 2012 07:24:18 -0700 (PDT)
Local: Mon, Jul 30 2012 10:24 am
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?

It seems like there are shortcomings to all the major approaches:

   - arr.data
      - good: intent is clear (but the semantics don't actually match)
      - bad: will break at some point, doesn't handle offsets properly
   - &arr[0]
      - good: intent is mostly clear, follows a common C++ idiom
      - bad: requires completely disabling bounds checking for the entire
      function to handle 0-length arrays
   - PyArray_DATA(arr)
      - good: has proper semantics
      - bad: verbose and unpythonic

Would it make sense to enhance numpy.pxd to provide a new property that
acts like PyArray_DATA(arr) but looks pythonic (we might call it
zeroth_elem, first_elem, or something like that)?


 
From: Bradley Froehle <brad.froe...@gmail.com>
Date: Mon, 30 Jul 2012 08:33:27 -0700 (PDT)
Local: Mon, Jul 30 2012 11:33 am
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?

Can somebody clarify what is meant by the semantics not matching?  I
apologize if this was covered earlier in the thread... I must have missed
it.

Also, in Numpy 1.6, arr.data and PyArray_DATA are equivalent (up to a cast
to the data type):
    #define PyArray_DATA(obj) ((void *)(((PyArrayObject *)(obj))->data))

I think the best approach is for Cython to just translate arr.data into
PyArray_DATA(arr) in the generated C code.


 
From: Gerald Dalley <gerald.dal...@gmail.com>
Date: Mon, 30 Jul 2012 11:53:05 -0400
Local: Mon, Jul 30 2012 11:53 am
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?

On Mon, Jul 30, 2012 at 11:33 AM, Bradley Froehle <brad.froe...@gmail.com> wrote:

> Can somebody clarify what is meant by the semantics not matching?  I
> apologize if this was covered earlier in the thread... I must have missed
> it.

By mismatched semantics, I just meant that apparently arr.data doesn't
always refer to the first element of the array.  Sometimes arr.data !=
&arr[0].

> Also, in Numpy 1.6, arr.data and PyArray_DATA are equivalent (up to a cast
> to the data type):
>     #define PyArray_DATA(obj) ((void *)(((PyArrayObject *)(obj))->data))

> I think the best approach is for Cython to just translate arr.data into
> PyArray_DATA(arr) in the generated C code.

Agreed.

--
--Gerald Dalley
  dall...@ieee.org


 
From: Sturla Molden <sturlamol...@yahoo.no>
Date: Mon, 30 Jul 2012 19:16:10 +0200
Local: Mon, Jul 30 2012 1:16 pm
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?
On 30.07.2012 17:33, Bradley Froehle wrote:

> Also, in Numpy 1.6, arr.data and PyArray_DATA are equivalent (up to a
> cast to the data type):
> #define PyArray_DATA(obj) ((void *)(((PyArrayObject *)(obj))->data))

That is due to change.

> I think the best approach is for Cython to just translate arr.data into
> PyArray_DATA(arr) in the generated C code.

>     It seems like there are shortcomings to all the major approaches:

>       * arr.data
>           o good: intent is clear (but the semantics don't actually match)
>           o bad: will break at some point, doesn't handle offsets properly
>       * &arr[0]
>           o good: intent is mostly clear, follows a common C++ idiom
>           o bad: requires completely disabling bounds checking for the
>             entire function to handle 0-length arrays

No, it does not require bounds checking to be turned off. But it will do
a bounds check if you don't.

>       * PyArray_DATA(arr)
>           o good: has proper semantics
>           o bad: verbose and unpythonic

>     Would it make sense to enhance numpy.pxd to provide a new property
>     that acts like PyArray_DATA(arr) but looks pythonic (we might call
>     it zeroth_elem, first_elem, or something like that)?

In my opinion we should use memoryviews instead, unless compatibility
with Python 2.4 is absolutely required. We are discussing how to use an
older, half-broken (useless for anything but trivial code), and soon
deprecated interface to solve a problem that memoryviews do right.

http://docs.cython.org/src/userguide/memoryviews.html

The most important differences:

o Memoryviews are much faster than np.ndarray and expose all the PEP
3118 attributes. Particularly slicing and using them as function
arguments are faster (like Fortran arrays). You will notice the
difference if your code has array slicing or function calls in it.
np.ndarray is only fast and useful if you can inline everything into one
big function, and you never generate an array slice. So for everything
except trivial use-cases, typed memoryviews are what we should use.

o Cython arrays (cython.view.array) have an attribute .data which will
do what you want. They behave like NumPy arrays except they are much
faster.

o Memoryviews have a shorter syntax.

o NumPy arrays can be converted to memoryviews and Cython arrays
automatically.

o This removes the build-time and run-time dependency on NumPy. And yet we
can use as much of NumPy's Python API as we want.

o np.ndarray might even be deprecated when Cython reaches 1.0, so using it
should be discouraged. I'd even go so far as to suggest that all
instructions on using np.ndarray be removed from the Wiki
and replaced with memoryviews.

o Memoryviews work with all PEP 3118 compliant buffers, including NumPy.
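
As a small illustration of the buffer protocol itself, using Python's built-in `memoryview` rather than Cython's typed memoryviews (they are different things, but both consume PEP 3118 buffers): three unrelated exporters expose the same kind of metadata. The choice of exporters and the `summary` dictionary are assumptions made for this sketch.

```python
import array
import ctypes

exporters = {
    "bytearray": bytearray(b"\x00" * 8),
    "array.array": array.array("d", [1.0, 2.0]),
    "ctypes array": (ctypes.c_int32 * 4)(1, 2, 3, 4),
}

summary = {}
for name, obj in exporters.items():
    view = memoryview(obj)      # works for any object exporting a buffer
    summary[name] = (view.format, view.itemsize, view.shape, view.strides)

for name, info in summary.items():
    print(name, info)
```

A consumer written against the protocol only needs the format, item size, shape and strides; it does not need to know which library produced the buffer.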

Sturla


 
From: Sturla Molden <sturlamol...@yahoo.no>
Date: Mon, 30 Jul 2012 19:22:13 +0200
Local: Mon, Jul 30 2012 1:22 pm
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer to C ?
On 30.07.2012 19:16, Sturla Molden wrote:

>> o good: intent is mostly clear, follows a common C++ idiom
>> o bad: requires completely disabling bounds checking for the
>> entire function to handle 0-length arrays

> No, it does not require bounds checking to be turned off. But it will do
> a bounds check if you don't.

Sorry, let me rephrase this:

Bounds checking will prevent you from passing a dangling data pointer to
C if you have a 0-length array. I.e. you get an informative Python
exception (IndexError) with a traceback instead of a segfault -- which
can be nice I guess.

Sturla


 