Working with NumPy and Cython: Typecasting, `heap` support, Memview

Chintak Sheth

unread,

Jul 18, 2013, 6:59:54 AM7/18/13

to cython...@googlegroups.com

Hi,

I am optimising my Python code using Cython for scikit-image and hence rely heavily on numpy. I have a couple of questions in this regard.

I need to pass a `cnp.float_t[:, ::1]` and `cnp.uint8_t[:, ::1]` memviews as an argument to a function call. As a result I am not able to static type the corresponding parameter in the function definition. If I static type `cnp.float_t[:, ::1]` in the function definition, it gives me a compile error. And if I use `<cnp.float_t[:, ::1]>` in the function definition, I get `Can only create cython.array from pointer or array`, though I am passing a memview. So how can I type cast a memview of a numpy array?
I need a priority queue `heap` structure for the algorithm for which I am using the in-built python `heapq` module. Is there a Cython implementation of this which I could use and which would be faster?
I use `np.logical_xor`, `np.transpose` and `np.where` function calls which are pipelined in my Python implementation, which after profiling turns out has a heavy overhead. Any alternate implementations to this? I perform this on a numpy array and then store them using a memview.
In general, if I want to perform any arithmetic operation on a memview or even indexing, why do I need to use `base` attribute? This doesn't slow things down right? I am not able to understand this concept yet.

I know I have put quite a lot of questions in a single thread, but please help me out here. I checked out the documentation and some tutorials from various SciPy events including the latest by Kurt Smith but wasn't able to get much help on these aspects.

Thanks,

Chintak

[1] https://github.com/chintak/scikit-image/tree/inpaint/skimage/inpaint

Lars Buitinck

unread,

Jul 18, 2013, 7:50:04 AM7/18/13

to cython...@googlegroups.com

2013/7/18 Chintak Sheth <chinta...@gmail.com>:

> I need a priority queue `heap` structure for the algorithm for which I am
> using the in-built python `heapq` module. Is there a Cython implementation
> of this which I could use and which would be faster?

Check the Python source code [1]. The heapq module is actually
implemented in twice, in Python and in C, so you should be able to
either version to work in Cython.

Also the SciPy source code contains some sophisticated (but
specialized) heap data structures written in Cython. They're in the
sparse graphs package.

[1] http://hg.python.org/cpython/

--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

Chris Barker - NOAA Federal

unread,

Jul 18, 2013, 11:38:16 AM7/18/13

to cython...@googlegroups.com

On Jul 18, 2013, at 4:50 AM, Lars Buitinck <L.J.Bu...@uva.nl> wrote:

> 2013/7/18 Chintak Sheth <chinta...@gmail.com>:
>> I need a priority queue `heap` structure for the
>

> Check the Python source code [1]. The heapq module is actually
> implemented in twice, in Python and in C, so you should be able to
> either version to work in Cython.

However, it's still going to be storing python objects. Depending on
your use-case, you may be better off with a staticly typed
implimentation in c or c++. Perhaps the STL or Boost versions.

-Chris

Chris Barker - NOAA Federal

unread,

Jul 18, 2013, 12:11:49 PM7/18/13

to cython...@googlegroups.com

On Jul 18, 2013, at 4:40 AM, Chintak Sheth <chinta...@gmail.com> wrote:

Hi,

I am optimising my Python code using Cython for scikit-image and hence rely heavily on numpy. I have a couple of questions in this regard.
I need to pass a `cnp.float_t[:, ::1]` and `cnp.uint8_t[:, ::1]` memviews as an argument to a function call.

Do you mean it could be either type?

You are going to have to type-dispatch somewhere. If you are committed to numpy anyway, then write the function to take a generic numpy array, then check its type, and dispatch after that.

I use `np.logical_xor`, `np.transpose` and `np.where` function calls which are pipelined in my Python implementation, which after profiling turns out has a heavy overhead. Any alternate implementations to this?

You may need to simply write those as loops in cython.

Chris

Chintak Sheth

unread,

Jul 20, 2013, 9:37:23 AM7/20/13

to cython...@googlegroups.com

Hi,

I tried something fairly simple to get used to `pyximport` and check out the basics of typing with numpy types. I am typing a function which computes fibonacci. Here is the code:

```

cimport numpy as cnp

def fib(cnp.uint8_t n):

cdef cnp.uint8_t a = 1, b = 1

cdef Py_ssize_t i

for i in range(n):

a, b = a+b, a

return a

```

This is the error which I get when I try to `pyximport` in ipython or manually compile the file - http://pastebin.com/nTMGi6wu

I don't understand what I am doing incorrect here. Any suggestions?

Chintak

Yury V. Zaytsev

unread,

Jul 21, 2013, 5:13:57 AM7/21/13

to cython...@googlegroups.com

On Sat, 2013-07-20 at 06:37 -0700, Chintak Sheth wrote:
>
> I don't understand what I am doing incorrect here. Any suggestions?

From your pastebin:

In [3]: import fib
/Users/chintak/.pyxbld/temp.macosx-10.6-intel-2.7/pyrex/fib.c:314:31:
error: numpy/arrayobject.h: No such file or directory
/Users/chintak/.pyxbld/temp.macosx-10.6-intel-2.7/pyrex/fib.c:315:31:
error: numpy/ufuncobject.h: No such file or directory
It's unable to find NumPy header files.

--
Sincerely yours,
Yury V. Zaytsev

Chintak Sheth

unread,

Jul 21, 2013, 5:30:09 AM7/21/13

to cython...@googlegroups.com

Hi Yury,

Oh, I needed to give a setup args,

``import pyximport

pyximport.install(setup_args={"include_dirs": np.get_include()}, reload_support=True)

``

Found an answer on StackOverflow. Just a suggestion, maybe update the docs to this effect? Or is this the case on my system only?

Thanks,

Chintak

Yury V. Zaytsev

unread,

Jul 21, 2013, 5:48:42 AM7/21/13

to cython...@googlegroups.com

On Sun, 2013-07-21 at 15:00 +0530, Chintak Sheth wrote:
>
> Found an answer on StackOverflow. Just a suggestion, maybe update the
> docs to this effect? Or is this the case on my system only?

Feel free to submit a pull request, and Cython developers will handle
it. Personally, I see no reason why an explicit mention of this fact
would be in any way inappropriate for the pyximport docs.

Chintak Sheth

unread,

Jul 21, 2013, 5:57:44 AM7/21/13

to cython...@googlegroups.com

On Sun, Jul 21, 2013 at 3:18 PM, Yury V. Zaytsev <yu...@shurup.com> wrote:

Feel free to submit a pull request, and Cython developers will handle
it. Personally, I see no reason why an explicit mention of this fact
would be in any way inappropriate for the pyximport docs.

Sure will! :)

On another note, is there a way to profile a function specifically and see how much time the function calls or statements take inside of that function?

I am performing Image Inpainting. And in my case, the execution time taken by a function (`inpaint_point`) is subject to the image size although I am processing just 300 pixels in either case, i.e. the number of function calls to `inapint_point` are 300 in both cases. Profiling results. The first case is for an image size of 100X100 pixels and the second case for 512X512 pixels.

Code: https://github.com/chintak/scikit-image/blob/inpaint/skimage/inpaint/_inpaint.pyx

Any suggestions on how I could better write it and any syntactical inefficiencies?

Thanks,

Chintak

Yury V. Zaytsev

unread,

Jul 21, 2013, 6:15:40 AM7/21/13

to cython...@googlegroups.com

On Sun, 2013-07-21 at 15:27 +0530, Chintak Sheth wrote:
>
> Any suggestions on how I could better write it and any syntactical
> inefficiencies?

You could compile it using the --annotate mode, which will create an
HTML file where your Cython code is overlaid with color code that shows
how much generated code your statements are responsible for.

I would look through the results for the inpaint_point and check whether
all loops are properly converted to C loops, C math primitives are used
instead of Python builtins where appropriate, etc.

You can expand yellow sections by clicking on them to see the generated
code. It's not perfectly accurate, but I find it more enjoyable then
reading raw C files.

Chintak Sheth

unread,

Jul 21, 2013, 7:16:55 AM7/21/13

to cython...@googlegroups.com

On Jul 21, 2013 3:45 PM, "Yury V. Zaytsev" <yu...@shurup.com> wrote:

> I would look through the results for the inpaint_point and check whether
> all loops are properly converted to C loops, C math primitives are used
> instead of Python builtins where appropriate, etc.
>

Yes, the loops seem to be a dark shade of yellow. How can I write a C equivalent of
`for i, j in array_view:`

And also the best possible C equivalent for a `list` ? Any inbuilt data type?

Chintak

Yury V. Zaytsev

unread,

Jul 21, 2013, 9:09:57 AM7/21/13

to cython...@googlegroups.com

On Sun, 2013-07-21 at 16:46 +0530, Chintak Sheth wrote:
> Yes, the loops seem to be a dark shade of yellow.

I've just had a closer look at what you are doing in ep_neighbor and
it's a performance killer:

You are gradually filling in a Python list (!) and then make a NumPy
array out of it by copying and re-arranging all the data (!!), just to
use it as a source of tuples in `for i_nb, j_nb in nb_view` (!!!).
A very obvious solution is to get rid of this function altogether,
especially since it's just a couple of lines, and you'll lose function
call overhead as a bonus.

As to your other question, it's not immediately obvious what are the
dimensions of center_ind_view, but, in general, a more efficient way of
iterating memory views is something along these lines:

for i in range(view.shape[0]):
for j in range(view.shape[1]):
view[i, j] = ...

http://docs.cython.org/src/userguide/memoryviews.html

Chintak Sheth

unread,

Jul 22, 2013, 2:20:23 AM7/22/13

to cython...@googlegroups.com

On Sun, Jul 21, 2013 at 6:39 PM, Yury V. Zaytsev <yu...@shurup.com> wrote:

I've just had a closer look at what you are doing in ep_neighbor and
it's a performance killer:

You are gradually filling in a Python list (!) and then make a NumPy
array out of it by copying and re-arranging all the data (!!), just to
use it as a source of tuples in `for i_nb, j_nb in nb_view` (!!!).
A very obvious solution is to get rid of this function altogether,
especially since it's just a couple of lines, and you'll lose function
call overhead as a bonus.

Yes, you are right! Apparently, the bonus is 10 seconds! My earlier implementation in Python was like this but this caused a highly indented code and python seemed to show an increase in complexity. So, a function definition seemed apt. But, yeah totally performance killer! Thanks a lot!

As to your other question, it's not immediately obvious what are the
dimensions of center_ind_view, but, in general, a more efficient way of
iterating memory views is something along these lines:

for i in range(view.shape[0]):
for j in range(view.shape[1]):
view[i, j] = ...

http://docs.cython.org/src/userguide/memoryviews.html

I can't use this approach, since the contents `center_ind_view` are the index values I need. But, yeah this approach is much more efficient:

for x in xrange(center_ind_view.shape[0]):

i_nb = center_ind_view[x, 0]

j_nb = center_ind_view[x, 1]

11.555 seconds -> 0.779 seconds! =) Big thanks, Yury!

Chintak

Yury V. Zaytsev

unread,

Jul 22, 2013, 3:43:29 AM7/22/13

to cython...@googlegroups.com

On Mon, 2013-07-22 at 11:50 +0530, Chintak Sheth wrote:
> My earlier implementation in Python was like this but this caused a
> highly indented code and python seemed to show an increase in
> complexity.

Hint: you can appropriately invert the logic flow in order to avoid
problematic nesting, if it's becoming really difficult to handle.

for x, y in z:
if x < 42:
if y > 7:
do_stuff()

for x in range(y):
if x >= 42 or y <= 7:
continue

do_stuff()

I call it the "Zaytsev inversion pattern", but it's also known under
many other names in the industry, such as the law of De Morgan <g>

> So, a function definition seemed apt.

You could have done a bit better by trying to make it properly
inlineable, and use a different data structure, but the question always
is: how much of maintainability and complexity are you willing to pay
for improved performance?

> I can't use this approach, since the contents `center_ind_view` are
> the index values I need.

It might be possible to further improve the performance by rearranging
`center_ind_view` into vectors, but at this point one needs more
detailed profiling results as opposed to just glancing over the code to
properly guide the optimization efforts.

In particular, your `grad_func` now looks quite bad to me, but as I
said, low hanging fruits are picked, profile before optimizing any
further in order to avoid wasting time on something that is not going to
bring significant improvements.

> for x in xrange(center_ind_view.shape[0]):

Don't use xrange, Cython automatically translates both range and xrange
into C loops when possible.

Reply all

Reply to author

Forward