looking for a cython solution for pure python code in two modules

Marko Loparic

unread,

Feb 7, 2016, 5:03:13 AM2/7/16

to cython...@googlegroups.com

Hello,

I am translating a C code into cython and struggling trying to find a way to obtain the same performance when adding some modularity. The good news is that -- already in the C code -- 99% of the time is spent in a single loop. Simplifying it and writing without thinking about cython we get to

import numpy as np
class Problem:
    def __init__(self, n, m):
        self.n = n
        self.m = m
        self.price = np.linspace(1, 2, n)
        self.amount = np.linspace(3, 4, m)
    def value(self, i, j):

        return self.price[i] * self.amount(j)
def compute(problem):
    value = 0
    for i in range(problem.n):
        for j in range(problem.m):
            value += problem.value(i, j)
    return value
p = Problem(5000, 5000)
print 'value =', compute(p)

The actual value routine is bigger but is restricted to the manipulation of integers and arrays, like the one shown here. The compute algorithm is also more complex, but also consists on integer and array manipulation plus calls to routines of the problem class/module passing integer arguments.

My goals are (from the most important to the least important):

1. Achieve C performance (ie. "nogil code", white cython html) for the loop. If possible, the value method/function should be inlined (so to avoid a C function call).

2. Separate the value function/method from the compute code in two different modules. The compute code ---a library, with a generic algorithm -- is to be used by different problems, each one with a different value routine. I know that in order to have my first goal met the link between the two modules has to be done in compilation time. The 'problem' object does not need to be a cython class. It can also be a cython module with functions.

3. Have these two modules in a cython pure python source code, so as to be able to run the code without cython (and without having two different versions of the same code).

In our tests so far we have only been able to achieve goal 1 using C structures (instead of numpy annotated arrays). This forbids (as far as I understand) the usage of pure python mode.

Do you have something to suggest?

Thanks a lot!

Marko

Oscar Benjamin

unread,

Feb 9, 2016, 7:49:55 PM2/9/16

to cython-users

On 7 February 2016 at 10:02, Marko Loparic <marko....@gmail.com> wrote:
> 1. Achieve C performance (ie. "nogil code", white cython html) for the loop.
> If possible, the value method/function should be inlined (so to avoid a C
> function call).
>
> 2. Separate the value function/method from the compute code in two different
> modules. The compute code ---a library, with a generic algorithm -- is to be
> used by different problems, each one with a different value routine. I know
> that in order to have my first goal met the link between the two modules has
> to be done in compilation time. The 'problem' object does not need to be a
> cython class. It can also be a cython module with functions.
>
> 3. Have these two modules in a cython pure python source code, so as to be
> able to run the code without cython (and without having two different
> versions of the same code).

A while ago I spent some time profiling different ways to achieve goal
2. I found that the most efficient method to achieve the redirection
is by subclassing and overriding a cdef method. This way the dispatch
is handled by a vtable (like in C++).

I have an example of that here:
https://github.com/oscarbenjamin/euler/blob/master/euler_10.pyx
https://github.com/oscarbenjamin/euler/blob/master/README.rst

I'm not sure if it's possible to combine this with pure Python mode
(goal 3) but I don't see any reason why it wouldn't work.

Whether or not you can meet goal 1 depends on the detail of you
problem. The importance of goal 2 does as well: if all you're overhead
is in value() then the expense of compute and the compute to value()
connection may be irrelevant.

--
Oscar

Marko Loparic

unread,

Feb 11, 2016, 4:20:50 AM2/11/16

to cython...@googlegroups.com

On Wed, Feb 10, 2016 at 1:49 AM, Oscar Benjamin
<oscar.j....@gmail.com> wrote:
> A while ago I spent some time profiling different ways to achieve goal
> 2. I found that the most efficient method to achieve the redirection
> is by subclassing and overriding a cdef method. This way the dispatch
> is handled by a vtable (like in C++).
>
> I have an example of that here:
> https://github.com/oscarbenjamin/euler/blob/master/euler_10.pyx
> https://github.com/oscarbenjamin/euler/blob/master/README.rst

Wonderful! Thanks for sharing this precious analysis!

For anyone reading this and who has not yet read Oscar's code here is
my advice: if you -- like me -- 1. use cython but are not always
certain of what it does and 2. are tired of using something important
for your work in a trial-and-error-stop-when-is-good-enough mode, I
can't think of a better way to advance than reading Oscar 20
experiments. It is trivial to run them and diffing experiment by
experiment is very enlightening. The complete code of each experiment
is small and so each diff fits in a single page.

I am still doing the exercise myself and still puzzled by some
differences in performance, e.g. between euler_04 and euler_07.

If I get good results for my code (eg. using the approach in
euler_19), I come back to this forum to report.

Thanks,
Marko

Oscar Benjamin

unread,

Feb 11, 2016, 9:54:15 AM2/11/16

to cython-users

On 11 February 2016 at 09:20, Marko Loparic <marko....@gmail.com> wrote:
>
> I am still doing the exercise myself and still puzzled by some
> differences in performance, e.g. between euler_04 and euler_07.

Using numpy arrays is a lot slower than working with raw pointers to C
arrays. Maybe it would be faster with memoryviews but it's hard to
beat plain C arrays. This may not be relevant for your purposes where
you just want to return an individual number.

--
Oscar

Chris Barker

unread,

Feb 11, 2016, 7:02:34 PM2/11/16

to cython-users

Oscar -- great work and write up!

I just looked over it quickly, but I had a couple thoughts:

1) did you try array.arrays? I'm still fuzzy on exactly what Cython supports, but they may be a way to get faster indexing for 1-d arrays.

2) Using numpy arrays is a lot slower than working with raw pointers to C arrays.

well, if it's indexing you are talking about, you can grab the pointer from the numpy array and use that:

<double *> the_array[0]

or maybe:

<double *> the_array.data

(Not sure if that's deprecated now)

That might be a way to use ndarrays to pass around, but use pointer indexing in the code. Though I'm still not sure it should be that slow -- if you specify the dtype and rank of the ndarray, and use unsigned integers for indexing, cython can be very fast with numpy arrays.

Maybe it would be faster with memoryviews but it's hard to
beat plain C arrays.

you're not going to beat plain C arrays -- but should be able to match them.

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115   (206) 526-6317   main reception

Chris....@noaa.gov

Reply all

Reply to author

Forward