Fast python callbacks

Nick Wogan

Feb 19, 2022, 1:02:20 AM
to cython-users
Hello,

My question is about how to implement really fast python callbacks in Cython.

Context:
I've put together a package called NumbaLSODA. It solves ordinary differential equations (ODEs) with the "lsoda" method, and it uses ctypes to wrap a C++ library.

Using lsoda requires a callback function describing the ODEs. In this package I use Numba's cfunc to compile the callback functions.

The benefit of this approach is that all code is compiled and very fast. The drawback is that it isn't very user friendly. Passing data to the callback function is a bit tricky.

I thought I might be able to fix these issues with Cython. I followed the example at this link


and re-wrote the NumbaLSODA wrapper with Cython. The most relevant code is here:

```cython
import numpy as np
cimport numpy as np
from numpy cimport ndarray
np.import_array()

cdef extern from "wrapper.h":
  ctypedef void (*rhs_func)(double t, double *u, double *du, void *data);
 
  struct Data:
    void *function
    void *args
    int neq
 
  void lsoda_wrapper(rhs_func rhs, \
                     int neq, double* u0, void* data, int nt, double* teval, \
                     double* usol, double rtol, double atol, int* success);
 
cdef api void lsoda_cy(object rhs, \
                       int neq, double *u0, tuple args, int nt, double *teval, \
                       double *usol, double rtol, double atol, int *success):
                       
  cdef Data d;
  d.function = <void *>rhs
  d.args = <void *>args
  d.neq = neq
 
  lsoda_wrapper(callback, \
                neq, u0, &d, nt, teval, \
                usol, rtol, atol, success);

cdef void callback(double t, double *u, double *du, void *data):
  # unpack data
  cdef Data *d = <Data *>data
  cdef object f = <object> d.function
  cdef tuple args = <tuple> d.args
  # make numpy array out of u
  cdef double[:] u_view = <double[:d.neq]> u
  cdef ndarray[double, ndim=1] u_ = np.asarray(u_view)
  # call function
  cdef ndarray[double, ndim=1] du_ = f(t, u_, *args)
  # check size
  assert du_.shape[0] == d.neq
  # load output into du
  cdef int i;
  for i in range(d.neq):
    du[i] = du_[i]
```

I then wrap `lsoda_cy` using ctypes. I wrap with ctypes so that the function can be called from within numba functions.

This approach works! But it is about a factor of 10 slower than my pure ctypes implementation in NumbaLSODA. I've done some testing and most of the slowdown has to do with the following 3 lines.

```cython
cdef double[:] u_view = <double[:d.neq]> u
cdef ndarray[double, ndim=1] u_ = np.asarray(u_view) 
cdef ndarray[double, ndim=1] du_ = f(t, u_, *args)
```
 
Wondering how these lines can be optimized.




Stefan Behnel

Feb 19, 2022, 3:53:44 AM
to cython...@googlegroups.com
Hi,

Nick Wogan wrote on 19.02.22 at 01:07:
> My question is about how to implement really fast python callbacks in
> Cython.
>
> *Context:*
> I've put together a package called NumbaLSODA
> <https://github.com/Nicholaswogan/NumbaLSODA>. It uses the "lsoda" method
> to solve ordinary differential equations (ODEs). It uses ctypes to wrap a
> c++ library.

ctypes? Really?

Is that because you are aiming to interface with Numba?

From a quick look, it seems that Numba can introspect ctypes callable
objects and unpack their C function pointer. And the documentation
describes a manual way to get the underlying C functions unpacked from
Cython functions:

https://numba.pydata.org/numba-doc/latest/extending/high-level.html#importing-cython-functions

I guess Cython could also provide a ctypes (and/or cffi?) interface to its
C functions, to enable fast C-level calls from Numba. The necessary type
information is there, but it needs a runtime mapping (or some mix of
compile time and runtime) to reflect the ABI (which ctypes represents). A
bunch of sizeof()s would probably do the trick.

Basically, Cython functions could provide a property or method (like
"myfunction.as_ctypes()") that would construct the ctypes representation
and point directly to the underlying C function. Numba could then look for
that method and just call it to get at the function metadata.
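
To make the idea concrete, here is a minimal sketch of what such a ctypes representation amounts to: a prototype describing the C signature, instantiated from a raw function address. The helper name `as_ctypes` and the prototype name `RHS_PROTO` are hypothetical, and a ctypes callback stands in for a compiled Cython function, since no such Cython feature exists in this thread.

```python
import ctypes

# C signature used in this thread:
#   void rhs(double t, double *u, double *du, void *data)
RHS_PROTO = ctypes.CFUNCTYPE(
    None,
    ctypes.c_double,
    ctypes.POINTER(ctypes.c_double),
    ctypes.POINTER(ctypes.c_double),
    ctypes.c_void_p,
)


def as_ctypes(address):
    """Hypothetical helper: wrap a raw C function address in a ctypes callable."""
    return RHS_PROTO(address)


# Stand-in for a compiled function: ctypes gives this Python callback
# a real C function pointer that can be called through its address.
@RHS_PROTO
def decay(t, u, du, data):
    du[0] = -u[0]


address = ctypes.cast(decay, ctypes.c_void_p).value
rhs = as_ctypes(address)

u = (ctypes.c_double * 1)(2.0)
du = (ctypes.c_double * 1)()
rhs(0.0, u, du, None)  # call through the address-based wrapper
print(du[0])
```

Only the address and the prototype are needed to build the callable, which is why the type information Cython already has would be enough once it is mapped to ctypes types at runtime.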

Supporting memory views is going to be tricky, though, because buffers only
seem to have a low-level representation in ctypes. The actual buffer
protocol doesn't seem to be supported. I only found this:

https://docs.python.org/3/library/ctypes.html#ctypes._CData.from_buffer

Maybe getting 1-D arrays to work might at least be doable with less effort.
And that would already enable a bunch of use cases.
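
As a rough illustration of the 1-D case, the sketch below uses `from_buffer` on a standard-library `array.array` to get a ctypes array that shares memory with the original buffer. The variable names are illustrative only, and this says nothing about how a Cython integration would actually be implemented.

```python
import array
import ctypes

# A writable 1-D buffer of C doubles (array.array supports the buffer protocol).
state = array.array("d", [1.0, 2.0, 3.0])

# from_buffer() shares memory with the source object; no data is copied.
CArray = ctypes.c_double * len(state)
c_state = CArray.from_buffer(state)

# A pointer like this is what a C-level callback would receive.
ptr = ctypes.cast(c_state, ctypes.POINTER(ctypes.c_double))
ptr[1] = 20.0  # written through the pointer, visible in `state`

print(state.tolist())
```

Note that the shape and item type have to be supplied by hand here (`c_double * len(state)`), which is the "low-level representation" limitation mentioned above.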

I guess most of this could actually be implemented in generated Cython
code. That would make the ctypes object creation easy.

I created https://github.com/cython/cython/issues/4649

> I then wrap `lsoda_cy` using ctypes. I wrap with ctypes so that the
> function can be called from within numba functions.
>
> This approach works! But it is about a factor of 10 slower than my pure
> ctypes implementation in NumbaLSODA. I've done some testing and most of the
> slowdown has to do with the following 3 lines.
>
> ```
> cdef double[:] u_view = <double[:d.neq]> u
> cdef ndarray[double, ndim=1] u_ = np.asarray(u_view)
> cdef ndarray[double, ndim=1] du_ = f(t, u_, *args)
> ```

Numpy also has a C-API function for creating an ndarray view from a C data
pointer directly.

https://numpy.org/devdocs/reference/c-api/array.html#c.PyArray_SimpleNewFromData

You could also consider not returning a new array but letting the user
function work in place. And don't use the ndarray syntax, use a memory view
there as well.
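
The in-place convention can be sketched in plain Python with ctypes, assuming the user function takes the form `f(t, u, du)` and fills `du` itself. The names `make_inplace_callback` and `oscillator` are hypothetical. In Cython the same idea would cast both pointers to memory views (`<double[:d.neq]> u` and `<double[:d.neq]> du`) and pass them to the user function.

```python
import ctypes

RHS = ctypes.CFUNCTYPE(
    None,
    ctypes.c_double,
    ctypes.POINTER(ctypes.c_double),
    ctypes.POINTER(ctypes.c_double),
    ctypes.c_void_p,
)


def make_inplace_callback(f, neq):
    """Wrap f(t, u, du) so that it fills `du` in place instead of returning an array."""
    array_type = ctypes.c_double * neq

    def trampoline(t, u, du, data):
        # Reinterpret the raw pointers as fixed-size arrays; no copy is made.
        u_arr = array_type.from_address(ctypes.addressof(u.contents))
        du_arr = array_type.from_address(ctypes.addressof(du.contents))
        f(t, u_arr, du_arr)

    return RHS(trampoline)


def oscillator(t, u, du):
    # The user function writes its result straight into the solver's buffer.
    du[0] = u[1]
    du[1] = -u[0]


callback = make_inplace_callback(oscillator, 2)

u = (ctypes.c_double * 2)(1.0, 0.0)
du = (ctypes.c_double * 2)()
callback(0.5, u, du, None)
print(list(du))
```

With this convention the wrapper no longer allocates a result array or copies it element by element into `du` on every call.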

Stefan