I have a 2d numpy array, which I would like to pass to a C function
which expects a double*.
According to
http://mail.scipy.org/pipermail/numpy-discussion/2008-September/037675.html
this requires a dirty hack:
In your_header.h:
#define GET_BUFPTR_HACK(x) __pyx_bstruct_##x.buf
In .pyx code:
cdef extern from "your_header.h":
    void* GET_BUFPTR_HACK(x)

cdef double *data
data = GET_BUFPTR_HACK(buf)
But then, this mail is from 2008. Is there a better way now?
Thanks,
-Nikolaus
--
»Time flies like an arrow, fruit flies like a Banana.«
PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C
indeed there is -- this looks like a reasonable example:
http://stackoverflow.com/questions/3046305/simple-wrapping-of-c-code-with-cython
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
cdef extern void cfoobar(double*)
import numpy as np
cimport numpy as np
def foobar(object array):
    # declare a 2d NumPy array in C order
    np.ndarray[double, ndim=2, mode="c"] x
    # unbox NumPy array into local variable x
    # make sure we have a contiguous array in C order
    # this might produce a temporary copy
    x = np.ascontiguousarray(array, dtype=np.float64)
    # call C function with the address of x[0,0],
    # that is &x[0,0]
    cfoobar(&x[0,0])
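For readers without a compiled C routine at hand, the same pattern can be mimicked from plain Python via NumPy's `.ctypes` attribute (a sketch; `as_double_ptr` is a hypothetical helper name, not part of any API):

```python
import ctypes
import numpy as np

def as_double_ptr(array):
    """Hypothetical helper: C-contiguous float64 view plus a double* into it."""
    x = np.ascontiguousarray(array, dtype=np.float64)
    ptr = x.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
    return x, ptr

a = np.arange(6.0).reshape(2, 3)
x, ptr = as_double_ptr(a)
print(ptr[5])      # flat index 5 is x[1, 2] -> 5.0
ptr[0] = -1.0      # writing through the pointer is visible in the array
print(x[0, 0])     # -1.0
```

Keep `x` alive for as long as the pointer is in use; the pointer does not own the buffer.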
The new memoryview syntax could also be used.
Sturla Molden
> cdef extern void cfoobar(double*)
>
> import numpy as np
> cimport numpy as np
>
> def foobar(object array):
>
>     # declare a 2d NumPy array in C order
>
>     np.ndarray[double, ndim=2, mode="c"] x
Sorry that should be
cdef np.ndarray[double, ndim=2, mode="c"] x
:)
Sturla
Hmm. This requires an explicit cast and probably shows unexpected
behaviour for non-contiguous arrays. It seems to me that Sturla's code is
the better choice.
Best,
It always is :P
:)
Sturla
Henry
Yeah, but I think when you're wrapping an existing C function then it
typically only expects a pointer and the dimensions, so it won't be able
to deal with non-contiguous arrays.
Dag Sverre
> Presumably the &x[0,0] syntax has an implicit casting.
No casting, the NumPy PyArrayObject struct is unboxed into a set of
local variables. That is why we declare:
cdef np.ndarray[double, ndim=2, mode="c"] x
One of the local variables generated by this declaration is a double*.
Now Cython will make sure that x[i,j] is a dereferenced pointer to the
i,j element of x. If you take the address of dereferenced pointer, the C
compiler knows what to do: &*ptr becomes ptr. Thus in Cython, &x[i,j]
will be the address of element i,j in x.
The exact pointer arithmetics used to compute the pointer to element i,j
depends on the other hints you provide in the declaration, as well as
the value of the @cython.wraparound decorator for the function
(cython.boundscheck also has an effect). For maximum performance, i.e.
comparable to C or Fortran, use contiguous arrays (mode="c" or
mode="fortran"), @cython.wraparound(False) and @cython.boundscheck(False).
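The pointer arithmetic described above can be verified from plain Python, since the strides are exposed on the array object (a quick sketch):

```python
import numpy as np

# address of element (i, j) = base address + i*strides[0] + j*strides[1]
a = np.arange(12.0).reshape(3, 4)
i, j = 1, 2
addr = a.ctypes.data + i * a.strides[0] + j * a.strides[1]
# the slice starting at (i, j) begins at exactly that address
print(addr == a[i:, j:].ctypes.data)   # True
```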
> Is it inferred
> from the array dtype? How does it handle arrays that don't have a
> clear mapping (for example complex256)?
It is inferred from the cdef declaration of the ndarray. If you assign a
NumPy array to it that doesn't match the declaration you will get an exception.
complex256 maps to
ctypedef long double complex complex256_t
cdef np.ndarray[complex256_t, ndim=2, mode="c"] x
Of course you will need a system that actually has a 16-byte (128-bit)
long double. A simple test is to declare a variable of type npy_complex256:
# compile will fail if npy_complex256 is not
# defined in NumPy headers for the platform
cdef np.npy_complex256 dummy
Cython will use a typedef for complex numbers in C89, _Complex in C99 or
std::complex<> in C++. That is, you can do arithmetics with complex
numbers in Cython, similar to NumPy and Python (that is, unlike C89, but
similar to C99, C++, and Fortran).
Speaking of which: I find this in numpy.pxd:
ctypedef struct npy_complex256:
    double real
    double imag
This is a bug, yes? (Should be long double I'd think...)
Sturla
> > Is it inferred
>> from the array dtype? How does it handle arrays that don't have a
>> clear mapping (for example complex256)?
>
> It is inferred from the cdef declaration of the ndarray. If you assign a
> NumPy array to it that doesn't match the declaration you will get an
> exception.
Which is to say (in the example I posted):
np.ascontiguousarray(array, dtype=np.float64)
will raise an exception if a NumPy C contiguous array cannot be
constructed from the argument array.
And then
x = np.ascontiguousarray(array, dtype=np.float64)
will raise an exception if x is assigned an array with the incorrect
number of dimensions (we already know it is contiguous and of dtype C
double, as the declaration of x mandates).
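The conversion check can be exercised from plain Python (a sketch; the dimension check itself happens at the Cython assignment and cannot be shown without compiling):

```python
import numpy as np

# conversion failure: no float64 array can be constructed from the input
try:
    np.ascontiguousarray(["a", "b"], dtype=np.float64)
    raised = False
except ValueError:
    raised = True
print(raised)   # True
```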
No coding is needed for these type checks; they are taken care of by NumPy
and Cython. (Just make sure the function can propagate Python
exceptions, which means special return values for Cython cdef functions.
And of course you need to own the GIL.)
Sturla
Try again:
In [3]: np.ascontiguousarray(np.zeros((3,3)).T)
Out[3]:
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
So, if you modify the array, you should typically do
a = np.ascontiguousarray(input, ...)
...
if a is not input:
    input[...] = a
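The copy-versus-no-copy behaviour that makes this write-back pattern necessary can be checked directly (a sketch):

```python
import numpy as np

a = np.zeros((3, 3))                  # already C contiguous
print(np.ascontiguousarray(a) is a)   # True: no copy is made
b = np.ascontiguousarray(a.T)         # Fortran-ordered input
print(b is a)                         # False: a copy was made
print(b.flags['C_CONTIGUOUS'])        # True
```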
Dag
> Yeah, but I think when you're wrapping an existing C function then it
> typically only expects a pointer and the dimensions, so it won't be able
> to deal with non-contiguous arrays.
Many C functions take array bounds as a C int or a C long. The shape of
a NumPy array is an array of npy_intp, which can be larger than an int
or a long. Particularly on 64-bit systems there is a danger of integer
overflow (int is "at least" 16 bits, long is "at least" 32 bits).
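The size mismatch is easy to see from plain Python, where a shape entry has the width of npy_intp (a sketch):

```python
import numpy as np

# a NumPy shape entry is an npy_intp (pointer-sized), while a C int is
# typically 32 bits; on a 64-bit system a shape can exceed INT_MAX
c_int_max = 2**31 - 1
print(np.iinfo(np.intp).max)   # e.g. 2**63 - 1 on a 64-bit system
assert np.iinfo(np.intp).max >= c_int_max
```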
cdef extern void cfoobar(double*, int, int) nogil

import numpy as np
cimport numpy as np
from libc.limits cimport INT_MAX
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def foobar(object array):
    # declare a 2d NumPy array in C order
    cdef np.ndarray[double, ndim=2, mode="c"] x
    cdef int m, n
    # unbox NumPy array into local variable x
    # make sure we have a contiguous array in C order
    # this might produce a temporary copy
    x = np.ascontiguousarray(array, dtype=np.float64)
    # make sure the shape of x does not overflow a C int
    if x.shape[0] <= <Py_ssize_t> INT_MAX:
        m = x.shape[0]
    else:
        raise ValueError('integer overflow')
    if x.shape[1] <= <Py_ssize_t> INT_MAX:
        n = x.shape[1]
    else:
        raise ValueError('integer overflow')
    # call C function with the address of x[0,0],
    # that is &x[0,0]
    # release the GIL while the C function executes
    with nogil:
        cfoobar(&x[0,0], m, n)
Sturla
Presumably the bounds checking is only applicable in the indexing
context.
I pass my (possibly non-contiguous) array into a library function and it
doesn't care how it lives in memory. Clearly Cython has no way of
knowing how a library function is accessing the memory.
>
> > Is it inferred
> > from the array dtype? How does it handle arrays that don't have a
> > clear mapping (for example complex256)?
>
> It is inferred from the cdef declaration of the ndarray. If you assign a
> NumPy array to it that doesn't match the declaration you will get an
> exception.
Ah, so you *are* doing an implicit cast when you cdef the ndarray. The
point is, there's no way of just getting a pointer of the correct type
based on the dtype of the array.
Consider something like this toy example (which obviously doesn't
work!):
cdef double_function(double *array):
    pass

cdef single_function(float *array):
    pass

cdef call_a_function(np.ndarray array):
    if array.dtype == "float32":
        single_function(&array[0])
    elif array.dtype == "float64":
        double_function(&array[0])
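For comparison, the same runtime dispatch is straightforward at the Python level, where no pointers are involved (a sketch; the function names mirror the toy example above and are not a real API):

```python
import numpy as np

def single_function(x):      # stand-in for the float* routine
    return float(x.sum())

def double_function(x):      # stand-in for the double* routine
    return float(x.sum())

_dispatch = {
    np.dtype(np.float32): single_function,
    np.dtype(np.float64): double_function,
}

def call_a_function(array):
    func = _dispatch.get(array.dtype)
    if func is None:
        raise TypeError("unsupported dtype: %s" % array.dtype)
    return func(array)

print(call_a_function(np.ones(3, dtype=np.float32)))   # 3.0
```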
Thanks,
Henry
> In [3]: np.ascontiguousarray(np.zeros((3,3)).T)
> Out[3]:
> array([[ 0.,  0.,  0.],
>        [ 0.,  0.,  0.],
>        [ 0.,  0.,  0.]])
Yes, here you get a copy, as a Fortran ordered array is passed to
np.ascontiguousarray
> So, if you modify the array, you should typically do
>
> a = np.ascontiguousarray(input, ...)
>
> ...
>
> if a is not input:
>     input[...] = a
No, input might not be a NumPy array at all, just something from which a
NumPy array can be constructed.
The safest option is to take a copy (if needed) and pass it out.
Sturla
cdef extern void cfoobar(double*, int, int) nogil

import numpy as np
cimport numpy as np
from libc.limits cimport INT_MAX
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def foobar(object array, allow_overwrite=True):
    # declare a 2d NumPy array in C order
    cdef np.ndarray[double, ndim=2, mode="c"] x
    cdef int m, n
    # unbox NumPy array into local variable x
    # make sure we have a contiguous array in C order
    # this might produce a temporary copy
    x = np.ascontiguousarray(array, dtype=np.float64)
    # make sure the shape of x does not overflow a C int
    if x.shape[0] <= <Py_ssize_t> INT_MAX:
        m = x.shape[0]
    else:
        raise ValueError('integer overflow')
    if x.shape[1] <= <Py_ssize_t> INT_MAX:
        n = x.shape[1]
    else:
        raise ValueError('integer overflow')
    # make a temporary copy if x was unchanged
    # by np.ascontiguousarray and overwrite is
    # not allowed
    if not allow_overwrite and (x is array):
        x = x.copy()
    # call C function with the address of x[0,0],
    # that is &x[0,0]
    # release the GIL while the C function executes
    with nogil:
        cfoobar(&x[0,0], m, n)
    return x
I don't think we understand quite what you mean by "implicit cast". The
declaration of the type is rather *explicit*, and it is validated at
run-time through checking the NumPy dtype.
Your whole sentence sounds like a contradiction in terms to me, since
"dtype" is inherently run-time, whereas a pointer type is inherently a
compile-time thing. It is only present as a safeguard while writing and
compiling code (in a sense, the C compiler checks that what Cython has
generated is sane), since to the machine hardware pointers have no
"type"; they are just memory addresses.
I'm sure you know this though...
>
> Consider something like this toy example (which obviously doesn't
> work!):
>
> cdef double_function(double *array):
>     pass
>
> cdef single_function(float *array):
>     pass
>
> cdef call_a_function(np.ndarray array):
>     if array.dtype == "float32":
>         single_function(&array[0])
>     elif array.dtype == "float64":
>         double_function(&array[0])
Yep, this doesn't work. But <float*>np.PyArray_DATA(array) and
<double*>np.PyArray_DATA(array), respectively, do.
Dag
is that preferred to <float *>array.data?
Cheers,
Henry
Sorry, I have a separation in my head of C - Python, and if it looks
like Python, then it must be doing some behind-the-scenes C (hence the
"implicit" bit). But yes, I realise the type is explicitly set.
Just a peculiarity of my mental model!
>
> Your whole sentence sounds like a contradiction in terms to me, since
> "dtype" is inherently run-time, and whenever you deal with a pointer
> type it is inherently a compile-time thing (and only present as a
> safeguard during writing and compiling code (in a sense, the C
> compiler
> checks that what Cython has generated is sane..), since to the
> machine
> hardware pointers don't have "type", they are just memory addresses).
Yeah sure, I just wanted it to be made clear that there is no run time
funkiness going on - it really is statically typed (this was so I am
happy that my efforts in creating some run time funkiness were not for
no reason).
Cheers,
Henry
> Is this wisdom still valid?:
> http://mail.scipy.org/pipermail/scipy-dev/2008-March/008562.html
Yes.
> On my system, numpy.clongdouble is *defined* as numpy.complex256, so
> it's sufficient to work entirely in terms of numpy.clongdouble and then
> get complex256 (and presumably complex192?) for free. Comparisons of
> complex256 to clongdouble return true. (of course all this holds true
> for numpy.longdouble and numpy.float128)
numpy.longdouble is the dtype that corresponds to long double on your
system. But be advised that there is no guarantee for what "long
double" might be. It could be 64, 80 or 128 bits, depending on your
system. On my computer (Windows 64) long double is 64 bits, as long
double is just an ordinary double with Microsoft Visual C++. If I need
to call a Fortran program that actually uses a 10- or 16-byte real, I
cannot rely on long double in C. And as there is no corresponding C
type, it means I cannot make a NumPy array to use with Fortran either.
(Well, we can always use an array of np.uint8 to store the bytes.)
Sturla
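What "long double" means locally can be queried at runtime (a sketch):

```python
import numpy as np

# how wide long double actually is on this platform:
# 8 bytes (MSVC, where it is plain double), 12 or 16 bytes
# (x86 80-bit extended with padding), or 16 bytes (true quad)
size = np.dtype(np.longdouble).itemsize
print(size)
```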
Yes, NumPy is deprecating the latter in newer versions.
Dag
ah yes, but it's only important (at least for the code I've written so
far) that the type of the array in numpy is the same type as that
expected by the library I am calling, which in this case is long double.
If the platform definition of long double changes, then it also changes
for the library. That's why numpy.longdouble is the more useful dtype to
check for - please correct me if I am wrong in my analysis.
Thanks,
Henry
It will compile to the same code in C (when the PyArray_DATA is
expanded). If the NumPy PyArrayObject is changed, albeit unlikely, the
macro PyArray_DATA prevents your C code from breaking. Don't worry about
this when you use Cython (it's only important when you code for C or
Pyrex and use the NumPy C API directly). Which, by the way, I'd strongly
advise against; use Cython, ctypes or f2py instead.
Sturla
Is that only for the C api? Presumably array.data is still the correct
way to access the buffer of an array? (forgive me if I have a conceptual
error).
Thanks,
Henry
> ah yes, but it's only important (at least for the code I've written so
> far) that the type of the array in numpy is the same type as that
> expected by the library I am calling, which in this case is long double.
> If the platform definition of long double changes, then it also changes
> for the library. That's why numpy.longdouble is the more useful dtype to
> check for - please correct me if I am wrong in my analysis.
If the library and NumPy were compiled with the same compiler, and the
library expects long double, then numpy.longdouble is the correct dtype.
But if different compilers were used for the library and for NumPy, then
this might not be the case.
Sturla
Is there a way of checking this?
Henry
Already drunk the kool-aid ;)
Read Lisandro's recent thread. It's only a deprecation though.
Dag
> Is that only for the C api? Presumably array.data is still the correct
> way to access the buffer of an array? (forgive me if I have a conceptual
> error).
array.data is a field in the current NumPy ndarray struct (PyArrayObject).
A NumPy array looks like a NumPy array in Cython, don't worry about the
C details.
To access the buffer in Cython, take the address of the first element,
i.e. &array[0].
To access the buffer in C, use the NumPy C API macros.
Sturla
When you use "cdef np.ndarray[...] arr", and you access arr, you're
really using the C API. Cython doesn't do anything special; it simply
allows raw C access to the C struct representing the Python object.
It's *only* the indexing operation which has magic Cython support.
If (or probably when) NumPy removes direct access to ".data" from the C
API struct, Cython won't emulate it. So you should use the macro if you
want to be future-proof.
See recent thread by Lisandro Dalcin.
Dag Sverre
Is this wisdom still valid?:
http://mail.scipy.org/pipermail/scipy-dev/2008-March/008562.html
On my system, numpy.clongdouble is *defined* as numpy.complex256, so
it's sufficient to work entirely in terms of numpy.clongdouble and then
get complex256 (and presumably complex192?) for free. Comparisons of
complex256 to clongdouble return true. (of course all this holds true
for numpy.longdouble and numpy.float128)
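That relationship can be checked at runtime (a sketch; the alias check is guarded because the sized names are platform-dependent and only coincide with clongdouble when long double is wider than double):

```python
import numpy as np

ld = np.dtype(np.longdouble)
cld = np.dtype(np.clongdouble)
# a complex long double is exactly two long doubles
print(cld.itemsize == 2 * ld.itemsize)   # True
# the sized name (complex192, complex256, ...), when it exists and
# long double is a distinct type, is just an alias for clongdouble
sized = 'complex%d' % (cld.itemsize * 8)
if ld.itemsize > 8 and hasattr(np, sized):
    print(getattr(np, sized) is np.clongdouble)
```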
Cheers,
Henry
He wants to do runtime dispatches; casting the untyped data-pointer from
PyArray_DATA is probably the most painless way. Yes, one could create
multiple Cython functions, one for each type, or just have many "cdef
np.ndarray[T]" declarations, one for each type.
Dag Sverre
no, I meant at runtime.
Cheers,
Henry
> He wants to do runtime dispatches; casting the untyped data-pointer from
> PyArray_DATA is probably the most painless way. Yes, one could create
> multiple Cython functions, one for each type, or just have many "cdef
> np.ndarray[T]" declarations, one for each type.
Another option is just to view the array as raw bytes, then he just
needs to declare for one dtype:
def foobar(object array):
    cdef np.ndarray[np.uint8_t] x
    cdef void *buffer
    x = array.view(dtype=np.uint8)
    buffer = <void*> &x[0]
Then dispatch on array.dtype...
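The byte-view trick can be tried from plain Python as well: the view shares the buffer, so writes through it are visible in the original array (a sketch):

```python
import numpy as np

a = np.arange(1.0, 5.0)        # four float64 values, 32 bytes
x = a.view(np.uint8)           # same buffer, reinterpreted as raw bytes
print(x.nbytes == a.nbytes)    # True
print(x.itemsize)              # 1
x[:8] = 0                      # zero the first 8 bytes...
print(a[0])                    # ...which zeroes the first double: 0.0
```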
Sturla
If you get unexpected results or the program segfaults, the data types
don't match.
Sturla
:) Unittests then!
Thanks,
Henry
> Is there a way of checking this?
Yes, read the documentation for the compilers.
Sturla