Re: [cython-users] Re: Buffer dtype mismatch error

Robert Bradshaw

Aug 20, 2014, 2:09:40 AM
to cython...@googlegroups.com, gregorio...@gmail.com, Józsa, Gergely, Krabót, Mátyás, Discussion of Numerical Python
On Tue, Aug 19, 2014 at 8:51 AM, Robert Bradshaw <robe...@gmail.com> wrote:
> Generally, if the problem is due to being on Windows, we're counting
> on someone with knowledge about and access to Windows to answer. In
> other words, Windows users are expected to support other Windows
> users, just as Linux users and Mac users support each other (with the
> added benefit that the latter two groups include developers).
>
> As it looks like you can reproduce this issue on other platforms, I
> can look into it.

On this note, the thread you bumped was titled "Buffer dtype mismatch
error on Windows but not OSX", which made it sound very
Windows-specific.

> On Tue, Aug 19, 2014 at 4:51 AM, Gregorio Bastardo
> <gregorio...@gmail.com> wrote:
>> Hello,
>>
>> We experienced this issue also under OS X, see the updated gist for details.
>>
>>> https://gist.github.com/gregorio-bastardo/f2be00493a5b8c186c08

See below for a smaller example of this error.

The problem seems to be that if the full struct has a size that is a
multiple of 4, then the Py_buffer format string numpy gives drops the
alignment parameter (which is still used to align sub-structs). For
example, when using the code below, numpy specifies a buffer format of

T{T{h:s:b:t:}:a:b:x:}

which, per the struct documentation
(https://docs.python.org/2/library/struct.html#module-struct), says to
use native alignment,

but uncommenting y (adding an extra byte, bringing the total to 5 bytes) we get

T{T{=h:s:b:t:}:a:b:x:b:y:}

which does give us alignment information and our internal struct
format agrees with this numpy format string.

One short-term workaround is to add a dummy byte at the end, but
hopefully we will have a fix out soon.


--------------------------------------
import numpy as np
cimport numpy as np

# Packed C structs mirroring the numpy dtypes defined below.
cdef packed struct A:
    np.int16_t s
    np.int8_t t

a = [
    ('s', np.int16),
    ('t', np.int8),
]

cdef packed struct B:
    A a
    np.int8_t x
    # np.int8_t y

b = [
    ('a', a),
    ('x', np.int8),
    # ('y', np.int8),
]

cdef test_it():
    print(sizeof(B))
    print("try")
    # The buffer dtype mismatch is raised here unless both 'y' lines
    # above are uncommented.
    cdef np.ndarray[B, ndim=1] input = np.zeros(10, dtype=b)
    print("ok")

test_it()
--------------------------------------
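
As a rough, standalone illustration of what the alignment prefix changes (this is not code from the thread): Python's `struct` module can compute the size of a flat format with and without native alignment. The module has no nested `T{...}` syntax, so the sketch below emulates a sub-struct's trailing padding with the zero-repeat code `0h`. That emulation, and the variable names, are assumptions made for illustration only.

```python
import struct

# Native mode (no prefix, same as "@"): members are aligned, and a trailing
# "0h" pads the end of the structure to the alignment of a short.
inner_native = struct.calcsize("hb0h")      # struct A: int16 + int8 + padding
outer_native = struct.calcsize("hb0hb0h")   # struct B: A + int8 + padding

# Standard mode ("="): no alignment and no padding, i.e. a packed layout.
inner_packed = struct.calcsize("=hb")
outer_packed = struct.calcsize("=hbb")

print(inner_native, outer_native)  # sizes under native alignment
print(inner_packed, outer_packed)  # sizes of the packed layout
```

On platforms where a short is 2-byte aligned, the native sizes come out larger than the packed ones, which is the kind of disagreement between format string and packed struct described above.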

Robert Bradshaw

Aug 20, 2014, 3:12:33 AM
to cython...@googlegroups.com, gregorio.bastardo, Józsa, Gergely, Krabót, Mátyás, Discussion of Numerical Python
Actually, this needs to be fixed in numpy, as "native" structs are
padded at the end according to the alignment of their most strictly
aligned member (here, the first one), but numpy structs are packed (by
default). Even creating a dtype with align=True doesn't work, as numpy
doesn't respect this padding (maybe it simply flattens the structs?).
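
To make the trailing padding concrete, here is a small sketch using `ctypes`, which by default lays out structures the way the platform's C compiler would. The class names mirror the example in this thread; the expected numbers assume a platform where a 16-bit integer is 2-byte aligned.

```python
import ctypes


class A(ctypes.Structure):
    # Native layout: int16 + int8, then one byte of trailing padding.
    _fields_ = [("s", ctypes.c_int16), ("t", ctypes.c_int8)]


class B(ctypes.Structure):
    # Native layout: padded A, then int8, then trailing padding again.
    _fields_ = [("a", A), ("x", ctypes.c_int8)]


native_size = ctypes.sizeof(B)

# A packed layout is just the sum of the leaf field sizes.
packed_size = ctypes.sizeof(ctypes.c_int16) + 2 * ctypes.sizeof(ctypes.c_int8)

print(ctypes.sizeof(A), B.x.offset, native_size)
print(packed_size)
```

The gap between `native_size` and `packed_size` is the padding that a consumer assuming native alignment expects but a packed numpy dtype does not contain.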

Stefan Behnel

Aug 20, 2014, 3:21:18 AM
to cython...@googlegroups.com
Robert Bradshaw wrote on 20.08.2014 at 09:12:
> On Tue, Aug 19, 2014 at 11:09 PM, Robert Bradshaw wrote:
>> See below for a smaller example of this error.
>>
>> The problem seems to be that if the full struct has a size that is a
>> multiple of 4, then the Py_buffer format string numpy gives drops the
>> alignment parameter (which is still used to align sub-structs). For
>> example, when using the code below, numpy specifies a buffer format of
>>
>> T{T{h:s:b:t:}:a:b:x:}
>>
>> which, per the struct documentation
>> (https://docs.python.org/2/library/struct.html#module-struct), says to
>> use native alignment,
>>
>> but uncommenting y (adding an extra byte, bringing the total to 5 bytes) we get
>>
>> T{T{=h:s:b:t:}:a:b:x:b:y:}
>>
>> which does give us alignment information and our internal struct
>> format agrees with this numpy format string.
>>
>> One short-term workaround is to add a dummy byte at the end, but
>> hopefully we will have a fix out soon.
>
> Actually, this needs to be fixed in numpy, as "native" structs are
> padded at the end according to the alignment of their most strictly
> aligned member (here, the first one), but numpy structs are packed (by
> default). Even creating a dtype with align=True doesn't work, as numpy
> doesn't respect this padding (maybe it simply flattens the structs?).

Did you try it with the latest version of NumPy? (Just asking since no
versions have been mentioned so far; I have no idea if it changes anything.)

Stefan

Robert Bradshaw

Aug 20, 2014, 11:52:19 AM
to cython...@googlegroups.com, Discussion of Numerical Python
Nope, still broken :(

- Robert

Gregorio Bastardo

Aug 21, 2014, 10:51:06 AM
to cython...@googlegroups.com
Hi Robert,

I admit that posting my first email on the Windows-specific thread was
misleading. I'm happy that it turned out to be a general problem and got
support.

>>> Actually, this needs to be fixed in numpy, as "native" structs are
>>> padded at the end according to the alignment of their most strictly
>>> aligned member (here, the first one), but numpy structs are packed (by
>>> default). Even creating a dtype with align=True doesn't work, as numpy
>>> doesn't respect this padding (maybe it simply flattens the structs?).

Thank you for clarifying the root cause and for the workaround
suggestion. If I understood correctly, numpy does not correctly handle
the alignment of nested structured arrays in this case. What is the
proposed next step? Would you report it on their mailing list?

All the best,
Gregorio

Francesc Alted

Aug 21, 2014, 1:03:34 PM
to cython...@googlegroups.com
Hi Gregorio,

On 21/08/14 at 16:36, Gregorio Bastardo wrote:
My understanding is that there is nothing wrong with the alignment of
nested records in NumPy:

In [230]: dt = np.dtype([('x','f4'), ('c', [('y', 'i2'),('v', 'i4')])])

In [231]: dt.itemsize
Out[231]: 10

In [232]: dt = np.dtype([('x','f4'), ('c', [('y', 'i2'),('v', 'i4')])], align=True)

In [233]: dt.itemsize
Out[233]: 12

In [234]: dt = np.dtype([('x','i2'), ('c', [('y', 'i2'),('v', 'i4')])], align=True)

In [235]: dt.itemsize
Out[235]: 12

Everything looks good to me here.

-- Francesc Alted

Robert Bradshaw

Aug 21, 2014, 1:17:08 PM
to cython...@googlegroups.com
The problem is not with the item size but with the format string. In my
example, np.dtype(b) correctly has an itemsize of 4, but the Py_buffer
format string is "T{T{h:s:b:t:}:a:b:x:}", which, due to the (default)
native alignment, describes a struct of size 6. It should be
"T{T{=h:s:b:t:}:a:b:x:}".

I filed https://github.com/numpy/numpy/issues/4979

Francesc Alted

Aug 21, 2014, 1:29:02 PM
to cython...@googlegroups.com
On 21/08/14 at 19:16, Robert Bradshaw wrote:
Doh, it was with the C API :)

--
Francesc Alted

Robert Bradshaw

Aug 21, 2014, 1:41:47 PM
to cython...@googlegroups.com

Gregorio Bastardo

Aug 21, 2014, 1:55:51 PM
to cython...@googlegroups.com
Thank you again. I hope it will work for arbitrary structure nesting,
as I'll soon be facing a C lib that uses a 3-level nested structure
(with an array of sub-sub-structs) :)

Robert Bradshaw

Aug 21, 2014, 2:36:17 PM
to cython...@googlegroups.com
OK, this is messier than I thought (at least, doing the naive thing
requires more widespread changes than I'd like) :(.

Gregorio Bastardo

Aug 23, 2014, 3:14:09 AM
to cython...@googlegroups.com
> OK, this is messier than I thought (at least, doing the naive thing
> requires more widespread changes than I'd like) :(.

I see that no one is assigned to the issue. Would it be worth notifying
the original author(s) of that numpy module?

Pauli Virtanen

Aug 23, 2014, 4:15:18 PM
to cython...@googlegroups.com
Numpy does not use the assignment feature.

Gregorio Bastardo

Sep 1, 2014, 1:24:04 PM
to cython...@googlegroups.com
> Numpy does not use the assignment feature.

Thank you, Pauli. As far as I can see, the issue is still open.

@Robert: I tested your suggestion ("add a dummy byte at the end"), and
the buffer mismatch error indeed disappears. However, I get a
compile-time error when trying to pass this to an external C library
(which does not expect the dummy byte). Following your example, here's
what I tried:

------------- ext_c_lib.h -----------------
#pragma pack(push, 1)

typedef struct A_t
{
    short int s;
    char t;
} A_t;

typedef struct B_t
{
    A_t a;
    char x;
} B_t;

int proc_c(B_t input);

#pragma pack(pop)

------------- foo.pyx -----------------
cdef extern from "ext_c_lib.h":
    ctypedef struct B_t
    int proc_c(B_t input)

def proc(np.ndarray[B, ndim=1] inp):
    cdef unsigned int cycle = inp.size
    cdef np.ndarray[np.int32_t, ndim=1] res
    for i in range(cycle):
        # error: incompatible type for argument 1 of 'proc_c', expected
        # 'B_t' but argument is of type 'struct __pyx_t_..._B'
        res[i] = proc_c( <B_t> inp[i] )
        # error: incompatible type for argument 1 of 'proc_c', expected
        # 'B_t' but argument is of type 'char'
        res[i] = proc_c( <B_t> inp.data[i] )

Gregorio Bastardo

Sep 2, 2014, 2:30:57 AM
to cython...@googlegroups.com
> ------------- foo.pyx -----------------
> ...
> def proc(np.ndarray[B, ndim=1] inp):
>     cdef unsigned int cycle = inp.size
>     cdef np.ndarray[np.int32_t, ndim=1] res

I made a mistake: the variable "res" is initialized in my environment
(and returned from the function), but the initialization (res =
np.zeros(cycle, dtype=np.int32)) is missing from this example. My
intention is to demonstrate how the buffer is actually used in a loop to
call the external library. Improvement ideas for this use case are also
very welcome.

Gregorio Bastardo

Sep 8, 2014, 7:48:06 AM
to cython...@googlegroups.com
I tried to create a local variable of the struct without the extra byte
and copy each field from the input buffer (except the last dummy), but
it gives the same error when trying to pass it to the external C
function:

def proc(np.ndarray[B, ndim=1] inp):
    ...
    cdef B_wo_dummy inp_tmp

    for i in range(cycle):
        inp_tmp.a = inp.data[i].a
        inp_tmp.x = inp.data[i].x
        # error: incompatible type for argument 1 of 'proc_c', expected
        # 'B_t' but argument is of type 'struct__pyx_t_...foo_B'
        res[i] = proc_c( <B_t> inp_tmp )

Any ideas? Am I missing something fundamental here?

Gregorio Bastardo

Sep 9, 2014, 12:02:39 PM
to cython...@googlegroups.com
Finally I got it running with memoryview and a little hack :)

def proc(B[:] inp):
    ...
    cdef B_wo_dummy inp_tmp
    for i in range(cycle):
        inp_tmp.a = inp.data[i].a
        inp_tmp.x = inp.data[i].x
        # casting to a pointer and dereferencing
        res[i] = proc_c( (<B_t *> &inp_tmp)[0] )

Any ideas why

proc_c( <B_t> inp_tmp )

does not work?

Thanks for all your support,
Gregorio

Gregorio Bastardo

Sep 15, 2014, 5:55:44 AM
to cython...@googlegroups.com
Just for the record, it also works without the temporary variable (directly with the memoryview):

res[i] = proc_c( (<B_t *> &inp[i])[0] )
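
A likely explanation for why the plain `<B_t>` cast fails (not confirmed in this thread): in C, two separately declared struct types are distinct types even when their layouts match, and a cast expression cannot target a struct type, whereas reinterpreting through a pointer is allowed. The sketch below is a rough Python analogue of the pointer cast using `ctypes`. The classes are stand-ins for the Cython struct and the header's `B_t`, and packing is omitted for brevity.

```python
import ctypes


class A(ctypes.Structure):
    _fields_ = [("s", ctypes.c_int16), ("t", ctypes.c_int8)]


class B(ctypes.Structure):
    # Stand-in for the struct declared in the Cython module.
    _fields_ = [("a", A), ("x", ctypes.c_int8)]


class B_t(ctypes.Structure):
    # Stand-in for the struct declared in ext_c_lib.h: same layout, but a
    # different type as far as the type system is concerned.
    _fields_ = [("a", A), ("x", ctypes.c_int8)]


item = B(A(7, 1), 2)

# Reinterpret the same bytes as B_t: take a pointer, cast it, dereference it.
as_b_t = ctypes.cast(ctypes.pointer(item), ctypes.POINTER(B_t))[0]

print(type(as_b_t).__name__, as_b_t.a.s, as_b_t.a.t, as_b_t.x)
```

This only works safely when both types really do share a layout, which in the thread's case depends on both sides being packed the same way.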