[Numpy-discussion] Fixing PyArray_Descr flags member size, ABI vs pickling issue

David Cournapeau

unread,

Mar 5, 2012, 10:53:05 PM3/5/12

to Discussion of Numerical Python

Hi,

This is following the discussion on bug
http://projects.scipy.org/numpy/ticket/2017

Essentially, there is a discrepency between the actual type of the
flags member in the dtype C structure (char) and how it is declared in
the descriptor table (T_INT). The problem is that we are damned if we
fix it, damned if we are not:
- fixing T_INT to T_BYTE. flag in python is now fixed, but it breaks
pickled numpy arrays
- not fixing it means that the value is broken at the python level.
Changing the flag to be an actual int would break the ABI.

I currently have this: https://github.com/numpy/numpy/pull/228

Which one is the more appropriate ? Maybe a deprecation warning for
1.7 and change it in 1.8 ?

Note that we can fix the hash issue with either solution,

David
_______________________________________________
NumPy-Discussion mailing list
NumPy-Di...@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Robert Kern

unread,

Mar 6, 2012, 6:20:28 AM3/6/12

to Discussion of Numerical Python

On Tue, Mar 6, 2012 at 03:53, David Cournapeau <cour...@gmail.com> wrote:
> Hi,
>
> This is following the discussion on bug
> http://projects.scipy.org/numpy/ticket/2017
>
> Essentially, there is a discrepency between the actual type of the
> flags member in the dtype C structure (char) and how it is declared in
> the descriptor table (T_INT). The problem is that we are damned if we
> fix it, damned if we are not:
> - fixing T_INT to T_BYTE. flag in python is now fixed, but it breaks
> pickled numpy arrays

Is the problem that T_BYTE returns a single-item string? My
handwrapping skills are rusty (Cython has blissfully eradicated this
from my memory), but aren't these T_INT/T_BYTE things just convenient
shortcuts for exposing C struct members as Python attributes? Couldn't
we just write a full getter function for returning the correct value,
just returned as a Python int instead of a str.

--
Robert Kern

David Cournapeau

unread,

Mar 6, 2012, 11:07:31 AM3/6/12

to Discussion of Numerical Python

On Tue, Mar 6, 2012 at 6:20 AM, Robert Kern <rober...@gmail.com> wrote:
> On Tue, Mar 6, 2012 at 03:53, David Cournapeau <cour...@gmail.com> wrote:
>> Hi,
>>
>> This is following the discussion on bug
>> http://projects.scipy.org/numpy/ticket/2017
>>
>> Essentially, there is a discrepency between the actual type of the
>> flags member in the dtype C structure (char) and how it is declared in
>> the descriptor table (T_INT). The problem is that we are damned if we
>> fix it, damned if we are not:
>> - fixing T_INT to T_BYTE. flag in python is now fixed, but it breaks
>> pickled numpy arrays
>
> Is the problem that T_BYTE returns a single-item string?

Yes (although it is actually what we want, instead of an int).

> My handwrapping skills are rusty (Cython has blissfully eradicated this
> from my memory), but aren't these T_INT/T_BYTE things just convenient
> shortcuts for exposing C struct members as Python attributes? Couldn't
> we just write a full getter function for returning the correct value,
> just returned as a Python int instead of a str.

You're right, I did not think about this solution. That's certainly
better than the two I suggested.

Thanks,

David

Travis Oliphant

unread,

Mar 6, 2012, 1:25:40 PM3/6/12

to Discussion of Numerical Python, Discussion of Numerical Python

Why do we want to return a single string char instead of an int?

There is a need for more flags on the dtype object. Using an actual attribute call seems like the way to go. This could even merge the contents of two struct members so that we can add more flags but preserve ABI compatibility.

Travis

--
Travis Oliphant
(on a mobile)
512-826-7480

Robert Kern

unread,

Mar 6, 2012, 1:44:40 PM3/6/12

to Discussion of Numerical Python

On Tue, Mar 6, 2012 at 18:25, Travis Oliphant <tra...@continuum.io> wrote:
> Why do we want to return a single string char instead of an int?

I suspect just to ensure that any provided value fits in the range
0..255. But that's easily done explicitly.

--
Robert Kern

David Cournapeau

unread,

Mar 6, 2012, 1:59:41 PM3/6/12

to Discussion of Numerical Python

On Tue, Mar 6, 2012 at 1:25 PM, Travis Oliphant <tra...@continuum.io> wrote:
> Why do we want to return a single string char instead of an int?
>
> There is a need for more flags on the dtype object. Using an actual attribute call seems like the way to go. This could even merge the contents of two struct members so that we can add more flags but preserve ABI compatibility.

Yes. The T_BYTE/T_INT is actually pretty minor compared to the
underlying issue (where we cast back and forth between int and char).
I will make a new PR that fixes everything but this exact point, and
will put an actual accessor if needed. Given that dtype.flags is
nonsensical as of today (at the python level), I would expect nobody
uses it.

cheers,

David Cournapeau

unread,

Mar 7, 2012, 11:46:13 AM3/7/12

to Discussion of Numerical Python

On Tue, Mar 6, 2012 at 1:44 PM, Robert Kern <rober...@gmail.com> wrote:

On Tue, Mar 6, 2012 at 18:25, Travis Oliphant <tra...@continuum.io> wrote:
> Why do we want to return a single string char instead of an int?

I suspect just to ensure that any provided value fits in the range
0..255. But that's easily done explicitly.

That was not even the issue in the end, my initial analysis was wrong.

In any case, I have now a new PR that fixes both dtypes.flags value and dtype hashing reported in #2017: https://github.com/numpy/numpy/pull/231