Workaround for creating a numpy array of structs with pointer members

t_tauri

Feb 25, 2018, 2:47:48 AM

to cython-users

Hi all,

I was just wondering if there is a good/accepted workaround for creating a numpy array from an array of structs where each struct has members which are pointers? I don't care about the pointers and am perfectly happy to have them ignored in the numpy array.

To be more clear, I have a structure along the lines of the following:

ctypedef struct simple_t:
    float a[4]
    double b
    int c
    int d
    float* e

I then have an array of these structs, created by external C code which I am interfacing with. I would like to create a numpy array from this for use in Python code.

I have tried something along the lines of:

cdef simple_t* temp
# update temp here
np.asarray(<simple_t[:10]>temp)

As expected, this doesn't work. However, what I find confusing is that even if I omit the pointer member, `e`, from the `ctypedef` declaration, I am still unable to use `asarray`.

Does anyone have a good way to work around this? The Python code which will make use of the numpy array doesn't care that the pointers exist, so I am happy to just omit them in the result of `asarray`...

Nils Bruin

Feb 26, 2018, 5:39:32 PM

to cython-users

On Sunday, February 25, 2018 at 7:47:48 AM UTC, t_tauri wrote:

Hi all,

I was just wondering if there is a good/accepted workaround for creating a numpy array from an array of structs where each struct has members which are pointers? I don't care about the pointers and am perfectly happy to have them ignored in the numpy array.

To be more clear, I have a structure along the lines of the following:

ctypedef struct simple_t:
    float a[4]
    double b
    int c
    int d
    float* e

I then have an array of these structs, created by an external c code which I am interfacing with. I would like to create a numpy array from this for use in Python code.

According to http://cython.readthedocs.io/en/latest/src/userguide/language_basics.html having pointer definitions as part of a cdef struct should be no issue.

I have tried something along the lines of:

cdef simple_t* temp
# update temp here
np.asarray(<simple_t[:10]>temp)

As expected, this doesn't work. However, what I find confusing is that even if I omit the pointer member, `e`, from the `ctypedef` declaration, I am still unable to use `asarray`.

The presence of `e` shouldn't be an issue. Have you tried looking at http://cython.readthedocs.io/en/latest/src/tutorial/numpy.html to see how efficient interaction with np arrays is done? It doesn't look like the syntax you are using is the recommended one.

Chris Barker

Feb 26, 2018, 7:03:13 PM

to cython-users

On Mon, Feb 26, 2018 at 2:39 PM, Nils Bruin <bruin...@gmail.com> wrote:

To be more clear, I have a structure along the lines of the following:

ctypedef struct simple_t:
    float a[4]
    double b
    int c
    int d
    float* e

I then have an array of these structs, created by an external c code which I am interfacing with. I would like to create a numpy array from this for use in Python code.

Does the numpy array need to share the data pointer? i.e. do you need to alter it in-place with Python?

If not, it may be easier to make a copy of the data you want.
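A minimal sketch of the copy route, written in plain Python with ctypes standing in for the external C code. The names `SimpleT` and `copy_without_pointer` are hypothetical and only for illustration; in real Cython code the same loop would run over the `simple_t*` directly. The output dtype simply leaves the pointer member out.

```python
import ctypes
import numpy as np

# Hypothetical ctypes stand-in for the C struct `simple_t`.
class SimpleT(ctypes.Structure):
    _fields_ = [
        ("a", ctypes.c_float * 4),
        ("b", ctypes.c_double),
        ("c", ctypes.c_int),
        ("d", ctypes.c_int),
        ("e", ctypes.POINTER(ctypes.c_float)),
    ]

def copy_without_pointer(structs, n):
    """Copy the non-pointer members of n structs into a new structured array."""
    out_dt = np.dtype([("a", "f4", (4,)), ("b", "f8"),
                       ("c", np.intc), ("d", np.intc)])
    out = np.empty(n, dtype=out_dt)
    for i in range(n):
        s = structs[i]
        out["a"][i] = s.a[:]   # slicing a ctypes array gives a list of floats
        out["b"][i] = s.b
        out["c"][i] = s.c
        out["d"][i] = s.d
    return out

# Usage: the array below stands in for memory filled by the C library.
c_array = (SimpleT * 3)()
c_array[1].b = 2.5
c_array[1].c = 7
copied = copy_without_pointer(c_array, 3)
```

The result owns its own memory, so later changes on the C side are not reflected in it, and the copy costs roughly the size of the non-pointer members.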

According to http://cython.readthedocs.io/en/latest/src/userguide/language_basics.html having pointer definitions as part of a cdef struct should be no issue.

I'm not sure that addresses the numpy array issue.

If you want your numpy array to have that data pointer, you will need to have a numpy dtype that matches that struct. I'm not sure if Cython provides a way to generate that dtype, but it should be fairly straightforward -- something like:

In [15]: dt = np.dtype([('a', 'f4', (4,)),
    ...:                ('b', 'f8'),
    ...:                ('c', 'i'),
    ...:                ('d', 'i'),
    ...:                ('e', 'u8'),
    ...:                ]
    ...:               )

In [16]: dt
Out[16]: dtype([('a', '<f4', (4,)), ('b', '<f8'), ('c', '<i4'), ('d', '<i4'), ('e', '<u8')])

Note that I used "u8" for the pointer: you are right, I don't know that numpy can deal with an arbitrary pointer like that. I think on a 64-bit system you just need something that occupies the same 8 bytes, and an unsigned int is OK for a memory address.
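One way to check that such a dtype really lines up with the C layout is to compare it against an equivalent ctypes structure. This is only a sketch: `SimpleT` and `layout_matches` are hypothetical names, and it swaps in `np.uintp` (a pointer-sized unsigned integer) and `align=True` so that the 8-byte assumption and the padding are not hard-coded.

```python
import ctypes
import numpy as np

# Hypothetical ctypes mirror of the C struct `simple_t`.
class SimpleT(ctypes.Structure):
    _fields_ = [
        ("a", ctypes.c_float * 4),
        ("b", ctypes.c_double),
        ("c", ctypes.c_int),
        ("d", ctypes.c_int),
        ("e", ctypes.POINTER(ctypes.c_float)),
    ]

# align=True asks numpy to insert C-style padding between members.
dt = np.dtype([("a", "f4", (4,)), ("b", "f8"),
               ("c", np.intc), ("d", np.intc),
               ("e", np.uintp)], align=True)

def layout_matches(struct_cls, dtype):
    """Return True if total size and every member offset agree."""
    if ctypes.sizeof(struct_cls) != dtype.itemsize:
        return False
    for field in struct_cls._fields_:
        name = field[0]
        # dtype.fields maps each name to a (dtype, offset) tuple.
        if getattr(struct_cls, name).offset != dtype.fields[name][1]:
            return False
    return True

ok = layout_matches(SimpleT, dt)
```

If `ok` comes back false on some platform, the dtype needs explicit offsets rather than the defaults.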

Then you can create your array with:

array([([0., 0., 0., 0.], 0., 0, 0, 0), ([0., 0., 0., 0.], 0., 0, 0, 0),
       ([0., 0., 0., 0.], 0., 0, 0, 0), ([0., 0., 0., 0.], 0., 0, 0, 0),
       ([0., 0., 0., 0.], 0., 0, 0, 0), ([0., 0., 0., 0.], 0., 0, 0, 0)],
      dtype=[('a', '<f4', (4,)), ('b', '<f8'), ('c', '<i4'), ('d', '<i4'), ('e', '<u8')])

and then assign the pointer to it.

Or call the numpy API to create an array from a pointer.
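As a rough illustration of wrapping existing memory without copying it, here is a pure-Python sketch in which a ctypes array plays the role of the buffer owned by the C code. `SimpleT` and `wrap_structs` are hypothetical names; from Cython the raw pointer would be handed to numpy through its C API instead, which is not shown here.

```python
import ctypes
import numpy as np

# Hypothetical ctypes stand-in for the C struct `simple_t`.
class SimpleT(ctypes.Structure):
    _fields_ = [
        ("a", ctypes.c_float * 4),
        ("b", ctypes.c_double),
        ("c", ctypes.c_int),
        ("d", ctypes.c_int),
        ("e", ctypes.POINTER(ctypes.c_float)),
    ]

# The pointer member is described as a pointer-sized unsigned integer.
dt = np.dtype([("a", "f4", (4,)), ("b", "f8"),
               ("c", np.intc), ("d", np.intc),
               ("e", np.uintp)], align=True)

def wrap_structs(c_array, dtype):
    """View the bytes of c_array as a structured numpy array (no copy)."""
    return np.frombuffer(c_array, dtype=dtype)

c_array = (SimpleT * 10)()          # stands in for memory owned by C code
records = wrap_structs(c_array, dt)

c_array[3].c = 42                   # a write on the "C side" ...
seen_from_numpy = int(records["c"][3])   # ... is visible through numpy
```

Because the array is only a view, the underlying buffer must stay alive for as long as `records` is in use.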

I have tried something along the lines of:

cdef simple_t* temp
# update temp here
np.asarray(<simple_t[:10]>temp)

As expected, this doesn't work. However, what I find confusing is that even if I omit the pointer member, `e`, from the `ctypedef` declaration, I am still unable to use `asarray`.

You may be able to use asarray if you make a memoryview of your pointer first...

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115   (206) 526-6317   main reception

Chris....@noaa.gov

t_tauri

Mar 3, 2018, 1:22:29 AM

to cython-users

As expected, this doesn't work. However, what I find confusing is that even if I omit the pointer member, `e`, from the `ctypedef` declaration, I am still unable to use `asarray`.

The presence of `e` shouldn't be an issue. Have you tried looking at http://cython.readthedocs.io/en/latest/src/tutorial/numpy.html to see how efficient interaction with np arrays is done? It doesn't look like the syntax you are using is the recommended one.

Thanks for your links and response. I've been trying to use typed memoryviews (http://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html) which, if I understand correctly, are the preferred way of interacting with numpy array data now.

The docs do mention that pointers in typed memoryviews are not supported (http://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#python-buffer-support). However, I had originally assumed this just meant that the pointers couldn't be accessed, not that a C struct containing pointers can't be wrapped in a memoryview at all. This is what led to my question about any known workaround...

I'm really keen to know if there is a way I can wrap a c struct with a pointer (where I am happy to not have access to the pointer) using a memoryview, without having to manually calculate a dtype with offsets, padding etc...
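On the numpy side at least, the offsets and padding do not have to be worked out by hand. The sketch below is an assumption-laden illustration, not a Cython memoryview solution: it lets numpy compute a C-style layout with `align=True`, then derives a second dtype that keeps the same offsets and item size but omits the pointer member. `drop_fields` is a hypothetical helper name.

```python
import numpy as np

# Full layout, with the pointer described as a pointer-sized unsigned int.
full = np.dtype([("a", "f4", (4,)), ("b", "f8"),
                 ("c", np.intc), ("d", np.intc),
                 ("e", np.uintp)], align=True)

def drop_fields(dtype, unwanted):
    """Build a dtype with the same offsets and itemsize, minus some fields."""
    keep = [name for name in dtype.names if name not in unwanted]
    return np.dtype({
        "names": keep,
        "formats": [dtype.fields[name][0] for name in keep],
        "offsets": [dtype.fields[name][1] for name in keep],
        "itemsize": dtype.itemsize,   # keep the stride of the C struct
    })

no_ptr = drop_fields(full, {"e"})

# Usage: view the same memory through the reduced dtype (no copy).
raw = np.zeros(4, dtype=full)
raw["d"][2] = 5
visible = raw.view(no_ptr)
```

The bytes of `e` are still present in memory; they are just no longer reachable by name through `visible`.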

t_tauri

Mar 3, 2018, 1:22:49 AM

to cython-users

On Tuesday, 27 February 2018 11:03:13 UTC+11, Chris Barker wrote:

On Mon, Feb 26, 2018 at 2:39 PM, Nils Bruin <bruin...@gmail.com> wrote:
To be more clear, I have a structure along the lines of the following:

ctypedef struct simple_t:
    float a[4]
    double b
    int c
    int d
    float* e

I then have an array of these structs, created by an external c code which I am interfacing with. I would like to create a numpy array from this for use in Python code.

Does the numpy array need to share the data pointer? i.e. do you need to alter it in-place with Python?

No. I'm happy to ignore the pointer.

If not, it may be easier to make a copy of the data you want.

This is the route I am going down just now, although it's not ideal because the raw data occupies tens of GB in my real problem. If I can't find a good workaround then this is probably how I'll proceed.

According to http://cython.readthedocs.io/en/latest/src/userguide/language_basics.html having pointer definitions as part of a cdef struct should be no issue.

I'm not sure that addresses the numpy array issue.

If you want your numpy array to have that data pointer, you will need to have a numpy dtype that matches that struct. I'm not sure if Cython provides a way to generate that dtype, but it should be fairly straightforward -- something like:

In [15]: dt = np.dtype([('a', 'f4', (4,)),
    ...:                ('b', 'f8'),
    ...:                ('c', 'i'),
    ...:                ('d', 'i'),
    ...:                ('e', 'u8'),
    ...:                ]
    ...:               )

In [16]: dt
Out[16]: dtype([('a', '<f4', (4,)), ('b', '<f8'), ('c', '<i4'), ('d', '<i4'), ('e', '<u8')])

Note that I used "u8" for the pointer: you are right, I don't know that numpy can deal with an arbitrary pointer like that. I think on a 64-bit system you just need something that occupies the same 8 bytes, and an unsigned int is OK for a memory address.

This is a great tip. Thanks for this! It's much easier for me to do this than to manually work out offsets with padding etc...

Then you can create your array with:

array([([0., 0., 0., 0.], 0., 0, 0, 0), ([0., 0., 0., 0.], 0., 0, 0, 0),
       ([0., 0., 0., 0.], 0., 0, 0, 0), ([0., 0., 0., 0.], 0., 0, 0, 0),
       ([0., 0., 0., 0.], 0., 0, 0, 0), ([0., 0., 0., 0.], 0., 0, 0, 0),
       ([0., 0., 0., 0.], 0., 0, 0, 0), ([0., 0., 0., 0.], 0., 0, 0, 0),
       ([0., 0., 0., 0.], 0., 0, 0, 0), ([0., 0., 0., 0.], 0., 0, 0, 0)],
      dtype=[('a', '<f4', (4,)), ('b', '<f8'), ('c', '<i4'), ('d', '<i4'), ('e', '<u8')])

and then assign the pointer to it.

Or call the numpy API to create an array from a pointer.

I have tried something along the lines of:

cdef simple_t* temp
# update temp here
np.asarray(<simple_t[:10]>temp)

As expected, this doesn't work. However, what I find confusing is that even if I omit the pointer member, `e`, from the `ctypedef` declaration, I am still unable to use `asarray`.

You may be able to use asarray if you make a memoryview of your pointer first...

Unfortunately, memoryviews don't seem to be able to handle this. The mere presence of the pointer in the C struct, regardless of whether or not I declare that member in my Cython typedef, seems to result in errors.

Thanks again for your help. I'll definitely make use of the "u8" pointer replacement.
