Re: [cython-users] Wrapping C structure for access from python

Robert Bradshaw

Jun 26, 2012, 12:56:08 AM
to cython...@googlegroups.com
On Mon, Jun 25, 2012 at 5:48 PM, Sarvi Shanmugham <sarv...@gmail.com> wrote:
> I've done some extensive reading for this but can't seem to figure out the
> following.
> Need some help
>
> I am trying to generate a python wrapper around a C library for access from
> python
>
> ------------------------
> // test.h C Structure
> typedef struct {
>     int hellovar;
> } substruct_t;
>
> typedef struct {
>     int    intvar;
>     char *charptr;
>     char chararr[255];
>     substruct_t substruct;
> } mainstruct_t;
>
> mainstruct_t *examplefunc(mainstruct_t *param);
> ----------------------------
> //test.pyx
> cdef extern from "test.h":
>     ctypedef struct substruct_t:
>         int hellovar
>
>     ctypedef struct mainstruct_t:
>         int intvar
>         char *charptr
>         char chararr[255]
>         substruct_t substruct
>
>     mainstruct_t *examplefunc_c(mainstruct_t *param1, mainstruct_t *param2)
>
> cdef class mainstruct(object):
>     cdef mainstruct_t ms
>
> //* I have figured out the __init__(), __dealloc__() to possibly malloc and
> initialize the structures
>
>       def __getattr__(self,name):
>            // Need to figure how to implement this
>
>       def __setattr__(self, name, value):
>            // Need to figure how to implement this
>
>       def examplefunc(self, param2):
>           mainstruct_t ms=examplefunc_c(self.ms, param2.ms)
>           // somehow convert the returned C mainstruct data structure to a
> python "mainstruct" extension type
>           return mainstruct(ms)
>
> I can't seem to figure out how to provide read/write access to the fields in
> "ms".
>
> I suspect the following functions figure in somehow, but I don't understand
> how.
> "value" below would be a Python object, I suspect, and I need to set the
> fields in "self.ms", say "self.ms.charptr".
> I suspect I need to deal with malloc as well as converting Python string
> buffers to C buffers and so on?
> Also, how do you suggest I deal with providing access to the substructure
> "self.ms.substruct"?

How about

def __getattr__(self, name):
    if name == 'charptr':
        return self.ms.charptr # or whatever conversion is better
    elif name == 'chararr':
        ...

In Python you'd just use obj.charptr (or perhaps a more descriptive
name). As for substruct, you could either create a cdef class wrapping
substruct_t using the same principles, or add "attributes" to
mainstruct that delegate to the substruct directly, i.e. the user
would type obj.substructmember rather than obj.substruct.member.

- Robert

Stefan Behnel

Jun 26, 2012, 3:01:12 AM
to cython...@googlegroups.com
Hi,

please don't top-post.

Sarvi Shanmugham, 26.06.2012 08:41:
>>> typedef struct {
>>> int intvar;
>>> char *charptr;
>>> char chararr[255];
>>> substruct_t substruct;
>>> } mainstruct_t
>
> I think I understand def __getattr__(self, name)
> It looks like returning the self.ms.charptr or self.ms.chararr gets
> automatically converted to an equivalent python object.
>
> How about def __getattr__(self, name, value) ???

I assume you mean "__setattr__" here.


> How should this be implemented?
> doing
> self.ms.charptr = value
> OR
> self.ms.chararr = value
> does not work

"does not work" is a very incomplete error description. How is it not
working and what exact error do you get?

You might want to read this:

http://docs.cython.org/src/tutorial/strings.html

Stefan

Robert Bradshaw

Jun 26, 2012, 3:01:39 AM
to cython...@googlegroups.com
On Mon, Jun 25, 2012 at 11:41 PM, Sarvi Shanmugham <sarv...@gmail.com> wrote:
> Thanks Robert.
> I think I understand def __getattr__(self, name)
> It looks like returning the self.ms.charptr or self.ms.chararr gets
> automatically converted to an equivalent python object.
>
> How about def __getattr__(self, name, value) ???
>    How should this be implemented?
>     doing
>           self.ms.charptr = value
>         OR
>           self.ms.chararr = value
>    does not work
>
> Any pointers or reading material?

See http://docs.cython.org/ and specifically
http://docs.cython.org/src/tutorial/strings.html if you want to handle
unicode.

For chararr, you need to strncpy or memcpy the data. For charptr, you
can assign, but be sure to keep a reference to value around, as the
pointer is only valid for as long as value is alive. You might want to
strcpy here too.
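
The copy-versus-borrow distinction above can be tried out without compiling anything by using ctypes. This is only a sketch under assumptions, not code from the thread: the layout mirrors the mainstruct_t from the question with chararr shortened to 8 bytes, and set_chararr and CHARARR_LEN are hypothetical names.

```python
import ctypes

CHARARR_LEN = 8  # shortened from 255 so truncation is easy to see


class mainstruct_t(ctypes.Structure):
    # Assumed layout, modelled on the struct in the question.
    _fields_ = [
        ("intvar", ctypes.c_int),
        ("charptr", ctypes.c_char_p),
        ("chararr", ctypes.c_char * CHARARR_LEN),
    ]


def set_chararr(ms, value):
    """Copy value into the fixed-size array, truncating so a NUL always fits."""
    ms.chararr = value[:CHARARR_LEN - 1]


ms = mainstruct_t()

# Fixed-size array: the bytes are copied into the struct itself.
set_chararr(ms, b"Hello World")

# Pointer field: only the address is stored. ctypes keeps its own reference
# to `source` so the pointer stays valid; in Cython that is the wrapper's job.
source = b"borrowed"
ms.charptr = source

print(ms.chararr, ms.charptr)
```

Note that a bare strncpy with a length of 254 does not guarantee NUL termination when the source is longer, which is why the helper truncates to one byte less than the array size.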

Robert Bradshaw

Jun 26, 2012, 2:13:43 PM
to cython...@googlegroups.com
On Tue, Jun 26, 2012 at 10:44 AM, Sarvi Shanmugham <sarv...@gmail.com> wrote:
> Thanks Robert,
>         I had read through quite a bit of the document and other blogs
> before I had asked the question.
> With help from this thread and some experimenting I now understand the
> solution for simple attributes like intvar, and charptr and chararr.
> I guess the key to making __setattr__ work was typecasting.
> Both read and write works well.
>
> I am still yet to figure out how to expose substruct in this way? I am still
> experimenting on this. Any help would be appreciated.

Instead of exposing substruct, why not expose its fields directly, e.g.

cdef struct substruct:
    int foo

def __getattr__(self, name):
    ....
    elif name == "foo": # or "substruct_foo" or whatever
        return self.ms.substruct.foo

Otherwise you have to worry about copy vs reference when returning a
substruct value.
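
The flattening idea can be sketched in plain Python with ctypes standing in for the C struct. The names MainStruct and substruct_hellovar are hypothetical, and properties are used here instead of __getattr__ only to keep the sketch short; it is not the wrapper from the thread.

```python
import ctypes


class substruct_t(ctypes.Structure):
    _fields_ = [("hellovar", ctypes.c_int)]


class mainstruct_t(ctypes.Structure):
    _fields_ = [("intvar", ctypes.c_int), ("substruct", substruct_t)]


class MainStruct:
    """Hypothetical wrapper exposing a nested field as a flat attribute."""

    def __init__(self):
        self._ms = mainstruct_t()

    @property
    def substruct_hellovar(self):
        # Read straight through to the nested field; no substruct object
        # is ever handed to the caller.
        return self._ms.substruct.hellovar

    @substruct_hellovar.setter
    def substruct_hellovar(self, value):
        self._ms.substruct.hellovar = value


obj = MainStruct()
obj.substruct_hellovar = 42
print(obj.substruct_hellovar)
```

Because the caller never receives a substruct object, the question of whether it is a copy or a reference does not arise.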

> I promise to write a detailed blog post on the topic once I figure it out.
>
> Solution posted to help future posters
>            if name == 'intvar':
>                return self.ms.intvar
>            elif name == 'charptr':
>                return self.ms.charptr
>            elif name == 'chararr':
>                return self.ms.chararr
>            elif name == 'substruct':
>                ???????????????
>                // How does this section work?
>                return substruct(self.ms.substruct)
>                ????????????
>            else:
>                raise AttributeError()
>
>       def __setattr__(self, name, value):
>            if name == 'intvar':
>                self.ms.intvar = <int>value
>            elif name == 'chararr':
>                strncpy(self.ms.chararr,<char*>value,254)
>            elif name == 'charptr':
>                if self.ms.charptr:
>                    free(self.ms.charptr)
>                self.ms.charptr=strdup(<char*>value)
>                if not self.ms.charptr:
>                    raise MemoryError()
>            else:
>                raise AttributeError()
>            // Need to figure how to implement this
>
>       def examplefunc(self, param2):
>           mainstruct_t ms=examplefunc_c(self.ms, param2.ms)
>           // somehow convert the returned C mainstruct data structure to a
> python "mainstruct" extension type
>           return mainstruct(ms)

To return it, you must create a new mainstruct object and then set its
ms field.

Stefan Behnel

Jun 26, 2012, 3:11:43 PM
to cython...@googlegroups.com
Sarvi Shanmugham, 26.06.2012 19:44:
> With help from this thread and some experimenting I now understand the
> solution for simple attributes like intvar, and charptr and chararr.
> I guess the key to making __setattr__ work was typecasting.
> Both read and write works well.
>
> I am still yet to figure out how to expose substruct in this way? I am
> still experimenting on this. Any help would be appreciated.

As a general remark, I suggest that you think some more about the Python
API that you want to expose. I mean, what you currently do may make perfect
sense and I really can't tell without knowing your actual project setting,
but it looks like what you're writing is a very thin 1:1 wrapper around a C
API, with a rather low-level set of features.

It's often better to provide your users with more high-level functionality
that you implement by combining features of the underlying C API. Not every
field of the structs you are wrapping might be interesting for Python
programmers and if a smaller set of fields makes a smaller API then that
means less for them to learn. For example, your users may prefer passing
keyword arguments into a function rather than creating an object and
setting attributes on it. C and Python work very differently and Cython
allows you to make the Python API very pythonic by abstracting from the C
API and hiding its complexity behind something that really looks like Python.

I know, it's hard to invest thoughts into these things when you're still
struggling with the basics, but I'm sure you'll quickly notice that it's
worth it.

Stefan

Robert Bradshaw

Jun 26, 2012, 9:14:20 PM
to cython...@googlegroups.com
On Tue, Jun 26, 2012 at 5:30 PM, Sarvi Shanmugham <sarv...@gmail.com> wrote:
> I understand.
> We have an interface generator that takes an interface definition in a custom
> Interface Definition Language and generates C code and headers that get
> compiled into a shared library.
> I am trying to generate a python wrapper around that library. So I do not
> have info ahead of time on what semantics are good, what fields are
> unnecessary, or how deep the substructure nesting goes.

So are you generating the .pyx files as well in that case?

> So I would like them to be read/write as follows, where z is a substruct of
> y, which is a substruct of x
> x.y.z='hello'
>
> As long as X is malloced Y and Z can point to within it.
>
> if a C Struct is as follows
> typedef struct subsubstruct {
>     char chararr[255];
> } subsubstruct_t;
>
> typedef struct substruct {
>     subsubstruct_t subsubstruct;
> } substruct_t;
>
> typedef struct mainstruct {
>     substruct_t substruct;
> } mainstruct_t;
>
>
> When wrapped with a python wrapper I want the following python API
> x=mainstruct()   # Mallocs mainstruct_t, or can point to a static definition
> of mainstruct defined within the C library
> x.substruct.subsubstruct.chararr = 'Hello World'
> print x.substruct.subsubstruct.chararr   # this prints 'Hello World'
> y=x.substruct
> y.subsubstruct.chararr = 'This overwrites Hello World'
> print y.subsubstruct.chararr  # this prints 'This overwrites Hello World'
> z=y.subsubstruct
> z.chararr = 'This overwrites again'
> print x.substruct.subsubstruct.chararr   # this prints 'This overwrites again'
> x=None
> y=None
> z=None   # Frees mainstruct IF it was malloced

You've hit the nail on the head in terms of the memory management issues.

> I am trying to use cython for the embedded development space, where everyone
> is using C.
> And I am trying to demonstrate that a blend of Python/Cython can be much
> more productive
>
> At this point though, I am quite frustrated and have half a mind to switch
> to using ctypes for this wrapper.

If you're trying to provide a very thin wrapper, ctypes might be a
better fit, though it won't help with the memory issues (i.e. you'll
either end up doing too much copying or having to be very careful that
you don't segfault).
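
The trade-off between sharing memory and copying can be seen directly in ctypes. The following is a minimal sketch using struct names modelled on the ones quoted above; it only demonstrates ctypes behaviour and says nothing about the actual library being wrapped.

```python
import ctypes


class subsubstruct_t(ctypes.Structure):
    _fields_ = [("chararr", ctypes.c_char * 255)]


class substruct_t(ctypes.Structure):
    _fields_ = [("subsubstruct", subsubstruct_t)]


class mainstruct_t(ctypes.Structure):
    _fields_ = [("substruct", substruct_t)]


x = mainstruct_t()
x.substruct.subsubstruct.chararr = b"Hello World"

# Attribute access on a nested struct gives a view into x's memory, so a
# write through y is visible through x (and y must not outlive that memory).
y = x.substruct
y.subsubstruct.chararr = b"overwritten via y"

# An explicit copy is safe to keep around but no longer tracks x.
snapshot = substruct_t.from_buffer_copy(bytes(x.substruct))
snapshot.subsubstruct.chararr = b"only in the copy"

print(x.substruct.subsubstruct.chararr)
print(snapshot.subsubstruct.chararr)
```

For memory that ctypes allocated itself, the view keeps its parent alive; for memory owned by the C library, nothing does, which is where the risk of a segfault comes from.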

> At least the C data structure to python data structure mapping seems quite
> straightforward for this case.
>
> Considering one of Cython's goals is to build python wrappers around C
> libraries, I am surprised this is so difficult.

Typically one doesn't try to mirror the C library interface, as it's
often not very natural to do in Python.

What you might be interested in is the fact that Cython does automatic
dict <-> struct conversion. Thus if one had a C function "structA
cfoo(structB*)" one could simply write

def foo(**kwds):
    cdef structB value_c = kwds
    return cfoo(&value_c)

And use it as

print foo(int_value=3, substruct=dict(int_field=5, charptr="abc"))

rather than worry about building wrappers for each of these struct
objects with complicated allocation characteristics.
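
For readers who want to see what such a dict-to-struct conversion amounts to, here is a rough plain-Python analogy built on ctypes. It is not how Cython implements the coercion; struct_from_dict is a hypothetical helper, the struct layouts are assumptions, and treating a missing key as an error is a choice made for this sketch.

```python
import ctypes


class substruct_t(ctypes.Structure):
    _fields_ = [("int_field", ctypes.c_int), ("charptr", ctypes.c_char_p)]


class structB(ctypes.Structure):
    _fields_ = [("int_value", ctypes.c_int), ("substruct", substruct_t)]


def struct_from_dict(struct_type, values):
    """Build struct_type from a dict, recursing into nested structs."""
    result = struct_type()
    for name, field_type in struct_type._fields_:
        value = values[name]  # a missing key raises KeyError in this sketch
        if isinstance(value, dict):
            value = struct_from_dict(field_type, value)
        setattr(result, name, value)
    return result


def foo(**kwds):
    value_c = struct_from_dict(structB, kwds)
    # A real wrapper would pass ctypes.byref(value_c) to the C function here.
    return value_c


b = foo(int_value=3, substruct=dict(int_field=5, charptr=b"abc"))
print(b.int_value, b.substruct.int_field, b.substruct.charptr)
```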

> It is also very likely, that it is because I haven't got the Cython concepts
> straight enough in my head yet.
>
> In ctypes I believe the following would have provided the sort of python API
> I am looking for
>
> class subsubstruct_t(Structure):
>     _fields_=[
>         ('chararr', c_char *255)
>      ]
>
> class substruct_t(Structure):
>     _fields_=[
>         ('subsubstruct', subsubstruct_t)
>      ]
>
> class mainstruct_t(Structure):
>     _fields_=[
>         ('substruct', substruct_t)
>      ]
>
>
> Sarvi

Sarvi Shanmugham

Jun 27, 2012, 2:37:54 AM
to cython...@googlegroups.com
A couple of questions.
1. If the following is the C data structure
typedef struct subsubstruct {
    char chararr[5];
    char *charptr;
} subsubstruct_t;

typedef struct substruct {
    subsubstruct_t subsubstruct;
    subsubstruct_t subsubstructarr[10];
} substruct_t;

typedef struct mainstruct {
    substruct_t substruct;
} mainstruct_t;

Is the following possible?
How does this deal with char chararr[5] and char *charptr, which could be null?
Can it deal with subsubstructarr above? Does it get translated to a list of dicts?
What if the dict being assigned does not have all fields in the structure hierarchy?
cdef mainstruct_t mainstruct={
                              'substruct': {
                                  'subsubstruct': {
                                      'chararr': 'Hello World chararr',
                                      'charptr': 'Hello World charptr'
                                   }
                                }
                            }

Some responses inline

On Tuesday, June 26, 2012 6:14:20 PM UTC-7, Robert Bradshaw wrote:
> On Tue, Jun 26, 2012 at 5:30 PM, Sarvi Shanmugham <sarv...@gmail.com> wrote:
>> I understand.
>> We have an interface generator that takes an interface definition in custom
>> Interface Definition Language and generates C code and headers that get
>> compiled into a shared library.
>> I am trying to generate a python wrapper around that library. So I do not
>> have info ahead of time what semantics are good or what fields are
>> unnecessary or how deep the sub structure nesting goes.
>
> So are you generating the .pyx files as well in that case?

Yes. Currently the IDL generator generates C code that gets compiled into libcompA.so.
I plan to modify the IDL generator to generate .pyx files that can be compiled to a Python wrapcompA.so that can interface with libcompA.so.
Or I could generate ctypes-based .py files to do the same.
From an API perspective, being able to write x=mainstruct() and access it as x.substruct.subsubstruct.chararr seems cleaner than x['substruct']['subsubstruct']['chararr'].
 
From what I can tell, exposing the struct hierarchy cleanly shouldn't be difficult with the following logic:
1. Use a cdef class to model each struct, with a pointer to the C struct inside it.
2. Allow these struct classes either to be initialized standalone, where they malloc and free their own struct, or to point to the substruct within their parent struct class, in which case they also hold a reference to the parent struct class instance. This guarantees that the parent/top-level struct class doesn't get freed until all references to itself or its substructs have been released.

The trouble is that the API to set each attribute within the substruct to achieve x.substruct.subsubstruct.chararr looks like a lot of work.
 

>> I am trying to use cython for the embedded development space, where everyone
>> is using C.
>> And I am trying to demonstrate that a blend of Python/Cython can be much
>> more productive
>>
>> At this point though, I am quite frustrated and have half a mind to switch
>> to using ctypes for this wrapper.
>
> If you're trying to provide a very thin wrapper, ctypes might be a
> better fit, though it won't help with the memory issues (i.e. you'll
> either end up doing too much copying or having to be very careful that
> you don't segfault).

If I write ctypes code to define the struct definition, will Cython compile it?

Robert Bradshaw

Jun 27, 2012, 12:27:07 PM
to cython...@googlegroups.com
On Tue, Jun 26, 2012 at 11:37 PM, Sarvi Shanmugham <sarv...@gmail.com> wrote:
> A couple of questions.
> 1. If the following is the C data structure
> typedef struct subsubstruct {
>     char chararr[5];
>     char *charptr;
> } subsubstruct_t;
>
> typedef struct substruct {
>     subsubstruct_t subsubstruct;
>     subsubstruct_t subsubstructarr[10];
> } substruct_t;
>
> typedef struct mainstruct {
>     substruct_t substruct;
> } mainstruct_t;
>
> Is the following possible?
> How does this deal with char chararr[5] and char *charptr which could be
> null?

It doesn't. There is, of course, nothing about this automatic
conversion that you couldn't generate yourself.

> Can it deal with subsubstructarr above? Does it get translated to a list of
> dicts?

I don't remember if this was finally implemented, but you'd have to
make sure there aren't any un-initialized entries.

> What if the dict being assigned does not have all fields in the structure
> hierarchy?

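That's an error.

> From API perspective being able to
> x=mainstruct() and being able to access it as
> x.substruct.subsubstruct.chararr seems cleaner, than
> x['substruct']['subsubstruct']['chararr']

True. One could perhaps more easily do x.substruct_subsubstruct_chararr.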

> From what I can tell exposing struct hierarchy shouldn't be difficult and
> clean with the following logic
> 1. Use cdef class to model each struct with a pointer to the C struct inside
> it.
> 2. Allow these struct classes to be initialized as standalone where they
> malloc and free or to point to the substruct within its parent struct class,
> in which case they also hold a reference to the instance of the parent
> struct class as well. This guarantees that the parent/top level
> struct class doesn't get freed until all references to itself or its
> substructs are refcounted/freed down.

Yes, this would do the job
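
The ownership scheme in point 2 can be modelled in pure Python to see why the back-reference matters. This is only an illustration with hypothetical names (Owner, SubView): a bytearray stands in for the malloc'ed struct, and a real cdef class would free the memory in __dealloc__.

```python
import gc
import weakref


class Owner:
    """Stands in for a wrapper that allocates and later frees a C struct."""

    def __init__(self):
        self.buffer = bytearray(16)

    def sub_view(self):
        # The child covers bytes 8..16 of the owner's buffer.
        return SubView(self, 8, 16)


class SubView:
    """Stands in for a wrapper pointing into its parent's memory."""

    def __init__(self, owner, start, stop):
        self._owner = owner  # this reference keeps the parent alive
        self._start, self._stop = start, stop

    def write(self, data):
        if len(data) > self._stop - self._start:
            raise ValueError("data does not fit in the substruct")
        self._owner.buffer[self._start:self._start + len(data)] = data

    def read(self):
        return bytes(self._owner.buffer[self._start:self._stop])


x = Owner()
alive = weakref.ref(x)  # lets us observe whether the owner still exists
y = x.sub_view()

del x            # the user drops the top-level object...
gc.collect()
y.write(b"hi")   # ...but the view is still safe to use

print(alive() is not None, y.read()[:2])
```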

> The trouble is that the API to set each attribute within the substruct to
> achieve x.substruct.subsubstruct.chararr looks like a lot of work

You could introduce a shorthand mystruct.set(**kwds) and accept
keywords in the constructor to ease the burden so one wouldn't have to
write

x = mystruct()
x.a = ...
x.b = ...
x.y.c = ...
...
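
A rough plain-Python sketch of that shorthand follows. The class names Struct, X and Y are hypothetical stand-ins for generated wrappers, and using nested dicts for sub-structs is an assumption of this sketch rather than something specified in the thread.

```python
class Struct:
    """Hypothetical base class: fields can be set in bulk via keywords."""

    def __init__(self, **kwds):
        self.set(**kwds)

    def set(self, **kwds):
        for name, value in kwds.items():
            if not hasattr(self, name):
                raise AttributeError(name)  # reject unknown field names
            current = getattr(self, name)
            if isinstance(value, dict) and isinstance(current, Struct):
                current.set(**value)  # recurse into a nested struct
            else:
                setattr(self, name, value)


class Y(Struct):
    c = None


class X(Struct):
    a = None
    b = None

    def __init__(self, **kwds):
        self.y = Y()  # nested struct must exist before keywords are applied
        super().__init__(**kwds)


# One call instead of a series of attribute assignments.
x = X(a=1, y={"c": 3})
x.set(b=2)
print(x.a, x.b, x.y.c)
```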

>> > I am trying to use cython for the embedded development space, where
>> > everyone
>> > is using C.
>> > And I am trying to demonstrate that a blend of Python/Cython can be much
>> > more productive
>> >
>> > At this point though, I am quite frustrated and have half a mind to
>> > switch
>> > to using ctypes for this wrapper.
>>
>> If you're trying to provide a very thin wrapper, ctypes might be a
>> better fit, though it won't help with the memory issues (i.e. you'll
>> either end up doing too much copying or having to be very careful that
>> you don't segfault).
>
>
> If I write Ctypes Code to define the struct definition, will cython compile
> it?

No, but this has been a longstanding todo feature.

Stefan Behnel

Jun 27, 2012, 4:34:08 PM
to cython...@googlegroups.com
Robert Bradshaw, 27.06.2012 18:27:
> On Tue, Jun 26, 2012 at 11:37 PM, Sarvi Shanmugham wrote:
>> If I write Ctypes Code to define the struct definition, will cython compile
>> it?
>
> No, but this has been a longstanding todo feature.

Although, under the assumption that the cffi package is eventually going to
get to a point where it's usable and being used, supporting cffi instead of
ctypes might actually be preferable because it aims for more statically
exploitable features than the pure runtime ABI oriented ctypes package.

Stefan

Sarvi Shanmugham

Jun 30, 2012, 2:44:23 AM
to cython...@googlegroups.com
I would love to use Cython for wrapping the C API.
But based on my experience with it so far, wrapping complex C structures is not even remotely easy.
And I think it should be really easy, if we really want Cython to be a strong contender for wrapping C libraries.

The amount of code needed to properly wrap a complex struct of struct C data structure is not simple in Cython.
And any non-trivial C library uses these kinds of structure of structures all the time.


Sarvi
 


Robert Bradshaw

Jun 30, 2012, 3:18:20 AM
to cython...@googlegroups.com
There's a significant difference between "creating a Pythonic wrapper
of a non-trivial C library" and "creating a Python mirror of a
non-trivial C API." Cython is very good at the former, at least in my
experience (and I've both wrapped libraries and, more commonly,
created new functionality that leverages non-trivial C libraries), but
you're trying to do the latter.

That being said, we should make this better. It shouldn't be too hard
to automatically generate classes corresponding to structs, e.g.

struct A {
    int x;
    struct B b;
};

could become

cdef class A_py:
    cdef A *ptr
    cdef A value
    cdef object ref
    def __cinit__(self, **kwds):
        self.ptr = &a
        [populate value fields based on kwds]
    def __getattr__(self, name):
        if name == 'x':
            return ptr.x
        elif name == 'b':
            return wrap_B(&ptr.b, self)
    def __setattr__(self, name, value):
        if name == 'x':
            ptr.x = value
        elif name == 'b':
            if value instanceof B_py:
                ptr.b = (<B_py>value).value
            else:
                ptr.b = value # try coercion

cdef A_class wrap_A(A* a, ref):
    cdef A_py a_py = A_py()
    a_py.ptr = a
    a_py.ref = ref
    return a_py

Arrays would turn into classes with __getitem__ rather than
__getattr__ but otherwise the same (perhaps char arrays would be
handled specially). Pointers could be trickier, but could be solved in
a manner similar to arrays by keeping a reference to a python wrapper
around (it's unclear how many contiguous copies of X an X* should
have, but this could be decided dynamically). All in all, given a
nice programmatic description of the structs to wrap, it sounds like
maybe a day's work.

- Robert

Stefan Behnel

Jun 30, 2012, 3:58:30 AM
to cython...@googlegroups.com
Robert Bradshaw, 30.06.2012 09:18:
> On Fri, Jun 29, 2012 at 11:44 PM, Sarvi Shanmugham wrote:
>> I would love to use Cython for wrapping the C API.
>> But based on my experience with it so far, wrapping complex C structures is
>> not even remotely easy.

Well, it *is* easy. It's just that mapping it 1:1 to a Python API obviously
requires an amount of code that scales linearly with the size of the C API.
Thus my advice to simplify the Python level API in order to reduce both the
amount of (trivial) Cython code in the wrapper and the amount of Python
code for the usage. The general idea is to make the wrapper smart,
especially if the C API is dumb.


>> And I think it should be really easy, if we really want cython to be a
>> strong contender for wrapping C libraries.
>>
>> The amount of code needed to properly wrap a complex struct of struct C data
>> structure is not simple in Cython.
>> And any non-trivial C library uses these kind of structure of structures all
>> the time.

That's exaggerating a bit. Almost all C libraries I know get along quite
well without this complexity. Personally, I find the parts of the C API
that you showed so far a bit questionable (again, without actually knowing
what they do).


> There's a significant difference between "creating a Pythonic wrapper
> of a non-trivial C library" and "creating a Python mirror of a
> non-trivial C API." Cython is very good at the former, at least in my
> experience (and I've both wrapped libraries and, more commonly,
> created new functionality that leverages non-trivial C libraries)

+1


> but you're trying to do the latter.
>
> That being said, we should make this better. It shouldn't be too hard
> to automatically generate classes corresponding to structs, e.g.

Yes, and that wouldn't have to be done by Cython at all. It's better to use
an external code generator for this. Even without parsing C header files,
writing down an easily exploitable text representation of the structs (e.g.
in JSON or YAML) and implementing a Cython code generator for it should be
really quick and trivial. Then, if they need to be extensible, use
inheritance to extend them.

Using a dedicated (JSON-/YAML-/...-)DSL for this instead of parsing header
files will also make it easy to extend that language at any time to tweak
the code generation.
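
As a rough idea of how small such a generator could be, here is a sketch that turns a JSON description into Cython source text. Everything about it is an assumption made for illustration: the JSON schema is invented, the class and wrap_<name> naming follows the A_py / wrap_B pattern quoted in this thread, only int and nested struct fields are handled, and the emitted Cython has not been compiled.

```python
import json

# Hypothetical description format: struct name -> {field name: C type}.
DESCRIPTION = json.loads("""
{
  "A": {"x": "int", "b": "struct B"},
  "B": {"n": "int"}
}
""")


def generate_class(name, fields):
    """Return Cython source text for one wrapper class (untested sketch)."""
    lines = [
        f"cdef class {name}_py:",
        f"    cdef {name} *ptr",
        "    cdef object ref  # keeps the owning wrapper alive",
    ]
    for field, ctype in fields.items():
        lines += ["", "    @property", f"    def {field}(self):"]
        if ctype.startswith("struct "):
            inner = ctype.split()[1]
            # wrap_<inner> is assumed to be generated elsewhere.
            lines.append(f"        return wrap_{inner}(&self.ptr.{field}, self)")
        else:
            lines += [
                f"        return self.ptr.{field}",
                "",
                f"    @{field}.setter",
                f"    def {field}(self, value):",
                f"        self.ptr.{field} = value",
            ]
    return "\n".join(lines)


source = "\n\n".join(generate_class(n, f) for n, f in DESCRIPTION.items())
print(source)
```

Emitting one property per field keeps each accessor trivial, and extending the description format (for example with array lengths or read-only flags) only means adding branches to generate_class.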


> struct A {
>     int x;
>     struct B b;
> };
>
> could become
>
> cdef class A_py:
>     cdef A *ptr
>     cdef A value
>     cdef object ref
>     def __cinit__(self, **kwds):
>         self.ptr = &a
>         [populate value fields based on kwds]
>     def __getattr__(self, name):
>         if name == 'x':
>             return ptr.x
>         elif name == 'b':
>             return wrap_B(&ptr.b, self)

I'd even generate separate properties for the fields instead of going
through __getattr__() and friends.


>     def __setattr__(self, name, value):
>         if name == 'x':
>             ptr.x = value
>         elif name == 'b':
>             if value instanceof B_py:

Someone's been writing a lot of non-Python code lately...

>                 ptr.b = (<B_py>value).value
>             else:
>                 ptr.b = value # try coercion
>
> cdef A_class wrap_A(A* a, ref):
>     cdef A_py a_py = A_py()
>     a_py.ptr = a
>     a_py.ref = ref
>     return a_py
>
> Arrays would turn into classes with __getitem__ rather than
> __getattr__ but otherwise the same (perhaps char arrays would be
> handled specially). Pointers could be trickier, but could be solved in
> a manner similar to arrays by keeping a reference to a python wrapper
> around (it's unclear how many contiguous copies of X an X* should
> have, but this could be decided dynamically). All in all, given a
> nice programmatic description of the structs to wrap, it sounds like
> maybe a day's work.

Absolutely.

Stefan