Borrowed reference to cdef class?

Sturla Molden

Jan 6, 2015, 4:37:44 PM
to cython...@googlegroups.com

Basically, how do I do that? I cannot find it in the documentation.

Say I have a cdef class like this:

    cdef class FooBar:
        cdef:
            double x
            double y

I can get a borrowed reference to this object by doing:

    from cpython.ref cimport PyObject
    cdef FooBar foobar = FooBar()
    cdef PyObject *foobar_ptr = <PyObject*> foobar
    <...>

This is all well except I cannot access the fields x and y from
foobar_ptr because it is the wrong pointer type.

So how can I do this? Typecast to FooBar*? Or does Cython support this
at all?

Casting back to FooBar does not help,

    cdef FooBar foofoo = <FooBar> foobar_ptr

because now foofoo is not a borrowed reference.

Are there other tricks that can be used?

Is it e.g. possible to infer the C name of the C struct with which
FooBar is implemented?


This is by the way for scipy.spatial.cKDTree. It still has numerous
possibilities for leaking memory, and reference counting is not
acceptable in the algorithmic code. Currently it uses C structs which
are allocated by malloc and this is the source of the problem. If I
could make borrowed references from a cdef class, then the allocations
could be replaced by refcounted Python objects, but the algorithmic code
would still run without reference counting happening everywhere.


Sturla

Robert Bradshaw

Jan 6, 2015, 5:44:08 PM
to cython...@googlegroups.com
This is not currently supported. You could, however, put a C struct
with all the data in the cdef class and "borrow" a pointer to that.

- Robert

Sturla Molden

Jan 6, 2015, 6:17:14 PM
to cython...@googlegroups.com
On 06/01/15 23:43, Robert Bradshaw wrote:

> This is not currently supported. You could, however, put a C struct
> with all the data in the cdef class and "borrow" a pointer to that.

No, because I cannot get to the C struct without owning a reference to
the cdef class. :-(


I was wondering if it could be done with some C magic though.

Say I have

    cdef class Foobar:
        cdef:
            double x
            double y

and then mirror cdef class Foobar in borrow.h:

    #include <Python.h>

    typedef struct {
        PyObject_HEAD
        double x;
        double y;
    } Borrow;


Now this should work I think:

    cdef extern from "borrow.h":
        ctypedef struct Borrow:
            double x
            double y

    cdef Foobar foobar = Foobar()
    cdef Borrow *borrowed = <Borrow*> (<void*> foobar)

But of course this assumes that Cython implements the cdef class in the
"most obvious way", of which I am not sure...


Sturla

Robert Bradshaw

Jan 6, 2015, 8:00:05 PM
to cython...@googlegroups.com
On Tue, Jan 6, 2015 at 3:15 PM, Sturla Molden <sturla...@gmail.com> wrote:
> On 06/01/15 23:43, Robert Bradshaw wrote:
>
>> This is not currently supported. You could, however, put a C struct
>> with all the data in the cdef class and "borrow" a pointer to that.
>
> No, because I cannot get to the C struct without owning a reference to the
> cdef class. :-(

How is that different than what you have below?

> I was wondering if it could be done with some C magic though.
>
> Say I have
>
>     cdef class Foobar:
>         cdef:
>             double x
>             double y
>
> and then mirror cdef class Foobar in borrow.h:
>
>     #include <Python.h>
>
>     typedef struct {
>         PyObject_HEAD
>         double x;
>         double y;
>     } Borrow;
>
>
> Now this should work I think:
>
>     cdef extern from "borrow.h":
>         ctypedef struct Borrow:
>             double x
>             double y
>
>     cdef Foobar foobar = Foobar()
>     cdef Borrow *borrowed = <Borrow*> (<void*> foobar)
>
> But of course this assumes that Cython implements the cdef class in the
> "most obvious way", of which I am not sure...

It probably, usually, does. Even if you wanted to count on this
(current?) implementation detail it's a bit hackish. Why not

    cdef struct FooStruct:
        double x
        double y

    cdef class Foo:
        cdef FooStruct data

    cdef Foo foo = Foo()
    cdef FooStruct *foo_c = &foo.data

In your C code you would just pass around the bare FooStruct*. I
suppose you'd have a level of indirection for Python access, but it's
not too bad.

Sturla Molden

Jan 6, 2015, 10:32:15 PM
to cython...@googlegroups.com
On 07/01/15 01:59, Robert Bradshaw wrote:
> It probably, usually, does. Even if you wanted to count on this
> (current?) implementation detail it's a bit hackish.

It seems to be ok. If the class has cdef methods there is a vtab pointer
in the struct as well though.

But yeah, it is hackish, and it depends on an implementation detail.


> Why not
>
>     cdef struct FooStruct:
>         double x
>         double y
>
>     cdef class Foo:
>         cdef FooStruct data
>
>     cdef Foo foo = Foo()
>     cdef FooStruct *foo_c = &foo.data
>
> in your C code you would just pass around the bare FooStruct*. I
> suppose you'd have a level of indirection for Python access, but it's
> not too bad.
>

I thought about this a lot, but it does not really solve my problem.

First a tree like cKDTree is constructed of nodes, and I need to be able
to access the child nodes in the algorithm.

The second thing is that we are using heaps as priority queues. I want to
do an incref when a node is pushed into the heap and a decref when it is
popped off. But I cannot have any refcounting happening inside the heap.
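
A minimal C++ sketch of that push/pop discipline, under stated assumptions: `Node`, `incref`, `decref`, `ByPriority`, and `NodeHeap` are hypothetical names, and the plain `refcount` field only stands in for what `Py_INCREF`/`Py_DECREF` would do on a real Python object. The heap takes one reference when a node is pushed and gives it up when the node is popped; the sift operations in between move raw pointers and never touch the count.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for a refcounted object; a real cdef class
// instance would carry its count in the PyObject header instead.
struct Node {
    double priority;
    long refcount;
};

inline void incref(Node *n) { ++n->refcount; }  // stand-in for Py_INCREF
inline void decref(Node *n) { --n->refcount; }  // stand-in for Py_DECREF

// Orders raw pointers so that the smallest priority ends up on top.
struct ByPriority {
    bool operator()(const Node *a, const Node *b) const {
        return a->priority > b->priority;
    }
};

// Min-heap that owns exactly one reference per stored node.
class NodeHeap {
public:
    NodeHeap() = default;
    NodeHeap(const NodeHeap &) = delete;             // copying would duplicate
    NodeHeap &operator=(const NodeHeap &) = delete;  // pointers without increfs

    ~NodeHeap() {
        // Release the references the heap still holds.
        while (!heap_.empty()) {
            decref(heap_.top());
            heap_.pop();
        }
    }

    void push(Node *n) {
        incref(n);      // the heap now owns one reference
        heap_.push(n);  // sifting moves raw pointers only
    }

    // Removes the smallest node and releases the heap's reference.
    // The caller must hold its own reference to keep the node alive.
    Node *pop() {
        assert(!heap_.empty());
        Node *n = heap_.top();
        heap_.pop();
        decref(n);
        return n;
    }

    std::size_t size() const { return heap_.size(); }

private:
    std::priority_queue<Node *, std::vector<Node *>, ByPriority> heap_;
};
```

In this sketch the reference count changes only at the two boundaries (`push` and `pop`, plus the destructor for leftovers), so the cost is independent of how many swaps the heap performs internally.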

Right now scipy.spatial.cKDTree is written to use C structs:

https://github.com/scipy/scipy/blob/master/scipy/spatial/ckdtree.pyx

But the query method has been notorious for leaking memory. The main
problem is use of malloc inside a complex algorithm. I have spent a lot
of time trying to weed out the leaks, but it is really too complicated...

So now I want to let Python deal with the memory, but it cannot impact
the performance of the data structure.

Another thing is that it would be nice to support pickle, as well as
letting the tree be viewable from Python. This too points to using cdef
classes instead of plain C structs.



I am afraid to suggest this, but perhaps Cython needs a "borrowed"
keyword to indicate a borrowed reference, so we could

    cdef object foo = Foo()
    cdef borrowed object bfoo = foo

And the only thing borrowed would do is to turn off reference counting.


Another possibility would be if there was a compiler directive to
enforce a C name for a cdef class.

    @cython.cname('CFoo')
    cdef class Foo:
        pass

    cdef extern from *:
        struct CFoo:
            pass

    cdef Foo foo = Foo()
    cdef CFoo *cfoo = <CFoo*> (<void*> foo)



Sturla


Robert Bradshaw

Jan 6, 2015, 11:40:18 PM
to cython...@googlegroups.com
On Tue, Jan 6, 2015 at 7:30 PM, Sturla Molden <sturla...@gmail.com> wrote:
> On 07/01/15 01:59, Robert Bradshaw wrote:
>> It probably, usually, does. Even if you wanted to count on this
>> (current?) implementation detail it's a bit hackish.
>
> It seems to be ok. If the class has cdef methods there is a vtab pointer in
> the struct as well though.
>
> But yeah, it is hackish, and it depends on an implementation detail.
>
>
>> Why not
>>
>>     cdef struct FooStruct:
>>         double x
>>         double y
>>
>>     cdef class Foo:
>>         cdef FooStruct data
>>
>>     cdef Foo foo = Foo()
>>     cdef FooStruct *foo_c = &foo.data
>>
>> in your C code you would just pass around the bare FooStruct*. I
>> suppose you'd have a level of indirection for Python access, but it's
>> not too bad.
>>
>
> I thought about this a lot, but it does not really solve my problem.
>
> First a tree like cKDTree is constructed of nodes, and I need to be able to
> access the child nodes in the algorithm.

Ah, so you want to traverse an entire data structure without
refcounting. Of course you'd need the GIL and refcounting to safely
modify it... (well, I suppose you could swap nodes or do other
refcount-preserving operations). You'd still want to make sure no one
else is concurrently touching your structure.

> The second thing is that we are using heaps as priority queues. I want to do
> an incref when a node is pushed into the heap and a decref when it is popped
> off. But I cannot have any refcounting happening inside the heap.
>
> Right now scipy.spatial.cKDTree is written to use C structs:
>
> https://github.com/scipy/scipy/blob/master/scipy/spatial/ckdtree.pyx
>
> But the query method has been notorious for leaking memory. The main problem
> is use of malloc inside a complex algorithm. I have spent a lot of time
> trying to weed out the leaks, but it is really too complicated...

I wonder if using a memory arena would make things much nicer here
(though wouldn't help with the other issues). Reminds me of a fellow
student of mine who was trying to convert someone's binary to be used
as a library, but it just malloc'd stuff left and right and assumed
process exit was right around the corner...
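
A memory arena in this sense could look roughly like the following C++ sketch. It is an illustration under stated assumptions, not code from cKDTree: `Arena`, `allocate`, and `block_count` are hypothetical names. Storage is handed out from large blocks, nothing is freed individually, and every block is released together when the arena goes out of scope, so an early return inside the algorithm cannot leak.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical bump-pointer arena: many allocations, one release.
class Arena {
public:
    explicit Arena(std::size_t block_size = 4096) : block_size_(block_size) {}
    Arena(const Arena &) = delete;
    Arena &operator=(const Arena &) = delete;

    // Returns nbytes of uninitialised storage, aligned for any scalar type.
    // The storage stays valid until the arena itself is destroyed.
    void *allocate(std::size_t nbytes) {
        const std::size_t align = alignof(std::max_align_t);
        // Round the request up to a multiple of the alignment.
        nbytes = (nbytes + align - 1) / align * align;
        if (blocks_.empty() || used_ + nbytes > capacity_) {
            // Start a new block, large enough for oversized requests.
            capacity_ = nbytes > block_size_ ? nbytes : block_size_;
            blocks_.push_back(std::unique_ptr<char[]>(new char[capacity_]));
            used_ = 0;
        }
        void *p = blocks_.back().get() + used_;
        used_ += nbytes;
        return p;
    }

    // Number of blocks obtained from the system so far.
    std::size_t block_count() const { return blocks_.size(); }

private:
    std::size_t block_size_;
    std::size_t capacity_ = 0;  // size of the current block
    std::size_t used_ = 0;      // bytes handed out from the current block
    std::vector<std::unique_ptr<char[]>> blocks_;
};
```

The trade-off in this design is that memory is only reclaimed in bulk, which suits a query function whose temporaries all die when the call returns.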

> So now I want to let Python deal with the memory, but it cannot impact the
> performance of the data structure.
>
> Another thing is that it would be nice to support pickle, as well as letting
> the tree be viewable from Python. This too points to using cdef classes
> instead of plain C structs.
>
> I am afraid to suggest this, but perhaps Cython needs a "borrowed" keyword
> to indicate a borrowed reference, so we could
>
>     cdef object foo = Foo()
>     cdef borrowed object bfoo = foo
>
> And the only thing borrowed would do is to turn off reference counting.

This has been toyed with before, one possible syntax is

    cdef Foo foo # refcounted
    cdef Foo *foo_ref # borrowed, one could do Foo **fffoo as well.

Seeing how much people mess up on char* conversion, I'm a bit wary of
making this too easy, as borrowed references can lead to subtle
heisenbugs.

Note that just referencing fields does not require a refcount, thus you can do

    cdef void* c_node = <void*>py_node
    print (<Foo>c_node).child

> Another possibility would be if there was a compiler directive to enforce a
> C name for a cdef class.
>
>     @cython.cname('CFoo')
>     cdef class Foo:
>         pass
>
>     cdef extern from *:
>         struct CFoo:
>             pass
>
>     cdef Foo foo = Foo()
>     cdef CFoo *cfoo = <CFoo*> (<void*> foo)

You can use the archaic

    cdef public class Foo [object FooC, type FooCType]:
        cdef int x
        cdef Foo child

(which should be available as a directive, but I'm not sure what) then write

    cdef extern from *:
        cdef struct FooC:
            int x
            FooC *child

Is that what you're looking for?

- Robert

Sturla Molden

Jan 6, 2015, 11:51:39 PM
to cython...@googlegroups.com
On 07/01/15 05:39, Robert Bradshaw wrote:

> You can use the archaic
>
>     cdef public class Foo [object FooC, type FooCType]:
>         cdef int x
>         cdef Foo child
>
> (which should be available as a directive, but I'm not sure what) then write
>
>     cdef extern from *:
>         cdef struct FooC:
>             int x
>             FooC *child
>
> Is that what you're looking for?


I have never seen this before, but it might be.


Sturla


Sturla Molden

Jan 7, 2015, 12:08:51 AM
to cython...@googlegroups.com
This is fantastic!!!

Thank you :-D


Sturla

Sturla Molden

Jan 7, 2015, 4:41:43 AM
to cython...@googlegroups.com
On 07/01/15 05:39, Robert Bradshaw wrote:

> I wonder if using a memory arena would make things much nicer here
> (though wouldn't help with the other issues).

I ended up doing both. Python object to construct the kd-tree, memory
pool inside the query function.

No speed penalty compared to the current version, but now we can inspect
the tree from Python, pickle the tree, and all memory leaks (and
possibilities of such) are gone.


> Reminds me of a fellow
> student of mine who was trying to convert someone's binary to be used
> as a library, but it just malloc'd stuff left and right and assumed
> process exit was right around the corner...

That is approximately how the original cKDTree was written... It
malloc'ed everywhere, and reclaimed about half of it... :-(

Sturla

Sturla Molden

Jan 7, 2015, 11:26:08 AM
to cython...@googlegroups.com
Sturla Molden <sturla...@gmail.com> wrote:

>> I wonder if using a memory arena would make things much nicer here
>> (though wouldn't help with the other issues).
>
> I ended up doing both. Python object to construct the kd-tree, memory
> pool inside the query function.

https://github.com/scipy/scipy/pull/4374

Matthew Honnibal

Jan 11, 2015, 2:51:20 PM
to cython...@googlegroups.com

> https://github.com/scipy/scipy/blob/master/scipy/spatial/ckdtree.pyx
>
> But the query method has been notorious for leaking memory. The main
> problem is use of malloc inside a complex algorithm. I have spent a lot
> of time trying to weed out the leaks, but it is really too complicated...
>
> So now I want to let Python deal with the memory, but it cannot impact
> the performance of the data structure.

The solution that I came up with for this is to use a memory pool tied to a Python object, but still use C structs:

 http://github.com/syllog1sm/cymem

I find this makes things quite easy, and if there are remaining problems, you can check which pool is growing in size, so it's easy to find the remaining leaks.
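
The idea of a pool whose size can be inspected might be sketched as follows in C++. This is not cymem's API (cymem is a Cython library); `TrackedPool`, `alloc`, `release`, `bytes_in_use`, and `live_allocations` are hypothetical names chosen for illustration. The pool remembers the size of every live allocation, frees whatever is left when it is destroyed, and reports how many bytes it currently holds, which is the figure one would watch to spot a pool that keeps growing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <unordered_map>

// Hypothetical pool that owns its allocations and tracks their sizes.
class TrackedPool {
public:
    TrackedPool() = default;
    TrackedPool(const TrackedPool &) = delete;
    TrackedPool &operator=(const TrackedPool &) = delete;

    ~TrackedPool() {
        // Anything not released explicitly is freed with the pool.
        for (auto &entry : sizes_) std::free(entry.first);
    }

    // Zero-initialised storage for `count` items of `item_size` bytes.
    // Both arguments are assumed to be non-zero.
    void *alloc(std::size_t count, std::size_t item_size) {
        void *p = std::calloc(count, item_size);
        if (p == nullptr) throw std::bad_alloc();
        sizes_[p] = count * item_size;
        bytes_ += count * item_size;
        return p;
    }

    // Frees one allocation early; p must have come from this pool.
    void release(void *p) {
        auto it = sizes_.find(p);
        assert(it != sizes_.end());
        bytes_ -= it->second;
        sizes_.erase(it);
        std::free(p);
    }

    std::size_t bytes_in_use() const { return bytes_; }
    std::size_t live_allocations() const { return sizes_.size(); }

private:
    std::unordered_map<void *, std::size_t> sizes_;  // address -> size
    std::size_t bytes_ = 0;
};
```

Tying one such pool to each long-lived owner means a leak shows up as a single pool whose `bytes_in_use()` keeps rising, rather than as anonymous process growth.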

Sturla Molden

Jan 11, 2015, 3:15:51 PM
to cython...@googlegroups.com
On 11/01/15 20:51, Matthew Honnibal wrote:

> The solution that I came up with for this is to use a memory pool tied
> to a Python object, but still use C structs:

That is what I ended up doing too.

Sturla

Sturla Molden

Jan 11, 2015, 3:36:05 PM
to cython...@googlegroups.com
Actually I ended up doing it in C++ (with RAII) because I needed it inside
a function that should release the GIL, but a Cython cdef class worked
similarly (albeit slightly slower because of a virtual allocate function).

Sturla


    struct nodeinfo {
        nodeinfo *next;
        nodeinfo *prev;
        ckdtreenode *node;
        npy_float64 side_distances[1]; // the good old struct hack
    };

    struct nodeinfo_pool {

        nodeinfo pool;    // sentinel head of the doubly linked list
        nodeinfo *tail;   // most recently allocated element
        npy_intp extra;   // extra side_distances slots per element

        nodeinfo_pool(npy_intp m) {
            tail = &pool;
            pool.next = NULL;
            extra = m-1;
        }

        ~nodeinfo_pool() {
            // walk back from the tail, freeing all but the sentinel
            nodeinfo *cur, *tmp;
            cur = tail;
            while (cur != &pool) {
                tmp = cur->prev;
                if (cur) delete [] ((char*)cur);
                cur = tmp;
            }
        }

        inline nodeinfo *allocate() {
            nodeinfo* ni = NULL;
            // one block holds the struct plus the extra distances
            ni = (nodeinfo*)(new char[sizeof(nodeinfo) +
                                      extra*sizeof(npy_float64)]);
            ni->next = NULL;
            ni->prev = tail;
            tail->next = ni;
            tail = ni;
            return ni;
        }
    };