How to prevent garbage collecting an object?

Zak

Jun 24, 2013, 11:37:58 AM
to cython...@googlegroups.com
Hello All,

I want to dynamically generate a Python dictionary, with two caveats:

1. The Python dictionary should not be accessible from interpreted Python (only accessible within the Cython module).

2. The Python dictionary should NOT be garbage collected.

First, I tried this (inside a Cython module, my_code.pyx):

cdef object make_dict():
    the_dict = {'foo': 'bar'}
    return the_dict

def do_stuff():
    the_dict = make_dict()
    # I'm worried the_dict will be garbage collected right here
    # I want to keep it and use it, for example:
    return the_dict['foo']

The code above works, but I am afraid the_dict will be garbage collected. I am not sure how Cython works, but it seems like the function make_dict() may be returning a pointer to a Python dictionary. Once make_dict() finishes and returns, it seems like the memory actually storing the dictionary may be garbage collected (freed), leaving us with a dangling pointer. In theory, the last line of do_stuff() might cause a segmentation fault. In practice, my code above seems to work, but I am afraid it is just because garbage collection hasn't happened yet.

Is the following code safer?

cdef object THE_DICT

cdef void make_dict(output_dict):
    output_dict = {'foo': 'bar'}

make_dict(THE_DICT)

def do_stuff():
    return THE_DICT['foo']

Basically, I am asking how Cython garbage collection works. Is the_dict eligible for garbage collection at any time in the first code snippet? What about the second code snippet? What are the rules? It looks like Cython uses reference counting garbage collection, and I have looked at the generated C code, but I am not sure I understand what I see.

Is there some explicit way to tell Cython "do not garbage collect this"?

Thank you,

Zak

Chris Barker - NOAA Federal

Jun 24, 2013, 12:22:32 PM
to cython...@googlegroups.com
On Mon, Jun 24, 2013 at 8:37 AM, Zak <cyt...@m.allo.ws> wrote:

> I want to dynamically generate a Python dictionary, with two caveats:
>
> 1. The Python dictionary should not be accessible from interpreted Python
> (only accessible within the Cython module).
>
> 2. The Python dictionary should NOT be garbage collected.

never, ever? or simply not when you still need it?

> First, I tried this (inside a Cython module, my_code.pyx):
>
> cdef object make_dict():
>     the_dict = {'foo': 'bar'}
>     return the_dict

I'd probably do "cdef dict make_dict():" -- why not tell Cython that
this always returns a dict?

> def do_stuff():
>     the_dict = make_dict()
>     # I'm worried the_dict will be garbage collected right here
>     # I want to keep it and use it, for example:
>     return the_dict['foo']
>
> The code above works, but I am afraid the_dict will be garbage collected. I
> am not sure how Cython works, but it seems like the function make_dict() may
> be returning a pointer to a Python dictionary.

yes.

> Once make_dict() finishes and
> returns, it seems like the memory actually storing the dictionary may be
> garbage collected (freed), leaving us with a dangling pointer.

no -- Cython uses Python's reference counting for python objects --
that's part of the point, you still get Python's memory management.

So when you do:

the_dict = make_dict()

the dict's reference count is increased, so it won't get cleared out
until that reference goes away.

You might want to play with calling sys.getrefcount() in various
places to watch what happens:

"""
sys.getrefcount(object)

Return the reference count of the object. The count returned is
generally one higher than you might expect, because it includes the
(temporary) reference as an argument to getrefcount().
"""

> I am afraid it is just because garbage
> collection hasn't happened yet.

Python uses a reference counting scheme, so objects are deleted as
soon as their reference count goes to zero.
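The following small, CPython-specific sketch shows "deleted as soon as the count goes to zero". Plain dicts cannot be weakly referenced, so it uses a trivial stand-in class (`Payload`, a hypothetical name). A weakref serves as a probe, because it does not itself keep the object alive:

```python
import weakref

class Payload:
    """Stand-in object: plain dicts cannot be weakly referenced."""

def demo():
    obj = Payload()
    probe = weakref.ref(obj)   # a weak reference does not keep obj alive
    alive = probe() is obj     # one strong reference (the name obj) exists
    del obj                    # the count drops to zero ...
    freed = probe() is None    # ... and CPython frees the object right away
    return alive, freed

print(demo())  # (True, True) on CPython
```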

> Is the following code safer?
>
> cdef object THE_DICT
>
> cdef void make_dict(output_dict):
>     output_dict = {'foo': 'bar'}
>
> make_dict(THE_DICT)
>
> def do_stuff():
>     return THE_DICT['foo']

does this even work? In:

cdef void make_dict(output_dict):
    output_dict = {'foo': 'bar'}

you are passing output_dict into the function, but then in:

output_dict = {'foo': 'bar'}

you are assigning a NEW dict to the name output_dict -- so you would
not have changed the dict passed in.

If you really want to do this, you need to mutate the dict passed in,
rather than making a new one:

cdef void make_dict(output_dict):
    output_dict.clear()
    output_dict['foo'] = 'bar'
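Rebinding a parameter and mutating the object it refers to behave differently. The plain-Python sketch below shows this; the semantics match Cython's for `object` arguments. `rebind` and `mutate` are illustrative names:

```python
def rebind(output_dict):
    # Assignment binds the *local* name to a new dict; the caller's
    # object is untouched, and the new dict is freed when we return.
    output_dict = {'foo': 'bar'}

def mutate(output_dict):
    # Method calls and item assignment change the caller's object itself.
    output_dict.clear()
    output_dict['foo'] = 'bar'

THE_DICT = {}      # the object must exist first; mutate(None) would fail
rebind(THE_DICT)
print(THE_DICT)    # {}
mutate(THE_DICT)
print(THE_DICT)    # {'foo': 'bar'}
```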


But this would let you have only a single dict in the module
namespace, and I suspect you're trying to solve a problem you don't
have.


> Is there some explicit way to tell Cython "do not garbage collect this"?

I'm not sure about that, though you could explicitly increase the
reference count -- ugly hack.

If it's a cdef class attribute, it won't get deleted as long as the
class instance is there. Note that in that case, and with your module
attribute THE_DICT above, the cdef declaration simply tells Cython that
there should be a name there with that type (creating a pointer); the
actual object still needs to be created somewhere. That could be on
the same line, but it's an additional operation:

cdef dict THE_DICT = {}
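Here is a plain-Python sketch of the class-attribute case: an object stored on an instance lives as long as something references that instance. `Holder` and `Payload` are hypothetical stand-ins. A real cdef class attribute declared without `public` or `readonly` would also be hidden from Python code. The weakref probe assumes CPython's immediate deallocation:

```python
import weakref

class Payload:
    """Stand-in object: plain dicts cannot be weakly referenced."""

class Holder:
    """Plain-Python stand-in for a cdef class with an object attribute."""
    def __init__(self):
        self.data = Payload()   # the instance owns a strong reference

def demo():
    holder = Holder()
    probe = weakref.ref(holder.data)
    kept = probe() is not None  # alive as long as holder exists
    del holder                  # holder freed -> its reference to data released
    released = probe() is None
    return kept, released

print(demo())  # (True, True) on CPython
```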

HTH,
-Chris



--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris....@noaa.gov

Zak

Jun 24, 2013, 2:58:06 PM
to cython...@googlegroups.com
Thank you, Chris, your reply was very helpful in several respects.

1. I had tried the first code snippet, and it did work, I was just afraid that I was relying on undefined behavior of the garbage collector. Thank you for explaining more fully how the garbage collector works. I am going to continue using the first method.

2. Thank you for introducing me to sys.getrefcount(object), that is a useful tool.

3. I didn't know that Cython allowed static type declarations for Python types like 'dict', I thought Cython only allowed C types (int, float, double) and one special all-encompassing Python type, namely 'object'. I just played around with functions like this:

cpdef dict my_func(int x):
    return x

Unfortunately, the code above compiles just fine. Ideally, I think it should be a compiler error. You can import the module, but when you call my_func(5) there is a run-time error, a TypeError. So, Cython does not enforce type checking at compile-time for Python types. Cython only does compile-time type checking for C types, it seems.

4. I had not actually tried to compile or run the second code snippet. It compiles, but it does not run as I expected because THE_DICT is equal to None, and assigning it inside a function does not mutate the global variable. I wasn't thinking clearly when I wrote the code snippet.
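The usual fix for that last point is the `global` statement, which makes an assignment inside a function rebind the module-level name. The sketch below is plain Python. The same statement is assumed to work for a module-level `cdef object` variable in Cython, but that is not shown here:

```python
THE_DICT = None  # module-level name, playing the role of `cdef object THE_DICT`

def make_dict():
    # `global` makes the assignment rebind the module-level name
    # instead of creating a new local variable.
    global THE_DICT
    THE_DICT = {'foo': 'bar'}

def do_stuff():
    return THE_DICT['foo']

make_dict()
print(do_stuff())  # bar
```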

Thanks again,

Zak

Chris Barker - NOAA Federal

Jun 24, 2013, 3:49:08 PM
to cython...@googlegroups.com
On Mon, Jun 24, 2013 at 11:58 AM, Zak <cyt...@m.allo.ws> wrote:

> 3. I didn't know that Cython allowed static type declarations for Python
> types like 'dict', I though Cython only allowed C types (int, float, double)
> and one special all-encompassing Python type, namely 'object'. I just played
> around with functions like this:
>
> cpdef dict my_func(int x):
>     return x
>
> Unfortunately, the code above compiles just fine. Ideally, I think it should
> be a compiler error.

you'd think -- but that would require some smarts for the general case
-- i.e. static type analysis -- though maybe these simple cases could
be addressed.

If you take a look at the annotated (html) generated code for this:

cython -a the_file.pyx

you'll see that the type checking is done on the "return" line -- so
yes, at run time.
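The decorator below is a hypothetical plain-Python analogue of the check performed at the `return` of a `cdef dict` function. Defining `my_func` succeeds, and the TypeError only surfaces when it returns the wrong thing. It lets `None` through, as Cython's typed returns do. The exact message and subclass handling are assumptions, not Cython's generated code:

```python
import functools

def returns(expected_type):
    """Hypothetical decorator: check the return value's type at return time."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # None is accepted, like a Python-typed return in Cython.
            if result is not None and not isinstance(result, expected_type):
                raise TypeError(
                    f"Expected {expected_type.__name__}, got {type(result).__name__}")
            return result
        return wrapper
    return decorate

@returns(dict)
def my_func(x):
    return x  # defining this is fine; nothing is checked until it returns

try:
    my_func(5)
except TypeError as exc:
    message = str(exc)
print(message)  # Expected dict, got int
```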

But it does mean that cython code calling this function can count on
it returning a dict, which could be helpful.

-CHB

Zak

Jun 24, 2013, 6:01:40 PM
to cython...@googlegroups.com
On Monday, June 24, 2013 3:49:08 PM UTC-4, Chris Barker - NOAA Federal wrote:
> But it does mean that cython code calling this function can count on
> it returning a dict, which could be helpful.

I am not sure what you mean. I have tried a few things, and I always get the error at run-time. Cython is certainly doing type checking, but if the expected type (dict) and the actual type (Python integer) are both Python types and not C types, it seems the error always appears at run-time, never at compile-time. For instance, this compiles fine:

cdef dict first_func(int x):
    return x

cdef int second_func(int x):
    cdef dict y
    y = first_func(x)
    return 5

The code above causes a run-time error, but I feel that in an ideal world it should be a compile-time error.

Thanks,

Zak

Chris Barker - NOAA Federal

Jun 24, 2013, 7:16:50 PM
to cython...@googlegroups.com
On Mon, Jun 24, 2013 at 3:01 PM, Zak <cyt...@m.allo.ws> wrote:
> On Monday, June 24, 2013 3:49:08 PM UTC-4, Chris Barker - NOAA Federal

> cdef dict first_func(int x):
>     return x
>
> cdef int second_func(int x):
>     cdef dict y
>     y = first_func(x)
>     return 5
>
> The code above causes a run-time error, but I feel that in an ideal world it
> should be a compile-time error.

indeed -- I agree -- I was just suggesting that there may still be a
benefit to typing cdef functions to a particular python type, even if
you don't get the full compile time checking you could theoretically
get.

-Chris

Robert Bradshaw

Jun 24, 2013, 11:15:29 PM
to cython...@googlegroups.com
On Mon, Jun 24, 2013 at 3:01 PM, Zak <cyt...@m.allo.ws> wrote:
Yes, it should be.

- Robert

Stefan Behnel

Jun 25, 2013, 12:17:31 AM
to Cython-devel, cython...@googlegroups.com
Robert Bradshaw, 25.06.2013 05:15:
> On Mon, Jun 24, 2013 at 3:01 PM, Zak wrote:
>> I have tried a few things, and I always get
>> the error at run-time. Cython is certainly doing type checking, but if the
>> expected type (dict) and the actual type (Python integer) are both Python
>> types and not C types, it seems the error always appears at run-time, never
>> at compile-time. For instance, this compiles fine:
>>
>> cdef dict first_func(int x):
>>     return x
>>
>> cdef int second_func(int x):
>>     cdef dict y
>>     y = first_func(x)
>>     return 5
>>
>> The code above causes a run-time error, but I feel that in an ideal world it
>> should be a compile-time error.
>
> Yes, it should be.

Agreed. Cython has inherited this behaviour from Pyrex which originally
only knew "object", and we didn't do much about it since. There are rather
fuzzy limits to this, though. For example, this would be stupid but legal
wrt. language semantics:

cdef dict func():
    return None

cdef list x = func()

So, the only case that we can really handle is when we know the typed value
originated from a C type that cannot coerce to None nor to the expected
type, i.e. essentially this case:

cdef int something = 5
cdef dict x = something

whereas only slightly more involved code would end up passing through the
analysis, at least for now:

cdef int something = 5
cdef dict x = something + 999 # unknown integer size => object

I.e., this can only be handled during coercion of C types to known
(builtin) Python object types, not during assignment - although we do have
the may_be_none() predicate for nodes, and although there's still Vitja's
pending inference rewrite which I haven't had time to look into
recently. Both can be used to push the fuzzy border a bit further in the
right direction.
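The rule above can be written as a toy decision function. The type names are plain strings, and `can_reject_at_compile_time` is hypothetical, not a Cython internal. It models only the boundary described here: a direct coercion from a C integer type can be rejected, while anything typed `object` or possibly `None` is left to run time:

```python
# Hypothetical set of C integer type names, for illustration only.
C_INTEGER_TYPES = {"short", "int", "long"}

def can_reject_at_compile_time(source_type, target_type):
    """Toy model: can assigning a value of source_type to a variable typed
    target_type (a builtin Python type) be flagged at compile time?"""
    if source_type in C_INTEGER_TYPES:
        # A C integer always coerces to a Python int object and never to None,
        # so any other builtin target type is a guaranteed mismatch.
        return target_type != "int"
    # "object" (e.g. the result of `something + 999`) or a Python-typed value
    # that may be None: only a run-time check is possible.
    return False

print(can_reject_at_compile_time("int", "dict"))     # True
print(can_reject_at_compile_time("object", "dict"))  # False
print(can_reject_at_compile_time("dict", "list"))    # False: func() may return None
```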

Stefan
