[Python-ideas] weakrefs

Ethan Furman

May 17, 2012, 11:10:40 AM

to python...@python.org

From the manual [8.11]:

> A weak reference to an object is not enough to keep the object alive:
> when the only remaining references to a referent are weak references,
> garbage collection is free to destroy the referent and reuse its
> memory for something else.

This leads to a difference in behaviour between CPython and the other
implementations: CPython will (currently) immediately destroy any
objects that only have weak references to them with the result that
trying to access said object will require making a new one; other
implementations (at least PyPy, and presumably the others that don't use
ref-count gc's) can "reach into the grave" and pull back objects that
don't have any strong references left.

I would like to have the guarantees for weakrefs strengthened such that
any weakref'ed object that has no strong references left will return
None instead of the object, even if the object has not yet been garbage
collected.

Without this stronger guarantee programs that are relying on weakrefs to
disappear when strong refs are gone end up relying on the gc method
instead, with the result that the program behaves differently on
different implementations.

~Ethan~
_______________________________________________
Python-ideas mailing list
Python...@python.org
http://mail.python.org/mailman/listinfo/python-ideas

Antoine Pitrou

May 17, 2012, 11:44:29 AM

to python...@python.org

On Thu, 17 May 2012 08:10:40 -0700
Ethan Furman <et...@stoneleaf.us> wrote:
> From the manual [8.11]:
>
> > A weak reference to an object is not enough to keep the object alive:
> > when the only remaining references to a referent are weak references,
> > garbage collection is free to destroy the referent and reuse its
> > memory for something else.
>
> This leads to a difference in behaviour between CPython and the other
> implementations: CPython will (currently) immediately destroy any
> objects that only have weak references to them with the result that
> trying to access said object will require making a new one;

This is only true if the object isn't caught in a reference cycle.

> Without this stronger guarantee programs that are relying on weakrefs to
> disappear when strong refs are gone end up relying on the gc method
> instead, with the result that the program behaves differently on
> different implementations.

Why would they "rely on weakrefs to disappear when strong refs are
gone"? What is the use case?

Regards

Antoine.

Chris Kaynor

May 17, 2012, 1:13:15 PM

to python...@python.org

On Thu, May 17, 2012 at 8:44 AM, Antoine Pitrou <soli...@pitrou.net> wrote:

> On Thu, 17 May 2012 08:10:40 -0700
> Ethan Furman <et...@stoneleaf.us> wrote:
> > From the manual [8.11]:
> >
> > > A weak reference to an object is not enough to keep the object alive:
> > > when the only remaining references to a referent are weak references,
> > > garbage collection is free to destroy the referent and reuse its
> > > memory for something else.
> >
> > This leads to a difference in behaviour between CPython and the other
> > implementations: CPython will (currently) immediately destroy any
> > objects that only have weak references to them with the result that
> > trying to access said object will require making a new one;
>
> This is only true if the object isn't caught in a reference cycle.

To illustrate this, consider the following example, run in CPython 2.6:

>>> import weakref
>>> import gc
>>>
>>> class O(object):
...     pass
...
>>> a = O()
>>> b = O()
>>> a.x = b
>>> b.x = a
>>>
>>> w = weakref.ref(a)
>>>
>>> del a, b
>>>
>>> print w()
<__main__.O object at 0x0000000003C78B38>
>>>
>>> gc.collect()
20
>>>
>>> print w()
None
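
For anyone repeating this experiment on Python 3, it can be wrapped in two small functions. This is only a sketch: the names `Node`, `cycle_demo` and `no_cycle_demo` are made up for illustration, and the behaviour noted below is what CPython's reference counting gives; other implementations may well differ.

```python
import gc
import weakref


class Node:
    """Plain object; instances accept arbitrary attributes."""


def cycle_demo():
    """Return (alive_before_collect, alive_after_collect) for a cyclic pair."""
    was_enabled = gc.isenabled()
    gc.disable()  # stop an automatic collection from running mid-demo
    try:
        a, b = Node(), Node()
        a.x, b.x = b, a             # a and b now refer to each other
        w = weakref.ref(a)
        del a, b                    # only the cycle's own references remain
        before = w() is not None
        gc.collect()                # ask the cycle detector to run
        after = w() is not None
    finally:
        if was_enabled:
            gc.enable()
    return before, after


def no_cycle_demo():
    """Return whether a cycle-free object is still reachable after del."""
    a = Node()
    w = weakref.ref(a)
    del a
    return w() is not None
```

On CPython I would expect `cycle_demo()` to report the object as alive before the collection and gone after it, and `no_cycle_demo()` to report it gone straight away. On an implementation without reference counting the second result is not guaranteed.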

Greg Ewing

May 17, 2012, 6:49:05 PM

to python...@python.org

Ethan Furman wrote:
> I would like to have the guarantees for weakrefs strengthened such that
> any weakref'ed object that has no strong references left will return
> None instead of the object, even if the object has not yet been garbage
> collected.

Why do you want this guarantee? It would complicate
implementations for which ref counting is not the
native method of managing memory.

--
Greg

stoneleaf

May 18, 2012, 12:08:48 PM

to python...@python.org

On May 17, 8:10 am, Ethan Furman wrote:
> From the manual [8.11]:
>
>> A weak reference to an object is not enough to keep the object alive:
>> when the only remaining references to a referent are weak references,
>> garbage collection is free to destroy the referent and reuse its
>> memory for something else.
>
> This leads to a difference in behaviour between CPython and the other
> implementations: CPython will (currently) immediately destroy any
> objects that only have weak references to them with the result that
> trying to access said object will require making a new one; other
> implementations (at least PyPy, and presumably the others that don't use
> ref-count gc's) can "reach into the grave" and pull back objects that
> don't have any strong references left.

Antoine Pitrou wrote:
> This is only true if the object isn't caught in a reference cycle.

Good point -- so I would like the proposed change in CPython as well.

Ethan Furman wrote:
> I would like to have the guarantees for weakrefs strengthened such that
> any weakref'ed object that has no strong references left will return
> None instead of the object, even if the object has not yet been garbage
> collected.
>
> Without this stronger guarantee programs that are relying on weakrefs to
> disappear when strong refs are gone end up relying on the gc method
> instead, with the result that the program behaves differently on
> different implementations.

Antoine Pitrou wrote:
> Why would they "rely on weakrefs to disappear when strong refs are
> gone"? What is the use case?

Greg Ewing wrote:
> Why do you want this guarantee? It would complicate
> implementations for which ref counting is not the
> native method of managing memory.

My dbf module provides direct access to dbf files. A retrieved record is
a singleton object, and allows temporary changes that are not written to
disk. Whether those changes are seen by the next incarnation depends on
(I had thought) whether or not the record with the unwritten changes has
gone out of scope.

I see two questions that determine whether this change should be made:

1) How difficult it would be for the non-ref counting implementations
   to implement

2) Whether it's appropriate to have objects be changed, but not saved,
   and then discarded when the strong references are gone so the next
   incarnation doesn't see the changes, even if the object hasn't been
   destroyed yet.

~Ethan~

FYI: For dbf I am going to disallow temporary changes so this won't be
an immediate issue for me.

Masklinn

May 18, 2012, 12:38:00 PM

to stoneleaf, python...@python.org

On 2012-05-18, at 18:08 , stoneleaf wrote:
>
> My dbf module provides direct access to dbf files. A retrieved record is
> a singleton object, and allows temporary changes that are not written to
> disk. Whether those changes are seen by the next incarnation depends on
> (I had thought) whether or not the record with the unwritten changes has
> gone out of scope.

If a record is a singleton, that singleton-ification would be handled
through weakrefs would it not?

In that case, until the GC is triggered (and the weakref is
invalidated), you will keep getting your initial singleton and there
will be no "next record", I fail to see why that would be an issue.
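
To make the setup under discussion concrete, here is a minimal sketch of a weakref-based per-row singleton cache. `Table` and `Record` are hypothetical stand-ins invented for this example; they are not the dbf module's actual API.

```python
import weakref


class Record:
    """Hypothetical in-memory view of one row."""

    def __init__(self, number):
        self.number = number
        self.fields = {}  # unsaved, temporary changes would live here


class Table:
    """Hypothetical table handing out one Record object per row number."""

    def __init__(self):
        # Values are held weakly: an entry vanishes once its Record is freed.
        self._live = weakref.WeakValueDictionary()

    def record(self, number):
        rec = self._live.get(number)
        if rec is None:  # never fetched, or already collected
            rec = Record(number)
            self._live[number] = rec
        return rec
```

While any strong reference to a record exists, `table.record(n)` keeps returning that same object. Once the last strong reference is gone, whether the next call returns a fresh `Record` or the old one with its unsaved fields depends on whether the old object has been reclaimed yet, which is the implementation difference this thread is about.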

> I see two questions that determine whether this change should be made:
>
> 1) How difficult it would be for the non-ref counting implementations
>    to implement
>

Pretty much impossible I'd expect, the weakrefs can only be broken on GC
runs (at object deallocation) and that is generally non-deterministic
without specifying precisely which type of GC implementation is used.
You'd need a fully deterministic deallocation model to ensure a weakref
is broken as soon as the corresponding object has no outstanding strong
(and soft, in some VMs like the JVM) reference.
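
A small sketch of that point, using the optional callback argument of `weakref.ref`. The `Record` class and the `watch_deallocation` helper are invented for illustration; the only assumption is that the callback fires when the referent is reclaimed, not when the last strong reference is dropped.

```python
import gc
import weakref


class Record:
    """Hypothetical stand-in for any weakly referenceable object."""


def watch_deallocation():
    """Return (ref_is_dead, events) after dropping the only strong reference."""
    events = []
    rec = Record()
    # The callback runs when the referent is about to be finalized,
    # not at the moment the last strong reference disappears.
    ref = weakref.ref(rec, lambda dead_ref: events.append("weakref cleared"))
    del rec
    events.append("strong reference dropped")
    gc.collect()  # request a collection so the object is reclaimed
    events.append("collection requested")
    return ref() is None, events
```

On CPython the "weakref cleared" entry should come first, because reference counting frees the object during `del rec`. With a tracing collector I would expect it to show up only once a collection has actually run.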

> 2) Whether it's appropriate to have objects be changed, but not saved,
>    and then discarded when the strong references are gone so the next
>    incarnation doesn't see the changes, even if the object hasn't been
>    destroyed yet.

If your saves are synchronized with the weakref being broken (the object
being *effectively* collected) and the singleton behavior is as well,
there will be no difference, I'm not sure what the issue would be, you
might just have a second change cycle using the same unsaved (but still
modified) object.

Although frankly speaking such reliance on non-deterministic events would
scare the shit out of me.

stoneleaf

May 18, 2012, 10:54:08 PM

to python...@python.org

On May 18, 9:38 am, Masklinn wrote:
> On 2012-05-18, at 18:08 , stoneleaf wrote:
>> My dbf module provides direct access to dbf files. A retrieved record is
>> a singleton object, and allows temporary changes that are not written to
>> disk. Whether those changes are seen by the next incarnation depends on
>> (I had thought) whether or not the record with the unwritten changes has
>> gone out of scope.
>
> If a record is a singleton, that singleton-ification would be handled
> through weakrefs would it not?

Indeed, that is the current behavior.

> In that case, until the GC is triggered (and the weakref is
> invalidated), you will keep getting your initial singleton and there
> will be no "next record", I fail to see why that would be an issue.

Because, since I had only been using CPython, I was able to count on
records that had gone out of scope disappearing along with their
_temporary_ changes. If I get that same record back the next time I loop
through the table -- well, then the changes weren't temporary, were they?

>> I see two questions that determine whether this change should be made:
>
>> 1) How difficult it would be for the non-ref counting
>> implementations to implement
>
> Pretty much impossible I'd expect, the weakrefs can only be broken on GC
> runs (at object deallocation) and that is generally non-deterministic
> without specifying precisely which type of GC implementation is used.
> You'd need a fully deterministic deallocation model to ensure a weakref
> is broken as soon as the corresponding object has no outstanding strong
> (and soft, in some VMs like the JVM) reference.
>
>> 2) Whether it's appropriate to have objects be changed, but not
>> saved, and then discarded when the strong references are gone so the
>> next incarnation doesn't see the changes, even if the object hasn't
>> been destroyed yet.
>
> If your saves are synchronized with the weakref being broken (the object
> being *effectively* collected) and the singleton behavior is as well,
> there will be no difference, I'm not sure what the issue would be, you
> might just have a second change cycle using the same unsaved (but still
> modified) object.

And that's exactly the problem -- I don't want to see the modifications the
second time 'round, and if I can't count on weakrefs invalidating as soon as
the strong refs are gone I'll have to completely rethink how I handle records
from the table.

> Although frankly speaking such reliance on non-deterministic events would
> scare the shit out of me.

Indeed -- I hadn't realized that I was until somebody using PyPy noticed the
problem.

~Ethan~

Michael Foord

May 19, 2012, 8:33:35 AM

to stoneleaf, python...@python.org

On 19 May 2012 03:54, stoneleaf <et...@stoneleaf.us> wrote:

> On May 18, 9:38 am, Masklinn wrote:
> > On 2012-05-18, at 18:08 , stoneleaf wrote:
> >> My dbf module provides direct access to dbf files. A retrieved record is
> >> a singleton object, and allows temporary changes that are not written to
> >> disk. Whether those changes are seen by the next incarnation depends on
> >> (I had thought) whether or not the record with the unwritten changes has
> >> gone out of scope.
> >
> > If a record is a singleton, that singleton-ification would be handled
> > through weakrefs would it not?
>
> Indeed, that is the current behavior.
>
> > In that case, until the GC is triggered (and the weakref is
> > invalidated), you will keep getting your initial singleton and there
> > will be no "next record", I fail to see why that would be an issue.
>
> Because, since I had only been using CPython, I was able to count on
> records that had gone out of scope disappearing along with their
> _temporary_ changes. If I get that same record back the next time I loop
> through the table -- well, then the changes weren't temporary, were they?

So you're taking a *dependence* on the reference counting garbage collection of the CPython implementation, and when that doesn't work for you with other implementations trying to force the same semantics on them. Your proposal can't reasonably be implemented by other implementations as working out whether there are any references to an object is an expensive operation.

A much better technique would be for you to use explicit life-cycle-management (like the with statement) for your objects.
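
One way that suggestion could look, as a rough sketch only: a context manager that snapshots a record's fields on entry and restores them on exit, so temporary changes end at a well-defined point instead of whenever the object happens to be collected. `Record` and `temporary_changes` are hypothetical names, not part of the dbf module.

```python
from contextlib import contextmanager


class Record:
    """Hypothetical record holding its field values in a dict."""

    def __init__(self, **fields):
        self.fields = dict(fields)


@contextmanager
def temporary_changes(record):
    """Let the caller modify record.fields, then restore the old values."""
    saved = dict(record.fields)  # shallow snapshot of the current values
    try:
        yield record
    finally:
        record.fields = saved    # discard whatever changed inside the block
```

Usage would be `with temporary_changes(rec) as r: ...`, after which `rec.fields` holds its original values again on any implementation. The snapshot is shallow, so mutable field values would need a deep copy instead.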

Michael

--

http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others

May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html

stoneleaf

May 19, 2012, 11:29:02 AM

to python...@python.org

On May 19, 5:33 am, Michael Foord wrote:
> So you're taking a *dependence* on the reference counting garbage
> collection of the CPython implementation, and when that doesn't work for
> you with other implementations trying to force the same semantics on them.

I am not trying to force anything. I stated what I would like, and followed
up with questions to further the discussion.

> Your proposal can't reasonably be implemented by other implementations as
> working out whether there are any references to an object is an expensive
> operation.

Then that nixes it. The (debatable) advantages aren't worth a large
expenditure in programmer time, nor a large hit in performance.

> A much better technique would be for you to use explicit
> life-cycle-management (like the with statement) for your objects.

I'm leaning strongly towards just not allowing temporary changes, which will
also solve my problem.

Thanks everyone for the feedback.
