[Python-Dev] PEP 556: Threaded garbage collection

Antoine Pitrou

Sep 8, 2017, 11:11:57 AM
to pytho...@python.org

Hello,

I've written a PEP by which you can tell the GC to run in a dedicated
thread. The goal is to solve reentrancy issues with finalizers:
https://www.python.org/dev/peps/pep-0556/
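The core idea, cyclic collection running on its own thread, can be roughly approximated in user code on today's CPython. This is not the mechanism PEP 556 proposes. It is a minimal sketch that disables automatic (inline) collection with `gc.disable()` and calls `gc.collect()` periodically from a dedicated thread, so finalizers of cyclic garbage run on that thread. `BackgroundCollector`, its `interval` parameter and the thread name `gc-thread` are hypothetical.

```python
import gc
import threading


class BackgroundCollector:
    """Hypothetical helper: run the cyclic GC only on one dedicated thread."""

    def __init__(self, interval: float = 0.1) -> None:
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, name="gc-thread", daemon=True)

    def start(self) -> None:
        gc.disable()  # no more automatic collections in whatever thread allocates
        self._thread.start()

    def _run(self) -> None:
        # Event.wait() returns False on timeout and True once stop() sets the event.
        while not self._stop.wait(self._interval):
            gc.collect()  # finalizers of cyclic garbage run here, on "gc-thread"

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
        gc.enable()   # back to the usual inline behaviour
        gc.collect()  # one last collection, on the calling thread


finalized_in = []
done = threading.Event()


class Node:
    def __init__(self) -> None:
        self.me = self  # self-reference: only the cyclic GC can reclaim this

    def __del__(self) -> None:
        finalized_in.append(threading.current_thread().name)
        done.set()


collector = BackgroundCollector(interval=0.05)
collector.start()
Node()  # unreachable at once, but kept alive by its own cycle
done.wait(timeout=5)
collector.stop()
print(finalized_in)  # on CPython: ['gc-thread']
```

This only relocates cyclic collection; an object whose refcount drops to zero is still finalized inline, on the thread that drops it.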

Regards

Antoine.

PS: I did not come up with the idea for this PEP while other people
were having nightmares. Any nightmares involved in this PEP are
fictional, and any resemblance to actual nightmares is purely
coincidental. No nightmares were harmed while writing this PEP.

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Nick Coghlan

Sep 8, 2017, 1:05:23 PM
to Antoine Pitrou, pytho...@python.org
On 8 September 2017 at 08:05, Antoine Pitrou <soli...@pitrou.net> wrote:
>
> Hello,
>
> I've written a PEP by which you can tell the GC to run in a dedicated
> thread. The goal is to solve reentrancy issues with finalizers:
> https://www.python.org/dev/peps/pep-0556/

+1 from me for the general concept. (Minor naming idea: "inline" may
be a clearer name for the current behaviour).

One point that seems worth noting: even with cyclic GC moved out to a
separate thread, __del__ methods and weakref callbacks triggered by a
refcount going to zero will still be called inline in the current
thread. Changing that would require a tweak to the semantics of
Py_DECREF where if the GC was in threaded mode, instead of finalizing
the object immediately, it would instead be placed on a FIFO queue
where the GC thread would pick it up and then actually delete it.
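The distinction Nick draws can be observed directly in CPython. In this small sketch (the `Resource` class is purely illustrative), the `__del__` of an object with no cycles runs at the exact `del` statement on the current thread, while an object kept alive by a self-reference is only finalized when the cyclic collector runs. Other implementations such as PyPy do not promise the first behaviour.

```python
import gc

events = []


class Resource:
    def __init__(self, name: str) -> None:
        self.name = name

    def __del__(self) -> None:
        events.append(f"finalized {self.name}")


gc.disable()  # let the cyclic collector run only when explicitly asked
try:
    plain = Resource("plain")
    del plain  # refcount hits zero: __del__ runs right here, on this thread
    events.append("after del plain")

    cyclic = Resource("cyclic")
    cyclic.me = cyclic  # self-reference keeps the refcount above zero
    del cyclic  # unreachable now, but not yet finalized
    events.append("after del cyclic")

    gc.collect()  # only the cyclic collector can reclaim it
    events.append("after gc.collect")
finally:
    gc.enable()

print(events)
# on CPython: ['finalized plain', 'after del plain', 'after del cyclic', 'finalized cyclic', 'after gc.collect']
```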

Cheers,
Nick.

> PS: I did not come up with the idea for this PEP while other people
> were having nightmares.

I dunno, debugging finalizer re-entrancy problems seems pretty
nightmarish to me ;)

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Antoine Pitrou

Sep 8, 2017, 1:10:30 PM
to pytho...@python.org
On Fri, 8 Sep 2017 10:03:33 -0700
Nick Coghlan <ncog...@gmail.com> wrote:

> On 8 September 2017 at 08:05, Antoine Pitrou <soli...@pitrou.net> wrote:
> >
> > Hello,
> >
> > I've written a PEP by which you can tell the GC to run in a dedicated
> > thread. The goal is to solve reentrancy issues with finalizers:
> > https://www.python.org/dev/peps/pep-0556/
>
> +1 from me for the general concept. (Minor naming idea: "inline" may
> be a clearer name for the current behaviour).
>
> One point that seems worth noting: even with cyclic GC moved out to a
> separate thread, __del__ methods and weakref callbacks triggered by a
> refcount going to zero will still be called inline in the current
> thread.

You're right, that bears mentioning. It's much less of a problem,
of course, since such calls happen at deterministic places.

Regards

Antoine.

Benjamin Peterson

Sep 8, 2017, 3:06:00 PM
to pytho...@python.org
I like it overall.

- I was wondering what happens during interpreter shutdown. I see you
have that listed as an open issue. How about simply shutting down the
finalization thread and not guaranteeing that finalizers are actually
ever run à la Java?

- Why not run all (Python) finalizers on the thread and not just ones
from cycles?

Larry Hastings

Sep 8, 2017, 3:26:28 PM
to pytho...@python.org



On 09/08/2017 12:04 PM, Benjamin Peterson wrote:
> - Why not run all (Python) finalizers on the thread and not just ones
> from cycles?

Two reasons:
  1. Because some code relies on the finalizer being called on the thread where the last reference is dropped.  This is usually the same thread where the object was created.  Some irritating third-party libraries make demands on callers, e.g. "you can only interact with / destroy X objects on your 'main thread'".  This is often true of windowing / GUI libraries.  (For example, I believe this was true of Direct3D, at least as of D3D8; it was also often true of Win32 USER objects.)
  2. Because there's so much work there.  In the Gilectomy prototype, I originally called all finalizers on the "reference count manager commit thread", the thread that also committed increfs and decrefs.  The thread immediately fell behind on its queue and never caught up.  I changed the Gilectomy so objects needing finalization are passed back to the thread where the last decref happened, for finalization on that thread; this was pleasingly self-balancing.
Note that I turned off cyclic GC on the Gilectomy prototype a long time ago and haven't revisited it since.  My very, very long-term plan for GC is to stop the world and run it from one thread.  With the current system, that means all those finalizers would be run on the thread chosen to run the GC.
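Larry's first point can be made concrete with a hypothetical wrapper (`ThreadAffineHandle` is invented for illustration, not a real library type) that remembers which thread created it and checks, in `__del__`, whether it is being released on that same thread. Running the finalizer on any other thread, whether a dedicated GC thread or just another application thread that happens to drop the last reference, trips the check.

```python
import threading


class ThreadAffineHandle:
    """Hypothetical stand-in for a GUI/graphics object that must be
    released on the thread that created it."""

    violations: list = []

    def __init__(self) -> None:
        self.owner = threading.get_ident()

    def __del__(self) -> None:
        if threading.get_ident() != self.owner:
            # A real native library might crash or leak here; we only record it.
            ThreadAffineHandle.violations.append(threading.current_thread().name)


holder = []
creator = threading.Thread(target=lambda: holder.append(ThreadAffineHandle()), name="creator")
creator.start()
creator.join()
holder.clear()  # the last reference is dropped on the main thread
print(ThreadAffineHandle.violations)  # ['MainThread']
```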


/arry

Antoine Pitrou

Sep 8, 2017, 3:27:44 PM
to pytho...@python.org
On Fri, 08 Sep 2017 12:04:10 -0700
Benjamin Peterson <benj...@python.org> wrote:
> I like it overall.
>
> - I was wondering what happens during interpreter shutdown. I see you
> have that listed as an open issue. How about simply shutting down the
> finalization thread and not guaranteeing that finalizers are actually
> ever run à la Java?

I don't know. People generally have expectations towards stuff being
finalized properly (especially when talking about files etc.).
Once the first implementation is devised, we will know more about
what's workable (perhaps we'll have to move _PyGC_Fini earlier in the
shutdown sequence? perhaps we'll want to switch back to serial mode
when shutting down?).

> - Why not run all (Python) finalizers on the thread and not just ones
> from cycles?

Because a lot of code probably expects them to be run as soon as the
last visible ref disappears.

Regards

Antoine.

Benjamin Peterson

Sep 8, 2017, 3:36:32 PM
to pytho...@python.org


On Fri, Sep 8, 2017, at 12:24, Larry Hastings wrote:
>
>
> On 09/08/2017 12:04 PM, Benjamin Peterson wrote:
> > - Why not run all (Python) finalizers on the thread and not just ones
> > from cycles?
>
> Two reasons:
>
> 1. Because some code relies on the finalizer being called on the thread
> where the last reference is dropped. This is usually the same
> thread where the object was created. Some irritating third-party
> libraries make demands on callers, e.g. "you can only interact with
> / destroy X objects on your 'main thread'". This is often true of
> windowing / GUI libraries. (For example, I believe this was true of
> Direct3D, at least as of D3D8; it was also often true of Win32 USER
> objects.)

Is the requirement that the construction and destruction be literally on
the same thread or merely non-concurrent? The GIL would provide the
latter.

> 2. Because there's so much work there. In the Gilectomy prototype, I
> originally called all finalizers on the "reference count manager
> commit thread", the thread that also committed increfs and decrefs.
> The thread immediately fell behind on its queue and never caught
> up. I changed the Gilectomy so objects needing finalization are
> passed back to the thread where the last decref happened, for
> finalization on that thread; this was pleasingly self-balancing.

I'm only suggesting Python-level __del__ methods be run on the separate
thread, not general deallocation work. I would think those would be few and far
between.
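Benjamin's suggestion, combined with Nick's earlier idea of a FIFO queue, can be sketched at the Python level. This is only an analogy under stated assumptions: a real implementation would change what `Py_DECREF` does in C, whereas here `weakref.finalize` merely enqueues the cleanup when the object dies, and a hypothetical `FinalizerThread` runs queued callbacks in order on one dedicated thread.

```python
import queue
import threading
import weakref
from typing import Any, Callable, Optional


class FinalizerThread:
    """Hypothetical: run cleanup callbacks in FIFO order on one dedicated thread."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[Optional[Callable[[], None]]]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, name="finalizer-thread", daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while True:
            job = self._queue.get()
            if job is None:  # shutdown sentinel
                break
            job()

    def register(self, obj: Any, func: Callable[..., None], *args: Any) -> weakref.finalize:
        # When obj dies, whichever thread dropped the last reference only enqueues
        # the work; the callback itself runs later on "finalizer-thread".
        # func and args must not reference obj, or obj would never die.
        return weakref.finalize(obj, self._queue.put, lambda: func(*args))

    def shutdown(self) -> None:
        self._queue.put(None)  # FIFO: everything queued before this still runs
        self._thread.join()


ran_on = []


class Widget:
    pass


finalizers = FinalizerThread()
w = Widget()
finalizers.register(w, lambda: ran_on.append(threading.current_thread().name))
del w  # refcount hits zero here, but the cleanup is only queued
finalizers.shutdown()
print(ran_on)  # ['finalizer-thread']
```

The trade-off is the one Larry describes: any cleanup that must happen on the releasing thread can no longer rely on that.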

Benjamin Peterson

Sep 8, 2017, 3:40:28 PM
to Antoine Pitrou, pytho...@python.org

On Fri, Sep 8, 2017, at 12:13, Antoine Pitrou wrote:
> On Fri, 08 Sep 2017 12:04:10 -0700
> Benjamin Peterson <benj...@python.org> wrote:
> > I like it overall.
> >
> > - I was wondering what happens during interpreter shutdown. I see you
> > have that listed as an open issue. How about simply shutting down the
> > finalization thread and not guaranteeing that finalizers are actually
> > ever run à la Java?
>
> I don't know. People generally have expectations towards stuff being
> finalized properly (especially when talking about files etc.).
> Once the first implementation is devised, we will know more about
> what's workable (perhaps we'll have to move _PyGC_Fini earlier in the
> shutdown sequence? perhaps we'll want to switch back to serial mode
> when shutting down?).

Okay, I'm curious to know what ends up happening here then.

>
> > - Why not run all (Python) finalizers on the thread and not just ones
> > from cycles?
>
> Because a lot of code probably expects them to be run as soon as the
> last visible ref disappears.

But this assumption is broken on PyPy and sometimes already by CPython,
so I don't feel very bad moving away from it.
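A common example of the assumption under discussion is code that leaves a file to be closed by its finalizer. In this sketch (the function names are illustrative), the first writer works on CPython because the file object's refcount drops to zero when the function returns, which flushes and closes it immediately; on PyPy the data may not reach the file until the GC eventually runs. The `with` form is deterministic everywhere.

```python
import os
import tempfile


def write_relying_on_refcount(path: str) -> None:
    f = open(path, "w")
    f.write("hello")
    # No close(): CPython flushes and closes f when its last reference
    # disappears at function return (and may emit a ResourceWarning).


def write_portably(path: str) -> None:
    with open(path, "w") as f:  # closed when the block exits, on any implementation
        f.write("hello")


with tempfile.TemporaryDirectory() as tmp:
    results = {}
    for writer in (write_relying_on_refcount, write_portably):
        path = os.path.join(tmp, writer.__name__ + ".txt")
        writer(path)
        with open(path) as f:
            results[writer.__name__] = f.read()

print(results)
```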

Nathaniel Smith

Sep 8, 2017, 3:43:38 PM
to Antoine Pitrou, Python Dev
On Fri, Sep 8, 2017 at 12:13 PM, Antoine Pitrou <soli...@pitrou.net> wrote:
> On Fri, 08 Sep 2017 12:04:10 -0700
> Benjamin Peterson <benj...@python.org> wrote:
>> I like it overall.
>>
>> - I was wondering what happens during interpreter shutdown. I see you
>> have that listed as an open issue. How about simply shutting down the
>> finalization thread and not guaranteeing that finalizers are actually
>> ever run à la Java?
>
> I don't know. People generally have expectations towards stuff being
> finalized properly (especially when talking about files etc.).
> Once the first implementation is devised, we will know more about
> what's workable (perhaps we'll have to move _PyGC_Fini earlier in the
> shutdown sequence? perhaps we'll want to switch back to serial mode
> when shutting down?).

PyPy just abandons everything when shutting down, instead of running
finalizers. See the last paragraph of:
http://doc.pypy.org/en/latest/cpython_differences.html#differences-related-to-garbage-collection-strategies

So that might be a useful source of experience.

On another note, I'm going to be that annoying person who suggests
massively extending the scope of your proposal. Feel free to throw
things at me or whatever.

Would it make sense to also move signal handlers to run in this
thread? Those are the other major source of nasty re-entrancy
problems.

-n

--
Nathaniel J. Smith -- https://vorpus.org

Larry Hastings

Sep 8, 2017, 3:50:35 PM
to pytho...@python.org



On 09/08/2017 12:30 PM, Benjamin Peterson wrote:
> On Fri, Sep 8, 2017, at 12:24, Larry Hastings wrote:
>> On 09/08/2017 12:04 PM, Benjamin Peterson wrote:
>>> - Why not run all (Python) finalizers on the thread and not just ones
>>> from cycles?
>>
>> Two reasons:
>>
>>  1. Because some code relies on the finalizer being called on the thread
>>     where the last reference is dropped.  This is usually the same
>>     thread where the object was created.  Some irritating third-party
>>     libraries make demands on callers, e.g. "you can only interact with
>>     / destroy X objects on your 'main thread'". This is often true of
>>     windowing / GUI libraries.  (For example, I believe this was true of
>>     Direct3D, at least as of D3D8; it was also often true of Win32 USER
>>     objects.)
>
> Is the requirement that the construction and destruction be literally on
> the same thread or merely non-concurrent? The GIL would provide the
> latter.


Literally the same thread.  My theory was that these clowntown external libraries are hiding important details in thread local storage, but I don't actually know.
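Larry's theory is easy to picture with thread-local storage. In this sketch, `threading.local` stands in for whatever per-thread state a native library might keep, and the function names are hypothetical: state recorded when an object is created on one thread is simply not visible from another thread, so a destructor running there cannot find it, however well the GIL serializes the calls.

```python
import threading

_tls = threading.local()  # hypothetical per-thread state a native library might keep


def create_window() -> None:
    _tls.context = {"windows": 1}  # state stored on the creating thread only


def can_destroy_window() -> bool:
    # A destructor looks up the per-thread state; on another thread it is missing.
    return getattr(_tls, "context", None) is not None


create_window()
same_thread = can_destroy_window()

other = []
t = threading.Thread(target=lambda: other.append(can_destroy_window()))
t.start()
t.join()

print(same_thread, other)  # True [False]
```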


/arry

Antoine Pitrou

Sep 8, 2017, 3:53:37 PM
to pytho...@python.org
On Fri, 8 Sep 2017 12:40:34 -0700
Nathaniel Smith <n...@pobox.com> wrote:
>
> PyPy just abandons everything when shutting down, instead of running
> finalizers. See the last paragraph of:
> http://doc.pypy.org/en/latest/cpython_differences.html#differences-related-to-garbage-collection-strategies
>
> So that might be a useful source of experience.

CPython can be embedded in applications, though, and that is why we try
to be a bit more thorough during the interpreter cleanup phase.

> Would it make sense to also move signal handlers to run in this
> thread? Those are the other major source of nasty re-entrancy
> problems.

See the "Non-goals" section in the PEP, they are already mentioned
there :-)

Note that I don't think signal handlers are a major source of reentrancy
problems, rather a minor one, since usually you don't try to do much in a
signal handler. Signal handling is mostly a relic of 70s Unix design
and it has less and less relevance in today's world, apart from the
trivial task of telling a process to shut down.

Regards

Antoine.

Gregory P. Smith

Sep 8, 2017, 3:59:10 PM
to Antoine Pitrou, pytho...@python.org
On Fri, Sep 8, 2017 at 12:52 PM Antoine Pitrou <soli...@pitrou.net> wrote:
> On Fri, 8 Sep 2017 12:40:34 -0700
> Nathaniel Smith <n...@pobox.com> wrote:
> >
> > PyPy just abandons everything when shutting down, instead of running
> > finalizers. See the last paragraph of:
> > http://doc.pypy.org/en/latest/cpython_differences.html#differences-related-to-garbage-collection-strategies
> >
> > So that might be a useful source of experience.
>
> CPython can be embedded in applications, though, and that is why we try
> to be a bit more thorough during the interpreter cleanup phase.

Indeed.  My gut feeling is that proposing to not run finalizers on interpreter shutdown is a non-starter and would get the PEP rejected.  We've previously guaranteed that they were run unless the process dies via an unhandled signal or calls os._exit() in CPython.

-gps
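The distinction Gregory draws can be checked with a small experiment. This sketch, assuming a CPython interpreter is available as `sys.executable`, runs a child process that registers a `weakref.finalize` callback (which, by default, is also called at interpreter exit) and then exits either normally via `sys.exit()` or abruptly via `os._exit()`. Only the normal exit runs the finalizer. `CHILD` and `run_child` are illustrative names.

```python
import subprocess
import sys

CHILD = """
import os, sys, weakref

class Resource:
    pass

r = Resource()  # still alive when the interpreter starts shutting down
weakref.finalize(r, print, "finalizer ran")  # atexit=True by default

if sys.argv[1] == "hard":
    os._exit(0)  # skips atexit handlers and remaining finalizers
sys.exit(0)      # normal shutdown: remaining finalizers are called
"""


def run_child(mode: str) -> str:
    proc = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout


for mode in ("normal", "hard"):
    print(mode, repr(run_child(mode)))
```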

Nick Coghlan

Sep 9, 2017, 11:14:08 AM
to Nathaniel Smith, Antoine Pitrou, Python Dev
On 8 September 2017 at 12:40, Nathaniel Smith <n...@pobox.com> wrote:
> Would it make sense to also move signal handlers to run in this
> thread? Those are the other major source of nasty re-entrancy
> problems.

Python level signal handlers are already only run in the main thread,
so applications that want to ensure signals don't run at arbitrary
points in their code are already free to push all their application
logic into a subthread and have the main thread be purely a signal
handling thread.
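Nick's suggested structure can be sketched as follows, assuming CPython and `signal.raise_signal` (available since Python 3.8). The application logic runs in a worker thread; a `SIGINT` is raised from that worker purely to simulate a Ctrl-C, yet the Python-level handler still runs in the main thread, which here does nothing but wait. The handler and thread names are illustrative, and the script must be run from the main thread, since `signal.signal` may only be called there.

```python
import signal
import threading
import time

handled_in = []


def on_sigint(signum, frame) -> None:
    # CPython only ever runs Python-level signal handlers in the main thread.
    handled_in.append(threading.current_thread().name)


def app_logic() -> None:
    # Hypothetical application work; raise SIGINT here to simulate a Ctrl-C.
    signal.raise_signal(signal.SIGINT)


old_handler = signal.signal(signal.SIGINT, on_sigint)  # main thread only
try:
    worker = threading.Thread(target=app_logic, name="app-thread")
    worker.start()
    deadline = time.monotonic() + 5
    while not handled_in and time.monotonic() < deadline:
        time.sleep(0.01)  # the main thread does nothing but wait for signals
    worker.join()
finally:
    signal.signal(signal.SIGINT, old_handler)

print(handled_in)
```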

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia