[Python-Dev] PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 2)

Eric Snow

Feb 19, 2022, 2:52:01 AM
to Python-Dev
Thanks to all those that provided feedback. I've worked to
substantially update the PEP in response. The text is included below.
Further feedback is appreciated.

-eric

------------------------

PEP: 683
Title: Immortal Objects, Using a Fixed Refcount
Author: Eric Snow <ericsnow...@gmail.com>, Eddie Elizondo
<eduardo.el...@gmail.com>
Discussions-To:
https://mail.python.org/archives/list/pytho...@python.org/thread/TPLEYDCXFQ4AMTW6F6OQFINSIFYBRFCR/
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2022
Python-Version: 3.11
Post-History: 15-Feb-2022
Resolution:


Abstract
========

Currently the CPython runtime maintains a
`small amount of mutable state <Runtime Object State_>`_ in the
allocated memory of each object. Because of this, otherwise immutable
objects are actually mutable. This can have a large negative impact
on CPU and memory performance, especially for approaches to increasing
Python's scalability. The solution proposed here provides a way
to mark an object as one for which that per-object
runtime state should not change.

Specifically, if an object's refcount matches a very specific value
(defined below) then that object is treated as "immortal". If an object
is immortal then its refcount will never be modified by ``Py_INCREF()``,
etc. Consequently, the refcount will never reach 0, so that object will
never be cleaned up (unless explicitly done, e.g. during runtime
finalization). Additionally, all other per-object runtime state
for an immortal object will be considered immutable.

This approach has some possible negative impact, which is explained
below, along with mitigations. A critical requirement for this change
is that the performance regression be no more than 2-3%. Anything worse
than performance-neutral requires that the other benefits be proportionally
large. Aside from specific applications, the fundamental improvement
here is that now an object can be truly immutable.

(This proposal is meant to be CPython-specific and to affect only
internal implementation details. There are some slight exceptions
to that which are explained below. See `Backward Compatibility`_,
`Public Refcount Details`_, and `scope`_.)


Motivation
==========

As noted above, currently all objects are effectively mutable. That
includes "immutable" objects like ``str`` instances. This is because
every object's refcount is frequently modified as the object is used
during execution. This is especially significant for a number of
commonly used global (builtin) objects, e.g. ``None``. Such objects
are used a lot, both in Python code and internally. That adds up to
a consistent high volume of refcount changes.

The effective mutability of all Python objects has a concrete impact
on parts of the Python community, e.g. projects that aim for
scalability like Instagram or the effort to make the GIL
per-interpreter. Below we describe several ways in which refcount
modification has a real negative effect on such projects.
None of that would happen for objects that are truly immutable.

Reducing CPU Cache Invalidation
-------------------------------

Every modification of a refcount causes the corresponding CPU cache
line to be invalidated. This has a number of effects.

For one, the write must be propagated to other cache levels
and to main memory. This has a small effect on all Python programs.
Immortal objects would provide a slight relief in that regard.

On top of that, multi-core applications pay a price. If two threads
(running simultaneously on distinct cores) are interacting with the
same object (e.g. ``None``) then they will end up invalidating each
other's caches with each incref and decref. This is true even for
otherwise immutable objects like ``True``, ``0``, and ``str`` instances.
CPython's GIL helps reduce this effect, since only one thread runs at a
time, but it doesn't completely eliminate the penalty.

Avoiding Data Races
-------------------

Speaking of multi-core, we are considering making the GIL
a per-interpreter lock, which would enable true multi-core parallelism.
Among other things, the GIL currently protects against races between
multiple concurrent threads that may incref or decref the same object.
Without a shared GIL, two running interpreters could not safely share
any objects, even otherwise immutable ones like ``None``.

This means that, to have a per-interpreter GIL, each interpreter must
have its own copy of *every* object. That includes the singletons and
static types. We have a viable strategy for that but it will require
a meaningful amount of extra effort and extra complexity.

The alternative is to ensure that all shared objects are truly immutable.
There would be no races because there would be no modification. This
is something that the immortality proposed here would enable for
otherwise immutable objects. With immortal objects,
support for a per-interpreter GIL
becomes much simpler.

Avoiding Copy-on-Write
----------------------

For some applications it makes sense to get the application into
a desired initial state and then fork the process for each worker.
This can result in a large performance improvement, especially in
memory usage. Several enterprise Python users (e.g. Instagram,
YouTube) have taken advantage of this. However, the above
refcount semantics drastically reduce the benefits and
have led to some sub-optimal workarounds.

Also note that "fork" isn't the only operating system mechanism
that uses copy-on-write semantics. Anything that uses ``mmap``
relies on copy-on-write, including sharing data from shared object
files between processes.


Rationale
=========

The proposed solution is obvious enough that both of this proposal's
authors came to the same conclusion (and implementation, more or less)
independently. The Pyston project `uses a similar approach <Pyston_>`_.
Other designs were also considered. Several possibilities have also
been discussed on python-dev in past years.

Alternatives include:

* use a high bit to mark "immortal" but do not change ``Py_INCREF()``
* add an explicit flag to objects
* implement via the type (``tp_dealloc()`` is a no-op)
* track via the object's type object
* track with a separate table

Each of the above makes objects immortal, but none of them address
the performance penalties from refcount modification described above.

In the case of per-interpreter GIL, the only realistic alternative
is to move all global objects into ``PyInterpreterState`` and add
one or more lookup functions to access them. Then we'd have to
add some hacks to the C-API to preserve compatibility for the
many objects exposed there. The story is much, much simpler
with immortal objects.


Impact
======

Benefits
--------

Most notably, the cases described in the two examples above stand
to benefit greatly from immortal objects. Projects using pre-fork
can drop their workarounds. For the per-interpreter GIL project,
immortal objects greatly simplify the solution for existing static
types, as well as objects exposed by the public C-API.

In general, a strong immutability guarantee for objects enables Python
applications to scale like never before. This is because they can
then leverage multi-core parallelism without a tradeoff in memory
usage. This is reflected in most of the above cases.

Performance
-----------

A naive implementation shows `a 4% slowdown`_.
Several promising mitigation strategies will be pursued in the effort
to bring it closer to performance-neutral. See the `mitigation`_
section below.

On the positive side, immortal objects save a significant amount of
memory when used with a pre-fork model. Also, immortal objects provide
opportunities for specialization in the eval loop that would improve
performance.

.. _a 4% slowdown:
   https://github.com/python/cpython/pull/19474#issuecomment-1032944709

Backward Compatibility
----------------------

This proposal is meant to be completely compatible. It focuses strictly
on internal implementation details. It does not involve changes to any
public API, other than a few minor changes in behavior related to refcounts
(but only for immortal objects):

* code that inspects the refcount will see a really, really large value
* the new noop behavior may break code that:

  * depends specifically on the refcount to always increment or decrement
    (or have a specific value from ``Py_SET_REFCNT()``)
  * relies on any specific refcount value, other than 0
  * directly manipulates the refcount to store extra information there

Again, those changes in behavior only apply to immortal objects, not
most of the objects a user will access. Furthermore, users cannot mark
an object as immortal so no user-created objects will ever have that
changed behavior. Users that rely on any of the changing behavior for
global (builtin) objects are already in trouble.

Also note that code which checks for refleaks should keep working fine,
unless it checks for hard-coded small values relative to some immortal
object. The problems noticed by `Pyston`_ shouldn't apply here since
we do not modify the refcount.

See `Public Refcount Details`_ and `scope`_ below for further discussion.

Stable ABI
----------

The approach is also compatible with extensions compiled to the stable
ABI. Unfortunately, they will modify the refcount and invalidate all
the performance benefits of immortal objects. However, the high bit
of the refcount `will still match _Py_IMMORTAL_REFCNT <_Py_IMMORTAL_REFCNT_>`_
so we can still identify such objects as immortal. At worst, objects
in that situation would feel the effects described in the `Motivation`_
section. Even then the overall impact is unlikely to be significant.

Also see `_Py_IMMORTAL_REFCNT`_ below.

Accidental Immortality
----------------------

Hypothetically, a regular object could be incref'ed so much that it
reaches the magic value needed to be considered immortal. That means
it would accidentally never be cleaned up (by going back to 0).

While it isn't impossible, this accidental scenario is so unlikely
that we need not worry. Even if done deliberately by using
``Py_INCREF()`` in a tight loop and each iteration only took 1 CPU
cycle, it would take 2^61 cycles (on a 64-bit processor). At a fast
5 GHz that would still take nearly 500,000,000 seconds (over 5,000 days)!
If that CPU were 32-bit then it is (technically) more possible though
still highly unlikely.
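
As a rough check of the figures above, the arithmetic can be reproduced
in a few lines of Python. This is only an illustrative sketch: the
constant names are made up for the example, and it assumes exactly one
incref per CPU cycle, as in the text.

```python
# Illustrative arithmetic only; the constant names are hypothetical.
CYCLES_NEEDED = 2 ** 61          # increfs needed, one per CPU cycle (from the text)
CLOCK_HZ = 5_000_000_000         # a "fast" 5 GHz clock

seconds = CYCLES_NEEDED / CLOCK_HZ
days = seconds / 86_400          # 86,400 seconds per day
years = days / 365.25

print(f"{seconds:,.0f} seconds, {days:,.0f} days, about {years:.1f} years")
```

Under those assumptions the loop would run for roughly 461 million
seconds, which is a bit over 5,300 days.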

Also note that it is doubly unlikely to be a problem because it wouldn't
matter until the refcount got back to 0 and the object was cleaned up.
So any object that hit that magic "immortal" refcount value would have
to be decref'ed that many times again before the change in behavior
would be noticed.

Again, the only realistic way that the magic refcount would be reached
(and then reversed) is if it were done deliberately. (Of course, the
same thing could be done efficiently using ``Py_SET_REFCNT()`` though
that would be even less of an accident.) At that point we don't
consider it a concern of this proposal.

Alternate Python Implementations
--------------------------------

This proposal is CPython-specific. However, it does relate to the
behavior of the C-API, which may affect other Python implementations.
Consequently, the effect of changed behavior described in
`Backward Compatibility`_ above also applies here (e.g. if another
implementation is tightly coupled to specific refcount values, other
than 0, or on exactly how refcounts change, then they may be impacted).

Security Implications
---------------------

This feature has no known impact on security.

Maintainability
---------------

This is not a complex feature so it should not cause much mental
overhead for maintainers. The basic implementation doesn't touch
much code so it should not have much impact on maintainability. There
may be some extra complexity due to performance penalty mitigation.
However, that should be limited to where we immortalize all
objects post-init and that code will be in one place.


Specification
=============

The approach involves these fundamental changes:

* add `_Py_IMMORTAL_REFCNT`_ (the magic value) to the internal C-API
* update ``Py_INCREF()`` and ``Py_DECREF()`` to no-op for objects with
the magic refcount (or its most significant bit)
* do the same for any other API that modifies the refcount
* stop modifying ``PyGC_Head`` for immortal GC objects ("containers")
* ensure that all immortal objects are cleaned up during
runtime finalization

Then setting any object's refcount to ``_Py_IMMORTAL_REFCNT``
makes it immortal.

(There are other minor, internal changes which are not described here.)
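
The intended no-op behavior can be pictured with a toy model in Python.
This is a sketch under stated assumptions, not the C implementation:
``ToyObject``, ``incref()``, ``decref()`` and ``is_immortal()`` are
hypothetical stand-ins for a ``PyObject`` header and the C macros, and
the constants assume an 8-byte ``Py_ssize_t``.

```python
# Toy model only: real objects keep ob_refcnt in C, not in a Python attribute.
IMMORTAL_BIT = 1 << (8 * 8 - 4)                  # assumes an 8-byte Py_ssize_t
IMMORTAL_REFCNT = IMMORTAL_BIT + IMMORTAL_BIT // 2


class ToyObject:
    """Hypothetical stand-in for a PyObject header."""

    def __init__(self, refcnt=1):
        self.refcnt = refcnt
        self.deallocated = False


def is_immortal(op):
    return (op.refcnt & IMMORTAL_BIT) != 0


def incref(op):
    if is_immortal(op):
        return                    # no write to the object's memory
    op.refcnt += 1


def decref(op):
    if is_immortal(op):
        return                    # never reaches 0, so never deallocated
    op.refcnt -= 1
    if op.refcnt == 0:
        op.deallocated = True


mortal = ToyObject()
immortal = ToyObject(IMMORTAL_REFCNT)

# One incref followed by two decrefs on each object.
for op in (mortal, immortal):
    incref(op)
    decref(op)
    decref(op)

print(mortal.refcnt, mortal.deallocated)
print(immortal.refcnt == IMMORTAL_REFCNT, immortal.deallocated)
```

In this model the mortal object's count reaches 0 and it is cleaned up,
while the immortal object's count is never written at all.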

In the following sub-sections we dive into the details. First we will
cover some conceptual topics, followed by more concrete aspects like
specific affected APIs.

Public Refcount Details
-----------------------

In `Backward Compatibility`_ we introduced possible ways that user code
might be broken by the change in this proposal. Any contributing
misunderstanding by users is likely due in large part to the names of
the refcount-related API and to how the documentation explains those
API (and refcounting in general).

Between the names and the docs, we can clearly see answers
to the following questions:

* what behavior do users expect?
* what guarantees do we make?
* do we indicate how to interpret the refcount value they receive?
* what are the use cases under which a user would set an object's
refcount to a specific value?
* are users setting the refcount of objects they did not create?

As part of this proposal, we must make sure that users can clearly
understand on which parts of the refcount behavior they can rely and
which are considered implementation details. Specifically, they should
use the existing public refcount-related API and the only refcount value
with any meaning is 0. All other values are considered "not 0".

This information will be clarified in the `documentation <Documentation_>`_.

Arguably, the existing refcount-related API should be modified to reflect
what we want users to expect. Something like the following:

* ``Py_INCREF()`` -> ``Py_ACQUIRE_REF()`` (or only support ``Py_NewRef()``)
* ``Py_DECREF()`` -> ``Py_RELEASE_REF()``
* ``Py_REFCNT()`` -> ``Py_HAS_REFS()``
* ``Py_SET_REFCNT()`` -> ``Py_RESET_REFS()`` and ``Py_SET_NO_REFS()``

However, such a change is not a part of this proposal. It is included
here to demonstrate the tighter focus for user expectations that would
benefit this change.

Constraints
-----------

* ensure that otherwise immutable objects can be truly immutable
* minimize performance penalty for normal Python use cases
* be careful when immortalizing objects that we don't actually expect
to persist until runtime finalization.
* be careful when immortalizing objects that are not otherwise immutable

.. _scope:

Scope of Changes
----------------

Object immortality is not meant to be a public feature but rather an
internal one. So the proposal does *not* include adding any new
public C-API, nor any Python API. However, this does not prevent
us from adding (publicly accessible) private API to do things
like immortalize an object or tell if one is immortal.

The particular details of:

* how to mark something as immortal
* how to recognize something as immortal
* which subset of functionally immortal objects are marked as immortal
* which memory-management activities are skipped or modified for
immortal objects

are not only CPython-specific but are also private implementation
details that are expected to change in subsequent versions.

Immortal Mutable Objects
------------------------

Any object can be marked as immortal. We do not propose any
restrictions or checks. However, in practice the value of making an
object immortal relates to its mutability and depends on the likelihood
it would be used for a sufficient portion of the application's lifetime.
Marking a mutable object as immortal can make sense in some situations.

Many of the use cases for immortal objects center on immutability, so
that threads can safely and efficiently share such objects without
locking. For this reason a mutable object, like a dict or list, would
never be shared (and thus no immortality). However, immortality may
be appropriate if there is sufficient guarantee that the normally
mutable object won't actually be modified.

On the other hand, some mutable objects will never be shared between
threads (at least not without a lock like the GIL). In some cases it
may be practical to make some of those immortal too. For example,
``sys.modules`` is a per-interpreter dict that we do not expect to ever
get freed until the corresponding interpreter is finalized. By making
it immortal, we no longer incur the extra overhead during incref/decref.

We explore this idea further in the `mitigation`_ section below.

(Note that we are still investigating the impact on GC
of immortalizing containers.)

Implicitly Immortal Objects
---------------------------

If an immortal object holds a reference to a normal (mortal) object
then that held object is effectively immortal. This is because that
object's refcount can never reach 0 until the immortal object releases
it.

Examples:

* containers like ``dict`` and ``list``
* objects that hold references internally like ``PyTypeObject.tp_subclasses``
* an object's type (held in ``ob_type``)

Such held objects are thus implicitly immortal for as long as they are
held. In practice, this should have no real consequences since it
really isn't a change in behavior. The only difference is that the
immortal object (holding the reference) doesn't ever get cleaned up.

We do not propose that such implicitly immortal objects be changed
in any way. They should not be explicitly marked as immortal just
because they are held by an immortal object. That would provide
no advantage over doing nothing.
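
The effect is ordinary refcounting behavior and can be observed today.
In the following sketch a plain module-level list stands in for an
immortal container (an assumption made purely for illustration; nothing
here is actually immortal), and ``Payload`` is a hypothetical class.

```python
import gc
import weakref


class Payload:
    """Ordinary (mortal) object used for illustration."""


holder = [Payload()]                 # stands in for an immortal container
probe = weakref.ref(holder[0])       # observe the payload without owning it

gc.collect()
alive_while_held = probe() is not None

holder.clear()                       # the holder releases its reference
gc.collect()
alive_after_release = probe() is not None

print(alive_while_held, alive_after_release)
```

The payload stays alive exactly as long as the holder keeps its
reference, which is why marking it immortal as well would add nothing.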

Un-Immortalizing Objects
------------------------

This proposal does not include any mechanism for taking an immortal
object and returning it to a "normal" condition. Currently there
is no need for such an ability.

On top of that, the obvious approach is to simply set the refcount
to a small value. However, at that point there is no way of knowing
which value would be safe. Ideally we'd set it to the value that it
would have been if it hadn't been made immortal. However, that value
has long been lost. Hence the complexities involved make it less
likely that an object could safely be un-immortalized, even if we
had a good reason to do so.

_Py_IMMORTAL_REFCNT
-------------------

We will add two internal constants::

    #define _Py_IMMORTAL_BIT (1LL << (8 * sizeof(Py_ssize_t) - 4))
    #define _Py_IMMORTAL_REFCNT (_Py_IMMORTAL_BIT + (_Py_IMMORTAL_BIT / 2))

The refcount for immortal objects will be set to ``_Py_IMMORTAL_REFCNT``.
However, to check if an object is immortal we will compare its refcount
against just the bit::

    (op->ob_refcnt & _Py_IMMORTAL_BIT) != 0

The difference means that an immortal object will still be considered
immortal, even if somehow its refcount were modified (e.g. by an older
stable ABI extension).

Note that the top two bits of the refcount are already reserved for other
uses. That's why we are using the third top-most bit.
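
To see how much drift the bit check tolerates, the two constants can be
mirrored in Python. This is a sketch, assuming an 8-byte ``Py_ssize_t``;
the Python names are hypothetical mirrors of the C macros defined in
this section.

```python
PY_SSIZE_T_BYTES = 8                              # assumption: 64-bit build
IMMORTAL_BIT = 1 << (8 * PY_SSIZE_T_BYTES - 4)    # bit index 60 under that assumption
IMMORTAL_REFCNT = IMMORTAL_BIT + IMMORTAL_BIT // 2


def is_immortal(refcnt):
    """Mirror of the C check: (op->ob_refcnt & _Py_IMMORTAL_BIT) != 0."""
    return (refcnt & IMMORTAL_BIT) != 0


# An extension unaware of immortality still writes to the field.
drifted_up = IMMORTAL_REFCNT + 1_000_000
drifted_down = IMMORTAL_REFCNT - 1_000_000
print(is_immortal(drifted_up), is_immortal(drifted_down))

# The starting value sits halfway between the bit and the next bit up,
# so the check survives about IMMORTAL_BIT // 2 net changes either way.
headroom = IMMORTAL_BIT // 2
print(is_immortal(IMMORTAL_REFCNT - headroom))      # still at the bit itself
print(is_immortal(IMMORTAL_REFCNT - headroom - 1))  # dropped below the bit
```

Starting at the bit plus half its value is what gives the margin in both
directions.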

Affected API
------------

API that will now ignore immortal objects:

* (public) ``Py_INCREF()``
* (public) ``Py_DECREF()``
* (public) ``Py_SET_REFCNT()``
* (private) ``_Py_NewReference()``

API that exposes refcounts (unchanged but may now return large values):

* (public) ``Py_REFCNT()``
* (public) ``sys.getrefcount()``

(Note that ``_Py_RefTotal`` and ``sys.gettotalrefcount()``
will not be affected.)
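
For code that inspects refcounts, the practical consequence can be
sketched with ``sys.getrefcount()``. The helper ``refcount_delta()`` is
hypothetical. On an interpreter that treats ``None`` as immortal the
delta for ``None`` may be 0, so the sketch avoids asserting a single
value for it.

```python
import sys


def refcount_delta(obj, copies=10):
    """Return how much the reported refcount changes while holding extra refs."""
    before = sys.getrefcount(obj)
    held = [obj] * copies            # take `copies` additional references
    after = sys.getrefcount(obj)
    del held
    return after - before


mortal = object()
delta_mortal = refcount_delta(mortal)   # a regular object tracks every reference
delta_none = refcount_delta(None)       # may stay at 0 if None is immortal

print(delta_mortal, delta_none)
```

Checks that compare against hard-coded values for a builtin singleton
are the ones at risk; checks on objects the code created itself behave
as before.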

Immortal Global Objects
-----------------------

All objects that we expect to be shared globally (between interpreters)
will be made immortal. That includes the following:

* singletons (``None``, ``True``, ``False``, ``Ellipsis``, ``NotImplemented``)
* all static types (e.g. ``PyLong_Type``, ``PyExc_Exception``)
* all static objects in ``_PyRuntimeState.global_objects`` (e.g. identifiers,
small ints)

All such objects will be immutable. In the case of the static types,
they will be effectively immutable. ``PyTypeObject`` has some mutable
state (``tp_dict`` and ``tp_subclasses``), but we can work around this
by storing that state on ``PyInterpreterState`` instead of on the
respective static type object. Then the ``__dict__``, etc. getter
will do a lookup on the current interpreter, if appropriate, instead
of using ``tp_dict``.

Object Cleanup
--------------

In order to clean up all immortal objects during runtime finalization,
we must keep track of them.

For GC objects ("containers") we'll leverage the GC's permanent
generation by pushing all immortalized containers there. During
runtime shutdown, the strategy will be to first let the runtime try
to do its best effort of deallocating these instances normally. Most
of the module deallocation will now be handled by
``pylifecycle.c:finalize_modules()`` which cleans up the remaining
modules as best as we can. It will change which modules are available
during ``__del__`` but that's already defined as undefined behavior by the
docs. Optionally, we could do some topological ordering to guarantee
that user modules will be deallocated first before the stdlib modules.
Finally, anything leftover (if any) can be found through the permanent
generation gc list which we can clear after finalize_modules().
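
The permanent generation mentioned here is already observable from
Python through ``gc.freeze()``, ``gc.get_freeze_count()`` and
``gc.unfreeze()`` (available since Python 3.7). The sketch below only
exercises that existing API. It does not immortalize anything and is not
part of this proposal.

```python
import gc

gc.collect()                         # start from a settled state
before = gc.get_freeze_count()       # objects already in the permanent generation

gc.freeze()                          # move every tracked object there
frozen = gc.get_freeze_count()

gc.unfreeze()                        # move them back to the oldest generation
after = gc.get_freeze_count()

print(before, frozen, after)
```

Objects in the permanent generation are ignored by later collections,
which is the property the cleanup strategy above relies on.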

For non-container objects, the tracking approach will vary on a
case-by-case basis. In nearly every case, each such object is directly
accessible on the runtime state, e.g. in a ``_PyRuntimeState`` or
``PyInterpreterState`` field. We may need to add a tracking mechanism
to the runtime state for a small number of objects.

.. _mitigation:

Performance Regression Mitigation
---------------------------------

In the interest of clarity, here are some of the ways we are going
to try to recover some of the lost `performance <Performance_>`_:

* at the end of runtime init, mark all objects as immortal
* drop refcount operations in code where we know the object is immortal
(e.g. ``Py_RETURN_NONE``)
* specialize for immortal objects in the eval loop (see `Pyston`_)

Regarding that first point, we can apply the concept from
`Immortal Mutable Objects`_ in the pursuit of getting back some of
that 4% performance we lose with the naive implementation of immortal
objects. At the end of runtime init we can mark *all* objects as
immortal and avoid the extra cost in incref/decref. We only need
to worry about immutability with objects that we plan on sharing
between threads without a GIL.

Note that none of this section is part of the proposal.
The above is included here for clarity.

Possible Changes
----------------

* mark every interned string as immortal
* mark the "interned" dict as immortal if shared else share all interned strings
* (Larry,MvL) mark all constants unmarshalled for a module as immortal
* (Larry,MvL) allocate (immutable) immortal objects in their own memory page(s)

Documentation
-------------

The immortal objects behavior and API are internal, implementation
details and will not be added to the documentation.

However, we will update the documentation to make public guarantees
about refcount behavior more clear. That includes, specifically:

* ``Py_INCREF()`` - change "Increment the reference count for object o."
to "Acquire a new reference to object o."
* ``Py_DECREF()`` - change "Decrement the reference count for object o."
to "Release a reference to object o."
* similar for ``Py_XINCREF()``, ``Py_XDECREF()``, ``Py_NewRef()``,
``Py_XNewRef()``, ``Py_CLEAR()``, ``Py_REFCNT()``, and ``Py_SET_REFCNT()``

We *may* also add a note about immortal objects to the following,
to help reduce any surprise users may have with the change:

* ``Py_SET_REFCNT()`` (a no-op for immortal objects)
* ``Py_REFCNT()`` (value may be surprisingly large)
* ``sys.getrefcount()`` (value may be surprisingly large)

Other API that might benefit from such notes are currently undocumented.
We wouldn't add such a note anywhere else (including for ``Py_INCREF()``
and ``Py_DECREF()``) since the feature is otherwise transparent to users.


Reference Implementation
========================

The implementation is proposed on GitHub:

https://github.com/python/cpython/pull/19474


Open Issues
===========

* is there any other impact on GC?
* `are the copy-on-write benefits real?
<https://mail.python.org/archives/list/pytho...@python.org/message/J53GY7XKFOI4KWHSTTA7FUL7TJLE7WG6/>`__
* must the fate of this PEP be tied to acceptance of a per-interpreter GIL PEP?


References
==========

.. _Pyston: https://mail.python.org/archives/list/pytho...@python.org/message/TPLEYDCXFQ4AMTW6F6OQFINSIFYBRFCR/

Prior Art
---------

* `Pyston`_

Discussions
-----------

This was discussed in December 2021 on python-dev:

* https://mail.python.org/archives/list/pytho...@python.org/thread/7O3FUA52QGTVDC6MDAV5WXKNFEDRK5D6/#TBTHSOI2XRWRO6WQOLUW3X7S5DUXFAOV
* https://mail.python.org/archives/list/pytho...@python.org/thread/PNLBJBNIQDMG2YYGPBCTGOKOAVXRBJWY

Runtime Object State
--------------------

Here is the internal state that the CPython runtime keeps
for each Python object:

* `PyObject.ob_refcnt`_: the object's `refcount <refcounting_>`_
* `_PyGC_Head <PyGC_Head_>`_: (optional) the object's node in a list of
  `"GC" objects <refcounting_>`_
* `_PyObject_HEAD_EXTRA <PyObject_HEAD_EXTRA_>`_: (optional) the
  object's node in the list of heap objects

``ob_refcnt`` is part of the memory allocated for every object.
However, ``_PyObject_HEAD_EXTRA`` is allocated only if CPython was built
with ``Py_TRACE_REFS`` defined. ``PyGC_Head`` is allocated only if the
object's type has ``Py_TPFLAGS_HAVE_GC`` set. Typically this is only
container types (e.g. ``list``). Also note that ``PyObject.ob_refcnt``
and ``_PyObject_HEAD_EXTRA`` are part of ``PyObject_HEAD``.

.. _PyObject.ob_refcnt:
   https://github.com/python/cpython/blob/80a9ba537f1f1666a9e6c5eceef4683f86967a1f/Include/object.h#L107
.. _PyGC_Head: https://github.com/python/cpython/blob/80a9ba537f1f1666a9e6c5eceef4683f86967a1f/Include/internal/pycore_gc.h#L11-L20
.. _PyObject_HEAD_EXTRA:
   https://github.com/python/cpython/blob/80a9ba537f1f1666a9e6c5eceef4683f86967a1f/Include/object.h#L68-L72

.. _refcounting:

Reference Counting, with Cyclic Garbage Collection
--------------------------------------------------

Garbage collection is a memory management feature of some programming
languages. It means objects are cleaned up (e.g. memory freed)
once they are no longer used.

Refcounting is one approach to garbage collection. The language runtime
tracks how many references are held to an object. When code takes
ownership of a reference to an object or releases it, the runtime
is notified and it increments or decrements the refcount accordingly.
When the refcount reaches 0, the runtime cleans up the object.

With CPython, code must explicitly take or release references using
the C-API's ``Py_INCREF()`` and ``Py_DECREF()``. These macros happen
to directly modify the object's refcount (unfortunately, since that
causes ABI compatibility issues if we want to change our garbage
collection scheme). Also, when an object is cleaned up in CPython,
it also releases any references (and resources) it owns
(before its memory is freed).

Sometimes objects may be involved in reference cycles, e.g. where
object A holds a reference to object B and object B holds a reference
to object A. Consequently, neither object would ever be cleaned up
even if no other references were held (i.e. a memory leak). The
most common objects involved in cycles are containers.

CPython has dedicated machinery to deal with reference cycles, which
we call the "cyclic garbage collector", or often just
"garbage collector" or "GC". Don't let the name confuse you.
It only deals with breaking reference cycles.
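
A short Python sketch shows the division of labor between refcounting
and the cyclic GC. ``Node`` is a hypothetical class, and the automatic
collector is temporarily disabled so that the outcome does not depend on
when a collection happens to run.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


was_enabled = gc.isenabled()
gc.disable()                         # keep the automatic collector out of the way
try:
    a, b = Node(), Node()
    a.other, b.other = b, a          # A -> B and B -> A: a reference cycle
    probe = weakref.ref(a)           # observe A without owning a reference

    del a, b                         # no outside references remain
    survived_refcounting = probe() is not None

    gc.collect()                     # the cyclic GC finds and breaks the cycle
    collected_by_gc = probe() is None
finally:
    if was_enabled:
        gc.enable()

print(survived_refcounting, collected_by_gc)
```

Refcounting alone leaves the pair alive because each object still holds
a reference to the other. Only the explicit collection frees them.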

See the docs for a more detailed explanation of refcounting
and cyclic garbage collection:

* https://docs.python.org/3.11/c-api/intro.html#reference-counts
* https://docs.python.org/3.11/c-api/refcounting.html
* https://docs.python.org/3.11/c-api/typeobj.html#c.PyObject.ob_refcnt
* https://docs.python.org/3.11/c-api/gcsupport.html


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
_______________________________________________
Python-Dev mailing list -- pytho...@python.org
To unsubscribe send an email to python-d...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/pytho...@python.org/message/KDAR6CCMPOX36GQJUDWHQBKRD5USNV3B/
Code of Conduct: http://python.org/psf/codeofconduct/

Inada Naoki

Feb 19, 2022, 10:51:17 PM
to Eric Snow, Python-Dev
Hi,

I hope the per-interpreter GIL succeeds at some point, and I know this is
needed for a per-interpreter GIL.

But I worry that a per-interpreter GIL may be too complex to
implement and maintain for core developers and extension writers.
As you know, immortal doesn't mean shareable between interpreters. It is
too difficult to know which objects can be shared, and where the
shareable objects are leaked to other interpreters.
So I am not sure that a per-interpreter GIL is an achievable goal.

So I think it's too early to introduce immortal objects in Python
3.11, unless it *improves* performance without a per-interpreter GIL.
Instead, we can add a configuration option such as
`--enable-experimental-immortal`.


On Sat, Feb 19, 2022 at 4:52 PM Eric Snow <ericsnow...@gmail.com> wrote:
>
> Reducing CPU Cache Invalidation
> -------------------------------
>
> Avoiding Data Races
> -------------------
>

Both benefits require a per-interpreter GIL.

>
> Avoiding Copy-on-Write
> ----------------------
>
> For some applications it makes sense to get the application into
> a desired initial state and then fork the process for each worker.
> This can result in a large performance improvement, especially
> memory usage. Several enterprise Python users (e.g. Instagram,
> YouTube) have taken advantage of this. However, the above
> refcount semantics drastically reduce the benefits and
> has led to some sub-optimal workarounds.
>

As I wrote before, fork is very difficult to use safely. We cannot
recommend it to many users.
And I don't think reducing the size of a patch at Instagram or YouTube
is a good rationale for this kind of change.


> Also note that "fork" isn't the only operating system mechanism
> that uses copy-on-write semantics. Anything that uses ``mmap``
> relies on copy-on-write, including sharing data from shared objects
> files between processes.
>

It is very difficult to reduce CoW with mmap(MAP_PRIVATE).

You may need to write the hash of bytes and unicode objects. You may also
need to write `tp_type`.
Immortal objects can "reduce" the memory write. But "at least one
memory write" is enough to trigger the CoW.


> Accidental Immortality
> ----------------------
>
> While it isn't impossible, this accidental scenario is so unlikely
> that we need not worry. Even if done deliberately by using
> ``Py_INCREF()`` in a tight loop and each iteration only took 1 CPU
> cycle, it would take 2^61 cycles (on a 64-bit processor). At a fast
> 5 GHz that would still take nearly 500,000,000 seconds (over 5,000 days)!
> If that CPU were 32-bit then it is (technically) more possible though
> still highly unlikely.
>

Technically, `[obj] * (2**(32-4))` is a 1GB array on 32-bit.
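
(A quick check of that figure, as a sketch: it assumes 4-byte pointers
and counts only the list's item array, and the constant names are made
up for the example.)

```python
POINTER_BYTES = 4                    # assumption: 32-bit build
IMMORTAL_BIT_32 = 1 << (32 - 4)      # bit position from the PEP's define, 32-bit case

# Each list slot holds one pointer and accounts for one reference to obj.
list_payload_bytes = IMMORTAL_BIT_32 * POINTER_BYTES
print(list_payload_bytes / 2 ** 30, "GiB")
```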


>
> Constraints
> -----------
>
> * ensure that otherwise immutable objects can be truly immutable
> * be careful when immortalizing objects that are not otherwise immutable

I am not sure about what this means.
For example, unicode objects are not immutable because they have a hash,
a utf8 cache and a wchar_t cache. (The wchar_t cache will be removed in Python
3.12).


>
> Object Cleanup
> --------------
>
> In order to clean up all immortal objects during runtime finalization,
> we must keep track of them.
>

I don't think we need to clean up all immortal objects.

Of course, we should take care of objects that are immortal by default.
But for user-marked immortal objects, it's very difficult to guarantee
that __del__ or a weakref callback is called safely.

Additionally, if they are marked immortal to avoid CoW, cleaning them up causes CoW.

Regards,
--
Inada Naoki <songof...@gmail.com>
_______________________________________________
Python-Dev mailing list -- pytho...@python.org
To unsubscribe send an email to python-d...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/pytho...@python.org/message/7FCNNQOTIUZTBFZUPYRDSLND6WCVM3JO/

Petr Viktorin

Feb 21, 2022, 11:12:50 AM
to pytho...@python.org, Eric Snow
On 19. 02. 22 8:46, Eric Snow wrote:
> Thanks to all those that provided feedback. I've worked to
> substantially update the PEP in response. The text is included below.
> Further feedback is appreciated.

Thank you! This version is much clearer. I like the PEP more and more!

I've sent a PR with some typo fixes:
https://github.com/python/peps/pull/2348
and I have a few comments:


[...]
> Public Refcount Details
[...]
> As part of this proposal, we must make sure that users can clearly
> understand on which parts of the refcount behavior they can rely and
> which are considered implementation details. Specifically, they should
> use the existing public refcount-related API and the only refcount value
> with any meaning is 0. All other values are considered "not 0".

Should we care about hacks/optimizations that rely on having the only
reference (or all references), e.g. mutating a tuple if it has refcount
1? Immortal objects shouldn't break them (the special case simply won't
apply), but this wording would make them illegal.
AFAIK CPython uses this internally, but I don't know how
prevalent/useful it is in third-party code.


[...]
>
> _Py_IMMORTAL_REFCNT
> -------------------
>
> We will add two internal constants::
>
> #define _Py_IMMORTAL_BIT (1LL << (8 * sizeof(Py_ssize_t) - 4))
> #define _Py_IMMORTAL_REFCNT (_Py_IMMORTAL_BIT + (_Py_IMMORTAL_BIT / 2))

As a nitpick: could you say this in prose?

* ``_Py_IMMORTAL_BIT`` has the third top-most bit set.
* ``_Py_IMMORTAL_REFCNT`` has the third and fourth top-most bits set.
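
One way to see which bits the two macros set is to evaluate them outside of C. The following is a minimal Python model of the quoted definitions, not the implementation itself. Python integers are unbounded, so the width of ``Py_ssize_t`` is passed in as a byte count; the function names are made up for this sketch, and bit indices count from 0 at the least-significant bit.

```python
# Python model of the two quoted C macros. The byte count stands in for
# sizeof(Py_ssize_t); function names are hypothetical.
def immortal_bit(ssize_bytes: int) -> int:
    return 1 << (8 * ssize_bytes - 4)

def immortal_refcnt(ssize_bytes: int) -> int:
    bit = immortal_bit(ssize_bytes)
    return bit + bit // 2

for size in (4, 8):
    bit = immortal_bit(size)
    refcnt = immortal_refcnt(size)
    # Collect the indices of the bits that are set in the refcount value.
    set_bits = [i for i in range(8 * size) if (refcnt >> i) & 1]
    print(f"{8 * size}-bit: BIT = 2**{bit.bit_length() - 1}, REFCNT bits set: {set_bits}")
```

With an 8-byte ``Py_ssize_t`` the model gives ``2**60`` for the bit and ``2**60 + 2**59`` for the refcount.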


[...]
>
> Immortal Global Objects
> -----------------------
>
> All objects that we expect to be shared globally (between interpreters)
> will be made immortal. That includes the following:
>
> * singletons (``None``, ``True``, ``False``, ``Ellipsis``, ``NotImplemented``)
> * all static types (e.g. ``PyLong_Type``, ``PyExc_Exception``)
> * all static objects in ``_PyRuntimeState.global_objects`` (e.g. identifiers,
> small ints)
>
> All such objects will be immutable. In the case of the static types,
> they will be effectively immutable. ``PyTypeObject`` has some mutable
> state (``tp_dict`` and ``tp_subclasses``), but we can work around this
> by storing that state on ``PyInterpreterState`` instead of on the
> respective static type object. Then the ``__dict__``, etc. getter
> will do a lookup on the current interpreter, if appropriate, instead
> of using ``tp_dict``.

But tp_dict is also public C-API. How will that be handled?
Perhaps naively, I thought static types' dicts could be treated as
(deeply) immutable, and shared?

Perhaps it would be best to leave it out here and say "The details
of sharing ``PyTypeObject`` across interpreters are left to another PEP"?
Even so, I'd love to know the plan. (And even if these are internals,
changes to them should be mentioned in What's New, for the sake of
people who need to maintain old extensions.)



> Object Cleanup
> --------------
>
> In order to clean up all immortal objects during runtime finalization,
> we must keep track of them.
>
> For GC objects ("containers") we'll leverage the GC's permanent
> generation by pushing all immortalized containers there. During
> runtime shutdown, the strategy will be to first let the runtime try
> to do its best effort of deallocating these instances normally. Most
> of the module deallocation will now be handled by
> ``pylifecycle.c:finalize_modules()`` which cleans up the remaining
> modules as best as we can. It will change which modules are available
> during __del__ but that's already defined as undefined behavior by the
> docs. Optionally, we could do some topological disorder to guarantee
> that user modules will be deallocated first before the stdlib modules.
> Finally, anything leftover (if any) can be found through the permanent
> generation gc list which we can clear after finalize_modules().
>
> For non-container objects, the tracking approach will vary on a
> case-by-case basis. In nearly every case, each such object is directly
> accessible on the runtime state, e.g. in a ``_PyRuntimeState`` or
> ``PyInterpreterState`` field. We may need to add a tracking mechanism
> to the runtime state for a small number of objects.

Out of curiosity: How does this extra work affect performance? Is
it part of the 4% slowdown?



And from the other thread:

On 17. 02. 22 18:23, Eric Snow wrote:
> On Thu, Feb 17, 2022 at 3:42 AM Petr Viktorin <enc...@gmail.com> wrote:
>>>> Weren't you planning a PEP on subinterpreter GIL as well? Do you want to
>>>> submit them together?
>>>
>>> I'd have to think about that. The other PEP I'm writing for
>>> per-interpreter GIL doesn't require immortal objects. They just
>>> simplify a number of things. That's my motivation for writing this
>>> PEP, in fact. :)
>>
>> Please think about it.
>> If you removed the benefits for per-interpreter GIL, the motivation
>> section would be reduced to memory savings for fork/CoW. (And lots of
>> performance improvements that are great in theory but sum up to a 4% loss.)
>
> Sounds good. Would this involve more than a note at the top of the PEP?

No, a note would work great. If you read the motivation carefully, it's
(IMO) clear that it's rather weak without the other PEP. But that
realization shouldn't come as a surprise to the reader.


> And just to be clear, I don't think the fate of a per-interpreter GIL
> PEP should depend on this one.

I think that's clear.
It's the other way around - the fate of this PEP will probably depend on the
per-interpreter GIL one.

Message archived at https://mail.python.org/archives/list/pytho...@python.org/message/EFTWUNDK7PDELCZSDU6NMKA4W4VJ6BNT/

dw-...@d-woods.co.uk

Feb 21, 2022, 12:56:09 PM
to pytho...@python.org
Petr Viktorin wrote:
> Should we care about hacks/optimizations that rely on having the only
> reference (or all references), e.g. mutating a tuple if it has refcount
> 1? Immortal objects shouldn't break them (the special case simply won't
> apply), but this wording would make them illegal.
> AFAIK CPython uses this internally, but I don't know how
> prevalent/useful it is in third-party code.

For what it's worth Cython does this for string concatenation to concatenate in place if possible (this optimization was copied from CPython). It could be disabled relatively easily if it became a problem (it's already CPython only and version checked so it'd just need another upper-bound version check).
Message archived at https://mail.python.org/archives/list/pytho...@python.org/message/CDNQK5RMXSLLYFNIXRORL7GTKU6B4BVR/

Terry Reedy

Feb 21, 2022, 6:58:15 PM
to pytho...@python.org
On 2/21/2022 11:11 AM, Petr Viktorin wrote:
> On 19. 02. 22 8:46, Eric Snow wrote:

>> As part of this proposal, we must make sure that users can clearly
>> understand on which parts of the refcount behavior they can rely and
>> which are considered implementation details.  Specifically, they should
>> use the existing public refcount-related API and the only refcount value
>> with any meaning is 0.  All other values are considered "not 0".
>
> Should we care about hacks/optimizations that rely on having the only
> reference (or all references), e.g. mutating a tuple if it has refcount
> 1? Immortal objects shouldn't break them (the special case simply won't
> apply), but this wording would make them illegal.
> AFAIK CPython uses this internally, but I don't know how
> prevalent/useful it is in third-party code.

We could say that the only refcounts with any meaning are 0, 1, and > 1.
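
A toy model of why those three cases are enough for the "sole owner" optimization discussed above. This is an illustrative sketch, not CPython's implementation; the class, the function, and the constant are all made up, with the constant taken from the 64-bit value of the quoted macro.

```python
# Toy model of the "sole owner" optimization; not CPython's implementation.
IMMORTAL_REFCNT = (1 << 60) + (1 << 59)   # value of the quoted macro on 64-bit

class Obj:
    def __init__(self, items, refcnt=1):
        self.items = list(items)
        self.refcnt = refcnt

def append_item(obj, item):
    """Mutate in place only when the caller holds the sole reference."""
    if obj.refcnt == 1:
        obj.items.append(item)
        return obj
    # refcnt > 1 (shared, or immortal): fall back to building a copy.
    return Obj(obj.items + [item])

sole = Obj([1, 2])
shared = Obj([1, 2], refcnt=2)
immortal = Obj([1, 2], refcnt=IMMORTAL_REFCNT)

assert append_item(sole, 3) is sole              # mutated in place
assert append_item(shared, 3) is not shared      # copied
assert append_item(immortal, 3) is not immortal  # special case does not apply
```

An immortal object simply lands in the "> 1" bucket, so the fast path is skipped and nothing breaks.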


--
Terry Jan Reedy
Message archived at https://mail.python.org/archives/list/pytho...@python.org/message/C3R4FKO7PZETOSI5DTGMAXWVUTQM26AW/

Eric Snow

Feb 22, 2022, 8:13:51 PM
to Inada Naoki, Python-Dev
Thanks for the feedback. I've responded inline below.

-eric

On Sat, Feb 19, 2022 at 8:50 PM Inada Naoki <songof...@gmail.com> wrote:
> I hope per-interpreter GIL success at some point, and I know this is
> needed for per-interpreter GIL.
>
> But I am worrying about per-interpreter GIL may be too complex to
> implement and maintain for core developers and extension writers.
> As you know, immortal don't mean sharable between interpreters. It is
> too difficult to know which object can be shared, and where the
> shareable objects are leaked to other interpreters.
> So I am not sure that per interpreter GIL is achievable goal.

I plan on addressing this in the PEP I am working on for
per-interpreter GIL. In the meantime, I doubt the issue will impact
any core devs.

> So I think it's too early to introduce the immortal objects in Python
> 3.11, unless it *improve* performance without per-interpreter GIL
> Instead, we can add a configuration option such as
> `--enable-experimental-immortal`.

I agree that immortal objects aren't quite as appealing in general
without per-interpreter GIL. However, there are actual users that
will benefit from it, assuming we can reduce the performance penalty
to acceptable levels. For a recent example, see
https://mail.python.org/archives/list/pytho...@python.org/message/B77BQQFDSTPY4KA4HMHYXJEV3MOU7W3X/.

> On Sat, Feb 19, 2022 at 4:52 PM Eric Snow <ericsnow...@gmail.com> wrote:
> >
> > Reducing CPU Cache Invalidation
> > -------------------------------
> >
> > Avoiding Data Races
> > -------------------
> >
>
> Both benefits require a per-interpreter GIL.

CPU cache invalidation exists regardless. With the current GIL the
effect is reduced significantly.

Per-interpreter GIL is only one situation where data races matter.
Any attempt to generally eliminate the GIL must deal with races on the
per-object runtime state.

> >
> > Avoiding Copy-on-Write
> > ----------------------
> >
> > For some applications it makes sense to get the application into
> > a desired initial state and then fork the process for each worker.
> > This can result in a large performance improvement, especially
> > memory usage. Several enterprise Python users (e.g. Instagram,
> > YouTube) have taken advantage of this. However, the above
> > refcount semantics drastically reduce the benefits and
> > has led to some sub-optimal workarounds.
> >
>
> As I wrote before, fork is very difficult to use safely. We can not
> recommend to use it for many users.
> And I don't think reducing the size of patch in Instagram or YouTube
> is a good rationale for this kind of change.

What do you mean by "this kind of change"? The proposed change is
relatively small. It certainly isn't nearly as intrusive as many
changes we make to internals without a PEP. If you are talking about
the performance penalty, we should be able to eliminate it.

> > Also note that "fork" isn't the only operating system mechanism
> > that uses copy-on-write semantics. Anything that uses ``mmap``
> > relies on copy-on-write, including sharing data from shared objects
> > files between processes.
> >
>
> It is very difficult to reduce CoW with mmap(MAP_PRIVATE).
>
> You may need to write hash of bytes and unicode. You may be need to
> write `tp_type`.
> Immortal objects can "reduce" the memory write. But "at least one
> memory write" is enough to trigger the CoW.

Correct. However, without immortal objects (AKA immutable per-object
runtime-state) it goes from "very difficult" to "basically
impossible".

> > Accidental Immortality
> > ----------------------
> >
> > While it isn't impossible, this accidental scenario is so unlikely
> > that we need not worry. Even if done deliberately by using
> > ``Py_INCREF()`` in a tight loop and each iteration only took 1 CPU
> > cycle, it would take 2^61 cycles (on a 64-bit processor). At a fast
> > 5 GHz that would still take nearly 500,000,000 seconds (over 5,000 days)!
> > If that CPU were 32-bit then it is (technically) more possible though
> > still highly unlikely.
> >
>
> Technically, `[obj] * (2**(32-4))` is 1GB array on 32bit.

The question is if this matters. If really necessary, the PEP can
demonstrate that it doesn't matter in practice.

(Also, the magic value on 32-bit would be 2**29.)
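
The 64-bit estimate in the quoted section is easy to reproduce. A minimal sketch, taking the PEP's stated assumptions at face value (one increment per CPU cycle, a 5 GHz clock):

```python
# How long would 2**61 increments take at one increment per cycle?
# Assumptions taken from the quoted PEP text: 5 GHz clock, 1 cycle per increment.
increments = 2 ** 61
cycles_per_second = 5_000_000_000

seconds = increments / cycles_per_second
days = seconds / 86_400          # 86,400 seconds per day
print(f"{seconds:,.0f} seconds, about {days:,.0f} days")
```

That comes to roughly 461 million seconds, or a little over 5,300 days, in line with the figures quoted above.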

> >
> > Constraints
> > -----------
> >
> > * ensure that otherwise immutable objects can be truly immutable
> > * be careful when immortalizing objects that are not otherwise immutable
>
> I am not sure about what this means.
> For example, unicode objects are not immutable because they have hash,
> utf8 cache and wchar_t cache. (wchar_t cache will be removed in Python
> 3.12).

I think you understood it correctly. In the case of str objects, they
are close enough since a race on any of those values will not cause a
different outcome.

I will clarify the point in the PEP.

> > Object Cleanup
> > --------------
> >
> > In order to clean up all immortal objects during runtime finalization,
> > we must keep track of them.
> >
>
> I don't think we need to clean up all immortal objects.
>
> Of course, we should care immortal by default objects.
> But for user-marked immortal objects, it's very difficult to guarantee
> __del__ or weakref callback is called safely.

There is no such thing as user-marked immortal objects. The concept
is strictly an internal one, with no public API.

> Additionally, if they are marked immortal for avoiding CoW, cleanup cause CoW.

Correct. The PEP does not propose to deal with that situation.
Message archived at https://mail.python.org/archives/list/pytho...@python.org/message/JH2EWLUWOOQ2KKABVK6LD76TBYWPRHLG/

Eric Snow

Feb 22, 2022, 8:48:04 PM
to Petr Viktorin, Python-Dev
Thanks for the responses. I've replied inline below.

-eric

On Mon, Feb 21, 2022 at 9:11 AM Petr Viktorin <enc...@gmail.com> wrote:
>
> On 19. 02. 22 8:46, Eric Snow wrote:
> > Thanks to all those that provided feedback. I've worked to
> > substantially update the PEP in response. The text is included below.
> > Further feedback is appreciated.
>
> Thank you! This version is much clearer. I like the PEP more and more!

Great!

> I've sent a PR with some typo fixes:
> https://github.com/python/peps/pull/2348

Thank you.

> > Public Refcount Details
> [...]
> > As part of this proposal, we must make sure that users can clearly
> > understand on which parts of the refcount behavior they can rely and
> > which are considered implementation details. Specifically, they should
> > use the existing public refcount-related API and the only refcount value
> > with any meaning is 0. All other values are considered "not 0".
>
> Should we care about hacks/optimizations that rely on having the only
> reference (or all references), e.g. mutating a tuple if it has refcount
> 1? Immortal objects shouldn't break them (the special case simply won't
> apply), but this wording would make them illegal.
> AFAIK CPython uses this internally, but I don't know how
> prevalent/useful it is in third-party code.

Good point. As Terry suggested, we could also let 1 have meaning.

Regardless, any documented restriction would only apply to users of
the public C-API, not to internal code.

> > _Py_IMMORTAL_REFCNT
> > -------------------
> >
> > We will add two internal constants::
> >
> > #define _Py_IMMORTAL_BIT (1LL << (8 * sizeof(Py_ssize_t) - 4))
> > #define _Py_IMMORTAL_REFCNT (_Py_IMMORTAL_BIT + (_Py_IMMORTAL_BIT / 2))
>
> As a nitpick: could you say this in prose?
>
> * ``_Py_IMMORTAL_BIT`` has the third top-most bit set.
> * ``_Py_IMMORTAL_REFCNT`` has the third and fourth top-most bits set.

Sure.

> > Immortal Global Objects
> > -----------------------
> >
> > All objects that we expect to be shared globally (between interpreters)
> > will be made immortal. That includes the following:
> >
> > * singletons (``None``, ``True``, ``False``, ``Ellipsis``, ``NotImplemented``)
> > * all static types (e.g. ``PyLong_Type``, ``PyExc_Exception``)
> > * all static objects in ``_PyRuntimeState.global_objects`` (e.g. identifiers,
> > small ints)
> >
> > All such objects will be immutable. In the case of the static types,
> > they will be effectively immutable. ``PyTypeObject`` has some mutable
> > state (``tp_dict`` and ``tp_subclasses``), but we can work around this
> > by storing that state on ``PyInterpreterState`` instead of on the
> > respective static type object. Then the ``__dict__``, etc. getter
> > will do a lookup on the current interpreter, if appropriate, instead
> > of using ``tp_dict``.
>
> But tp_dict is also public C-API. How will that be handled?
> Perhaps naively, I thought static types' dicts could be treated as
> (deeply) immutable, and shared?

They are immutable from Python code but not from C (due to tp_dict).
Basically, we will document that tp_dict should not be used directly
(in the public API) and refer users to a public getter function. I'll
note this in the PEP.

> Perhaps it would be best to leave it out here and say "The details
> of sharing ``PyTypeObject`` across interpreters are left to another PEP"?
> Even so, I'd love to know the plan.

What else would you like to know? There isn't much to it. For each
of the builtin static types we will keep the relevant mutable state on
PyInterpreterState and look it up there in the relevant getters (e.g.
__dict__ and __subclasses__).
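
As a rough illustration of that lookup, here is a toy Python model. It is only a sketch of the idea: every class, attribute, and function name in it is hypothetical, and it says nothing about how the real C structs would be laid out.

```python
# Toy model: mutable per-type state lives on the interpreter, not on the
# shared static type. All class and function names here are hypothetical.
class StaticType:
    def __init__(self, name):
        self.name = name            # immutable, shared by all interpreters

class InterpreterState:
    def __init__(self):
        self.type_dicts = {}        # static type -> this interpreter's dict
        self.type_subclasses = {}   # static type -> this interpreter's subclasses

def get_type_dict(interp, static_type):
    """Getter that looks the dict up on the given interpreter."""
    return interp.type_dicts.setdefault(static_type, {})

def get_type_subclasses(interp, static_type):
    """Getter that looks the subclass list up on the given interpreter."""
    return interp.type_subclasses.setdefault(static_type, [])

long_type = StaticType("int")
interp_a, interp_b = InterpreterState(), InterpreterState()

# Changes made through interpreter A are invisible to interpreter B.
get_type_dict(interp_a, long_type)["extra"] = 1
get_type_subclasses(interp_a, long_type).append("MyInt")

assert "extra" not in get_type_dict(interp_b, long_type)
assert get_type_subclasses(interp_b, long_type) == []
```

The shared type object is never written to; each interpreter only touches its own tables.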

> (And even if these are internals,
> changes to them should be mentioned in What's New, for the sake of
> people who need to maintain old extensions.)

+1

> > Object Cleanup
> > --------------
> >
> > In order to clean up all immortal objects during runtime finalization,
> > we must keep track of them.
> >
> > For GC objects ("containers") we'll leverage the GC's permanent
> > generation by pushing all immortalized containers there. During
> > runtime shutdown, the strategy will be to first let the runtime try
> > to do its best effort of deallocating these instances normally. Most
> > of the module deallocation will now be handled by
> > ``pylifecycle.c:finalize_modules()`` which cleans up the remaining
> > modules as best as we can. It will change which modules are available
> > during __del__ but that's already defined as undefined behavior by the
> > docs. Optionally, we could do some topological disorder to guarantee
> > that user modules will be deallocated first before the stdlib modules.
> > Finally, anything leftover (if any) can be found through the permanent
> > generation gc list which we can clear after finalize_modules().
> >
> > For non-container objects, the tracking approach will vary on a
> > case-by-case basis. In nearly every case, each such object is directly
> > accessible on the runtime state, e.g. in a ``_PyRuntimeState`` or
> > ``PyInterpreterState`` field. We may need to add a tracking mechanism
> > to the runtime state for a small number of objects.
>
> Out of curiosity: How does this extra work affect performance? Is
> it part of the 4% slowdown?

The slowdown is exclusively due to the change to Py_INCREF() and
Py_DECREF(). If there are any objects that must be specially tracked,
that will have insignificant performance impact.
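
To make the source of that cost concrete, here is a toy Python model of refcount operations that skip immortal objects. It assumes immortality is detected by comparing against a single fixed value, as the abstract describes; the class and function names are invented for the sketch, and the real macros are C.

```python
# Toy model of refcount operations that skip immortal objects.
# Assumption: immortality is detected by comparing against one fixed value.
IMMORTAL_REFCNT = (1 << 60) + (1 << 59)

class Obj:
    def __init__(self, refcnt=1):
        self.refcnt = refcnt
        self.deallocated = False

def incref(obj):
    if obj.refcnt == IMMORTAL_REFCNT:   # the extra check on every call
        return
    obj.refcnt += 1

def decref(obj):
    if obj.refcnt == IMMORTAL_REFCNT:   # immortal objects are never written to
        return
    obj.refcnt -= 1
    if obj.refcnt == 0:
        obj.deallocated = True

normal, immortal = Obj(), Obj(IMMORTAL_REFCNT)
for o in (normal, immortal):
    incref(o)
    decref(o)
    decref(o)

assert normal.deallocated
assert immortal.refcnt == IMMORTAL_REFCNT and not immortal.deallocated
```

The added comparison runs on every increment and decrement, which is why even a small per-call cost shows up in benchmarks.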

> And from the other thread:
>
> On 17. 02. 22 18:23, Eric Snow wrote:
> > On Thu, Feb 17, 2022 at 3:42 AM Petr Viktorin <enc...@gmail.com> wrote:
> >>>> Weren't you planning a PEP on subinterpreter GIL as well? Do you want to
> >>>> submit them together?
> >>>
> >>> I'd have to think about that. The other PEP I'm writing for
> >>> per-interpreter GIL doesn't require immortal objects. They just
> >>> simplify a number of things. That's my motivation for writing this
> >>> PEP, in fact. :)
> >>
> >> Please think about it.
> >> If you removed the benefits for per-interpreter GIL, the motivation
> >> section would be reduced to memory savings for fork/CoW. (And lots of
> >> performance improvements that are great in theory but sum up to a 4% loss.)
> >
> > Sounds good. Would this involve more than a note at the top of the PEP?
>
> No, a note would work great. If you read the motivation carefully, it's
> (IMO) clear that it's rather weak without the other PEP. But that
> realization shouldn't come as a surprise to the reader.

Having thought about it some more, I don't think this PEP should be
strictly bound to per-interpreter GIL. That is certainly my personal
motivation. However, we have a small set of users that would benefit
significantly, the change is relatively small and simple, and the risk
of breaking users is also small. In fact, we regularly have more
disruptive changes to internals that do not require a PEP.

So it seems like the bar should be pretty low for this one (assuming
we get the performance penalty low enough). If it were some massive
or broadly impactful (or even clearly public) change then I suppose
you could call the motivation weak. However, this isn't that sort of
PEP. Honestly, it might not have needed a PEP in the first place if I
had been a bit more clear about the idea earlier.
Message archived at https://mail.python.org/archives/list/pytho...@python.org/message/TSHZB7VWSS7XXPCMKXISNXGGCHM7PJOP/

Eric Snow

Feb 22, 2022, 8:50:14 PM
to dw-...@d-woods.co.uk, Python-Dev
On Mon, Feb 21, 2022 at 10:56 AM <dw-...@d-woods.co.uk> wrote:
> For what it's worth Cython does this for string concatenation to concatenate in place if possible (this optimization was copied from CPython). It could be disabled relatively easily if it became a problem (it's already CPython only and version checked so it'd just need another upper-bound version check).

That's good to know.

-eric
Message archived at https://mail.python.org/archives/list/pytho...@python.org/message/OEZS4KGQJET5DL3M2OTB76I4W7F56FJC/

Eric Snow

Feb 22, 2022, 8:51:01 PM
to Terry Reedy, Python-Dev
On Mon, Feb 21, 2022 at 4:56 PM Terry Reedy <tjr...@udel.edu> wrote:
> We could say that the only refcounts with any meaning are 0, 1, and > 1.

Yeah, that should work.

-eric
Message archived at https://mail.python.org/archives/list/pytho...@python.org/message/7HZ7VBJQOYHXFV3ZD4V7DCMLBL4Q34WP/

Eric Snow

Feb 22, 2022, 9:07:32 PM
to Python-Dev
On Sat, Feb 19, 2022 at 12:46 AM Eric Snow <ericsnow...@gmail.com> wrote:
> Performance
> -----------
>
> A naive implementation shows `a 4% slowdown`_.
> Several promising mitigation strategies will be pursued in the effort
> to bring it closer to performance-neutral. See the `mitigation`_
> section below.

FYI, Eddie has been able to get us back to performance-neutral after
applying several of the mitigation strategies we discussed. :)

-eric
Message archived at https://mail.python.org/archives/list/pytho...@python.org/message/ZYGZEQSVBS6ODVAHPL3QN4CJ7JN4FYWO/

Inada Naoki

Feb 22, 2022, 9:23:22 PM
to Eric Snow, Python-Dev
On Wed, Feb 23, 2022 at 10:12 AM Eric Snow <ericsnow...@gmail.com> wrote:
>
> Thanks for the feedback. I've responded inline below.
>
> -eric
>
> On Sat, Feb 19, 2022 at 8:50 PM Inada Naoki <songof...@gmail.com> wrote:
> > I hope per-interpreter GIL success at some point, and I know this is
> > needed for per-interpreter GIL.
> >
> > But I am worrying about per-interpreter GIL may be too complex to
> > implement and maintain for core developers and extension writers.
> > As you know, immortal don't mean sharable between interpreters. It is
> > too difficult to know which object can be shared, and where the
> > shareable objects are leaked to other interpreters.
> > So I am not sure that per interpreter GIL is achievable goal.
>
> I plan on addressing this in the PEP I am working on for
> per-interpreter GIL. In the meantime, I doubt the issue will impact
> any core devs.
>

It's nice to hear!


> > So I think it's too early to introduce the immortal objects in Python
> > 3.11, unless it *improve* performance without per-interpreter GIL
> > Instead, we can add a configuration option such as
> > `--enable-experimental-immortal`.
>
> I agree that immortal objects aren't quite as appealing in general
> without per-interpreter GIL. However, there are actual users that
> will benefit from it, assuming we can reduce the performance penalty
> to acceptable levels. For a recent example, see
> https://mail.python.org/archives/list/pytho...@python.org/message/B77BQQFDSTPY4KA4HMHYXJEV3MOU7W3X/.
>

It is not a proven example, just a hope at the moment. So an option is
fine for proving the idea.

Although I cannot read the code, they said "patching ASLR by patching
`ob_type` fields;".
That will cause CoW for most objects, won't it?

So reducing memory writes doesn't directly mean reducing CoW.
Unless we can stop writing to a page completely, the page will be copied.
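
A toy model of the page-granularity point. It is a sketch with made-up addresses and a 4096-byte page size assumed; it models only which pages receive at least one write, not a real kernel.

```python
# Toy model of copy-on-write at page granularity; addresses are made up.
PAGE_SIZE = 4096

def copied_pages(written_addresses, page_size=PAGE_SIZE):
    """Return the set of page numbers touched by at least one write."""
    return {addr // page_size for addr in written_addresses}

# 100 objects of 64 bytes each, laid out contiguously from address 0.
addresses = [i * 64 for i in range(100)]   # spans pages 0 and 1

all_written = copied_pages(addresses)
one_per_page = copied_pages([addresses[0], addresses[64]])
none_written = copied_pages([])

assert len(all_written) == 2
assert len(one_per_page) == 2      # far fewer writes, same number of copies
assert len(none_written) == 0
```

Going from 100 writes to 2 saves nothing in this model; only the case with no writes at all avoids the copies.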


> > On Sat, Feb 19, 2022 at 4:52 PM Eric Snow <ericsnow...@gmail.com> wrote:
> > >
> > > Reducing CPU Cache Invalidation
> > > -------------------------------
> > >
> > > Avoiding Data Races
> > > -------------------
> > >
> >
> > Both benefits require a per-interpreter GIL.
>
> CPU cache invalidation exists regardless. With the current GIL the
> effect it is reduced significantly.
>

It's an interesting point. We cannot see the benefit from
pyperformance, because it doesn't use much data and it runs one process
at a time.
So pyperformance cannot put enough stress on the last-level
cache, which is shared by many cores.

We need a multiprocess performance benchmark apart from pyperformance,
to stress the last-level cache from multiple cores.
It would help not only this PEP, but also optimizing containers like dict and set.


> >
> > As I wrote before, fork is very difficult to use safely. We can not
> > recommend to use it for many users.
> > And I don't think reducing the size of patch in Instagram or YouTube
> > is a good rationale for this kind of change.
>
> What do you mean by "this kind of change"? The proposed change is
> relatively small. It certainly isn't nearly as intrusive as many
> changes we make to internals without a PEP. If you are talking about
> the performance penalty, we should be able to eliminate it.
>

Can the proposed optimizations to eliminate the penalty guarantee that
no __del__ or weakref is broken,
and that no memory leak occurs when the Python interpreter is initialized
and finalized multiple times?
I haven't confirmed it yet.


> > > Also note that "fork" isn't the only operating system mechanism
> > > that uses copy-on-write semantics. Anything that uses ``mmap``
> > > relies on copy-on-write, including sharing data from shared objects
> > > files between processes.
> > >
> >
> > It is very difficult to reduce CoW with mmap(MAP_PRIVATE).
> >
> > You may need to write hash of bytes and unicode. You may be need to
> > write `tp_type`.
> > Immortal objects can "reduce" the memory write. But "at least one
> > memory write" is enough to trigger the CoW.
>
> Correct. However, without immortal objects (AKA immutable per-object
> runtime-state) it goes from "very difficult" to "basically
> impossible".
>

A configuration option won't make it impossible.


> > >
> > > Constraints
> > > -----------
> > >
> > > * ensure that otherwise immutable objects can be truly immutable
> > > * be careful when immortalizing objects that are not otherwise immutable
> >
> > I am not sure about what this means.
> > For example, unicode objects are not immutable because they have hash,
> > utf8 cache and wchar_t cache. (wchar_t cache will be removed in Python
> > 3.12).
>
> I think you understood it correctly. In the case of str objects, they
> are close enough since a race on any of those values will not cause a
> different outcome.
>
> I will clarify the point in the PEP.
>

FWIW, I filed an issue to remove the hash cache from bytes objects.
https://github.com/faster-cpython/ideas/issues/290

Code objects have many bytes objects (e.g. co_code, co_linetable, etc.).
Removing it will save some RAM and make immortal bytes truly
immutable, safe to be shared between interpreters.



--
Inada Naoki <songof...@gmail.com>
Message archived at https://mail.python.org/archives/list/pytho...@python.org/message/KVH7QLJ4VJBQQ45LZTWDXO2SEC6ANX7T/

Larry Hastings

Feb 22, 2022, 10:27:01 PM
to pytho...@python.org
On 2/22/22 6:00 PM, Eric Snow wrote:
> On Sat, Feb 19, 2022 at 12:46 AM Eric Snow <ericsnow...@gmail.com> wrote:
>> Performance
>> -----------
>>
>> A naive implementation shows `a 4% slowdown`_.
>> Several promising mitigation strategies will be pursued in the effort
>> to bring it closer to performance-neutral.  See the `mitigation`_
>> section below.
>
> FYI, Eddie has been able to get us back to performance-neutral after
> applying several of the mitigation strategies we discussed. :)


Are these optimizations specifically for the PR, or are these optimizations we could apply without taking the immortal objects?  Kind of like how Sam tried to offset the nogil slowdown by adding optimizations that we went ahead and added anyway ;-)


/arry

Eric Snow

Feb 23, 2022, 1:31:39 AM
to Larry Hastings, Python-Dev
On Tue, Feb 22, 2022, 20:26 Larry Hastings <la...@hastings.org> wrote:
> Are these optimizations specifically for the PR, or are these optimizations we could apply without taking the immortal objects?  Kind of like how Sam tried to offset the nogil slowdown by adding optimizations that we went ahead and added anyway ;-)

Basically all the optimizations require immortal objects.

-eric

Petr Viktorin

Feb 23, 2022, 11:17:39 AM
to Eric Snow, Python-Dev
On 23. 02. 22 2:46, Eric Snow wrote:
> Thanks for the responses. I've replied inline below.

Same here :)


>>> Immortal Global Objects
>>> -----------------------
>>>
>>> All objects that we expect to be shared globally (between interpreters)
>>> will be made immortal. That includes the following:
>>>
>>> * singletons (``None``, ``True``, ``False``, ``Ellipsis``, ``NotImplemented``)
>>> * all static types (e.g. ``PyLong_Type``, ``PyExc_Exception``)
>>> * all static objects in ``_PyRuntimeState.global_objects`` (e.g. identifiers,
>>> small ints)
>>>
>>> All such objects will be immutable. In the case of the static types,
>>> they will be effectively immutable. ``PyTypeObject`` has some mutable
>>> state (``tp_dict`` and ``tp_subclasses``), but we can work around this
>>> by storing that state on ``PyInterpreterState`` instead of on the
>>> respective static type object. Then the ``__dict__``, etc. getter
>>> will do a lookup on the current interpreter, if appropriate, instead
>>> of using ``tp_dict``.
>>
>> But tp_dict is also public C-API. How will that be handled?
>> Perhaps naively, I thought static types' dicts could be treated as
>> (deeply) immutable, and shared?
>
> They are immutable from Python code but not from C (due to tp_dict).
> Basically, we will document that tp_dict should not be used directly
> (in the public API) and refer users to a public getter function. I'll
> note this in the PEP.

What worries me is that existing users of the API haven't read the new
documentation. What will happen if users do use it?
Or worse, add things to it?

(Hm, the current docs are already rather confusing -- 3.2 added a note
that "It is not safe to ... modify tp_dict with the dictionary C-API.",
but above that it says "extra attributes for the type may be added to
this dictionary [in some cases]")


[...]
Right, with the recent performance improvements it's looking like it
might stand on its own after all.

> So it seems like the bar should be pretty low for this one (assuming
> we get the performance penalty low enough). If it were some massive
> or broadly impactful (or even clearly public) change then I suppose
> you could call the motivation weak. However, this isn't that sort of
> PEP. Honestly, it might not have needed a PEP in the first place if I
> had been a bit more clear about the idea earlier.

Maybe it's good to have a PEP to clear that up :)
Message archived at https://mail.python.org/archives/list/pytho...@python.org/message/ZTON72YXUUFV5MX5KIEM3DDNAUAZT4M6/

Brett Cannon

Feb 23, 2022, 3:02:04 PM
to Petr Viktorin, Python-Dev
On Wed, Feb 23, 2022 at 8:19 AM Petr Viktorin <enc...@gmail.com> wrote:
On 23. 02. 22 2:46, Eric Snow wrote:


[SNIP]

> So it seems like the bar should be pretty low for this one (assuming
> we get the performance penalty low enough).  If it were some massive
> or broadly impactful (or even clearly public) change then I suppose
> you could call the motivation weak.  However, this isn't that sort of
> PEP.

Yes, but PEPs are not just about complexity, but also impact on users. And "impact" covers backwards-compatibility, which includes performance regressions (i.e. making Python slower means it may no longer be viable for someone with specific performance requirements). So with the initial 4% performance regression it made sense to write a PEP.

Antonio Cuni

Feb 23, 2022, 6:26:53 PM
to Petr Viktorin, pytho...@python.org
On Mon, Feb 21, 2022 at 5:18 PM Petr Viktorin <enc...@gmail.com> wrote:

> Should we care about hacks/optimizations that rely on having the only
> reference (or all references), e.g. mutating a tuple if it has refcount
> 1? Immortal objects shouldn't break them (the special case simply won't
> apply), but this wording would make them illegal.
> AFAIK CPython uses this internally, but I don't know how
> prevalent/useful it is in third-party code.

FWIW, a real world example of this is numpy.ndarray.resize(..., refcheck=True):

When refcheck=True (the default), numpy raises an error if you try to resize an array inplace whose refcnt > 2 (although I don't understand why > 2 and not > 1, and the docs aren't very clear about this).

That said, relying on the exact value of the refcnt is very bad for alternative implementations and for HPy, and in particular it is impossible to implement ndarray.resize(refcheck=True) correctly on PyPy. So from this point of view, a wording which explicitly restricts the "legal" usage of the refcnt details would be very welcome.
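
For readers who have not seen a refcount-dependent check before, here is a minimal sketch in pure Python. It is not NumPy's implementation; it only uses sys.getrefcount() to show what such checks observe. Storing one more reference to an ordinary object raises the reported count by one, while a singleton such as None may report no change on a CPython build where it is immortal. The helper name refcount_delta_after_new_reference and the Payload class are made up for illustration.

```python
import sys


def refcount_delta_after_new_reference(obj):
    """Return how much sys.getrefcount(obj) changes after storing one more reference."""
    holders = []
    before = sys.getrefcount(obj)
    holders.append(obj)  # one additional strong reference, held by the list
    after = sys.getrefcount(obj)
    return after - before


class Payload:
    """Hypothetical ordinary (mortal) object used only for this demonstration."""


mortal_delta = refcount_delta_after_new_reference(Payload())
singleton_delta = refcount_delta_after_new_reference(None)

# The absolute counts are an implementation detail; only the deltas are printed.
print("ordinary object delta:", mortal_delta)
print("None delta:", singleton_delta)
```

The absolute value returned by sys.getrefcount() is CPython-specific and varies between versions, which is exactly why code that branches on it is fragile on other implementations.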

Sebastian Berg

Feb 23, 2022, 6:47:19 PM
to pytho...@python.org
Yeah, NumPy resizing is a bit of an awkward point, I would be on-board
for just replacing resize for non

NumPy does also have a bit of magic akin to the "string concat" trick
for operations like:

a + b + c

where it will try to do magic and use the knowledge that it can
mutate/reuse the temporary array, effectively doing:

tmp = a + b
tmp += c

(which requires some stack walking magic in addition to the refcount check!)
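
To make the saving concrete, here is a toy sketch in plain Python. It is not NumPy code and does no refcount or stack inspection; the Vec class and its allocations counter are invented for illustration. It only counts how many result buffers each spelling creates.

```python
class Vec:
    """Hypothetical stand-in for an array type; counts result allocations."""

    allocations = 0

    def __init__(self, values):
        self.values = list(values)
        Vec.allocations += 1

    def __add__(self, other):
        # Out-of-place add: always builds a new Vec.
        return Vec(x + y for x, y in zip(self.values, other.values))

    def __iadd__(self, other):
        # In-place add: reuses the existing buffer.
        for i, y in enumerate(other.values):
            self.values[i] += y
        return self


a, b, c = Vec([1, 2]), Vec([3, 4]), Vec([5, 6])

Vec.allocations = 0
naive = a + b + c          # temporary for a + b, then a second result
naive_allocations = Vec.allocations

Vec.allocations = 0
tmp = a + b                # one temporary
tmp += c                   # reused in place
reuse_allocations = Vec.allocations

print(naive_allocations, reuse_allocations)
```

Both spellings produce the same values; the second simply avoids one allocation, which is the effect the refcount-based trick is after.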

Cheers,

Sebastian



Eric Snow

Feb 28, 2022, 2:24:07 PM
to Inada Naoki, Python-Dev
Responses inline below.

-eric

On Tue, Feb 22, 2022 at 7:22 PM Inada Naoki <songof...@gmail.com> wrote:
> > For a recent example, see
> > https://mail.python.org/archives/list/pytho...@python.org/message/B77BQQFDSTPY4KA4HMHYXJEV3MOU7W3X/.
>
> It is not a proven example, just a hope at the moment. So an option is
> fine for proving the idea.
>
> Although I cannot read the code, they said "patching ASLR by patching
> `ob_type` fields;".
> That will cause CoW for most objects, won't it?
>
> So reducing memory writes doesn't directly mean reducing CoW.
> Unless we can stop writing to a page completely, the page will be copied.

Yeah, they would have to address that.
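
A back-of-the-envelope sketch of that page-granularity point, under made-up assumptions (4096-byte pages, 64-byte objects laid out contiguously from address 0; the dirty_pages helper is hypothetical). It only does the arithmetic; it does not measure real copy-on-write behavior.

```python
PAGE_SIZE = 4096    # assumed page size in bytes
OBJECT_SIZE = 64    # assumed size of each object in bytes


def dirty_pages(written_addresses, page_size=PAGE_SIZE):
    """Return the set of page numbers touched by writes at the given addresses."""
    return {address // page_size for address in written_addresses}


# 256 contiguous objects span 256 * 64 = 16384 bytes, i.e. 4 pages.
addresses = [index * OBJECT_SIZE for index in range(256)]

pages_when_all_written = dirty_pages(addresses)

# Write to only one object per page (every 64th object): 4 writes instead of 256.
sparse_writes = addresses[::PAGE_SIZE // OBJECT_SIZE]
pages_when_few_written = dirty_pages(sparse_writes)

print(len(addresses), "writes ->", len(pages_when_all_written), "pages copied")
print(len(sparse_writes), "writes ->", len(pages_when_few_written), "pages copied")
```

In this toy layout the number of writes drops from 256 to 4 while the number of copied pages stays at 4, which is the concern raised above.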

> > CPU cache invalidation exists regardless. With the current GIL the
> > effect is reduced significantly.
>
> It's an interesting point. We cannot see the benefit from
> pyperformance, because it doesn't use much data and it runs one process
> at a time.
> So pyperformance cannot put enough stress on the last-level
> cache, which is shared by many cores.
>
> We need a multiprocess performance benchmark apart from pyperformance,
> to stress the last-level cache from multiple cores.
> It would help not only this PEP, but also optimizing containers like dict and set.

+1

> Can the proposed optimizations to eliminate the penalty guarantee that
> no __del__ or weakref is broken,
> and that no memory leak occurs when the Python interpreter is initialized
> and finalized multiple times?
> I haven't confirmed it yet.

They will not break __del__ or weakrefs. No memory will leak after
finalization. If any of that happens then it is a bug.

> FWIW, I filed an issue to remove hash cache from bytes objects.
> https://github.com/faster-cpython/ideas/issues/290
>
> Code objects have many bytes objects, (e.g. co_code, co_linetable, etc...)
> Removing it will save some RAM usage and make immortal bytes truly
> immutable, safe to be shared between interpreters.

+1 Thanks!
Message archived at https://mail.python.org/archives/list/pytho...@python.org/message/QKMPALMWGF5366C6PQRSIIFVNXKF4UAM/

Eric Snow

Feb 28, 2022, 2:36:57 PM
to Petr Viktorin, Python-Dev
On Wed, Feb 23, 2022 at 9:16 AM Petr Viktorin <enc...@gmail.com> wrote:
>>> But tp_dict is also public C-API. How will that be handled?
>>> Perhaps naively, I thought static types' dicts could be treated as
>>> (deeply) immutable, and shared?
>>
>> They are immutable from Python code but not from C (due to tp_dict).
>> Basically, we will document that tp_dict should not be used directly
>> (in the public API) and refer users to a public getter function. I'll
>> note this in the PEP.
>
> What worries me is that existing users of the API haven't read the new
> documentation. What will happen if users do use it?
> Or worse, add things to it?

We will probably set it to NULL, so the user code would fail or crash.
I suppose we could set it to a dummy object that emits helpful errors.

However, I don't think that is worth it. We're talking about where
users are directly accessing tp_dict of the builtin static types, not
their own. That is already something they should definitely not be
doing.
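
As a small illustration of the "immutable from Python code" half of that statement, the sketch below pokes at a builtin static type from Python. It says nothing about what C code can do through tp_dict. The helper raises_type_error and the attribute name hypothetical_attribute are made up for the example.

```python
import types

# From Python, a type's __dict__ is exposed as a read-only mappingproxy.
type_dict_view = int.__dict__
view_is_read_only_proxy = isinstance(type_dict_view, types.MappingProxyType)


def raises_type_error(action):
    """Run action() and report whether it raised TypeError."""
    try:
        action()
    except TypeError:
        return True
    return False


def set_attribute_on_int():
    # Attempt to add an attribute to a builtin static type.
    int.hypothetical_attribute = 1


def write_through_proxy():
    # Attempt to write into the type's dict via the proxy.
    type_dict_view["hypothetical_attribute"] = 1


attribute_write_blocked = raises_type_error(set_attribute_on_int)
proxy_write_blocked = raises_type_error(write_through_proxy)

print(view_is_read_only_proxy, attribute_write_blocked, proxy_write_blocked)
```

Both write attempts are rejected at the Python level, so the only remaining route to mutate these dicts is direct C access, which is the case under discussion here.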

> (Hm, the current docs are already rather confusing -- 3.2 added a note
> that "It is not safe to ... modify tp_dict with the dictionary C-API.",
> but above that it says "extra attributes for the type may be added to
> this dictionary [in some cases]")

Yeah, the docs will have to be clarified.

>> Having thought about it some more, I don't think this PEP should be
>> strictly bound to per-interpreter GIL. That is certainly my personal
>> motivation. However, we have a small set of users that would benefit
>> significantly, the change is relatively small and simple, and the risk
>> of breaking users is also small.
>
> Right, with the recent performance improvements it's looking like it
> might stand on its own after all.

Great!

>> Honestly, it might not have needed a PEP in the first place if I
>> had been a bit more clear about the idea earlier.
>
> Maybe it's good to have a PEP to clear that up :)

Yeah, the PEP process has been helpful for that. :)

-eric
Message archived at https://mail.python.org/archives/list/pytho...@python.org/message/AKFMFZ45UJXED24YRB4NHQ4HT442XVSP/

Eric Snow

Feb 28, 2022, 2:39:59 PM
to Antonio Cuni, Python-Dev
On Wed, Feb 23, 2022 at 4:21 PM Antonio Cuni <anto...@gmail.com> wrote:
> When refcheck=True (the default), numpy raises an error if you try to resize an array inplace whose refcnt > 2 (although I don't understand why > 2 and not > 1, and the docs aren't very clear about this).
>
> That said, relying on the exact value of the refcnt is very bad for alternative implementations and for HPy, and in particular it is impossible to implement ndarray.resize(refcheck=True) correctly on PyPy. So from this point of view, a wording which explicitly restricts the "legal" usage of the refcnt details would be very welcome.

Thanks for the feedback and example. It helps.

-eric
Message archived at https://mail.python.org/archives/list/pytho...@python.org/message/D23Z3C7CQIIGALDRSU4RDDM7GVUAASGW/