[Python-Dev] Add Py_SETREF and Py_XSETREF to the stable C API

Serhiy Storchaka

Nov 8, 2017, 11:34:21 AM
to pytho...@python.org
The macros Py_SETREF and Py_XSETREF were introduced in 3.6 and backported
to all maintained versions ([1] and [2]). Despite their names they are
private. I think they are stable enough now and would be helpful in
third-party code. Are there any objections to adding them to the stable C
API? [3]

[1] https://bugs.python.org/issue20440
[2] https://bugs.python.org/issue26200
[3] https://bugs.python.org/issue31983

Victor Stinner

Nov 8, 2017, 11:39:38 AM
to Serhiy Storchaka, Python Dev
I like these macros!

Technically, would it be possible to use an inline function instead of
a macro for Python 3.7?

Victor

Serhiy Storchaka

Nov 8, 2017, 11:50:28 AM
to pytho...@python.org
08.11.17 18:37, Victor Stinner wrote:

> I like these macros!
>
> Technically, would it be possible to use an inline function instead of
> a macro for Python 3.7?

No, unless there is a way to pass arguments by reference in C99.
Py_SETREF(x, y) is the safe equivalent of

    Py_DECREF(x);
    x = y;
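
Since C has no way to pass arguments by reference, an inline function
would have to take the address of the variable being replaced. A rough
sketch (the name _Py_SetRef is invented here purely for illustration; it
is not an actual CPython function):

    static inline void
    _Py_SetRef(PyObject **slot, PyObject *new_value)
    {
        PyObject *old_value = *slot;   /* keep the old reference alive */
        *slot = new_value;             /* install the new value first */
        Py_DECREF(old_value);          /* then release the old reference */
    }

Call sites would then have to write _Py_SetRef((PyObject **)&self->attr,
value), losing the convenience of the macro accepting any pointer-typed
lvalue directly.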

Raymond Hettinger

Nov 8, 2017, 9:10:51 PM
to Serhiy Storchaka, Python-Dev@Python.Org

> On Nov 8, 2017, at 8:30 AM, Serhiy Storchaka <stor...@gmail.com> wrote:
>
> The macros Py_SETREF and Py_XSETREF were introduced in 3.6 and backported to all maintained versions ([1] and [2]). Despite their names they are private. I think they are stable enough now and would be helpful in third-party code. Are there any objections to adding them to the stable C API? [3]

I have mixed feelings about this. You and Victor seem to really like these macros, but they have been problematic for me. I'm not sure whether it is a conceptual issue or a naming issue, but the presence of these macros impairs my ability to read code and determine whether the refcounts are correct. I usually end up replacing the code with the unrolled macro so that I can count the refs across all the code paths.

The other issue is that when there are multiple occurrences of these macros for multiple variables, it interferes with my best practice of deferring all decrefs until the data structures are in a fully consistent state. Any one of these can cause arbitrary code to run. I greatly prefer putting all the decrefs at the end to increase my confidence that it is okay to run other code that might reenter the current code. Pure Python functions effectively have this built-in because the locals all get decreffed at the end of the function when a return statement is encountered. That practice helps me avoid hard-to-spot re-entrancy issues.

Lastly, I think we should have a preference to not grow the stable C API. Bigger APIs are harder to learn and remember, not so much for you and Victor who use these frequently, but for everyone else who has to look up all the macros whose function isn't immediately self-evident.


Raymond

Serhiy Storchaka

Nov 9, 2017, 5:46:15 AM
to pytho...@python.org
09.11.17 04:08, Raymond Hettinger wrote:

>
>> On Nov 8, 2017, at 8:30 AM, Serhiy Storchaka <stor...@gmail.com> wrote:
>>
>> The macros Py_SETREF and Py_XSETREF were introduced in 3.6 and backported to all maintained versions ([1] and [2]). Despite their names they are private. I think they are stable enough now and would be helpful in third-party code. Are there any objections to adding them to the stable C API? [3]
>
> I have mixed feelings about this. You and Victor seem to really like these macros, but they have been problematic for me. I'm not sure whether it is a conceptual issue or a naming issue, but the presence of these macros impairs my ability to read code and determine whether the refcounts are correct. I usually end up replacing the code with the unrolled macro so that I can count the refs across all the code paths.

If the problem is with naming, what names do you prefer? This was already
bikeshedded (I insisted on discussing names before introducing the
macros), but maybe now you have better ideas?

The current code contains 212 usages of Py_SETREF and Py_XSETREF. Maybe
10% of them correctly used temporary variables before these macros were
introduced, and the macros just made the code shorter. But in the rest of
the cases the refcount was decremented before the new value was set. This
didn't always cause a problem, but it is too hard to prove that such code
is safe in every concrete case. Unrolling all these invocations would
make the code larger and more cumbersome, and it is hard to do
automatically.

> The other issue is that when there are multiple occurrences of these macros for multiple variables, it interferes with my best practice of deferring all decrefs until the data structures are in a fully consistent state. Any one of these can cause arbitrary code to run. I greatly prefer putting all the decrefs at the end to increase my confidence that it is okay to run other code that might reenter the current code. Pure python functions effectively have this built-in because the locals all get decreffed at the end of the function when a return-statement is encountered. That practice helps me avoid hard to spot re-entrancy issues.

I agree with you. If you need to set two or more attributes
synchronously, Py_SETREF will not help you. This should be clearly
explained in the documentation.
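
A rough sketch of that multi-attribute case (the struct members and
variable names below are invented for illustration, and, as with
Py_SETREF, the new values are assumed to carry references the object
takes over):

    PyObject *old_first = self->first;
    PyObject *old_second = self->second;
    self->first = new_first;     /* the object may be inconsistent here... */
    self->second = new_second;   /* ...but every slot holds a strong reference */
    /* Only now, with the object consistent again, drop the old values;
       either decref may run arbitrary Python code. */
    Py_XDECREF(old_first);
    Py_XDECREF(old_second);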

Several consecutive Py_SETREFs may be an error. When I created my patches
for using Py_SETREF I encountered several such cases and used different
code for them. Maybe some code is still not completely correct, but in
any case it is better now than before Py_SETREF was introduced.

But in many cases you need to set only one attribute, or the attributes
are not tightly related.

> Lastly, I think we should have a preference to not grow the stable C API. Bigger APIs are harder to learn and remember, not so much for you and Victor who use these frequently, but for everyone else who has to look up all the macros whose function isn't immediately self-evident.

I agree with you again. But these macros are pretty helpful. They make it
easy to write safer code. And they are used more than any other C API
addition in 3.5, 3.6, and 3.7.

3.7:
Py_X?SETREF -- 216
Py_UNREACHABLE -- 65
Py_RETURN_RICHCOMPARE -- 15
PyImport_GetModule -- 28
PyTraceMalloc_(T|Unt)rack -- 9
PyOS_(BeforeFork|AfterFork_(Parent|Child)) -- 24
Py_tss_NEEDS_INIT -- 6
PyInterpreterState_GetID -- 3

3.6:
PySlice_Unpack -- 22
PySlice_AdjustIndices -- 22
PyErr_SetImportErrorSubclass -- 3
PyErr_ResourceWarning -- 6
PyOS_FSPath -- 9
Py_FinalizeEx -- 21
Py_MEMBER_SIZE -- 4

3.5:
PyCodec_NameReplaceErrors -- 3
PyErr_FormatV -- 6
PyCoro_New -- 3
PyCoro_CheckExact -- 14
PyModuleDef_Init -- 30
PyModule_FromDefAndSpec2? -- 10
PyModule_ExecDef -- 3
PyModule_SetDocString -- 4
PyModule_AddFunctions -- 3
PyNumber_MatrixMultiply -- 4
PyNumber_InPlaceMatrixMultiply -- 4
Py_DecodeLocale -- 25
Py_EncodeLocale -- 11

Some older macros:
Py_STRINGIFY -- 15
Py_ABS -- 54
Py_MIN -- 66
Py_MAX -- 56

The above numbers include the declaration and definition, so subtract 2
or 3 per name to get the number of actual usages.

Only Py_UNUSED, added in 3.4, beats them (291), because it is used in
generated Argument Clinic code.

Victor Stinner

Nov 9, 2017, 6:08:44 AM
to Raymond Hettinger, Serhiy Storchaka, Python-Dev@Python.Org
Recently, Oren Milman fixed multiple bugs when an __init__() method
was called twice. IMHO Py_SETREF() was nicely used in __init__():

https://github.com/python/cpython/commit/e56ab746a965277ffcc4396d8a0902b6e072d049

https://github.com/python/cpython/commit/c0cabc23bbe474d542ff8a4f1243f4ec3cce5549

While it's possible to rewrite the code *correctly* without
Py_SETREF(), it would be much more verbose. Here the fix remains a
single line:

- self->archive = filename;
+ Py_XSETREF(self->archive, filename);
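
For comparison, a rough sketch of how the same assignment could be
written correctly without the macro (this is not the actual committed
code, and the surrounding reference handling is omitted):

    PyObject *old_archive = self->archive;   /* may be NULL on the first call */
    self->archive = filename;                /* install the new value first */
    Py_XDECREF(old_archive);                 /* now safe to drop the old value */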

Victor

Victor Stinner

Nov 9, 2017, 6:28:54 AM
to Raymond Hettinger, Serhiy Storchaka, Python-Dev@Python.Org
2017-11-09 3:08 GMT+01:00 Raymond Hettinger <raymond....@gmail.com>:
> I greatly prefer putting all the decrefs at the end to increase my confidence that it is okay to run other code that might reenter the current code.

There are 3 patterns to update C attributes of an object:

(1)
Py_XDECREF(obj->attr); // can call Python code
obj->attr = new_value;

or

(2)
old_value = obj->attr;
obj->attr = new_value;
Py_XDECREF(old_value); // can call Python code

or

(3)
old_value = obj->attr;
obj->attr = new_value;
... // The assumption here is that nothing here
... // can call arbitrary Python code
// Finally, after setting all other attributes
Py_XDECREF(old_value); // can call Python code


Pattern (1) is likely to be vulnerable to a reentrancy issue:
Py_XDECREF() can call arbitrary Python code indirectly through the
garbage collector, while the object being modified contains a *borrowed*
reference instead of a *strong* reference, or can even refer to an object
which was just destroyed.

Pattern (2) is better: the object always keeps a strong reference,
*but* the modified attribute can be inconsistent with other
attributes. At least, you prevent hard crashes.

Pattern (3) is likely the most correct way to write C code to
implement a Python object... but it's harder to write such code
correctly :-( You have to be careful to not leak a reference.


If I understood correctly, the purpose of the Py_SETREF() macro is not
to replace (3) with (2), but to fix all incorrect code written as (1).
If I recall correctly, Serhiy modified a *lot* of code written as (1)
when he implemented Py_SETREF().
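
In other words, a typical fix looks roughly like this (obj->attr and
new_value are placeholder names):

    /* Incorrect pattern (1): code triggered by the decref can still see
       obj->attr pointing to the object being torn down. */
    Py_XDECREF(obj->attr);
    obj->attr = new_value;

    /* Fixed with the macro, which performs pattern (2) internally. */
    Py_XSETREF(obj->attr, new_value);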


> Pure python functions effectively have this built-in because the locals all get decreffed at the end of the function when a return-statement is encountered. That practice helps me avoid hard to spot re-entrancy issues.

Unless you use a lock, all Python methods are effectively written as (2):
a different thread or a signal handler may see the object in an
inconsistent state when it is accessed between two instructions modifying
the object's attributes.

Example:

    def __init__(self, value):
        self.value = value
        self.double = value * 2

    def increment(self):
        self.value += 1
        # object inconsistent here
        self.double += 2

The increment() method is not atomic: if the object is accessed at "#
object inconsistent here", the object is seen in an inconsistent
state.

Victor

Raymond Hettinger

Nov 9, 2017, 7:24:21 AM
to Serhiy Storchaka, Python-Dev@Python.Org

> On Nov 9, 2017, at 2:44 AM, Serhiy Storchaka <stor...@gmail.com> wrote:
>
> If the problem is with naming, what names do you prefer? This was already bikeshedded (I insisted on discussing names before introducing the macros), but maybe now you have better ideas?

It didn't really seem like a bad idea until after you swept through the code with 200+ applications of the macro and I saw how unclear the results were. Even code that I wrote myself is now harder for me to grok (for example, the macro was applied 17 times to already correct code in itertools).

We used to employ a somewhat plain coding style that was easy to walk through, but the following examples seem opaque. I find it takes practice to look at any one of these and say that it is unequivocally correct (were the function error return arguments handled correctly, are the typecasts proper, at what point can a reentrant call occur, which is the source operand and which is the destination, is the macro using either of the operands twice, is the destination operand an allowable lvalue, do I need to decref the source operand afterwards, etc):

Py_SETREF(((PyHeapTypeObject*)type)->ht_name, value)
Py_SETREF(newconst, PyFrozenSet_New(newconst));
Py_XSETREF(c->u->u_private, s->v.ClassDef.name);
Py_SETREF(*p, t);
Py_XSETREF(self->lineno, PyTuple_GET_ITEM(info, 1));
Py_SETREF(entry->path, PyUnicode_EncodeFSDefault(entry->path));
Py_XSETREF(self->checker, PyObject_GetAttrString(ob, "_check_retval_"));
Py_XSETREF(fut->fut_source_tb, _PyObject_CallNoArg(traceback_extract_stack));

Stylistically, all of these seem awkward and I think there is more to it than just the name. I'm not sure it is wise to pass complex inputs into a two-argument macro that makes an assignment and has a conditional refcount side-effect. Even now, one of the above looks to me like it might not be correct.
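
For illustration, expanding one of the examples above with the macro
definition quoted later in this thread gives roughly:

    /* Py_SETREF(entry->path, PyUnicode_EncodeFSDefault(entry->path)); */
    do {
        PyObject *_py_tmp = (PyObject *)(entry->path);
        entry->path = PyUnicode_EncodeFSDefault(entry->path);
        Py_DECREF(_py_tmp);
    } while (0);

The old entry->path is still a valid reference when
PyUnicode_EncodeFSDefault() reads it, but if that call fails and returns
NULL, the old reference is dropped anyway and entry->path is left NULL --
the error-return question is exactly the kind of thing the condensed form
makes easy to overlook.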

Probably, we're the wrong people to be talking about this. The proposal is to make these macros part of the official API so that they start to appear in source code everywhere. The question isn't whether the above makes sense to you and me; instead, it is whether other people can make heads or tails out of the above examples. As a result of making the macros official, will the Python world have a net increase in complexity or decrease in complexity?

My personal experience with the macros hasn't been positive. Perhaps everyone else thinks it's fine. If so, I won't stand in your way.


Raymond

Antoine Pitrou

Nov 9, 2017, 7:37:11 AM
to pytho...@python.org
On Thu, 9 Nov 2017 04:22:20 -0800
Raymond Hettinger <raymond....@gmail.com> wrote:
>
> Probably, we're the wrong people to be talking about this. The proposal is to make these macros part of the official API so that they start to appear in source code everywhere. The question isn't whether the above makes sense to you and me; instead, it is whether other people can make heads or tails out of the above examples.

Generally I would advocate that anyone wanting to write a third-party C
extension, but not very familiar with the C API and its quirks, use
Cython instead. I'm not sure if that's an argument for the SETREF APIs
to remain private or to become public :-)

Regards

Antoine.

Victor Stinner

Nov 9, 2017, 7:40:39 AM
to Raymond Hettinger, Serhiy Storchaka, Python-Dev@Python.Org
Hum, to give more context to the discussion, the two discussed macros
are documented this way:

#ifndef Py_LIMITED_API
/* Safely decref `op` and set `op` to `op2`.
*
* As in case of Py_CLEAR "the obvious" code can be deadly:
*
* Py_DECREF(op);
* op = op2;
*
* The safe way is:
*
* Py_SETREF(op, op2);
*
* That arranges to set `op` to `op2` _before_ decref'ing, so that any code
* triggered as a side-effect of `op` getting torn down no longer believes
* `op` points to a valid object.
*
* Py_XSETREF is a variant of Py_SETREF that uses Py_XDECREF instead of
* Py_DECREF.
*/

#define Py_SETREF(op, op2)                      \
    do {                                        \
        PyObject *_py_tmp = (PyObject *)(op);   \
        (op) = (op2);                           \
        Py_DECREF(_py_tmp);                     \
    } while (0)

#define Py_XSETREF(op, op2)                     \
    do {                                        \
        PyObject *_py_tmp = (PyObject *)(op);   \
        (op) = (op2);                           \
        Py_XDECREF(_py_tmp);                    \
    } while (0)

#endif /* ifndef Py_LIMITED_API */

Victor

Serhiy Storchaka

Nov 9, 2017, 8:31:33 AM
to pytho...@python.org
09.11.17 14:22, Raymond Hettinger wrote:

> Stylistically, all of these seem awkward and I think there is more to it than just the name. I'm not sure it is wise to pass complex inputs into a two-argument macro that makes an assignment and has a conditional refcount side-effect. Even now, one of the above looks to me like it might not be correct.

If you have found incorrect code, please open an issue and provide a
patch. But recently you rewrote correct code (Py_SETREF was not involved)
in a more complicated way [1] and rejected my patch that gets rid of the
duplication of this complicated code [2]. Please don't "fix" code that is
not broken.

[1] https://bugs.python.org/issue26491
[2] https://bugs.python.org/issue31585

> Probably, we're the wrong people to be talking about this. The proposal is to make these macros part of the official API so that it starts to appear in source code everywhere. The question isn't whether the above makes sense to you and me; instead, it is whether other people can make heads or tails out the above examples. As a result of making the macros official, will the Python world have a net increase in complexity or decrease in complexity?

I'm afraid that these macros will be used in any case, even if they are
not part of the official C API, because they are handy. The main purpose
of documenting them officially is to explain in what cases these macros
are appropriate and make the code more reliable, and in what cases they
are not enough and more complex code should be used. This would be a
lesson in correctly replacing references. I didn't write this in the
source comment because it was intended for experienced Python core
developers, and all usages are under our control and pass peer review.

Nick Coghlan

Nov 9, 2017, 9:11:24 PM
to Antoine Pitrou, pytho...@python.org
On 9 November 2017 at 22:35, Antoine Pitrou <soli...@pitrou.net> wrote:
> On Thu, 9 Nov 2017 04:22:20 -0800
> Raymond Hettinger <raymond....@gmail.com> wrote:
>>
>> Probably, we're the wrong people to be talking about this. The proposal is to make these macros part of the official API so that they start to appear in source code everywhere. The question isn't whether the above makes sense to you and me; instead, it is whether other people can make heads or tails out of the above examples.
>
> Generally I would advocate that anyone wanting to write a third-party C
> extension, but not very familiar with the C API and its quirks, use
> Cython instead. I'm not sure if that's an argument for the SETREF APIs
> to remain private or to become public :-)

I'm with Antoine on this - we should be pushing folks writing
extension modules towards code generators like Cython, cffi, SWIG, and
SIP, support libraries like Boost::Python, or safer languages like
Rust (which can then be wrapped with cffi), rather than encouraging
more bespoke C/C++ extension modules with handcrafted refcount
management. There's a reason the only parts of
https://packaging.python.org/guides/packaging-binary-extensions/ that
have actually been filled in are the ones explaining how to use a tool
to write the extension module for you :)

For me, that translates to being -1 on making these part of the public
API - for code outside CPython, our message should consistently be "Do
not roll your own refcount management, get a code generator or library
to handle it for you".

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Stefan Krah

Nov 10, 2017, 6:28:20 AM
to pytho...@python.org
On Fri, Nov 10, 2017 at 12:09:12PM +1000, Nick Coghlan wrote:
> I'm with Antoine on this - we should be pushing folks writing
> extension modules towards code generators like Cython, cffi, SWIG, and
> SIP, support libraries like Boost::Python, or safer languages like
> Rust (which can then be wrapped with cffi), rather than encouraging
> more bespoke C/C++ extension modules with handcrafted refcount
> management. There's a reason the only parts of
> https://packaging.python.org/guides/packaging-binary-extensions/ that
> have actually been filled in are the ones explaining how to use a tool
> to write the extension module for you :)

They will be slower and in my experience not easier to maintain -- quite
the opposite.



Stefan Krah