Embedding Python crash on PyTuple_New

Arnaud Loonstra

Nov 23, 2021, 7:09:01 AM

Hi,

I've had Python embedded successfully in a program until now, but I'm
now running into weird GC-related segfaults. I'm currently trying to
debug this, but my limited understanding of CPython is holding me back.

I'm creating a tuple in C, but after a while it crashes on creation.
That doesn't make sense to me, which makes me wonder whether something
else is going on. Could it be that it crashes here only because the GC
is cleaning up objects completely unrelated to the allocation of the new
tuple? How can I troubleshoot this?

I've got CPython compiled with --with-valgrind --without-pymalloc
--with-pydebug

In C I'm creating a tuple with the following method:

static PyObject *
s_py_zosc_tuple(pythonactor_t *self, zosc_t *oscmsg)
{
    assert(self);
    assert(oscmsg);
    char *format = zosc_format(oscmsg);

    PyObject *rettuple = PyTuple_New((Py_ssize_t) strlen(format) );

It crashes here (frame 16), consistently after 320 calls:


1 __GI_raise raise.c 49 0x7ffff72c4e71
2 __GI_abort abort.c 79 0x7ffff72ae536
3 fatal_error pylifecycle.c 2183 0x7ffff7d84b4f
4 Py_FatalError pylifecycle.c 2193 0x7ffff7d878b2
5 _PyObject_AssertFailed object.c 2200 0x7ffff7c93cf2
6 visit_decref gcmodule.c 378 0x7ffff7dadfd5
7 tupletraverse tupleobject.c 623 0x7ffff7ca3e81
8 subtract_refs gcmodule.c 406 0x7ffff7dad340
9 collect gcmodule.c 1054 0x7ffff7dae838
10 collect_with_callback gcmodule.c 1240 0x7ffff7daf17b
11 collect_generations gcmodule.c 1262 0x7ffff7daf3f6
12 _PyObject_GC_Alloc gcmodule.c 1977 0x7ffff7daf4f2
13 _PyObject_GC_Malloc gcmodule.c 1987 0x7ffff7dafebc
14 _PyObject_GC_NewVar gcmodule.c 2016 0x7ffff7daffa5
15 PyTuple_New tupleobject.c 118 0x7ffff7ca4da7
16 s_py_zosc_tuple pythonactor.c 366 0x55555568cc82
17 pythonactor_socket pythonactor.c 664 0x55555568dac7
18 pythonactor_handle_msg pythonactor.c 862 0x55555568e472
19 pythonactor_handler pythonactor.c 828 0x55555568e2e2
20 sphactor_actor_run sphactor_actor.c 855 0x5555558cb268
... <More>

Any pointers are really appreciated.

Rg,

Arnaud

Arnaud Loonstra

Nov 23, 2021, 9:21:47 AM

I've found that setting PYTHONTRACEMALLOC=1 in the environment gives me
a pointer to where the offending block was allocated.
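
For reference, the same allocation lookup is available from Python
through the tracemalloc module. A minimal sketch, assuming CPython; the
Lamp class and the allocation_site helper are hypothetical names used
only for illustration:

```python
import tracemalloc


class Lamp:
    """Hypothetical stand-in for an object created by the embedded script."""


def allocation_site(factory, nframe=1):
    # Tracing must already be on when the object is allocated, which is
    # what PYTHONTRACEMALLOC=1 arranges at interpreter startup.
    if not tracemalloc.is_tracing():
        tracemalloc.start(nframe)
    obj = factory()
    # Returns a tracemalloc.Traceback, or None if the block was not traced.
    return obj, tracemalloc.get_object_traceback(obj)


lamp, where = allocation_site(Lamp)
if where is not None:
    print("\n".join(where.format()))
```

Objects allocated before tracing started have no recorded traceback,
which is why the environment variable is the more convenient route when
embedding.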

I'm calling this method from C:

18 def handleSocket(self, addr, data, type, name, uuid):
19 if addr == "/pulse":
20 self.lampval += 1
21 return (addr, [0,0])

Modules/gcmodule.c:108: gc_decref: Assertion "gc_get_refs(g) > 0"
failed: refcount is too small
Memory block allocated at (most recent call first):
File "/home/arnaud/src/build-gazebosc-Desktop-Debug/bin/lampen.py",
line 21

object address : 0x7fffd81154c0
object refcount : 1
object type : 0x7ffff7f3df20
object type name: list
object repr : [117, 0]

Fatal Python error: _PyObject_AssertFailed
Python runtime state: initialized

Current thread 0x00007ffff2481640 (most recent call first):
File "/home/arnaud/src/build-gazebosc-Desktop-Debug/bin/lampen.py",
line 21 in handleSocket

Thread 0x00007ffff5d288c0 (most recent call first):
<no Python frame>

It clearly says the refcount is 1, so I'm puzzled about what this means.

Rg,

Arnaud

MRAB

Nov 23, 2021, 9:38:25 AM

You're creating a tuple that'll have the same number of members as the
length of a string? That looks strange to me.

How are you setting the tuple's members?

Arnaud Loonstra

Nov 23, 2021, 9:44:23 AM

It's from a serialisation format called OSC. The format string describes
the types of the values in the message; every character stands for one
type.

I'm creating the tuple as follows:

PyObject *rettuple = PyTuple_New((Py_ssize_t) strlen(format) );

Then I iterate over the OSC message using the format string (showing
only the handling of an int, 'i'):

char type = '0';
Py_ssize_t pos = 0;
const void *data = zosc_first(oscmsg, &type);
while(data)
{
    switch (type)
    {
    case('i'):
    {
        int32_t val = 9;
        int rc = zosc_pop_int32(oscmsg, &val);
        assert(rc == 0);
        PyObject *o = PyLong_FromLong((long)val);
        assert( o );
        rc = PyTuple_SetItem(rettuple, pos, o);
        assert(rc == 0);
        break;
    }

Full code is here:

https://github.com/hku-ect/gazebosc/blob/822452dfa27201db274d37ce09e835d98fe500b2/Actors/pythonactor.c#L360

MRAB

Nov 23, 2021, 10:17:55 AM

Looking at that code, you have:

PyObject *o = Py_BuildValue("s#", str, 1);

What I'd check is the type of the 1 that you're passing. Wouldn't the
compiler assume that it's an int?

The format string tells the function to expect a Py_ssize_t, but how
would the compiler know that?

MRAB

Nov 23, 2021, 10:38:04 AM

Looking at https://www.mankier.com/3/zosc, it says for 'T' and 'F' "(no
value required)", but you're doing:

int rc = zosc_pop_bool(oscmsg, &bl);

If no value is required, is there a bool there to be popped?

Arnaud Loonstra

Nov 23, 2021, 11:06:02 AM

The value is only required to set a user provided boolean to the value
in the message. There's no boolean value encoded in the message, only
the T and F in the format string.

With regard to the Py_BuildValue("s#", str, 1); call, that's a valid
point. I'll fix that. However, that code is not touched in the crashes
I'm testing.

I'm now testing different parts of the code to see if it runs stable.
I've found it runs stable if I do not process the returned tuple.

PyObject *pReturn = PyObject_CallMethod(self->pyinstance,
                                        "handleSocket", "sOsss",
                                        oscaddress,
                                        py_osctuple,
                                        ev->type, ev->name, strdup(ev->uuid));
Py_XINCREF(pReturn);

https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L673

and a bit further in the code I convert the Python tuple to an OSC message:

zosc_t *retosc = s_py_zosc(pAddress, pData);

https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L732

If I change that line to:

zosc_t *retosc = zosc_create("/temp", "ii", 32, 64);

It runs stable.

I would turn my attention to the s_py_zosc function, but I'm not sure.
Since the errors are GC-related, could they be caused anywhere?

https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L286

MRAB

Nov 23, 2021, 12:31:46 PM

You can use "C" as a format string for Py_BuildValue to convert a C int
representing a character to a Python string.

> I'm now testing different parts of the code to see if it runs stable.
> I've found it runs stable if I do not process the returned tuple.
>
> PyObject *pReturn = PyObject_CallMethod(self->pyinstance,
> "handleSocket", "sOsss",
> oscaddress,
> py_osctuple,
> ev->type, ev->name, strdup(ev->uuid));
> Py_XINCREF(pReturn);
>
Why the Py_XINCREF? PyObject_CallMethod returns a new reference. The
Py_DECREF that you do later won't destroy the object because of that
additional Py_XINCREF, so that's a memory leak.
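
The ownership rule can be checked without a C build. A minimal sketch,
assuming CPython: it calls the real PyObject_CallObject, Py_IncRef and
Py_DecRef through ctypes.pythonapi (PyObject_CallObject is swapped in
for PyObject_CallMethod because it is not variadic; both return a new
reference). The Reply class and the call_and_release helper are
hypothetical names.

```python
import ctypes
import weakref

api = ctypes.pythonapi
# Raw pointers (c_void_p) keep ctypes from managing the reference for us.
api.PyObject_CallObject.restype = ctypes.c_void_p
api.PyObject_CallObject.argtypes = [ctypes.py_object, ctypes.py_object]
api.Py_IncRef.restype = None
api.Py_IncRef.argtypes = [ctypes.c_void_p]
api.Py_DecRef.restype = None
api.Py_DecRef.argtypes = [ctypes.c_void_p]


class Reply:
    """Hypothetical stand-in for the value returned by handleSocket."""


def call_and_release(extra_incref):
    # The call hands back a new reference that the caller now owns.
    ptr = api.PyObject_CallObject(Reply, ())
    # Borrow the object briefly to attach a weak reference as a probe.
    obj = ctypes.cast(ptr, ctypes.py_object).value
    probe = weakref.ref(obj)
    del obj
    if extra_incref:
        api.Py_IncRef(ptr)  # mirrors the stray Py_XINCREF
    api.Py_DecRef(ptr)      # the single release done later
    # True if the object was destroyed, False if it leaked.
    return probe() is None


print(call_and_release(extra_incref=False))
print(call_and_release(extra_incref=True))
```

With the extra INCREF, the single DECREF no longer brings the count to
zero, so the object outlives every pointer to it.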

> https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L673
>
> and a bit further in the code I convert the Python tuple to an OSC message:
>
> zosc_t *retosc = s_py_zosc(pAddress, pData);
>
> https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L732
>
> If I change that line to:
>
> zosc_t *retosc = zosc_create("/temp", "ii", 32, 64);
>
> It runs stable.
>
> I would turn my attention to the s_py_zosc function, but I'm not sure.
> Since the errors are GC-related, could they be caused anywhere?
>
Basically, yes, but I won't be surprised if it was due to too few
INCREFs or too many DECREFs somewhere.

> https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L286
>
Incidentally, in s_py_zosc_tuple, you're not doing "assert(rc == 0);"
after "zosc_pop_float" or "zosc_pop_double".

MRAB

Nov 23, 2021, 12:44:28 PM

On 2021-11-23 17:31, MRAB wrote:
> On 2021-11-23 16:04, Arnaud Loonstra wrote:
[snip]
>>
>> I would turn my attention to the s_py_zosc function, but I'm not sure.
>> Since the errors are GC-related, could they be caused anywhere?
>>
> Basically, yes, but I won't be surprised if it was due to too few
> INCREFs or too many DECREFs somewhere.
>
>> https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L286
>>
> Incidentally, in s_py_zosc_tuple, you're not doing "assert(rc == 0);"
> after "zosc_pop_float" or "zosc_pop_double".
>
Here's something interesting in s_py_zosc:

PyTypeObject* type = Py_TYPE(item);
const char * typename = _PyType_Name(type);
if (streq(typename, "c_int"))
{
    zsys_warning("Unsupported ctypes.c_int until we find a way to access the value :S");
}
else
    zsys_warning("unsupported python type");

Py_DECREF(type);

According to the docs, Py_TYPE returns a borrowed reference, but you're
DECREFing it. If you call s_py_zosc enough times, the reference count of
the type will drop to 0. I think that might be the problem.

Arnaud Loonstra

Nov 23, 2021, 3:27:04 PM

On 23-11-2021 18:31, MRAB wrote:
> On 2021-11-23 16:04, Arnaud Loonstra wrote:
>> I would turn my attention to the s_py_zosc function, but I'm not sure.
>> Since the errors are GC-related, could they be caused anywhere?
>>
> Basically, yes, but I won't be surprised if it was due to too few
> INCREFs or too many DECREFs somewhere.
>
>> https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L286
>>
>>
> Incidentally, in s_py_zosc_tuple, you're not doing "assert(rc == 0);"
> after "zosc_pop_float" or "zosc_pop_double".

Thanks for those pointers! I think your intuition is right. I might have
found the bugger. In s_py_zosc I call Py_DECREF on pAddress and pData.
However they are acquired by PyTuple_GetItem which returns a borrowed
reference. I think pAddress and pData are then also 'decrefed' when the
pReturn tuple which contains pAddress and pData is 'decrefed'?

I'm testing it now, and it has been running stably for a while.

Preparing a PR: https://github.com/hku-ect/gazebosc/pull/181
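
The borrowed-reference behaviour of PyTuple_GetItem can be observed
directly. A minimal sketch, assuming CPython: it calls the real
PyTuple_GetItem through ctypes.pythonapi and compares the item's
reference count before and after. The helper name getitem_effect is
hypothetical.

```python
import ctypes
import sys

api = ctypes.pythonapi
# c_void_p gives the raw pointer; ctypes does not touch the refcount.
api.PyTuple_GetItem.restype = ctypes.c_void_p
api.PyTuple_GetItem.argtypes = [ctypes.py_object, ctypes.c_ssize_t]


def getitem_effect(container, index):
    item = container[index]
    before = sys.getrefcount(item)
    ptr = api.PyTuple_GetItem(container, index)
    after = sys.getrefcount(item)
    # (pointer refers to the same object, change in reference count)
    return ptr == id(item), after - before


reply = ("/pulse", [0, 0])
print(getitem_effect(reply, 1))
```

Because the count does not change, the tuple still holds the only
reference it handed out. A Py_DECREF on that pointer gives away a
reference the tuple still relies on, which is consistent with the
"refcount is too small" assertion the GC raised.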

MRAB

Nov 23, 2021, 7:46:36 PM

Yes, members of a container are DECREFed when the container is destroyed.

It's bad practice for a function to DECREF its arguments unless the
function's sole purpose is cleanup because the function won't know where
the arguments came from.
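
The container rule can be seen from Python as well. A minimal sketch,
assuming CPython's immediate reference-counted destruction; the Payload
class and the member_lifetime helper are hypothetical names:

```python
import weakref


class Payload:
    """Hypothetical stand-in for the data object inside the returned tuple."""


def member_lifetime():
    member = Payload()
    probe = weakref.ref(member)
    container = ("/pulse", member)
    del member  # the tuple now holds the only strong reference
    alive_in_container = probe() is not None
    del container  # destroying the tuple DECREFs each member
    gone_with_container = probe() is None
    return alive_in_container, gone_with_container


print(member_lifetime())
```

The member lives exactly as long as the tuple's reference to it, so code
that only borrowed the member has nothing of its own to release.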

Arnaud Loonstra

Nov 24, 2021, 2:59:37 AM

On 24-11-2021 01:46, MRAB wrote:
[snip]

>>>>
>>> Basically, yes, but I won't be surprised if it was due to too few
>>> INCREFs or too many DECREFs somewhere.
>>>
>>>> https://github.com/hku-ect/gazebosc/blob/505b30c46bf3f78d188c3f575c80e294d3db7e5d/Actors/pythonactor.c#L286
>>>>
>>>>
>>> Incidentally, in s_py_zosc_tuple, you're not doing "assert(rc == 0);"
>>> after "zosc_pop_float" or "zosc_pop_double".
>>
>> Thanks for those pointers! I think your intuition is right. I might have
>> found the bugger. In s_py_zosc I call Py_DECREF on pAddress and pData.
>> However they are acquired by PyTuple_GetItem which returns a borrowed
>> reference. I think pAddress and pData are then also 'decrefed' when the
>> pReturn tuple which contains pAddress and pData is 'decrefed'?
>>
> Yes, members of a container are DECREFed when the container is destroyed.
>
> It's bad practice for a function to DECREF its arguments unless the
> function's sole purpose is cleanup because the function won't know where
> the arguments came from.
>

I'm finding that out now. What strikes me is how hard it was to debug
this. I think that was because I INCREFed the return object. I guess I
did that to work around the wrong DECREFs of the data in the return
object. However, that made it hell to debug. I'm really curious what the
best practices are for debugging embedded CPython.

Thanks big time for your feedback!

MRAB

Nov 24, 2021, 1:16:13 PM

What I do when writing the code is add comments showing what variables
refer to an object at that point in the code, each suffixed with "+" if
it owns a reference and/or "?" if it could be NULL.

Example 1:

//>
PyObject *my_tuple = PyTuple_New(count);
//> my_tuple+?
if (!my_tuple)
    goto error;
//> my_tuple+

"//>" means that there are no variables that point to an object.

"//> my_tuple+?" means that "my_tuple" points to an object and it owns a
reference, but it might be NULL.

"//> my_tuple+" means that "my_tuple" points to an object and it owns a
reference.

Example 2:

//>
PyObject *my_item = PyList_GetItem(my_list, index);
//> my_item?
if (!my_item)
    goto error;
//> my_item

"//>" means that there are no variables that point to an object.

"//> my_item?" means that "my_item" points to an object, but it might
be NULL.

"//> my_item" means that "my_item" points to an object. There is no "+"
because PyList_GetItem returns a borrowed reference.
