[Python-Dev] What is the purpose of the _PyThreadState_Current symbol in Python 3?

Gabriele

Sep 26, 2018, 5:26:46 PM
to pytho...@python.org
While trying to locate a valid PyInterpreterState instance in the virtual memory of a running Python (3.6) application (using process_vm_readv on Linux), I noticed that I can rely on _PyThreadState_Current.interp only at the very beginning of execution. If I attach to an already running Python process, _PyThreadState_Current.interp doesn't seem to point to anything useful for deriving the currently running threads and their frame stacks. This makes me wonder what purpose this symbol serves in the .dynsym section. Apart from a brute-force search for a valid PyInterpreterState, is there a more reliable approach for the version of Python that I'm targeting?
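
A minimal sketch of the kind of out-of-process read described above, assuming Linux, a libc that exposes process_vm_readv, and sufficient ptrace permission on the target process. The helper name read_remote is hypothetical and is not taken from any existing tool.

```python
import ctypes
import os


class IOVec(ctypes.Structure):
    """Mirror of the C `struct iovec` used by process_vm_readv."""
    _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]


def read_remote(pid, address, size):
    """Read `size` bytes at `address` from process `pid` (hypothetical helper).

    Raises OSError if the kernel refuses the read (for example EPERM or ESRCH)
    and AttributeError if the libc does not provide process_vm_readv.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    fn = libc.process_vm_readv
    fn.restype = ctypes.c_ssize_t
    fn.argtypes = [
        ctypes.c_int,                          # pid
        ctypes.POINTER(IOVec), ctypes.c_ulong,  # local iovec array and count
        ctypes.POINTER(IOVec), ctypes.c_ulong,  # remote iovec array and count
        ctypes.c_ulong,                        # flags (must be 0)
    ]
    buf = ctypes.create_string_buffer(size)
    local = IOVec(ctypes.cast(buf, ctypes.c_void_p), size)
    remote = IOVec(address, size)
    n = fn(pid, ctypes.byref(local), 1, ctypes.byref(remote), 1, 0)
    if n < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return buf.raw[:n]
```

Reading from the current process (pid = os.getpid()) is enough to try the helper out; for another process, the address would come from that process's symbol table and memory maps.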

Thanks,
Gabriele

Victor Stinner

Sep 27, 2018, 4:46:52 PM
to phoen...@gmail.com, python-dev
Hi,

On Wed, Sep 26, 2018 at 23:27, Gabriele <phoen...@gmail.com> wrote:
> In trying to find the location of a valid instance of PyInterpreterState in the virtual memory of a running Python (3.6) application (using process_vm_readv on Linux),

I understand that you are writing a debugger and you can only *read*
memory, not execute code, right?

> I have noticed that I can only rely on _PyThreadState_Current.interp at the very beginning of the execution. If I try to attach to a running Python process, then _PyThreadState_Current.interp doesn't seem to point to anything useful to derive the currently running threads and the frame stacks for each of them.

In the master branch, it's now _PyRuntime.gilstate.tstate_current. If
you run time.sleep(3600) and look into
_PyRuntime.gilstate.tstate_current using gdb, you can see a NULL pointer
(tstate_current=0) because Python releases the GIL while sleeping.

In faulthandler, I call PyGILState_GetThisThreadState() from signal
handlers to get the Python thread state of the current thread... But
this one is implemented using PyThread_tss_get()
(pthread_getspecific() on most platforms). Moreover, it returns NULL
if the current thread is not a Python thread.

There is also _PyGILState_GetInterpreterStateUnsafe() which gives
access to the current Python interpreter:
_PyRuntime.gilstate.autoInterpreterState. From the interpreter, you
can use the linked list of thread states from interp->tstate_head.
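
As an in-process illustration only: sys._current_frames() returns the topmost frame of every thread known to the interpreter, which is roughly the information an external tool would reconstruct by walking the thread-state list by hand. A minimal sketch, assuming CPython; the helper name snapshot_stacks is hypothetical.

```python
import sys
import threading
import traceback


def snapshot_stacks():
    """Return {thread name: [frame descriptions, outermost first]}.

    Works from inside the process; it does not read raw memory the way an
    external sampler has to.
    """
    names = {t.ident: t.name for t in threading.enumerate()}
    stacks = {}
    for ident, frame in sys._current_frames().items():
        entries = traceback.extract_stack(frame)
        stacks[names.get(ident, "thread-%d" % ident)] = [
            "%s (%s:%s)" % (e.name, e.filename, e.lineno) for e in entries
        ]
    return stacks
```

Threads that are asleep and do not hold the GIL still show up here, since the snapshot comes from the per-thread states rather than from the "current" thread state.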

I hope that I helped :-)

Obviously, when you access Python internals, the details change at
each Python release... I described the master branch.

Victor
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Gabriele

Sep 28, 2018, 10:21:12 AM
to pytho...@python.org
Hi Victor,

> I understand that you are writing a debugger and you can only *read*
> memory, not execute code, right?

I'm working on a frame stack sampler that runs independently from the
Python process. The project is "Austin"
(https://github.com/P403n1x87/austin). Whilst I could, in principle,
execute code with other system calls, I prefer not to in this case.


> In the master branch, it's now _PyRuntime.gilstate.tstate_current. If
> you run time.sleep(3600) and look into
> _PyRuntime.gilstate.tstate_current using gdb, you can see a NULL pointer
> (tstate_current=0) because Python releases the GIL while sleeping.

I would like my application to make as few assumptions as possible.
The _PyRuntime symbol might not be available if all the symbols have
been stripped out of the binaries. That's why I was trying to rely on
_PyThreadState_Current, which is in the .dynsym section. Judging by
the output of nm -D `which python3` (I'm on Python 3.6.6 at the
moment) I cannot see anything more useful than that.

My current strategy is to try to make something out of this symbol
and then fall back to a brute-force scan of the .bss section for
valid PyInterpreterState instances (which works reliably and is
quite fast too, but is a bit ugly).
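
A minimal sketch of what such a brute-force scan could look like, run against a toy address space rather than a real process. Everything here is an assumption made for illustration: 64-bit little-endian pointers, the two field offsets, the Memory class, and the validation rule (a candidate is accepted when its first thread state points back to it). It is not Austin's actual implementation.

```python
import struct

PTR = struct.Struct("<Q")  # assumption: 64-bit little-endian pointers

# Hypothetical field offsets; the real ones depend on the CPython version.
INTERP_TSTATE_HEAD = 8   # offset of tstate_head inside the interpreter state
TSTATE_INTERP = 16       # offset of interp inside the thread state


class Memory:
    """Toy address space mapping base addresses to byte regions."""

    def __init__(self, regions):
        self.regions = regions  # {base_address: bytes}

    def read_ptr(self, address):
        """Return the pointer stored at `address`, or None if unmapped."""
        for base, data in self.regions.items():
            offset = address - base
            if 0 <= offset <= len(data) - PTR.size:
                return PTR.unpack_from(data, offset)[0]
        return None


def find_interpreter(memory, bss_base, bss_size):
    """Treat each word in the scanned region as a candidate interpreter pointer."""
    for address in range(bss_base, bss_base + bss_size, PTR.size):
        candidate = memory.read_ptr(address)
        if not candidate:
            continue
        tstate = memory.read_ptr(candidate + INTERP_TSTATE_HEAD)
        if not tstate:
            continue
        # Accept only if the thread state points back to the candidate.
        if memory.read_ptr(tstate + TSTATE_INTERP) == candidate:
            return candidate
    return None
```

The back-pointer check is what keeps false positives down: a random word in the scanned region is unlikely to point at memory whose own pointer chain leads back to it.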


> There is also _PyGILState_GetInterpreterStateUnsafe() which gives
> access to the current Python interpreter:
> _PyRuntime.gilstate.autoInterpreterState. From the interpreter, you
> can use the linked list of thread states from interp->tstate_head.
>
> I hope that I helped :-)

Yes, thanks! Your comment made me realise why I can use
_PyThreadState_Current at the very beginning: Python is going
through its intensive startup process, which involves, among other
things, loading the frozen modules (I can clearly see most if not all
of the steps in the output of Austin, as mentioned in the repo's
README). During this phase, the main (and only) thread holds the GIL
and is quite busy. The long-running applications that I was trying to
attach to have very long wait periods where they sit idle waiting for
a timer to trigger the next operations, which fire very quickly and
put the threads back to sleep again.

If this is what _PyThreadState_Current is designed for, then I
guess I cannot really rely on it, especially when attaching Austin to
another process.

Best regards,
Gabriele

Nathaniel Smith

Sep 28, 2018, 6:14:17 PM
to Gabriele, Python Dev
What information do you wish the interpreter provided, that would make your program simpler and more reliable?

Gabriele

Sep 28, 2018, 6:31:34 PM
to n...@pobox.com, pytho...@python.org
On Fri, 28 Sep 2018 at 23:12, Nathaniel Smith <n...@pobox.com> wrote:
> What information do you wish the interpreter provided, that would make your program simpler and more reliable?

An exported global variable that points to the head of the
PyInterpreterState linked list (i.e. the return value of
PyInterpreterState_Head). This way my program could just look this up
from the dynsym section instead of scanning a dump of the bss section
in memory to find a possible candidate. It would also be grand if the
string in the rodata section that gives the Python version could be
located via dynsym, but that's a different question.

Nathaniel Smith

Sep 29, 2018, 6:02:36 AM
to Gabriele, Python Dev
On Fri, Sep 28, 2018 at 3:29 PM, Gabriele <phoen...@gmail.com> wrote:
> On Fri, 28 Sep 2018 at 23:12, Nathaniel Smith <n...@pobox.com> wrote:
>> What information do you wish the interpreter provided, that would make your program simpler and more reliable?
>
> An exported global variable that points to the head of the
> PyInterpreterState linked list (i.e. the return value of
> PyInterpreterState_Head). This way my program could just look this up
> from the dynsym section instead of scanning a dump of the bss section
> in memory to find a possible candidate.

Hmm, it looks like in 3.7+, _PyRuntime is marked PyAPI_DATA, which I
think should make it exported from dynsym?

https://github.com/python/cpython/blob/4b430e5f6954ef4b248e95bfb4087635dcdefc6d/Include/internal/pystate.h#L206

And PyInterpreterState_Head is just _PyRuntime.interpreters.head. So
maybe this is already done...
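
One quick way to check this from inside a running interpreter is a dynamic symbol lookup through ctypes, which only succeeds for symbols the binary actually exports. A minimal sketch, assuming CPython on a platform where ctypes.pythonapi is available; the helper name exported_address is hypothetical, and whether _PyRuntime resolves depends on the Python version and build.

```python
import ctypes


def exported_address(name):
    """Return the address of an exported symbol, or None if it is not exported.

    The lookup goes through the dynamic loader, so symbols that are absent
    from the dynamic symbol table are reported as missing.
    """
    try:
        obj = ctypes.c_char.in_dll(ctypes.pythonapi, name)
    except ValueError:
        return None
    return ctypes.addressof(obj)


if __name__ == "__main__":
    for symbol in ("_PyRuntime", "_PyThreadState_Current"):
        address = exported_address(symbol)
        print(symbol, hex(address) if address is not None else "not exported")
```

An external tool would do the equivalent by parsing .dynsym in the ELF file and adding the load base of the mapping, since it cannot call into the target process.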

-n

--
Nathaniel J. Smith -- https://vorpus.org

Gabriele

Sep 29, 2018, 12:16:38 PM
to n...@pobox.com, pytho...@python.org
Ah ok, this might be related to Victor's observation based on the
latest sources. I haven't tested 3.7 yet, but if _PyRuntime is
available from dynsym then this is already enough.

Thanks,
Gabriele


--
"It is written in the language of mathematics, and its characters are
triangles, circles, and other geometric figures, without which it is
humanly impossible to understand a single word of it; without these,
one wanders in vain through a dark labyrinth."

-- G. Galilei, Il saggiatore.
