[Python-Dev] Python initialization and embedded Python


Victor Stinner

Nov 17, 2017, 7:04:32 PM
to Python Dev
Hi,

The CPython internals evolved during the Python 3.7 cycle. I would like to
know if we broke the C API or not.

Nick Coghlan and Eric Snow are working on cleaning up the Python
initialization with the "ongoing" PEP 432:
https://www.python.org/dev/peps/pep-0432/

Many global variables used by the "Python runtime" were moved to a new
single "_PyRuntime" variable (a big structure made of sub-structures).
See Include/internal/pystate.h.

A side effect of moving variables from random files into header files
is that it's no longer possible to fully initialize _PyRuntime at
"compilation time". For example, previously, it was possible to refer
to local C functions (functions declared with "static", so only visible
in the current file). Now a new "initialization function" must be
called.

In short, it means that using the "Python runtime" before it's
initialized by _PyRuntime_Initialize() is now likely to crash. For
example, calling PyMem_RawMalloc() before calling
_PyRuntime_Initialize() now calls a NULL function pointer, and so
immediately crashes with a segmentation fault.
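
To make the failure mode concrete, here is a minimal sketch, not CPython's actual code. The names `demo_runtime`, `demo_rt`, `demo_runtime_initialize()` and `demo_raw_malloc()` are hypothetical stand-ins for `_PyRuntime`, `_PyRuntime_Initialize()` and `PyMem_RawMalloc()`. It assumes the allocator is reached through a function pointer stored in a global structure that has no static initializer:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one allocator slot inside _PyRuntime. */
struct demo_runtime {
    void *(*raw_malloc)(size_t size);
};

/* A global without an initializer is zero-filled: raw_malloc starts as NULL. */
static struct demo_runtime demo_rt;

/* File-local default allocator (asks for 1 byte when size is 0). */
static void *demo_default_malloc(size_t size)
{
    return malloc(size ? size : 1);
}

/* Plays the role of _PyRuntime_Initialize(): fills in the function pointer. */
void demo_runtime_initialize(void)
{
    demo_rt.raw_malloc = demo_default_malloc;
}

/* Returns 1 once the function pointer has been set, 0 before that. */
int demo_runtime_is_ready(void)
{
    return demo_rt.raw_malloc != NULL;
}

/* Plays the role of PyMem_RawMalloc(): calls through the pointer unchecked.
   Calling it before demo_runtime_initialize() is undefined behaviour,
   which typically shows up as a segmentation fault. */
void *demo_raw_malloc(size_t size)
{
    return demo_rt.raw_malloc(size);
}
```

In this sketch the only safe order is `demo_runtime_initialize()` first and `demo_raw_malloc()` afterwards, which mirrors the new ordering constraint described above.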

I'm writing this email to ask whether this change is an issue for
embedded Python and the Python C API. Is it still possible to call
"all" functions of the C API before calling Py_Initialize()?

I was bitten by the bug while reworking the Py_Main() function to
split it into subfunctions and clean up the code that handles the command
line arguments and environment variables. I fixed the issue in main()
by calling _PyRuntime_Initialize() as soon as possible: it's now the
first instruction of main() :-) (See Programs/python.c)

To give a more concrete example: Py_DecodeLocale() is the recommended
function to decode bytes from the operating system, but this function
calls PyMem_RawMalloc(), which crashes if called before
_PyRuntime_Initialize(). Is Py_DecodeLocale() used to
initialize Python?

For example, "void Py_SetProgramName(wchar_t *);" expects a text
string, whereas main() gives argv as bytes. Calling
Py_SetProgramName() from argv requires decoding bytes... So use
Py_DecodeLocale()...

Should we do something in Py_DecodeLocale()? Maybe crash if
_PyRuntime_Initialize() wasn't called yet?

Maybe, the minimum change is to expose _PyRuntime_Initialize() in the
public C API?

Victor
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Steve Dower

Nov 17, 2017, 7:19:44 PM
to Victor Stinner, Python Dev
On 17Nov2017 1601, Victor Stinner wrote:
> In short, it means that using the "Python runtime" before it's
> initialized by _PyRuntime_Initialize() is now likely to crash. For
> example, calling PyMem_RawMalloc() before calling
> _PyRuntime_Initialize() now calls a NULL function pointer, and so
> immediately crashes with a segmentation fault.
>
> I'm writing this email to ask if this change is an issue or not to
> embedded Python and the Python C API. Is it still possible to call
> "all" functions of the C API before calling Py_Initialize()?

I thought it was never possible to call most of the C API without
initializing, except for certain APIs that are documented as being safe.
I've certainly crashed many times calling C APIs before initialization.
My intuition was that the only safe ones before were those that were
used to initialize the runtime (Py_SetPath and such), which are also the
ones being "upgraded" as part of this work.

If we have a good idea of which ones are [un]safe now, perhaps we should
tag them explicitly in the docs? Do we know which ones are [un]safe?

Cheers,
Steve

Serhiy Storchaka

Nov 18, 2017, 2:17:53 AM
to pytho...@python.org
18.11.17 02:01, Victor Stinner wrote:

> Many global variables used by the "Python runtime" were moved to a new
> single "_PyRuntime" variable (a big structure made of sub-structures).
> See Include/internal/pystate.h.
>
> A side effect of moving variables from random files into header files
> is that it's no longer possible to fully initialize _PyRuntime at
> "compilation time". For example, previously, it was possible to refer
> to local C functions (functions declared with "static", so only visible
> in the current file). Now a new "initialization function" must be
> called.
>
> In short, it means that using the "Python runtime" before it's
> initialized by _PyRuntime_Initialize() is now likely to crash. For
> example, calling PyMem_RawMalloc() before calling
> _PyRuntime_Initialize() now calls a NULL function pointer, and so
> immediately crashes with a segmentation fault.

Wouldn't it be better to revert (part of) the move of the global variables?
I still don't see a benefit of it.

> To give a more concrete example: Py_DecodeLocale() is the recommended
> function to decode bytes from the operating system, but this function
> calls PyMem_RawMalloc(), which crashes if called before
> _PyRuntime_Initialize(). Is Py_DecodeLocale() used to
> initialize Python?
>
> For example, "void Py_SetProgramName(wchar_t *);" expects a text
> string, whereas main() gives argv as bytes. Calling
> Py_SetProgramName() from argv requires decoding bytes... So use
> Py_DecodeLocale()...
>
> Should we do something in Py_DecodeLocale()? Maybe crash if
> _PyRuntime_Initialize() wasn't called yet?

I think Py_DecodeLocale() should be usable before calling
Py_Initialize(). In the example in Doc/extending/extending.rst it is
used before Py_Initialize(). If the third-party code is based on this
example, it will crash now.

Antoine Pitrou

Nov 18, 2017, 6:47:14 AM
to pytho...@python.org
On Sat, 18 Nov 2017 01:01:47 +0100
Victor Stinner <victor....@gmail.com> wrote:
>
> Maybe, the minimum change is to expose _PyRuntime_Initialize() in the
> public C API?

+1. Also a symmetric PyRuntime_Finalize() function (even if it's a
no-op currently).

Regards

Antoine.

Nick Coghlan

Nov 18, 2017, 9:19:58 AM
to Victor Stinner, Python Dev
On 18 November 2017 at 10:01, Victor Stinner <victor....@gmail.com> wrote:
> I'm writing this email to ask if this change is an issue or not to
> embedded Python and the Python C API. Is it still possible to call
> "all" functions of the C API before calling Py_Initialize()?

It isn't technically permitted to call any of them, unless their
documentation specifically says that calling them before
`Py_Initialize` is permitted (and that permission is only given for a
select few configuration APIs in
https://docs.python.org/3/c-api/init.html).

While it's still PEP 432's intention to eventually expose a public
multi-phase start-up API, it's *also* the case that we're not actually
ready to do that yet - we're not sure we have the data model right,
and we don't want to commit to a supported API until that's resolved.

So for Python 3.7, I'd suggest pursuing one of the following options:

1. Add a variant of Py_DecodeLocale that accepts a memory allocation
function directly and reports back both the allocated pointer and its
size (allowing the calling program to manage that memory); or
2. Offer a new `Py_SetProgramNameFromString` API that accepts a `char
*` directly. That way, CPython can take care of lazily decoding it
after the decoding machinery has been fully set up, rather than
expecting the embedding application to always do it;

(While we could also make the promise that PyMem_RawMalloc and
Py_DecodeLocale will be callable before Py_Initialize, I don't think
we're far enough into the startup refactoring process to be making
those kinds of promises).
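
As a rough illustration of option 1, the sketch below decodes a locale-encoded string with a caller-supplied allocator and reports both the buffer and its size. It is only a sketch under stated assumptions: `demo_decode_with_allocator()` and `demo_alloc_fn` are hypothetical names, not an existing or proposed CPython API. It relies on the C library's `mbstowcs()` (with the POSIX behaviour of accepting a NULL destination to compute the length) and does not reproduce the surrogateescape error handling of the real Py_DecodeLocale():

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical allocator callback supplied by the embedding program. */
typedef void *(*demo_alloc_fn)(size_t size);

/*
 * Decode a locale-encoded byte string into a wide string using the
 * caller's allocator. On success, returns 0 and stores the buffer in *out
 * and its size in bytes (including the terminating L'\0') in *out_size.
 * On failure, returns -1 and leaves *out as NULL and *out_size as 0.
 */
int demo_decode_with_allocator(const char *arg, demo_alloc_fn alloc,
                               wchar_t **out, size_t *out_size)
{
    *out = NULL;
    *out_size = 0;

    /* With a NULL destination, mbstowcs() only reports the length needed
       (POSIX behaviour, also provided by common C libraries). */
    size_t len = mbstowcs(NULL, arg, 0);
    if (len == (size_t)-1) {
        return -1;              /* invalid multibyte sequence */
    }

    size_t size = (len + 1) * sizeof(wchar_t);
    wchar_t *buf = alloc(size);
    if (buf == NULL) {
        return -1;              /* allocation failed */
    }

    /* The input was validated above, so this conversion is expected to
       succeed; the terminator is written explicitly to be safe. */
    mbstowcs(buf, arg, len + 1);
    buf[len] = L'\0';

    *out = buf;
    *out_size = size;
    return 0;
}
```

Because the caller chooses the allocator, the caller also knows which function releases the buffer, so nothing in this path depends on the runtime's own allocator state being initialized.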

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Serhiy Storchaka

Nov 18, 2017, 10:48:13 AM
to pytho...@python.org
18.11.17 16:17, Nick Coghlan wrote:

> On 18 November 2017 at 10:01, Victor Stinner <victor....@gmail.com> wrote:
>> I'm writing this email to ask if this change is an issue or not to
>> embedded Python and the Python C API. Is it still possible to call
>> "all" functions of the C API before calling Py_Initialize()?
>
> It isn't technically permitted to call any of them, unless their
> documentation specifically says that calling them before
> `Py_Initialize` is permitted (and that permission is only given for a
> select few configuration APIs in
> https://docs.python.org/3/c-api/init.html).

The Py_Initialize() documentation is not complete. It mentions only
Py_SetProgramName(), Py_SetPythonHome() and Py_SetPath(). But in other
places it is documented that Py_SetStandardStreamEncoding(),
PyImport_AppendInittab(), PyImport_ExtendInittab() should be called
before Py_Initialize(). And the embedding examples call
Py_DecodeLocale() before Py_Initialize(). PyMem_RawMalloc(),
PyMem_RawFree() and PyInitFrozenExtensions() are called before
Py_Initialize() in Py_FrozenMain(). Also these functions call
_PyMem_RawStrdup().

Hence, the minimal set of functions that can be called before
Py_Initialize() is:

* Py_SetProgramName()
* Py_SetPythonHome()
* Py_SetPath()
* Py_SetStandardStreamEncoding()
* PyImport_AppendInittab()
* PyImport_ExtendInittab()
* Py_DecodeLocale()
* PyMem_RawMalloc()
* PyMem_RawFree()
* PyInitFrozenExtensions()

Nick Coghlan

Nov 18, 2017, 9:19:20 PM
to Serhiy Storchaka, pytho...@python.org

OK, in that case I think the answer to Victor's question is:

1. Breaking calling Py_DecodeLocale() before calling Py_Initialize()
is a compatibility break with the API implied by our own usage
examples, and we'll need to revert the breakage for 3.7, and ensure at
least one release's worth of DeprecationWarning before requiring
either the use of an alternative API (where the caller controls the
memory management), or else a new lower level pre-initialization API
(i.e. making `PyRuntime_Initialize` a public API)
2. We should provide a consolidated list of these functions in the C
API initialization docs
3. We should add more test cases to _testembed.c that ensure they all
work correctly prior to Py_Initialize (some of them are already tested
there, but definitely not all of them)

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Serhiy Storchaka

Nov 19, 2017, 2:55:36 AM
to pytho...@python.org
19.11.17 04:17, Nick Coghlan wrote:

> 1. Breaking calling Py_DecodeLocale() before calling Py_Initialize()
> is a compatibility break with the API implied by our own usage
> examples, and we'll need to revert the breakage for 3.7, and ensure at
> least one release's worth of DeprecationWarning before requiring
> either the use of an alternative API (where the caller controls the
> memory management), or else a new lower level pre-initialization API
> (i.e. making `PyRuntime_Initialize` a public API)

There is a way to control the memory manager. The caller should just
define their own PyMem_RawMalloc(), PyMem_RawFree(), etc. It seems to me
that the reasons for introducing these functions were:

1. Get around the implementation detail that malloc(0) could return
NULL. PyMem_RawMalloc() should always return a unique address (unless
an error occurs).

2. Allow the caller to control the memory management by providing their
own implementations.
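
The first reason can be sketched in a few lines of C. This is a minimal illustration with a hypothetical name (`demo_raw_malloc_nonzero()`), not the CPython source: a zero-size request is turned into a one-byte request, so a NULL result always means an allocation failure:

```c
#include <stddef.h>
#include <stdlib.h>

/* malloc(0) may legally return NULL; asking for one byte instead means
   every successful call returns a distinct, non-NULL address. */
void *demo_raw_malloc_nonzero(size_t size)
{
    if (size == 0) {
        size = 1;
    }
    return malloc(size);
}
```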

Let's use the existing possibilities and not expand the API. I don't think
deprecation and breaking compatibility are needed here.

Victor Stinner

Nov 19, 2017, 3:54:40 AM
to Serhiy Storchaka, Python Dev
Maybe we can find a compromise: revert the change on memory allocators. They are too special to require calling PyRuntime_Init().

Currently, you cannot call PyMem_SetAllocators() before PyRuntime_Init().

Victor

Nick Coghlan

Nov 20, 2017, 1:56:27 AM
to Victor Stinner, Serhiy Storchaka, Python Dev
On 19 November 2017 at 18:52, Victor Stinner <victor....@gmail.com> wrote:
> Maybe we can find a compromise: revert the change on memory allocators. They
> are too special to require calling PyRuntime_Init().
>
> Currently, you cannot call PyMem_SetAllocators() before PyRuntime_Init().

At least the raw allocators, anyway - that way, the developer facing
documentation/comments can just say that the raw allocators can't have
any prerequisites that aren't shared by regular
malloc/calloc/realloc/free calls.

If that's enough to get Py_DecodeLocale working again prior to
_PyRuntime_Init(), then I'd suggest officially adding that to the
"must work prior to Py_Initialize" list, otherwise we can re-examine
it based on whatever's still broken after reverting the raw allocator
changes.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Victor Stinner

Nov 20, 2017, 9:24:00 AM
to Nick Coghlan, Serhiy Storchaka, Python Dev
To not lose track of the issue, I created this issue on bpo:
https://bugs.python.org/issue32086

Victor

Eric Snow

Nov 20, 2017, 10:33:49 AM
to Nick Coghlan, Serhiy Storchaka, Python-Dev
On Nov 18, 2017 19:20, "Nick Coghlan" <ncog...@gmail.com> wrote:

> OK, in that case I think the answer to Victor's question is:
>
> 1. Breaking calling Py_DecodeLocale() before calling Py_Initialize()
> is a compatibility break with the API implied by our own usage
> examples, and we'll need to revert the breakage for 3.7,

+1

The break was certainly unintentional. :/  Fortunately, Py_DecodeLocale()
should be the only "Process-wide parameter" needing repair.  I suppose,
PyMem_RawMalloc() and PyMem_RawFree() *could* be considered too, but my
understanding is that they aren't really intended for direct use
(especially pre-init).

> and ensure at
> least one release's worth of DeprecationWarning before requiring
> either the use of an alternative API (where the caller controls the
> memory management), or else a new lower level pre-initialization API
> (i.e. making `PyRuntime_Initialize` a public API)

There shouldn't be a need to deprecate anything, right?  We just need to
restore the pre-init behavior of Py_DecodeLocale.

> 2. We should provide a consolidated list of these functions in the C
> API initialization docs

+1

PyMem_Raw*() do not belong in that group, right?  Again, my understanding
is that they aren't intended for direct third-party use (are they even a
part of the C-API?), and particularly pre-init.  That Py_DecodeLocale()
can use PyMem_RawMalloc() pre-init is an implementation detail.

> 3. We should add more test cases to _testembed.c that ensure they all
> work correctly prior to Py_Initialize (some of them are already tested
> there, but definitely not all of them)

+1

-eric

Victor Stinner

Nov 20, 2017, 10:45:48 AM
to Eric Snow, Serhiy Storchaka, Nick Coghlan, Python-Dev
2017-11-20 16:31 GMT+01:00 Eric Snow <ericsnow...@gmail.com>:
> That Py_DecodeLocale() can use PyMem_RawMalloc() pre-init is an implementation detail.

Py_DecodeLocale() uses PyMem_RawMalloc(), and so its result must be
freed by PyMem_RawFree(). It's part of the documentation.

I'm not sure that I understood correctly. Do you agree to move "PyMem"
globals back to Objects/obmalloc.c? (to allow calling
PyMem_RawMalloc() before Py_Initialize())

Victor

Eric Snow

Nov 20, 2017, 4:38:08 PM
to Victor Stinner, Serhiy Storchaka, Nick Coghlan, Python-Dev
On Mon, Nov 20, 2017 at 8:43 AM, Victor Stinner
<victor....@gmail.com> wrote:
> 2017-11-20 16:31 GMT+01:00 Eric Snow <ericsnow...@gmail.com>:
>> That Py_DecodeLocale() can use PyMem_RawMalloc() pre-init is an implementation detail.
>
> Py_DecodeLocale() uses PyMem_RawMalloc(), and so its result must be
> freed by PyMem_RawFree(). It's part of the documentation.

Ah, I'd missed that. Thanks for pointing it out.

>
> I'm not sure that I understood correctly. Do you agree to move "PyMem"
> globals back to Objects/obmalloc.c? (to allow calling
> PyMem_RawMalloc() before Py_Initialize())

I'm okay with that if we can't find another way. However, shouldn't
we be able to statically initialize the raw allocator in _PyRuntime,
much as we were doing before in obmalloc.c? I have a rough PR up:

https://github.com/python/cpython/pull/4481

Also, I opened https://bugs.python.org/issue32096 for the regression.
Thanks for bringing it up.

-eric

Victor Stinner

Nov 20, 2017, 5:06:04 PM
to Eric Snow, Serhiy Storchaka, Nick Coghlan, Python-Dev
2017-11-20 22:35 GMT+01:00 Eric Snow <ericsnow...@gmail.com>:
> I'm okay with that if we can't find another way. However, shouldn't
> we be able to statically initialize the raw allocator in _PyRuntime,
> much as we were doing before in obmalloc.c? I have a rough PR up:
>
> https://github.com/python/cpython/pull/4481
>
> Also, I opened https://bugs.python.org/issue32096 for the regression.
> Thanks for bringing it up.

To statically initialize PyMemAllocatorEx fields, you need to export a
lot of allocator functions. I would prefer to not do that.

static void* _PyMem_DebugRawMalloc(void *ctx, size_t size);
static void* _PyMem_DebugRawCalloc(void *ctx, size_t nelem, size_t elsize);
static void* _PyMem_DebugRawRealloc(void *ctx, void *ptr, size_t size);
static void _PyMem_DebugRawFree(void *ctx, void *ptr);

static void* _PyMem_DebugMalloc(void *ctx, size_t size);
static void* _PyMem_DebugCalloc(void *ctx, size_t nelem, size_t elsize);
static void* _PyMem_DebugRealloc(void *ctx, void *ptr, size_t size);
static void _PyMem_DebugFree(void *ctx, void *p);

static void* _PyObject_Malloc(void *ctx, size_t size);
static void* _PyObject_Calloc(void *ctx, size_t nelem, size_t elsize);
static void _PyObject_Free(void *ctx, void *p);
static void* _PyObject_Realloc(void *ctx, void *ptr, size_t size);

The rules for choosing the allocator for each domain are also complex,
depending on whether pymalloc is enabled, whether debug hooks are enabled
by default, etc. The memory allocator is also linked to _PyMem_Debug, which
is currently not in Include/internal/ but in Objects/obmalloc.c.

I understand that moving global variables to _PyRuntime helps to
clarify how these variables are initialized and then finalized, but
memory allocators are a complex corner case.

main(), Py_Main() and _PyRuntime_Initialize() now have to temporarily
change the allocators to make sure that their initialization and
finalization use the same allocator.

I prefer to revert the change on memory allocators, and retry later to
fix it, once other initialization issues are fixed ;-)

Victor

Nick Coghlan

Nov 20, 2017, 9:34:41 PM
to Eric Snow, Serhiy Storchaka, Python-Dev
On 21 November 2017 at 01:31, Eric Snow <ericsnow...@gmail.com> wrote:
> On Nov 18, 2017 19:20, "Nick Coghlan" <ncog...@gmail.com> wrote:
>
>
> OK, in that case I think the answer to Victor's question is:
>
> 1. Breaking calling Py_DecodeLocale() before calling Py_Initialize()
> is a compatibility break with the API implied by our own usage
> examples, and we'll need to revert the breakage for 3.7,
>
>
> +1
>
> The break was certainly unintentional. :/ Fortunately, Py_DecodeLocale()
> should be the only "Process-wide parameter" needing repair. I suppose,
> PyMem_RawMalloc() and PyMem_RawFree() *could* be considered too, but my
> understanding is that they aren't really intended for direct use (especially
> pre-init).

PyMem_RawFree will need to continue working pre-initialize as well,
since it's the specified cleanup function for Py_DecodeLocale.

Eric Snow

Nov 21, 2017, 10:59:11 AM
to Victor Stinner, Serhiy Storchaka, Nick Coghlan, Python-Dev
On Mon, Nov 20, 2017 at 3:03 PM, Victor Stinner
<victor....@gmail.com> wrote:
> To statically initialize PyMemAllocatorEx fields, you need to export a
> lot of allocator functions. I would prefer to not do that.
>
> [snip]
>
> The rules for choosing the allocator for each domain are also complex,
> depending on whether pymalloc is enabled, whether debug hooks are enabled
> by default, etc. The memory allocator is also linked to _PyMem_Debug, which
> is currently not in Include/internal/ but in Objects/obmalloc.c.

I'm not suggesting supporting the full machinery. Rather, as my PR
demonstrates, we can statically initialize the minimum needed to
support pre-init use of PyMem_RawMalloc() and PyMem_RawFree(). The
allocators will be fully initialized once the runtime is initialized
(i.e. once Py_Initialize() is called), just as they are now.
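
The idea of statically initializing only the raw allocator can be sketched as follows. This is not the code from the PR; `demo_allocator`, `demo_raw` and the other `demo_*` names are hypothetical, and the counting wrapper merely stands in for whatever extra machinery full initialization might install later:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced allocator table. */
struct demo_allocator {
    void *ctx;
    void *(*malloc)(void *ctx, size_t size);
    void (*free)(void *ctx, void *ptr);
};

static int demo_wrapped_calls = 0;

/* Plain raw allocator: depends on nothing but the C library. */
static void *demo_plain_malloc(void *ctx, size_t size)
{
    (void)ctx;
    return malloc(size ? size : 1);
}

static void demo_plain_free(void *ctx, void *ptr)
{
    (void)ctx;
    free(ptr);
}

/* Counting wrapper: stands in for hooks installed at initialization. */
static void *demo_counting_malloc(void *ctx, size_t size)
{
    demo_wrapped_calls++;
    return demo_plain_malloc(ctx, size);
}

/* Static initializer: the table is usable as soon as the program is loaded. */
static struct demo_allocator demo_raw = {NULL, demo_plain_malloc, demo_plain_free};

/* Full initialization may swap in a richer allocator later on. */
void demo_runtime_initialize(void)
{
    demo_raw.malloc = demo_counting_malloc;
}

void *demo_raw_malloc(size_t size) { return demo_raw.malloc(demo_raw.ctx, size); }
void demo_raw_free(void *ptr) { demo_raw.free(demo_raw.ctx, ptr); }
int demo_wrapped_call_count(void) { return demo_wrapped_calls; }
```

One constraint this sketch respects, and which any real design would presumably need to respect as well, is that a block allocated before initialization can still be released correctly afterwards: here the free function never changes, so both kinds of blocks end up in the C library's `free()`.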

FWIW, I'm not sure that's the best approach. See my notes in
https://bugs.python.org/issue32096.

>
> I understand that moving global variables to _PyRuntime helps to
> clarify how these variables are initialized and then finalized, but
> memory allocators are a complex corner case.

Agreed. I spent a large portion of my time getting the allocators
right when working on the original _PyRuntime patch. It's tricky
code.

-eric

Victor Stinner

Nov 22, 2017, 4:40:47 AM
to Eric Snow, Serhiy Storchaka, Nick Coghlan, Python-Dev
2017-11-21 16:57 GMT+01:00 Eric Snow <ericsnow...@gmail.com>:
>> I understand that moving global variables to _PyRuntime helps to
>> clarify how these variables are initialized and then finalized, but
>> memory allocators are a complex corner case.
>
> Agreed. I spent a large portion of my time getting the allocators
> right when working on the original _PyRuntime patch. It's tricky
> code.

Oh, I forgot to notify you: when I worked on Py_Main(), I got crashes
because PyMem_RawMalloc() wasn't usable before calling
Py_Initialize(). This is what I call a regression, and that's why I
started this thread :-)

I fixed the issue by calling _PyRuntime_Initialize() as the very first
function in main().

I also had to add _PyMem_GetDefaultRawAllocator() to get a
deterministic memory allocator, rather than depending on the allocator
set by an application embedding Python: we must be sure that the same
allocator is used to initialize and finalize Python.
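
The "same allocator for initialization and finalization" constraint can be illustrated with a small sketch. All names here (`demo_allocator`, `demo_set_allocator()`, `demo_initialize()`, `demo_finalize()`, the `demo_embedder_*` functions) are hypothetical, and the pattern shown, temporarily switching to a known default and restoring the embedder's choice afterwards, is only an assumption about how such a guarantee could be obtained, not a description of the CPython implementation:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced allocator table. */
struct demo_allocator {
    void *(*malloc)(size_t size);
    void (*free)(void *ptr);
};

/* Fixed default allocator, and the current one an embedder may replace. */
static const struct demo_allocator demo_default_alloc = {malloc, free};
static struct demo_allocator demo_current_alloc = {malloc, free};

/* Example embedder allocator that counts how often it is used. */
static int demo_embedder_call_count = 0;

void *demo_embedder_malloc(size_t size)
{
    demo_embedder_call_count++;
    return malloc(size ? size : 1);
}

void demo_embedder_free(void *ptr)
{
    demo_embedder_call_count++;
    free(ptr);
}

int demo_embedder_calls(void) { return demo_embedder_call_count; }

void demo_set_allocator(struct demo_allocator alloc) { demo_current_alloc = alloc; }
void *demo_alloc(size_t size) { return demo_current_alloc.malloc(size); }
void demo_release(void *ptr) { demo_current_alloc.free(ptr); }

/* Runtime state that must be allocated and freed by the same allocator. */
static void *demo_state = NULL;

int demo_initialize(void)
{
    struct demo_allocator saved = demo_current_alloc;
    demo_current_alloc = demo_default_alloc;   /* force the known default */
    demo_state = demo_alloc(64);
    demo_current_alloc = saved;                /* restore the embedder's choice */
    return demo_state != NULL ? 0 : -1;
}

void demo_finalize(void)
{
    struct demo_allocator saved = demo_current_alloc;
    demo_current_alloc = demo_default_alloc;   /* same allocator as in initialize */
    demo_release(demo_state);
    demo_state = NULL;
    demo_current_alloc = saved;
}
```

With this arrangement the embedder's allocator is never asked to free a block it did not allocate, whichever allocator happens to be installed when `demo_finalize()` runs.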

Victor

Antoine Pitrou

Nov 22, 2017, 6:06:38 AM
to pytho...@python.org
On Wed, 22 Nov 2017 10:38:32 +0100
Victor Stinner <victor....@gmail.com> wrote:
>
> I fixed the issue by calling _PyRuntime_Initialize() as the very first
> function in main().
>
> I also had to add _PyMem_GetDefaultRawAllocator() to get a
> deterministic memory allocator, rather than depending on the allocator
> set by an application embedding Python: we must be sure that the same
> allocator is used to initialize and finalize Python.

This is a bit worrying. Do Python embedders have to go through the
same dance?

IMHO this really needs a simple solution documented somewhere. Also,
hopefully when you do the wrong thing, you get a clear error message to
know how to fix your code?

Regards

Antoine.

Victor Stinner

Nov 22, 2017, 6:14:45 AM
to Antoine Pitrou, Python Dev
2017-11-22 12:04 GMT+01:00 Antoine Pitrou <soli...@pitrou.net>:
> IMHO this really needs a simple solution documented somewhere. Also,
> hopefully when you do the wrong thing, you get a clear error message to
> know how to fix your code?

Right now, calling PyMem_RawMalloc() before calling
_PyRuntime_Initialize() calls the function at address NULL, so you get
a segmentation fault.

Documenting the new requirements is part of the discussion; it's one
option for fixing this issue.

Victor

Antoine Pitrou

Nov 22, 2017, 6:41:52 AM
to pytho...@python.org
On Wed, 22 Nov 2017 12:12:32 +0100
Victor Stinner <victor....@gmail.com> wrote:
> 2017-11-22 12:04 GMT+01:00 Antoine Pitrou <soli...@pitrou.net>:
> > IMHO this really needs a simple solution documented somewhere. Also,
> > hopefully when you do the wrong thing, you get a clear error message to
> > know how to fix your code?
>
> Right now, calling PyMem_RawMalloc() before calling
> _PyRuntime_Initialize() calls the function at address NULL, so you get
> a segmentation fault.

Can we get something more readable? For example:

FATAL ERROR: PyMem_RawMalloc(): malloc function is NULL, did you call
_PyRuntime_Initialize?

Regards

Antoine.

Nick Coghlan

Nov 22, 2017, 8:26:54 PM
to Victor Stinner, Antoine Pitrou, Python Dev
On 22 November 2017 at 21:12, Victor Stinner <victor....@gmail.com> wrote:
> 2017-11-22 12:04 GMT+01:00 Antoine Pitrou <soli...@pitrou.net>:
>> IMHO this really needs a simple solution documented somewhere.  Also,
>> hopefully when you do the wrong thing, you get a clear error message to
>> know how to fix your code?
>
> Right now, calling PyMem_RawMalloc() before calling
> _PyRuntime_Initialize() calls the function at address NULL, so you get
> a segmentation fault.
>
> Documenting the new requirements is part of the discussion, it's one
> option how to fix this issue.

My own recommendation is that we add Eric's new test case to the
embedding test suite and just make sure it works:

    wchar_t *program = Py_DecodeLocale("spam", NULL);
    Py_SetProgramName(program);
    Py_Initialize();
    Py_Finalize();
    PyMem_RawFree(program);

It does place some additional constraints on us in terms of handling
static initialization of the allocator state, and ensuring we revert
back to that state in Py_Finalize, but I think it's the only way we're
going to be able to reliably replace all calls to malloc & free with
PyMem_RawMalloc and PyMem_RawFree without causing weird problems.

M.-A. Lemburg

Nov 23, 2017, 4:39:57 AM
to Victor Stinner, Python Dev
On 18.11.2017 01:01, Victor Stinner wrote:
> Hi,
>
> The CPython internals evolved during Python 3.7 cycle. I would like to
> know if we broke the C API or not.
>
> Nick Coghlan and Eric Snow are working on cleaning up the Python
> initialization with the "on going" PEP 432:
> https://www.python.org/dev/peps/pep-0432/
>
> Many global variables used by the "Python runtime" were moved to a new
> single "_PyRuntime" variable (a big structure made of sub-structures).
> See Include/internal/pystate.h.
>
> A side effect of moving variables from random files into header files
> is that it's no longer possible to fully initialize _PyRuntime at
> "compilation time". For example, previously, it was possible to refer
> to local C functions (functions declared with "static", so only visible
> in the current file). Now a new "initialization function" must be
> called.
>
> In short, it means that using the "Python runtime" before it's
> initialized by _PyRuntime_Initialize() is now likely to crash. For
> example, calling PyMem_RawMalloc() before calling
> _PyRuntime_Initialize() now calls a NULL function pointer, and so
> immediately crashes with a segmentation fault.

To prevent a complete crash, would it be possible to initialize
the struct entries to a generic function (or set of such functions
with the right signatures), which then issues a message to stderr
hinting at the missing call to _PyRuntime_Initialize()
before terminating?
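
A minimal sketch of this suggestion, under the assumption that the allocator table can be given a static initializer at all: every slot starts out pointing at a stub with the right signature, and the stub prints a diagnostic and aborts. The names (`demo_allocator`, `demo_raw`, `demo_stub_malloc()`, and so on) are hypothetical and the message text is only an example:

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, reduced allocator table. */
struct demo_allocator {
    void *(*malloc)(void *ctx, size_t size);
    void (*free)(void *ctx, void *ptr);
};

/* Report the missing initialization on stderr, then terminate. */
static void demo_fatal_uninitialized(const char *func)
{
    fprintf(stderr,
            "FATAL ERROR: %s() called before the runtime was initialized\n",
            func);
    abort();
}

/* Stubs with the same signatures as the real allocator functions. */
static void *demo_stub_malloc(void *ctx, size_t size)
{
    (void)ctx;
    (void)size;
    demo_fatal_uninitialized("demo_raw_malloc");
    return NULL;                /* not reached */
}

static void demo_stub_free(void *ctx, void *ptr)
{
    (void)ctx;
    (void)ptr;
    demo_fatal_uninitialized("demo_raw_free");
}

static void *demo_real_malloc(void *ctx, size_t size)
{
    (void)ctx;
    return malloc(size ? size : 1);
}

static void demo_real_free(void *ctx, void *ptr)
{
    (void)ctx;
    free(ptr);
}

/* The table is never NULL: it points at the stubs until initialization. */
static struct demo_allocator demo_raw = {demo_stub_malloc, demo_stub_free};

int demo_runtime_is_initialized(void)
{
    return demo_raw.malloc != demo_stub_malloc;
}

void demo_runtime_initialize(void)
{
    demo_raw.malloc = demo_real_malloc;
    demo_raw.free = demo_real_free;
}

void *demo_raw_malloc(size_t size) { return demo_raw.malloc(NULL, size); }
void demo_raw_free(void *ptr) { demo_raw.free(NULL, ptr); }
```

Calling `demo_raw_malloc()` before `demo_runtime_initialize()` in this sketch ends the process with a readable message instead of a bare segmentation fault; it does not make the early call succeed.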

--
Marc-Andre Lemburg
eGenix.com


Antoine Pitrou

Nov 23, 2017, 5:18:32 AM
to pytho...@python.org
+1. This sounds like a good idea.

Regards

Antoine.

Victor Stinner

Nov 23, 2017, 6:21:46 PM
to Python Dev
Hi,

We are close to the 3.7a3 release and the bug is not fixed yet. I
propose to revert the changes on memory allocators right now, and take
time to design a proper fix which will respect all constraints.

https://github.com/python/cpython/pull/4532

Today, someone came to me on IRC to complain that calling
Py_DecodeLocale() now crashes on Python 3.7. He is doing tests to
embed Python on Android. Later he asked me about
PyImport_AppendInittab(), but I don't know this function. He told me
that it crashes in PyMem_Realloc()... But PyImport_AppendInittab()
must be called before Py_Initialize()...

It confirms that Python is embedded and that the C API is used before
Py_Initialize().

We don't know yet exactly how the C API is used, or which functions
are called before Py_Initialize(). Moreover, the PEP 432 implementation is
still incomplete, and calling _PyRuntime_Initialize() is just not
possible, since it's a private API which is not exported...

Victor

Nick Coghlan

Nov 23, 2017, 8:33:14 PM
to Victor Stinner, Python Dev
On 24 November 2017 at 09:19, Victor Stinner <victor....@gmail.com> wrote:
> Hi,
>
> We are close to the 3.7a3 release and the bug is not fixed yet. I
> propose to revert the changes on memory allocators right now, and take
> time to design a proper fix which will respect all constraints.
>
> https://github.com/python/cpython/pull/4532
>
> Today, someone came to me on IRC to complain that calling
> Py_DecodeLocale() does now crash on Python 3.7. He is doing tests to
> embed Python on Android. Later he asks me about
> PyImport_AppendInittab(), but I don't know this function. He told me
> that it does crash in PyMem_Realloc()... But PyImport_AppendInittab()
> must be called before Py_Initialize()...
>
> It confirms that Python is embedded and that the C API is used before
> Py_Initialize().
>
> We don't know yet exactly how the C API is used, which functions
> are called before Py_Initialize().

We do note some of them explicitly at https://docs.python.org/3/c-api/init.html (search for "before Py").

What we've been missing is a test case that ensures https://docs.python.org/3/extending/embedding.html#very-high-level-embedding actually works reliably (hence how we managed to break it by way of the internal state management refactoring).

Once that core regression has been fixed, we can review the docs and the test suite and come up with:

- a consolidated list of *all* the APIs that can safely be called before Py_Initialize
- one or more new or updated test cases to ensure that any not yet tested pre-initialization APIs actually work as intended
 
> Moreover, PEP 432 implementation is
> still incomplete, and calling _PyRuntime_Initialize() is just not
> possible, since it's a private API which is not exported...

Even after we reach the point of exposing the more fine-grained initialisation API (which I'm now thinking we may be able to do for 3.8 given Eric & Victor's work on it for 3.7), we're still going to have to ensure the existing configuration API keeps working as expected.

Cheers,

Glenn Linderman

Nov 23, 2017, 9:50:51 PM
to pytho...@python.org
On 11/23/2017 5:31 PM, Nick Coghlan wrote:
> - a consolidated list of *all* the APIs that can safely be called before Py_Initialize

So it is interesting to know that list, of course, but the ones that are to be supported and documented might be a smaller list. Or might not.

Nick Coghlan

Nov 23, 2017, 11:03:11 PM
to Glenn Linderman, pytho...@python.org
Ah, sorry - "safely" was a bit ambiguous there. By "safely" I meant "CPython has a regression test that ensures that particular API will keep working before Py_Initialize(), regardless of any changes we may make to the way we handle interpreter initialization".

We've long had a lot of other APIs that happen to work well enough for CPython itself to get away with using them during the startup process, but the official position on those is "Don't count on these APIs working prior to Py_Initialize() in the general case - we only get away with it because we can adjust the exact order in which we do things in order to account for any other changes that break it".

Serhiy Storchaka

Nov 24, 2017, 1:45:12 AM
to pytho...@python.org
24.11.17 04:21, Glenn Linderman wrote:

This is a small list, 11 functions.

Victor Stinner

Nov 24, 2017, 8:26:13 AM
to Nick Coghlan, pytho...@python.org
I proposed a PR to explicitly list functions safe to be called before
Py_Initialize():

https://bugs.python.org/issue32124
https://github.com/python/cpython/pull/4540

I found more than 11 functions... I also found variables ;-)

Victor