[Python-Dev] RFC: PEP 587 "Python Initialization Configuration": 3rd version

Victor Stinner

May 15, 2019, 7:12:51 PM
to python-dev
Hi,

Thanks to the constructive discussions, I enhanced my PEP 587. I don't
plan any further changes; the PEP is now ready for review (and maybe
even for pronouncement, hi Thomas! :-)).

The Rationale now better explains all challenges and the complexity of
the Python Initialization Configuration.

The "Isolate Python" section is a short guide explaining how to
configure Python to embed it into an application.

The "Path Configuration" section elaborates the most interesting part
of the configuration: configuring where Python looks for modules
(sys.path). I added PyWideStringList_Insert() to allow prepending a
path to module_search_paths.

The "Python Issues" section gives a long list of issues solved directly
or indirectly by this PEP.

I'm open to bikeshedding on PyConfig field names and the names of the
added functions ;-) I hesitate on "use_module_search_paths": maybe
"module_search_paths_set" is a better name, as in "is
module_search_paths set?". The purpose of this field is to allow
an empty sys.path (ask PyConfig_Read() to not override it). IMHO
an empty sys.path makes sense for some specific use cases, like
executing Python code without any external module.

My PEP 587 proposes better names: Py_FrozenFlag becomes
PyConfig.pathconfig_warnings and Py_DebugFlag becomes
PyConfig.parser_debug. I also avoided double negation. For example,
Py_DontWriteBytecodeFlag becomes write_bytecode.

Changes between version 3 and version 2:

* PyConfig: Add configure_c_stdio and parse_argv; rename _frozen to
pathconfig_warnings.
* Rename functions using bytes strings and wide strings. For example,
Py_PreInitializeFromWideArgs() becomes Py_PreInitializeFromArgs(), and
PyConfig_SetArgv() becomes PyConfig_SetBytesArgv().
* Add PyWideStringList_Insert() function.
* New "Path configuration", "Isolate Python", "Python Issues" and
"Version History" sections.
* PyConfig_SetString() and PyConfig_SetBytesString() now require the
configuration as the first argument.
* Rename Py_UnixMain() to Py_BytesMain()


HTML version:
https://www.python.org/dev/peps/pep-0587/

Full PEP text below.

I know that the PEP is long, but well, it's a complex topic, and I
chose to add many examples to make the API easier to understand.

Victor

---

PEP: 587
Title: Python Initialization Configuration
Author: Victor Stinner <vsti...@redhat.com>, Nick Coghlan <ncog...@gmail.com>
BDFL-Delegate: Thomas Wouters <tho...@python.org>
Discussions-To: pytho...@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Mar-2019
Python-Version: 3.8

Abstract
========

Add a new C API to configure the Python Initialization providing finer
control on the whole configuration and better error reporting.

It becomes possible to read the configuration and modify it before it is
applied. It also becomes possible to completely override how Python
computes the module search paths (``sys.path``).

Building a customized Python which behaves as regular Python becomes
easier using the new ``Py_RunMain()`` function. Moreover, command line
arguments passed to ``PyConfig.argv`` are now parsed the same way
regular Python parses command line options, and ``PyConfig.xoptions``
are handled as ``-X opt`` command line options.

This extracts a subset of the API design from the PEP 432 development and
refactoring work that is now considered sufficiently stable to make public
(allowing 3rd party embedding applications access to the same configuration
APIs that the native CPython CLI is now using).


Rationale
=========

Python is highly configurable but its configuration evolved organically.
The initialization configuration is scattered all around the code using
different ways to set them: global configuration variables (ex:
``Py_IsolatedFlag``), environment variables (ex: ``PYTHONPATH``),
command line arguments (ex: ``-b``), configuration files (ex:
``pyvenv.cfg``), function calls (ex: ``Py_SetProgramName()``). A
straightforward and reliable way to configure Python is needed.

Some configuration parameters are not accessible from the C API, or not
easily. For example, there is no API to override the default values of
``sys.executable``.

Some options like ``PYTHONPATH`` can only be set using an environment
variable which has a side effect on Python child processes.

Some options also depend on other options: see `Priority and Rules`_.
The Python 3.7 API does not provide a consistent view of the overall
configuration.

The C API of Python 3.7 Initialization takes ``wchar_t*`` strings as
input whereas the Python filesystem encoding is set during the
initialization which can lead to mojibake.

Python 3.7 APIs like ``Py_Initialize()`` abort the process on memory
allocation failure, which is not convenient when Python is embedded.
Moreover, ``Py_Main()`` could exit the process directly rather than
returning an exit code. The proposed new API reports the error or exit
code to the caller, which can decide how to handle it.

Implementing the PEP 540 (UTF-8 Mode) and the new ``-X dev`` correctly
was almost impossible in Python 3.6. The code base has been deeply
reworked in Python 3.7 and then in Python 3.8 to read the configuration
into a structure with no side effect. It becomes possible to clear the
configuration (release memory) and read the configuration again if the
encoding changed. This is required to properly implement the UTF-8 Mode,
which changes the encoding using the ``-X utf8`` command line option.
Internally, bytes ``argv`` strings are decoded from the filesystem
encoding. The ``-X dev`` option changes the memory allocator (behaves as
``PYTHONMALLOC=debug``), whereas it was not possible to change the
memory allocator *while* parsing the command line arguments. The new
design of the internal implementation not only made it possible to
properly implement ``-X utf8`` and ``-X dev``, it also makes it much
easier to change the Python behavior, especially for corner cases like
these, and ensures that the configuration remains consistent: see
`Priority and Rules`_.

This PEP is a partial implementation of PEP 432 which is the overall
design. New fields can be added later to the ``PyConfig`` structure to
finish the implementation of the PEP 432 (e.g. by adding a new partial
initialization API which allows configuring Python using Python objects to
finish the full initialization). However, those features are omitted from this
PEP as even the native CPython CLI doesn't work that way - the public API
proposal in this PEP is limited to features which have already been implemented
and adopted as private APIs for use in the native CPython CLI.


Python Initialization C API
===========================

This PEP proposes to add the following new structures, functions and
macros.

New structures (4):

* ``PyConfig``
* ``PyInitError``
* ``PyPreConfig``
* ``PyWideStringList``

New functions (17):

* ``Py_PreInitialize(config)``
* ``Py_PreInitializeFromBytesArgs(config, argc, argv)``
* ``Py_PreInitializeFromArgs(config, argc, argv)``
* ``PyWideStringList_Append(list, item)``
* ``PyWideStringList_Insert(list, index, item)``
* ``PyConfig_SetString(config, config_str, str)``
* ``PyConfig_SetBytesString(config, config_str, str)``
* ``PyConfig_SetBytesArgv(config, argc, argv)``
* ``PyConfig_SetArgv(config, argc, argv)``
* ``PyConfig_Read(config)``
* ``PyConfig_Clear(config)``
* ``Py_InitializeFromConfig(config)``
* ``Py_InitializeFromBytesArgs(config, argc, argv)``
* ``Py_InitializeFromArgs(config, argc, argv)``
* ``Py_BytesMain(argc, argv)``
* ``Py_RunMain()``
* ``Py_ExitInitError(err)``

New macros (9):

* ``PyPreConfig_INIT``
* ``PyConfig_INIT``
* ``Py_INIT_OK()``
* ``Py_INIT_ERR(MSG)``
* ``Py_INIT_NO_MEMORY()``
* ``Py_INIT_EXIT(EXITCODE)``
* ``Py_INIT_IS_ERROR(err)``
* ``Py_INIT_IS_EXIT(err)``
* ``Py_INIT_FAILED(err)``

This PEP also adds ``_PyRuntimeState.preconfig`` (``PyPreConfig`` type)
and ``PyInterpreterState.config`` (``PyConfig`` type) fields to these
internal structures. ``PyInterpreterState.config`` becomes the new
reference configuration, replacing global configuration variables and
other private variables.


PyWideStringList
----------------

``PyWideStringList`` is a list of ``wchar_t*`` strings.

Example initializing a string list from a C static array::

    static wchar_t* argv[2] = {
        L"-c",
        L"pass",
    };
    PyWideStringList config_argv = PyWideStringList_INIT;
    config_argv.length = Py_ARRAY_LENGTH(argv);
    config_argv.items = argv;

``PyWideStringList`` structure fields:

* ``length`` (``Py_ssize_t``)
* ``items`` (``wchar_t**``)

Methods:

* ``PyInitError PyWideStringList_Append(PyWideStringList *list, const
wchar_t *item)``:
Append *item* to *list*.
* ``PyInitError PyWideStringList_Insert(PyWideStringList *list,
Py_ssize_t index, const wchar_t *item)``:
Insert *item* into *list* at *index*. If *index* is greater than
*list* length, just append *item* to *list*.

If *length* is non-zero, *items* must be non-NULL and all strings must
be non-NULL.
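
The insert rule above (an out-of-range *index* appends) can be
illustrated with a small standalone model. This is only a sketch, not
the CPython implementation: ``WideList``, ``wide_dup()`` and
``wide_list_insert()`` are hypothetical names, the index is a
``size_t`` rather than ``Py_ssize_t``, and errors are reported with a
plain ``int`` instead of ``PyInitError``.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for PyWideStringList; not the CPython type. */
typedef struct {
    size_t length;
    wchar_t **items;
} WideList;

/* Duplicate a wide string with malloc(); return NULL on failure. */
wchar_t *wide_dup(const wchar_t *s)
{
    size_t n = wcslen(s) + 1;
    wchar_t *copy = malloc(n * sizeof(wchar_t));
    if (copy != NULL) {
        memcpy(copy, s, n * sizeof(wchar_t));
    }
    return copy;
}

/* Insert a copy of item at index. An index greater than the list
   length is clamped, so the item is appended.
   Return 0 on success, -1 on memory allocation failure. */
int wide_list_insert(WideList *list, size_t index, const wchar_t *item)
{
    if (index > list->length) {
        index = list->length;
    }

    wchar_t *copy = wide_dup(item);
    if (copy == NULL) {
        return -1;
    }

    wchar_t **items = realloc(list->items,
                              (list->length + 1) * sizeof(*items));
    if (items == NULL) {
        free(copy);
        return -1;
    }

    /* Shift the tail one slot to the right to make room at index. */
    memmove(&items[index + 1], &items[index],
            (list->length - index) * sizeof(*items));
    items[index] = copy;
    list->items = items;
    list->length++;
    return 0;
}
```

Inserting at index 0 prepends, which is the use case mentioned for
``module_search_paths``.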

PyInitError
-----------

``PyInitError`` is a structure to store an error message or an exit code
for the Python Initialization. For an error, it stores the C function
name which created the error.

Example::

    PyInitError alloc(void **ptr, size_t size)
    {
        *ptr = PyMem_RawMalloc(size);
        if (*ptr == NULL) {
            return Py_INIT_NO_MEMORY();
        }
        return Py_INIT_OK();
    }

    int main(int argc, char **argv)
    {
        void *ptr;
        PyInitError err = alloc(&ptr, 16);
        if (Py_INIT_FAILED(err)) {
            Py_ExitInitError(err);
        }
        /* memory from PyMem_RawMalloc() must be released
           with PyMem_RawFree() */
        PyMem_RawFree(ptr);
        return 0;
    }

``PyInitError`` fields:

* ``exitcode`` (``int``):
argument passed to ``exit()``, only set by ``Py_INIT_EXIT()``.
* ``err_msg`` (``const char*``): error message
* private ``_func`` field: used by ``Py_INIT_ERR()`` to store the C
function name which created the error.
* private ``_type`` field: for internal usage only.

Macros to create a result:

* ``Py_INIT_OK()``: Success.
* ``Py_INIT_ERR(err_msg)``: Initialization error with a message.
* ``Py_INIT_NO_MEMORY()``: Memory allocation failure (out of memory).
* ``Py_INIT_EXIT(exitcode)``: Exit Python with the specified exit code.

Other macros and functions:

* ``Py_INIT_IS_ERROR(err)``: Is the result an error?
* ``Py_INIT_IS_EXIT(err)``: Is the result an exit?
* ``Py_INIT_FAILED(err)``: Is the result an error or an exit? Similar
to ``Py_INIT_IS_ERROR(err) || Py_INIT_IS_EXIT(err)``.
* ``Py_ExitInitError(err)``: Call ``exit(exitcode)`` on Unix or
``ExitProcess(exitcode)`` on Windows if the result is an exit, call
``Py_FatalError(err_msg)`` if the result is an error. Must not be
called if the result is a success.
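
To make the three result kinds concrete, here is a minimal standalone
model of the success, error and exit states. It is a sketch under
assumptions: ``InitResult``, ``InitType`` and the ``init_*()`` helpers
are hypothetical names, and the real ``PyInitError`` layout (including
the private ``_type`` and ``_func`` fields) may differ.

```c
#include <stddef.h>

/* Hypothetical model of the three result kinds. */
typedef enum { INIT_OK, INIT_ERROR, INIT_EXIT } InitType;

typedef struct {
    InitType type;
    const char *err_msg;   /* only meaningful for INIT_ERROR */
    int exitcode;          /* only meaningful for INIT_EXIT */
} InitResult;

InitResult init_ok(void)
{
    InitResult r = {INIT_OK, NULL, 0};
    return r;
}

InitResult init_err(const char *msg)
{
    InitResult r = {INIT_ERROR, msg, 0};
    return r;
}

InitResult init_exit(int exitcode)
{
    InitResult r = {INIT_EXIT, NULL, exitcode};
    return r;
}

/* Models Py_INIT_IS_ERROR(err). */
int init_is_error(InitResult r) { return r.type == INIT_ERROR; }

/* Models Py_INIT_IS_EXIT(err). */
int init_is_exit(InitResult r) { return r.type == INIT_EXIT; }

/* Models Py_INIT_FAILED(err): an error or an exit. */
int init_failed(InitResult r)
{
    return init_is_error(r) || init_is_exit(r);
}
```

Note that in this model an exit with code 0 still counts as "failed":
the initialization did not complete, even though the process is
expected to terminate successfully.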

Pre-Initialization with PyPreConfig
-----------------------------------

``PyPreConfig`` structure is used to pre-initialize Python:

* Set the memory allocator
* Configure the LC_CTYPE locale
* Set the UTF-8 mode

Example using the pre-initialization to enable the UTF-8 Mode::

    PyPreConfig preconfig = PyPreConfig_INIT;
    preconfig.utf8_mode = 1;

    PyInitError err = Py_PreInitialize(&preconfig);
    if (Py_INIT_FAILED(err)) {
        Py_ExitInitError(err);
    }

    /* at this point, Python will speak UTF-8 */

    Py_Initialize();
    /* ... use Python API here ... */
    Py_Finalize();

Functions to pre-initialize Python:

* ``PyInitError Py_PreInitialize(const PyPreConfig *config)``
* ``PyInitError Py_PreInitializeFromBytesArgs(const PyPreConfig
*config, int argc, char **argv)``
* ``PyInitError Py_PreInitializeFromArgs(const PyPreConfig *config,
int argc, wchar_t **argv)``

These functions can be called with *config* set to ``NULL``.

If Python is initialized with command line arguments, the command line
arguments must also be passed to pre-initialize Python, since they have
an effect on the pre-configuration like encodings. For example, the
``-X utf8`` command line option enables the UTF-8 Mode.

The caller is responsible for handling errors or exits using
``Py_INIT_FAILED()`` and ``Py_ExitInitError()``.

``PyPreConfig`` fields:

* ``allocator`` (``char*``, default: ``NULL``):
Name of the memory allocator (ex: ``"malloc"``).
* ``coerce_c_locale`` (``int``, default: 0):
If equal to 2, coerce the C locale; if equal to 1, read the LC_CTYPE
locale to decide if it should be coerced.
* ``coerce_c_locale_warn`` (``int``, default: 0):
If non-zero, emit a warning if the C locale is coerced.
* ``dev_mode`` (``int``, default: 0):
See ``PyConfig.dev_mode``.
* ``isolated`` (``int``, default: 0):
See ``PyConfig.isolated``.
* ``legacy_windows_fs_encoding`` (``int``, Windows only, default: 0):
If non-zero, disable UTF-8 Mode, set the Python filesystem encoding to
``mbcs``, set the filesystem error handler to ``replace``.
* ``use_environment`` (``int``, default: 1):
See ``PyConfig.use_environment``.
* ``utf8_mode`` (``int``, default: 0):
If non-zero, enable the UTF-8 mode.

``PyPreConfig`` private field, for internal use only:

* ``_config_version`` (``int``, default: config version):
Configuration version, used for ABI compatibility.

The C locale coercion (PEP 538) and the UTF-8 Mode (PEP 540) are
disabled by default in ``PyPreConfig``. Set ``coerce_c_locale``,
``coerce_c_locale_warn`` and ``utf8_mode`` to ``-1`` to let Python
enable them depending on the user configuration. In this case, it's
safer to explicitly pre-initialize Python to ensure that encodings are
configured before the Python initialization starts. Example to get the
same encoding as regular Python::

    PyPreConfig preconfig = PyPreConfig_INIT;
    preconfig.coerce_c_locale = -1;
    preconfig.coerce_c_locale_warn = -1;
    preconfig.utf8_mode = -1;

    PyInitError err = Py_PreInitialize(&preconfig);
    if (Py_INIT_FAILED(err)) {
        Py_ExitInitError(err);
    }


Initialization with PyConfig
----------------------------

The ``PyConfig`` structure contains all parameters to configure Python.

Example setting the program name::

    PyInitError err;
    PyConfig config = PyConfig_INIT;

    /* the configuration is the first argument */
    err = PyConfig_SetString(&config, &config.program_name,
                             L"my_program");
    if (Py_INIT_FAILED(err)) {
        Py_ExitInitError(err);
    }

    err = Py_InitializeFromConfig(&config);
    PyConfig_Clear(&config);

    if (Py_INIT_FAILED(err)) {
        Py_ExitInitError(err);
    }

``PyConfig`` methods:

* ``PyInitError PyConfig_SetString(PyConfig *config, wchar_t
**config_str, const wchar_t *str)``:
Copy the wide character string *str* into ``*config_str``.
* ``PyInitError PyConfig_SetBytesString(PyConfig *config, wchar_t
**config_str, const char *str)``:
Decode *str* using ``Py_DecodeLocale()`` and set the result into
``*config_str``. Pre-initialize Python if needed to ensure that
encodings are properly configured.
* ``PyInitError PyConfig_SetArgv(PyConfig *config, int argc, wchar_t **argv)``:
Set command line arguments from wide character strings.
* ``PyInitError PyConfig_SetBytesArgv(PyConfig *config, int argc, char
**argv)``:
Set command line arguments: decode bytes using ``Py_DecodeLocale()``.
Pre-initialize Python if needed to ensure that encodings are properly
configured.
* ``PyInitError PyConfig_Read(PyConfig *config)``:
Read all Python configuration. Fields which are already set are left
unchanged.
* ``void PyConfig_Clear(PyConfig *config)``:
Release configuration memory.

Functions to initialize Python:

* ``PyInitError Py_InitializeFromConfig(const PyConfig *config)``:
Initialize Python from *config* configuration. *config* can be
``NULL``.

The caller of these methods and functions is responsible for handling
failure or exit using ``Py_INIT_FAILED()`` and ``Py_ExitInitError()``.

``PyConfig`` fields:

* ``argv`` (``PyWideStringList``, default: empty):
Command line arguments, ``sys.argv``.
It is parsed and updated by default, set ``parse_argv`` to 0 to avoid
that.
* ``base_exec_prefix`` (``wchar_t*``, default: ``NULL``):
``sys.base_exec_prefix``.
* ``base_prefix`` (``wchar_t*``, default: ``NULL``):
``sys.base_prefix``.
* ``buffered_stdio`` (``int``, default: 1):
If equal to 0, enable unbuffered mode, making the stdout and stderr
streams unbuffered.
* ``bytes_warning`` (``int``, default: 0):
If equal to 1, issue a warning when comparing ``bytes`` or
``bytearray`` with ``str``, or comparing ``bytes`` with ``int``. If
greater than or equal to 2, raise a ``BytesWarning`` exception.
* ``check_hash_pycs_mode`` (``wchar_t*``, default: ``"default"``):
``--check-hash-based-pycs`` command line option value (see PEP 552).
* ``configure_c_stdio`` (``int``, default: 1):
If non-zero, configure C standard streams (``stdin``, ``stdout``,
``stderr``). For example, set their mode to ``O_BINARY`` on Windows.
* ``dev_mode`` (``int``, default: 0):
Development mode
* ``dll_path`` (``wchar_t*``, Windows only, default: ``NULL``):
Windows DLL path.
* ``dump_refs`` (``int``, default: 0):
If non-zero, dump all objects which are still alive at exit
* ``exec_prefix`` (``wchar_t*``, default: ``NULL``):
``sys.exec_prefix``.
* ``executable`` (``wchar_t*``, default: ``NULL``):
``sys.executable``.
* ``faulthandler`` (``int``, default: 0):
If non-zero, call ``faulthandler.enable()``.
* ``filesystem_encoding`` (``wchar_t*``, default: ``NULL``):
Filesystem encoding, ``sys.getfilesystemencoding()``.
* ``filesystem_errors`` (``wchar_t*``, default: ``NULL``):
Filesystem encoding errors, ``sys.getfilesystemencodeerrors()``.
* ``use_hash_seed`` (``int``, default: 0),
``hash_seed`` (``unsigned long``, default: 0):
Randomized hash function seed.
* ``home`` (``wchar_t*``, default: ``NULL``):
Python home directory.
* ``import_time`` (``int``, default: 0):
If non-zero, profile import time.
* ``inspect`` (``int``, default: 0):
Enter interactive mode after executing a script or a command.
* ``install_signal_handlers`` (``int``, default: 1):
Install signal handlers?
* ``interactive`` (``int``, default: 0):
Interactive mode.
* ``legacy_windows_stdio`` (``int``, Windows only, default: 0):
If non-zero, use ``io.FileIO`` instead of ``WindowsConsoleIO`` for
``sys.stdin``, ``sys.stdout`` and ``sys.stderr``.
* ``malloc_stats`` (``int``, default: 0):
If non-zero, dump memory allocation statistics at exit.
* ``module_search_path_env`` (``wchar_t*``, default: ``NULL``):
``PYTHONPATH`` environment variable value.
* ``use_module_search_paths`` (``int``, default: 0),
``module_search_paths`` (``PyWideStringList``, default: empty):
``sys.path``.
* ``optimization_level`` (``int``, default: 0):
Compilation optimization level.
* ``parse_argv`` (``int``, default: 1):
If non-zero, parse ``argv`` command line arguments and update
``argv``.
* ``parser_debug`` (``int``, default: 0):
If non-zero, turn on parser debugging output (for experts only,
depending on compilation options).
* ``pathconfig_warnings`` (``int``, default: 1):
If equal to 0, suppress warnings when computing the path
configuration.
* ``prefix`` (``wchar_t*``, default: ``NULL``):
``sys.prefix``.
* ``program_name`` (``wchar_t*``, default: ``NULL``):
Program name.
* ``program`` (``wchar_t*``, default: ``NULL``):
``argv[0]`` or an empty string.
* ``pycache_prefix`` (``wchar_t*``, default: ``NULL``):
``.pyc`` cache prefix.
* ``quiet`` (``int``, default: 0):
Quiet mode. For example, don't display the copyright and version
messages even in interactive mode.
* ``run_command`` (``wchar_t*``, default: ``NULL``):
``-c COMMAND`` argument.
* ``run_filename`` (``wchar_t*``, default: ``NULL``):
``python3 SCRIPT`` argument.
* ``run_module`` (``wchar_t*``, default: ``NULL``):
``python3 -m MODULE`` argument.
* ``show_alloc_count`` (``int``, default: 0):
Show allocation counts at exit?
* ``show_ref_count`` (``int``, default: 0):
Show total reference count at exit?
* ``site_import`` (``int``, default: 1):
Import the ``site`` module at startup?
* ``skip_source_first_line`` (``int``, default: 0):
Skip the first line of the source?
* ``stdio_encoding`` (``wchar_t*``, default: ``NULL``),
``stdio_errors`` (``wchar_t*``, default: ``NULL``):
Encoding and encoding errors of ``sys.stdin``, ``sys.stdout``
and ``sys.stderr``.
* ``tracemalloc`` (``int``, default: 0):
If non-zero, call ``tracemalloc.start(value)``.
* ``user_site_directory`` (``int``, default: 1):
If non-zero, add user site directory to ``sys.path``.
* ``verbose`` (``int``, default: 0):
If non-zero, enable verbose mode.
* ``warnoptions`` (``PyWideStringList``, default: empty):
Options of the ``warnings`` module to build warnings filters.
* ``write_bytecode`` (``int``, default: 1):
If non-zero, write ``.pyc`` files.
* ``xoptions`` (``PyWideStringList``, default: empty):
``sys._xoptions``.

``PyConfig`` private fields, for internal use only:

* ``_config_version`` (``int``, default: config version):
Configuration version, used for ABI compatibility.
* ``_install_importlib`` (``int``, default: 1):
Install importlib?
* ``_init_main`` (``int``, default: 1):
If equal to 0, stop Python initialization before the "main" phase
(see PEP 432).

By default, the ``argv`` arguments are parsed as regular Python command
line arguments and ``argv`` is updated to strip parsed Python arguments:
see `Command Line Arguments`_. Set ``parse_argv`` to 0 to avoid parsing
and updating ``argv``. If ``argv`` is empty, an empty string is added to
ensure that ``sys.argv`` always exists and is never empty.

The ``xoptions`` options are parsed to set other options: see `-X
Options`_.

More complete example modifying the configuration before calling
``PyConfig_Read()``, and then modifying the read configuration::

    PyInitError init_python(const char *program_name)
    {
        PyInitError err;
        PyConfig config = PyConfig_INIT;

        /* Set the program name before reading the configuration
           (decode byte string from the locale encoding) */
        err = PyConfig_SetBytesString(&config, &config.program_name,
                                      program_name);
        if (Py_INIT_FAILED(err)) {
            goto fail;
        }

        /* Read all configuration at once */
        err = PyConfig_Read(&config);
        if (Py_INIT_FAILED(err)) {
            goto fail;
        }

        /* Append our custom search path to sys.path */
        err = PyWideStringList_Append(&config.module_search_paths,
                                      L"/path/to/more/modules");
        if (Py_INIT_FAILED(err)) {
            goto fail;
        }

        /* Override executable computed by PyConfig_Read() */
        err = PyConfig_SetString(&config, &config.executable,
                                 L"my_executable");
        if (Py_INIT_FAILED(err)) {
            goto fail;
        }

        err = Py_InitializeFromConfig(&config);

        /* Py_InitializeFromConfig() copied config which must now be
           cleared to release memory */
        PyConfig_Clear(&config);

        return err;

    fail:
        PyConfig_Clear(&config);
        Py_ExitInitError(err);
    }

.. note::
``PyConfig`` does not have any field for extra inittab functions:
``PyImport_AppendInittab()`` and ``PyImport_ExtendInittab()``
functions are still relevant (and can be called before Python
initialization).


Initialization with constant PyConfig
-------------------------------------

When no ``PyConfig`` method is used but only
``Py_InitializeFromConfig()``, the caller is responsible for managing
``PyConfig`` memory. In that case, constant strings and constant string
lists can be used to avoid dynamically allocated memory. It can be used
for most simple configurations.

Example of Python initialization enabling the isolated mode::

    PyConfig config = PyConfig_INIT;
    config.isolated = 1;

    PyInitError err = Py_InitializeFromConfig(&config);
    if (Py_INIT_FAILED(err)) {
        Py_ExitInitError(err);
    }
    /* ... use Python API here ... */
    Py_Finalize();

``PyConfig_Clear()`` is not needed in this example since ``config`` does
not contain any dynamically allocated string:
``Py_InitializeFromConfig`` is responsible for filling other fields and
managing the memory.

For convenience, two other functions are provided for constant
``PyConfig``:

* ``PyInitError Py_InitializeFromArgs(const PyConfig *config, int
argc, wchar_t **argv)``
* ``PyInitError Py_InitializeFromBytesArgs(const PyConfig *config, int
argc, char **argv)``

They can be called with *config* set to ``NULL``. The caller of these
functions is responsible for handling failure or exit using
``Py_INIT_FAILED()`` and ``Py_ExitInitError()``.


Path Configuration
------------------

``PyConfig`` contains multiple fields for the path configuration:

* Path configuration input fields:

* ``home``
* ``module_search_path_env``
* ``pathconfig_warnings``

* Path configuration output fields:

* ``dll_path`` (Windows only)
* ``exec_prefix``
* ``executable``
* ``prefix``
* ``use_module_search_paths``, ``module_search_paths``

Set ``pathconfig_warnings`` to 0 to suppress warnings when computing the
path configuration.

It is possible to completely ignore the function computing the default
path configuration by setting explicitly all path configuration output
fields listed above. A string is considered as set even if it's an empty
string. ``module_search_paths`` is considered as set if
``use_module_search_paths`` is set to 1. In this case, path
configuration input fields are ignored as well.

If ``base_prefix`` or ``base_exec_prefix`` fields are not set, they
inherit their value from ``prefix`` and ``exec_prefix`` respectively.
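
The two rules above can be sketched as a standalone model. This is only
an illustration, not the CPython code: ``PathConfigModel`` and both
functions are hypothetical names, the Windows-only ``dll_path`` field is
left out, and "set" is modeled as "non-NULL" so that an empty string
counts as set.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical subset of the path configuration fields. */
typedef struct {
    const wchar_t *exec_prefix;
    const wchar_t *executable;
    const wchar_t *prefix;
    const wchar_t *base_prefix;
    const wchar_t *base_exec_prefix;
    int use_module_search_paths;
} PathConfigModel;

/* Return 1 if at least one output field is unset, meaning the default
   path configuration would still have to be computed. A non-NULL
   string counts as set even if it is empty. */
int needs_default_path_config(const PathConfigModel *c)
{
    return c->exec_prefix == NULL
        || c->executable == NULL
        || c->prefix == NULL
        || !c->use_module_search_paths;
}

/* Unset base_* fields inherit from prefix and exec_prefix. */
void inherit_base_fields(PathConfigModel *c)
{
    if (c->base_prefix == NULL) {
        c->base_prefix = c->prefix;
    }
    if (c->base_exec_prefix == NULL) {
        c->base_exec_prefix = c->exec_prefix;
    }
}
```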

If ``site_import`` is non-zero, ``sys.path`` can be modified by the
``site`` module. For example, if ``user_site_directory`` is non-zero,
the user site directory is added to ``sys.path`` (if it exists).


Isolate Python
--------------

The default configuration is designed to behave as a regular Python.
To embed Python into an application, it's possible to tune the
configuration to better isolate the embedded Python from the system:

* Set ``isolated`` to 1 to ignore environment variables and not prepend
the current directory to ``sys.path``.
* Set the `Path Configuration`_ ("output fields") to ignore the function
computing the default path configuration.


Py_BytesMain()
--------------

Python 3.7 provides a high-level ``Py_Main()`` function which requires
passing command line arguments as ``wchar_t*`` strings. It is
non-trivial to use the correct encoding to decode bytes. Python has its
own set of issues with C locale coercion and UTF-8 Mode.

This PEP adds a new ``Py_BytesMain()`` function which takes command line
arguments as bytes::

    int Py_BytesMain(int argc, char **argv)

Py_RunMain()
------------

The new ``Py_RunMain()`` function executes the command
(``PyConfig.run_command``), the script (``PyConfig.run_filename``) or
the module (``PyConfig.run_module``) specified on the command line or in
the configuration, and then finalizes Python. It returns an exit status
that can be passed to the ``exit()`` function.

Example of customized Python in isolated mode::

    #include <Python.h>

    int main(int argc, char *argv[])
    {
        PyConfig config = PyConfig_INIT;
        config.isolated = 1;

        PyInitError err = Py_InitializeFromBytesArgs(&config, argc, argv);
        if (Py_INIT_FAILED(err)) {
            Py_ExitInitError(err);
        }

        /* put more configuration code here if needed */

        return Py_RunMain();
    }

The example is a basic implementation of the "System Python Executable"
discussed in PEP 432.


Memory allocations and Py_DecodeLocale()
----------------------------------------

Python memory allocation functions like ``PyMem_RawMalloc()`` must not
be used before Python pre-initialization, whereas calling directly
``malloc()`` and ``free()`` is always safe.

For ``PyPreConfig`` and constant ``PyConfig``, the caller is responsible
for managing dynamically allocated memory; constant strings and constant
string lists can be used to avoid memory allocations.

A dynamic ``PyConfig`` requires calling ``PyConfig_Clear()`` to release
memory.

``Py_DecodeLocale()`` must not be called before the pre-initialization.


Backwards Compatibility
=======================

This PEP only adds a new API: it leaves the existing API unchanged and
has no impact on the backwards compatibility.

The implementation ensures that the existing API is compatible with the
new API. For example, ``PyConfig`` uses the value of global
configuration variables as default values.


Annex: Python Configuration
===========================

Priority and Rules
------------------

Priority of configuration parameters, highest to lowest:

* ``PyConfig``
* ``PyPreConfig``
* Configuration files
* Command line options
* Environment variables
* Global configuration variables

Priority of warning options, highest to lowest:

* ``PyConfig.warnoptions``
* ``PyConfig.dev_mode`` (add ``"default"``)
* ``PYTHONWARNINGS`` environment variable
* ``-W WARNOPTION`` command line argument
* ``PyConfig.bytes_warning`` (add ``"error::BytesWarning"`` if greater
than 1, or add ``"default::BytesWarning"``)

Rules on ``PyConfig`` parameters:

* If ``isolated`` is non-zero, ``use_environment`` and
``user_site_directory`` are set to 0.
* If ``legacy_windows_fs_encoding`` is non-zero, ``utf8_mode`` is set to
0.
* If ``dev_mode`` is non-zero, ``allocator`` is set to ``"debug"``,
``faulthandler`` is set to 1, and ``"default"`` filter is added to
``warnoptions``. But the ``PYTHONMALLOC`` environment variable has the
priority over ``dev_mode`` to set the memory allocator.
* If ``base_prefix`` is not set, it inherits ``prefix`` value.
* If ``base_exec_prefix`` is not set, it inherits ``exec_prefix`` value.
* If the ``python._pth`` configuration file is present, ``isolated`` is
set to 1 and ``site_import`` is set to 0; but ``site_import`` is set
to 1 if ``python._pth`` contains ``import site``.
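
A few of these rules can be expressed as a small standalone function.
This is a sketch under assumptions, not the CPython implementation:
``RulesModel`` and ``apply_rules()`` are hypothetical names, the
``"default"`` warning filter is reduced to a flag, and the
*pythonmalloc_env* parameter stands for the value of ``PYTHONMALLOC`` as
seen by Python (``NULL`` if unset or ignored).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the fields involved in the rules. */
typedef struct {
    int isolated;
    int use_environment;
    int user_site_directory;
    int legacy_windows_fs_encoding;
    int utf8_mode;
    int dev_mode;
    int faulthandler;
    int add_default_warning_filter;
    const char *allocator;
} RulesModel;

/* Apply the isolated, legacy_windows_fs_encoding and dev_mode rules. */
void apply_rules(RulesModel *c, const char *pythonmalloc_env)
{
    if (c->isolated) {
        c->use_environment = 0;
        c->user_site_directory = 0;
    }
    if (c->legacy_windows_fs_encoding) {
        c->utf8_mode = 0;
    }
    if (c->dev_mode) {
        c->faulthandler = 1;
        c->add_default_warning_filter = 1;
        /* PYTHONMALLOC has priority over dev_mode for the allocator. */
        c->allocator = (pythonmalloc_env != NULL)
                       ? pythonmalloc_env : "debug";
    }
}
```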

Rules on ``PyConfig`` and ``PyPreConfig`` parameters:

* If ``PyPreConfig.legacy_windows_fs_encoding`` is non-zero,
set ``PyConfig.utf8_mode`` to 0, set ``PyConfig.filesystem_encoding``
to ``mbcs``, and set ``PyConfig.filesystem_errors`` to ``replace``.

Configuration Files
-------------------

Python configuration files:

* ``pyvenv.cfg``
* ``python._pth`` (Windows only)
* ``pybuilddir.txt`` (Unix only)

Global Configuration Variables
------------------------------

Global configuration variables mapped to ``PyPreConfig`` fields:

======================================== ================================
Variable                                 Field
======================================== ================================
``Py_IgnoreEnvironmentFlag``             ``use_environment`` (NOT)
``Py_IsolatedFlag``                      ``isolated``
``Py_LegacyWindowsFSEncodingFlag``       ``legacy_windows_fs_encoding``
``Py_UTF8Mode``                          ``utf8_mode``
======================================== ================================

(NOT) means that the ``PyPreConfig`` value is the opposite of the global
configuration variable value.

Global configuration variables mapped to ``PyConfig`` fields:

======================================== ================================
Variable                                 Field
======================================== ================================
``Py_BytesWarningFlag``                  ``bytes_warning``
``Py_DebugFlag``                         ``parser_debug``
``Py_DontWriteBytecodeFlag``             ``write_bytecode`` (NOT)
``Py_FileSystemDefaultEncodeErrors``     ``filesystem_errors``
``Py_FileSystemDefaultEncoding``         ``filesystem_encoding``
``Py_FrozenFlag``                        ``pathconfig_warnings`` (NOT)
``Py_HasFileSystemDefaultEncoding``      ``filesystem_encoding``
``Py_HashRandomizationFlag``             ``use_hash_seed``, ``hash_seed``
``Py_IgnoreEnvironmentFlag``             ``use_environment`` (NOT)
``Py_InspectFlag``                       ``inspect``
``Py_InteractiveFlag``                   ``interactive``
``Py_IsolatedFlag``                      ``isolated``
``Py_LegacyWindowsStdioFlag``            ``legacy_windows_stdio``
``Py_NoSiteFlag``                        ``site_import`` (NOT)
``Py_NoUserSiteDirectory``               ``user_site_directory`` (NOT)
``Py_OptimizeFlag``                      ``optimization_level``
``Py_QuietFlag``                         ``quiet``
``Py_UnbufferedStdioFlag``               ``buffered_stdio`` (NOT)
``Py_VerboseFlag``                       ``verbose``
``_Py_HasFileSystemDefaultEncodeErrors`` ``filesystem_errors``
======================================== ================================

(NOT) means that the ``PyConfig`` value is the opposite of the global
configuration variable value.
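
As an illustration of the (NOT) rows, the following standalone sketch
derives a few fields from flag values. It is an assumption-laden model:
``FieldsModel`` and ``fields_from_flags()`` are hypothetical names, and
the flags are passed as plain parameters instead of reading the real
global variables.

```c
/* Hypothetical subset of PyConfig fields. */
typedef struct {
    int write_bytecode;
    int site_import;
    int buffered_stdio;
    int verbose;
} FieldsModel;

/* Build fields from legacy-style flag values. Fields marked (NOT) in
   the table are the logical negation of the flag; the others are
   copied unchanged. */
FieldsModel fields_from_flags(int dont_write_bytecode_flag,
                              int no_site_flag,
                              int unbuffered_stdio_flag,
                              int verbose_flag)
{
    FieldsModel f;
    f.write_bytecode = !dont_write_bytecode_flag;  /* (NOT) */
    f.site_import = !no_site_flag;                 /* (NOT) */
    f.buffered_stdio = !unbuffered_stdio_flag;     /* (NOT) */
    f.verbose = verbose_flag;                      /* same value */
    return f;
}
```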

``Py_LegacyWindowsFSEncodingFlag`` and ``Py_LegacyWindowsStdioFlag`` are
only available on Windows.

Command Line Arguments
----------------------

Usage::

    python3 [options]
    python3 [options] -c COMMAND
    python3 [options] -m MODULE
    python3 [options] SCRIPT


Command line options mapped to pseudo-action on ``PyPreConfig`` fields:

================================ ================================
Option                           ``PyPreConfig`` field
================================ ================================
``-E``                           ``use_environment = 0``
``-I``                           ``isolated = 1``
``-X dev``                       ``dev_mode = 1``
``-X utf8``                      ``utf8_mode = 1``
``-X utf8=VALUE``                ``utf8_mode = VALUE``
================================ ================================

Command line options mapped to pseudo-action on ``PyConfig`` fields:

================================ ================================
Option ``PyConfig`` field
================================ ================================
``-b`` ``bytes_warning++``
``-B`` ``write_bytecode = 0``
``-c COMMAND`` ``run_command = COMMAND``
``--check-hash-based-pycs=MODE`` ``_check_hash_pycs_mode = MODE``
``-d`` ``parser_debug++``
``-E`` ``use_environment = 0``
``-i`` ``inspect++`` and ``interactive++``
``-I`` ``isolated = 1``
``-m MODULE`` ``run_module = MODULE``
``-O`` ``optimization_level++``
``-q`` ``quiet++``
``-R`` ``use_hash_seed = 0``
``-s`` ``user_site_directory = 0``
``-S`` ``site_import = 0``
``-t`` ignored (kept for backwards compatibility)
``-u`` ``buffered_stdio = 0``
``-v`` ``verbose++``
``-W WARNING`` add ``WARNING`` to ``warnoptions``
``-x`` ``skip_source_first_line = 1``
``-X OPTION`` add ``OPTION`` to ``xoptions``
================================ ================================

``-h``, ``-?`` and ``-V`` options are handled without ``PyConfig``.
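
The pseudo-actions in the table can be read as small state updates. The sketch below applies a handful of them to a toy structure. It is not the CPython parser: ``ToyCliConfig`` and the ``toy_cli_*`` functions are hypothetical, the default values are an assumption of this model, and options that take an argument (``-c``, ``-m``, ``-W``, ``-X``) are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of PyConfig fields touched by simple options. */
typedef struct {
    int bytes_warning;
    int write_bytecode;
    int use_environment;
    int inspect;
    int interactive;
    int optimization_level;
    int verbose;
} ToyCliConfig;

/* Assumed defaults: bytecode is written and the environment is used. */
void toy_cli_config_init(ToyCliConfig *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof(*c));
    c->write_bytecode = 1;
    c->use_environment = 1;
}

/* Apply one option letter. Returns 0, or -1 if the letter is not modelled. */
int toy_cli_apply(ToyCliConfig *c, char opt)
{
    switch (opt) {
    case 'b': c->bytes_warning++; return 0;
    case 'B': c->write_bytecode = 0; return 0;
    case 'E': c->use_environment = 0; return 0;
    case 'i': c->inspect++; c->interactive++; return 0;
    case 'O': c->optimization_level++; return 0;
    case 'v': c->verbose++; return 0;
    default: return -1;
    }
}

/* Apply a cluster of letters such as "-bbO". Stops at the first unknown one. */
int toy_cli_apply_cluster(ToyCliConfig *c, const char *arg)
{
    const char *p;

    assert(c != NULL && arg != NULL);
    if (arg[0] != '-' || arg[1] == '\0') {
        return -1;
    }
    for (p = arg + 1; *p != '\0'; p++) {
        if (toy_cli_apply(c, *p) != 0) {
            return -1;
        }
    }
    return 0;
}
```

Counters use ``++`` so that repeated options accumulate (``-bb`` gives ``bytes_warning == 2``), whereas switches such as ``-B`` assign a fixed value.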

-X Options
----------

-X options mapped to pseudo-action on ``PyConfig`` fields:

================================ ================================
Option ``PyConfig`` field
================================ ================================
``-X dev`` ``dev_mode = 1``
``-X faulthandler`` ``faulthandler = 1``
``-X importtime`` ``import_time = 1``
``-X pycache_prefix=PREFIX`` ``pycache_prefix = PREFIX``
``-X showalloccount`` ``show_alloc_count = 1``
``-X showrefcount`` ``show_ref_count = 1``
``-X tracemalloc=N`` ``tracemalloc = N``
================================ ================================

Environment Variables
---------------------

Environment variables mapped to ``PyPreConfig`` fields:

================================= =============================================
Variable ``PyPreConfig`` field
================================= =============================================
``PYTHONCOERCECLOCALE`` ``coerce_c_locale``, ``coerce_c_locale_warn``
``PYTHONDEVMODE`` ``dev_mode``
``PYTHONLEGACYWINDOWSFSENCODING`` ``legacy_windows_fs_encoding``
``PYTHONMALLOC`` ``allocator``
``PYTHONUTF8`` ``utf8_mode``
================================= =============================================

Environment variables mapped to ``PyConfig`` fields:

================================= ====================================
Variable ``PyConfig`` field
================================= ====================================
``PYTHONDEBUG`` ``parser_debug``
``PYTHONDEVMODE`` ``dev_mode``
``PYTHONDONTWRITEBYTECODE`` ``write_bytecode``
``PYTHONDUMPREFS`` ``dump_refs``
``PYTHONEXECUTABLE`` ``program_name``
``PYTHONFAULTHANDLER`` ``faulthandler``
``PYTHONHASHSEED`` ``use_hash_seed``, ``hash_seed``
``PYTHONHOME`` ``home``
``PYTHONINSPECT`` ``inspect``
``PYTHONIOENCODING`` ``stdio_encoding``, ``stdio_errors``
``PYTHONLEGACYWINDOWSSTDIO`` ``legacy_windows_stdio``
``PYTHONMALLOCSTATS`` ``malloc_stats``
``PYTHONNOUSERSITE`` ``user_site_directory``
``PYTHONOPTIMIZE`` ``optimization_level``
``PYTHONPATH`` ``module_search_path_env``
``PYTHONPROFILEIMPORTTIME`` ``import_time``
``PYTHONPYCACHEPREFIX`` ``pycache_prefix``
``PYTHONTRACEMALLOC`` ``tracemalloc``
``PYTHONUNBUFFERED`` ``buffered_stdio``
``PYTHONVERBOSE`` ``verbose``
``PYTHONWARNINGS`` ``warnoptions``
================================= ====================================

``PYTHONLEGACYWINDOWSFSENCODING`` and ``PYTHONLEGACYWINDOWSSTDIO`` are
specific to Windows.


Annex: Python 3.7 API
=====================

Python 3.7 has 4 functions in its C API to initialize and finalize
Python:

* ``Py_Initialize()``, ``Py_InitializeEx()``: initialize Python
* ``Py_Finalize()``, ``Py_FinalizeEx()``: finalize Python

Python 3.7 can be configured using `Global Configuration Variables`_,
`Environment Variables`_, and the following functions:

* ``PyImport_AppendInittab()``
* ``PyImport_ExtendInittab()``
* ``PyMem_SetAllocator()``
* ``PyMem_SetupDebugHooks()``
* ``PyObject_SetArenaAllocator()``
* ``Py_SetPath()``
* ``Py_SetProgramName()``
* ``Py_SetPythonHome()``
* ``Py_SetStandardStreamEncoding()``
* ``PySys_AddWarnOption()``
* ``PySys_AddXOption()``
* ``PySys_ResetWarnOptions()``

There is also a high-level ``Py_Main()`` function.


Python Issues
=============

Issues that will be fixed by this PEP, directly or indirectly:

* `bpo-1195571 <https://bugs.python.org/issue1195571>`_: "simple
callback system for Py_FatalError"
* `bpo-11320 <https://bugs.python.org/issue11320>`_:
"Usage of API method Py_SetPath causes errors in Py_Initialize()
(Posix ony)"
* `bpo-13533 <https://bugs.python.org/issue13533>`_: "Would like
Py_Initialize to play friendly with host app"
* `bpo-14956 <https://bugs.python.org/issue14956>`_: "custom PYTHONPATH
may break apps embedding Python"
* `bpo-19983 <https://bugs.python.org/issue19983>`_: "When interrupted
during startup, Python should not call abort() but exit()"
* `bpo-22213 <https://bugs.python.org/issue22213>`_: "Make pyvenv style
virtual environments easier to configure when embedding Python". This
PEP more or
* `bpo-22257 <https://bugs.python.org/issue22257>`_: "PEP 432: Redesign
the interpreter startup sequence"
* `bpo-29778 <https://bugs.python.org/issue29778>`_: "_Py_CheckPython3
uses uninitialized dllpath when embedder sets module path with
Py_SetPath"
* `bpo-30560 <https://bugs.python.org/issue30560>`_: "Add
Py_SetFatalErrorAbortFunc: Allow embedding program to handle fatal
errors".
* `bpo-31745 <https://bugs.python.org/issue31745>`_: "Overloading
"Py_GetPath" does not work"
* `bpo-32573 <https://bugs.python.org/issue32573>`_: "All sys attributes
(.argv, ...) should exist in embedded environments".
* `bpo-34725 <https://bugs.python.org/issue34725>`_:
"Py_GetProgramFullPath() odd behaviour in Windows"
* `bpo-36204 <https://bugs.python.org/issue36204>`_: "Deprecate calling
Py_Main() after Py_Initialize()? Add Py_InitializeFromArgv()?"
* `bpo-33135 <https://bugs.python.org/issue33135>`_: "Define field
prefixes for the various config structs". The PEP now defines well
how warnings options are handled.

Issues of the PEP implementation:

* `bpo-16961 <https://bugs.python.org/issue16961>`_: "No regression
tests for -E and individual environment vars"
* `bpo-20361 <https://bugs.python.org/issue20361>`_: "-W command line
options and PYTHONWARNINGS environmental variable should not override
-b / -bb command line options"
* `bpo-26122 <https://bugs.python.org/issue26122>`_: "Isolated mode
doesn't ignore PYTHONHASHSEED"
* `bpo-29818 <https://bugs.python.org/issue29818>`_:
"Py_SetStandardStreamEncoding leads to a memory error in debug mode"
* `bpo-31845 <https://bugs.python.org/issue31845>`_:
"PYTHONDONTWRITEBYTECODE and PYTHONOPTIMIZE have no effect"
* `bpo-32030 <https://bugs.python.org/issue32030>`_: "PEP 432: Rewrite
Py_Main()"
* `bpo-32124 <https://bugs.python.org/issue32124>`_: "Document functions
safe to be called before Py_Initialize()"
* `bpo-33042 <https://bugs.python.org/issue33042>`_: "New 3.7 startup
sequence crashes PyInstaller"
* `bpo-33932 <https://bugs.python.org/issue33932>`_: "Calling
Py_Initialize() twice now triggers a fatal error (Python 3.7)"
* `bpo-34008 <https://bugs.python.org/issue34008>`_: "Do we support
calling Py_Main() after Py_Initialize()?"
* `bpo-34170 <https://bugs.python.org/issue34170>`_: "Py_Initialize():
computing path configuration must not have side effect (PEP 432)"
* `bpo-34589 <https://bugs.python.org/issue34589>`_: "Py_Initialize()
and Py_Main() should not enable C locale coercion"
* `bpo-34639 <https://bugs.python.org/issue34639>`_:
"PYTHONCOERCECLOCALE is ignored when using -E or -I option"
* `bpo-36142 <https://bugs.python.org/issue36142>`_: "Add a new
_PyPreConfig step to Python initialization to setup memory allocator
and encodings"
* `bpo-36202 <https://bugs.python.org/issue36202>`_: "Calling
Py_DecodeLocale() before _PyPreConfig_Write() can produce mojibake"
* `bpo-36301 <https://bugs.python.org/issue36301>`_: "Add
_Py_PreInitialize() function"
* `bpo-36443 <https://bugs.python.org/issue36443>`_: "Disable
coerce_c_locale and utf8_mode by default in _PyPreConfig?"
* `bpo-36444 <https://bugs.python.org/issue36444>`_: "Python
initialization: remove _PyMainInterpreterConfig"
* `bpo-36471 <https://bugs.python.org/issue36471>`_: "PEP 432, PEP 587:
Add _Py_RunMain()"
* `bpo-36763 <https://bugs.python.org/issue36763>`_: "PEP 587: Rework
initialization API to prepare second version of the PEP"
* `bpo-36775 <https://bugs.python.org/issue36775>`_: "Rework filesystem
codec implementation"
* `bpo-36900 <https://bugs.python.org/issue36900>`_: "Use _PyCoreConfig
rather than global configuration variables"

Issues related to this PEP:

* `bpo-12598 <https://bugs.python.org/issue12598>`_: "Move sys variable
initialization from import.c to sysmodule.c"
* `bpo-15577 <https://bugs.python.org/issue15577>`_: "Real argc and argv
in embedded interpreter"
* `bpo-16202 <https://bugs.python.org/issue16202>`_: "sys.path[0]
security issues"
* `bpo-18309 <https://bugs.python.org/issue18309>`_: "Make python
slightly more relocatable"
* `bpo-25631 <https://bugs.python.org/issue25631>`_: "Segmentation fault
with invalid Unicode command-line arguments in embedded Python"
* `bpo-26007 <https://bugs.python.org/issue26007>`_: "Support embedding
the standard library in an executable"
* `bpo-31210 <https://bugs.python.org/issue31210>`_: "Can not import
modules if sys.prefix contains DELIM".
* `bpo-31349 <https://bugs.python.org/issue31349>`_: "Embedded
initialization ignores Py_SetProgramName()"
* `bpo-33919 <https://bugs.python.org/issue33919>`_: "Expose
_PyCoreConfig structure to Python"
* `bpo-35173 <https://bugs.python.org/issue35173>`_: "Re-use already
existing functionality to allow Python 2.7.x (both embedded and
standalone) to locate the module path according to the shared library"


Version History
===============

* Version 3:

* ``PyConfig``: Add ``configure_c_stdio`` and ``parse_argv``,
rename ``_frozen`` to ``pathconfig_warnings``.
* Rename functions using bytes strings and wide character strings. For
example, ``Py_PreInitializeFromWideArgs`` becomes
``Py_PreInitializeFromArgs``, and ``PyConfig_SetArgv`` becomes
``PyConfig_SetBytesArgv``.
* Add ``PyWideStringList_Insert()`` function.
* New "Path configuration", "Isolate Python", "Python Issues"
and "Version History" sections.
* ``PyConfig_SetString()`` and ``PyConfig_SetBytesString()`` now
require the configuration as the first argument.
* Rename ``Py_UnixMain()`` to ``Py_BytesMain()``

* Version 2: Add ``PyConfig`` methods (ex: ``PyConfig_Read()``), add
``PyWideStringList_Append()``, rename ``PyWideCharList`` to
``PyWideStringList``.
* Version 1: Initial version.

Copyright
=========

This document has been placed in the public domain.

Gregory Szorc

May 16, 2019, 12:40:37 AM5/16/19
to Victor Stinner, python-dev
I saw your request for feedback on Twitter a few days back and found
this thread.

This PEP is of interest to me because I'm the maintainer of PyOxidizer -
a project for creating single file executables embedding Python. As part
of hacking on PyOxidizer, I will admit to grumbling about the current
state of the configuration and initialization mechanisms. The reliance
on global variables and the haphazard way in which you must call certain
functions before others was definitely a bit frustrating to deal with.

I don't want to wade into too much bikeshedding in my review. I'll let
the professionals deal with things like naming :) Also, I haven't read
previous posts about this PEP. Apologies if my comments bring up old topics.

Let's get on with the review...

My most important piece of feedback is: thank you for tackling this!
Your work to shore up the inner workings of interpreter state and
management is a big deal on multiple dimensions. I send my sincere
gratitude.

Overall, I'm very happy with the state of the proposal. Better than what
we currently have on nearly every dimension. When reading my feedback,
please keep in mind that I'm in like 95% agreement with the proposal as is.

The following paragraphs detail points of feedback.

PyPreConfig_INIT and PyConfig_INIT as macros that return a struct feel
weird to me. Specifically, the `PyPreConfig preconfig =
PyPreConfig_INIT;` pattern doesn't feel right. I'm sort of OK with these
being implemented as macros. But I think they should look like function
calls so the door is open to converting them to function calls in the
future. An argument to make them actual function calls today is to
facilitate better FFI interop. As it stands, non-C/C++ bindings to the
API will need to reimplement the macro's logic. That might be simple
today. But who knows what complexity may be added in years ahead. An
opaque function implementation future proofs the API.

PyPreConfig.allocator being a char* seems a bit weird. Does this imply
having to use strcmp() to determine which allocator to use? Perhaps the
allocator setting should be an int mapping to a constant instead?
Relatedly, how are custom allocators registered? e.g. from Rust, I want
to use Rust's allocator. How would I do that in this API? Do I still
need to call PyMem_SetAllocator()? I thought a point of this proposal
was to consolidate per-interpreter config settings?

I'm a little confused about the pre-initialization functions that take
command arguments. Is this intended to only be used for parsing the
arguments that `python` recognizes? Presumably a custom application
embedding Python would never use these APIs unless it wants to emulate
the behavior of `python`? (I suppose this can be clarified in the API
docs once this is implemented.)

What about PyImport_FrozenModules? This is a global variable related to
Python initialization (it contains _frozen_importlib and
_frozen_importlib_external) but it is not accounted for in the PEP. I
rely on this struct in PyOxidizer to replace the importlib modules with
custom versions so we can do 0-copy in-memory import of Python bytecode
for the entirety of the standard library. Should the PyConfig have a
reference to the _frozen[] to use? Should the _frozen struct be made
part of the public API?

The PEP mentions a private PyConfig._install_importlib member. I'm
curious what this is because it may be relevant to PyOxidizer. FWIW I
/might/ be interested in a mechanism to better control importlib
initialization because PyOxidizer is currently doing dirty things at
run-time to register the custom 0-copy meta path importer. I /think/ my
desired API would be a mechanism to control the name(s) of the frozen
module(s) to use to bootstrap importlib. Or there would be a way to
register the names of additional frozen modules to import and run as
part of initializing importlib (before any .py-based stdlib modules are
imported). Then PyOxidizer wouldn't need to hack up the source code to
importlib, compile custom bytecode, and inject it via
PyImport_FrozenModules. I concede this may be out of scope for the PEP.
But if the API is being reworked, I'd certainly welcome making it easier
for tools like PyOxidizer to work their crazy module importing magic :)

I really like the new Py_RunMain() API and associated PyConfig members.
I also invented this wheel in PyOxidizer and the new API should result
in me deleting some code that I wish I didn't have to write in the first
place :)

Since I mentioned PyOxidizer a lot, you may want to take a gander at
https://github.com/indygreg/PyOxidizer/blob/64514f862b57846801f9ae4af5968e2e4a541ab7/pyoxidizer/src/pyembed/pyinterp.rs.
That's the code I wrote for embedding a Python interpreter in Rust. I
invented a data structure for representing a Python interpreter
configuration. And the similarities to PyConfig are striking. I think
that's a good sign :) It might be useful to read through that file -
especially the init function (line with `pub fn init`) to see if
anything I'm doing pushes the boundaries of the proposed API. Feel free
to file GitHub issues if you see obvious bugs with PyOxidizer's Python
initialization logic while you're at it :)

Also, one thing that tripped me up a few times when writing PyOxidizer
was managing the lifetimes of memory that various global variables point
to. The short version is I was setting Python globals to point to memory
allocated by Rust and I managed to crash Python by freeing memory before
it should have been. Since the new API seems to preserve support for
global variables, I'm curious if there are changes to how memory must be
managed. It would be really nice to get to a state where you only need
to ensure that the PyConfig instance and all its referenced memory
outlive the interpreter it configures. That would make the memory
lifetimes story intuitive and easily compatible with Rust.

One feature that I think is missing from the proposal (and this is
related to the previous paragraph) is the ability to prevent config
fallback to things that aren't PyConfig and PyPreConfig. There is
`PyConfig.parse_argv` to disable command argument parsing and
`PyConfig.use_environment` to disable environment variable fallback. But
AFAICT there is no option to disable configuration file fallback nor
global variable fallback. As someone who embeds Python and loves total
control, I would absolutely love to opt in to a "Py[Pre]Config only"
mode where those structs are the only things that control interpreter
behavior. I'd opt in to that in a heartbeat if it supported all the
customization that PyOxidizer requires!

... and I think that's all the points of feedback I have!

Again, this proposal is terrific overall and so much better than what we
have today. The wall of text I just wrote is disproportionate in size to
the quality of the PEP. I almost feel bad writing so much feedback for
such a terrific PEP ;)

Excellent work, Victor. I can't wait to see these changes materialize!

Victor Stinner

May 16, 2019, 8:04:50 AM5/16/19
to Gregory Szorc, python-dev
On Thu, 16 May 2019 at 06:34, Gregory Szorc <gregor...@gmail.com> wrote:
> > I know that the PEP is long, but well, it's a complex topic, and I
> > chose to add many examples to make the API easier to understand.
>
> I saw your request for feedback on Twitter a few days back and found
> this thread.
>
> This PEP is of interest to me because I'm the maintainer of PyOxidizer -
> a project for creating single file executables embedding Python.

Aha, interesting :-)

> As part
> of hacking on PyOxidizer, I will admit to grumbling about the current
> state of the configuration and initialization mechanisms. The reliance
> on global variables and the haphazard way in which you must call certain
> functions before others was definitely a bit frustrating to deal with.

Yeah, that's what I tried to explain in the PEP 587 Rationale.


> My most important piece of feedback is: thank you for tackling this!
> Your work to shore up the inner workings of interpreter state and
> management is a big deal on multiple dimensions. I send my sincere
> gratitude.

You're welcome ;-)


> PyPreConfig_INIT and PyConfig_INIT as macros that return a struct feel
> weird to me. Specifically, the `PyPreConfig preconfig =
> PyPreConfig_INIT;` pattern doesn't feel right. I'm sort of OK with these
> being implemented as macros. But I think they should look like function
> calls so the door is open to converting them to function calls in the
> future.

Ah yes, I noticed that some projects can only import symbols, not use
the C API directly. You're right that such a macro can be an issue.

Would you be ok with a "PyConfig_Init(PyConfig *config);" function
which would initialize all fields to their default values? Maybe
PyConfig_INIT should be renamed to PyConfig_STATIC_INIT.

You can find a similar API for pthread mutexes: there is an init function
*and* a macro for static initialization:

int pthread_mutex_init(pthread_mutex_t *restrict mutex,
const pthread_mutexattr_t *restrict attr);

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
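
To make the comparison concrete, here is a minimal sketch of a configuration type that offers both forms, in the style of the pthread API. Everything in it is hypothetical: ToyInitConfig, TOYINITCONFIG_STATIC_INIT and ToyInitConfig_Init() are illustration names, and the default values are assumptions rather than the PEP's defaults.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical configuration with three fields named after PyConfig ones. */
typedef struct {
    int isolated;
    int use_environment;
    int parse_argv;
} ToyInitConfig;

/* Static initializer: only usable from C or C++ code that sees this macro.
   Field order: isolated, use_environment, parse_argv. */
#define TOYINITCONFIG_STATIC_INIT { 0, 1, 1 }

/* Exported function: any binding able to call a C symbol can use it. */
void ToyInitConfig_Init(ToyInitConfig *config)
{
    /* Reuse the macro so both forms always produce the same defaults. */
    static const ToyInitConfig defaults = TOYINITCONFIG_STATIC_INIT;

    assert(config != NULL);
    *config = defaults;
}
```

Defining the function in terms of the macro keeps a single source of truth for the defaults, so bindings written in other languages never need to reimplement the macro's logic.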


> PyPreConfig.allocator being a char* seems a bit weird. Does this imply
> having to use strcmp() to determine which allocator to use? Perhaps the
> allocator setting should be an int mapping to a constant instead?

Yes, _PyMem_SetupAllocators() uses strcmp(). There are 6 supported values:

* "default"
* "debug"
* "pymalloc"
* "pymalloc_debug"
* "malloc"
* "malloc_debug"

Note: pymalloc and pymalloc_debug are not supported if Python is
explicitly configured using --without-pymalloc.

I think that I chose to use a string because the feature was first
implemented using an environment variable.

Actually, I *like* the idea of avoiding string in PyPreConfig because
a string might need memory allocation, whereas the pre-initialization
is supposed to configure memory allocation :-) I will change the type
to an enum.
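
A possible shape for such an enum is sketched below, together with a helper that maps the six names listed above (for example when they come from an environment variable) to enum values. The ToyAllocatorName type, its constants and toy_allocator_from_string() are hypothetical names chosen for this sketch, not the names used by the implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum covering the six supported allocator names. */
typedef enum {
    TOY_ALLOCATOR_NOT_SET = 0,      /* no name given: keep the current allocator */
    TOY_ALLOCATOR_DEFAULT,
    TOY_ALLOCATOR_DEBUG,
    TOY_ALLOCATOR_PYMALLOC,
    TOY_ALLOCATOR_PYMALLOC_DEBUG,
    TOY_ALLOCATOR_MALLOC,
    TOY_ALLOCATOR_MALLOC_DEBUG
} ToyAllocatorName;

/* Convert a name to the enum. Returns 0 on success, -1 for an unknown name.
   A NULL name maps to TOY_ALLOCATOR_NOT_SET. */
int toy_allocator_from_string(const char *name, ToyAllocatorName *out)
{
    static const struct {
        const char *name;
        ToyAllocatorName value;
    } table[] = {
        { "default", TOY_ALLOCATOR_DEFAULT },
        { "debug", TOY_ALLOCATOR_DEBUG },
        { "pymalloc", TOY_ALLOCATOR_PYMALLOC },
        { "pymalloc_debug", TOY_ALLOCATOR_PYMALLOC_DEBUG },
        { "malloc", TOY_ALLOCATOR_MALLOC },
        { "malloc_debug", TOY_ALLOCATOR_MALLOC_DEBUG },
    };
    size_t i;

    assert(out != NULL);
    if (name == NULL) {
        *out = TOY_ALLOCATOR_NOT_SET;
        return 0;
    }
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strcmp(name, table[i].name) == 0) {
            *out = table[i].value;
            return 0;
        }
    }
    return -1;
}
```

With this split, the string comparison happens once at the boundary, and the configuration structure itself holds a plain integer that needs no memory allocation.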


> Relatedly, how are custom allocators registered? e.g. from Rust, I want
> to use Rust's allocator. How would I do that in this API? Do I still
> need to call PyMem_SetAllocator()?

By default, PyPreConfig.allocator is set to NULL. In that case,
_PyPreConfig_Write() leaves the memory allocator unmodified.

Like PyImport_AppendInittab() and PyImport_ExtendInittab(),
PyMem_SetAllocator() remains relevant and continues to work as
previously.

Example to set your custom allocator:
---
PyInitError err = Py_PreInitialize(NULL);
if (Py_INIT_FAILED(err)) {
    Py_ExitInitError(err);
}
PyMem_SetAllocator(PYMEM_DOMAIN_MEM, my_cool_allocator);
---

Well, it also works in the opposite order, but I prefer to call
PyMem_SetAllocator() after the pre-initialization to make it more
explicit :-)
---
PyMem_SetAllocator(PYMEM_DOMAIN_MEM, my_cool_allocator);
PyInitError err = Py_PreInitialize(NULL);
if (Py_INIT_FAILED(err)) {
    Py_ExitInitError(err);
}
---


> I thought a point of this proposal
> was to consolidate per-interpreter config settings?

Right. But PyMem_SetAllocator() uses PyMemAllocatorDomain enum and
PyMemAllocatorEx structure which are not really "future-proof". For
example, I already replaced PyMemAllocator with PyMemAllocatorEx to
add "calloc". We might extend it one more time later to add an allocator
with a specific memory alignment (even if the issue is now closed):

https://bugs.python.org/issue18835

I consider that PyMem_SetAllocator() is too specific to be added to PyPreConfig.

Are you fine with that?


> I'm a little confused about the pre-initialization functions that take
> command arguments. Is this intended to only be used for parsing the
> arguments that `python` recognizes? Presumably a custom application
> embedding Python would never use these APIs unless it wants to emulate
> the behavior of `python`? (I suppose this can be clarified in the API
> docs once this is implemented.)

Yes, Py_PreInitializeFromArgs() parses -E, -I, -X dev and -X utf8 options:
https://www.python.org/dev/peps/pep-0587/#command-line-arguments

Extract of my "Isolate Python" section:

"The default configuration is designed to behave as a regular Python.
To embed Python into an application, it's possible to tune the
configuration to better isolate the embedded Python from the system:

(...)"

https://www.python.org/dev/peps/pep-0587/#isolate-python

I wasn't sure if I should mention parse_argv=0 in this section or not.
According to what you wrote, I should :-)

*Maybe* rather than documenting how to isolate Python, we might even
provide a function for that?

void PyConfig_Isolate(PyConfig *config)
{ config->isolated = 1; config->parse_argv = 0; }

I didn't propose that because so far, I'm not sure that everybody has
the same opinion on what "isolation" means. Does it only mean ignore
environment variables? Or also ignore configuration files? What about
the path configuration?

That's why I propose to start without such an opinionated
PyConfig_Isolate() function :-)


> What about PyImport_FrozenModules? This is a global variable related to
> Python initialization (it contains _frozen_importlib and
> _frozen_importlib_external) but it is not accounted for in the PEP.
> I rely on this struct in PyOxidizer to replace the importlib modules with
> custom versions so we can do 0-copy in-memory import of Python bytecode
> for the entirety of the standard library. Should the PyConfig have a
> reference to the _frozen[] to use? Should the _frozen struct be made
> part of the public API?


First of all, PEP 587 is designed to be easily extendable :-) I added
_config_version field to even provide backward ABI compatibility.

Honestly, I never looked at PyImport_FrozenModules. It seems to fall
into the same category as "inittab": a kind of corner-case use
which cannot be easily generalized into the PyConfig structure.

I would say the same as what I wrote about
PyImport_AppendInittab(): the PyImport_FrozenModules symbol remains
relevant and continues to work as expected. I understand that it must
be set before the initialization, and it seems safe to set it even
before the pre-initialization since it's a static array.

Note: I renamed PyConfig._frozen to PyConfig.pathconfig_warnings: it's
an int and it's unrelated to PyImport_FrozenModules.


> I rely on this struct in PyOxidizer to replace the importlib modules with
> custom versions so we can do 0-copy in-memory import of Python bytecode
> for the entirety of the standard library.

Wait, that sounds like a cool feature! Would it make sense to
upstream this feature? If yes, maybe send a separate email to
python-dev and/or open an issue.


> The PEP mentions a private PyConfig._install_importlib member. I'm
> curious what this is because it may be relevant to PyOxidizer. FWIW I
> /might/ be interested in a mechanism to better control importlib
> initialization because PyOxidizer is currently doing dirty things at
> run-time to register the custom 0-copy meta path importer. I /think/ my
> desired API would be a mechanism to control the name(s) of the frozen
> module(s) to use to bootstrap importlib. Or there would be a way to
> register the names of additional frozen modules to import and run as
> part of initializing importlib (before any .py-based stdlib modules are
> imported). Then PyOxidizer wouldn't need to hack up the source code to
> importlib, compile custom bytecode, and inject it via
> PyImport_FrozenModules. I concede this may be out of scope for the PEP.
> But if the API is being reworked, I'd certainly welcome making it easier
> for tools like PyOxidizer to work their crazy module importing magic :)

PEP 587 is an incomplete implementation of PEP 432. We are
discussing with Nick Coghlan, Steve Dower and some others about having
2 phases for the Python initialization: "core" and "main". The "core"
phase would provide a bare minimum working Python: builtin exceptions
and types, maybe builtin imports, and that's basically all. It would
allow configuring Python using the newly created interpreter, for
example by running Python code.

The problem is that these 2 phases are not well defined yet, it's
still under discussion. Nick and I agreed to start with PEP 587 as a
first milestone, and see later how to implement "core" and "main"
phases.

If the private field "_init_main" of PEP 587 is set to 0,
Py_InitializeFromConfig() stops at the "core" phase (in fact, it's
already implemented!). But I haven't yet implemented a
_Py_InitializeMain() function to "finish" the initialization. Let's
say that it exists, we would get:

---
PyConfig config = PyConfig_INIT;
config._init_main = 0;

PyInitError err = Py_InitializeFromConfig(&config);
if (Py_INIT_FAILED(err)) {
    Py_ExitInitError(err);
}

/* add your code to customize Python here */
/* calling PyRun_SimpleString() here is safe */

/* finish Python initialization; reuse err rather than redeclaring it */
err = _Py_InitializeMain(&config);
if (Py_INIT_FAILED(err)) {
    Py_ExitInitError(err);
}
---

Would it solve your use case?

Sorry, I didn't understand properly what you mean by "controlling the
names of the frozen modules to use to bootstrap importlib".


> I really like the new Py_RunMain() API and associated PyConfig members.
> I also invented this wheel in PyOxidizer and the new API should result
> in me deleting some code that I wish I didn't have to write in the first
> place :)

Great!


> I invented a data structure for representing a Python interpreter
> configuration. And the similarities to PyConfig are striking. I think
> that's a good sign :)

He he :-)

> It might be useful to read through that file -
> especially the init function (line with `pub fn init`) to see if
> anything I'm doing pushes the boundaries of the proposed API. Feel free
> to file GitHub issues if you see obvious bugs with PyOxidizer's Python
> initialization logic while you're at it :)

Your link didn't work, but I found:
https://github.com/indygreg/PyOxidizer/blob/master/pyoxidizer/src/pyembed/pyinterp.rs

"write_modules_directory_env" seems very specific to your needs. Apart
from that, I confirm that PythonConfig is very close to PEP 587
PyConfig! I notice that you also avoided double negation, thanks ;-)


/* Pre-initialization functions we could support:
*
* PyObject_SetArenaAllocator()


* PySys_AddWarnOption()
* PySys_AddXOption()
* PySys_ResetWarnOptions()

*/

Apart from PyObject_SetArenaAllocator(), PyConfig implements the 3 other functions.

Again, like PyMem_SetAllocator(), PyObject_SetArenaAllocator() remains
relevant and can be used with the pre-initialization.

PySys_SetObject("argv", obj) is covered by PyConfig.argv.

PySys_SetObject("argvb", obj): I'm not sure why you are doing that,
it's easy to retrieve sys.argv as bytes, it's now even documented:
https://docs.python.org/dev/library/sys.html#sys.argv

---

Sorry, I'm not an importlib expert. I'm not sure what could be done in
PEP 587 for your specific importlib changes.


> Also, one thing that tripped me up a few times when writing PyOxidizer
> was managing the lifetimes of memory that various global variables point
> to. The short version is I was setting Python globals to point to memory
> allocated by Rust and I managed to crash Python by freeing memory before
> it should have been. Since the new API seems to preserve support for
> global variables, I'm curious if there are changes to how memory must be
> managed. It would be really nice to get to a state where you only need
> to ensure the PyConfig instance and all its referenced memory only needs
> to outlive the interpreter it configures. That would make the memory
> lifetimes story intuitive and easily compatible with Rust.

For the specific case of PyConfig, you have to call
PyConfig_Clear(config) after you called Py_InitializeFromConfig().
Python keeps a copy of your configuration (and it completes the
missing fields, if needed).
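
The ownership rule described here (the runtime copies the configuration, the caller clears its own instance) can be modelled with a small sketch. This is a toy model under stated assumptions, not the CPython implementation: ToyOwnedConfig, ToyRuntime and all functions below are hypothetical, and a single string field stands in for the whole configuration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical configuration owning one heap-allocated string. */
typedef struct {
    char *program_name;
} ToyOwnedConfig;

/* Hypothetical runtime holding its own private copy of the configuration. */
typedef struct {
    ToyOwnedConfig config;
    int initialized;
} ToyRuntime;

/* Duplicate a string with malloc(). Returns NULL on allocation failure. */
static char *toy_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, s, n);
    }
    return copy;
}

/* Store a copy of name in the configuration. Returns 0, or -1 on failure. */
int ToyOwnedConfig_SetProgramName(ToyOwnedConfig *c, const char *name)
{
    char *copy;

    assert(c != NULL && name != NULL);
    copy = toy_strdup(name);
    if (copy == NULL) {
        return -1;
    }
    free(c->program_name);
    c->program_name = copy;
    return 0;
}

/* Release the memory owned by the configuration. */
void ToyOwnedConfig_Clear(ToyOwnedConfig *c)
{
    assert(c != NULL);
    free(c->program_name);
    c->program_name = NULL;
}

/* Initialize the runtime from a deep copy of the caller's configuration. */
int ToyRuntime_InitFromConfig(ToyRuntime *rt, const ToyOwnedConfig *c)
{
    assert(rt != NULL && c != NULL);
    rt->config.program_name = NULL;
    rt->initialized = 0;
    if (c->program_name != NULL) {
        rt->config.program_name = toy_strdup(c->program_name);
        if (rt->config.program_name == NULL) {
            return -1;
        }
    }
    rt->initialized = 1;
    return 0;
}

/* Finalize the runtime and free its private copy. */
void ToyRuntime_Finalize(ToyRuntime *rt)
{
    assert(rt != NULL);
    ToyOwnedConfig_Clear(&rt->config);
    rt->initialized = 0;
}
```

In this model the caller's configuration only has to live until the initialization call returns, because the runtime never keeps a pointer into memory owned by the caller.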

I modified a lot of functions to ensure that Python cleans up more
globals at exit in Py_Finalize() and at the end of Py_Main() /
Py_RunMain().

I'm not sure if that answers your question. If you want a more
specific answer, can you please give more concrete examples of globals?

There is also an on-going refactoring to move globals into
_PyRuntimeState and PyInterpreterState: change needed to support
subinterpreters, see Eric Snow's PEP 554.


> One feature that I think is missing from the proposal (and this is
> related to the previous paragraph) is the ability to prevent config
> fallback to things that aren't PyConfig and PyPreConfig. There is
> `PyConfig.parse_argv` to disable command argument parsing and
> `PyConfig.use_environment` to disable environment variable fallback. But
> AFAICT there is no option to disable configuration file fallback nor
> global variable fallback.

If you embed Python, you control global configuration variables, no? I
chose to design PyConfig to inherit global configuration variables
because that supports both ways of configuring Python with a
single implementation.

Would you prefer an explicit PyConfig_SetDefaults(config) which would
completely ignore global configuration variables?

See Lib/test/test_embed.py unit tests which use Programs/_testembed.c:
https://github.com/python/cpython/blob/master/Programs/_testembed.c

python._pth (Windows only), pybuilddir.txt (Unix only) and pyvenv.cfg
configuration files are only used by the function building the "Path
Configuration".

Using PEP 587, you can now completely ignore this function:
https://www.python.org/dev/peps/pep-0587/#path-configuration


> Again, this proposal is terrific overall and so much better than what we
> have today. The wall of text I just wrote is disproportionate in size to
> the quality of the PEP. I almost feel bad writing so much feedback for
> such a terrific PEP ;)
>
> Excellent work, Victor. I can't wait to see these changes materialize!

Thanks :-)

Thanks for your very interesting feedback. It's really helpful to see
how the API is used "for real" :-)

Victor

Paul Moore

May 16, 2019, 9:32:37 AM5/16/19
to Victor Stinner, Gregory Szorc, python-dev
On Thu, 16 May 2019 at 13:05, Victor Stinner <vsti...@redhat.com> wrote:
> > PyPreConfig_INIT and PyConfig_INIT as macros that return a struct feel
> > weird to me. Specifically, the `PyPreConfig preconfig =
> > PyPreConfig_INIT;` pattern doesn't feel right. I'm sort of OK with these
> > being implemented as macros. But I think they should look like function
> > calls so the door is open to converting them to function calls in the
> > future.
>
> Ah yes, I noticed that some projects can only import symbols, not use
> the C API directly. You're right that such a macro can be an issue.

I've not been following this PEP particularly, but I can confirm that
the Vim bindings for Python also have this restriction (at least on
Windows). To allow binding to the Python interpreter at runtime, and
only on demand, the interface does an explicit
LoadLibrary/GetProcAddress call for each C API function that's used.
That means macros are unavailable (short of wholesale copying of the
Python headers). (It's also a painfully laborious bit of code, and it
would be nice if there were a better way of doing it, but I've never
had the time/motivation to try to improve this, so that's just how
it's stayed).

Paul

Thomas Wouters

May 16, 2019, 10:12:10 AM5/16/19
to Victor Stinner, Gregory Szorc, python-dev
On Thu, May 16, 2019 at 2:03 PM Victor Stinner <vsti...@redhat.com> wrote:
On Thu, May 16, 2019 at 06:34, Gregory Szorc <gregor...@gmail.com> wrote:
> > I know that the PEP is long, but well, it's a complex topic, and I
> > chose to add many examples to make the API easier to understand.
>
> I saw your request for feedback on Twitter a few days back and found
> this thread.
>
> This PEP is of interest to me because I'm the maintainer of PyOxidizer -
> a project for creating single file executables embedding Python.

Aha, interesting :-)

Just for some context to everyone: Gregory's PyOxidizer is very similar to Hermetic Python, the thing we use at Google for all Python programs in our mono-repo. We had a short email discussion facilitated by Augie Fackler, who wants to use PyOxidizer for Mercurial, about how Hermetic Python works.

At the PyCon sprints last week, I sat down with Victor, Steve Dower and Eric Snow, showing them how Hermetic Python embeds CPython, and what hoops it has to jump through and what issues we encountered. I think most of those issues would also apply to PyOxidizer, although it sounds like Gregory solved some of the issues a bit differently. (Hermetic Python was originally written for Python 2.7, so it doesn't try to deal with importlib's bootstrapping, for example.)

I have some comments and questions about the PEP as well, some of which overlap with Gregory's or Victor's answers:
 
[...]
> PyPreConfig_INIT and PyConfig_INIT as macros that return a struct feel
> weird to me. Specifically, the `PyPreConfig preconfig =
> PyPreConfig_INIT;` pattern doesn't feel right. I'm sort of OK with these
> being implemented as macros. But I think they should look like function
> calls so the door is open to converting them to function calls in the
> future.

Ah yes, I noticed that some projects can only import symbols, not use
the C API directly. You're right that such a macro can be an issue.

Would you be ok with a "PyConfig_Init(PyConfig *config);" function
which would initialize all fields to their default values? Maybe
PyConfig_INIT should be renamed to PyConfig_STATIC_INIT.

You can find a similar API for pthread mutexes: there is an init function
*and* a macro for static initialization:

       int pthread_mutex_init(pthread_mutex_t *restrict mutex,
           const pthread_mutexattr_t *restrict attr);

       pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

This was going to be my suggestion as well: for any non-trivial macro, we should have a function for it instead. I would also point out that PEP 587 has a code example that uses PyWideStringList_INIT, but that macro isn't mentioned anywhere else. The PEP is a bit unclear as to the semantics of PyWideStringList as a whole: the example uses a static array with length, but doesn't explain what would happen with statically allocated data like that if you call the Append or Extend functions. It also doesn't cover how e.g. argv parsing would remove items from the list. (I would also suggest the PEP shouldn't use the term 'list', at least not unqualified, if it isn't an actual Python list.)

I understand the desire to make static allocation and initialisation possible, but since you only need PyWideStringList for PyConfig, not PyPreConfig (which sets the allocator), perhaps having a PyWideStringList_Init(), which copies memory, and PyWideStringList_Clear() to clear it, would be better?

FWIW, I understand the need here: for Hermetic Python, we solved it by adding a new API similar to PyImport_AppendInittab, but instead registering a generic callback hook to be called *during* the initialisation process: after the base runtime and the import mechanism are initialised (at which point you can create Python objects), but before *any* modules are imported. We use that callback to insert a meta-importer that satisfies all stdlib imports from an embedded archive. (Using a meta-importer allows us to bypass the filesystem altogether, even for what would otherwise be failed path lookups.)

As I mentioned, Hermetic Python was originally written for Python 2.7, but this approach works fine with a frozen importlib as well. The idea of 'core' and 'main' initialisation will likely work for this, as well.
 
Other questions/comments about PEP 587:

I really like the PyInitError struct. I would like more functions to use it, e.g. the PyRun_* "very high level" API, which currently calls exit() for you on SystemExit, and returns -1 without any other information on error. For those, I'm not entirely sure 'Init' makes sense in the name... but I can live with it.

A couple of things are documented as performing pre-initialisation (PyConfig_SetBytesString, PyConfig_SetBytesArgv). I understand why, but I feel like that might be confusing and error-prone. Would it not be better to have them fail if pre-initialisation hasn't been performed yet?

The buffered_stdio field of PyConfig mentions stdout and stderr, but not stdin. Does it not affect stdin? (Many of the fields could do with a bit more explicit documentation, to be honest.)

The configure_c_stdio field of PyConfig sounds like it might not set sys.stdin/stdout/stderr. That would be new behaviour, but configure_c_stdio doesn't have an existing equivalence, so I'm not sure if that's what you meant or not. 

The dll_path field of PyConfig says "Windows only". Does that mean the struct doesn't have that field except in a Windows build? Or is it ignored, instead? If it doesn't have that field at all, what #define can be used to determine if the PyConfig struct will have it or not?

It feels a bit weird to have both 'inspect' and 'interactive' in PyConfig. Is there a substantive difference between them? Is this just so you can easily tell if any of run_module / run_command / run_filename are set?

"module_search_path_env" sounds like an awkward and somewhat misleading name for the translation of PYTHONPATH. Can we not just use, say, pythonpath_env? I expect the intended audience to know that PYTHONPATH != sys.path.

The module_search_paths field in PyConfig doesn't mention if it's setting or adding to the calculated sys.path. As a whole, the path-calculation bits are a bit under-documented. Since this is an awkward bit of CPython, it wouldn't hurt to mention what "the default path configuration" does (i.e. search for python's home starting at program_name, add fixed subdirs to it, etc.)

Path configuration is mentioned as being able to issue warnings, but it doesn't mention *how*. It can't be the warnings module at this stage. I presume it's just printing to stderr.

Regarding Py_RunMain(): does it do the right thing when something calls PyErr_Print() with SystemExit set? (I mentioned last week that PyErr_Print() will call C's exit() in that case, which is obviously terrible for embedders.)

Regarding isolated_mode and the site module, should we make stronger guarantees about site.py's behaviour being optional? The problem with site is that it does four things that aren't configurable, one of which is usually very desirable, one of which probably doesn't matter to embedders, and two that are iffy: sys.path deduplication and canonicalisation (and fixing up __file__/__cached__ attributes of already-imported modules); adding site-packages directories; looking for and importing sitecustomize.py; executing .pth files. The site module doesn't easily allow doing only some of these. (user-site directories are an exception, as they have their own flag, so I'm not listing that here.) With Hermetic Python we don't care about any of these (for a variety of different reasons), but I'm always a little worried that future Python versions would add behaviour to site that we *do* need.

(As a side note, here's an issue I forgot to talk about last week: with Hermetic Python's meta-importers we have an ancillary regular import hook for correctly dealing with packages with modified __path__, so that for example 'xml' from the embedded stdlib zip can still import '_xmlplus' from the filesystem or a separate zip, and append its __path__ entries to its own. To do that, we use a special prefix to use for the embedded archive meta-importers; we don't want to use a file because they are not files on disk. The prefixes used to be something like '<embedded archive XXX at YYY>'. This works fine, and with correct ordering of import hooks nothing will try to find files named '<embedded archive XXX at YYY>'... until user code imports site for some reason, which then canonicalises sys.path, replacing the magic prefixes with '/path/to/cwd/<embedded archive XXX at YYY>'. We've since made the magic prefixes start with /, but I'm not happy with it :P)

--
Thomas Wouters <tho...@python.org>

Hi! I'm an email virus! Think twice before sending your email to help me spread!

Steve Dower

May 16, 2019, 11:25:41 AM5/16/19
to Gregory Szorc, python-dev
Thanks for adding your input, Gregory! It's much appreciated.

I'll shuffle your comments around a bit, as I'd rather address the
themes than each individual point.

On 15May2019 2134, Gregory Szorc wrote:
> PyPreConfig_INIT and PyConfig_INIT as macros that return a struct feel
> weird to me. Specifically, the `PyPreConfig preconfig =
> PyPreConfig_INIT;` pattern doesn't feel right.

I see Victor agreed here, but I think this is the right pattern for
PreConfig. The "_INIT" macro pattern is common throughout as a way to
initialize a stack-allocated struct - we really can't change it to be
anything other than "{ .member = static value }" without breaking users, but
if you have another way to initialize it correctly then that is fine.
The important factor here is that this struct has to be allocated
_without_ any memory management provided by Python.

That said, I don't particularly like this approach for PyConfig. As you
said:

> Also, one thing that tripped me up a few times when writing PyOxidizer
> was managing the lifetimes of memory that various global variables point
> to.

My preference here is for PreConfig to get far enough that we can
construct the full configuration as regular Python objects (e.g. using
PyDict_New, PyUnicode_FromString, etc.)[1] rather than a brand new C
struct with a new set of functions. That will cost copies/allocations at
startup, but it will also ensure that the lifetime of the configuration
info is managed by the runtime.

I assume you already have code/helpers for constructing Python strings
and basic data structures, so I wonder whether it would be helpful to be
able to use them to create the configuration info?

([1]: Yes, this requires implementation changes so they work
pre-interpreter and cross-interpreter. This work probably has to happen
anyway, so I don't see any harm in assuming it will happen.)

> I'm a little confused about the pre-initialization functions that take
> command arguments. Is this intended to only be used for parsing the
> arguments that `python` recognizes? Presumably a custom application
> embedding Python would never use these APIs unless it wants to emulate
> the behavior of `python`? (I suppose this can be clarified in the API
> docs once this is implemented.)

> One feature that I think is missing from the proposal (and this is
> related to the previous paragraph) is the ability to prevent config
> fallback to things that aren't PyConfig and PyPreConfig.

This is certainly my intent, and I *think* Victor is coming around to it
too ;)

My preference is that embedding by default does not use any information
outside of the configuration provided by the host application. Then our
"python" host application can read the environment/argv/etc. and convert
it into configuration. Since some embedders also want to do this, we can
provide helper functions to replicate the behaviour.

Does this sound like a balance that would suit your needs? Would you
expect an embedded Python to be isolated by default? Or would you assume
that it's going to pick up configuration from various places and that
you (as an embedder) need to explicitly suppress that?

(Some parts of the stdlib and some 3rd-party libraries use their own
environment variables at runtime. We may not be able to convert all of
those to configuration, but at least they're read lazily, and so Python
code can override them by setting the variables.)

> What about PyImport_FrozenModules?

> FWIW I /might/ be interested in a mechanism to better control importlib
> initialization because PyOxidizer is currently doing dirty things at
> run-time to register the custom 0-copy meta path importer.

Controlling imports early in initialization is one of our broader goals
here, and one that I would particularly like to figure out before we
commit to a new public API. Registering new importers should not have to
be "dirty".

Cheers,
Steve

Steve Dower

May 16, 2019, 11:35:06 AM5/16/19
to python-dev
On 15May2019 1610, Victor Stinner wrote:
> Thanks to the constructive discussions, I enhanced my PEP 587. I don't
> plan any further change, the PEP is now ready to review (and maybe
> even for pronouncement, hi Thomas! :-)).

My view is that while this is a fantastic PEP and the groundwork that
already exists as private API is excellent, it is too early to commit to
a new public API and we need to do more analysis work. We should not
accept this PEP at this time.

So far, the API being exposed here has not been tested with embedders.
We have very little feedback on whether it meets their needs or would
help them simplify or make their projects more robust. I have concerns
about the number of new functions being added, the new patterns being
proposed, and both forward and backwards compatibility as we inevitably
make changes. (I have discussed all of these in person with Victor,
Nick, and Thomas at PyCon last week, which is why I'm not doing a
point-by-point here.)

As soon as we publish a PEP declaring a new embedding API, users will
assume that it's here to stay. I don't believe that is true, as there is
much more we can and should do to improve embedding. But we don't get to
totally revise the public API on each release without alienating users,
which is why I would rather hold the public API changes until 3.9,
investigate and design them properly. It does our users a disservice to
make major changes like this without due process.

Cheers,
Steve

Steve Dower

May 16, 2019, 11:46:59 AM5/16/19
to Thomas Wouters, Victor Stinner, Gregory Szorc, python-dev
On 16May2019 0710, Thomas Wouters wrote:
> A couple of things are documented as performing pre-initialisation
> (PyConfig_SetBytesString, PyConfig_SetBytesArgv). I understand why, but
> I feel like that might be confusing and error-prone. Would it not be
> better to have them fail if pre-initialisation hasn't been performed yet?

I agree. Anything other than setting up the struct for
pre-initialization settings doesn't need to work here.

> The dll_path field of PyConfig says "Windows only". Does that mean the
> struct doesn't have that field except in a Windows build? Or is it
> ignored, instead? If it doesn't have that field at all, what #define can
> be used to determine if the PyConfig struct will have it or not?

This field doesn't need to be here. It exists because it was used in
getpathp.c, and Victor's internal refactoring has kept it around through
all the field movement.

If we properly design initialization instead of just refactoring until
it's public, I bet this field will go away.

> "module_search_path_env" sounds like an awkward and somewhat misleading
> name for the translation of PYTHONPATH. Can we not just use, say,
> pythonpath_env? I expect the intended audience to know that PYTHONPATH
> != sys.path.

Again, this doesn't need to be its own configuration field, but because
of the refactoring approach taken here it's flowed out to public API.

A "init config from environment" can load this value and put it into the
"sys.path-equivalent field" in the config.

> The module_search_paths field in PyConfig doesn't mention if it's
> setting or adding to the calculated sys.path. As a whole, the
> path-calculation bits are a bit under-documented. Since this is an
> awkward bit of CPython, it wouldn't hurt to mention what "the default
> path configuration" does (i.e. search for python's home starting at
> program_name, add fixed subdirs to it, etc.)

Again, let's design this part properly instead of exposing what we've
had for years :)

> Regarding Py_RunMain(): does it do the right thing when something calls
> PyErr_Print() with SystemExit set? (I mentioned last week that
> PyErr_Print() will call C's exit() in that case, which is obviously
> terrible for embedders.)

Can we just fix PyErr_Print() to not exit? Surely we only depend on it
in one or two places (sys.excepthook?) and it's almost certainly not
helping anyone else.

> Regarding isolated_mode and the site module, should we make stronger
> guarantees about site.py's behaviour being optional?

Yes, I've been forgetting about this too. There's a lot of configuration
that's split between site.py and initialization, so it's very hard to
understand what will be ready when you leave out site.py. Straightening
this out would help (everyone except virtualenv, probably ;) )

Cheers,
Steve

Victor Stinner

May 16, 2019, 1:27:09 PM5/16/19
to Thomas Wouters, Gregory Szorc, python-dev
On Thu, May 16, 2019 at 16:10, Thomas Wouters <tho...@python.org> wrote:
>> Would you be ok with a "PyConfig_Init(PyConfig *config);" function
>> which would initialize all fields to their default values? Maybe
>> PyConfig_INIT should be renamed to PyConfig_STATIC_INIT.
>>
>> You can find a similar API for pthread mutex, there is a init function
>> *and* a macro for static initialization:
>>
>> int pthread_mutex_init(pthread_mutex_t *restrict mutex,
>> const pthread_mutexattr_t *restrict attr);
>>
>> pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
>
>
> This was going to be my suggestion as well: for any non-trivial macro, we should have a function for it instead.

Ok, I will do that.
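
A minimal sketch of the "init function *and* static macro" pairing discussed above, using a hypothetical demo_config struct (the field names and DEMO_CONFIG_STATIC_INIT are assumptions for illustration, not the PEP's definitions). The function form is an exported symbol, so code that can only look up symbols at run time can still obtain default values:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct; the fields are illustrative only. */
typedef struct {
    int isolated;
    int use_environment;
    const wchar_t *program_name;
} demo_config;

/* Static initializer: usable in a declaration, but only from C code
   that includes this header and can expand the macro. */
#define DEMO_CONFIG_STATIC_INIT \
    { .isolated = 0, .use_environment = 1, .program_name = NULL }

/* Function form: a real symbol that bindings and other languages can
   call. It reuses the macro so the defaults have a single source. */
void demo_config_init(demo_config *config)
{
    assert(config != NULL);
    const demo_config defaults = DEMO_CONFIG_STATIC_INIT;
    *config = defaults;
}
```

Both forms produce the same field values; the only difference is whether the caller needs the header at compile time.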


> I would also point out that PEP 587 has a code example that uses PyWideStringList_INIT, but that macro isn't mentioned anywhere else.

Oh, I forgot to better document it. Well, the macro is trivial:

#define _PyWstrList_INIT (_PyWstrList){.length = 0, .items = NULL}

For consistency, I prefer not to initialize these fields manually, but
to use a macro instead.

(Variables are allocated on the stack and so *must* be initialized.)


> The PEP is a bit unclear as to the semantics of PyWideStringList as a whole: the example uses a static array with length, but doesn't explain what would happen with statically allocated data like that if you call the Append or Extend functions. It also doesn't cover how e.g. argv parsing would remove items from the list. (I would also suggest the PEP shouldn't use the term 'list', at least not unqualified, if it isn't an actual Python list.)

Calling PyWideStringList_Append() or PyWideStringList_Insert() on a
"constant" list will crash: don't do that :-)

I tried to explain the subtle details of "constant" vs "dynamic"
configurations in the "Initialization with constant PyConfig" and "Memory
allocations and Py_DecodeLocale()" sections.

A "constant" PyWideStringList must not be used with a "dynamic"
PyConfig: otherwise, PyConfig_Clear() will crash as well.
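
To illustrate why a "constant" list and a "dynamic" list cannot be mixed, here is a rough sketch with a hypothetical demo_wstr_list type that mirrors the length/items layout. It is not the CPython implementation. Append and clear assume heap memory, so a static array has to be copied into a dynamic list before either may be used on it:

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical list type mirroring the length/items layout. */
typedef struct {
    size_t length;
    wchar_t **items;
} demo_wstr_list;

/* Append a heap copy of `item`. Only valid when list->items is NULL or
   came from malloc/realloc: calling realloc() on a static array is
   undefined behaviour. Returns 0 on success, -1 on allocation failure. */
int demo_wstr_list_append(demo_wstr_list *list, const wchar_t *item)
{
    assert(list != NULL && item != NULL);
    size_t n = wcslen(item) + 1;
    wchar_t *copy = malloc(n * sizeof(wchar_t));
    if (copy == NULL) {
        return -1;
    }
    wmemcpy(copy, item, n);

    wchar_t **items = realloc(list->items,
                              (list->length + 1) * sizeof(wchar_t *));
    if (items == NULL) {
        free(copy);
        return -1;
    }
    items[list->length] = copy;
    list->items = items;
    list->length++;
    return 0;
}

/* Free every item and the array itself; only valid on a dynamic list. */
void demo_wstr_list_clear(demo_wstr_list *list)
{
    for (size_t i = 0; i < list->length; i++) {
        free(list->items[i]);
    }
    free(list->items);
    list->items = NULL;
    list->length = 0;
}

/* Build a dynamic list from a constant array by copying each string. */
int demo_wstr_list_from_array(demo_wstr_list *list,
                              size_t length, wchar_t *const *items)
{
    list->length = 0;
    list->items = NULL;
    for (size_t i = 0; i < length; i++) {
        if (demo_wstr_list_append(list, items[i]) < 0) {
            demo_wstr_list_clear(list);
            return -1;
        }
    }
    return 0;
}
```

The cost of the copy is the extra error handling for allocation failures, which is the trade-off mentioned for Programs/_testembed.c.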

I would prefer to have separate "const PyWideStringList" and "const
PyConfig" types, but the C language doesn't convert "wchar_t*" to
"const wchar_t*" when you do that. We would need duplicated
PyConstantWideStringList and PyConstantConfig structures, which would
have to be cast to PyWideStringList and PyConfig internally to
reuse the same code for constant and dynamic configuration.
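
The const limitation mentioned above can be shown with a small sketch (demo_wstr_list is again a hypothetical type): qualifying the struct as const protects only its own members, not the strings that the members point to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list type with the same shape as in the discussion. */
typedef struct {
    size_t length;
    wchar_t **items;
} demo_wstr_list;

/* `const` on the struct is shallow: list->items and list->length cannot
   be reassigned here, but the characters the items point to are still
   writable, and the compiler accepts this without any cast. */
void demo_overwrite_first_char(const demo_wstr_list *list, wchar_t ch)
{
    assert(list->length > 0);
    list->items[0][0] = ch;
}
```

Getting deep constness would need a second struct whose member is declared as "const wchar_t *const *items", which is the duplication described above.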

If you consider that the specific case of "constant configuration"
adds too much burden / complexity, we might remove it and always
require the use of dynamic configuration.

Right now, Programs/_testembed.c uses "constant" configuration almost
everywhere. Using dynamic memory would make the code longer: it would
need to handle memory allocation failures.


> I understand the desire to make static allocation and initialisation possible, but since you only need PyWideStringList for PyConfig, not PyPreConfig (which sets the allocator), perhaps having a PyWideStringList_Init(), which copies memory, and PyWideStringList_Clear() to clear it, would be better?

Do you mean always requiring dynamic lists? In other words,
not allowing code like the following?

    static wchar_t* argv[] = {
        L"python3",
        L"-c",
        L"pass",
        L"arg2",
    };

    _PyCoreConfig config = _PyCoreConfig_INIT;
    config.argv.length = Py_ARRAY_LENGTH(argv);
    config.argv.items = argv;


>> If the private field "_init_main" of the PEP 587 is set to 0,
>> Py_InitializeFromConfig() stops at the "core" phase (in fact, it's
>> already implemented!). But I didn't implement yet a
>> _Py_InitializeMain() function to "finish" the initialization. Let's
>> say that it exists, we would get:
>>
>> ---
>> PyConfig config = PyConfig_INIT;
>> config._init_main = 0;
>> PyInitError err = Py_InitializeFromConfig(&config);
>> if (Py_INIT_FAILED(err)) {
>> Py_ExitInitError(err);
>> }
>>
>> /* add your code to customize Python here */
>> /* calling PyRun_SimpleString() here is safe */
>>
>> /* finish Python initialization */
>> PyInitError err = _Py_InitializeMain(&config);
>> if (Py_INIT_FAILED(err)) {
>> Py_ExitInitError(err);
>> }
>> ---
>>
>> Would it solve your use case?
>
>
> FWIW, I understand the need here: for Hermetic Python, we solved it by adding a new API similar to PyImport_AppendInittab, but instead registering a generic callback hook to be called *during* the initialisation process: after the base runtime and the import mechanism are initialised (at which point you can create Python objects), but before *any* modules are imported. We use that callback to insert a meta-importer that satisfies all stdlib imports from an embedded archive. (Using a meta-importer allows us to bypass the filesystem altogether, even for what would otherwise be failed path lookups.)
>
> As I mentioned, Hermetic Python was originally written for Python 2.7, but this approach works fine with a frozen importlib as well. The idea of 'core' and 'main' initialisation will likely work for this, as well.

Well, even if it's not part of the PEP 587, I just implemented it
anyway while fixing a bug:
https://github.com/python/cpython/commit/9ef5dcaa0b3c7c7ba28dbb3ec0c9507d9d05e3a9

Example:

static int test_init_main(void)
{
    _PyCoreConfig config = _PyCoreConfig_INIT;
    configure_init_main(&config);
    config._init_main = 0;

    _PyInitError err = _Py_InitializeFromConfig(&config);
    if (_Py_INIT_FAILED(err)) {
        _Py_ExitInitError(err);
    }

    /* sys.stdout doesn't exist yet: it is created by _Py_InitializeMain() */
    int res = PyRun_SimpleString(
        "import sys; "
        "print('Run Python code before _Py_InitializeMain', "
        "file=sys.stderr)");
    if (res < 0) {
        exit(1);
    }

    err = _Py_InitializeMain();
    if (_Py_INIT_FAILED(err)) {
        _Py_ExitInitError(err);
    }

    return _Py_RunMain();
}

As you can see, it's possible to execute Python code between the "core"
and "main" initialization phases. Moreover, I even fixed Python to be able
to use "import sys" before the "main" initialization phase ;-) (Only builtin
and frozen modules are available at this stage.)
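
The general shape of such a two-phase start-up, with an embedder hook running between the phases (similar in spirit to the Hermetic Python callback quoted above), can be sketched without any Python headers. Everything here is hypothetical: demo_runtime, the phase names and the hook type only model the idea and are not CPython APIs.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEMO_UNINITIALIZED, DEMO_CORE, DEMO_MAIN } demo_phase;

/* Hypothetical runtime state, not a CPython structure. */
typedef struct {
    demo_phase phase;
    int importers_registered;   /* number of hooks run before "main" */
} demo_runtime;

typedef void (*demo_hook)(demo_runtime *rt);

/* Phase 1: bring up only what is needed to run minimal code. */
int demo_init_core(demo_runtime *rt)
{
    rt->phase = DEMO_CORE;
    rt->importers_registered = 0;
    return 0;
}

/* Phase 2: finish initialization; refuses to run out of order. */
int demo_init_main(demo_runtime *rt)
{
    if (rt->phase != DEMO_CORE) {
        return -1;
    }
    rt->phase = DEMO_MAIN;
    return 0;
}

/* Run both phases with an optional embedder hook in between. */
int demo_init(demo_runtime *rt, demo_hook hook)
{
    if (demo_init_core(rt) < 0) {
        return -1;
    }
    if (hook != NULL) {
        hook(rt);
    }
    return demo_init_main(rt);
}

/* Example hook: stands in for registering a custom meta path importer. */
void demo_register_importer(demo_runtime *rt)
{
    assert(rt->phase == DEMO_CORE);
    rt->importers_registered++;
}
```

The open design question is what exactly is guaranteed to be available inside the hook, which corresponds to defining what the "core" phase is.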

Again, I'm not comfortable making PyConfig._init_main and
_Py_InitializeMain() public, because I consider them too
experimental and we don't have enough time to discuss what exactly the
"core" initialization phase is.


> Other questions/comments about PEP 587:
>
> I really like the PyInitError struct. I would like more functions to use it, e.g. the PyRun_* "very high level" API, which currently calls exit() for you on SystemExit, and returns -1 without any other information on error. For those, I'm not entirely sure 'Init' makes sense in the name... but I can live with it.

The PyInitError structure can be renamed PyError, but it should only be
used with functions which can exit Python. In short, are you talking
about "The Very High Level Layer" of the C API?
https://docs.python.org/dev/c-api/veryhigh.html

One issue is that I dislike adding new functions to the C API, but it
seems like we should add a few to provide a better API for embedded
Python. libpython must never exit the process! (or only when you
explicitly ask for that :-))

Note: PyRun_SimpleStringFlags() is a wrapper which makes
PyRun_StringFlags() easier to use. PyRun_StringFlags() doesn't handle
the exception and so lets you decide how to handle it.


> A couple of things are documented as performing pre-initialisation (PyConfig_SetBytesString, PyConfig_SetBytesArgv). I understand why, but I feel like that might be confusing and error-prone. Would it not be better to have them fail if pre-initialisation hasn't been performed yet?

It would be easy to modify the code to fail with an error if Python is
not pre-initialized.

I propose to implicitly pre-initialize Python to make the API easier
to use. In practice, you rarely have to explicitly pre-initialize
Python. The default PyPreConfig is just fine for almost all use cases,
especially since Nick Coghlan and I decided to disable C locale
coercion and UTF-8 Mode by default. You now have to opt in to enable
these encoding features.


> The buffered_stdio field of PyConfig mentions stdout and stderr, but not stdin. Does it not affect stdin?

Extract of create_stdio():

/* stdin is always opened in buffered mode, first because it shouldn't
make a difference in common use cases, second because TextIOWrapper
depends on the presence of a read1() method which only exists on
buffered streams.
*/

Note: Unbuffered stdin doesn't magically make the producer on the
other side of a pipe flush its (stdout/stderr) buffer more
frequently :-)


> (Many of the fields could do with a bit more explicit documentation, to be honest.)

Well, 2 years ago, almost no configuration parameter was documented
:-) I helped to document "Global configuration variables" at:
https://docs.python.org/dev/c-api/init.html#global-configuration-variables

I had to reverse engineer the code to be able to document it :-D

Right now, my reference documentation lives in
Include/cpython/coreconfig.h. Some parameters are better documented
there, than in the PEP. I can try to enhance the documentation in the
PEP.


> The configure_c_stdio field of PyConfig sounds like it might not set sys.stdin/stdout/stderr. That would be new behaviour, but configure_c_stdio doesn't have an existing equivalence, so I'm not sure if that's what you meant or not.

In Python 3.7, only Py_Main() configured C standard streams.

I moved the code into _PyCoreConfig_Write() which is called by
_Py_InitializeFromConfig() and so by Py_Initialize() as well.

My intent is to be able to get the same behavior using Py_Initialize()
+ Py_RunMain() as using Py_Main().

Said differently, Python 3.8 now always configures C standard streams.
Maybe I should modify the configure_c_stdio default value to 0, and
only enable it by default in Py_Main()?

Honestly, I'm a little bit confused here. I'm not sure what is the
expected behavior. Usually, in case of doubt, I look at the behavior
before my refactoring. The old behaviour was that only Py_Main()
configured C standard streams. Maybe I should restore this behavior.

But to build a customized Python that behaves like the regular
Python, you would want to opt in with configure_c_stdio=1.

Maybe we need a function to set the configuration to get "regular
Python" behavior?

Something like: PyConfig_SetRegularPythonBehavior()? (sorry for the silly name!)


> The dll_path field of PyConfig says "Windows only". Does that mean the struct doesn't have that field except in a Windows build? Or is it ignored, instead? If it doesn't have that field at all, what #define can be used to determine if the PyConfig struct will have it or not?

The field doesn't exist on non-Windows platforms.

I chose to expose it to let the developer choose where Python looks for DLLs.

But Steve just said (in an email below) that there is no reason to
make it configurable. In that case, I will make it internal again. It
seems like I misunderstood the purpose of this parameter.


> It feels a bit weird to have both 'inspect' and 'interactive' in PyConfig. Is there a substantive difference between them? Is this just so you can easily tell if any of run_module / run_command / run_filename are set?

In Python 3.7, there are Py_InspectFlag and Py_InteractiveFlag.

If "interactive" parameter is non-zero, C standard streams are
configured as buffered. It is also used to decide if stdin is
considered as interactive or not:

/* Return non-zero if stdin is a TTY or if -i command line option is used */
static int
stdin_is_interactive(const _PyCoreConfig *config)
{
    return (isatty(fileno(stdin)) || config->interactive);
}

The "inspect" parameter is used to decide if we start a REPL or not.

The "-i" command line option sets inspect (Py_InspectFlag) and
interactive (Py_InteractiveFlag) to 1.

These flags are exposed at Python level as sys.flags.inspect and
sys.flags.interactive.

... Honestly, I'm not sure if there is a real difference between these
two flags, but they are exposed and have existed for years... so I decided
to keep them.


> "module_search_path_env" sounds like an awkward and somewhat misleading name for the translation of PYTHONPATH. Can we not just use, say, pythonpath_env? I expect the intended audience to know that PYTHONPATH != sys.path.

Sure, I can rename it.


> The module_search_paths field in PyConfig doesn't mention if it's setting or adding to the calculated sys.path. As a whole, the path-calculation bits are a bit under-documented.

Py_InitializeFromConfig() sets sys.path from module_search_paths.

sys.path doesn't exist before Py_InitializeFromConfig() is called.


> Since this is an awkward bit of CPython, it wouldn't hurt to mention what "the default path configuration" does (i.e. search for python's home starting at program_name, add fixed subdirs to it, etc.)

Oh, that's a big task :-) Nobody knows what getpath.c and getpathp.c do :-D


> Path configuration is mentioned as being able to issue warnings, but it doesn't mention *how*. It can't be the warnings module at this stage. I presume it's just printing to stderr.

First, I didn't know, but I just saw that it's only on Unix
(getpath.c). On Windows (getpathp.c), no warning is emitted.

The warning is written into C stderr.

The flag isn't new: it's based on Py_FrozenFlag. When I looked at how
Python is embedded, I was surprised by the number of applications
setting Py_FrozenFlag to 1 to suppress these warnings.


> Regarding Py_RunMain(): does it do the right thing when something calls PyErr_Print() with SystemExit set? (I mentioned last week that PyErr_Print() will call C's exit() in that case, which is obviously terrible for embedders.)

I spent a significant amount of time ensuring that
Py_InitializeFromConfig() and Py_RunMain() don't exit directly, but
return a proper failure or exit code. For example, Python 3.6 contains
around 319 calls to Py_FatalError(). The master branch contains around
181 calls to Py_FatalError(): still a lot, but I converted 138 calls
to _Py_INIT_ERR() ;-)

The work is not complete: I just checked, Py_RunMain() still calls
PyErr_Print() directly in many places. Well, the code can be fixed,
and it's not directly related to the PEP, is it? The issue already
existed in Python 3.7 with Py_Main().


> Regarding isolated_mode and the site module, should we make stronger guarantees about site.py's behaviour being optional? The problem with site is that it does four things that aren't configurable, one of which is usually very desirable, one of which probably doesn't matter to embedders, and two that are iffy: sys.path deduplication and canonicalisation (and fixing up __file__/__cached__ attributes of already-imported modules); adding site-packages directories; looking for and importing sitecustomize.py; executing .pth files. The site module doesn't easily allow doing only some of these. (user-site directories are an exception, as they have their own flag, so I'm not listing that here.) With Hermetic Python we don't care about any of these (for a variety of different reasons), but I'm always a little worried that future Python versions would add behaviour to site that we *do* need.

Honestly, I would prefer to simply remove the site module, I dislike
it because it makes Python startup way slower :-) ... But well, it
does a few important things :-)

About the PEP 587: PyConfig.user_site_directory is exported as
sys.flags.no_user_site (negated value) which is used by the site
module.

I'm not sure if you are asking me to modify my PEP, or if it's more a
general remark. The PEP 587 gives control over how sys.path is
initialized.

In the "Isolate Python" section, I suggest setting the "isolated"
parameter to 1, which implies setting user_site_directory to 0. So
sys.path isn't modified afterwards. What you pass to PyConfig is what
you get in sys.path in this case.
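
The command-line counterpart of the "isolated" parameter is the "-I"
option, so the relationship described above can be checked from Python
without touching the C API. The following is only a sketch (the helper
name isolated_flags is invented for the example): it starts a child
interpreter with "-I" and reads back the relevant sys.flags fields.

```python
import subprocess
import sys


def isolated_flags():
    """Return (isolated, no_user_site, ignore_environment) under -I."""
    result = subprocess.run(
        [sys.executable, "-I", "-c",
         "import sys; print(sys.flags.isolated, sys.flags.no_user_site, "
         "sys.flags.ignore_environment)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return tuple(int(value) for value in result.stdout.split())


# -I implies -s (no user site directory) and -E (ignore PYTHON* variables).
print(isolated_flags())
```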

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.

Gregory Szorc

May 19, 2019, 3:01:27 PM5/19/19
to Victor Stinner, Thomas Wouters, python-dev

It sounds like PyOxidizer and Hermetic Python are on the same page and
we're working towards a more official solution. But I want to make sure
by explicitly stating what PyOxidizer is doing.

Essentially, to facilitate in-memory import, we need to register a
custom sys.meta_path importer *before* any file-based imports are
attempted. In other words, we don't want the PathFinder registered on
sys.meta_path to be used. Since we don't have a clear "hook point" to
run custom code between init_importlib() (which is where the base
importlib system is initialized) and _PyCodecRegistry_Init() (which has
the first import of a .py-backed module - "encodings"), my current hack
in PyOxidizer is to compile a modified version of the
importlib._bootstrap_external module which contains my custom
MemoryImporter. I inject this custom version at run-time by updating the
global frozen modules table and the Python initialization mechanism
executes my custom code when the _frozen_importlib_external module is
imported/executed as part of init_importlib_external().
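
For readers unfamiliar with meta path importers, here is a minimal
pure-Python sketch of the general idea. It is not PyOxidizer's actual
MemoryImporter (which is installed much earlier, via the frozen
importlib bootstrap module described above); the class body, the
module name hello_mem and its source are made up for illustration. It
only shows how a finder placed first on sys.meta_path can serve
modules from memory before PathFinder is consulted.

```python
import importlib.abc
import importlib.util
import sys


class MemoryImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve modules from an in-memory {name: source} mapping."""

    def __init__(self, sources):
        self._sources = dict(sources)

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self._sources:
            return None  # let the next sys.meta_path entry try
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        source = self._sources[module.__name__]
        code = compile(source, f"<memory:{module.__name__}>", "exec")
        exec(code, module.__dict__)


# Register first so it wins over the file-based PathFinder.
sys.meta_path.insert(0, MemoryImporter({"hello_mem": "GREETING = 'hi'\n"}))

import hello_mem

print(hello_mem.GREETING)
```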

The CPython API can facilitate making this less hacky by enabling
embedders to run custom code after importlib initialization but before
any path-based modules are imported. I /think/ providing a 2-phase
initialization that stops between _Py_InitializeCore() and
_Py_InitializeMainInterpreter() would get the job done for PyOxidizer
today. But it may be a bit more complicated than that for others (or
possibly me in the future) since importlib is initialized in both these
phases. The "external" importlib bits providing the path-based importer
are initialized later during _Py_InitializeMainInterpreter. It's quite
possible we would want to muck around with the state of external
importers in our initialization "hook" and if they aren't loaded yet...
Furthermore, new_interpreter() makes both importlib initialization
function calls at the same time. We would need to inject a custom meta
path importer for sub-interpreters, so we would need a code injection
point in new_interpreter() as well. Installing a custom
importlib._bootstrap_external/_frozen_importlib_external module and
globally changing the code that runs during importlib init gets the job done.
But it is hacky.

FWIW my notes on how this all works (in Python 3.7) are at
https://github.com/indygreg/PyOxidizer/blob/57c823d6ca2321d12067bf36603a9a0ad2320c75/docs/technotes.rst,
https://github.com/indygreg/PyOxidizer/blob/57c823d6ca2321d12067bf36603a9a0ad2320c75/README.rst#how-it-works,
and
https://gregoryszorc.com/blog/2018/12/18/distributing-standalone-python-applications/.

Again, it sounds like we're working towards a robust solution. I just
wanted to brain dump to make sure we are.

Regarding site.py, I agree it is problematic for embedding scenarios.
Some features of site.py can be useful. Others aren't. It would be
useful to have more granular control over which bits of site.run are
run. My naive suggestion would be to add individual flags to control
which functions site.py:main() runs. That way embedders can cherry-pick
site.py features without having to manually import the module and call
functions within. That feels much more robust for long-term maintainability.
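
To show what "manually import the module and call functions within"
looks like today, here is a small sketch. It assumes the interpreter
was started without the automatic site processing (for example with
"-S") and that the embedder wants only one site feature: adding a
directory and processing its .pth files through site.addsitedir(). The
helper name add_extra_site_dir and the temporary directory are
illustrative only.

```python
import os
import site
import sys
import tempfile


def add_extra_site_dir(directory):
    """Add one directory the way site does, without running site.main()."""
    site.addsitedir(directory)  # appends to sys.path and reads *.pth files
    return os.path.abspath(directory) in sys.path


with tempfile.TemporaryDirectory() as tmp:
    print(add_extra_site_dir(tmp))
```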

While I was reading replies on this thread, a few other points regarding
embedding crept into my mind. I apologize in advance for the feature creep...

Regarding Python calling exit(), this is problematic for embedding
scenarios. This thread called attention to exit() during interpreter
initialization. But it is also a problem elsewhere. For example,
PyErr_PrintEx() will call Py_Exit() if the exception is a SystemExit.
There's definitely room to improve the exception handling mechanism to
give embedders better control when SystemExit is raised. As it stands,
we need to check for SystemExit manually and reimplement
_Py_HandleSystemExit() to emulate its behavior for e.g. exception value
handling (fun fact: you can pass non-None, non-integer values to
sys.exit/SystemExit).
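
The value handling mentioned above can be sketched in Python. This is
an approximation of the rules an embedder has to reimplement, not the
CPython source: a missing or None code maps to status 0, an integer
code is used as is, and any other value is printed to stderr and maps
to status 1. The function names are hypothetical.

```python
import sys


def exit_status_from(exc):
    """Map a SystemExit instance to a process-style exit status."""
    code = exc.code
    if code is None:
        return 0
    if isinstance(code, int):
        return code
    # Non-None, non-integer values are reported and treated as failure.
    print(code, file=sys.stderr)
    return 1


def run_embedded(func):
    """Call func() and return an exit status instead of exiting."""
    try:
        func()
    except SystemExit as exc:
        return exit_status_from(exc)
    return 0


print(run_embedded(lambda: sys.exit("fatal: bad input")))
```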

Regarding builtin and frozen modules, they both are backed by global
arrays (PyImport_Inittab and PyImport_FrozenModules, respectively). In
my initial reply I mentioned that this seemed problematic because I
think a goal of initialization configuration overhaul should be to
remove dependence on global variables. In addition to that concern, I'd
like to point out that the way things work today is the BuiltinImporter
and FrozenImporter meta path importers test for module availability by
iterating these global arrays and calling
_PyUnicode_EqualToASCIIString() on each member. This is done via
import.c:is_builtin() and import.c:find_frozen(), respectively (both C
functions are exported to Python and called as part of their respective
find_spec() implementations). BuiltinImporter and FrozenImporter are
always registered as the first two sys.meta_path importers.

This behavior means every "import" results in an array walk + string
compare over these two arrays. Yes, this is done from C and should be
reasonably fast. But it feels inefficient to me. Slow import performance
is a common complaint against Python and anything that can be done to
minimize overhead could be useful.

Having a more efficient member lookup for BuiltinImporter and
FrozenImporter might shave off a millisecond or two from startup. This
would require some kind of derived data structure. Unfortunately, as
long as there is a global data structure that can be mutated any time
(the API contract doesn't prohibit modifying these global arrays after
initialization), you would need to check for "cache invalidation" on
every lookup, undermining performance benefits. But if the interpreter
config contained references to the builtin and frozen module arrays and
refused to allow them to be modified after initialization,
initialization could build a derived data structure used for efficient
lookups and all the problems go away!
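
The difference between the two lookup strategies can be sketched in
Python, using sys.builtin_module_names as a stand-in for the C-level
PyImport_Inittab array. This is an illustration of the idea only; the
function and variable names are made up, and the real check happens in
C.

```python
import sys

# Stand-in for the global array: a tuple of built-in module names.
BUILTIN_NAMES = sys.builtin_module_names


def is_builtin_linear(name):
    """Walk the whole table and compare strings, as done today."""
    for candidate in BUILTIN_NAMES:
        if candidate == name:
            return True
    return False


# Derived structure, built once. It is only valid if the table is not
# modified after initialization, which is the point made above.
BUILTIN_INDEX = frozenset(BUILTIN_NAMES)


def is_builtin_indexed(name):
    """Hash lookup against the frozen index."""
    return name in BUILTIN_INDEX


print(is_builtin_linear("sys"), is_builtin_indexed("sys"))
```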

What I'm trying to say is that moving the builtin and frozen modules
arrays to the initialization config data structure is not only the right
thing to do from an architectural perspective, but it can also open the
door to optimizing import performance.

FWIW PyOxidizer uses a Rust HashMap to index modules data, so testing
for module presence is O(1)~. It's on my laundry list of TODOs to create
an UberImporter that indexes every known module at startup and proxies
to BuiltinImporter and FrozenImporter as necessary. The cost
of satisfying a find_spec() or similar MetaPathFinder interface request
would then be a Rust HashMap lookup, avoiding the O(n) traversal of
sys.meta_path entries (which is Python heavy and relatively slow
compared to compiled code) and further avoiding the O(n) lookups in
BuiltinImporter and FrozenImporter. I don't expect this to yield the
performance wins that doing away with filesystem-based module importing
did (importing the entire standard library completed in ~70% of the
time). But startup overhead is a problem and every little improvement
can help. Perhaps there is room to add importlib APIs to facilitate
indexing. Then MetaPathFinders which are immutable could contribute to a
global lookup dict, facilitating much faster import operations. It
wouldn't work for PathFinder. But it opens some interesting possibilities...
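
A rough Python sketch of such an indexing finder follows. It is not
PyOxidizer code: the class name IndexedFinder and its register()
method are invented for the example, and only BuiltinImporter is
indexed because sys.builtin_module_names gives a public list of its
modules. The finder answers find_spec() with a single dict lookup and
then delegates to the finder that owns the name.

```python
import sys
from importlib.machinery import BuiltinImporter


class IndexedFinder:
    """Map module names to the finder that can load them."""

    def __init__(self):
        self._index = {}

    def register(self, names, finder):
        # The first finder to claim a name keeps it.
        for name in names:
            self._index.setdefault(name, finder)

    def find_spec(self, fullname, path=None, target=None):
        finder = self._index.get(fullname)
        if finder is None:
            return None  # unknown name: fall through to other finders
        return finder.find_spec(fullname, path, target)


finder = IndexedFinder()
finder.register(sys.builtin_module_names, BuiltinImporter)

print(finder.find_spec("sys"))
```

Such a finder would still have to be placed on sys.meta_path to take
part in imports; the sketch calls find_spec() directly instead.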

Victor Stinner

May 20, 2019, 7:11:30 AM5/20/19
to Gregory Szorc, Thomas Wouters, python-dev
Hi Gregory,

IMHO your remarks are not directly related to the PEP 587 and can be
addressed in parallel.

> It sounds like PyOxidizer and Hermetic Python are on the same page and
> we're working towards a more official solution. But I want to make sure
> by explicitly stating what PyOxidizer is doing.
>
> Essentially, to facilitate in-memory import, we need to register a
> custom sys.meta_path importer *before* any file-based imports are
> attempted. (...)
> I /think/ providing a 2-phase
> initialization that stops between _Py_InitializeCore() and
> _Py_InitializeMainInterpreter() would get the job done for PyOxidizer
> today. (...)

Extract of PEP 587: "This extracts a subset of the API design from the
PEP 432 development and refactoring work that is now considered
sufficiently stable to make public (allowing 3rd party embedding
applications access to the same configuration APIs that the native
CPython CLI is now using)."

We know that my PEP 587 is incomplete, but the work will continue in
Python 3.9 to support your use case.

The PEP 587 introduces an experimental separation between "core" and
"main" initialization phases. PyConfig._init_main=0 stops at the
"core" phase; then you are free to run C and Python code, and
_Py_InitializeMain() finishes the Python initialization ("main"
phase).


> > In the "Isolate Python" section, I suggest setting the "isolated"
> > parameter to 1, which implies setting user_site_directory to 0. So
> > sys.path isn't modified afterwards. What you pass to PyConfig is what
> > you get in sys.path in this case.
>
> Regarding site.py, I agree it is problematic for embedding scenarios.
> Some features of site.py can be useful. Others aren't. It would be
> useful to have more granular control over which bits of site.run are
> run. My naive suggestion would be to add individual flags to control
> which functions site.py:main() runs. That way embedders can cherry-pick
> site.py features without having to manually import the module and call
> functions within. That feels much more robust for long-term maintainability.

I agree that more work can be done on the site module. IMHO core
features which are needed by everybody should be done before calling
site. Maybe using a frozen "presite" module or whatever. I would be
interested in making it possible to use Python for most cases without
the site module.


> Regarding Python calling exit(), this is problematic for embedding
> scenarios.

I am working on that. I fixed dozens of functions. For example,
Py_RunMain() should no longer exit if there is an uncaught SystemExit
when calling PyErr_Print(). SystemExit is now handled separately
before calling PyErr_Print(). The work is not done, but it should be
way better than the state of Python 3.6 and 3.7.


> This thread called attention to exit() during interpreter
> initialization. But it is also a problem elsewhere. For example,
> PyErr_PrintEx() will call Py_Exit() if the exception is a SystemExit.
> There's definitely room to improve the exception handling mechanism to
> give embedders better control when SystemExit is raised. As it stands,
> we need to check for SystemExit manually and reimplement
> _Py_HandleSystemExit() to emulate its behavior for e.g. exception value
> handling (fun fact: you can pass non-None, non-integer values to
> sys.exit/SystemExit).

I don't know these functions well; maybe new functions are needed. It
can be done without/outside the PEP 587.


> Having a more efficient member lookup for BuiltinImporter and
> FrozenImporter might shave off a millisecond or two from startup. This
> would require some kind of derived data structure. (...)

I don't think that the structures/API to define frozen/builtin modules
have to change. We can convert these lists into a hash table during
the initialization of the importlib module.

I'm not saying that the current API is perfect, just that IMHO it can
be solved without changing the API.


> Unfortunately, as long as there is a global data structure that can be mutated any time
> (the API contract doesn't prohibit modifying these global arrays after
> initialization), you would need to check for "cache invalidation" on
> every lookup, undermining performance benefits.

Do you really expect an application to modify these lists dynamically?

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.

Gregory Szorc

May 31, 2019, 11:55:46 PM5/31/19
to Victor Stinner, Thomas Wouters, python-dev
At this time, not really. But the mark of good API design IMO is that
its flexibility empowers new and novel ideas and ways of doing things.
Bad APIs (including the use of global variables) inhibit flexibility and
constrain creativity and advanced usage.

For this particular item, I could see some potential uses in processes
hosting multiple, independent interpreters. Maybe you want to give each
interpreter its own set of modules. That's not something you see today
because of the GIL and all the other global state in CPython. But with
the GIL's days apparently being numbered, who knows what the future holds.

I would highly encourage the official API surface to do away with
globals completely and for the internals to only use globals in ways
that impose minimal restrictions/caveats on usage/behavior. This PEP
along with others are huge steps in the right direction.