Aha, interesting :-)
> As part
> of hacking on PyOxidizer, I will admit to grumbling about the current
> state of the configuration and initialization mechanisms. The reliance
> on global variables and the haphazard way in which you must call certain
> functions before others was definitely a bit frustrating to deal with.
Yeah, that's what I tried to explain in the PEP 587 Rationale.
> My most important piece of feedback is: thank you for tackling this!
> Your work to shore up the inner workings of interpreter state and
> management is a big deal on multiple dimensions. I send my sincere
> gratitude.
You're welcome ;-)
> PyPreConfig_INIT and PyConfig_INIT as macros that return a struct feel
> weird to me. Specifically, the `PyPreConfig preconfig =
> PyPreConfig_INIT;` pattern doesn't feel right. I'm sort of OK with these
> being implemented as macros. But I think they should look like function
> calls so the door is open to converting them to function calls in the
> future.
Ah yes, I noticed that some projects can only import symbols, not use
the C API directly. You're right that such a macro can be an issue.
Would you be ok with a "PyConfig_Init(PyConfig *config);" function
which would initialize all fields to their default values? Maybe
PyConfig_INIT should be renamed to PyConfig_STATIC_INIT.
You can find a similar API for pthread mutexes: there is an init
function *and* a macro for static initialization:

int pthread_mutex_init(pthread_mutex_t *restrict mutex,
                       const pthread_mutexattr_t *restrict attr);

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
> PyPreConfig.allocator being a char* seems a bit weird. Does this imply
> having to use strcmp() to determine which allocator to use? Perhaps the
> allocator setting should be an int mapping to a constant instead?
Yes, _PyMem_SetupAllocators() uses strcmp(). There are 6 supported values:
* "default"
* "debug"
* "pymalloc"
* "pymalloc_debug"
* "malloc"
* "malloc_debug"
Note: pymalloc and pymalloc_debug are not supported if Python is
explicitly configured using --without-pymalloc.
I think I chose a string because the feature was first implemented
using an environment variable.
Actually, I *like* the idea of avoiding strings in PyPreConfig,
because a string might need memory allocation, whereas the
pre-initialization is supposed to configure memory allocation :-) I
will change the type to an enum.
> Relatedly, how are custom allocators registered? e.g. from Rust, I want
> to use Rust's allocator. How would I do that in this API? Do I still
> need to call PyMem_SetAllocator()?
By default, PyPreConfig.allocator is set to NULL. In that case,
_PyPreConfig_Write() leaves the memory allocator unmodified.
Like PyImport_AppendInittab() and PyImport_ExtendInittab(),
PyMem_SetAllocator() remains relevant and continues to work as
before.
Example to set your custom allocator:
---
PyInitError err = Py_PreInitialize(NULL);
if (Py_INIT_FAILED(err)) {
    Py_ExitInitError(err);
}

PyMem_SetAllocator(PYMEM_DOMAIN_MEM, my_cool_allocator);
---
Well, it also works in the opposite order, but I prefer to call
PyMem_SetAllocator() after the pre-initialization to make it more
explicit :-)
---
PyMem_SetAllocator(PYMEM_DOMAIN_MEM, my_cool_allocator);

PyInitError err = Py_PreInitialize(NULL);
if (Py_INIT_FAILED(err)) {
    Py_ExitInitError(err);
}
---
> I thought a point of this proposal
> was to consolidate per-interpreter config settings?
Right. But PyMem_SetAllocator() uses the PyMemAllocatorDomain enum and
the PyMemAllocatorEx structure, which are not really "future-proof".
For example, I already replaced PyMemAllocator with PyMemAllocatorEx
to add "calloc". We might extend it again later to add an allocator
with a specific memory alignment (even if the issue is now closed):
https://bugs.python.org/issue18835
I consider that PyMem_SetAllocator() is too specific to be added to PyPreConfig.
Are you fine with that?
> I'm a little confused about the pre-initialization functions that take
> command arguments. Is this intended to only be used for parsing the
> arguments that `python` recognizes? Presumably a custom application
> embedding Python would never use these APIs unless it wants to emulate
> the behavior of `python`? (I suppose this can be clarified in the API
> docs once this is implemented.)
Yes, Py_PreInitializeFromArgs() parses -E, -I, -X dev and -X utf8 options:
https://www.python.org/dev/peps/pep-0587/#command-line-arguments
Extract of my "Isolate Python" section:
"The default configuration is designed to behave as a regular Python.
To embed Python into an application, it's possible to tune the
configuration to better isolate the embedded Python from the system:
(...)"
https://www.python.org/dev/peps/pep-0587/#isolate-python
I wasn't sure if I should mention parse_argv=0 in this section or not.
According to what you wrote, I should :-)
*Maybe* rather than documenting how to isolate Python, we might even
provide a function for that?
void PyConfig_Isolate(PyConfig *config)
{
    config->isolated = 1;
    config->parse_argv = 0;
}
I didn't propose that because, so far, I'm not sure that everybody has
the same opinion on what "isolation" means. Does it only mean ignoring
environment variables? Or also ignoring configuration files? What
about the path configuration?
That's why I propose to start without such an opinionated
PyConfig_Isolate() function :-)
> What about PyImport_FrozenModules? This is a global variable related to
> Python initialization (it contains _frozen_importlib and
> _frozen_importlib_external) but it is not accounted for in the PEP.
> I rely on this struct in PyOxidizer to replace the importlib modules with
> custom versions so we can do 0-copy in-memory import of Python bytecode
> for the entirety of the standard library. Should the PyConfig have a
> reference to the _frozen[] to use? Should the _frozen struct be made
> part of the public API?
First of all, PEP 587 is designed to be easily extensible :-) I added
the _config_version field to even provide backward ABI compatibility.
Honestly, I never looked at PyImport_FrozenModules. It seems to fall
into the same category as the inittab: a corner-case use case which
cannot easily be generalized into the PyConfig structure.
I would say the same as what I wrote about PyImport_AppendInittab():
the PyImport_FrozenModules symbol remains relevant and continues to
work as expected. I understand that it must be set before the
initialization, and it seems safe to set it even before the
pre-initialization since it's a static array.
Note: I renamed PyConfig._frozen to PyConfig.pathconfig_warnings: it's
an int and it's unrelated to PyImport_FrozenModules.
> I rely on this struct in PyOxidizer to replace the importlib modules with
> custom versions so we can do 0-copy in-memory import of Python bytecode
> for the entirety of the standard library.
Wait, that sounds like a cool feature! Would it make sense to upstream
this feature? If yes, maybe send a separate email to python-dev and/or
open an issue.
> The PEP mentions a private PyConfig._install_importlib member. I'm
> curious what this is because it may be relevant to PyOxidizer. FWIW I
> /might/ be interested in a mechanism to better control importlib
> initialization because PyOxidizer is currently doing dirty things at
> run-time to register the custom 0-copy meta path importer. I /think/ my
> desired API would be a mechanism to control the name(s) of the frozen
> module(s) to use to bootstrap importlib. Or there would be a way to
> register the names of additional frozen modules to import and run as
> part of initializing importlib (before any .py-based stdlib modules are
> imported). Then PyOxidizer wouldn't need to hack up the source code to
> importlib, compile custom bytecode, and inject it via
> PyImport_FrozenModules. I concede this may be out of scope for the PEP.
> But if the API is being reworked, I'd certainly welcome making it easier
> for tools like PyOxidizer to work their crazy module importing magic :)
PEP 587 is an incomplete implementation of PEP 432. Nick Coghlan,
Steve Dower, some others and I are discussing having 2 phases for the
Python initialization: "core" and "main". The "core" phase would
provide a bare minimum working Python: builtin exceptions and types,
maybe builtin imports, and that's basically all. It would allow
configuring Python using the newly created interpreter, for example
configuring Python by running Python code.
The problem is that these 2 phases are not well defined yet; it's
still under discussion. Nick and I agreed to start with PEP 587 as a
first milestone, and see later how to implement the "core" and "main"
phases.
If the private field "_init_main" of PEP 587 is set to 0,
Py_InitializeFromConfig() stops at the "core" phase (in fact, it's
already implemented!). But I didn't implement a _Py_InitializeMain()
function yet to "finish" the initialization. Let's say it exists; we
would get:
---
PyConfig config = PyConfig_INIT;
config._init_main = 0;

PyInitError err = Py_InitializeFromConfig(&config);
if (Py_INIT_FAILED(err)) {
    Py_ExitInitError(err);
}

/* add your code to customize Python here */
/* calling PyRun_SimpleString() here is safe */

/* finish Python initialization */
err = _Py_InitializeMain(&config);
if (Py_INIT_FAILED(err)) {
    Py_ExitInitError(err);
}
---
Would it solve your use case?
Sorry, I didn't understand properly what you mean by "controlling the
names of the frozen modules to use to bootstrap importlib".
> I really like the new Py_RunMain() API and associated PyConfig members.
> I also invented this wheel in PyOxidizer and the new API should result
> in me deleting some code that I wish I didn't have to write in the first
> place :)
Great!
> I invented a data structure for representing a Python interpreter
> configuration. And the similarities to PyConfig are striking. I think
> that's a good sign :)
He he :-)
> It might be useful to read through that file -
> especially the init function (line with `pub fn init`) to see if
> anything I'm doing pushes the boundaries of the proposed API. Feel free
> to file GitHub issues if you see obvious bugs with PyOxidizer's Python
> initialization logic while you're at it :)
Your link didn't work, but I found:
https://github.com/indygreg/PyOxidizer/blob/master/pyoxidizer/src/pyembed/pyinterp.rs
"write_modules_directory_env" seems very specific to your needs. Apart
from that, I confirm that PythonConfig is very close to PEP 587's
PyConfig! I notice that you also avoided double negation, thanks ;-)
/* Pre-initialization functions we could support:
*
* PyObject_SetArenaAllocator()
* PySys_AddWarnOption()
* PySys_AddXOption()
* PySys_ResetWarnOptions()
*/
Apart from PyObject_SetArenaAllocator(), PyConfig covers the 3 other
functions.
Again, as with PyMem_SetAllocator(), PyObject_SetArenaAllocator()
remains relevant and can be used with the pre-initialization.
PySys_SetObject("argv", obj) is covered by PyConfig.argv.
PySys_SetObject("argvb", obj): I'm not sure why you are doing that;
it's easy to retrieve sys.argv as bytes, and it's now even documented:
https://docs.python.org/dev/library/sys.html#sys.argv
---
Sorry, I'm not an importlib expert. I'm not sure what could be done in
PEP 587 for your specific importlib changes.
> Also, one thing that tripped me up a few times when writing PyOxidizer
> was managing the lifetimes of memory that various global variables point
> to. The short version is I was setting Python globals to point to memory
> allocated by Rust and I managed to crash Python by freeing memory before
> it should have been. Since the new API seems to preserve support for
> global variables, I'm curious if there are changes to how memory must be
> managed. It would be really nice to get to a state where you only need
> to ensure the PyConfig instance and all its referenced memory only needs
> to outlive the interpreter it configures. That would make the memory
> lifetimes story intuitive and easily compatible with Rust.
For the specific case of PyConfig, you have to call
PyConfig_Clear(config) after calling Py_InitializeFromConfig().
Python keeps a copy of your configuration (and completes the missing
fields, if needed).
I modified a lot of functions to ensure that Python cleans up more
globals at exit, in Py_Finalize() and at the end of Py_Main() /
Py_RunMain().
I'm not sure if this answers your question. If you want a more
specific answer, can you please give more concrete examples of
globals?
There is also an on-going refactoring to move globals into
_PyRuntimeState and PyInterpreterState: change needed to support
subinterpreters, see Eric Snow's PEP 554.
> One feature that I think is missing from the proposal (and this is
> related to the previous paragraph) is the ability to prevent config
> fallback to things that aren't PyConfig and PyPreConfig. There is
> `PyConfig.parse_argv` to disable command argument parsing and
> `PyConfig.use_environment` to disable environment variable fallback. But
> AFAICT there is no option to disable configuration file fallback nor
> global variable fallback.
If you embed Python, you control the global configuration variables,
no? I chose to design PyConfig to inherit the global configuration
variables because it allows supporting both ways of configuring Python
with a single implementation.
Would you prefer an explicit PyConfig_SetDefaults(config) which would
completely ignore global configuration variables?
See the Lib/test/test_embed.py unit tests, which use Programs/_testembed.c:
https://github.com/python/cpython/blob/master/Programs/_testembed.c
python._pth (Windows only), pybuilddir.txt (Unix only) and pyvenv.cfg
configuration files are only used by the function building the "Path
Configuration".
Using PEP 587, you can now completely ignore this function:
https://www.python.org/dev/peps/pep-0587/#path-configuration
> Again, this proposal is terrific overall and so much better than what we
> have today. The wall of text I just wrote is disproportionate in size to
> the quality of the PEP. I almost feel bad writing so much feedback for
> such a terrific PEP ;)
>
> Excellent work, Victor. I can't wait to see these changes materialize!
Thanks :-)
Thanks for your very interesting feedback. It's really helpful to see
how the API is used "for real" :-)
Victor
(On Thu, May 16, 2019 at 06:34, Gregory Szorc <gregor...@gmail.com> wrote:
> > I know that the PEP is long, but well, it's a complex topic, and I
> > chose to add many examples to make the API easier to understand.
>
> I saw your request for feedback on Twitter a few days back and found
> this thread.
>
> This PEP is of interest to me because I'm the maintainer of PyOxidizer -
> a project for creating single file executables embedding Python.
> > PyPreConfig_INIT and PyConfig_INIT as macros that return a struct feel
> > weird to me. Specifically, the `PyPreConfig preconfig =
> > PyPreConfig_INIT;` pattern doesn't feel right. I'm sort of OK with these
> > being implemented as macros. But I think they should look like function
> > calls so the door is open to converting them to function calls in the
> > future.
> Would you be ok with a "PyConfig_Init(PyConfig *config);" function
> which would initialize all fields to their default values? Maybe
> PyConfig_INIT should be renamed to PyConfig_STATIC_INIT.
I agree. Anything other than setting up the struct for
pre-initialization settings doesn't need to work here.
> The dll_path field of PyConfig says "Windows only". Does that mean the
> struct doesn't have that field except in a Windows build? Or is it
> ignored, instead? If it doesn't have that field at all, what #define can
> be used to determine if the PyConfig struct will have it or not?
This field doesn't need to be here. It exists because it was used in
getpathp.c, and Victor's internal refactoring has kept it around through
all the field movement.
If we properly design initialization instead of just refactoring until
it's public, I bet this field will go away.
> "module_search_path_env" sounds like an awkward and somewhat misleading
> name for the translation of PYTHONPATH. Can we not just use, say,
> pythonpath_env? I expect the intended audience to know that PYTHONPATH
> != sys.path.
Again, this doesn't need to be its own configuration field, but because
of the refactoring approach taken here it's flowed out to public API.
A "init config from environment" can load this value and put it into the
"sys.path-equivalent field" in the config.
> The module_search_paths field in PyConfig doesn't mention if it's
> setting or adding to the calculated sys.path. As a whole, the
> path-calculation bits are a bit under-documented. Since this is an
> awkward bit of CPython, it wouldn't hurt to mention what "the default
> path configuration" does (i.e. search for python's home starting at
> program_name, add fixed subdirs to it, etc.)
Again, let's design this part properly instead of exposing what we've
had for years :)
> Regarding Py_RunMain(): does it do the right thing when something calls
> PyErr_Print() with SystemExit set? (I mentioned last week that
> PyErr_Print() will call C's exit() in that case, which is obviously
> terrible for embedders.)
Can we just fix PyErr_Print() to not exit? Surely we only depend on it
in one or two places (sys.excepthook?) and it's almost certainly not
helping anyone else.
> Regarding isolated_mode and the site module, should we make stronger
> guarantees about site.py's behaviour being optional?
Yes, I've been forgetting about this too. There's a lot of configuration
that's split between site.py and initialization, so it's very hard to
understand what will be ready when you leave out site.py. Straightening
this out would help (everyone except virtualenv, probably ;) )
Cheers,
Steve
Ok, I will do that.
> I would also point out that PEP 587 has a code example that uses PyWideStringList_INIT, but that macro isn't mentioned anywhere else.
Oh, I forgot to document it better. Well, the macro is trivial:
#define _PyWstrList_INIT (_PyWstrList){.length = 0, .items = NULL}
For consistency, I prefer not to initialize these fields manually, but
to use a macro instead.
(These variables are allocated on the stack and so *must* be initialized.)
> The PEP is a bit unclear as to the semantics of PyWideStringList as a whole: the example uses a static array with length, but doesn't explain what would happen with statically allocated data like that if you call the Append or Extend functions. It also doesn't cover how e.g. argv parsing would remove items from the list. (I would also suggest the PEP shouldn't use the term 'list', at least not unqualified, if it isn't an actual Python list.)
Calling PyWideStringList_Append() or PyWideStringList_Insert() on a
"constant" list will crash: don't do that :-)
I tried to explain the subtle details of "constant" vs "dynamic"
configurations in the "Initialization with constant PyConfig" and
"Memory allocations and Py_DecodeLocale()" sections.
A "constant" PyWideStringList must not be used with a "dynamic"
PyConfig: otherwise, PyConfig_Clear() will crash as well.
I would prefer to have separate "const PyWideStringList" and "const
PyConfig" types, but the C language doesn't convert "wchar_t *" to
"const wchar_t *" when you do that. We would need duplicated
PyConstantWideStringList and PyConstantConfig structures, which would
have to be cast to PyWideStringList and PyConfig internally to reuse
the same code for constant and dynamic configurations.
If you consider that the specific case of "constant configuration"
adds too much burden / complexity, we might remove it and always
require a dynamic configuration.
Right now, Programs/_testembed.c uses almost only "constant"
configurations. Using dynamic memory would make the code longer: it
would need to handle memory allocation failures.
> I understand the desire to make static allocation and initialisation possible, but since you only need PyWideStringList for PyConfig, not PyPreConfig (which sets the allocator), perhaps having a PyWideStringList_Init(), which copies memory, and PyWideStringList_Clear() to clear it, would be better?
Do you mean to always require building dynamic lists? Said
differently, to disallow code like the following?
static wchar_t* argv[] = {
    L"python3",
    L"-c",
    L"pass",
    L"arg2",
};

_PyCoreConfig config = _PyCoreConfig_INIT;
config.argv.length = Py_ARRAY_LENGTH(argv);
config.argv.items = argv;
>> If the private field "_init_main" of the PEP 587 is set to 0,
>> Py_InitializeFromConfig() stops at the "core" phase (in fact, it's
>> already implemented!). But I didn't implement yet a
>> _Py_InitializeMain() function to "finish" the initialization. Let's
>> say that it exists, we would get:
>>
>> ---
>> PyConfig config = PyConfig_INIT;
>> config._init_main = 0;
>> PyInitError err = Py_InitializeFromConfig(&config);
>> if (Py_INIT_FAILED(err)) {
>>     Py_ExitInitError(err);
>> }
>>
>> /* add your code to customize Python here */
>> /* calling PyRun_SimpleString() here is safe */
>>
>> /* finish Python initialization */
>> err = _Py_InitializeMain(&config);
>> if (Py_INIT_FAILED(err)) {
>>     Py_ExitInitError(err);
>> }
>> ---
>>
>> Would it solve your use case?
>
>
> FWIW, I understand the need here: for Hermetic Python, we solved it by adding a new API similar to PyImport_AppendInittab, but instead registering a generic callback hook to be called *during* the initialisation process: after the base runtime and the import mechanism are initialised (at which point you can create Python objects), but before *any* modules are imported. We use that callback to insert a meta-importer that satisfies all stdlib imports from an embedded archive. (Using a meta-importer allows us to bypass the filesystem altogether, even for what would otherwise be failed path lookups.)
>
> As I mentioned, Hermetic Python was originally written for Python 2.7, but this approach works fine with a frozen importlib as well. The idea of 'core' and 'main' initialisation will likely work for this, as well.
Well, even if it's not part of the PEP 587, I just implemented it
anyway while fixing a bug:
https://github.com/python/cpython/commit/9ef5dcaa0b3c7c7ba28dbb3ec0c9507d9d05e3a9
Example:
static int test_init_main(void)
{
    _PyCoreConfig config = _PyCoreConfig_INIT;
    configure_init_main(&config);
    config._init_main = 0;

    _PyInitError err = _Py_InitializeFromConfig(&config);
    if (_Py_INIT_FAILED(err)) {
        _Py_ExitInitError(err);
    }

    /* sys.stdout doesn't exist yet: it is created by _Py_InitializeMain() */
    int res = PyRun_SimpleString(
        "import sys; "
        "print('Run Python code before _Py_InitializeMain', "
        "file=sys.stderr)");
    if (res < 0) {
        exit(1);
    }

    err = _Py_InitializeMain();
    if (_Py_INIT_FAILED(err)) {
        _Py_ExitInitError(err);
    }

    return _Py_RunMain();
}
As you can see, it's possible to execute Python code between the
"core" and "main" initialization phases. Moreover, I even fixed Python
to be able to use "import sys" before the "main" initialization phase
;-) (Only builtin and frozen modules are available at this stage.)
Again, I'm not comfortable making PyConfig._init_main and
_Py_InitializeMain() public, because I consider them too experimental,
and we haven't had enough time to discuss what exactly the "core"
initialization phase is.
> Other questions/comments about PEP 587:
>
> I really like the PyInitError struct. I would like more functions to use it, e.g. the PyRun_* "very high level" API, which currently calls exit() for you on SystemExit, and returns -1 without any other information on error. For those, I'm not entirely sure 'Init' makes sense in the name... but I can live with it.
The PyInitError structure can be renamed PyError, but it should only
be used with functions which can exit Python. In short, are you
talking about "The Very High Level Layer" of the C API?
https://docs.python.org/dev/c-api/veryhigh.html
One issue is that I dislike adding new functions to the C API, but it
seems like we should add a few to provide a better API for embedding
Python. libpython must never exit the process! (or only when you
explicitly ask for that :-))
Note: PyRun_SimpleStringFlags() is a wrapper which makes
PyRun_StringFlags() easier to use. PyRun_StringFlags() doesn't handle
the exception and so lets you decide how to handle it.
> A couple of things are documented as performing pre-initialisation (PyConfig_SetBytesString, PyConfig_SetBytesArgv). I understand why, but I feel like that might be confusing and error-prone. Would it not be better to have them fail if pre-initialisation hasn't been performed yet?
It would be easy to modify the code to fail with an error if Python is
not pre-initialized. However, I propose to implicitly pre-initialize
Python to make the API easier to use. In practice, you rarely have to
explicitly pre-initialize Python: the default PyPreConfig is just fine
for almost all use cases, especially since Nick Coghlan and I decided
to disable C locale coercion and UTF-8 Mode by default. You now have
to opt in to enable these encoding features.
> The buffered_stdio field of PyConfig mentions stdout and stderr, but not stdin. Does it not affect stdin?
Extract of create_stdio():

/* stdin is always opened in buffered mode, first because it shouldn't
   make a difference in common use cases, second because TextIOWrapper
   depends on the presence of a read1() method which only exists on
   buffered streams.
*/

Note: unbuffered stdin doesn't magically make the producer on the
other side of a pipe flush its (stdout/stderr) buffer more
frequently :-)
> (Many of the fields could do with a bit more explicit documentation, to be honest.)
Well, 2 years ago, almost no configuration parameter was documented
:-) I helped to document "Global configuration variables" at:
https://docs.python.org/dev/c-api/init.html#global-configuration-variables
I had to reverse engineer the code to be able to document it :-D
Right now, my reference documentation lives in
Include/cpython/coreconfig.h. Some parameters are better documented
there than in the PEP. I can try to enhance the documentation in the
PEP.
> The configure_c_stdio field of PyConfig sounds like it might not set sys.stdin/stdout/stderr. That would be new behaviour, but configure_c_stdio doesn't have an existing equivalence, so I'm not sure if that's what you meant or not.
In Python 3.7, only Py_Main() configured the C standard streams.
I moved the code into _PyCoreConfig_Write(), which is called by
_Py_InitializeFromConfig() and so by Py_Initialize() as well.
My intent is to get the same behavior using Py_Initialize() +
Py_RunMain() as using Py_Main(). Said differently, Python 3.8 now
always configures the C standard streams. Maybe I should change the
configure_c_stdio default value to 0, and only enable it by default in
Py_Main()?
Honestly, I'm a little bit confused here. I'm not sure what the
expected behavior is. Usually, in case of doubt, I look at the
behavior before my refactoring. The old behaviour was that only
Py_Main() configured the C standard streams. Maybe I should restore
this behavior.
But to build a customized Python which should behave like the regular
Python, you would want to opt in with configure_c_stdio=1.
Maybe we need a function to set the configuration to get the "regular
Python" behavior?
Something like PyConfig_SetRegularPythonBehavior()? (sorry for the silly name!)
> The dll_path field of PyConfig says "Windows only". Does that mean the struct doesn't have that field except in a Windows build? Or is it ignored, instead? If it doesn't have that field at all, what #define can be used to determine if the PyConfig struct will have it or not?
The field doesn't exist on non-Windows platforms.
I chose to expose it to let the developer choose where Python looks
for DLLs. But Steve just said (in an email below) that there is no
reason to make it configurable. In that case, I will make it internal
again. It seems like I misunderstood the purpose of this parameter.
> It feels a bit weird to have both 'inspect' and 'interactive' in PyConfig. Is there a substantive difference between them? Is this just so you can easily tell if any of run_module / run_command / run_filename are set?
In Python 3.7, there are Py_InspectFlag and Py_InteractiveFlag.
If the "interactive" parameter is non-zero, C standard streams are
configured as buffered. It is also used to decide if stdin is
considered interactive or not:

/* Return non-zero if stdin is a TTY or if the -i command line option is used */
static int
stdin_is_interactive(const _PyCoreConfig *config)
{
    return (isatty(fileno(stdin)) || config->interactive);
}
The "inspect" parameter is used to decide whether to start a REPL or
not.
The "-i" command line option sets inspect (Py_InspectFlag) and
interactive (Py_InteractiveFlag) to 1.
These flags are exposed at the Python level as sys.flags.inspect and
sys.flags.interactive.
... Honestly, I'm not sure there is a real difference between these
two flags, but they have been exposed for years... so I decided to
keep them.
> "module_search_path_env" sounds like an awkward and somewhat misleading name for the translation of PYTHONPATH. Can we not just use, say, pythonpath_env? I expect the intended audience to know that PYTHONPATH != sys.path.
Sure, I can rename it.
> The module_search_paths field in PyConfig doesn't mention if it's setting or adding to the calculated sys.path. As a whole, the path-calculation bits are a bit under-documented.
Py_InitializeFromConfig() sets sys.path from module_search_paths.
sys.path doesn't exist before Py_InitializeFromConfig() is called.
> Since this is an awkward bit of CPython, it wouldn't hurt to mention what "the default path configuration" does (i.e. search for python's home starting at program_name, add fixed subdirs to it, etc.)
Oh, that's a big task :-) Nobody knows what getpath.c and getpathp.c do :-D
> Path configuration is mentioned as being able to issue warnings, but it doesn't mention *how*. It can't be the warnings module at this stage. I presume it's just printing to stderr.
First, I didn't know this, but I just saw that the warning is only
emitted on Unix (getpath.c). On Windows (getpathp.c), no warning is
emitted.
The warning is written to the C stderr.
The flag isn't new: it's based on Py_FrozenFlag. When I looked at how
Python is embedded, I was surprised by the number of applications
setting Py_FrozenFlag to 1 to suppress these warnings.
> Regarding Py_RunMain(): does it do the right thing when something calls PyErr_Print() with SystemExit set? (I mentioned last week that PyErr_Print() will call C's exit() in that case, which is obviously terrible for embedders.)
I spent a significant amount of time ensuring that
Py_InitializeFromConfig() and Py_RunMain() don't exit directly, but
return a proper failure or exit code. For example, Python 3.6 contains
around 319 calls to Py_FatalError(). The master branch contains around
181 calls to Py_FatalError(): still a lot, but I converted 138 calls
to _Py_INIT_ERR() ;-)
The work is not complete: I just checked, and Py_RunMain() still calls
PyErr_Print() directly in many places. Well, the code can be fixed,
and it's not directly related to the PEP, is it? The issue already
existed in Python 3.7 with Py_Main().
> Regarding isolated_mode and the site module, should we make stronger guarantees about site.py's behaviour being optional? The problem with site is that it does four things that aren't configurable, one of which is usually very desirable, one of which probably doesn't matter to embedders, and two that are iffy: sys.path deduplication and canonicalisation (and fixing up __file__/__cached__ attributes of already-imported modules); adding site-packages directories; looking for and importing sitecustomize.py; executing .pth files. The site module doesn't easily allow doing only some of these. (user-site directories are an exception, as they have their own flag, so I'm not listing that here.) With Hermetic Python we don't care about any of these (for a variety of different reasons), but I'm always a little worried that future Python versions would add behaviour to site that we *do* need.
Honestly, I would prefer to simply remove the site module; I dislike
it because it makes Python startup way slower :-) ... But well, it
does a few important things :-)
About PEP 587: PyConfig.user_site_directory is exposed as
sys.flags.no_user_site (with the value negated), which is used by the
site module.
I'm not sure if you are asking me to modify my PEP, or if it's more of
a general remark. PEP 587 gives control over how sys.path is
initialized.
In the "Isolate Python" section, I suggest setting the "isolated"
parameter to 1, which implies setting user_site_directory to 0. So
sys.path isn't modified afterwards: what you pass to PyConfig is what
you get in sys.path in this case.
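As a quick Python-level check of that flag mapping (a minimal
illustration; the actual values depend on how the interpreter was
launched, e.g. with -I or -s):

```python
import sys

# PyConfig.user_site_directory surfaces in Python as the *negated*
# flag sys.flags.no_user_site (1 means user site-packages directories
# are skipped); the site module consults it at startup.
print("no_user_site:", sys.flags.no_user_site)
print("isolated:", sys.flags.isolated)
```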
Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
It sounds like PyOxidizer and Hermetic Python are on the same page and
we're working towards a more official solution. But I want to make sure
by explicitly stating what PyOxidizer is doing.
Essentially, to facilitate in-memory import, we need to register a
custom sys.meta_path importer *before* any file-based imports are
attempted. In other words, we don't want the PathFinder registered on
sys.meta_path to be used. Since we don't have a clear "hook point" to
run custom code between init_importlib() (which is where the base
importlib system is initialized) and _PyCodecRegistry_Init() (which has
the first import of a .py-backed module - "encodings"), my current hack
in PyOxidizer is to compile a modified version of the
importlib._bootstrap_external module which contains my custom
MemoryImporter. I inject this custom version at run-time by updating
the global frozen modules table, and the Python initialization
mechanism
executes my custom code when the _frozen_importlib_external module is
imported/executed as part of init_importlib_external().
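For illustration, the effect of such an in-memory importer can be
sketched in pure Python as a sys.meta_path finder registered ahead of
PathFinder. The MODULES table and MemoryImporter class below are
illustrative only, not PyOxidizer's actual Rust-backed implementation:

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical in-memory module store: module name -> source code.
MODULES = {"hello": "GREETING = 'hi from memory'"}

class MemoryImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, fullname, path=None, target=None):
        if fullname in MODULES:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # let the next finder (e.g. PathFinder) try

    def create_module(self, spec):
        return None  # use the default module creation machinery

    def exec_module(self, module):
        source = MODULES[module.__name__]
        exec(compile(source, f"<memory:{module.__name__}>", "exec"),
             module.__dict__)

# Register *before* PathFinder so in-memory modules are found first.
sys.meta_path.insert(0, MemoryImporter())

import hello
print(hello.GREETING)  # hi from memory
```

The hard part PyOxidizer works around is that this registration must
happen before the first file-based import, which is exactly the hook
point CPython doesn't currently offer.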
The CPython API could make this less hacky by enabling embedders to
run custom code after importlib initialization but before any
path-based module is imported. I /think/ providing a 2-phase
initialization that stops between _Py_InitializeCore() and
_Py_InitializeMainInterpreter() would get the job done for PyOxidizer
today. But it may be a bit more complicated than that for others (or
possibly me in the future) since importlib is initialized in both these
phases. The "external" importlib bits providing the path-based importer
are initialized later during _Py_InitializeMainInterpreter. It's quite
possible we would want to muck around with the state of external
importers in our initialization "hook" and if they aren't loaded yet...
Furthermore, new_interpreter() makes both importlib initialization
function calls at the same time. We would need to inject a custom meta
path importer for sub-interpreters, so we would need a code injection
point in new_interpreter() as well. Installing a custom
importlib._bootstrap_external/_frozen_importlib_external module and
globally changing the code that runs during importlib init gets the
job done, but it is hacky.
FWIW my notes on how this all works (in Python 3.7) are at
https://github.com/indygreg/PyOxidizer/blob/57c823d6ca2321d12067bf36603a9a0ad2320c75/docs/technotes.rst,
https://github.com/indygreg/PyOxidizer/blob/57c823d6ca2321d12067bf36603a9a0ad2320c75/README.rst#how-it-works,
and
https://gregoryszorc.com/blog/2018/12/18/distributing-standalone-python-applications/.
Again, it sounds like we're working towards a robust solution. I just
wanted to brain dump to make sure we are.
Regarding site.py, I agree it is problematic for embedding scenarios.
Some features of site.py can be useful. Others aren't. It would be
useful to have more granular control over which bits of site.py are
run. My naive suggestion would be to add individual flags to control
which functions site.py:main() runs. That way embedders can cherry-pick
site.py features without having to manually import the module and call
functions within. That feels much more robust for long-term maintainability.
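For reference, these are the kinds of steps site.py:main() performs
that per-step flags could gate individually (the function names are
taken from CPython's Lib/site.py as of 3.7 and may change between
versions):

```python
import site

# Module-level functions that site.main() calls during startup;
# individual flags would let embedders enable these selectively
# instead of the current all-or-nothing import of site.
for step in ("removeduppaths", "addsitepackages", "addusersitepackages",
             "execsitecustomize", "execusercustomize"):
    print(step, hasattr(site, step))
```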
While I was reading replies on this thread, a few other points regarding
embedding crept to my mind. I apologize in advance for the feature creep...
Regarding Python calling exit(), this is problematic for embedding
scenarios. This thread called attention to exit() during interpreter
initialization. But it is also a problem elsewhere. For example,
PyErr_PrintEx() will call Py_Exit() if the exception is a SystemExit.
There's definitely room to improve the exception handling mechanism to
give embedders better control when SystemExit is raised. As it stands,
we need to check for SystemExit manually and reimplement
_Py_HandleSystemExit() to emulate its behavior for e.g. exception value
handling (fun fact: you can pass non-None, non-integer values to
sys.exit/SystemExit).
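Here is a minimal Python-level sketch of what an embedder has to
reimplement today (run_and_capture_exit is a hypothetical helper; a
real fix would live in the C API so PyErr_Print() never calls exit()):

```python
import sys

def run_and_capture_exit(code_str):
    # Run code while catching SystemExit instead of letting the
    # process die. The value handling emulates _Py_HandleSystemExit():
    # None -> exit code 0, int -> that int, anything else is printed
    # to stderr and maps to exit code 1.
    try:
        exec(code_str, {})
    except SystemExit as exc:
        value = exc.code
        if value is None:
            return 0
        if isinstance(value, int):
            return value
        print(value, file=sys.stderr)
        return 1
    return 0

print(run_and_capture_exit("import sys; sys.exit(3)"))    # 3
print(run_and_capture_exit("raise SystemExit('oh no')"))  # 1
```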
Regarding builtin and frozen modules, they both are backed by global
arrays (PyImport_Inittab and PyImport_FrozenModules, respectively). In
my initial reply I mentioned that this seemed problematic because I
think a goal of initialization configuration overhaul should be to
remove dependence on global variables. In addition to that concern, I'd
like to point out that the way things work today is the BuiltinImporter
and FrozenImporter meta path importers test for module availability by
iterating these global arrays and calling
_PyUnicode_EqualToASCIIString() on each member. This is done via
import.c:is_builtin() and import.c:find_frozen(), respectively (both C
functions are exported to Python and called as part of their respective
find_spec() implementations). BuiltinImporter and FrozenImporter are
always registered as the first two sys.meta_path importers.
This behavior means every "import" results in an array walk + string
compare over these two arrays. Yes, this is done from C and should be
reasonably fast. But it feels inefficient to me. Slow import performance
is a common complaint against Python and anything that can be done to
minimize overhead could be useful.
Having a more efficient member lookup for BuiltinImporter and
FrozenImporter might shave off a millisecond or two from startup. This
would require some kind of derived data structure. Unfortunately, as
long as there is a global data structure that can be mutated any time
(the API contract doesn't prohibit modifying these global arrays after
initialization), you would need to check for "cache invalidation" on
every lookup, undermining performance benefits. But if the interpreter
config contained references to the builtin and frozen module arrays and
refused to allow them to be modified after initialization,
initialization could build a derived data structure used for efficient
lookups and all the problems go away!
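A Python-level sketch of the idea: sys.builtin_module_names is the
Python view of the PyImport_Inittab array, and the frozenset below
stands in for the derived structure initialization could build once
if the table were frozen in the config:

```python
import sys

# sys.builtin_module_names mirrors the PyImport_Inittab array that
# import.c:is_builtin() scans linearly with string compares.
print(len(sys.builtin_module_names), "builtin modules")

# If the config prohibited mutation after initialization, a derived
# hash structure like this could be built once, turning the O(n)
# scan into an O(1) lookup with no cache-invalidation checks.
BUILTIN_INDEX = frozenset(sys.builtin_module_names)
print("sys" in BUILTIN_INDEX)  # True
```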
What I'm trying to say is that moving the builtin and frozen modules
arrays to the initialization config data structure is not only the right
thing to do from an architectural perspective, but it can also open the
door to optimizing import performance.
FWIW PyOxidizer uses a Rust HashMap to index module data, so testing
for module presence is ~O(1). It's on my laundry list of TODOs to create
an UberImporter that indexes every known module at startup and proxies
to BuiltinImporter and FrozenImporter as necessary. The cost to
satisfying a find_spec() or similar MetaPathFinder interface request
would then be a Rust HashMap lookup, avoiding the O(n) traversal of
sys.meta_path entries (which is Python heavy and relatively slow
compared to compiled code) and further avoiding the O(n) lookups in
BuiltinImporter and FrozenImporter. I don't expect this to yield the
performance wins that doing away with filesystem-based module importing
did (importing the entire standard library completed in ~70% of the
time). But startup overhead is a problem and every little improvement
can help. Perhaps there is room to add importlib APIs to facilitate
indexing. Then MetaPathFinders which are immutable could contribute to a
global lookup dict, facilitating much faster import operations. It
wouldn't work for PathFinder. But it opens some interesting possibilities...
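The "UberImporter" idea can be sketched in pure Python (IndexedFinder
is a hypothetical name; a real implementation would also pre-index
frozen and in-memory modules and be registered first on
sys.meta_path):

```python
import importlib.machinery
import sys

class IndexedFinder:
    # Index every builtin module name once at construction time, then
    # answer find_spec() with a single dict lookup and delegate to the
    # real finder, instead of walking sys.meta_path and the builtin
    # array. Frozen modules expose no public name listing, so unknown
    # names fall back to FrozenImporter's own probe.
    def __init__(self):
        self._index = {
            name: importlib.machinery.BuiltinImporter
            for name in sys.builtin_module_names
        }

    def find_spec(self, fullname, path=None, target=None):
        finder = self._index.get(
            fullname, importlib.machinery.FrozenImporter)
        return finder.find_spec(fullname, path, target)

finder = IndexedFinder()
print(finder.find_spec("sys"))  # a ModuleSpec from BuiltinImporter
```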