Use-cases for subinterpreters

Adam Olsen

Jun 4, 2008, 7:44:27 PM

to modwsgi

Due to the number of bugs associated with subinterpreters I've been
advocating their use be avoided. Additionally, since they add a lot
of complexity, I've recently started suggesting CPython remove support
for them.

mod_wsgi makes significant use of subinterpreters though, so I'd like
to understand what it needs them for, and hopefully find simpler
alternatives.

* Multiple processes mean you can't use globals for communication, so
good practices should avoid it anyway
* Not usable as a secure sandbox, so code has to be trusted not to
stick garbage in the stdlib.
* Can't load more than one copy of a C extension, although each
subinterpreter is given a shallow copy of the C extension's module
dict.
* Can't unload C extensions
* Can't free subinterpreters

The best I've come up with is having multiple incompatible versions of
a pure-python package, but whose imports don't allow them to be
installed as subpackages. Updating the package to use PEP 328's
relative imports would solve this.
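As a rough illustration of the relative-import idea, the sketch below writes two copies of a tiny package to a temporary directory and imports both side by side. The names `widgets`, `app_one_vendor` and `app_two_vendor` are made up for the example; the only assumption is that the package refers to its own modules through PEP 328 relative imports.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_vendored_copy(root, parent, version):
    """Write a tiny package `<parent>.widgets` that uses a relative import."""
    pkg = root / parent / "widgets"
    pkg.mkdir(parents=True)
    (root / parent / "__init__.py").write_text("")
    (pkg / "_version.py").write_text(f"VERSION = {version!r}\n")
    # The relative import resolves against whichever parent holds this copy.
    (pkg / "__init__.py").write_text("from ._version import VERSION\n")


def load_both():
    """Import two incompatible copies of `widgets` into one interpreter."""
    root = Path(tempfile.mkdtemp())
    make_vendored_copy(root, "app_one_vendor", "1.0")
    make_vendored_copy(root, "app_two_vendor", "2.0")
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        one = importlib.import_module("app_one_vendor.widgets")
        two = importlib.import_module("app_two_vendor.widgets")
    finally:
        sys.path.remove(str(root))
    return one.VERSION, two.VERSION
```

Had the package used an absolute import of `widgets._version` instead, both copies would compete for the single top-level name `widgets` and only one version could be loaded.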

What am I missing?

Graham Dumpleton

Jun 5, 2008, 4:02:18 AM

to mod...@googlegroups.com

I need some time to answer this properly, and where I am right now I
don't have that time.

The short answer is that mod_wsgi daemon mode and its ability to
create separate daemon processes per application means that use of sub
interpreters isn't strictly necessary.

I'll comment further in a day or so when I have a chance.

Graham

Graham Dumpleton

Jun 6, 2008, 11:43:31 PM

to mod...@googlegroups.com

2008/6/5 Adam Olsen <rha...@gmail.com>:

>
> Due to the number of bugs associated with subinterpreters I've been
> advocating their use be avoided. Additionally, since they add a lot
> of complexity, I've recently started suggesting CPython remove support
> for them.

What bugs in particular are you talking about?

Also, what personal experience do you actually have with implementing
applications which embed Python sub interpreters and specifically
multiple Python sub interpreters?

I ask as I can't see that you have much posting history on Google
Groups related to Python and so am trying to work out how much you truly
know about this area of using Python.

I have had discussions with people about use of multiple sub
interpreters before who although they had a lot to say were just
regurgitating what others had said and never actually had any first
hand experience of the issues. In some cases what these people were
saying wasn't even really valid or was exaggerated.

> mod_wsgi makes significant use of subinterpreters though, so I'd like
> to understand what it needs them for, and hopefully find simpler
> alternatives.

For when mod_wsgi uses embedded mode, they are used as a means of
isolating different Python applications from each other, especially
where each expects to own global data.

As an example, it is impossible to run multiple Django instances
within the same Python interpreter instance, thus to run more than one
in a process you have to use multiple sub interpreters.
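A minimal sketch of why that is, assuming a framework that keeps its configuration in one module-level global. The `framework` module, `configure()` and `current_database()` below are hypothetical stand-ins, not Django's actual API.

```python
import types

# Hypothetical framework module holding process-wide configuration.
framework = types.ModuleType("framework")
framework.settings = {}


def configure(site_name, database):
    """Each application calls this once at start-up."""
    framework.settings["SITE"] = site_name
    framework.settings["DATABASE"] = database


def current_database():
    """What any request handler sees when it asks for the database."""
    return framework.settings["DATABASE"]


# Two applications start up inside the same interpreter.
configure("site_a", "db_a")
configure("site_b", "db_b")

# site_a's handlers now talk to site_b's database.
database_seen_by_site_a = current_database()
```

In separate sub interpreters each application would import its own copy of the `framework` module, so the second `configure()` call would not overwrite the first.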

> * Multiple processes mean you can't use globals for communication, so
> good practices should avoid it anyway

There is nothing wrong with use of multiple processes and in the
context of web applications an application which is dependent on being
run in a single process could be said to be flawed as it prevents the
ability to horizontally scale a web application across multiple hosts.
Since Apache is multi process, such an application couldn't even be
usefully used with it either on a single host.

Read:

http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading

if you haven't already.

> * Not usable as a secure sandbox, so code has to be trusted not to
> stick garbage in the stdlib.

Multiple sub interpreters still give you a measure of isolation. An
application in one sub interpreter changing stuff in Python standard
modules should not affect another sub interpreter. The only exceptions
are stuff to do with the C environment and custom C modules which aren't
safe to use with multiple interpreters. C extensions in standard Python
are fine; it is only third party modules, written by people who don't
know enough about how Python works to know their code is wrong, that
are the problem.

That said, if you want true sandboxing, then you must use different
processes, you can't avoid it.

> * Can't load more than one copy of a C extension, although each
> subinterpreter is given a shallow copy of the C extension's module
> dict.

If that presents a problem, it is an error on the module writer's part
for not properly handling the multiple sub interpreter case. It is also
perhaps not completely true as one could always create a physical copy
of module. Apparently Python will treat modules of same name under
different paths as distinct in different interpreters.

> * Can't unload C extensions

In what cases is that a problem? In other words, what are some valid
use cases for doing it?

> * Can't free subinterpreters

You can destroy sub interpreters, just not the main one, at least not
without ending whole of Python and reinitialising it in the context of
the same process.

Destruction of sub interpreters does present some problems, and I have
learnt that doing it is not a good idea. The main problems are again
though due to third party C extension writers not writing their code
so as to be able to handle it.

> The best I've come up with is having multiple incompatible versions of
> a pure-python package, but whose imports don't allow them to be
> installed as subpackages. Updating the package to use PEP 328's
> relative imports would solve this.

How is this relevant? In particular, what is the issue with
'pure-python' packages?

> What am I missing?

Maybe a description of the specific problem you are having?

Although sub interpreters have some issues, in the context of mod_wsgi
many don't apply. Most of the others are due to poorly written third
party C extension modules.

In summary, there may be some issues, but they are more than manageable
and sub interpreters are still a viable technology for some use cases.

Graham

Adam Olsen

Jun 7, 2008, 1:05:08 AM

to mod...@googlegroups.com

On Fri, Jun 6, 2008 at 9:43 PM, Graham Dumpleton
<graham.d...@gmail.com> wrote:
>
> 2008/6/5 Adam Olsen <rha...@gmail.com>:
>>
>> Due to the number of bugs associated with subinterpreters I've been
>> advocating their use be avoided. Additionally, since they add a lot
>> of complexity, I've recently started suggesting CPython remove support
>> for them.
>
> What bugs in particular are you talking about?

The one that most recently came up is here: http://bugs.python.org/issue1758146
You mention several yourself, such as extension modules not being
notified when a subinterpreter is destroyed.

And of course there's this post by Martin v Löwis, but you should know
as you posted plenty in that thread:

http://groups.google.com/group/comp.lang.python/msg/7d9c614e6bf3f922

> Also, what personal experience do you actually have with implementing
> applications which embed Python sub interpreters and specifically
> multiple Python sub interpreters?
>
> I ask as I can't see that you have much posting history on Google
> Groups related to Python and so am trying to work out how much you truly
> know about this area of using Python.
>
> I have had discussions with people about use of multiple sub
> interpreters before who although they had a lot to say were just
> regurgitating what others had said and never actually had any first
> hand experience of the issues. In some cases what these people were
> saying wasn't even really valid or was exaggerated.

I've little experience using subinterpreters, but I have a significant
understanding of the implementation behind it. As part of my
python-safethread[1] project I have rewritten the entire threading API
and removed the (seemingly unnecessary) subinterpreter API.

Even if python-safethread is not accepted as a whole into Python, I
may end up cleaning up the thread/interpreter APIs.

[1] http://code.google.com/p/python-safethread/

>> mod_wsgi makes significant use of subinterpreters though, so I'd like
>> to understand what it needs them for, and hopefully find simpler
>> alternatives.
>
> For when mod_wsgi uses embedded mode, they are used as a means of
> isolating different Python applications from each other, especially
> where each expects to own global data.
>
> As an example, it is impossible to run multiple Django instances
> within the same Python interpreter instance, thus to run more than one
> in a process you have to use multiple sub interpreters.

It sounds like Django should be fixed to use less globals and be more
thread-safe. Saying they're wrong doesn't help you though.. so I see
your point.

>> * Multiple processes mean you can't use globals for communication, so
>> good practices should avoid it anyway
>
> There is nothing wrong with use of multiple processes and in the
> context of web applications an application which is dependent on being
> run in a single process could be said to be flawed as it prevents the
> ability to horizontally scale a web application across multiple hosts.
> Since Apache is multi process, such an application couldn't even be
> usefully used with it either on a single host.
>
> Read:
>
> http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
>
> if you haven't already.

That was my point. Multiple processes mean your app can't be using
globals, so your app shouldn't need the isolation.
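A small sketch of that point, assuming a POSIX system where `os.fork()` is available (roughly how a prefork Apache worker comes into being). The `hit_counter` global and `bump_in_child()` helper are made up for the illustration.

```python
import os

# Hypothetical module-level state, standing in for an application global.
hit_counter = {"hits": 0}


def bump_in_child():
    """Fork, increment the global in the child, and report the child's value."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: works on its own copy of hit_counter.
        try:
            os.close(read_fd)
            hit_counter["hits"] += 1
            os.write(write_fd, str(hit_counter["hits"]).encode())
        finally:
            os._exit(0)
    # Parent process: read what the child saw, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_value = int(reader.read())
    os.waitpid(pid, 0)
    return child_value


child_value = bump_in_child()
parent_value = hit_counter["hits"]
```

The child sees its increment, but the parent's copy is untouched, so anything that must be shared between requests has to live outside the process (a database, a cache server and so on).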

That ignores thread-unsafe libraries though, which use globals to
communicate within themselves - ie your Django example.

>> * Not usable as a secure sandbox, so code has to be trusted not to
>> stick garbage in the stdlib.
>
> Multiple sub interpreters still give you a measure of isolation. An
> application in one sub interpreter changing stuff in Python standard
> modules should not affect another sub interpreter. The only exceptions
> are stuff to do with the C environment and custom C modules which aren't
> safe to use with multiple interpreters. C extensions in standard Python
> are fine; it is only third party modules, written by people who don't
> know enough about how Python works to know their code is wrong, that
> are the problem.

Again, no reason why an app would *need* isolation. It's just to work
around libraries only designed for one user per process.

>> * Can't load more than one copy of a C extension, although each
>> subinterpreter is given a shallow copy of the C extension's module
>> dict.
>
> If that presents a problem, it is an error on the module writer's part
> for not properly handling the multiple sub interpreter case. It is also
> perhaps not completely true as one could always create a physical copy
> of module. Apparently Python will treat modules of same name under
> different paths as distinct in different interpreters.

My understanding is that multiple C modules under different names
don't work (on linux). One's symbols will take priority over the
other's, and unless they happen to be identical libraries, they'll
result in the expat, mysql, ssl, and md5 issues mentioned in your
ApplicationIssues page. As Python is unable to unload a C library, it
attempts to retain a permanent dictionary of all it has loaded, and
reuse them when asked to - giving different names breaks that.

>
>> * Can't unload C extensions
>
> In what cases is that a problem? In other words, what are some valid
> use cases for doing it?
>
>> * Can't free subinterpreters
>
> You can destroy sub interpreters, just not the main one, at least not
> without ending whole of Python and reinitialising it in the context of
> the same process.
>
> Destruction of sub interpreters does present some problems, and I have
> learnt that doing it is not a good idea. The main problems are again
> though due to third party C extension writers not writing their code
> so as to be able to handle it.
>
>> The best I've come up with is having multiple incompatible versions of
>> a pure-python package, but whose imports don't allow them to be
>> installed as subpackages. Updating the package to use PEP 328's
>> relative imports would solve this.

To rephrase, I was guessing you'd need multiple different versions of
a package such as Django. As you explained though, even for the same
version you need multiple copies.

> How is this relevant? In particular, what is the issue with
> 'pure-python' packages?

Pure-python packages are not the issue, just the opposite. It's
impure packages (i.e. C extensions) that can only exist once in the
entire process.

>> What am I missing?
>
> Maybe a description of the specific problem you are having?
>
> Although sub interpreters have some issues, in the context of mod_wsgi
> many don't apply. Most of the others are due to poorly written third
> party C extension modules.
>
> In summary, there may be some issues, but they are more than manageable
> and sub interpreters are still a viable technology for some use cases.

The question is not whether it can be made to work, but what parts you
*need* to work, and whether they're worth the maintenance costs.

--
Adam Olsen, aka Rhamphoryncus

Graham Dumpleton

Jun 8, 2008, 9:28:37 AM

to mod...@googlegroups.com

2008/6/7 Adam Olsen <rha...@gmail.com>:

>
> On Fri, Jun 6, 2008 at 9:43 PM, Graham Dumpleton
> <graham.d...@gmail.com> wrote:
>>
>> 2008/6/5 Adam Olsen <rha...@gmail.com>:
>>>
>>> Due to the number of bugs associated with subinterpreters I've been
>>> advocating their use be avoided. Additionally, since they add a lot
>>> of complexity, I've recently started suggesting CPython remove support
>>> for them.
>>
>> What bugs in particular are you talking about?
>
> The one that most recently came up is here: http://bugs.python.org/issue1758146

That is a bug in Python wrappers for subversion. It is well known that
the Python subversion wrappers have problems being used in secondary
sub interpreters. The person having the problem has brought this on
themselves by not following documentation for Trac which states that
mod_python should be set up to run Trac in the main Python interpreter
when subversion is used, using:

PythonInterpreter main_interpreter

They are trying to run two Trac instances in different sub
interpreters, which will cause such random problems.

This is not a bug in core Python code nor in sub interpreter support.

There are similar warnings for Trac integration with mod_wsgi warning
of these bugs in Python subversion bindings.

> You mention several yourself, such as extension modules not being
> notified when a subinterpreter is destroyed.

But extension modules can still be written to cope with not being
notified, just that people don't do it. In some respects the real problem
is inadequate documentation for Python about writing modules to be
safe for use with multiple interpreters.

> And of course there's this post by Martin v Löwis, but you should know
> as you posted plenty in that thread:
>
> http://groups.google.com/group/comp.lang.python/msg/7d9c614e6bf3f922

What it comes down to is although there are some issues with sub
interpreters, mainly around how people write third party extension
modules, they are not completely broken as was claimed by Martin in
that thread. Such alarmist and broad statements like that do not help.

>> Also, what personal experience do you actually have with implementing
>> applications which embed Python sub interpreters and specifically
>> multiple Python sub interpreters?
>>
>> I ask as I can't see that you have much posting history on Google
>> Groups related to Python and so am trying to work out how much you truly
>> know about this area of using Python.
>>
>> I have had discussions with people about use of multiple sub
>> interpreters before who although they had a lot to say were just
>> regurgitating what others had said and never actually had any first
>> hand experience of the issues. In some cases what these people were
>> saying wasn't even really valid or was exaggerated.
>
> I've little experience using subinterpreters, but I have a significant
> understanding of the implementation behind it. As part of my
> python-safethread[1] project I have rewritten the entire threading API
> and removed the (seemingly unnecessary) subinterpreter API.
>
> Even if python-safethread is not accepted as a whole into Python, I
> may end up cleaning up the thread/interpreter APIs.
>
>
> [1] http://code.google.com/p/python-safethread/

If it slows down the normal single threaded Python use case as the
site seems to suggest, I would say you are going to have an uphill
battle getting it included.

>>> mod_wsgi makes significant use of subinterpreters though, so I'd like
>>> to understand what it needs them for, and hopefully find simpler
>>> alternatives.
>>
>> For when mod_wsgi uses embedded mode, they are used as a means of
>> isolating different Python applications from each other, especially
>> where each expects to own global data.
>>
>> As an example, it is impossible to run multiple Django instances
>> within the same Python interpreter instance, thus to run more than one
>> in a process you have to use multiple sub interpreters.
>
> It sounds like Django should be fixed to use less globals and be more
> thread-safe. Saying they're wrong doesn't help you though.. so I see
> your point.

Yes Django needs to be fixed. But it isn't just Django, other major
Python web frameworks such as TurboGears, and possibly Pylons (not
sure), have similar sorts of issues around use of global data. Thus it
wouldn't be just Django that would be affected.

The only Python web application that plays really nice in this respect
as far as running multiple project sites within one interpreter, by
not relying on global data in a bad way, is Trac.

>>> * Can't load more than one copy of a C extension, although each
>>> subinterpreter is given a shallow copy of the C extension's module
>>> dict.
>>
>> If that presents a problem, it is an error on the module writer's part
>> for not properly handling the multiple sub interpreter case. It is also
>> perhaps not completely true as one could always create a physical copy
>> of module. Apparently Python will treat modules of same name under
>> different paths as distinct in different interpreters.
>
> My understanding is that multiple C modules under different names
> don't work (on linux). One's symbols will take priority over the
> other's, and unless they happen to be identical libraries, they'll
> result in the expat, mysql, ssl, and md5 issues mentioned in your
> ApplicationIssues page. As Python is unable to unload a C library, it
> attempts to retain a permanent dictionary of all it has loaded, and
> reuse them when asked to - giving different names breaks that.

C libraries are different to Python extension modules. The extension
modules aren't loaded with global symbol context and so there
shouldn't be clashes with other extension modules. If different
extension modules require different versions of C libraries then you
will have problems, but that is a different issue.

That said, I have always suggested not to use different versions of
extension modules in same process as I have seen enough strangeness
not to trust that it will work, but Martin in that thread (I think),
claims it should be okay.

> To rephrase, I was guessing you'd need multiple different versions of
> a package such as Django. As you explained though, even for the same
> version you need multiple copies.

You don't need multiple copies of same version of Django when using
them in different interpreters. This is because Django is Python code
only and C extension module issues don't affect it.

Ultimately if multiple sub interpreter support goes away I probably
will not care too much as I only author the hosting software and I will
work to what is available. Also, mod_wsgi has daemon mode which can be
used as an alternate means of providing separation with only minimal
additional configuration overhead. You may though upset a lot of
people who do rely on this ability in mod_wsgi and mod_python.

So, if multiple sub interpreter support went away, mod_wsgi would survive
as I would simply adapt the code. There has even been discussion in the
past about whether for simplicity support for multiple interpreters
should be dropped.

For mod_python though, elimination of multiple sub interpreters would
probably be the last nail in its coffin. It is possibly already
unlikely that mod_python will get ported to Python 3.0. If some
version of Python soon after that is going to drop support for
multiple interpreters, then it would be even less likely that it would
get ported to Python 3.0. This is because it doesn't have an alternate
mechanism such as mod_wsgi daemon mode to provide separation.

Graham

Adam Olsen

Jun 8, 2008, 1:07:50 PM

to mod...@googlegroups.com

On Sun, Jun 8, 2008 at 7:28 AM, Graham Dumpleton
<graham.d...@gmail.com> wrote:
>
> 2008/6/7 Adam Olsen <rha...@gmail.com>:

>> Even if python-safethread is not accepted as a whole into Python, I
>> may end up cleaning up the thread/interpreter APIs.
>>
>>
>> [1] http://code.google.com/p/python-safethread/
>
> If it slows down the normal single threaded Python use case as the
> site seems to suggest, I would say you are going to have an uphill
> battle getting it included.

Yup. Removing the gil will be a compile-time option though, so
hopefully providing two versions of the interpreter will be enough.

> Ultimately if multiple sub interpreter support goes away I probably
> will not care too much as I only author the hosting software and I will
> work to what is available. Also, mod_wsgi has daemon mode which can be
> used as an alternate means of providing separation with only minimal
> additional configuration overhead. You may though upset a lot of
> people who do rely on this ability in mod_wsgi and mod_python.
>
> So, if multiple sub interpreter support went away, mod_wsgi would survive
> as I would simply adapt the code. There has even been discussion in the
> past about whether for simplicity support for multiple interpreters
> should be dropped.

That's pretty much the bottom line. We can't rip it out because it's
currently in use, we don't want to support it because it's a
workaround for poorly designed libraries, and we don't want to leave
it as-is because *other* poorly designed libraries will crash!

Long term, it's clear what provides a better language, but that takes
a concerted effort to create..