Pymongo with mod_wsgi


Sunil

Jan 29, 2010, 3:36:42 AM
to mongodb-user, arora...@gmail.com
The FAQ documents a potential issue with pymongo under mod_wsgi on the
following page:

http://api.mongodb.org/python/1.4%2B/faq.html#does-pymongo-work-with-mod-wsgi

We are planning to use pymongo in our web app, which will be running
in mod_wsgi, and want to ensure that we don't hit that issue.

I am running a very simple app which uses pymongo (1.4, C extension
enabled) in mod_wsgi to retrieve data from mongodb and display it.
The app is running successfully out of the box (I am NOT using
any of the workarounds mentioned on the FAQ page).

So can someone help me understand the exact scenarios (sample code
would be great) where that potential issue can pop up? I know it will
be safe to follow the workarounds but I want to understand the issue
better.

Thanks

Michael Dirolf

Jan 29, 2010, 10:30:03 AM
to mongod...@googlegroups.com, arora...@gmail.com
The issue can only pop up if mod_wsgi is configured to use multiple
sub-interpreters and you have the C extension installed. I thought
that the multiple sub-interpreter mode was the default but it's
possible that that has changed (or that it never was the default).
Either way I expect that if you run ab (ApacheBench) with some
concurrency you would see the issue manifest itself pretty quickly (as
an exception explaining what happened); if you don't, you're probably safe.
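If ab isn't to hand, the same kind of concurrent load can be sketched in Python. This is only an illustration, roughly what `ab -n 200 -c 20 <url>` does; the function names `fetch_status` and `hammer` and the request counts are made up for this example, and it assumes that an exception raised inside the application would show up to the client as a non-200 response.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_status(url):
    """Fetch one URL; return the HTTP status code, or the exception raised."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.status
    except Exception as exc:
        # urlopen raises HTTPError for 4xx/5xx responses and URLError
        # for connection failures; both are reported as failures below.
        return exc


def hammer(url, total=200, concurrency=20, fetch=fetch_status):
    """Issue `total` requests with up to `concurrency` in flight.

    Returns the list of results that were not a plain 200.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(fetch, [url] * total))
    return [result for result in results if result != 200]
```

Calling `hammer()` with the URL of the app and getting back an empty list means every request returned 200. Anything else in the list is worth checking against the Apache error log.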


Roger Binns

Jan 29, 2010, 2:05:19 PM
to mongod...@googlegroups.com

Michael Dirolf wrote:
> The issue can only pop up if mod_wsgi is configured to use multiple
> sub-interpreters and you have the C extension installed.

The underlying problem is due to using "the" global interpreter lock. The
lock effectively points to the current interpreter state. If you write C
extensions the normal way then they will end up messing things up when there
are multiple interpreters by mixing up the state between the different
interpreters which will ultimately cause a crash.

The code can be corrected by having the extension keep track of which
interpreter various objects are associated with (and no globals), and
manipulating the correct state. This, however, is extremely difficult to
do, since the standard extension APIs are not of much help. Additionally
*every* C extension in use has to do this, as just one could confuse things.

In general using multiple interpreters in one process is not well tested,
and doesn't get you much benefit - there is still effectively a single GIL.
Many C extensions will not work correctly anyway. The advice is to just
use multiple processes with one interpreter per process. If you insist on
using more than one interpreter per process then either audit each C
extension or avoid using C extensions altogether.

Roger

Graham Dumpleton

Jan 30, 2010, 12:50:41 AM
to mongodb-user

For the record, information in that FAQ falls quite short of
explaining what the issue is and in some respects is misleading.

The FAQ says:

"""When running PyMongo with the C extension enabled it is possible to
see strange failures when encoding due to the way mod_wsgi handles
module reloading with multiple sub interpreters."""

It actually has nothing to do with module reloading, and what mod_wsgi
does is no different from anything else which uses multiple
sub interpreters. That is, mod_wsgi uses them how they are meant to be
used and is not at fault. The problem, as someone else rightly
described, is that the C extension module is not tying global data
created at the C level to specific interpreters. Thus it is sharing
Python objects between sub interpreters, which is generally a very bad
idea.
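A rough pure-Python analogy may help picture the failure. This is not PyMongo's actual C code. The module names `objectid_a` and `objectid_b`, the stand-in `ObjectId` class and the `_cached_type` global are all made up for illustration. Each "interpreter" loads its own copy of a class, while a single process-wide global (standing in for a C-level static) remembers only the first copy.

```python
import types

# Source for a stand-in module; each simulated interpreter gets its own copy.
MODULE_SOURCE = "class ObjectId(object):\n    pass\n"


def load_copy(name):
    """Execute the module source in a fresh namespace, like a separate import."""
    module = types.ModuleType(name)
    exec(MODULE_SOURCE, module.__dict__)
    return module


interp_a = load_copy("objectid_a")  # class as seen by the first interpreter
interp_b = load_copy("objectid_b")  # same source, but a distinct class object

# Stand-in for a C-level global: cached once, by whichever interpreter
# happened to initialise the extension first.
_cached_type = interp_a.ObjectId


def is_object_id(value):
    """Type check against the cached class, as an encoder might do."""
    return isinstance(value, _cached_type)


print(is_object_id(interp_a.ObjectId()))  # True: same class object as the cache
print(is_object_id(interp_b.ObjectId()))  # False: same name, different class
```

The second check fails even though the object is a perfectly good `ObjectId` from the point of view of the code that created it, which is the flavour of "strange failures when encoding" the FAQ mentions.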

It is true that Python 2.X doesn't provide any help on doing this, but
it isn't as hard as made out. The conjecture that sub interpreter
support in Python isn't well tested is wrong, as mod_python and
mod_wsgi have been using it heavily for many years. There are areas of
Python where support for modules working in sub interpreters could be
improved, but it isn't broken.

As solutions the FAQ suggests:

"""Force all WSGI applications to run in the same application
group."""

This may or may not be enough depending on how the C extension module
is implemented. This is because another common problem with C
extension modules is that when they use the full Python thread API they
don't do so properly and the code will not work in sub interpreters. That
or they specifically use the simplified Python thread API, which can
only work in the first or main Python interpreter. The former is
a bug in the C extension. The latter is a design limitation of the
API the C extension chose to use.

The best thing to do is to force the application to run in the first
or main Python interpreter created when Python is initialised. This is
the same interpreter that code runs in when using command line Python.

In mod_wsgi this is achieved using:

WSGIApplicationGroup %{GLOBAL}

in the Apache configuration in the context which applies to that
specific WSGI application.

The problem is that forcing many WSGI applications to run in the same
interpreter can itself be a problem. For example, you cannot run
multiple Django instances in the same interpreter because it is
dependent on global configuration data. The web2py framework has
similar issues. Applications such as Trac don't have that problem so
long as you use the right way of configuring them.

This then leads on to where the FAQ says:

"""Run mod_wsgi in daemon mode with different WSGI applications
assigned to their own daemon processes."""

So, if you are stuck with a WSGI application where only a single
instance can be run in the same interpreter and you need to run more
than one on your web site, the only solution is to use mod_wsgi daemon
mode and delegate each to its own set of processes. In doing that, it is
still recommended that they be run in the first or main Python
interpreter.

In mod_wsgi this can be achieved using:

WSGIDaemonProcess group1
WSGIDaemonProcess group2

WSGIScriptAlias /suburl /some/path/app1/application.wsgi
WSGIScriptAlias / /some/path/app2/application.wsgi

<Directory /some/path/app1>
    WSGIProcessGroup group1
    WSGIApplicationGroup %{GLOBAL}
    Order allow,deny
    Allow from all
</Directory>

<Directory /some/path/app2>
    WSGIProcessGroup group2
    WSGIApplicationGroup %{GLOBAL}
    Order allow,deny
    Allow from all
</Directory>
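To confirm that a configuration like the one above has taken effect, a throwaway WSGI script can report where a request was handled. This is a minimal sketch; the helper name `describe_interpreter` is made up for this example. It relies on the `mod_wsgi.application_group` and `mod_wsgi.process_group` keys that mod_wsgi places in the WSGI environ, where an empty application group means the main interpreter and an empty process group means embedded mode.

```python
def describe_interpreter(environ):
    """Summarise which mod_wsgi process group and interpreter served the request."""
    app_group = environ.get("mod_wsgi.application_group")
    proc_group = environ.get("mod_wsgi.process_group")

    if app_group is None:
        interpreter = "unknown (not running under mod_wsgi?)"
    elif app_group == "":
        # Empty string is how mod_wsgi reports the main interpreter.
        interpreter = "main interpreter (%{GLOBAL})"
    else:
        interpreter = "sub interpreter %r" % app_group

    if proc_group is None:
        mode = "unknown"
    elif proc_group == "":
        # Empty string means the request ran inside an Apache child process.
        mode = "embedded mode"
    else:
        mode = "daemon process group %r" % proc_group

    return "interpreter: %s\nprocess: %s\n" % (interpreter, mode)


def application(environ, start_response):
    """WSGI entry point: return the summary as plain text."""
    body = describe_interpreter(environ).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

With the configuration above, a request to the first application would be expected to report the main interpreter and daemon process group 'group1'.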

Hopefully that explains things a bit better and perhaps someone may
take that and expand on the information in the FAQ and correct
anything that isn't strictly true.

Graham

Roger Binns

Jan 30, 2010, 1:19:48 PM
to mongod...@googlegroups.com

Graham Dumpleton wrote:
> It is true that Python 2.X doesn't provide any help on doing this, but
> it isn't as hard as made out.

BTW Graham, I have asked you directly several times exactly what I as a C
extension author should do and your answer has always amounted to "It
depends" :-) I have yet to find any documentation showing what
should be done if one's C extension has been written following the patterns
described in the Python documentation for doing so.

(I'm not complaining - you have done a fantastic job with mod_wsgi and
mod_python before that, but while *you* know exactly what should be done,
the rest of us don't nor do we have any documentation to know that we have
done everything that is needed.)

> The conjecture that sub interpreter
> support in Python isn't well tested is wrong

The background for me claiming something along those lines is that when the
topic comes up on c.l.p (comp.lang.python) it is usually met with
astonishment from people that it does work, then the whole C extension
thing comes up, followed by advice that if the interpreters should be truly
independent then just use one interpreter per process, since that can't go
wrong. And of course it hasn't been well tested using various C extension
modules, since the authors do not know what they should be doing
differently from what is described in the Python docs.

> That or they specifically use the simplified Python thread API which can
> only work in the first or main Python interpreter.

That is the only thread API that is documented! (OK, a whole bunch of
functions are documented too, but it is not particularly apparent how you
are supposed to string them together.) My extension supports Python 2.3 and
above (including 3) because of the simplified GIL API introduced in 2.3.

I would love to make my C extension work correctly but there is no complete
(or any!) documentation on what to do. I'm pretty sure many C extensions
started out like this:

http://docs.python.org/extending/extending.html#a-simple-example

Roger

Sunil

Feb 1, 2010, 12:16:53 AM
to mongodb-user
I had not expected such detailed insights, so thanks Graham, Michael
and Roger. Got the fundamentals clear now :)

Thanks!



Michael Dirolf

Feb 1, 2010, 12:40:54 PM
to mongod...@googlegroups.com
Thanks for the detailed response - some thoughts inline:

> used and is not at fault. The problem, as someone else rightly
> described, is that the C extension module is not tying global data
> created at the C level to specific interpreters. Thus it is sharing
> Python objects between sub interpreters, which is generally a very bad
> idea.

You're correct in saying that this is the problem. Unfortunately I
think that solving this in a performant way might be a bit tricky. For
now we make a best effort to raise a reasonable exception, but any
patches to handle this case better are more than welcome!

> """Force all WSGI applications to run in the same application
> group."""
>
> This may or may not be enough depending on how the C extension module
> is implemented.

This should work (and has been tested, though not too recently) with
the PyMongo C extension.

> Hopefully that explains things a bit better and perhaps someone may
> take that and expand on the information in the FAQ and correct
> anything that isn't strictly true.

Would love contributions to the FAQ/docs in this regard. It's as easy
as a fork and pull request: all of the docs live in the main repo
under the "doc/" sub-directory.
