I wouldn't be overly concerned about the GIL, or try to guess from it
which configuration may be better than another. I have talked about
this a bit before in:
http://blog.dscpl.com.au/2007/07/web-hosting-landscape-and-modwsgi.html
In that I said:
"""In addition to the low overhead, there are also other positive
benefits deriving from how Apache works when using this mode. The
first is that Apache uses multiple child processes to handle requests.
As a result, any contention for the Python GIL within the context of a
single process is not an issue, as each process will be independent.
Thus there is no impediment when using multi processor systems.
That said, the GIL is not as big a deal as some people make out, even
when using Apache with only one multi-threaded child process for
accepting requests. This is because the code which handles accepting
of requests, determines which Apache handler should process the
request, along with the code for reading the request content and
writing out the response content, is all written in C and is in no way
linked to Python. As a consequence there are large sections of code
where the GIL is not being held. On top of that, the same web server
may also be serving up static files where again the GIL doesn't even
come into the picture. So, more than enough opportunity for making
good use of those multiple processors."""
I was actually talking about embedded mode there, but it is just as
pertinent to daemon mode. In particular, in daemon mode the Apache
server child processes are still doing work at the same time as they
are proxying the request to the daemon mode process. So, you are
going to have multiple processes trying to do stuff anyway.
There isn't much point trying to match number of daemon processes to
number of cores purely based on concerns about the GIL.
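
As a small illustration of the point that the GIL only matters while
Python code is actually running, here is a sketch in plain Python. It
is not mod_wsgi specific. It uses time.sleep() as a stand-in for time
a request spends in C code or blocked on I/O, which is an assumption
made purely for the demonstration, and the function name and numbers
are made up.

```python
import threading
import time


def run_blocking_workers(num_threads=4, block_seconds=0.2):
    """Run threads that each block outside the GIL; return elapsed seconds.

    time.sleep() releases the GIL while it waits, standing in here for
    time a request spends in C code or blocked on I/O.
    """
    def worker():
        time.sleep(block_seconds)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_blocking_workers()
    # If the GIL were held while blocked, this would be close to 4 * 0.2s.
    print("4 threads blocking 0.2s each took %.2fs" % elapsed)
```

The threads overlap, so the total time stays close to that of a single
wait rather than the sum of all of them.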
This doesn't mean you shouldn't try and tune the Apache MPM settings
and daemon mode settings to see what works best, but to do that you
really need to have your actual application running and be hitting it
with realistic traffic patterns. That is, no point just using 'ab' at
maximum throttle against a single URL as in practice your site is
never going to be pushed to the maximum. If you are running out of
grunt even for typical traffic, then you need to upgrade your system
to give it more headroom to deal with real spikes in traffic.
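
To give an idea of what "realistic traffic patterns" could mean in a
test harness, here is a rough sketch that builds a request schedule
from a weighted mix of URLs with uneven arrival times, instead of
hammering one URL flat out. The URL mix, the rate and the function
name are hypothetical; you would take real values from your own access
logs and feed the schedule to whatever HTTP client you prefer.

```python
import random


def build_schedule(url_weights, num_requests, requests_per_second, seed=None):
    """Return a list of (start_time_seconds, url) pairs.

    URLs are drawn in proportion to their weights, and the gaps between
    arrivals are exponentially distributed, so requests come in uneven
    bursts around the target average rate rather than at full throttle.
    """
    rng = random.Random(seed)
    urls = list(url_weights)
    weights = [url_weights[u] for u in urls]
    schedule = []
    now = 0.0
    for _ in range(num_requests):
        now += rng.expovariate(requests_per_second)
        url = rng.choices(urls, weights=weights, k=1)[0]
        schedule.append((now, url))
    return schedule


if __name__ == "__main__":
    # Hypothetical mix: the front page is hit far more often than login.
    mix = {"/": 5, "/search": 2, "/login": 1}
    for when, url in build_schedule(mix, 10, requests_per_second=5.0, seed=1):
        print("%6.2fs  %s" % (when, url))
```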
> - Between embedded mode & daemon mode, which uses less memory?
Worker MPM with daemon mode. See the dangers of using prefork and
especially embedded mode in:
http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html
You sacrifice Apache's ability to create additional processes to
handle demand, but frankly for fat Python web applications that is
arguably a stupid feature as it compounds problems. Namely, just when
you start to get a spike in requests, Apache tries to create more of
your fat applications, which just load the machine even more and slow
things down. In the worst case the slowdown makes Apache think it
needs even more processes and it can spiral out of control and choke
your whole server. So, for fat Python web applications you are just
better off using multithreading and making sure you have enough
processes/threads to handle expected demand to begin with.
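
For working out what "enough processes/threads" might be, one rule of
thumb (my assumption here, not something mod_wsgi calculates for you)
is Little's law: requests in flight = arrival rate * average time per
request. The sketch below applies that with a safety margin. The
function name, the headroom factor and the threads per process value
are placeholders to be replaced with numbers from your own setup.

```python
import math


def pool_size(peak_requests_per_second, avg_response_seconds,
              threads_per_process=15, headroom=1.5):
    """Estimate (processes, total_threads) for a fixed-size pool.

    Uses Little's law: requests in flight = arrival rate * time per
    request. The headroom factor is an arbitrary safety margin.
    """
    in_flight = peak_requests_per_second * avg_response_seconds
    threads = max(1, math.ceil(in_flight * headroom))
    processes = math.ceil(threads / threads_per_process)
    return processes, processes * threads_per_process


if __name__ == "__main__":
    # Hypothetical site: 100 requests/sec at peak, 0.5s per request.
    print(pool_size(100, 0.5))
```

Whatever numbers come out, check them against measurements of the real
application before relying on them.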
> Assuming worker MPM is used.
> I'll be using either shared hosting or VPS, so reducing memory use is
> very important
>
> - With mod_python, it's recommended to put a reverse proxy (eg. nginx)
> in front of the fat Apache & serve static content from the reverse
> proxy, does the same recommendation still apply to mod_wsgi?
Yes. And turn keep alive off on Apache to ensure connections are
released straight away. Keep alive is generally only effective for
static media files, which nginx would then be handling. By disabling
keep alive you get better utilisation of available connections and
lower memory usage in Apache server child processes.
> If daemon mode is used, will the front end Apache process act as an
> effective reverse proxy?
Apache is effectively acting as an internal proxy for the daemon
processes. Even so, you are still better off pushing static media to
nginx in front of that. The overhead of the extra internal hop within
Apache to get to the daemon processes is so small you would never see
it within the context of typical request times for Python web
applications.
> - What about mod_wsgi for nginx - how does that compare to Apache's
> mod_wsgi? Would it be less memory intensive?
I can't really comment on that except to say that it doesn't matter
what WSGI hosting mechanism you use, be it Apache, nginx or a pure
Python web server such as Paste serve or the CherryPy WSGI server. For
each process running your Python web application, each system is still
going to use about the same amount of memory. This is because the
underlying Python interpreter memory usage should always be the same
and your Python web application is also always going to use about the
same amount of memory. Any small differences that there may be
would relate to how the web server aspect of the system uses memory
differently, but the differences aren't generally going to be
significant as long as you set the servers up properly. No particular
system provides some magic bullet that somehow nullifies how much
memory your actual Python web application uses and that is where the
bulk of your memory will be used.
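
The arithmetic behind that is simple enough to sketch. All the numbers
below are invented for illustration and are not measurements of any
real server; the function name is hypothetical as well.

```python
def estimate_total_memory_mb(processes, interpreter_mb, application_mb,
                             server_overhead_mb):
    """Rough total memory for a set of identical application processes.

    Each process pays for the Python interpreter and the application
    itself, plus whatever the hosting mechanism adds on top.
    """
    per_process = interpreter_mb + application_mb + server_overhead_mb
    return processes * per_process


if __name__ == "__main__":
    # Same hypothetical application under two hosting mechanisms that
    # differ only in their per-process overhead (2 MB versus 5 MB).
    server_a = estimate_total_memory_mb(4, 10, 80, 2)
    server_b = estimate_total_memory_mb(4, 10, 80, 5)
    print(server_a, server_b)
```

With figures like these the two totals differ by only a few percent,
because the interpreter and application dominate in both cases.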
Graham
OT, sorry
Thanks for the great explanations Graham. My talk proposal, Web
Server Shootout[1], for the Open Source Bridge Conference[2] was
accepted, so you may see me popping in with more questions similar to
the OP's. :-)
Would love to see you there! You'd definitely get a beer on me!
[1] http://opensourcebridge.org/proposals/119
[2] http://opensourcebridge.org/
It is in a foreign land over the big blue sea and a long way away from
where I am. So, I somewhat doubt you will see me there. :-)
Graham