My Django + Cpanel rough draft

Milan Andric

Jul 27, 2008, 11:41:21 PM
to mod...@googlegroups.com
http://m.andric.us/post/43754517/django-python-with-cpanel

Comments appreciated. Obviously there are tons of things I glossed
over. I tried to give a quick tutorial on how I got a Django app
running on a cPanel/CentOS server with Apache2. Planning to ask the
folks over at cPanel to add mod_wsgi support for Python apps.

--
Milan

Graham Dumpleton

Jul 29, 2008, 4:00:27 AM
to mod...@googlegroups.com
2008/7/28 Milan Andric <man...@gmail.com>:

A few helpful (I hope) comments. :-)

In your list of advantages, you have:

"""Python is not embedded in every apache process."""

This is not actually true. Just as with mod_python, Python is embedded
in every Apache process, because Python is initialised in the Apache
parent process before any child processes or mod_wsgi daemon processes
are forked.

This technically means you can run multiple WSGI applications at the
same time, where some run in embedded mode, i.e., in normal Apache
child processes, and some run in daemon mode, i.e., in mod_wsgi
specific daemon processes.

The reason the memory footprint with mod_wsgi is a lot smaller, and
why it may appear that Python is not embedded in every process, is that
with mod_python a lot of its code base was Python rather than C code.
These Python modules had to be loaded into the main interpreter when
mod_python was initialised. In turn these preloaded a lot of standard
Python modules. As a result, the base memory footprint of a process
with mod_python loaded was a few MB. With mod_wsgi, it is all C code
and no Python code. The Apache C code module is relatively small.
There is still some overhead from Python initialisation and
maintaining the main interpreter, but because no Python modules are
preloaded, it is actually quite small.

The mod_python package also got a bad rap for something which wasn't
its fault. That is, Python installations that provided only a static
library caused undue memory bloat of a few MB as well. With mod_wsgi I
have whinged and warned a lot about this and told people to make sure
they use a shared Python library to avoid the issue. Thus overall, the
mod_wsgi memory footprint is much less than that of mod_python.

You then say:

"""Scales better than mod_python on machines doing mass virtual hosting."""

Except for mod_python consuming a bit more memory and mod_wsgi having
lower base request handling overhead, when looking at the larger
picture, I wouldn't expect much difference in their ability to scale
if one is talking about a single application running in embedded
mode. This is because the ability to scale comes more from the Apache
feature whereby it can dynamically create additional child processes
to handle demand.

There is also a difference between mod_python and mod_wsgi as far as
how interpreters are used. In mod_python the default is one
interpreter per virtual host. With mod_wsgi the default is one
interpreter per WSGI application. Technically this means that if you
arbitrarily throw lots of WSGI applications onto a virtual host,
mod_wsgi may use more memory because of the way it separates the
applications. But then, if using mod_python with its default, and the
applications couldn't safely coexist, you would have lots of issues
and would be forced to separate them anyway.
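
As a rough sketch of what that default means in configuration terms:
the interpreter an application runs in is selected with the
WSGIApplicationGroup directive, where %{RESOURCE} (one interpreter per
application) is the default and %{SERVER} gives one per virtual host,
which is closer to the mod_python default. The host name, mount points
and script paths below are made up purely for illustration.

```apache
<VirtualHost *:80>
    ServerName www.example.com

    # Hypothetical applications mounted on one virtual host.
    WSGIScriptAlias /blog /home/example/wsgi/blog.wsgi
    WSGIScriptAlias /wiki /home/example/wsgi/wiki.wsgi

    # Default is %{RESOURCE}: each application gets its own sub interpreter.
    # %{SERVER} shares a single interpreter across the virtual host instead,
    # which only works if the applications can safely coexist.
    WSGIApplicationGroup %{SERVER}
</VirtualHost>
```

Leaving the directive out keeps the applications separated, at the
cost of the extra memory described above.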

I guess the thing mod_wsgi has in its favour with respect to virtual
hosting is daemon mode. That is, creating a separate daemon process
group for each virtual host, with the daemon processes potentially
running as the different users who own those virtual hosts. Because
separate processes are used for each virtual host though, you couldn't
scale this up to large numbers of virtual hosts, as you wouldn't be
able to handle that many processes. For that to be practical, mod_wsgi
needs to implement transient daemon processes that are only created on
demand and are reaped quite quickly when there is no activity. Such a
feature is on the mod_wsgi issue list, but it will be a while before
it is done.
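
A minimal sketch of that per-virtual-host daemon setup, using the
WSGIDaemonProcess and WSGIProcessGroup directives, might look like the
following. The host name, account name, paths and process/thread
counts are assumptions for illustration, not values from the original
post.

```apache
<VirtualHost *:80>
    ServerName site1.example.com

    # Dedicated daemon process group for this virtual host, running as
    # the (hypothetical) account that owns the site.
    WSGIDaemonProcess site1 user=site1user group=site1user processes=2 threads=15

    # Delegate the WSGI application in this virtual host to that group.
    WSGIProcessGroup site1

    WSGIScriptAlias / /home/site1user/wsgi/django.wsgi
</VirtualHost>
```

Every virtual host configured this way adds its own permanently
running processes, which is why the approach stops being practical at
large numbers of hosts.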

The question then is what you mean by 'on machines doing mass virtual hosting'.

Last item in that list is:

"""Graham Dumpleton rocks."""

Not sure how true that is. There are certain forums you can go to
where people reckon I have rocks in my head and that mod_wsgi is evil.

With respect to your Apache configuration, you have:

# don't handle these with mod_wsgi
<Location /media>
    SetHandler None
</Location>
<Location /robots.txt>
    SetHandler None
</Location>
<Location /admin_media>
    SetHandler None
</Location>

These shouldn't be required for mod_wsgi; they are a hangover from
mod_python. In mod_wsgi the Alias directives should be enough, as they
will take precedence over WSGIScriptAlias for those URLs.
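
For example, a configuration along these lines should serve the static
URLs directly without any SetHandler blocks. This is only a sketch:
the filesystem paths are hypothetical, and the access control lines
use Apache 2.2 syntax.

```apache
# Static files: Alias takes precedence over WSGIScriptAlias for these URLs.
# All filesystem paths here are placeholders.
Alias /robots.txt /home/example/public_html/robots.txt
Alias /media/ /home/example/public_html/media/
Alias /admin_media/ /home/example/django/contrib/admin/media/

<Directory /home/example/public_html>
    # Apache 2.2 style access control.
    Order allow,deny
    Allow from all
</Directory>

<Directory /home/example/django/contrib/admin/media>
    Order allow,deny
    Allow from all
</Directory>

# Everything else is handled by the WSGI application.
WSGIScriptAlias / /home/example/wsgi/django.wsgi
```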

I have to run now, but got a few more things to say later about Python
egg cache and logging.

Graham

Milan Andric

Jul 29, 2008, 9:21:30 AM
to mod...@googlegroups.com

Thanks for explaining that. Updated my post.

> You then say:
>
> """Scales better than mod_python on machines doing mass virtual hosting."""
>
> Except for mod_python consuming a bit more memory and mod_wsgi having
> lower base request handling overhead, when looking at the larger
> picture, I wouldn't expect much difference in the abilities for them
> to scale if one is talking about single application running in
> embedded mode. This is because the scaling abilities come more from
> Apache feature whereby it can dynamically create additional child
> processes to handle demand.

So embedded mode actually scales better than daemon mode, makes sense.
But I imagine if my application were thread safe I could set up 5 or
so daemons with a high threads value so it could scale better?

>
> There is also a difference between mod_python and mod_wsgi as far as
> how interpreters are used. In mod_python the default is one
> interpreter per virtual host. With mod_wsgi the default is one
> interpreter per WSGI application. Technically this means that if
> arbitrarily throwing lots of WSGI application on a virtual host, that
> mod_wsgi may use more memory because of the way it separates the
> applications. But then, if using mod_python, with the default the way
> it is, if the applications couldn't safely coexist, you would have
> lots of issues and would be forced to separate them anyway.
>

Ah, good point, added that.

> I guess the thing mod_wsgi has in its favour with respect to virtual
> hosting is daemon mode. That is, creating separate daemon process
> groups for each virtual host, with the daemon processes potentially
> running as different users who are the owners of those virtual hosts.
> Because separate processes are used for each virtual host though, you
> couldn't scale this up to large numbers of virtual hosts as you
> wouldn't be able to handle that many processes. For that to be
> practical, mod_wsgi needs to implement transient daemon processes that
> are only created on demand and which are reaped quite quickly when no
> activity. Such a feature is in mod_wsgi issue list, but will be a
> while before it is done.

Would be very cool, but in the meantime it's also nice to have threads.

>
> The question then is what you mean by 'on machines doing mass virtual hosting'.
>
> Last item in that list is:
>
> """Graham Dumpleton rocks."""
>
> Not sure how true that is. There are certain forums you can go to
> where people reckon I have rocks in my head and that mod_wsgi is evil.
>

I'm sure there is ...

> With respect to your Apache configuration, you have:
>
> # don't handle these with mod_wsgi
> <Location /media>
> SetHandler None
> </Location>
> <Location /robots.txt>
> SetHandler None
> </Location>
> <Location /admin_media>
> SetHandler None
> </Location>
>
> These shouldn't be required for mod_wsgi; they are a hangover from
> mod_python. In mod_wsgi the Alias directives should be enough as they
> will take precedence over WSGIScriptAlias for those URLs.
>

Nice ... Apache config is even cleaner.

> I have to run now, but got a few more things to say later about Python
> egg cache and logging.
>


Thanks so much for this input, Graham. Invaluable for me at least to
get a better understanding of mod_wsgi. Looking forward to more and
will keep updating my post.

--
Milan

Graham Dumpleton

Jul 30, 2008, 8:51:54 PM
to mod...@googlegroups.com
2008/7/29 Milan Andric <man...@gmail.com>:

>> You then say:
>>
>> """Scales better than mod_python on machines doing mass virtual hosting."""
>>
>> Except for mod_python consuming a bit more memory and mod_wsgi having
>> lower base request handling overhead, when looking at the larger
>> picture, I wouldn't expect much difference in the abilities for them
>> to scale if one is talking about single application running in
>> embedded mode. This is because the scaling abilities come more from
>> Apache feature whereby it can dynamically create additional child
>> processes to handle demand.
>
> So embedded mode actually scales better than daemon mode, makes sense.
> But I imagine if my application was thread safe I could setup 5 or so
> daemons with a high threads value so it could scale better?

Because of the Python GIL (global interpreter lock), more threads
doesn't necessarily mean more performance. Only one thread at a time
can run Python code, so a single process can't make good use of
multiple cores or CPUs. Using multiple processes is therefore still
the better way to make use of all the processors available: better to
have more processes with fewer threads than a single process, or a
small number of processes, with lots of threads. In embedded mode with
the Apache worker MPM you still have control of that anyway, without
resorting to daemon mode, but anecdotal evidence still suggests that
the prefork MPM, i.e., many processes each with a single thread, is
probably the better solution.
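
In daemon mode the same trade-off is expressed through the processes
and threads options of WSGIDaemonProcess. The sketch below is only
illustrative: the group name and the numbers are assumptions, not
tuned recommendations.

```apache
# More processes with few threads each: every process has its own GIL,
# so Python code can run on several cores at once.
WSGIDaemonProcess example processes=5 threads=2

# The alternative, one process with many threads, leaves all threads
# contending for a single GIL:
# WSGIDaemonProcess example processes=1 threads=25

WSGIProcessGroup example
```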

BTW, multiple threads in daemon mode may not currently be working as
well as they could be. I have been looking at this for a while, but
haven't had time to change the code. The problem is that when a new
request comes in, all idle threads in the process are woken up, but
only one actually gets to handle the request. Waking all the threads
is unnecessary and may have a very small performance impact on a raw
hello world benchmark, although in a real application it probably
makes no difference. I need to change the code to keep a stack of idle
threads and only notify the most recently active one. This should
lessen issues around needing to swap in stack memory for a thread
which hasn't run for a while. If using the Apache worker MPM and
running in embedded mode, this issue doesn't arise because of how
Apache implements its mechanism for handing requests off to threads.

Graham

Milan Andric

Jul 31, 2008, 11:03:34 AM
to mod...@googlegroups.com

So it sounds like I should stick to embedded mode. I'm really not
using that many features of daemon mode, like setting the process
user/group, though I was hoping to take advantage of that in the
future, especially as we add applications/new virtual hosts.
I also liked the ability to reload my application without needing root
privileges to restart Apache.
Again, not something I need right now, but useful in the future or in
a mass-vhost scenario. The main thing I'm worried about is limiting my
application to 5 processes each running 1 thread and suddenly getting
a spike in traffic. Seems like embedded mode is better for my current
scenario but isn't ideal for a mass-vhost setup.

Now if I'm running in embedded mode, can I add another vhost/WSGI
application to Apache? Sorry if I'm talking in circles, still trying
to wrap my head around this. I imagine if you keep adding WSGI
applications to Apache in embedded mode, eventually all your httpd
processes will contain code for each application. Should probably go
read some more about embedded mode. ;)

Thanks again,

--
Milan

Graham Dumpleton

Aug 2, 2008, 2:40:22 AM
to mod...@googlegroups.com
2008/8/1 Milan Andric <man...@gmail.com>:

Your assumption that all processes will eventually contain code for
each application is correct, though each application would be in its
own interpreter. Thus, processes can get quite fat, and you will have
problems if using third party modules that require that they be run in
the first Python interpreter instance created, as obviously you can
only delegate one application (or only applications which can coexist)
to the first interpreter.
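
As a sketch of that delegation, an application can be pinned to the
first (main) interpreter with WSGIApplicationGroup %{GLOBAL}. The
mount point and script path are hypothetical.

```apache
# Hypothetical application using a third party module that only works
# in the first (main) Python interpreter.
WSGIScriptAlias /app /home/example/wsgi/app.wsgi

<Location /app>
    # %{GLOBAL} delegates the application to the main interpreter.
    WSGIApplicationGroup %{GLOBAL}
</Location>
```

Any other application given the same setting would share that
interpreter, so this only suits applications that can coexist.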

Graham
