How will the separate daemon process stuff work?

Simon Willison

Apr 14, 2007, 9:26:47 PM
to modwsgi
Hi Graham,

You wrote this on my blog:

"""
FWIW, a feature is being added to mod_wsgi that will allow WSGI
applications to be hosted in a separate daemon process so that bloat
caused by these fat applications doesn't affect the main Apache child
processes. Because the whole thing is tailored to WSGI, it is much
simpler to setup and configure as you only need mod_wsgi and don't
need a separate SCGI/FASTCGI supporting framework.
""" - http://simonwillison.net/2007/Apr/14/modwsgi/

Purely out of curiosity, how is this going to work? In particular, how
is the separate process going to be managed? One of the most annoying
things about SCGI / FastCGI is figuring out how to monitor the
external processes and make sure they are restarted in the event of
something going wrong.

Cheers,

Simon

Graham Dumpleton

Apr 14, 2007, 10:08:09 PM
to mod...@googlegroups.com

The feature for using separate daemon processes is intended to be
really simple. As such, it doesn't have the complexity of FastCGI/SCGI,
and because of that many will no doubt still perceive it as inadequate,
but hopefully it will be useful for a large range of cases.

Names of directives etc. may yet change, but where you might initially
have set up a WSGI application using:

<VirtualHost www.modwsgi.org:80>

    WSGIScriptAlias / /some/path/site.wsgi

    <Directory /some/path>
        Order deny,allow
        Allow from all
    </Directory>

</VirtualHost>

All you would need to do is change it to:

WSGIDaemonProcess grahamd-1 user=grahamd group=grahamd

<VirtualHost www.modwsgi.org:80>

    WSGIProcessName grahamd-1

    WSGIScriptAlias / /some/path/site.wsgi

    <Directory /some/path>
        Order deny,allow
        Allow from all
    </Directory>

</VirtualHost>

There are two parts to the change. The first is the WSGIDaemonProcess
directive, at global scope outside of any VirtualHost container. This
says to create a daemon process identified as 'grahamd-1' and run it
as user/group grahamd/grahamd.

The second is the WSGIProcessName directive, used in the VirtualHost
container, which indicates that all mod_wsgi requests to that virtual
host should be processed within the daemon process identified as
'grahamd-1'.

And that is really all there is to it as far as configuration and
setup go. There is no need to install any separate packages, nor to
run any separate supervisor application to monitor the daemon
processes.

What happens when you add the above configuration is that for each
named daemon process specified using WSGIDaemonProcess, the Apache
parent process forks to create a separate process. After the fork,
that process switches to the designated user/group or, if they aren't
specified, to the user/group that Apache normally runs its child
processes as, i.e. as set by the User/Group directives in the Apache
configuration.

When forking each of these daemon processes, support within the Apache
runtime library is used to monitor them, much as the Apache parent
process monitors the standard Apache child processes. If any of the
daemon processes dies, or is killed off with SIGHUP or SIGTERM by the
user it runs as, the Apache parent process automatically restarts it.
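
To make the fork-and-restart idea concrete, here is a minimal Python
sketch of a supervisor that forks one process per name and restarts
any that exits or is killed. It is only an illustration of the
concept, assuming a POSIX system with os.fork; in mod_wsgi the parent
is Apache itself and the monitoring is done in C via the Apache
runtime library, not by this code. The names spawn and supervise are
hypothetical.

```python
import os


def spawn(name, target):
    """Fork a daemon process that runs target(name) and then exits."""
    pid = os.fork()
    if pid == 0:
        # Child: run the daemon body, then exit without returning into the
        # supervisor's code (os._exit skips the parent's cleanup handlers).
        status = 0
        try:
            target(name)
        except BaseException:
            status = 1
        finally:
            os._exit(status)
    return pid


def supervise(names, target, max_restarts):
    """Start one process per name and restart any that exits or is killed.

    To keep the sketch finite it stops restarting after max_restarts
    restarts in total; a real supervisor would keep going until shutdown.
    Returns how many times each named process was restarted.
    """
    children = {spawn(name, target): name for name in names}
    restarts = dict.fromkeys(names, 0)
    total = 0
    while children:
        # Block until any child ends. The status would tell us whether it
        # exited normally or died from a signal such as SIGTERM; either
        # way the named process is restarted.
        pid, _status = os.waitpid(-1, 0)
        name = children.pop(pid, None)
        if name is None:
            continue  # not one of ours
        if total < max_restarts:
            restarts[name] += 1
            total += 1
            children[spawn(name, target)] = name
    return restarts
```

For example, supervise(["grahamd-1", "grahamd-2"], lambda name: None, 4)
forks two processes that exit immediately and restarts them four times
in total before returning the per-name restart counts.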

If you know anything about Apache, the mechanism used here is
basically the same as the one used by mod_cgid, except that you can
have multiple daemon processes whereas mod_cgid uses only one.

Each daemon process creates its own UNIX socket, over which requests
are proxied from the Apache child processes that actually accept them.
This is again done much as in mod_cgid, except that there is a
distinct UNIX socket for each daemon process. Also unlike mod_cgid, in
mod_wsgi the request is handled by the Python interpreter within the
daemon process, whereas when mod_cgid receives a request it actually
execs the CGI script as a further separate process.
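
As a rough Python illustration of proxying over a per-daemon UNIX
socket, the sketch below has a "daemon" side that accepts one
connection and runs a WSGI application in-process, and a "front end"
side that forwards the request environ. The JSON wire format, the
socket path and the helper names (serve_one, start_daemon,
proxy_request) are all assumptions made for the sketch, not mod_wsgi's
actual protocol, and a thread stands in for the forked daemon process
so the example is self-contained.

```python
import json
import os
import socket
import tempfile
import threading


def application(environ, start_response):
    # Hypothetical stand-in for the application loaded from site.wsgi.
    body = ("Hello from " + environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


def _read_all(conn):
    """Read from conn until the peer shuts down its sending side."""
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:
            return b"".join(chunks)
        chunks.append(data)


def serve_one(listener, app):
    """Daemon side: accept one proxied request and run app in-process.

    The wire format (one JSON object each way, ended by shutting down the
    write side) is an assumption of this sketch, not mod_wsgi's protocol.
    """
    conn, _ = listener.accept()
    with conn:
        environ = json.loads(_read_all(conn))
        reply = {}
        chunks = []

        def start_response(status, headers, exc_info=None):
            reply["status"], reply["headers"] = status, headers
            return chunks.append  # WSGI write() callable, output kept in order

        result = app(environ, start_response)
        try:
            for data in result:
                chunks.append(data)
        finally:
            if hasattr(result, "close"):
                result.close()
        reply["body"] = b"".join(chunks).decode("utf-8")
        conn.sendall(json.dumps(reply).encode("utf-8"))


def start_daemon(name, app):
    """Bind a per-daemon UNIX socket and serve one request on a thread."""
    path = os.path.join(tempfile.mkdtemp(), name + ".sock")
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(1)
    thread = threading.Thread(target=serve_one, args=(listener, app))
    thread.start()
    return path, thread, listener


def proxy_request(path, environ):
    """Front-end side: forward environ over the daemon's socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(json.dumps(environ).encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # tells the daemon the request is complete
        return json.loads(_read_all(sock))
```

Giving every daemon its own socket path is what lets the front end
choose a daemon simply by choosing which socket to connect to.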

As to what is implemented so far: you can already play with creating
daemon processes and watch them get restarted when you kill them. What
I haven't finished is the proxying of requests over the UNIX socket
connection. I also still have to implement the thread pool mechanism
for the daemon process so that parallel requests can be processed.
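
A thread pool is what lets a single daemon process work on several
requests at once. As a loose Python illustration, assuming each
request can be treated as an independent call to a handler, the
sketch uses concurrent.futures.ThreadPoolExecutor; handle_all and
ConcurrencyProbe are hypothetical names, and this says nothing about
how mod_wsgi's own C thread pool is built.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def handle_all(requests, handler, threads=1):
    """Run handler on each request using a fixed-size pool of worker threads.

    threads=1 models a single-threaded daemon; a larger value lets one
    daemon process work on several requests at once. Results come back
    in request order.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(handler, requests))


class ConcurrencyProbe:
    """Hypothetical handler wrapper that records how many calls overlap."""

    def __init__(self, handler, delay=0.05):
        self.handler = handler
        self.delay = delay
        self.active = 0
        self.peak = 0
        self.lock = threading.Lock()

    def __call__(self, request):
        with self.lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        try:
            time.sleep(self.delay)  # stand-in for real request work
            return self.handler(request)
        finally:
            with self.lock:
                self.active -= 1
```

With threads=1 the probe never sees more than one request in flight;
with a larger pool the requests overlap.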

Just to prove the concept, as a first step I am only going to have a
single thread in each daemon process, but even so some level of
parallelism can be achieved by using mod_rewrite to implement a simple
form of load balancing across multiple daemon processes. For example:

WSGIDaemonProcess grahamd-1 user=grahamd group=grahamd
WSGIDaemonProcess grahamd-2 user=grahamd group=grahamd
WSGIDaemonProcess grahamd-3 user=grahamd group=grahamd
WSGIDaemonProcess grahamd-4 user=grahamd group=grahamd
WSGIDaemonProcess grahamd-5 user=grahamd group=grahamd

<VirtualHost www.modwsgi.org:80>

    RewriteEngine On
    RewriteMap servers rnd:/some/path/servers.txt
    RewriteRule . - [E=WSGI_DAEMON_PROCESS:${servers:dynamic|grahamd-1}]

    WSGIProcessName %{ENV:WSGI_DAEMON_PROCESS}

    WSGIScriptAlias / /some/path/site.wsgi

    <Directory /some/path>
        Order deny,allow
        Allow from all
    </Directory>

</VirtualHost>

The file '/some/path/servers.txt' would contain:

dynamic grahamd-1|grahamd-2|grahamd-3|grahamd-4|grahamd-5

In this configuration, five daemon processes are created. For each
request, mod_rewrite randomly selects one of them (falling back to
'grahamd-1' if the map lookup fails), and the request is sent there.
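
The random selection done by the rnd: map can be mimicked in a few
lines of Python, which may help when reasoning about how requests get
spread. This is a sketch of the behaviour, not mod_rewrite's code:
load_rnd_map and lookup are hypothetical helpers, and comment handling
is an assumption.

```python
import random


def load_rnd_map(text):
    """Parse 'key value1|value2|...' lines, as in the servers.txt map file.

    Blank lines and lines starting with '#' are skipped.
    """
    table = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or parts[0].startswith("#"):
            continue
        key, values = parts
        table[key] = values.split("|")
    return table


def lookup(table, key, default, rng=random):
    """Mimic ${servers:key|default}: a random alternative, else default."""
    choices = table.get(key)
    if not choices:
        return default
    return rng.choice(choices)
```

Each request is an independent random draw, so the spread is only even
on average; nothing here knows whether a chosen daemon is already busy.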

Even once the daemon processes use multithreading, you could still
employ this technique if you are concerned about having only one
process available for an application.

The important thing in all of this is to keep mod_wsgi as simple as
possible. If some simple form of load balancing is still needed, the
fact that the daemon process name can be set from a variable means
that mod_rewrite can be used to control it. One could even use
mod_python or mod_perl to implement a more sophisticated scheme; they
just need to set the variable in the Apache subprocess environment
table, and mod_wsgi will pick it up if the directive refers to it.
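
The variable lookup can be pictured with a small Python sketch that
resolves a directive value such as %{ENV:WSGI_DAEMON_PROCESS} against
the per-request environment. The helpers resolve_process_name and
route are hypothetical, and what mod_wsgi actually does when the
variable is unset or names an unknown daemon is an assumption here
(the sketch returns None and raises LookupError respectively).

```python
import re

# Matches a whole directive value of the form %{ENV:NAME}.
_ENV_REF = re.compile(r"%\{ENV:([^}]+)\}")


def resolve_process_name(directive_value, subprocess_env):
    """Return the daemon process name a request should be sent to.

    A literal name is used as-is; %{ENV:NAME} is looked up in the
    per-request environment that mod_rewrite, mod_python or mod_perl may
    have populated. Returns None if the variable is unset.
    """
    match = _ENV_REF.fullmatch(directive_value)
    if match is None:
        return directive_value
    return subprocess_env.get(match.group(1))


def route(directive_value, subprocess_env, daemons):
    """Resolve the name and check it against the configured daemons."""
    name = resolve_process_name(directive_value, subprocess_env)
    if name not in daemons:
        raise LookupError("no WSGIDaemonProcess named %r" % (name,))
    return name
```

Because the directive only reads a variable, whatever sets that
variable (mod_rewrite or a custom handler) decides the balancing
policy.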

Hope you can get your head around this quick explanation. As I said,
it may not fit everyone's idea of the best way of doing it, but I have
done it in the simplest way possible with what the Apache libraries
have to offer, so that I don't have to write a whole separate
supervisor system for daemon processes and so that it can all be
packaged within the one module, making it easy to install.

Hope you find it interesting.

Graham
