http://pacopablo.com/irclogs/2007/12/12
I know some of the participants in that discussion monitor this list,
so thought I might respond here to one point brought up in the
discussion.
The comment which was made was:
[06:04:10] <__doc__> btw. I don't like that mod_wsgi expects one
particular name in a module to be the application.
[06:04:24] <djc> you can change that using a config directive
[06:04:25] <aat> as opposed to...?
[06:05:00] <__doc__> well, I don't like the whole file idea.
[06:05:23] <__doc__> how about we specify an entry point and package
name (version) in setuptools semantics for apache?
[06:05:52] <__doc__> (as an additional alternative for loading applications)
In part I object to the idea of supporting one particular delivery
solution for WSGI applications over another, which is why I go for the
lowest common denominator of requiring a file. Also, a file is used
rather than a configuration option pointing to a module/package name
as the file is required by Apache in order that SCRIPT_NAME can be set
correctly. The reason that mod_python is a pain for WSGI is that it
doesn't use a file as application marker and so SCRIPT_NAME is
generally always wrong and has to be set manually.
Anyway, that said, in Subversion trunk for mod_wsgi and thus in 2.0c4,
is support for a new configuration directive allowing a proxy handler
script/application to be specified. When this is defined, rather than
the file target which the URL maps to being loaded as the WSGI
application script, the handler script is loaded and executed instead.
That handler can then use SCRIPT_FILENAME to work out the true target
of the URL and thus treat that in a special way.
What this would mean is that the target of the URL doesn't actually
need to be a WSGI application script file, but something else which is
interpreted by the handler script in some special way.
One example is that the target file could be a traditional CGI script
implemented in Python. The handler script could then do various magic,
similar to what mod_python.cgihandler does, to allow an unmodified CGI
script to work in mod_wsgi.
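As a rough illustration of the CGI case, here is a minimal sketch of such a handler script. It runs the target script in-process, captures what it prints, and turns the CGI header block into a WSGI response. It is not how mod_python.cgihandler is implemented. It ignores request bodies (stdin), and because it swaps os.environ and sys.stdout it is only safe in a single-threaded process. The name run_cgi_script is made up for this example.

```python
import io
import os
import runpy
import sys
from contextlib import redirect_stdout


def run_cgi_script(path, cgi_environ):
    """Run a CGI script in-process and return everything it printed."""
    saved_environ = dict(os.environ)
    captured = io.StringIO()
    try:
        # CGI scripts read request details from process environment variables.
        os.environ.update(cgi_environ)
        with redirect_stdout(captured):
            try:
                runpy.run_path(path, run_name="__main__")
            except SystemExit:
                pass  # CGI scripts commonly finish by calling sys.exit()
    finally:
        os.environ.clear()
        os.environ.update(saved_environ)
    return captured.getvalue()


def application(environ, start_response):
    # Pass on only plain string values; wsgi.* keys are not CGI variables.
    cgi_environ = {
        key: value
        for key, value in environ.items()
        if isinstance(value, str) and not key.startswith("wsgi.")
    }
    output = run_cgi_script(environ["SCRIPT_FILENAME"], cgi_environ)

    # CGI output is a header block, a blank line, then the body.
    header_block, _, body = output.partition("\n\n")
    status = "200 OK"
    headers = []
    for line in header_block.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "status":
            status = value.strip()
        else:
            headers.append((name.strip(), value.strip()))

    start_response(status, headers)
    return [body.encode("utf-8")]
```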
Another example is that the target file could be a template file for
one of the many Python templating systems. In other words, one uses
Apache in combination with mod_wsgi handler script as a very cheap URL
mapping system. Because the URL mapping is all done in Apache in C
code, it is very quick, and you can make use of mod_rewrite and all manner
of such things.
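To make the template idea concrete, here is a minimal sketch of a handler script that treats the file Apache mapped the URL to as a template. It uses string.Template from the standard library purely as a stand-in for a real templating system, and filling placeholders from the query string is an assumption made for the example.

```python
import html
from string import Template
from urllib.parse import parse_qs


def application(environ, start_response):
    # Apache has already resolved the URL to a template file on disk.
    with open(environ["SCRIPT_FILENAME"], encoding="utf-8") as handle:
        template = Template(handle.read())

    # Use the first value of each query parameter, HTML-escaped.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    values = {name: html.escape(items[0]) for name, items in query.items()}

    # safe_substitute leaves unknown placeholders in place instead of raising.
    body = template.safe_substitute(values).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```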
Yet another example is that the target file could be a .ini file for
Pylons. The handler script would then be something like the following.
Note that I have ignored thread safety and reloading issues for now.
from paste.deploy import loadapp

# Cache of loaded applications, keyed by the path of the .ini file.
_applications = {}

def application(environ, start_response):
    filename = environ['SCRIPT_FILENAME']
    if filename not in _applications:
        # First request for this .ini file: load the application it describes.
        _applications[filename] = loadapp('config:%s' % filename)
    return _applications[filename](environ, start_response)
In other words, you could dump Pylons .ini files in a directory and
they would be automatically served up as independent WSGI
applications. Map to the files in the right way in the configuration
and you don't need to use .ini in the URL. For example:
  WSGIScriptAliasMatch ^/pylons/(.*) /usr/local/wsgi/pylons/$1.ini

  <Directory /usr/local/wsgi/pylons>
    Order deny,allow
    Allow from all
    WSGIHandlerScript /usr/local/wsgi/map_ini_file_to_pylons_app.py
  </Directory>
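The handler script above ignores thread safety and reloading. One way those two gaps might be filled is sketched below: a small cache that takes a lock around loading and reloads an application when its .ini file's modification time changes. The class name ReloadingCache and the loader argument are made up for this example. In the Pylons case the loader could be something like lambda path: loadapp('config:%s' % path).

```python
import os
import threading


class ReloadingCache:
    """Cache one loaded object per file, reloading when the file changes."""

    def __init__(self, loader):
        self._loader = loader          # callable: file path -> application
        self._lock = threading.Lock()
        self._entries = {}             # path -> (mtime, application)

    def get(self, path):
        mtime = os.path.getmtime(path)
        # Holding the lock while loading means each file is loaded only once
        # even when several request threads arrive at the same time.
        with self._lock:
            entry = self._entries.get(path)
            if entry is None or entry[0] != mtime:
                entry = (mtime, self._loader(path))
                self._entries[path] = entry
            return entry[1]
```

Holding a single lock during loading is the simplest choice. It serialises loads of different files, which is probably acceptable when loads are rare.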
Finally, if there was a convention for how the entry point worked, one
could even perhaps have the target file be a self contained egg file
and the handler script could map to that.
Anyway, the point is that mod_wsgi 2.0 will have extra flexibility to
allow customised mechanisms for what an actual entry point file looks
like or what it represents.
Hope this is of interest. Oh and sorry, the above is the only
documentation on this at the moment. :-)
Graham
Ah, you're watching us! You should come and hang out.
Anyway, sounds like you have some interesting ideas here, thanks for that.
Cheers,
Manuzhai (djc)
Timezones and lack of access to IRC from work mean that most of the
time you aren't on when I am able to be on, so I just catch up on what
is happening from web based irclogs.
At least I know now who djc is. ;-)
Graham
<snip>
> [Currently, a] file is used rather than a configuration option
> pointing to a module/package name as the file is required by
> Apache in order that SCRIPT_NAME can be set correctly. The
> reason that mod_python is a pain for WSGI is that it doesn't
> use a file as application marker and so SCRIPT_NAME is
> generally always wrong and has to be set manually.
This is exactly what I like about mod_wsgi: it is simple.
> Anyway, that said, in Subversion trunk for mod_wsgi and thus
> in 2.0c4, is support for a new configuration directive
> allowing a proxy handler script/application to be specified.
> When this is defined, rather than the file target which the
> URL maps to being loaded as the WSGI application script, the
> handler script is loaded and executed instead.
> That handler can then use SCRIPT_FILENAME to work out the
> true target of the URL and thus treat that in a special way.
I don't understand the need for this. I would run the handler script as
the WSGI application, pass it the path to the configuration file as a
custom key in the WSGI environ, and let the handler script fiddle with
PATH_INFO, SCRIPT_NAME, et al. before forwarding the request to the WSGI
application. That can all be done using pure WSGI, without adding
dependencies on non-WSGI stuff like SCRIPT_FILENAME. I think all that is
needed is clear documentation about how to add WSGI environ parameters
from the Apache config file and from .htaccess.
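A minimal sketch of the pure-WSGI approach described here, under some assumptions: the key EXAMPLE_CONFIG_DIR is a made-up name standing in for whatever is set from the Apache configuration, the loader argument is a hypothetical callable that turns a config path into a WSGI application, and the first URL segment is taken to name the application.

```python
from wsgiref.util import shift_path_info


class ConfigDispatcher:
    """Route /<name>/... to an application built from <config_dir>/<name>.ini."""

    def __init__(self, loader):
        self._loader = loader          # callable: config path -> WSGI application
        self._applications = {}

    def __call__(self, environ, start_response):
        # EXAMPLE_CONFIG_DIR is a made-up key, assumed to be set by the server.
        config_dir = environ["EXAMPLE_CONFIG_DIR"]
        # Moves the first PATH_INFO segment onto the end of SCRIPT_NAME.
        name = shift_path_info(environ)
        # Accept only simple names so the URL cannot escape config_dir.
        if not name or not name.isalnum():
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found"]
        path = "%s/%s.ini" % (config_dir, name)
        if path not in self._applications:
            self._applications[path] = self._loader(path)
        return self._applications[path](environ, start_response)
```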
> One example is that the target file could be a traditional
> CGI script implemented in Python. The handler script could
> then do various magic, similar to what mod_python.cgihandler
> does, to allow an unmodified CGI script to work in mod_wsgi.
If the developer/deployer doesn't/can't modify their CGI scripts to be
WSGI-compliant, they probably aren't going to bother with mod_wsgi
either. A lot of CGI scripts now are being written using WSGI with a CGI
wrapper, and I think that is the practice that should be encouraged.
> Another example is that the target file could be a template
> file for one of the many Python templating systems. In other
> words, one uses Apache in combination with mod_wsgi handler
> script as a very cheap URL mapping system. Because the URL
> mapping is all done in Apache in C code, very quick, and you
> can make use of mod_rewrite and all manner of such things.
The dispatching code often does more than just pick a template, so this
will usually not be possible. Generally, we would configure Apache to
serve static resources directly, so whatever dispatching is done in
Python is going to lead to dynamically generated content, and the time
to generate the content will dominate the dispatching time.
I would rather see a good wsgi.file_wrapper implementation so that I can
do all my dispatching in my WSGI application, and still get good
performance when serving static resources.
> In other words, you could dump Pylons .ini files in a
> directory and they would be automatically served up as
> independent WSGI applications. Map to the files in the right
> way in the configuration and you don't need to use .ini in
> the URL. For example:
>
> WSGIScriptAliasMatch ^/pylons/(.*) /usr/local/wsgi/pylons/$1.ini
>
> <Directory /usr/local/wsgi/pylons>
> Order deny,allow
> Allow from all
> WSGIHandlerScript /usr/local/wsgi/map_ini_file_to_pylons_app.py
> </Directory>
It is likely that each Pylons application will need its own unique set
of configuration directives anyway, to integrate with Apache's
authentication, authorization, caching, etc.
> Finally, if there was a convention for how the entry point
> worked, one could even perhaps have the target file be a self
> contained egg file and the handler script could map to that.
That would be a nice feature, but I think it can (and should) be
implemented in Python by a (WSGI-compliant) framework instead.
> Anyway, the point is that mod_wsgi 2.0 will have extra
> flexibility to allow customised mechanisms for what an actual
> entry point file looks like or what it represents.
I first read about mod_wsgi by coming across your "Commodity shared
hosting and mod_wsgi" and related articles/comments/emails. Like you
pointed out, the main concerns that people have about deploying mod_wsgi
are about WSGI compliance, security, resource management (particularly
process management), and ease-of-installation.
Ease-of-installation is mostly a matter of convincing people who provide
Apache to also provide mod_wsgi, so that it is pre-installed for most
people.
I think that this new feature is counter-productive to ensuring that
mod_wsgi is secure, because it adds more code to mod_wsgi, and it adds
to confusion regarding configuration. Further, in your Pylons INI
example, mod_wsgi has to inspect the directory containing the INI files,
whereas before it didn't even have to be able to see it. The example
configuration is thus less secure than the "traditional" mod_wsgi
configuration.
Contrast this with the new(-ish) authentication integration feature,
which adds code to mod_wsgi, but which also reduces the amount of
configuration that needs to be done for applications; applications can
now fully delegate authentication to Apache instead of spreading
authentication between Apache and the Python application/framework.
In summary, my personal opinion is that it is better to improve process
management and memory management, and try to delegate other features to
frameworks, whenever possible. Especially, if there is already a way of
doing something then there is no need to provide an alternative way
purely for convenience; that is the specialty of the framework
developers.
Regards,
Brian ($0.02)
I agree 100% with Brian. The appeal of mod_wsgi is that it's simple.
Adding features that can just as easily be done in Python just adds more
complexity, more configuration options, more potential for problems.
Of course, this is a common problem: as the user base grows users make
more and more feature requests.
Alec (aat on IRC)
--
Evolution: Taking care of those too stupid to take care of themselves.
To do that means though that your WSGI application has to perform the
task of mapping URLs to files in the file system. In other words,
Apache already has this URL mapping system including access control
mechanisms and you are effectively saying I don't care about that and
instead are doing it yourself in a slower language.
I am looking here at the bigger picture of Apache as a web application
framework, whereas anyone using WSGI these days sees WSGI as all that
matters and does everything in Python, thereby losing out on
practically all that Apache has to offer, including that it does some
things a lot faster than Python ever will.
In practice I expect that hardly anyone will use this feature.
But in a few situations it can be a quick and simple way of
achieving things which would take more effort if using Python/WSGI.
In time I will add other features that will allow closer integration with
Apache. For example, allowing access from Python to the Apache auth provider
mechanism. A web application could then make use of an
Apache module which provides authentication facilities from a database
or LDAP, without all that this entails having to be
reproduced in Python code, as is currently the case.
That these sorts of features will exist doesn't mean you have to use
them. I fully expect to see exactly what happens with mod_python. 98%
of people merely use it as a jump off point for a pure Python web
application unrelated to Apache. Those who actually use it as a
platform for doing stuff with Apache are few and far between.
> > One example is that the target file could be a traditional
> > CGI script implemented in Python. The handler script could
> > then do various magic, similar to what mod_python.cgihandler
> > does, to allow an unmodified CGI script to work in mod_wsgi.
>
> If the developer/deployer doesn't/can't modify their CGI scripts to be
> WSGI-compliant, they probably aren't going to bother with mod_wsgi
> either. A lot of CGI scripts now are being written using WSGI with a CGI
> wrapper, and I think that is the practice that should be encouraged.
Although I agree with pushing people to convert code from plain old
CGI to WSGI and then, if need be, use a CGI-WSGI bridge, I think you
might be surprised at how many would settle for a quick way of making
existing CGI scripts work faster.
There was a bit of a discussion a while back on comp.lang.python about
this and there were a few people interested in being able to run
unaltered CGI scripts under mod_wsgi.
> > Another example is that the target file could be a template
> > file for one of the many Python templating systems. In other
> > words, one uses Apache in combination with mod_wsgi handler
> > script as a very cheap URL mapping system. Because the URL
> > mapping is all done in Apache in C code, very quick, and you
> > can make use of mod_rewrite and all manner of such things.
>
> The dispatching code often does more than just pick a template, so this
> will usually not be possible. Generally, we would configure Apache to
> serve static resources directly, so whatever dispatching is done in
> Python is going to lead to dynamically generated content, and the time
> to generate the content will dominate the dispatching time.
Without talking specifics and providing examples, it may be hard to
get an understanding of what I am thinking.
One example of what I could see is taking Cheetah template files and
using Cheetah's ability to compile the templates to Python code. Just
dumping these compiled code files into a directory isn't of much use
as there is no URL mapping system provided nor are they WSGI
applications in their own right. This is where a handler script would
be a really quick way of allowing those compiled templates to be
served through Apache as Apache/mod_wsgi would perform the URL
mapping. All the handler script does is bridge WSGI to the particular
compiled template file targeted by Apache URL mapping mechanism.
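As a rough sketch of that bridging step, the handler below imports the Python file that Apache mapped the URL to and asks it for the page. The render(environ) function is a hypothetical convention invented for this example, not the interface Cheetah actually generates, so a real bridge would call whatever the compiled template exposes. The module is re-imported on every request to keep the example short.

```python
import importlib.util


def load_module(path):
    """Import a Python source file by path, without touching sys.path."""
    spec = importlib.util.spec_from_file_location("compiled_template", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def application(environ, start_response):
    # SCRIPT_FILENAME is the compiled template chosen by Apache URL mapping.
    module = load_module(environ["SCRIPT_FILENAME"])
    # 'render' is a hypothetical entry point, assumed to return a str.
    body = module.render(environ).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```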
> I would rather see a good wsgi.file_wrapper implementation so that I can
> do all my dispatching in my WSGI application, and still get good
> performance when serving static resources.
I would in part agree and I have looked at it. Part of the reason for
not doing much about it at this point is that most of the major
frameworks don't even use this optional part of WSGI. Further, it
isn't necessarily the best way of doing it anyway, at least within the
context of Apache.
Part of the problem with wsgi.file_wrapper is that it deals with the
data only and doesn't really address issues around HTTP response
headers. This means that user code still needs to set, if necessary,
headers which reflect content type and encoding, and which affect
caching etc. Most people wouldn't know what to do for stuff like
caching and wouldn't bother.
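To illustrate the split between data and headers, here is a minimal sketch of serving a file through wsgi.file_wrapper when the server provides it. The wrapper only takes care of delivering the bytes, so the application code still has to work out every header itself. The function name serve_file and the choice of headers are assumptions for this example, and conditional requests and caching headers are left out.

```python
import mimetypes
import os
from email.utils import formatdate

BLOCK_SIZE = 8192


def serve_file(path, environ, start_response):
    # The application, not the file wrapper, is responsible for the headers.
    content_type, encoding = mimetypes.guess_type(path)
    stat = os.stat(path)
    headers = [
        ("Content-Type", content_type or "application/octet-stream"),
        ("Content-Length", str(stat.st_size)),
        ("Last-Modified", formatdate(stat.st_mtime, usegmt=True)),
    ]
    if encoding:
        headers.append(("Content-Encoding", encoding))
    start_response("200 OK", headers)

    handle = open(path, "rb")
    wrapper = environ.get("wsgi.file_wrapper")
    if wrapper is not None:
        # The server may use a faster platform-specific way to send the file.
        return wrapper(handle, BLOCK_SIZE)
    # Fallback when the optional extension is absent: read in blocks.
    return iter(lambda: handle.read(BLOCK_SIZE), b"")
```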
At least for the case of choosing actual static files, I feel that the
better way would be if the CGI notion of returning a Location response
header for a 200 status were implemented. In other words, the WSGI
application can say that it wants a static file under a different URL to
be served up, i.e., an internal redirect. Because this is handled as an
internal redirection then Apache would serve up automatically all the
appropriate response headers. At the point of serving up the file it
would also not involve Python code to deliver parts of the file and so
would be quicker. There is an old ticket for this one:
http://code.google.com/p/modwsgi/issues/detail?id=14
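From the application side, the idea would look something like the sketch below. This only shows what the WSGI application would return. Whether the server then performs an internal redirect depends on the feature in that ticket being implemented, and the /static prefix is a made-up layout for the example.

```python
def application(environ, start_response):
    # Hypothetical layout: static files are published under /static/.
    target = "/static" + environ.get("PATH_INFO", "")
    # A 200 status plus a local Location path is the signal described above.
    # The application sends no content itself.
    start_response("200 OK", [("Location", target), ("Content-Length", "0")])
    return [b""]
```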
Anyway, back to wsgi.file_wrapper, to get the most out of Apache this
needs to work directly in conjunction with the Apache output bucketing
system. To do this needs a bit of thought and since there didn't seem
to be much that used it anyway, I put it all on hold. There is an old
ticket for it also:
http://code.google.com/p/modwsgi/issues/detail?id=5
> > In other words, you could dump Pylons .ini files in a
> > directory and they would be automatically served up as
> > independent WSGI applications. Map to the files in the right
> > way in the configuration and you don't need to use .ini in
> > the URL. For example:
> >
> > WSGIScriptAliasMatch ^/pylons/(.*) /usr/local/wsgi/pylons/$1.ini
> >
> > <Directory /usr/local/wsgi/pylons>
> > Order deny,allow
> > Allow from all
> > WSGIHandlerScript /usr/local/wsgi/map_ini_file_to_pylons_app.py
> > </Directory>
>
> It is likely that each Pylons application will need its own unique set
> of configuration directives anyway, to integrate with Apache's
> authentication, authorization, caching, etc.
Not necessarily. More often than not from what I have seen, the Pylons
folks are purists. Even if you can get them to use Apache/mod_wsgi,
which many religiously wouldn't even consider and will even tell you
it is evil despite not having used it, then they are more likely to
have a self contained WSGI application which does all its
authentication internally. Thus, all that is necessary is to tell
mod_wsgi to pass HTTP authentication information to it and Pylons will
be happy. All their other information generally comes from the .ini
file also and not from WSGI environment variables pushed to the
application from the Apache configuration.
That said, using the handler script in this way for Pylons application
doesn't preclude individual applications having different Apache
configuration. This is because the Location/Directory/Files directive
containers in Apache configuration can still be used to limit
configuration to apply to specific applications.
> > Finally, if there was a convention for how the entry point
> > worked, one could even perhaps have the target file be a self
> > contained egg file and the handler script could map to that.
>
> That would be a nice feature, but I think it can (and should) be
> implemented in Python by a (WSGI-compliant) framework instead.
And well you could, but this is exactly what the people in some
discussions say they didn't want to do. Ie., their ideal for ISP based
WSGI hosting was to simply be able to drop an egg in an appropriate
spot and it would be automatically served. They didn't want to have to
themselves add a wrapper WSGI application around it to handle mapping
to the egg. This is where the handler script proxy may be useful as it
could be used by an ISP to implement a mechanism dictating how they
allow users to setup their WSGI applications. One approach for doing
this may be by mapping to an egg file.
A handler script might also be useful to an ISP as an invisible
wrapper around users' applications. For example, they might use the
handler script to introduce a WSGI middleware application for doing
interactive debugging, or mailing out of exceptions raised by
applications, where whether it is enabled is dependent on some flag
set through the user's control panel. Thus an ISP could offer value-add
features such as interactive debugging, or error mailings etc., thereby
meaning the user doesn't have to go find and install such things
themselves and specifically set them up as wrappers around their WSGI
application.
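A minimal sketch of that kind of invisible wrapper follows. The is_enabled and notify callables are hypothetical hooks: the first stands in for the control panel flag lookup and the second for whatever sends the mail. Only exceptions raised while the application is being called are reported here, not ones raised later while the response is iterated.

```python
import traceback


class ErrorReporter:
    """WSGI middleware that reports unhandled exceptions when enabled."""

    def __init__(self, application, is_enabled, notify):
        self._application = application
        self._is_enabled = is_enabled  # callable: environ -> bool
        self._notify = notify          # callable: traceback text -> None

    def __call__(self, environ, start_response):
        try:
            return self._application(environ, start_response)
        except Exception:
            if self._is_enabled(environ):
                self._notify(traceback.format_exc())
            # Re-raise so the server still produces its normal error response.
            raise
```

The handler script would then return ErrorReporter(user_application, ...) in place of the user's application, so the user never has to install or configure the wrapper.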
> I think that this new feature is counter-productive to ensuring that
> mod_wsgi is secure, because it adds more code to mod_wsgi, and it adds
> to confusion regarding configuration. Further, in your Pylons INI
> example, mod_wsgi has to inspect the directory containing the INI files,
> whereas before it didn't even have to be able to see it. The example
> configuration is thus less secure than the "traditional" mod_wsgi
> configuration.
It is no less secure. First off, the handler script can only be set in
the main Apache configuration, so it is only available to people who own
the web server. In a web hosting environment normal users wouldn't have access
to it. Secondly, the handler script is executed in the exact same
context as the normal target WSGI application, and thus only has what
access rights the WSGI application had. So the handler script only has
ability to inspect INI files and the directory they are in because the
original WSGI application did. There isn't anything dangerous about
it.
> In summary, my personal opinion is that it is better to improve process
> management and memory management, and try to delegate other features to
> frameworks, whenever possible. Especially, if there is already a way of
> doing something then there is no need to provide an alternative way
> purely for convenience; that is the specialty of the framework
> developers.
Again, I am looking at bigger fish than just WSGI applications. I am
looking at those who want to use Apache as a development platform and
also at ISPs who need hooks such as this to allow them to better
manage and/or control things or which allows them to provide value add
features. It doesn't mean I am ignoring things like memory management
and in some respects having this handler script functionality may be
an important part of implementing memory constraints.
This is because implementing memory usage checks in mod_wsgi C code
core is actually a pain as APIs for such features aren't portable and
different ways of doing it are required on different platforms. Thus,
it may be more effective for the mod_wsgi C code core to not do it,
but instead an ISP can use a handler script as wrapper around a users
application as a means of triggering code, including perhaps running a
monitoring thread, which looks at memory in use and which forces
daemon process restarts when memory usage goes over certain levels.
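A minimal sketch of such a monitoring thread is shown below, with several assumptions. It reads peak memory through the Unix-only resource module, which is exactly the kind of platform-specific call mentioned above. It assumes the daemon process shuts down and gets replaced when it receives SIGINT. The names MemoryMonitor, current_memory_kb and request_restart are made up for the example.

```python
import os
import signal
import threading


def current_memory_kb():
    """Peak resident set size of this process (Unix only)."""
    import resource  # imported lazily: not available on Windows
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def request_restart():
    # Assumption: the process exits cleanly on SIGINT and is then replaced.
    os.kill(os.getpid(), signal.SIGINT)


class MemoryMonitor:
    def __init__(self, limit_kb, interval=10.0,
                 get_usage=current_memory_kb, on_exceed=request_restart):
        self._limit_kb = limit_kb
        self._interval = interval
        self._get_usage = get_usage
        self._on_exceed = on_exceed
        self._stop = threading.Event()

    def check(self):
        """Trigger on_exceed once usage is over the limit."""
        if self._get_usage() > self._limit_kb:
            self._on_exceed()
            return True
        return False

    def _run(self):
        # Event.wait returns False on timeout, so this polls until stopped.
        while not self._stop.wait(self._interval):
            if self.check():
                break

    def start(self):
        thread = threading.Thread(target=self._run, daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()
```

A handler script could call MemoryMonitor(limit_kb=...).start() once at import time, before handing requests on to the user's application.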
Over time when I start to document the features in 2.04, it will
hopefully become more obvious what I have in mind. It is just one
little part of an overall system for making things possible. Without
that context it might not be obvious why it is needed. It may also be
a while before it becomes clear though as there are other software
components, which would be separate to mod_wsgi itself, which are a
part of the puzzle that need to be done. This, and some of the other
bits added in 2.0c4 are also necessary precursors to allow ISPs to be
able to easily configure aspects of mod_wsgi which will only appear in
version 3.0, such as the support for transient daemon processes.
Trust me he says, I know what I am doing, there is a grand plan and it
is good. ;-)
Graham
Congratulations on the birth of your child!
Graham wrote:
> On 14/12/2007, Brian Smith <br...@briansmith.org> wrote:
> > I don't understand the need for this. I would run the
> > handler script as the WSGI application, pass it the path
> > to the configuration file as a custom key in the WSGI
> > environ, and let the handler script fiddle with PATH_INFO,
> > SCRIPT_NAME, et al. before forwarding the request to the
> > WSGI application. That can all be done using pure WSGI,
> > without adding dependencies on non-WSGI stuff like
> > SCRIPT_FILENAME. I think all that is needed is clear
> > documentation about how to add WSGI environ parameters
> > from the Apache config file and from .htaccess.
>
> To do that means though that your WSGI application has to
> perform the task of mapping URLs to files in the file system.
> In other words, Apache already has this URL mapping system
> including access control mechanisms and you are effectively
> saying I don't care about that and instead are doing it
> yourself in a slower language.
Yes, you understand me correctly.
> I am looking here at the bigger picture of Apache as a web
> application framework, whereas anyone using WSGI these days
> sees WSGI as all that matters and does everything in Python,
> thereby losing out on practically all that Apache has to
> offer, including that it does some things a lot faster than
> Python ever will.
<snip>
> That these sort of features will exist doesn't mean you have
> to use them. I fully expect to see exactly what happens with
> mod_python. 98% of people merely use it as a jump off point
> for a pure Python web application unrelated to Apache. Those
> who actually use it as a platform for doing stuff with Apache
> are few and far between.
I agree. That is why I think that this new functionality and the
embedded execution mode, should be factored out into a separate module.
That would make it really easy for a shared hosting provider to provide
a "98%" solution that they can feel confident about the security and
manageability of.
> Although I agree with pushing people to convert code from
> plain old CGI to WSGI and then, if need be, use a CGI-WSGI
> bridge, I think you might be surprised at how many would
> settle for a quick way of making existing CGI scripts work faster.
I don't doubt that. I just don't think it is a good idea to complicate
the core of mod_wsgi to support that situation.
> > I would rather see a good wsgi.file_wrapper implementation
> > so that I can do all my dispatching in my WSGI application,
> > and still get good performance when serving static resources.
>
> I would in part agree and I have looked at it. Part of the
> reason for not doing much about it at this point is that most
> of the major frameworks don't even use this optional part of
> WSGI. Further, it isn't necessarily the best way of doing it
> anyway, at least within the context of Apache.
>
> Part of the problem with wsgi.file_wrapper is that it deals
> with the data only and doesn't really address issues around
> HTTP response headers. This means that user code still needs
> to set, if necessary, headers which reflect content type and
> encoding, and which affect caching etc. Most people
> wouldn't know what to do for stuff like caching and wouldn't bother.
I guess I am not typical then; I have written some WSGI apps (no
framework) that could make use of wsgi.file_wrapper, and which handle
all the headers that you mention, sometimes better than Apache could.
Unfortunately, wsgi.file_wrapper is usually not available in any setting
so that has been a wasted effort.
> At least for case of choosing actual static files, I feel
> that the better way would be if the CGI notion of returning a
> Location response header for a 200 status were implemented.
> In other words the WSGI application can say that it wants
> static file under a different URL to be served up, ie.,
> internal redirect. Because this is handled as an internal
> redirection then Apache would serve up automatically all the
> appropriate response headers. At the point of serving up the
> file it would also not involve Python code to deliver parts
> of the file and so would be quicker.
I agree that would be useful, but it would be better if it was in the
WSGI (2.0) specification, so that WSGI applications could rely on it.
> > > Finally, if there was a convention for how the entry
> > > point worked, one could even perhaps have the target
> > > file be a self contained egg file and the handler
> > > script could map to that.
> >
> > That would be a nice feature, but I think it can (and should) be
> > implemented in Python by a (WSGI-compliant) framework instead.
>
> And well you could, but this is exactly what the people in
> some discussions say they didn't want to do. Ie., their ideal
> for ISP based WSGI hosting was to simply be able to drop an
> egg in an appropriate spot and it would be automatically
> served.
I do agree that, if an application (say Trac) has been installed via the
operating system's package manager or by easy_install or whatever, then
we shouldn't need a separate script file for it. But, to be useful for
my pre-packaged applications, I need the mechanism to work when the user
can only edit .htaccess.
> A handler script might also be useful to an ISP as an
> invisible wrapper around users applications. For example,
> they might use the handler script to introduce a WSGI
> middleware application for doing interactive debugging, or
> mailing out of exceptions raised by applications, where
> whether it is enabled is dependent on some flag set through
> the users control panel.
I agree, but I think that this is a little bit like putting the cart
before the horse. Most hosting providers that I have seen want to
provide as little as they can get away with, with as little effort as
possible, with as much reliability as possible, and have it be as simple
to understand as possible. For example, it would be doubtful that
DreamHost or 1&1 would be quick to provide these value-added features,
from my experience. (More on this below.)
> > I think that this new feature is counter-productive to
> > ensuring that mod_wsgi is secure, because it adds more code
> > to mod_wsgi, and it adds to confusion regarding configuration.
> > Further, in your Pylons INI example, mod_wsgi has to inspect
> > the directory containing the INI files, whereas before it
> > didn't even have to be able to see it. The example
> > configuration is thus less secure than the "traditional"
> > mod_wsgi configuration.
>
> It is no less secure. First off the handler script can only
> be set in main Apache configuration so only available to
> people who own the web server. In a web hosting environment
> normal users wouldn't have access to it. Secondly, the
> handler script is executed in the exact same context as the
> normal target WSGI application, and thus only has what access
> rights the WSGI application had. So the handler script only
> has ability to inspect INI files and the directory they are
> in because the original WSGI application did. There isn't
> anything dangerous about it.
Okay, the access rights of the handler script were not clear to me.
> Over time when I start to document the features in 2.04, it
> will hopefully become more obvious what I have in mind. It is
> just one little part of an overall system for making things
> possible. Without that context it might not be obvious why it
> is needed. It may also be a while before it becomes clear
> though as there are other software components, which would be
> separate to mod_wsgi itself, which are a part of the puzzle
> that need to be done. This, and some of the other bits added
> in 2.0c4 are also necessary precursors to allow ISPs to be
> able to easily configure aspects of mod_wsgi which will only
> appear in version 3.0, such as the support for transient
> daemon processes.
>
> Trust me he says, I know what I am doing, there is a grand
> plan and it is good. ;-)
Okay, I will trust you. My view has been that shared hosting providers
are only going to start providing mod_wsgi if it provides a
*significant* advantage over FastCGI in manageability, and only if it
provides no obvious drawbacks. My suggestions about keeping mod_wsgi
simple are aimed at the "no obvious drawbacks" part. However, if these
new features are prerequisites to ISP-acceptable resource management,
then it is hard to object to them. If you've already talked to the
hosting providers about what they want mod_wsgi to do, then you will
have a much better idea than me.
By the way, it would be useful to hear your thoughts on mod_wsgi daemon
mode vs. FastCGI regarding performance and manageability. Especially,
mod_wsgi (.htaccess/SetHandler) vs. common FastCGI configurations used
by shared hosting providers.
- Brian
Getting information out of large scale commodity web hosting companies
seems to be really hard. At least the ones I talked to in the end
didn't want to provide too much information about how they actually
configure their systems. This makes it really hard to work out how one
should implement mod_wsgi so as to make it easy to fit into their
existing systems. :-(
> By the way, it would be useful to hear your thoughts on mod_wsgi daemon
> mode vs. FastCGI regarding performance and manageability.
For specific web applications, there is possibly not much difference in
performance, as the bottleneck is generally not in mod_wsgi/fastcgi
but in the web application or database. That said, in my unscientific
tests, mod_wsgi daemon mode sits on a relative scale at 500 vs 250-300 for
mod_fastcgi/flup. Higher value is better.
The goal is take make mod_wsgi as easy as possible to manage. I don't
know how easy web hosting companies feel that fastcgi is to setup, but
you see enough problems with users actually getting it to work.
> Especially,
> mod_wsgi (.htaccess/SetHandler) vs. common FastCGI configurations used
> by shared hosting providers.
This is the problem area as far as commodity web hosting goes. This is
because they will not want to provide FileInfo override for users and
so users will not have access to AddHandler/SetHandler in a .htaccess
file. Often they will not even have use of a .htaccess file.
This means the best one would get is an AddHandler in the main Apache
configuration so as to allow one to have .wsgi script files which map
to a WSGI application.
Many don't like this as it means that it is hard to have an
application which appears to be mounted on the root of the site. Ie.,
the URL would need to be:
/django.wsgi/
rather than just:
/
You see a lot of mod_rewrite hacks being used to make it appear as the
root URL. But then, access to the ability to use rewrite hacks in
.htaccess also implies that certain overrides have been enabled for
the user, again something that typically wouldn't be done in commodity
web hosting setups.
If MultiViews is enabled and set up properly, then the URL can at least
drop the .wsgi extension and could use:
/django/
The other thing that can be done is for DirectoryIndex in the main
Apache configuration for the directory to list index.wsgi. This will
mean that:
/
can be used, but can't remember if that will allow for path info
beyond the URL prefix which maps to the directory. Even if it does,
the SCRIPT_NAME comes through including index.wsgi so WSGI application
would need to use a wrapper to rewrite it to drop it out so that it
then doesn't start appearing in redirect URLs generated by the
application. I need to go test this again to refresh my memory.
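A wrapper of the kind described above might look roughly like the following. This is only a sketch: the names strip_index_script and demo_app are made up for illustration, and it assumes that SCRIPT_NAME really does arrive ending in "/index.wsgi" under the DirectoryIndex setup, which is the part that still needs to be tested.

```python
def strip_index_script(application, script="index.wsgi"):
    """Wrap a WSGI application so SCRIPT_NAME no longer ends in the script name."""
    suffix = "/" + script

    def wrapper(environ, start_response):
        script_name = environ.get("SCRIPT_NAME", "")
        if script_name.endswith(suffix):
            # Drop the trailing "/index.wsgi" so that URLs generated by the
            # application use the directory prefix rather than the script file.
            environ["SCRIPT_NAME"] = script_name[:-len(suffix)]
        return application(environ, start_response)

    return wrapper


def demo_app(environ, start_response):
    # Hypothetical application that simply reports the mount point it sees.
    body = environ.get("SCRIPT_NAME", "").encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


# The name "application" is what the WSGI script file would expose.
application = strip_index_script(demo_app)
```

The wrapper leaves SCRIPT_NAME alone when it does not end in the script name, so the same script file should still behave if it is reached by its full URL.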
Anyway, after that ramble, I would really like to hear opinions on
what people's expectations are as far as what they should be able to do
in a commodity web hosting environment. That gives me a focus point as
to what needs to be made simple to do.
So, what sort of configuration do you expect you would want to do?
What do you think is and isn't possible with mod_wsgi now? How many
daemon process groups do you believe you want to have and do you
expect to be able to easily control to what daemon processes and
application groups the applications are delegated to run in?
Graham
The top 4 problems with FastCGI. In order:
1. Most users don't understand how FastCGI works well enough to diagnose
problems.
2. Most hosting providers don't offer it, or don't offer it in
conjunction with Python.
3. Hosting providers tend to "kill -9" FastCGI processes; that means
your program has to handle SIGKILL the same way that they should handle
SIGUSR1, instead terminating immediately.
4. I've heard that sometimes changes to the application don't take
effect immediately. But, I think that 99% of the time, this is an
instance of problem #1.
I think mod_wsgi is likely to suffer from the same problems.
> This is the problem area as far as commodity web hosting
> goes. This is because they will not want to provide FileInfo
> override for users and so user will not have access to
> AddHandler/SetHandler in a .htaccess file. Often they will
> not even have use of a .htaccess file.
I'm pretty sure that any hosting provider that provides FastCGI will
also have AllowOverride FileInfo Indexes, as well as mod_rewrite. At
least, this is true for all the providers I have seen.
> Many don't like this as it means that it is hard to have
> an application which appears to be mounted on the root of the
> site. Ie., the URL would need to be:
>
> /django.wsgi/
>
> rather than just:
>
> /
>
> You see a lot of mod_rewrite hacks being used to make it
> appear as the root URL.
PHP has this same problem and people seem to get the mod_rewrite hack to
work with WordPress without too much difficulty.
> Anyway, after that ramble, I would really like to hear
> opinions on what peoples expectations are as far as what they
> should be able to do in a commodity web hosting
> environment. That gives me a focus point as to what needs to
> be made simple to do.
I don't care about performance too much as long as it is not slower than
FastCGI. But, I do care about proper signal handling to allow for
graceful process shutdowns (finish the current request and exit), so
that hosting providers can avoid "kill -9" processes most of the time. I
would never use the embedded mode, only the daemon mode.
> So, what sort of configuration do you expect you would want to do?
Like I mentioned on Web-SIG, it would be nice to be able to control the
Python logging module from mod_wsgi and send the python logging output
to the appropriate Apache log files.
> What do you think is and isn't possible with mod_wsgi now?
For the users I am targeting, the main problem with mod_wsgi is that it
isn't available where they want to deploy their applications. For my own
use, I would like to completely disable the embedded mode for security
reasons.
- Brian
So, good documentation. I think I can say that mod_wsgi is already
heading in the right direction there. The documentation available for
FASTCGI solutions is really quite crappy.
> 2. Most hosting providers don't offer it, or don't offer it in
> conjunction with Python.
> 3. Hosting providers tend to "kill -9" FastCGI processes; that means
> your program has to handle SIGKILL the same way that they should handle
> SIGUSR1, instead terminating immediately.
In mod_wsgi there is already some support for graceful shutdown of
daemon processes. When a daemon process receives a SIGINT or SIGTERM a
few things happen. The first is that it will stop receiving new
requests. The second is that the main thread will wait for active
requests to complete before doing an actual shutdown of the process.
Third, a separate monitor thread, realising that shutdown has been
triggered will let a grace period pass and if the process hasn't
shutdown at the end of that grace period will SIGKILL its own daemon
process to force it to die.
If there were no active requests or they are completed within the
grace period, then mod_wsgi will go through and perform all the
appropriate actions on all the sub interpreters created to kill off
non daemon threads and invoke callback functions registered with the
Python atexit module. Finally it will destroy the sub interpreters and
the Python core.
So, it tries very hard to clean up everything properly, but if the
process hangs or takes too long to shut down it will, if need be, kill
it off.
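The shutdown sequence described above can be modelled roughly as follows. This is a toy model in Python, not the mod_wsgi code (which is C): the class name ShutdownCoordinator is invented, the waiting and the grace period timer are folded into one call rather than split across a main thread and a monitor thread, and it only reports whether shutdown would be clean or forced instead of sending any signal.

```python
import threading


class ShutdownCoordinator:
    """Toy model of the sequence: stop accepting, drain requests, then force."""

    def __init__(self, grace_period=5.0):
        self.grace_period = grace_period
        self.accepting = True
        self._active = 0
        self._cond = threading.Condition()

    def begin_request(self):
        with self._cond:
            if not self.accepting:
                return False  # new requests are refused once shutdown starts
            self._active += 1
            return True

    def end_request(self):
        with self._cond:
            self._active -= 1
            self._cond.notify_all()

    def shutdown(self):
        """Return "clean" if active requests drained in time, else "forced"."""
        with self._cond:
            self.accepting = False
            drained = self._cond.wait_for(
                lambda: self._active == 0, timeout=self.grace_period
            )
        # A real daemon would go on to run atexit callbacks when "clean",
        # and SIGKILL itself when "forced"; here we only report the outcome.
        return "clean" if drained else "forced"
```

A request that finishes inside the grace period gives "clean"; one that is still running when the grace period expires gives "forced".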
> 4. I've heard that sometimes changes to the application don't take
> effect immediately. But, I think that 99% of the time, this is an
> instance of problem #1.
The mod_wsgi process reload mechanism for daemon processes should
hopefully be more reliable than what I have seen as to how FASTCGI
solutions are implemented.
The way mod_wsgi works is that in process reloading mode, when it sends
the initial headers across to the daemon process, the daemon process
will send back an indication of whether a restart of that daemon process
is required. If it is, the daemon process itself will SIGINT
itself thus triggering the above shutdown process and consequent
failsafe of sending a SIGKILL to itself if need be. The Apache parent
process will automatically note that the process has died and start a
new one in its place.
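In rough Python terms the daemon side of that exchange could be sketched as below. The function names are hypothetical, and the rule used to decide that a restart is needed (the script file's modification time differs from the one recorded at load) is an assumption made for the example; the description above does not say how the decision is made.

```python
import os
import signal


def restart_required(script_path, loaded_mtime):
    """Assumed rule: restart when the script file changed since it was loaded."""
    return os.path.getmtime(script_path) != loaded_mtime


def handle_request_headers(script_path, loaded_mtime):
    """Return the indication sent back to the Apache side for one request."""
    if restart_required(script_path, loaded_mtime):
        # The daemon signals itself; the normal shutdown path, including the
        # SIGKILL failsafe, then takes over and the parent starts a new process.
        os.kill(os.getpid(), signal.SIGINT)
        return "restart"
    return "continue"
```

The point of the design is that the process being restarted takes part in the decision, rather than having a signal arrive from outside at an arbitrary moment.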
From memory in FASTCGI solutions the killing off of the processes is
triggered from outside of the process using a signal and the process
is not an active part in making the decision or making sure the
process shuts down.
BTW, in mod_wsgi it ignores signal handler registrations from Python
code to stop an application from interfering with process shutdown,
which is where some problems probably come from with FASTCGI
solutions.
> I think mod_wsgi is likely to suffer from the same problems.
Getting web hosters to use mod_wsgi will be an issue. I believe that 3
and 4 are already catered for at least to a degree, with possible room
for further improvement if need be. As to 1, it is just a matter of
good documentation.
> > This is the problem area as far as commodity web hosting
> > goes. This is because they will not want to provide FileInfo
> > override for users and so user will not have access to
> > AddHandler/SetHandler in a .htaccess file. Often they will
> > not even have use of a .htaccess file.
>
> I'm pretty sure that any hosting provider that provides FastCGI will
> also have AllowOverride FileInfo Indexes, as well as mod_rewrite. At
> least, this is true for all the providers I have seen.
Hmmm, the ones I talked to must be more paranoid. They only gave
FileInfo to trusted customers, not your average user.
> > Many don't like this as it means that it is hard to have
> > an application which appears to be mounted on the root of the
> > site. Ie., the URL would need to be:
> >
> > /django.wsgi/
> >
> > rather than just:
> >
> > /
> >
> > You see a lot of mod_rewrite hacks being used to make it
> > appear as the root URL.
>
> PHP has this same problem and people seem to get the mod_rewrite hack to
> work with WordPress without too much difficulty.
>
> > Anyway, after that ramble, I would really like to hear
> > opinions on what peoples expectations are as far as what they
> > should be able to do in a commodity web hosting
> > environment. That gives me a focus point as to what needs to
> > be made simple to do.
>
> I don't care about performance too much as long as it is not slower than
> FastCGI.
In my simple tests, at least with Apache/mod_fastcgi/flup, mod_wsgi
doesn't have a problem as far as performance.
> But, I do care about proper signal handling to allow for
> graceful process shutdowns (finish the current request and exit), so
> that hosting providers can avoid "kill -9" processes most of the time. I
> would never use the embedded mode, only the daemon mode.
And as explained, hopefully orderly shutdown is already catered for.
There is still a risk that a daemon process may get a SIGKILL. This
should only arise in two situations though. The first is where Python
atexit registered functions, or Python object destructors hang or do
things that take longer than the grace period.
The second is where the daemon process is used for handling requests
that can take a long time to run. This is a bit of a problem area
because mod_wsgi daemon mode expects the existing process to have
exited before it starts a new one. Thus the grace period (defaults to
5 seconds), can't be too long.
Part of the problem with FASTCGI solutions from memory is that it will
start up new processes without waiting for the old and it can then
lose track of the old process and it hangs around forever. I wanted
to avoid that problem, but didn't want to complicate the
implementation.
Anyway, end result is that a long running request may be killed off
when process needs to be restarted. I might be able to revisit this at
some point in the future and improve it, but may be tricky to get
right because of how I use Apache parent process to handle monitoring
of processes. Also, the current behaviour is probably what an ISP
would prefer even if the user may not see it as ideal for the
particular case of long running requests.
> > So, what sort of configuration do you expect you would want to do?
>
> Like I mentioned on Web-SIG, it would be nice to be able to control the
> Python logging module from mod_wsgi and send the python logging output
> to the appropriate Apache log files.
Only thing missing to be able to do that is a 'mod_wsgi.log_error()'
function. This would be easy to add.
The actual logging module handler that maps to that can be written
separately. Although in time it could go in a separate companion package
for mod_wsgi that I intend to make available, there is nothing to stop
someone writing it themselves for now.
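Such a logging handler could be as small as the sketch below. Since 'mod_wsgi.log_error()' does not exist yet, the handler takes any function that accepts one string; log_to_stderr is a stand-in written for this example, as are the class name CallableLogHandler and the logger name "myapp".

```python
import logging
import sys


class CallableLogHandler(logging.Handler):
    """Forward formatted log records to a function taking a single string."""

    def __init__(self, log_func, level=logging.NOTSET):
        super().__init__(level)
        self.log_func = log_func

    def emit(self, record):
        try:
            self.log_func(self.format(record))
        except Exception:
            # Standard logging convention: never let a logging failure
            # propagate into the application.
            self.handleError(record)


def log_to_stderr(message):
    # Stand-in for the proposed mod_wsgi.log_error() function.
    sys.stderr.write(message + "\n")


logger = logging.getLogger("myapp")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(CallableLogHandler(log_to_stderr))
```

Once a real log function is available, only the argument passed to CallableLogHandler would need to change.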
As to ensuring the handler is configured before anything else gets
loaded, the WSGIImportScript directive in 2.0c4 can be used to do
that. This directive allows one to do stuff at process start for
specific application group, whether that be in embedded mode process
or daemon mode process.
> > What do you think is and isn't possible with mod_wsgi now?
>
> For the users I am targeting, the main problem with mod_wsgi is that it
> isn't available where they want to deploy their applications. For my own
> use, I would like to completely disable the embedded mode for security
> reasons.
Embedded mode can effectively be disabled at runtime by defining at
global scope in Apache configuration:
WSGIProcessGroup <undefined>
The value of the argument doesn't matter as long as it isn't the same
as any daemon process group.
With that in place, unless WSGIProcessGroup is defined explicitly in
an appropriate context, you will always get a 500 error response and
no way that WSGI application can be run in embedded process.
Yes I realise you probably want the whole ability to use embedded mode
compiled out of the code. I still need to look at that. :-)
Graham
I agree that mod_wsgi has a lot of documentation. In fact, in some ways
it might have too much documentation--people don't like to read any more
than necessary.
> > 2. Most hosting providers don't offer it, or don't offer it in
> > conjunction with Python.
> > 3. Hosting providers tend to "kill -9" FastCGI processes;
> > that means your program has to handle SIGKILL the same way
> > that they should handle SIGUSR1, instead terminating immediately.
>
> In mod_wsgi there is already some support for graceful
> shutdown of daemon processes.
<snip>
> So, it tries very hard to clean up everything properly, but
> if the process hangs or takes too long to shut down it will
> as needs be kill it off.
> > 4. I've heard that sometimes changes to the application don't take
> > effect immediately. But, I think that 99% of the time, this is an
> > instance of problem #1.
>
<snip>
> From memory in FASTCGI solutions the killing off of the
> processes is triggered from outside of the process using a
> signal and the process is not an active part in making the
> decision or making sure the process shuts down.
Many hosting providers have a cron job that runs very frequently,
killing off any customer process that has been running for "too long".
At 1&1, "too long" is six seconds (they don't have FastCGI). At
DreamHost, they kill all your processes like this, but they make an
exception for any process named exactly "dispatch.fcgi"--in that case,
they still send the "kill -9", but less frequently than for other
processes. So, one requirement for mod_wsgi is that all daemon processes
must be obviously recognizable as mod_wsgi processes. And, another
requirement is documentation and code to give the hosting providers
showing them how to provide a special case for mod_wsgi: they should
kill those processes only when necessary, and they should try sending a
SIGINT first.
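As an illustration of the "try SIGINT first" policy, a provider's reaper could do something like the following for processes it has recognised as special. This is a sketch only: stop_politely is a made-up name, the five second default simply mirrors the grace period mentioned earlier in the thread, and it assumes a POSIX system and a process started through Python's subprocess module rather than one found by scanning the process table.

```python
import signal
import subprocess


def stop_politely(proc, grace_period=5.0):
    """Send SIGINT, wait up to grace_period seconds, then fall back to SIGKILL."""
    if proc.poll() is not None:
        return "already exited"
    proc.send_signal(signal.SIGINT)
    try:
        proc.wait(timeout=grace_period)
        return "exited after SIGINT"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        proc.wait()
        return "killed"
```

The "kill -9" is kept as the last resort, so a hung process still goes away, but a well behaved one gets the chance to finish its current request.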
> > I'm pretty sure that any hosting provider that provides
> > FastCGI will also have AllowOverride FileInfo Indexes, as
> > well as mod_rewrite. At least, this is true for all the
> > providers I have seen.
>
> Hmmm, the ones I talked to must be more paranoid. They only
> gave FileInfo to trusted customers, not your average user.
I researched this quite a bit and I couldn't find anybody that offered
FastCGI but not mod_rewrite or "AllowOverride FileInfo Indexes". If you
can, please send me the names of the ones you found (on or off list).
> In my simple tests, at least with Apache/mod_fastcgi/flup,
> mod_wsgi doesn't have a problem as far as performance.
Users of shared hosting providers (especially Dreamhost) have reported
that flup's FastCGI handler (sometimes) takes too long to start up, and
Dreamhost's process monitor (sometimes) kills it, so many people are
using the original, simpler fcgi.py instead.
> The second is where the daemon process is used for handling
> requests that can take a long time to run. This is a bit of a
> problem area because mod_wsgi daemon mode expects the
> existing process to have exited before it starts a new one.
> Thus the grace period (defaults to
> > Like I mentioned on Web-SIG, it would be nice to be able to control
> > the Python logging module from mod_wsgi and send the python logging
> > output to the appropriate Apache log files.
>
> Only thing missing to be able to do that is a 'mod_wsgi.log_error()'
> function. This would be easy to add.
>
> The actual logging module handler that maps to that can be
> written separately. Although it in time could be in separate
> companion package for mod_wsgi I intend to make available,
> nothing to stop someone writing it themselves for now.
I agree.
> As to ensuring the handler is configured before anything else
> gets loaded, the WSGIImportScript directive in 2.0c4 can be
> used to do that. This directive allows one to do stuff at
> process start for specific application group, whether that be
> in embedded mode process or daemon mode process.
I think that a regular piece of middleware should be able to do this, so
I'm not sure that WSGIImportScript is needed for it.
- Brian
Yes, accept that I need a simpler entry point for getting going with
mod_wsgi. :-)
> > From memory in FASTCGI solutions the killing off of the
> > processes is triggered from outside of the process using a
> > signal and the process is not an active part in making the
> > decision or making sure the process shuts down.
>
> Many hosting providers have a cron job that runs very frequently,
> killing off any customer process that has been running for "too long".
> At 1&1, "too long" is six seconds (they don't have FastCGI). At
> DreamHost, they kill all your processes like this, but they make an
> exception for any process named exactly "dispatch.fcgi"--in that case,
> they still send the "kill -9", but less frequently than for other
> processes. So, one requirement for mod_wsgi is that all daemon processes
> must be obviously recognizable as mod_wsgi processes. And, another
> requirement is documentation and code to give the hosting providers
> showing them how to provide a special case for mod_wsgi: they should
> kill those processes only when necessary, and they should try sending a
> SIGINT first.
In mod_wsgi 3.0 the intention is to add support for true transient
daemon processes. That is, only startup when required and will
disappear when they have been inactive for a set period. This in
combination with setting maximum requests before restarting should
provide a reasonable means for ISPs to ensure that processes are not
running when they don't need to be, and to reclaim memory occasionally
if processes creep up in memory usage.
The way in which the daemon processes would be used with ISPs would
mean that the httpd process would run as the user, and so they would
just need to look for those if they really wanted to kill them, or
exclude them from killing.
> > As to ensuring the handler is configured before anything else
> > gets loaded, the WSGIImportScript directive in 2.0c4 can be
> > used to do that. This directive allows one to do stuff at
> > process start for specific application group, whether that be
> > in embedded mode process or daemon mode process.
>
> I think that a regular piece of middleware should be able to do this, so
> I'm not sure that WSGIImportScript is needed for it.
Certainly if you don't have access to main Apache configuration file
using middleware is the only way anyway. As long as process reload
mechanism is enabled for daemon processes, then using middleware
shouldn't be a problem.
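For the middleware route, a sketch along the following lines would do the setup once per process, just before the first request is handled. The class name ConfigureOnce and the setup_logging function are invented for the example, and logging.basicConfig stands in for whatever handler configuration is actually wanted.

```python
import logging
import threading


class ConfigureOnce:
    """WSGI middleware that runs a setup function before the first request."""

    def __init__(self, application, setup):
        self.application = application
        self.setup = setup
        self._lock = threading.Lock()
        self._done = False

    def __call__(self, environ, start_response):
        if not self._done:
            # Double-checked under a lock so that concurrent first requests
            # in a multithreaded process run the setup only once.
            with self._lock:
                if not self._done:
                    self.setup()
                    self._done = True
        return self.application(environ, start_response)


def setup_logging():
    # Placeholder configuration; a real setup would install its own handler.
    logging.basicConfig(level=logging.INFO)
```

The difference from WSGIImportScript is timing: here nothing is configured until a request arrives, whereas the directive runs at process start.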
Some people wanted WSGIImportScript as they didn't like the idea that
the WSGI application would otherwise only be loaded when first request
arrives for it. This directive allows them to ensure it is preloaded
immediately at process start.
Graham
I forgot the most important configuration option needed for shared
hosting environments: the ability to choose which Python installation to
use, when the only Apache configuration you have is .htaccess. Most
shared hosting providers have old versions of Python (I saw 2.2 at one
major hosting provider and 2.3 in several places), and people often
install 2.4 or 2.5 in their home directory and use that.
- Brian
I have discussed before the possibility that mod_wsgi could support
a hybrid mode whereby it still does all the process management and
configuration, but rather than daemon processes being a fork only,
they could do a fork/exec of arbitrary Python executable, with
mod_wsgi telling that Python instance to load up special daemon side
Python module provided by mod_wsgi to perform the same role as current
in process mechanisms for daemon processes.
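To make the fork/exec idea a little more concrete, here is a rough Python sketch of the two pieces a parent would need: building the command line for an arbitrary interpreter, and checking which version that interpreter is. Everything here is hypothetical, including the function names and the bootstrap module name; nothing like this exists in mod_wsgi, and the real thing would be done from C inside Apache.

```python
import subprocess


def daemon_command(python_executable, bootstrap_module):
    """Build the argv a parent could exec to start a daemon under a chosen Python."""
    # "-m" asks the chosen interpreter to run the named module as a script.
    return [python_executable, "-m", bootstrap_module]


def interpreter_version(python_executable):
    """Ask an interpreter for its own major.minor version string."""
    result = subprocess.run(
        [python_executable, "-c",
         "import sys; print('%d.%d' % sys.version_info[:2])"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

For example, daemon_command("/home/user/bin/python", "wsgi_daemon") would describe a daemon run by a Python the user installed in their home directory, with both the path and the module name being placeholders.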
Overall this would provide three options, embedded, daemon and hybrid modes.
Such an additional mode wouldn't even be considered until after 3.0 is
done and suspect that there possibly would not be enough interest in
it to justify doing it.
Graham
Yes, I noticed this when browsing the source code. The documentation for
WSGIPythonHome should probably note this restriction.
> I have discussed before the possibility that mod_wsgi could
> support a hybrid mode whereby it still does all the process
> management and configuration, but rather than daemon
> processes being a fork only, they could do a fork/exec of
> arbitrary Python executable, with mod_wsgi telling that
> Python instance to load up special daemon side Python module
> provided by mod_wsgi to perform the same role as current in
> process mechanisms for daemon processes.
Once you've done that, you are not far away from implementing a total
generic mod_fcgi/mod_fastcgi/mod_scgi/mod_ajp replacement.
For people using hosting providers that don't want to actively help
users with their Python applications, using one of the FastCGI-WSGI
adapters probably makes a lot more sense than trying to shoehorn a bunch
of accommodating features into mod_wsgi.
- Brian
Done.
> > I have discussed before the possibility that mod_wsgi could
> > support a hybrid mode whereby it still does all the process
> > management and configuration, but rather than daemon
> > processes being a fork only, they could do a fork/exec of
> > arbitrary Python executable, with mod_wsgi telling that
> > Python instance to load up special daemon side Python module
> > provided by mod_wsgi to perform the same role as current in
> > process mechanisms for daemon processes.
>
> Once you've done that, you are not far away from implementing a total
> generic mod_fcgi/mod_fastcgi/mod_scgi/mod_ajp replacement.
That is why I think adding this hybrid mode is attractive. For Python
hosting, you effectively subsume mod_python,
mod_fcgi/mod_fastcgi/mod_scgi/mod_ajp systems and to some degree
running a back end system with mod_proxy.
Better still, the same WSGI script file works for all options giving
you a lot of flexibility in the one package as to how you want to run
it, depending on the tradeoffs which are acceptable to you. For
example, absolute speed versus being able to select Python versions
being the extremes.
It will also hopefully shut up those people who say that mod_wsgi is
evil because it restricts you to a single version of Python. :-)
Graham