Status of mod_wsgi version 2.0.


Graham Dumpleton

Jan 11, 2008, 8:25:42 PM
to mod...@googlegroups.com
This is just a quick email to let you all know where version 2.0 of
mod_wsgi is at given that the next release candidate I talked about
still hasn't seen the light of day.

Basic problem has been lack of time. In the first instance I have gone
back to work and that combined with my new baby duties means I have
very little time to do much on mod_wsgi. What little time I do have to
do things I haven't been using effectively. I have been letting myself
waste too much time in trying to counter some of the FUD about
mod_wsgi that is being spouted by people on other forums who would
rather see mod_wsgi die. Plus wasting further time trying to correct
misconceptions other people have because they assume that mod_wsgi is
just another FASTCGI knock off. It isn't. The fundamental difference
is that the daemon processes are a fork only, not a fork and exec as
with FASTCGI, and as such mod_wsgi fully controls the daemon
processes. They really don't seem to understand the subtle differences
and the implications they have, and why mod_wsgi can be a more
effective solution because of it working differently.

I have been challenged on the list here as well on aspects of mod_wsgi
but I don't mind the sort of discussion going on here as it is
positive and directed towards understanding how mod_wsgi works and
making it better. The stuff on the other forums is much more negative
and from people who can't even seem to make the effort to actually
understand how mod_wsgi works. Those discussions seem to be more of
the sort of, what colour it should be painted. In other words stuff
which is totally irrelevant to how mod_wsgi works.

So, a bit of an apology for just not getting on with what is the
important stuff. I really need to watch that video on poisonous people
again and learn better to just ignore this outside stuff. :-)

My grumble out of the way, due to the discussions happening on this
list, I will now try and address a few more things for version 2.0 of
mod_wsgi. The current list of things I can remember is:

1. Add support for very large files to wsgi.file_wrapper.

2. Delay the code which unlocks the duplicate file descriptor used by
wsgi.file_wrapper until after the close() method is called on the
iterable returned by the WSGI application. This will mean that the
Python file object's reference to the file descriptor will be closed
first, limiting any problems that may come up due to file locks being
released prior to that point by mod_wsgi closing the duplicate file
descriptor.

3. Remove hack which attempts to throw away remaining request content
if not consumed by the WSGI application. Problem here was that it was
doing it before the close() method was called on an iterable and
technically a close() method could consume the remaining input.

4. Try and properly fix the potential for socket deadlock when using
daemon mode and WSGI application doesn't consume all the request
content.

5. Discard any response content in excess of the length defined by
the Content-Length header and log a warning message to the Apache error
log as an indication that the WSGI application has a flaw of some sort
that allows it to generate more content than it was supposed to.

6. When using wsgi.file_wrapper if Content-Length header has been
defined only send the amount of data specified by the header. This
behaviour would probably be contrary to the WSGI specification as
written, but makes more sense and better satisfies HTTP standards.

7. When using wsgi.file_wrapper if Content-Length is not defined then
if the length of the associated file can be determined then generate a
Content-Length header automatically.

8. When using wsgi.file_wrapper if the file like object has a fileno()
method but is not an actual file, so that its length cannot be
calculated, use Apache's ability to improve performance by using a pipe
brigade. This would optimise the case where wsgi.file_wrapper was used
around a socket connected to some back end process, for example HTTP
proxying or XML-RPC.
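
As a rough illustration of item 5, the same truncation can be sketched in
pure Python as WSGI middleware. This is not mod_wsgi's C implementation;
`limit_to_content_length` and the logger name are hypothetical and exist
only for this sketch.

```python
import logging

log = logging.getLogger("wsgi.sketch")  # hypothetical logger name


def limit_to_content_length(app):
    """Wrap a WSGI app so the body never exceeds its declared Content-Length."""

    def wrapper(environ, start_response):
        declared = {}

        def capture(status, headers, exc_info=None):
            # Remember the declared length, if the application set one.
            for name, value in headers:
                if name.lower() == "content-length":
                    declared["length"] = int(value)
            if exc_info is not None:
                return start_response(status, headers, exc_info)
            return start_response(status, headers)

        iterable = app(environ, capture)

        def body():
            sent = 0
            try:
                for chunk in iterable:
                    limit = declared.get("length")
                    if limit is not None and sent + len(chunk) > limit:
                        # Keep only what still fits, warn, and stop.
                        chunk = chunk[: limit - sent]
                        log.warning(
                            "response exceeded Content-Length of %d", limit
                        )
                        if chunk:
                            yield chunk
                        break
                    sent += len(chunk)
                    yield chunk
            finally:
                # Always honour the close() method of the wrapped iterable.
                close = getattr(iterable, "close", None)
                if close is not None:
                    close()

        return body()

    return wrapper
```

An application that declares `Content-Length: 5` but returns
`[b"hello", b" world"]` would have only `b"hello"` passed on, with a
warning logged for the excess.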

I think that is all. If there was something else you were expecting
then please let me know. Hopefully you will all try and keep me
focused and not get distracted again. :-)

Graham

braydon fuller

Jan 12, 2008, 6:20:04 AM
to mod...@googlegroups.com
Thanks for all the effort on mod_wsgi, and the excellent support.
Having the daemon process has been a big plus over the
previous method of using Supervisor2, plus Apache proxy.

Clodoaldo

Jan 12, 2008, 6:55:08 AM
to mod...@googlegroups.com
2008/1/11, Graham Dumpleton <graham.d...@gmail.com>:

>
> What little time I do have to do
> things I haven't been using effectively. I have been letting myself
> waste too much time in trying to counter some of the FUD about
> mod_wsgi that is being spouted by people on other forums who would
> rather see mod_wsgi die. Plus wasting further time trying to correct
> misconceptions other people have because of them making assumptions
> that mod_wsgi is just another FASTCGI knock off when it isn't due to
> the fundamental difference that the daemon process are a fork only and
> not a fork and exec as with FASTCGI and as such mod_wsgi fully
> controls the daemon process. They really don't seem to understand the
> subtle differences and the implications it has and why mod_wsgi can be
> a more effective solution because of it working differently.

<rant>

Although I don't write to this list I have read almost all the posts
here. What can I say? If I had half your productivity and reasoning
capability I would be considered a genius at my work place. And I'm
not a bad programmer at all.

Never before modwsgi did Python have so concrete a chance to make a
real entrance into the big web applications arena, the kind so common
at commodity hosting providers, which a customer installs with one
click. Without modwsgi, or something else still to appear, Python will
be forever a niche language in web applications, just a ghetto of
inflated egos.

I don't really understand all the subtleties of modwsgi 2.0 as 1.3 is
more than enough for me. IIRC you think only 3.0, if done, will have
all the features commodity hosting providers demand. I think 2.0 today
is already much better than mod_python for those providers. The only
problem I see is that they prefer, as sometimes I also do, a known
problem, for which they have some workarounds, to an unknown
solution.

The modwsgi rate of adoption will not be decided by the framework
developers but by the hosting providers' admins. Those are the people
who determine which are the mainstream technologies. They decided that
php is better than perl and they, and only they, will decide if
modwsgi will have its place in that market. Once and if those admins
adopt modwsgi the framework developers will have no choice but to
fully support modwsgi.

</rant>

Regards, Clodoaldo Pinto Neto

Brian Smith

Jan 12, 2008, 9:30:13 PM
to mod...@googlegroups.com
Graham Dumpleton wrote:
> Basic problem has been lack of time. In the first instance I
> have gone back to work and that combined with my new baby
> duties means have very little time to do much on mod_wsgi.

If I had a baby I doubt I would have time for anything else besides my
family.

> What little time I do have to do things I haven't been using
> effectively. I have been letting myself waste too much time
> in trying to counter some of the FUD about mod_wsgi that is
> being spouted by people on other forums who would rather see
> mod_wsgi die.

It is just a matter of "code speaks louder than words." You already have
plans to correct the most significant problems in mod_wsgi; once those
plans are executed upon, there aren't going to be any valid objections.

> daemon process are a fork only and not a fork and exec as
> with FASTCGI and as such mod_wsgi fully controls the daemon
> process. They really don't seem to understand the subtle
> differences and the implications it has and why mod_wsgi can
> be a more effective solution because of it working differently.

The fork-based design has the potential to be extremely efficient. Right
now, mod_wsgi doesn't seem to be utilizing the full benefits of forking
with respect to memory sharing between the processes. I hope to send you
some ideas regarding this soon.

> I think that is all. If there was something else you were
> expecting then please let me know. Hopefully you will all try
> and keep me focused and not get distracted again. :-)

Tomorrow, I will enter those issues and some other ones into the issue
tracker, so that they are easier to keep track of. Perhaps having these
issues in the issue tracker will encourage others to dig into the code
and fix them. I encourage everybody to go to the issue tracker
(http://code.google.com/p/modwsgi/issues/list) and vote for the issues
that they care most about.

If you would like, I can also fix the wsgi.file_wrapper issues. Right
now I am experimenting with the file-descriptor-passing alternative
method, but I understand the code well enough to fix the issues with the
current implementation as well. Maybe it is better for the
wsgi.file_wrapper implementation to be moved to a 2.1 release? I think
the wsgi.input issues are a higher priority, and you have a better
understanding about how to fix them than I do.

Regards,
Brian

Graham Dumpleton

Jan 12, 2008, 10:56:13 PM
to mod...@googlegroups.com
On 13/01/2008, Brian Smith <br...@briansmith.org> wrote:
>
> Graham Dumpleton wrote:
> > Basic problem has been lack of time. In the first instance I
> > have gone back to work and that combined with my new baby
> > duties means have very little time to do much on mod_wsgi.
>
> If I had a baby I doubt I would have time for anything else besides my
> family.

My wife has a very similar viewpoint and tries very hard to keep me
off the computer. As long as I do all the chores I need to, I don't
get in too much trouble. She also likes to spend a bit of time on the
computer as well. :-)

> > I think that is all. If there was something else you were
> > expecting then please let me know. Hopefully you will all try
> > and keep me focused and not get distracted again. :-)
>
> Tomorrow, I will enter those issues and some other ones into the issue
> tracker, so that they are easier to keep track of. Perhaps having these
> issues in the issue tracker will encourage others to dig into the code
> and fix them. I encourage everybody to go to the issue tracker
> (http://code.google.com/p/modwsgi/issues/list) and vote for the issues
> that they care most about.
>
> If you would like, I can also fix the wsgi.file_wrapper issues. Right
> now I am experimenting with the file-descriptor-passing alternative
> method, but I understand the code well enough to fix the issues with the
> current implementation as well. Maybe it is better for the
> wsgi.file_wrapper implementation to be moved to a 2.1 release? I think
> the wsgi.input issues are a higher priority, and you have a better
> understanding about how to fix them than I do.

I have actually already done 2 and 3 and committed them. I have in my
checked out code changes for 5, 6, 7 and 8 but am still reviewing and
testing the code plus thinking over what sort of warning to output
about content length violations. I have also had a look at 1 and 4 and
know what I need to do. Waiting on finishing 5, 6, 7 and 8 before I do
1. :-)

Anyway, if people are going to start voting on issues, I had better
add at least one more idea I have been playing with and have already
had one go at implementing to work out what is involved.

What I want to do is eliminate people needing to do:

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /site.wsgi/$1 [QSA,PT,L]

The idea is to add a directive:

WSGIDirectoryScript site.wsgi

For just the Directory/.htaccess context this is defined in (and not
sub directories) it would perform the equivalent of what the rewrite
rules would do when a request doesn't match any existing file. In
other words it would treat the request against that parent directory
and internally redirect it through a WSGI application script notionally
mounted on that directory. In doing this, it would go one step beyond
what the rewrite rule does and fixup SCRIPT_NAME automatically so that
it refers to the URL mapping to the directory and not the actual WSGI
script file which is the target after the redirect. This automatic
fixup would eliminate the need for a WSGI wrapper fiddle of the form:

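import posixpath

def _application(environ, start_response):
    # The original application.
    ...

def application(environ, start_response):
    # Wrapper to set SCRIPT_NAME to actual mount point, that is, the
    # directory part of the URL which the rewrite mapped to the script.
    mount = posixpath.dirname(environ['SCRIPT_NAME'])
    # A mount point at the root of the site is the empty string.
    environ['SCRIPT_NAME'] = '' if mount == '/' else mount
    return _application(environ, start_response)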

or the need to use some configuration within the WSGI application to
have it adjust what the notional mount point of the application is.

Since the rewrite rules above seem to be the cause of a lot of
problems for people when using FASTCGI and the configuration only uses
something like:

AddHandler fastcgi-script .fcgi

It would be nice to come up with a better solution for this problem of
mounting a specific file on the root of a site or a sub directory of a
site when a similar configuration is used with mod_wsgi.

So, any feedback around solving this particular issue or similar
things which are a pita with something like FASTCGI would help more at
this point in trying to prioritise the existing issues as many of
those are minor and not absolutely required.

The other more concrete thing which would be really good is for
someone independent of me to do some proper benchmarking comparing
mod_wsgi (in both modes) to FASTCGI (flup), SCGI (flup), AJP (flup),
CherryPy wsgiserver, CherryPy wsgiserver (behind mod_proxy), paste
server and paste server (behind mod_proxy).

There have been some comments about me not providing benchmarks
comparing mod_wsgi to these and some even have the idea that me not
providing any is an admission that at least some of the solutions are
still superior to mod_wsgi.

I have actually done the benchmarks for most and I know, for the
systems I use at least, that this is not the case and mod_wsgi still
provides a better option if one wants to look at speed, even if in most
cases the bottlenecks are elsewhere. What I don't understand is why
those people, if they think that their own solution is so much better,
don't do the benchmark comparison themselves. After all, if I come out
with something, because I am not independent they probably will not
believe it anyway.

BTW, on somehow having file_wrapper pass a file descriptor back to the
Apache child processes for manipulating directly, my current viewpoint
is that if people really want the maximum speed possible then they
should either run the application in embedded mode or just use static
files in the file system and use the Alias directive of Apache to map
to them. So, it would have to be a pretty convincing solution you are
coming up with. :-)

Graham

gert

Jan 13, 2008, 6:36:59 PM
to modwsgi
WSGIDirectoryScript site.wsgi

can anyone explain to me in cartoon style why not use
WSGIScriptAliasMatch please ?

WSGIScriptAliasMatch ^/wsgi-scripts/([^/]+) /web/wsgi-scripts/$1.wsgi





Graham Dumpleton

Jan 13, 2008, 7:00:24 PM
to mod...@googlegroups.com
On 14/01/2008, gert <gert.c...@gmail.com> wrote:
>
> WSGIDirectoryScript site.wsgi
>
> can anyone explain to me in cartoon style why not use
> WSGIScriptAliasMatch please ?

Sorry, I am terrible at drawing.

> WSGIScriptAliasMatch ^/wsgi-scripts/([^/]+) /web/wsgi-scripts/$1.wsgi

WSGIScriptAliasMatch is only of use if you have access to the main
Apache configuration file and can do those sorts of customisations. If
a web hosting company were to use mod_wsgi then they aren't going to
be setting up a specific WSGI application as being mounted on the root
of the web site for each user as in practice that may not be what the
user wants.

For shared hosting, what would typically happen is one of two things.
The first would be that the web hosting company would explicitly put
in the main Apache configuration file for the Directory context of the
user:

AddHandler wsgi-script .wsgi

Alternatively, the user would be given FileInfo override permission
and would add this into their own .htaccess file instead.

With this it would then be possible for a user to add in WSGI script
files, but with the name of the file needing to be part of the URL.
For example, they may have 'django.wsgi' in root directory of their
site. They would access this as '/django.wsgi/' or if MultiViews
enabled as '/django/'.
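
To make the URL split concrete, here is a minimal sketch of what such a
script file could contain. The file name 'django.wsgi' comes from the
example above; the body is purely illustrative and simply reports how the
request URL was divided between SCRIPT_NAME and PATH_INFO.

```python
def application(environ, start_response):
    # With AddHandler, the script file name is part of the URL, so a
    # request for /django.wsgi/some/page would be expected to arrive with
    # SCRIPT_NAME='/django.wsgi' and PATH_INFO='/some/page'.
    script_name = environ.get("SCRIPT_NAME", "")
    path_info = environ.get("PATH_INFO", "")
    text = "SCRIPT_NAME=%r\nPATH_INFO=%r\n" % (script_name, path_info)
    body = text.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Any URLs the application generates from SCRIPT_NAME will therefore carry
the script file name unless the mount point is corrected somewhere.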

The problem is that when AddHandler is used in this way they can't
make a WSGI application appear on the root of the site. One way of
getting around this, if the user has sufficient privileges in their
.htaccess file, is to use mod_rewrite as I showed.

With such a rewrite rule, if a request arrives which doesn't map to a
physical file it will internally redirect it through the designated
WSGI application. Using this trick, one can make the designated WSGI
application appear to be mounted on the root of the site.

Problem is that the rewrite rules can often be troublesome and you
still need to do some fiddles or the WSGI application needs to have
configurability to override the mount point.

Thus the whole point of this is, for where AddHandler was used to
enable use of WSGI applications and there is no access to
WSGIScriptAlias or similar, to make the process of mounting a
particular WSGI application on the root of the site much easier than
it currently is.

Graham

gert

Jan 13, 2008, 8:10:01 PM
to modwsgi
ok got it, translates url "/django.wsgi" to "/django.wsgi/"
automatically with .htaccess privileges without messing up the wsgi
environ

what about

AddType application/x-httpd-wsgi .wsgi

Brian Smith

Jan 13, 2008, 8:11:18 PM
to mod...@googlegroups.com
Graham Dumpleton wrote:
> On 14/01/2008, gert <gert.c...@gmail.com> wrote:
> > can anyone explain to me in cartoon style why not use
> > WSGIScriptAliasMatch please ?
>
> WSGIScriptAliasMatch is only of use if you have access to the
> main Apache configuration file and can do those sorts of
> customisations.

Graham, let's rephrase the question as "Why can't we use
WSGIScriptAliasMatch within an .htaccess file?"

I suspect the answer is (1) .htaccess files are basically the
same as <Directory> sections, (2) Apache only looks at <Directory>
sections after it has already mapped the request URL to the filesystem,
and (3) Apache's API doesn't allow a directive to be processed both
before and after URL-to-filesystem mapping takes place. Otherwise,
"Alias" and "ScriptAlias" would be allowed in .htaccess and Directory
sections too.

> With this it would then be possible for a user to add in WSGI
> script files, but with the name of the file needing to be
> part of the URL.
> For example, they may have 'django.wsgi' in root directory of
> their site. They would access this as '/django.wsgi/' or if
> MultiViews enabled as '/django/'.
>
> The problem is that they can't when AddHandler is used in
> this way make a WSGI application appear on the root of the
> site. One way of getting around this is if user has
> sufficient privileges in their .htaccess file is used
> mod_rewrite as I showed.

How about DirectoryIndex + AddHandler or SetHandler?

> Thus the whole point of this is for where AddHandler was used
> to enable ability to use WSGI applications and there is no
> access to WSGIScriptAlias or similar, to make the process or
> mounting a particular WSGI application on the root of the
> site much easier than it currently is.

How about an alternative?: The existing SetEnv, or a new "WSGISetEnv"
directive, can override any WSGI environ variable, including PATH_INFO
and SCRIPT_NAME, with a user-supplied value. I think that this would
solve the stated problem, while also being very useful for other
reasons.

There are a few things I don't like about the proposed
WSGIDirectoryScript mechanism:

1. It's based on the idea that we will put Python scripts within the
Apache document root. Although some of the existing mod_wsgi directives
also assume this, this isn't something that should be further
encouraged. In particular, it won't work for prepackaged Python
applications residing in site-packages outside the Apache document root.
[1]

2. It is really only needed for mounting at the root; existing
mechanisms are satisfactory for mounting everywhere else.

3. Applications and frameworks still have to handle the mod_rewrite
hackery for cases when they are deployed using FastCGI or CGI, so this
isn't saving them any work. Rather, it increases the amount of
documentation that pre-packaged systems have to create.

4. This is a common problem with Apache that isn't mod_wsgi-specific; if
there isn't an existing mechanism that allows a handler to be bound to
the root URL from within an .htaccess file placed in the document root
directory, then that is something that needs to be fixed outside of
mod_wsgi.

[1] Before, I argued against the idea of mod_wsgi supporting
pre-packaged egg files. I had thought that people were intending those
eggs to be within the Apache document root. However, now I understand
the intent is for them to be in PYTHONPATH, outside the document root. I
fully support that idea, because I hate having any scripts within the
Apache document root; it is one of the most common ways to leak
security-sensitive configuration information and/or subject a site to
code injection attacks.

- Brian

Graham Dumpleton

Jan 13, 2008, 8:48:19 PM
to mod...@googlegroups.com
On 14/01/2008, gert <gert.c...@gmail.com> wrote:
>
> ok got it, translates url "/django.wsgi" to "/django.wsgi/"
> automatically with .htaccess privileges without messing up the wsgi
> environ

Sorry, no you haven't.

I'll try and explain again later if I have the time, but suggest maybe
you go read about the Apache DirectoryIndex directive and work out how
that works for static files. What I am trying to do is an equivalent for
scripts which are able to take additional path information as well,
something DirectoryIndex doesn't allow.

Anyway, check out DirectoryIndex and maybe when I reply to Brian's
comments it may make more sense.

Graham

gert

Jan 13, 2008, 8:51:03 PM
to modwsgi
i am with Brian and here comes my technical explanation

"keep the list of directives as short as possible" :)

gert

Jan 13, 2008, 8:52:03 PM
to modwsgi
On Jan 14, 2:48 am, "Graham Dumpleton" <graham.dumple...@gmail.com>
wrote:
> On 14/01/2008, gert <gert.cuyk...@gmail.com> wrote:
>
> Sorry, no you haven't.
>
> I'll try and explain again later if I have the time, but suggest maybe
> you go read about the Apache DirectoryIndex directive and work out how
> that works for static files. What I am trying to do is an equivalent for
> scripts which are able to take additional path information as well,
> something DirectoryIndex doesn't allow.
>
> Anyway, check out DirectoryIndex and maybe when I reply to Brian's
> comments it may make more sense.

ok thanks

Brian Smith

Jan 13, 2008, 9:03:39 PM
to mod...@googlegroups.com
Graham Dumpleton wrote:
> I have actually already done 2 and 3 and committed them. I
> have in my checked out code changes for 5, 6, 7 and 8 but am
> still reviewing and testing the code plus thinking over what
> sort of warning to output about content length violations. I
> have also had a look at 1 and 4 and know what I need to do.
> Waiting on finishing 5, 6, 7 and 8 before I do 1. :-)

It looks like you've already fixed all the ones that I could help with
in a reasonable time. #1 looks really easy for you to fix and fixing #4
is beyond my current understanding of the code.

> So, any feedback around solving this particular issue or
> similar things which are a pita with something like FASTCGI
> would help more at this point in trying to prioritise the
> existing issues as many of those are minor and not absolutely
> required.

I suggest that people not only vote for existing issues, but also add
new issues that other people can vote on. The number of votes (the
number of people that click the star next to an issue in the issue
tracker) and the number of commenters per issue in the issue tracker are
metrics that are easier to quantify than requests made on the mailing
list. Listing all open issues in the issue tracker also makes it easier
for people to troubleshoot their problems, and it makes it easier for
people to find something that they could send in patches for.

For example, I suspect many people will want to vote for this issue:
http://code.google.com/p/modwsgi/issues/detail?id=50 "Do not load the
Python interpreter into the Apache process when the embedded mode is not
used," because it would solve a lot of the compatibility issues that
come up with using mod_wsgi.

> The other more concrete thing which would be really good is
> for someone independent of me to do some proper benchmarking
> comparing mod_wsgi (in both modes) to FASTCGI (flup), SCGI
> (flup), AJP (flup), CherryPy wsgiserver, CherryPy wsgiserver
> (behind mod_proxy), paste server and paste server (behind mod_proxy).

It is very difficult to do, especially when you start talking about the
effects of something like an optimized wsgi.file_wrapper implementation.
To test the effects properly, you need to have two physical systems that
communicate over a physical NIC. Otherwise, zero-copy disk-to-NIC DMA
transfers enabled by sendfile will never be properly reflected in the
results. If you use VMWare or similar then you can use strace to get
some idea if those optimizations are happening, but they won't affect
the actual speed.

However, I am planning to do some tests RSN, but focused as much on RSS
as throughput and latency.

> There has been some comments about me not providing
> benchmarks comparing mod_wsgi to these and some even have the
> idea that me not providing any is admission that some of the
> solutions at least are still superior to mod_wsgi.

<snip>

> What I don't understand is if those people think that their
> own solution is so much better why they don't do the
> benchmark comparison themselves. After all, if I come out
> with something, because I am not independent they probably
> will not believe it anyway.

At least on DreamHost and similar hosts, AJP-based mechanisms are out
because shared hosting providers don't want people running big Java
application servers on their servers; they will never enable mod_ajp or
other AJP modules. mod_proxy is out for the same reasons. I know that
Flup doesn't work reliably on DreamHost because it is too slow to start
up; people that host there recommend using the older fcgi.py module
that it is based on. That leaves SCGI, FastCGI and mod_wsgi daemon mode
as the only real competitors for shared hosting environments. If I was a
hosting provider (I'm not), I would prefer FastCGI and SCGI because they
are not Python-specific. However, if mod_wsgi could offer some
manageability or performance features that were significant, then I
would consider mod_wsgi as well.

- Brian

Graham Dumpleton

Jan 13, 2008, 10:28:41 PM
to mod...@googlegroups.com
On 14/01/2008, Brian Smith <br...@briansmith.org> wrote:
>
> Graham Dumpleton wrote:
> > I have actually already done 2 and 3 and committed them. I
> > have in my checked out code changes for 5, 6, 7 and 8 but am
> > still reviewing and testing the code plus thinking over what
> > sort of warning to output about content length violations. I
> > have also had a look at 1 and 4 and know what I need to do.
> > Waiting on finishing 5, 6, 7 and 8 before I do 1. :-)
>
> It looks like you've already fixed all the ones that I could help with
> in a reasonable time. #1 looks really easy for you to fix and fixing #4
> is beyond my current understanding of the code.

Except for 1 and 4 I had to commit the remainder of the changes this
morning. This doesn't mean I am 100 percent sure it is all okay, but I
had to commit it so I could check it out back on my laptop, so I could
do stuff from work if I got a chance and check that it works in Apache
1.3 and 2.0 as well as 2.2, which I had already done some testing for.

So, if running latest code from Subversion, just watch out for any
funny stuff going on until I can double check things again.

> For example, I suspect many people will want to vote for this issue:
> http://code.google.com/p/modwsgi/issues/detail?id=50 "Do not load the
> Python interpreter into the Apache process when the embedded mode is not
> used," because it would solve a lot of the compatibility issues that
> come up with using mod_wsgi.

Actually, I am not sure it does solve many problems. Any source of
conflicts is generally because of specific Python modules being loaded
and even though the main interpreter sits unused in the main Apache
child processes, those modules don't ever get loaded so the
dependencies that cause the conflicts aren't loaded.

Current plans for how a web hosting company would supply dynamic
configuration information related to transient daemon processes are
also sort of dependent on Python interpreter still being present in
Apache child processes even if only daemon mode is used to run
applications. This is because it seems easier at this point to have
them define a really simple Python script, with potentially small C
extension module to hook into other Apache internals, than requiring
them to write a complete custom Apache module to bridge their virtual
hosting/user configuration with mod_wsgi. Until I can get some
concrete examples of how large web hosting companies configure their
systems though, bit hard to tell how it might end up being done.

> > The other more concrete thing which would be really good is
> > for someone independent of me to do some proper benchmarking
> > comparing mod_wsgi (in both modes) to FASTCGI (flup), SCGI
> > (flup), AJP (flup), CherryPy wsgiserver, CherryPy wsgiserver
> > (behind mod_proxy), paste server and paste server (behind mod_proxy).
>
> It is very difficult to do, especially when you start talking about the
> effects of something like an optimized wsgi.file_wrapper implementation.

I certainly accept that one test is not sufficient. As I have
mentioned when talking about doing this benchmarking in the past I see
a range of tests being needed to delve into various aspects of how the
hosting solutions perform. So, performance of things like single test
string response, list of many strings, yielding, file wrapper, request
content consumption, scalability under concurrent requests, etc etc.
Because Apache/mod_wsgi provides its own URL mapping system for root
of application, one could even consider performance impacts of using
Apache/mod_wsgi for doing composition of multiple applications over
other systems' equivalents. One could even take it as far as starting
to look at some of the major Python web frameworks and look at the
base memory consumption, plus basic request throughput.
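
The response styles listed above could be exercised with a set of tiny
WSGI applications along the following lines. All names and the payload
size are hypothetical; this is a sketch of possible benchmark targets,
not an agreed methodology.

```python
import io

PAYLOAD = b"x" * 1024  # arbitrary body size chosen for the sketch


def _start(start_response, length=None):
    headers = [("Content-Type", "application/octet-stream")]
    if length is not None:
        headers.append(("Content-Length", str(length)))
    start_response("200 OK", headers)


def single_string(environ, start_response):
    # One string returned in a single-element list.
    _start(start_response, len(PAYLOAD))
    return [PAYLOAD]


def many_strings(environ, start_response):
    # A list of many small strings.
    _start(start_response, len(PAYLOAD))
    return [b"x"] * len(PAYLOAD)


def yielding(environ, start_response):
    # A generator yielding one byte at a time.
    _start(start_response, len(PAYLOAD))
    for _ in range(len(PAYLOAD)):
        yield b"x"


def file_wrapped(environ, start_response):
    # An in-memory stream keeps the sketch self-contained; a real test
    # would open a file on disk so the server can optimise the transfer.
    stream = io.BytesIO(PAYLOAD)
    _start(start_response, len(PAYLOAD))
    wrapper = environ.get("wsgi.file_wrapper")
    if wrapper is not None:
        return wrapper(stream, 8192)
    return iter(lambda: stream.read(8192), b"")


def consume_input(environ, start_response):
    # Read the whole request body and report how many bytes arrived.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    data = environ["wsgi.input"].read(length)
    body = str(len(data)).encode("ascii")
    _start(start_response, len(body))
    return [body]
```

Each application could then be mounted under every hosting mechanism
being compared and driven with the same load generator.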

I was thinking whether to create a public project 'wsgi-bench' on
Google Code site where interested parties can join up and we make a
joint effort of it all, setting out a well defined documented
methodology, series of tests etc etc. All the code and the results can
then be collected within the associated Subversion repository as we
all work on them, so other people can see it all in the open and
not be able to complain that stuff was hidden as to how any results
were arrived at.

Graham

Graham Dumpleton

Jan 13, 2008, 11:11:31 PM
to mod...@googlegroups.com
On 14/01/2008, Brian Smith <br...@briansmith.org> wrote:
>
> Graham Dumpleton wrote:
> > On 14/01/2008, gert <gert.c...@gmail.com> wrote:
> > > can anyone explain to me in cartoon style why not use
> > > WSGIScriptAliasMatch please ?
> >
> > WSGIScriptAliasMatch is only of use if you have access to the
> > main Apache configuration file and can do those sorts of
> > customisations.
>
> Graham, let's rephrase the question as "Why can't we use
> WSGIScriptAliasMatch within an .htaccess file?"
>
> I suspect the answer is (1) .htaccess files are basically the
> same as <Directory> sections, (2) Apache only looks at <Directory>
> sections after it has already mapped the request URL to the filesystem,
> and (3) Apache's API doesn't allow a directive to be processed both
> before and after URL-to-filesystem mapping takes place. Otherwise,
> "Alias" and "ScriptAlias" would be allowed in .htaccess and Directory
> sections too.

Correct. The ScriptAlias directives for both mod_cgi and mod_wsgi are
processed in the 'translate_name' phase of Apache. This is the actual
phase which makes the determination of what directory a URL maps to.
In the case of the ScriptAlias directives it is overriding the default
behaviour of simply trying to map the URL to something under
DocumentRoot.

> > With this it would then be possible for a user to add in WSGI
> > script files, but with the name of the file needing to be
> > part of the URL.
> > For example, they may have 'django.wsgi' in root directory of
> > their site. They would access this as '/django.wsgi/' or if
> > MultiViews enabled as '/django/'.
> >
> > The problem is that they can't when AddHandler is used in
> > this way make a WSGI application appear on the root of the
> > site. One way of getting around this, if the user has
> > sufficient privileges in their .htaccess file, is to use
> > mod_rewrite as I showed.
>
> How about DirectoryIndex + AddHandler or SetHandler?

Looked at that. The problem is that DirectoryIndex will not work where
there is additional path information beyond the leading part of the
URL which mapped to the directory.

    /* Never tolerate path_info on dir requests */
    if (r->path_info && *r->path_info) {
        return DECLINED;
    }

So, although DirectoryIndex could be pointed at a .wsgi file, that
file can only implement the URL which corresponded to the directory.
Ie., similar to index.php.

In other words, DirectoryIndex can automatically map '/some/dir/' to
'/some/dir/index.wsgi', but it can't map '/some/dir/foo/bar' to
'/some/dir/index.wsgi/foo/bar'.
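
To make that concrete, here is a toy Python model of the check quoted above. It is only a sketch of the behaviour as described here, not Apache's actual code; the function name and the use of None to stand in for DECLINED are assumptions made for illustration.

```python
def directory_index_target(directory_url, path_info, index="index.wsgi"):
    """Toy model of DirectoryIndex as described above (not Apache's code).

    directory_url is the part of the URL that mapped to a directory and
    path_info is whatever was left over after it.
    """
    if path_info:
        # "Never tolerate path_info on dir requests": the request is declined.
        return None
    return directory_url + index


print(directory_index_target("/some/dir/", ""))          # maps to the index file
print(directory_index_target("/some/dir/", "foo/bar"))   # declined
```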

Now, if gert went and looked up DirectoryIndex, hopefully this little
bit of description will help in understanding the problem trying to be
solved. That is, that DirectoryIndex can't handle additional path
information.

> > Thus the whole point of this is for where AddHandler was used
> > to enable ability to use WSGI applications and there is no
> > access to WSGIScriptAlias or similar, to make the process of
> > mounting a particular WSGI application on the root of the
> > site much easier than it currently is.
>
> How about an alternative?: The existing SetEnv, or a new "WSGISetEnv"
> directive, can override any WSGI environ variable, including PATH_INFO
> and SCRIPT_NAME, with a user-supplied value. I think that this would
> solve the stated problem, while also being very useful for other
> reasons.

It also allows a user to potentially muck things up as well, so I
preferred what Apache came up with for these to be definitive. If a
user wants to muck with it in the script file afterwards then that is
their business. Also, the script file is the only place where one can
do more complicated modifications to the environment anyway. One can
do some stuff with mod_rewrite but not with the fixed CGI environment
stuff as that isn't setup by that point and would get wiped out when
those CGI variables are added to the environment later.

> There are a few things I don't like about the proposed
> WSGIDirectoryIndex mechanism:
>
> 1. It's based on the idea that we will put Python scripts within the
> Apache document root. Although some of the existing mod_wsgi directives
> also assume this, this isn't something that should be further
> encouraged. In particular, it won't work for prepackaged Python
> applications residing in site-packages outside the Apache document root.
> [1]

You can't really get away from having a file of some sort in the
document tree. This is because Apache needs one to calculate
SCRIPT_NAME properly. In mod_python you didn't need a file in the
document tree but SCRIPT_NAME was also nearly always wrong and it
wasn't possible to somehow otherwise deduce it properly.

The file in the document tree is also being used as the trigger for
daemon process reloading. That is, when file is changed, daemon
process is restarted. If you bypass that and use the mod_python
approach of mapping directly to a Python module name installed
somewhere on Python module path you lose that. I can't see any other
easy or logical way of handling daemon process reloading without it.
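
The idea of using the script file as a reload trigger can be sketched in a few lines of Python. This is only an illustration of the principle (compare the file's modification time against the one last seen); the class name is hypothetical and this is not how mod_wsgi itself is implemented.

```python
import os


class ScriptWatcher:
    """Remember a script file's modification time and report changes."""

    def __init__(self, path):
        self.path = path
        self.mtime = os.path.getmtime(path)

    def changed(self):
        """Return True once each time the file's mtime differs from the last one seen."""
        current = os.path.getmtime(self.path)
        if current != self.mtime:
            self.mtime = current
            return True
        return False
```

A daemon process would check changed() when a request arrives and restart itself if it returns True. Without a script file there is no obvious single file to stat.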

Once you start trying to have the configuration map to a Python module
directly without an actual script file in the document tree, then you
get into the whole mess mod_python had with Python module search
paths. In mod_python it was made worse because the default was one
interpreter per site and mod_wsgi is one per application. The
potential for a mess is still there though if one tried to support
arbitrary Python path changes on a per application basis through
Apache configuration.

Overall, it worked much better and seemed much easier to have the script
file and for the user to be able to set sys.path in there if need be, or
even set up a virtual environment, and then import a module from elsewhere
and trigger it.
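
A minimal script file along those lines might look like the following sketch. The directory is a made-up example, and wsgiref's demo_app is used purely as a stand-in so that the sketch runs on its own; a real script file would import the application from a module living under that directory instead.

```python
import sys

# Hypothetical directory outside the document tree holding the real code.
APP_DIR = "/home/username/lib/python"
if APP_DIR not in sys.path:
    sys.path.insert(0, APP_DIR)

# mod_wsgi looks for a callable named 'application' in the script file.
# demo_app is only a placeholder for an application imported from APP_DIR.
from wsgiref.simple_server import demo_app as application
```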

> 2. It is really only needed for mounting at the root; existing
> mechanisms are satisfactory for mounting everywhere else.

Yes you could just shift the WSGI script file up one directory for sub
URLs, but also may be just as convenient to dump a complete directory
into place which is self contained. That way you don't have to merge
multiple applications together in the parent directory.

> 3. Applications and frameworks still have to handle the mod_rewrite
> hackery for cases when they are deployed using FastCGI or CGI, so this
> isn't saving them any work. Rather, it increases the amount of
> documentation that pre-packaged systems have to create.
>
> 4. This is a common problem with Apache that isn't mod_wsgi-specific; If
> there isn't an existing mechanism that allows a handler to be bound to the root
> URL from within an .htaccess file placed in the document root directory,
> then that is something that needs to be fixed outside of mod_wsgi.

I accept that what is really required is a DirectoryScript directive
in the core of Apache. Ie., sort of like DirectoryIndex but without
the path information restriction, but don't see that happening and
doesn't help with existing versions of Apache unless you are going to
implement it as a standalone Apache module that people could install
with any version of Apache.

Thus, since we are more worried about Python and WSGI, I didn't see a
problem with coming up with a solution which at least makes it simpler
for mod_wsgi even if it means I have to cripple it slightly so it can
only be used to map to a target which is a wsgi-script rather than
being generic.

> [1] Before, I argued against the idea of mod_wsgi supporting
> pre-packaged egg files. I had thought that people were intending those
> eggs to be within the Apache document root. However, now I understand
> the intent is for them to be in PYTHONPATH, outside the document root. I
> fully support that idea, because I hate having any scripts within the
> Apache document root; it is one of the most common ways to leak
> security-sensitive configuration information and/or subject a site to
> code injection attacks.

As I have said, you can't really get away from having some sort of
file in the document tree, even if it is just a marker file containing
just the name of a Python module to use.

I have always wanted the WSGI script file to be the absolute minimum
of set up sys.path or virtual environment, import Python
module/package/egg and reference application entry point from that
module.

Problem is that there are very few WSGI applications that are
implemented such that it can work that way yet. All too many so called
WSGI applications have all this extra crud that needs to be done to
setup the environment. The one which is closest to the ideal is Trac,
all it needs is:

from trac.web.main import dispatch_request as application

All the rest can be handled through SetEnv configuration when using
Trac. But then that presumes that user has FileInfo rights to use
SetEnv in the first place.

So, although it can be convenient for small applications to put the
source code into the WSGI script file, for more serious stuff, should
always be outside of the document tree.

Graham

Graham Dumpleton

Jan 14, 2008, 12:38:59 AM
to mod...@googlegroups.com
On 14/01/2008, Graham Dumpleton <graham.d...@gmail.com> wrote:
> On 14/01/2008, Brian Smith <br...@briansmith.org> wrote:
> >
> > Graham Dumpleton wrote:
> > > I have actually already done 2 and 3 and committed them. I
> > > have in my checked out code changes for 5, 6, 7 and 8 but am
> > > still reviewing and testing the code plus thinking over what
> > > sort of warning to output about content length violations. I
> > > have also had a look at 1 and 4 and know what I need to do.
> > > Waiting on finishing 5, 6, 7 and 8 before I do 1. :-)
> >
> > It looks like you've already fixed all the ones that I could help with
> > in a reasonable time. #1 looks really easy for you to fix and fixing #4
> > is beyond my current understanding of the code.
>
> Except for 1 and 4 I had to commit the remainder of the changes this
> morning. This doesn't mean I am 100 percent sure it is all okay, but
> had to commit it so I could check it out back on my laptop so I could
> do stuff from work if I got a chance and check that it works in Apache
> 1.3 and 2.0 as well as 2.2, which I had already done some testing for.
>
> So, if running latest code from Subversion, just watch out for any
> funny stuff going on until I can double check things again.

Large file support is in there now, so if you have some spare multi GB
files laying around you can try it out. Just don't try and get your
browser to load the file. :-)

FWIW, the whole mod_wsgi code needs to be gone over as there are most
likely various 32 bit limitations in other areas as well because of use
of 'int' rather than the correct Python string length type. I avoided the
issue because I was only using Python 2.3 originally.
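
To illustrate the kind of limitation meant here, the following sketch uses ctypes to show what happens when a length larger than 2 GB is squeezed into a C 'int'. It assumes a platform where 'int' is 32 bits and, for the second value, a 64 bit build. The length type referred to above is presumably Py_ssize_t, which arrived in Python 2.5.

```python
import ctypes

size = 3 * 1024 ** 3                       # a 3 GB content length

# ctypes integer types do no overflow checking, so the value simply wraps.
as_c_int = ctypes.c_int(size).value        # wraps to a negative number
as_ssize = ctypes.c_ssize_t(size).value    # preserved on a 64 bit build

print(as_c_int, as_ssize)
```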

Graham

Brian Smith

Jan 14, 2008, 9:31:28 AM
to mod...@googlegroups.com
Graham Dumpleton wrote:
> On 14/01/2008, Brian Smith <br...@briansmith.org> wrote:
> > For example, I suspect many people will want to vote for this issue:
> > http://code.google.com/p/modwsgi/issues/detail?id=50 "Do
> > not load the Python interpreter into the Apache process when the
> > embedded mode is not used," because it would solve a lot of the
> > compatibility issues that come up with using mod_wsgi.
>
> Actually, I am not sure it does solve many problems. Any
> source of conflicts is generally because of specific Python
> modules being loaded and even though the main interpreter
> sits unused in the main Apache child processes, those modules
> don't ever get loaded so the dependencies that cause the
> conflicts aren't loaded.

It seems I got it backwards then. Really the problem is that Apache is
being loaded into the daemon processes. But, the issue is still
valid--the python interpreter and other Apache modules should never be
loaded into the same process, because there is too much potential for
conflicts. This is something that is only going to get worse over time.

Also, it seems like it is a symptom of a security problem. It seems to
me that the forked child processes can read all the data that was
present in Apache's heap at the time that the process was forked. For
example, if mod_mem_cache is being used, then the daemon processes can
access the cached documents. And, it can read any part of the Apache
configuration, including any passwords. This seems like a real problem
for any kind of shared hosting situation.

You mentioned before on the list that fork() without an exec() has some
benefits over fork()+exec(). Do you have a link to something that
explains the benefit? To me, it seems like there *has* to be an exec()
between the Apache process and the daemons.

- Brian

Graham Dumpleton

Jan 14, 2008, 7:50:03 PM
to mod...@googlegroups.com
On 15/01/2008, Brian Smith <br...@briansmith.org> wrote:
>
> Graham Dumpleton wrote:
> > On 14/01/2008, Brian Smith <br...@briansmith.org> wrote:
> > > For example, I suspect many people will want to vote for this issue:
> > > http://code.google.com/p/modwsgi/issues/detail?id=50 "Do
> > > not load the Python interpreter into the Apache process when the
> > > embedded mode is not used," because it would solve a lot of the
> > > compatibility issues that come up with using mod_wsgi.
> >
> > Actually, I am not sure it does solve many problems. Any
> > source of conflicts is generally because of specific Python
> > modules being loaded and even though the main interpreter
> > sits unused in the main Apache child processes, those modules
> > don't ever get loaded so the dependencies that cause the
> > conflicts aren't loaded.
>
> It seems I got it backwards then. Really the problem is that Apache is
> being loaded into the daemon processes. But, the issue is still
> valid--the python interpreter and other Apache modules should never be
> loaded into the same process, because there is too much potential for
> conflicts. This is something that is only going to get worse over time.

Except for SSL and expat library dependencies from Apache itself, all
the other conflicts I have seen arise due to PHP module dependencies.
The point at which PHP loads those modules will dictate whether
there is a conflict even when using daemon processes.

> Also, it seems like it is a symptom of a security problem. It seems to
> me that the forked child processes can read all the data that was
> present in Apache's heap at the time that the process was forked. For
> example, if mod_mem_cache is being used, then the daemon processes can
> access the cached documents. And, it can read any part of the Apache
> configuration, including any passwords. This seems like a real problem
> for any kind of shared hosting situation.

For daemon processes, I don't think mod_mem_cache is a problem as each
Apache child process appears to have its own in memory cache and it
isn't shared across processes, thus no way that daemon processes could
access it. Obviously embedded mode is different, but then same issue
for where PHP is being used embedded in Apache child processes. This
doesn't mean that other Apache modules may not be a problem and even
the Apache scoreboard could be an issue if it has security problems
like those which were fixed sometime last year.

> You mentioned before on the list that fork() without an exec() has some
> benefits over fork()+exec(). Do you have a link to something that
> explains the benefit? To me, it seems like there *has* to be an exec()
> between the Apache process and the daemons.

I'll take this up in a separate thread where I'll discuss some stuff
about current mod_wsgi architecture, why it is the way it is and the
compromises which had to be made because of mod_python.

Basically, I haven't dismissed #50 and have been thinking about that
and other issues you are raising. If one were to drop various
compatibility requirements caused because of mod_python, and basically
say you can't use mod_python in the same server, then a quite
different architecture for mod_wsgi could be used.

The big problem is trying to balance up all the different requirements
people want from an Apache/Python solution. Certain architectures can
give additional benefits including security and greater flexibility,
but at the same time mean that one can't at the same time support
certain features that some people at least want.

Let me get my breath and I'll start up a separate thread and I'll
slowly go through all of this.

Graham

Brian Smith

Jan 14, 2008, 10:58:24 PM
to mod...@googlegroups.com
Graham Dumpleton wrote:

> Brian Smith wrote:
> > How about an alternative?: The existing SetEnv, or a new
> > "WSGISetEnv" directive, can override any WSGI environ
> > variable, including PATH_INFO and SCRIPT_NAME, with a
> > user-supplied value . I think that this would solve the
> > stated problem, while also being very useful for other
> > reasons.
>
> It also allows a user to potentially muck things up as
> well, so I preferred what Apache came up with for these to be
> definitive. If a user wants to muck with it in the script
> file afterwards then that is their business. Also, the script
> file is the only place where one can do more complicated
> modifications to the environment anyway. One can do some
> stuff with mod_rewrite but not with the fixed CGI environment
> stuff as that isn't setup by that point and would get wiped
> out when those CGI variables are added to the environment later.

What kinds of problems do you foresee?

I was going to request this feature anyway. I want mod_deflate to handle
all compression/decompression in front of my app. My app contains logic
to do the compression based on the Accept-Encoding header, in case
mod_deflate is not available. So, I need a way of suppressing the
HTTP_ACCEPT_ENCODING entry from the WSGI environment if mod_deflate is
active.
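
In the meantime, the same effect can be approximated in the WSGI script file itself, which is where Graham suggests such changes belong. The following is only a sketch; the wrapper name and the trivial inner application are made up for illustration.

```python
def strip_environ_keys(app, keys):
    """Wrap a WSGI application so it never sees the given environ keys."""
    def wrapper(environ, start_response):
        environ = dict(environ)            # leave the caller's dict untouched
        for key in keys:
            environ.pop(key, None)
        return app(environ, start_response)
    return wrapper


def real_application(environ, start_response):
    # Placeholder for the real application; it reports whether it can see
    # the Accept-Encoding header.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [str("HTTP_ACCEPT_ENCODING" in environ).encode()]


# With mod_deflate active, the application never sees the header.
application = strip_environ_keys(real_application, ["HTTP_ACCEPT_ENCODING"])
```

This still requires editing the script file on every deployment where mod_deflate is enabled, which is why a configuration directive would be more convenient.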

> You can't really get away from having a file of some sort in
> the document tree. This is because Apache needs one to
> calculate SCRIPT_NAME properly. In mod_python you didn't need
> a file in the document tree but SCRIPT_NAME was also nearly
> always wrong and it wasn't possible to somehow otherwise
> deduce it properly.

This is only needed when using AddHandler. Other ways of mapping a
location to a WSGI application already have explicit values for
SCRIPT_NAME within the Apache configuration files.

> The file in the document tree is also being used as the
> trigger for daemon process reloading. That is, when file is
> changed, daemon process is restarted. If you bypass that and
> use the mod_python approach of mapping directly to a Python
> module name installed somewhere on Python module path you
> lose that. I can't see any other easy or logical way of
> handling daemon process reloading without it.

The file does not need to be under the document root, it just has to be
in some stat-able location.

> I accept that what is really required is a DirectoryScript
> directive in the core of Apache. Ie., sort of like
> DirectoryIndex but without the path information restriction,
> but don't see that happening and doesn't help with existing
> versions of Apache unless you are going to implement it as a
> standalone Apache module that people could install with any
> version of Apache.
>
> Thus, since we are more worried about Python and WSGI, didn't
> see a problem with coming up with a solution which at least
> makes it simpler for mod_wsgi even if it means I have to cripple
> it slightly so it can only be used to map to a target which
> is a wsgi-script rather than being generic.

That is totally reasonable. How would this work when you have something
like this:
/            (home page generated by the WSGI application)
/app/*       (dynamic content generated by the WSGI application)
/app/static  (static resources that should be served directly by Apache).

Django and other frameworks that I have looked at have all had a
structure like this.

> I have always wanted the WSGI script file to be the absolute
> minimum of set up sys.path or virtual environment, import
> Python module/package/egg and reference application entry
> point from that module.

> All the rest can be handled through SetEnv configuration when
> using Trac. But then that presumes that user has FileInfo
> rights to use SetEnv in the first place.

I predict that there will never be a host that forbids FileInfo but
allows mod_wsgi.

> So, although it can be convenient for small applications to
> put the source code into the WSGI script file, for more
> serious stuff, should always be outside of the document tree.

Everything that you are saying makes perfect sense. Anyway, I think that
this is something that can be revisited if/when mod_wsgi starts
supporting multiple versions of Python in a single installation. If the
Python interpreter is to be selectable through .htaccess or another
configuration file, then the Python path and everything related to it
will need to be defined using the same mechanism. FWIW, none of the
people I know that use shared hosting are using the pre-installed
version of Python to run their (Fast)CGI scripts; they all downloaded
and/or built their own upgraded Python to run their applications
instead.

- Brian

Brian Smith

Jan 14, 2008, 11:29:12 PM
to mod...@googlegroups.com
Graham Dumpleton wrote:
> > Also, it seems like it is a symptom of a security problem.
> > It seems to me that the forked child processes can
> > read all the data that was present in Apache's heap at
> > the time that the process was forked. For example,
> > if mod_mem_cache is being used, then the daemon processes
> > can access the cached documents. And, it can read any part
> > of the Apache configuration, including any passwords.
> > This seems like a real problem for any kind of shared
> > hosting situation.
>
> For daemon processes, I don't think mod_mem_cache is a
> problem as each Apache child process appears to have its own
> in memory cache and it isn't shared across processes, thus no
> way that daemon processes could access it. Obviously embedded
> mode is different, but then same issue for where PHP is being
> used embedded in Apache child processes. This doesn't mean
> that other Apache modules may not be a problem and even the
> Apache scoreboard could be an issue if it has security
> problems like those which were fixed sometime last year.

mod_mem_cache might be a bad example. But, it seems like the security of
a pure fork()-based implementation is impossible to verify, because
portions of the Apache heap are copied into the forked process. It seems
unlikely that you can guarantee that there is no sensitive information
loaded into memory before the fork() happens.
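
The concern can be demonstrated with a few lines of Python on a Unix system. This is only a sketch of the general fork() behaviour, not of anything specific to Apache or mod_wsgi, and the "secret" is a made-up value standing in for whatever the parent had loaded before forking.

```python
import os

secret = "password-from-config"    # made-up value loaded before the fork

read_fd, write_fd = os.pipe()
pid = os.fork()                    # note: no exec() follows
if pid == 0:
    # Child: it was never handed the secret explicitly, but the value is
    # still present in its copy of the parent's address space.
    os.close(read_fd)
    os.write(write_fd, secret.encode())
    os._exit(0)

# Parent: read back what the child was able to see.
os.close(write_fd)
seen_by_child = os.read(read_fd, 1024).decode()
os.close(read_fd)
os.waitpid(pid, 0)
print(seen_by_child)
```

With an exec() after the fork, the child's address space would be replaced and only what was passed explicitly (arguments, environment, open descriptors) would carry over.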

> I'll take this up in a separate thread where I'll discuss
> some stuff about current mod_wsgi architecture, why it is the
> way it is and the compromises which had to be made because of
> mod_python.
>
> Basically, I haven't dismissed #50 and have been thinking
> about that and other issues you are raising. If one were to
> drop various compatibility requirements caused because of
> mod_python, and basically say you can't use mod_python in
> the same server, then a quite different architecture for
> mod_wsgi could be used.

AFAICT, for all practical purposes this is already the case--especially
if mod_python is configured to use a different version of Python than
mod_wsgi.

I've only just started looking at the mod_wsgi code, but it looks to me
like the code could be split up into two parts: one part that handles
all the interaction with Apache without interacting with the Python
interpreter at all, and the other part that interacts with the Python
interpreter but doesn't have to deal with anything Apache-specific (just
APR, not all of Apache). If that can be done, and each part can be split
into its own executable, then it seems like all the potential for shared
library/mod_php/mod_python conflicts is removed, and the security of
mod_wsgi is much easier to verify.

> The big problem is trying to balance up all the different
> requirements people want from an Apache/Python solution.
> Certain architectures can give additional benefits including
> security and greater flexibility, but at the same time mean
> that one can't at the same time support certain features that
> some people at least want.
>
> Let me get my breath and I'll start up a separate thread and
> I'll slowly go through all of this.

Thanks. To be honest, I don't have a firm grasp of when Apache does the
forking in relation to security-sensitive configuration and module
loading. That contributes to me thinking that I cannot verify that
mod_wsgi is safe when applications have to be totally isolated from each
other for security reasons. I also don't know what features need to be
given up with a complete separation between processes.

- Brian

Graham Dumpleton

Jan 16, 2008, 2:01:44 AM
to mod...@googlegroups.com
On 15/01/2008, Brian Smith <br...@briansmith.org> wrote:
> > I accept that what is really required is a DirectoryScript
> > directive in the core of Apache. Ie., sort of like
> > DirectoryIndex but without the path information restriction,
> > but don't see that happening and doesn't help with existing
> > versions of Apache unless you are going to implement it as a
> > standalone Apache module that people could install with any
> > version of Apache.
> >
> > Thus, since we are more worried about Python and WSGI, didn't
> > see a problem with coming up with a solution which at least
> > makes it simpler for mod_wsgi even if it means I have to cripple
> > it slightly so it can only be used to map to a target which
> > is a wsgi-script rather than being generic.
>
> That is totally reasonable. How would this work when you have something
> like this:
> /            (home page generated by the WSGI application)
> /app/*       (dynamic content generated by the WSGI application)
> /app/static  (static resources that should be served directly by Apache).

Take Django mounted on root of site as an example, but where user only
has .htaccess control.

In .htaccess of DocumentRoot for that site would be:

AddHandler wsgi-script .wsgi

WSGIDirectoryScript django.wsgi

In the DocumentRoot directory would be:

django.wsgi
media/*

The django.wsgi would be just like described in online documentation,
referring to directories outside of document tree where Django
installation is actually kept.

The 'media' directory would be the actual media files for Django.

What happens is that when getting a request Apache first looks to see
if it can find a physical file for that URL. So, if URL was
/media/some/path and a file existed for that, the static file would be
served up.

If URL didn't find a file within /media for that URL, then Apache
treats the request as being against the most nested directory that it
could match. Because the URL started with /media, it would be against
that directory or some subdirectory of it. Since I said that
WSGIDirectoryScript would not be inherited by sub directories, this
would result in Apache returning not found, or generating a directory
index for the sub directory if indexing was enabled.

If the URL didn't map partially to a subdirectory and therefore Apache
treats it as a request against the DocumentRoot, the
WSGIDirectoryScript in the .htaccess for that directory would apply.
The result would be that it would internally redirect to
/django.wsgi/some/path, but mod_wsgi would do the SCRIPT_NAME fiddle
as I described so that the application still saw it as actually
being /some/path.

Thus, if a file doesn't exist and the URL didn't partially map to a
subdirectory, the request would be routed through Django via
django.wsgi.

So, in other words would work similar to the rewrite rule but fix up
SCRIPT_NAME so WSGI application was none the wiser.
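
As a sketch of what that SCRIPT_NAME fix up amounts to, here it is expressed in Python terms. This is only an illustration of the intended result. The real adjustment would happen inside mod_wsgi, and the function name is hypothetical.

```python
def hide_script_file(environ, script="/django.wsgi"):
    """Return a copy of environ with the script file dropped from SCRIPT_NAME."""
    environ = dict(environ)
    script_name = environ.get("SCRIPT_NAME", "")
    if script_name.endswith(script):
        # Make it look as if the application is mounted on the directory.
        environ["SCRIPT_NAME"] = script_name[:-len(script)]
    return environ


# What the environ would look like after the internal redirect to
# /django.wsgi/some/path, and what the application would see instead.
after_redirect = {"SCRIPT_NAME": "/django.wsgi", "PATH_INFO": "/some/path"}
seen_by_app = hide_script_file(after_redirect)
print(seen_by_app)
```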

The only remaining problem at this point is if someone explicitly says
/django.wsgi/some/path from their browser. This would still work but
may confuse Django as SCRIPT_NAME would have it as being mounted on a
sub directory. But then Django doesn't actually pay attention to
SCRIPT_NAME as it isn't WSGI compliant.

Anyway, what may be required is to somehow have WSGIDirectoryScript,
knowing that django.wsgi was specifically to be used for the
directory, block direct access to django.wsgi by user mentioning it in
URL explicitly.

Graham

Brian Smith

Jan 16, 2008, 7:49:03 AM
to mod...@googlegroups.com
Graham Dumpleton wrote:
> > That is totally reasonable. How would this work when you have
> > something like this:
> > /            (home page generated by the WSGI app)
> > /app/*       (dynamic content generated by the WSGI application)
> > /app/static  (static resources that should be served directly by Apache).
>
> Take Django mounted on root of site as an example, but where
> user only has .htaccess control.
>
> In .htaccess of DocumentRoot for that site would be:
>
> AddHandler wsgi-script .wsgi
>
> WSGIDirectoryScript django.wsgi
>
> In the DocumentRoot directory would be:
>
> django.wsgi
> media/*
>
> The only remaining problem at this point is if someone
> explicitly says /django.wsgi/some/path from their browser.
> This would still work but may confuse Django as SCRIPT_NAME
> would have it as being mounted on a sub directory. But then
> Django doesn't actually pay attention to SCRIPT_NAME as it
> isn't WSGI compliant.
>
> Anyway, what may be required is to somehow have
> WSGIDirectoryScript, knowing that django.wsgi was
> specifically to be used for the directory, block direct
> access to django.wsgi by user mentioning it in URL explicitly.

For WSGIDirectoryScript and WSGIScriptAlias[Match], the *.wsgi script
shouldn't need to be within the document root; regardless of what the
file path is, SCRIPT_NAME is going to be the path to the directory with
the WSGIDirectoryScript or the URL path matched by
WSGIScriptAlias[Match]. That is already how ScriptAlias works, for
example. In the example, if django.wsgi is not within the document root,
then there is no way that a user can try to refer to it directly. Script
files should only need to be under the document root for AddHandler and
SetHandler, just like in the CGI case.

- Brian

Graham Dumpleton

Jan 16, 2008, 3:45:14 PM
to mod...@googlegroups.com

When using WSGIScriptAlias the target script file still must be in a
directory which is known to Apache with Apache configured to allow use
of it through its access control mechanisms.

In the case of shared web hosting they are only going to give you the
one directory which is under Apache control. Therefore it isn't
necessarily practical to be putting it outside of your main
document directory.

What is potentially lost by doing so is the ability of the web admin
to apply the access controls, plus the user themselves would lose the
ability to do things like:

<Files django.wsgi>
    SetEnv some_value 1
    LimitRequestBody 10000000
    ...
</Files>

for just that application.

Part of the issue is also that the routines which need to be used to
setup the internal lookup and redirect for the target script so Apache
can process the request against it instead rely on it being in a
directory under its control as it automatically applies any access
controls, plus settings like the above. So, unless one steps right
outside of the Apache functions for doing it and fudges it, you can't do
it without it being in the same directory, as the Apache functions will
error because of access to the script file being forbidden.

Graham

Brian Smith

Jan 16, 2008, 6:07:57 PM
to mod...@googlegroups.com
Graham Dumpleton wrote:
> When using WSGIScriptAlias the target script file still must
> be in a directory which is known to Apache with Apache
> configured to allow use of it through its access control mechanisms.

I still think it should work like ScriptAlias[Match]. As the Apache
document says: "It is safer to avoid placing CGI scripts under the
DocumentRoot in order to avoid accidentally revealing their source code
if the configuration is ever changed. The ScriptAlias makes this easy by
mapping a URL and designating CGI scripts at the same time. If you do
choose to place your CGI scripts in a directory already accessible from
the web, do not use ScriptAlias. Instead, use <Directory>, SetHandler,
and Options."

> In the case of shared web hosting they are only going to give
> you the one directory which is under Apache control.
> Therefore it isn't necessarily practical to be putting it
> outside of your main document directory.

You cannot use WSGIScriptAlias[Match] in this setup anyway, because
WSGIScriptAlias[Match] can only be used within the Apache configuration
file, not .htaccess. Also, most of the shared hosting services I have
used use something like DocumentRoot "/home/username/htdocs", not just
DocumentRoot "/home/username", leaving lots of places outside the
document root to put other files.

> What is potentially lost by doing so, is the ability of the
> web admin to apply the access controls, plus the user
> themselves would lose the ability to do things like:
>
> <Files django.wsgi>
> SetEnv some_value 1
> LimitRequestBody 10000000
> ...
> </Files>
>
> for just that application.

For WSGIScriptAlias[Match] X Y, you can always use <LocationMatch X>
instead.

> Part of the issue is also that the routines which need to be
> used to setup the internal lookup and redirect for the target
> script so Apache can process the request against it instead
> rely on it being in a directory under its control as it
> automatically applies any access controls, plus settings like
> the above. So, unless one steps right outside of Apache
> functions for doing it and fudge it, you can't do it without
> it being in the same directory as the Apache functions will
> error because of access to the script file being forbidden.

Maybe I am misunderstanding, but I think you should be able to just copy
the code for ScriptAlias.

Regards,
Brian

Graham Dumpleton

Jan 16, 2008, 8:48:43 PM
to mod...@googlegroups.com
On 17/01/2008, Brian Smith <br...@briansmith.org> wrote:
>
> Graham Dumpleton wrote:
> > When using WSGIScriptAlias the target script file still must
> > be in a directory which is known to Apache with Apache
> > configured to allow use of it through its access control mechanisms.
>
> I still think it should work like ScriptAlias[Match]. As the Apache
> document says: "It is safer to avoid placing CGI scripts under the
> DocumentRoot in order to avoid accidentally revealing their source code
> if the configuration is ever changed. The ScriptAlias makes this easy by
> mapping a URL and designating CGI scripts at the same time. If you do
> choose to place your CGI scripts in a directory already accessible from
> the web, do not use ScriptAlias. Instead, use <Directory>, SetHandler,
> and Options."

What I was saying was that when you use ScriptAlias, eg:

ScriptAlias /some/url /some/directory/script

You must still have a corresponding Directory container for the
directory so that Apache knows about it and what access controls it
should apply to it.

<Directory /some/directory>
Order deny,allow
Allow from all
</Directory>

So, the directory doesn't have to be under DocumentRoot, but Apache
still requires you to set the access control permissions.

I don't want to be ignoring Apache's access control mechanisms. I
would rather things work as web server administrators understand them
to work now.

> > In the case of shared web hosting they are only going to give
> > you the one directory which is under Apache control.
> > Therefore it isn't necessarily practical to be putting it
> > outside of your main document directory.
>
> You cannot use WSGIScriptAlias[Match] in this setup anyway, because
> WSGIScriptAlias[Match] can only be used within the Apache configuration
> file, not .htaccess.

I know. That is the whole point of WSGIDirectoryScript: it would be
something that can work in a directory where you only have the ability
to use .htaccess. I.e., similar to DirectoryIndex but for complex
scripts which want to accept path information.

> Also, most of the shared hosting services I have
> used use something like DocumentRoot "/home/username/htdocs", not just
> DocumentRoot "/home/username", leaving lots of places outside the
> document root to put other files.

But those other places don't have something like:

<Directory /home/username/other>
Order deny,allow
Allow from all
</Directory>

If it is a script file which is a direct target of a URL, I want
Apache access controls to apply so administrators can control things
like they are used to.

To avoid the issue of sensitive source code in a script, all one needs
to do is make it a thin wrapper that imports stuff from outside of
Apache controlled directories, as already explained in prior examples.
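
As a rough sketch of the thin wrapper idea (not taken from the earlier
examples in this thread), the script file that Apache maps the URL to
could contain little more than an import. The helper name
load_application, the directory /home/username/private and the module
name mysite_wsgi are all hypothetical and only there for illustration:

```python
import importlib
import sys


def load_application(code_dir, module_name, attr="application"):
    """Import a WSGI callable from a directory Apache does not serve."""
    # Make the private directory importable for this process only.
    if code_dir not in sys.path:
        sys.path.insert(0, code_dir)
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# In the real script file (for example django.wsgi) the only other line
# would be something like the following; both names are hypothetical:
#
#   application = load_application("/home/username/private", "mysite_wsgi")
```

If the web server configuration is ever broken such that the script
file is served as plain text, all that leaks is the location of the
real code, not the code itself.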

> > What is potentially lost by doing so, is the ability of the
> > web admin to apply the access controls, plus the user
> > themselves would lose the ability to do things like:
> >
> > <Files django.wsgi>
> > SetEnv some_value 1
> > LimitRequestBody 10000000
> > ...
> > </Files>
> >
> > for just that application.
>
> For WSGIScriptAlias[Match] X Y, you can always use <LocationMatch X>
> instead.

Except that I am talking about in a .htaccess file and Location and
LocationMatch don't work in .htaccess files. The equivalent is Files
and it works from a target resource matched by a URL and not the URL.
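
To illustrate the difference, here is a deliberately simplified model
in Python. It is not how Apache implements the matching (regular
expressions, merging order and so on are ignored), and the function
names and example paths are made up for this sketch:

```python
import fnmatch
import posixpath


def files_matches(pattern, mapped_file):
    # <Files> is compared against the last component of the file the
    # request was mapped to, not against the URL.
    return fnmatch.fnmatchcase(posixpath.basename(mapped_file), pattern)


def location_matches(prefix, url_path):
    # <Location> is compared against the URL path, before any file is known.
    prefix = prefix.rstrip("/")
    return url_path == prefix or url_path.startswith(prefix + "/")


# A request for /app/users/1 that ends up mapped to the script file:
print(files_matches("django.wsgi", "/home/username/htdocs/django.wsgi"))
print(location_matches("/app", "/app/users/1"))
```

So a Files container follows the script file wherever the URL came
from, whereas a Location container follows the URL wherever the script
file happens to live.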

> > Part of the issue is also that the routines which need to be
> > used to setup the internal lookup and redirect for the target
> > script so Apache can process the request against it instead
> > rely on it being in a directory under its control as it
> > automatically applies any access controls, plus settings like
> > the above. So, unless one steps right outside of Apache
> > functions for doing it and fudge it, you can't do it without
> > it being in the same directory as the Apache functions will
> > error because of access to the script file being forbidden.
>
> Maybe I am misunderstanding, but I think you should be able to just copy
> the code for ScriptAlias.

The ScriptAlias directives are handled in the translate_name phase of
Apache, and all they really do is set req.filename as a first landing
point for where Apache should then start looking for stuff. The
processing of directories and .htaccess files is done in the later
map_to_storage phase. The only real opportunity after that point to
redirect to a different resource, after the request has been mapped to
a directory, is way down in the fixup phase, just prior to the response
handler actually being called. This is because directives in a
.htaccess file can't trigger translate_name stuff, as it is too late.
The best one can do at the fixup phase is a fast internal redirect,
which is a totally different thing and nothing like what ScriptAlias
triggers back in the translate_name phase. For an example, see
mod_dir.

Graham

Brian Smith

Jan 16, 2008, 9:42:33 PM
to mod...@googlegroups.com
Graham Dumpleton wrote:
> You must still have a corresponding Directory container for
> the directory so that Apache knows about it and what access
> controls it should apply to it.

<snip>

> I don't want to be ignoring Apache's access control
> mechanisms. I would rather things work as web server
> administrators understand them to work now.

I totally agree. I wasn't intending to argue otherwise.

> I know. That is the whole point of WSGIDirectoryScript, it
> would be something that can work in a directory where you
> only have ability to use .htaccess. Ie., similar to
> DirectoryIndex but for complex scripts which want to accept
> path information.

I understand that. But, what I am saying is that, if "WSGIDirectoryScript
/path/to/script.wsgi" worked like "WSGIScriptAlias /
/path/to/script.wsgi", then it would *also* allow .htaccess-only users
to implement the Apache-recommended practice of keeping script files
outside of the document root. (But, see below.)

> > like DocumentRoot "/home/username/htdocs", not just DocumentRoot
> > "/home/username", leaving lots of places outside the
> > document root to put other files.
>
> But those other places don't have something like:
>
> <Directory /home/username/other>
> Order deny,allow
> Allow from all
> </Directory>
>

Okay, I accept that many or most shared hosting users might have to keep
scripts in the DocumentRoot. But, that doesn't mean that everybody
should have to.

> If it is a script file which is a direct target of a URL, I
> want Apache access controls to apply so administrators can
> control things like they are used to.

I agree. Like I said, I want it to work like CGI with respect to access
control.

> > > What is potentially lost by doing so, is the ability of the web
> > > admin to apply the access controls, plus the user themselves
> > > would lose the ability to do things like:
> > >
> > > <Files django.wsgi>
> > > SetEnv some_value 1
> > > LimitRequestBody 10000000
> > > ...
> > > </Files>
> > >
> > > for just that application.
> >
> > For WSGIScriptAlias[Match] X Y, you can always use
> > <LocationMatch X> instead.
>
> Except that I am talking about in a .htaccess file and
> Location and LocationMatch don't work in .htaccess files. The
> equivalent is Files and it works from a target resource
> matched by a URL and not the URL.

If that is the only option that a user has, then they will have to put
the script within the DocumentRoot. But, again, that doesn't mean that
somebody with full access to the Apache configuration file should have
to do that.

> > Maybe I am misunderstanding, but I think you should be able to just
> > copy the code for ScriptAlias.
>
> The ScriptAlias directives are handled in translate_name
> phase of Apache and all they really do is set req.filename,
> as a first landing point for where Apache should then start
> looking for stuff. The processing of directories and
> .htaccess files is done in later map_to_storage phase.

So, WSGIScriptAlias[Match] could easily allow scripts to be placed
outside of the DocumentRoot, but the WSGIDirectoryScript directive could
not easily do so, right?

I am not really sure if you are against the idea of allowing .wsgi files
to be outside the DocumentRoot, or if you think it is just difficult or
impossible to implement.

Regards,
Brian

Graham Dumpleton

Jan 16, 2008, 10:28:20 PM
to mod...@googlegroups.com
On 17/01/2008, Brian Smith <br...@briansmith.org> wrote:
> > > Maybe I am misunderstanding, but I think you should be able to just
> > > copy the code for ScriptAlias.
> >
> > The ScriptAlias directives are handled in translate_name
> > phase of Apache and all they really do is set req.filename,
> > as a first landing point for where Apache should then start
> > looking for stuff. The processing of directories and
> > .htaccess files is done in later map_to_storage phase.
>
> So, WSGIScriptAlias[Match] could easily allow scripts to be placed
> outside of the DocumentRoot, but the WSGIDirectoryScript directive could
> not easily do so, right?

Correct. It all comes down to what is possible with Apache depending on
whether the directive is defined at global or VirtualHost scope, or in
a Directory/.htaccess context. I am trying to use the functions that
Apache provides to do things in the conventional manner, as it simply
saves a lot of work and I know it is secure code already.

> I am not really sure if you are against the idea of allowing .wsgi files
> to be outside the DocumentRoot, or if you think it is just difficult or
> impossible to implement.

The way I see it, if you have access to the main Apache configuration
then by all means do that; it can be done using the WSGIScriptAlias
directives. If you are just a user and those who administer the box
are intentionally restricting you to a specific area, then I don't
believe there should be an expectation that you be able to break out
of that box and have the initial target script files themselves
elsewhere.

FWIW, in addition to the WSGI script file being just a wrapper, when
using mod_wsgi daemon mode and the daemon process runs as the user who
owns the site, and not the Apache user, there are actually other
things that the user can do to stop inadvertent exposure of any code
which has to be in the script file. This comes about through the user
controlling permissions on files, but also, perhaps more importantly,
on the directories themselves.

First thing is that in daemon mode the main Apache child processes do
not need to read the contents of the WSGI script files, so the
permissions can be set to 0600, i.e., readable only by the owner. Apache
child processes don't need to read the files as it is the mod_wsgi
daemon process running as the user that does.

The next thing is that map_to_storage in Apache only needs to be able
to stat the target WSGI script file. So, if the only thing in the
directory was the WSGI script file, the directory permission could be
0701/rwx-----x. So, Apache would still be able to work out the file
existed, but no one would be able to generate a directory listing
except for the owner.

The lack of the directory listing ability would mean though that
mod_autoindex obviously wouldn't work. It would also prevent Apache
MultiViews from working, as Apache wouldn't be able to get a listing of
files in the directory to determine if there was a related file by
basename. If those two features were still required then there is no
choice but to have 0755/rwxr-xr-x.
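
The two permission changes above can be sketched as follows. This
assumes the daemon process runs as the owner of the files, and that
the directory holds nothing that mod_autoindex or MultiViews needs to
list; the function name lock_down is made up for the example:

```python
import os
import stat


def lock_down(script_path):
    """Restrict a WSGI script and its directory as described above."""
    directory = os.path.dirname(os.path.abspath(script_path))
    # Script: readable and writable by the owner only (rw-------).
    os.chmod(script_path, 0o600)
    # Directory: owner has full access, others may only traverse it
    # (rwx-----x), so a known name can be stat'ed but nothing listed.
    os.chmod(directory, 0o701)
    return (stat.S_IMODE(os.stat(script_path).st_mode),
            stat.S_IMODE(os.stat(directory).st_mode))
```

The same thing can of course be done by hand with chmod; the point is
only that the file and the directory get different modes.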

If all you had were WSGI script files, you could thus hide things
quite nicely. If you had static files as well, then you can still lock
off the directory, but the static files would at least have to be
readable to others. If you didn't want the static files readable to
others, or at minimum the Apache group, if groups are used in that way,
then use wsgi.file_wrapper to serve them via the WSGI application and
you can put them out of sight as well.
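
A minimal sketch of serving one such file through wsgi.file_wrapper,
assuming the file is readable by the user the application runs as. The
factory name make_static_app is hypothetical, and the fallback to
wsgiref's FileWrapper is only there so the code also runs under
servers that don't provide the extension:

```python
import os
from wsgiref.util import FileWrapper


def make_static_app(path, content_type="application/octet-stream",
                    block_size=8192):
    """Return a WSGI application that serves a single private file."""
    def application(environ, start_response):
        f = open(path, "rb")
        size = os.fstat(f.fileno()).st_size
        start_response("200 OK", [("Content-Type", content_type),
                                  ("Content-Length", str(size))])
        # Use the server's optimised wrapper when it offers one.
        wrapper = environ.get("wsgi.file_wrapper", FileWrapper)
        return wrapper(f, block_size)
    return application
```

A real application would also need to map URLs to files safely and
handle missing files; that is left out here.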

Graham
