Mixing with static content and using SetHandler for non-existent URLs


Deron Meranda

Mar 23, 2010, 11:01:36 AM
to mod...@googlegroups.com
I'm trying to mix both static content (or content from other types
of handlers) with mod_wsgi handlers. I don't want to partition
things out into separate directories (like /static and /wsgiscripts),
but want them all to intermingle.

Through a variety of AddHandler and DirectoryIndex directives
I can get close. However I'm used to the mod_python feature
where it was pretty easy to make a SetHandler rule that would
get called for all URLs, regardless if there was actually a
file to map to. E.g., I could pop this into an .htaccess:

# Example in mod_python
SetHandler mod_python
PythonHandler mysitehandler::handler
<Files ~ "\.(css|js|png|jpg|txt|pdf)$">
SetHandler None
</Files>

However with mod_wsgi, the only thing I'm lacking is something
equivalent to the "PythonHandler". So when the SetHandler wsgi-script
is in effect, it tries to use the URL as the path to the WSGI script,
rather than letting me pick a specific script to use.

I'm not having much luck trying to do something similar to how
I've used mod_python. Close, but I still seem to have holes.
I'd like something that I can use at the .htaccess level if possible.

Does anybody know what I'm missing?
--
Deron Meranda

Deron Meranda

Mar 23, 2010, 1:23:06 PM
to mod...@googlegroups.com
On Tue, Mar 23, 2010 at 11:01 AM, Deron Meranda <deron....@gmail.com> wrote:
> I'm trying to mix both static content (or content from other types
> of handlers) with mod_wsgi handlers.  I don't want to partition
> things out into separate directories (like /static and /wsgiscripts),
> but want them all to intermingle.

I kind of got this working, I think, using mod_rewrite rules. It's a
bit ugly, and I had to do things slightly differently than what's in
the documentation wiki (ConfigurationGuidelines).

Here's what I had to add to my VirtualHost (I'm not done testing
all cases yet, but I think this works):

<VirtualHost ...
.... other apache directives ....
<IfModule wsgi_module>
DirectoryIndex index.wsgi
WSGI* -- .... other mod_wsgi directives ....
AddHandler wsgi-script .wsgi
RewriteEngine On
RewriteLog /var/log/httpd/rewrite.log
RewriteLogLevel 4
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /index.wsgi/$1 [QSA,PT,L]
</IfModule>
</VirtualHost>

Now, if a file exists at the URL, then it is served normally as
Apache usually does. But if there is no file, then a single
site-wide wsgi script is called to handle it. Which is what
I want. I want wsgi to get anything that would have normally
resulted in a 404.

(Note that I could probably do away with the DirectoryIndex
part if I added another and'ed condition with !-d ???)


For some reason I had to prepend %{DOCUMENT_ROOT}/
to the rewrite condition, because %{REQUEST_FILENAME} was
a relative path from the document root and the !-f condition
requires an absolute path. (Setting RewriteLogLevel to 4 let
me finally debug that)
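
To make the path issue concrete, here is a small Python model of that
`!-f` test. It only imitates the check and is not how mod_rewrite is
implemented; the function name `is_existing_file` and the sample paths
are made up for illustration.

```python
import os
import tempfile


def is_existing_file(document_root, request_filename):
    """Model of the test '%{DOCUMENT_ROOT}/%{REQUEST_FILENAME} -f'.

    In the virtual host case described above, request_filename is still
    URL-like ('/css/site.css'), so it has to be joined to the document
    root before a filesystem check means anything.
    """
    return os.path.isfile(document_root + "/" + request_filename)


# Build a throwaway document root containing one static file.
docroot = tempfile.mkdtemp()
os.makedirs(os.path.join(docroot, "css"))
with open(os.path.join(docroot, "css", "site.css"), "w") as f:
    f.write("body {}\n")

# A file exists: the rewrite condition fails and Apache serves it.
print(is_existing_file(docroot, "/css/site.css"))
# No file: the condition holds and the request goes to index.wsgi.
print(is_existing_file(docroot, "/blog/2010/03"))
```

Without the document root prefix, the same check would look for
`/css/site.css` at the root of the filesystem, which is why the
condition never matched.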

So this will suffice, but it still seems much messier than it was
under mod_python. Plus I'd like to be able to do this on a
per-directory level rather than at the site/virtual-host level,
as well as being able to override any of this in local .htaccess
files -- I'm not sure yet if I can do that using this technique.

It still might be nice to have something similar to PythonHandler,
where I could set the wsgi script to be something other than the URL.
--
Deron Meranda

Graham Dumpleton

Mar 23, 2010, 6:08:48 PM
to mod...@googlegroups.com

There are a couple of ways. First is to use:

WSGIScriptAlias /some/url/myhandler /some/path/myhandler.wsgi

<Directory /some/path>
Order Allow,Deny
Deny from All
</Directory>

Action my-wsgi-handler /some/url/myhandler

Then in context you need it, use:

SetHandler my-wsgi-handler

<Files ~ "\.(css|js|png|jpg|txt|pdf)$">
SetHandler None
</Files>

Note that Apache is doing a sub request for the 'Action' directive and
as such SCRIPT_NAME gets changed to match the target handler and not
the original resource. You will need to look at the key variables and
adjust where necessary. Also look at the REDIRECT_??? variables as
they may hold useful information about the original resource.
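
A minimal sketch of what "adjust where necessary" might look like in
the WSGI script. It assumes the original URL shows up as REDIRECT_URL
in the WSGI environ, which is worth confirming by dumping the environ
on your own setup; the helper name `original_url` is made up.

```python
def original_url(environ):
    """Best-effort guess at the URL the client actually requested."""
    # Assumption: after the internal redirect done for 'Action', Apache
    # passes the pre-redirect URL through as REDIRECT_URL.
    redirected = environ.get("REDIRECT_URL")
    if redirected:
        return redirected
    # Otherwise fall back to the usual WSGI reconstruction.
    return environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")


def application(environ, start_response):
    body = ("original resource: %s\n" % original_url(environ)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```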

Also, the '/some/url/myhandler' URL will be directly accessible unless
you also add a mod_rewrite rule that blocks access unless IS_SUBREQ is
true.

A second easier way is to use:

WSGIHandlerScript my-wsgi-handler /some/path/myhandler.wsgi

Again, use the following in the context where you need it:

SetHandler my-wsgi-handler

<Files ~ "\.(css|js|png|jpg|txt|pdf)$">
SetHandler None
</Files>

Stuff like SCRIPT_NAME will be correct as per the original resource as
no sub request is performed.

Note that if using 'Action', the WSGI script file should have
entry point as 'application'. For various reasons, if using
WSGIHandlerScript the entry point should be called 'handle_request'.
This will change in future mod_wsgi version and 'application' will
instead be used as default with it being able to be overridden.
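
As a sketch, a script file that works with either approach could simply
expose both names. This assumes 'handle_request' is called with the
same (environ, start_response) arguments as an ordinary WSGI
application; the response body is only a placeholder.

```python
def application(environ, start_response):
    """Ordinary WSGI entry point, as used with WSGIScriptAlias/Action."""
    path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
    body = ("handled %s\n" % path).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


# Name looked up when the script is loaded through WSGIHandlerScript
# (assumed to take the same arguments as 'application').
handle_request = application
```

Keeping both names in the one file means the script does not need to
change if the default entry point changes later.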

Graham

Deron Meranda

Mar 23, 2010, 8:18:57 PM
to mod...@googlegroups.com
> There are a couple of ways. First is to use:
>
>  WSGIScriptAlias /some/url/myhandler /some/path/myhandler.wsgi
>...
>  Action my-wsgi-handler /some/url/myhandler
>...
>  SetHandler my-wsgi-handler

Interesting. All these years and I've never used Apache's Action,
nor thought to look for it.


> A second easier way is to use:
>
>  WSGIHandlerScript my-wsgi-handler /some/path/myhandler.wsgi

>...
>  SetHandler my-wsgi-handler

Thanks, that sounds pretty close to what I was wanting.
I failed to stumble across the WSGIHandlerScript directive in
any of the documentation.

I'll have to try it out later.

-----

Oh, what I was doing up to this, using mod_rewrite, is almost
working. Except for the ability to override things with a local
per-directory .htaccess and similar loss of flexibility. But it is
close.

Though it took me a while to discover that I had to use a slightly
different syntax than what you documented. This is because of
mod_rewrite's %{REQUEST_FILENAME} variable, which acts
differently in a VirtualHost than it does at the server level (it even
says so in the docs once you read carefully).

You might want to update the wiki (ConfigurationGuidelines)
where you show the mod_rewrite example to mention the
virtual host difference.

At server level, use:
RewriteCond %{REQUEST_FILENAME} !-f

But in a virtual host, use:
RewriteCond %{DIRECTORY_ROOT}/%{REQUEST_URI} !-f

(Actually you can probably use the latter form in both contexts)

Thanks Graham
--
Deron Meranda

Deron Meranda

Mar 23, 2010, 8:22:08 PM
to mod...@googlegroups.com
> But in a virtual host, use:
> RewriteCond %{DIRECTORY_ROOT}/%{REQUEST_URI} !-f

Oops, I was typing too fast. Meant to say:

RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} !-f


(Also the "/" separator may be unnecessary, but it makes me
feel safer until I prove to myself that it's never needed.)

--
Deron Meranda

Graham Dumpleton

Mar 23, 2010, 8:35:57 PM
to mod...@googlegroups.com

Am not sure REQUEST_URI will always work in that situation as it isn't
decoded nor are repeating slashes removed.

PATH_INFO: '/sa/asdf/as/df/asdf/ '
PATH_TRANSLATED: '/Users/grahamd/Testing/tests/sa/asdf/as/df/asdf/ '
REQUEST_URI: '/echo.wsgi/sa//asdf/as/df/asdf///%20'
SCRIPT_FILENAME: '/Users/grahamd/Testing/tests/echo.wsgi'
SCRIPT_NAME: '/echo.wsgi'
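
To illustrate the difference, the following sketch applies the two
missing steps (percent-decoding and collapsing repeated slashes) to the
REQUEST_URI value above. It is only a rough imitation of what Apache
does when it maps a URL; the function name is made up, and the import
is the Python 3 spelling.

```python
import re
from urllib.parse import unquote


def normalize_request_uri(request_uri):
    """Decode %xx escapes and collapse runs of slashes in the path."""
    path = request_uri.split("?", 1)[0]   # drop any query string
    path = unquote(path)                   # '%20' becomes ' '
    return re.sub(r"/{2,}", "/", path)     # '//' and '///' become '/'


raw = "/echo.wsgi/sa//asdf/as/df/asdf///%20"
print(repr(normalize_request_uri(raw)))
# prints '/echo.wsgi/sa/asdf/as/df/asdf/ ', i.e. SCRIPT_NAME + PATH_INFO
```

A file test built on the raw REQUEST_URI would therefore look at a
different path than the one Apache finally resolves.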

Graham

Deron Meranda

Mar 24, 2010, 1:34:44 AM
to mod...@googlegroups.com
On Tue, Mar 23, 2010 at 8:35 PM, Graham Dumpleton
<graham.d...@gmail.com> wrote:
>> RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} !-f
>
> Am not sure REQUEST_URI will always work in that situation as it isn't
> decoded nor are repeating slashes removed.

Good point. Better to use %{REQUEST_FILENAME}.
Still, you need the %{DOCUMENT_ROOT} part if in a virtual host.


Anyway I just tried out the WSGIHandlerScript directive, and it works
great! (as long as you have 3.0 or greater) That was exactly what
I was looking for.

One difference between it and mod_python's PythonHandler that
stood out to me as somewhat significant is that this WSGI directive
apparently isn't allowed in the .htaccess or scopes other than server-level.

Though, for most cases, you can still use the 'SetHandler wsgi-script'
approach to override the WSGIHandlerScript in whatever config scopes
you want. Or you could just turn your main wsgi handler into more of a
dispatcher/middleware layer.

So it's not that big of a restriction; just a bit different.
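
A rough sketch of the dispatcher idea: one site-wide script picks a
sub-application by URL prefix. The route table, `text_app` and
`make_dispatcher` are invented for illustration, and it assumes
'handle_request' is called like a normal WSGI application.

```python
def text_app(message):
    """Build a trivial WSGI app that always returns `message`."""
    def app(environ, start_response):
        body = (message + "\n").encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app


def make_dispatcher(routes, default_app):
    """Return a WSGI app that delegates to a sub-app by URL prefix."""
    # Longest prefix first, so '/blog/admin' wins over '/blog'.
    ordered = sorted(routes.items(), key=lambda item: len(item[0]),
                     reverse=True)

    def dispatcher(environ, start_response):
        path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
        for prefix, app in ordered:
            # Match whole path segments only ('/blogroll' is not '/blog').
            if path == prefix or path.startswith(prefix + "/"):
                return app(environ, start_response)
        return default_app(environ, start_response)

    return dispatcher


handle_request = make_dispatcher(
    {"/blog": text_app("blog"), "/blog/admin": text_app("blog admin")},
    default_app=text_app("site default"),
)
application = handle_request
```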


Graham, I know you're terribly busy, but perhaps at some point could you
update the wiki page ConfigurationDirectives to get it up to date? I had
assumed it reflected the latest, and hence just didn't discover the
WSGIHandlerScript on my own.

You do have it (and many other newer features) documented in the various
changelog pages -- it would just be nice to have it all in one place.

Thanks again.
--
Deron

Graham Dumpleton

Mar 24, 2010, 1:44:30 AM
to mod...@googlegroups.com
On 24 March 2010 16:34, Deron Meranda <deron....@gmail.com> wrote:
> On Tue, Mar 23, 2010 at 8:35 PM, Graham Dumpleton
> <graham.d...@gmail.com> wrote:
>>> RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} !-f
>>
>> Am not sure REQUEST_URI will always work in that situation as it isn't
>> decoded nor are repeating slashes removed.
>
> Good point.  Better to use %{REQUEST_FILENAME}.
> Still, you need the %{DOCUMENT_ROOT} part if in a virtual host.
>
>
> Anyway I just tried out the WSGIHandlerScript directive, and it works
> great! (as long as you have 3.0 or greater)  That was exactly what
> I was looking for.
>
> One difference between it and mod_python's PythonHandler that
> stood out to me as somewhat significant is that this WSGI directive
> apparently isn't allowed in the .htaccess or scopes other than server-level.

Correct. I see the job of specifying special handlers for new resource
types to be a job of the Apache administrator. Usually it is something
that would only be achievable by adding new Apache modules, something
else only an administrator can do.

But then, the Action directive is allowed in FileInfo override
context, ie., .htaccess, so one can do something similar that way. As
such, it perhaps should be allowed in .htaccess files and allowing
that is pretty simple to do. Just change:

AP_INIT_RAW_ARGS("WSGIHandlerScript", wsgi_add_handler_script,
NULL, ACCESS_CONF|RSRC_CONF, "Location of WSGI handler script file."),

to:

AP_INIT_RAW_ARGS("WSGIHandlerScript", wsgi_add_handler_script,
NULL, OR_FILEINFO, "Location of WSGI handler script file."),

and let me know if any problems come up with using it in .htaccess files.

> Though, for most cases, you can still use the 'SetHandler wsgi-script'
> approach to override the WSGIHandlerScript in whatever config scopes
> you want.  Or you could just turn your main wsgi handler into more of a
> dispatcher/middleware layer.
>
> So it's not that big of a restriction; just a bit different.
>
>
> Graham, I know you're terribly busy, but perhaps at some point could you
> update the wiki page ConfigurationDirectives to get it up to date?  I had
> assumed it reflected the latest, and hence just didn't discover the
> WSGIHandlerScript on my own.
>
> You do have it (and many other newer features) documented in the various
> changelog pages -- it would just be nice to have it all in one place.

I know it isn't up to date. Finding time, and motivation, is hard.

The WSGIHandlerScript directive is also still a little subject to
tweaking and why I wasn't really publicising it. :-)

Graham

Deron Meranda

Mar 24, 2010, 2:17:23 AM
to mod...@googlegroups.com
On Wed, Mar 24, 2010 at 1:44 AM, Graham Dumpleton
<graham.d...@gmail.com> wrote:
> As such, it perhaps should be allowed in .htaccess files and allowing
> that is pretty simple to do. Just change:
>
>    AP_INIT_RAW_ARGS("WSGIHandlerScript", wsgi_add_handler_script,
>        NULL, ACCESS_CONF|RSRC_CONF, "Location of WSGI handler script file."),
>
> to:
>
>    AP_INIT_RAW_ARGS("WSGIHandlerScript", wsgi_add_handler_script,
>        NULL, OR_FILEINFO, "Location of WSGI handler script file."),
>
> and let me know if any problems come up with using it in .htaccess files.

So far, that seems to work well. Though I haven't
really tested it very hard yet.
--
Deron Meranda

Graham Dumpleton

Mar 24, 2010, 5:59:19 AM
to mod...@googlegroups.com

The problem is that adding that change creates a security hole.

This is because the WSGIHandlerScript accepts process-group option.
Thus if this was a shared system with mod_wsgi daemon process groups
run as different users then you would be able to have your code run in
the context of any of those process groups which weren't restricted
from you and so execute code as another user. Restrictions on process
groups one can delegate to isn't a default and has to be set up, so
typical setup would have this problem.

The solution would be to disallow process-group option if directive is
used in a .htaccess file.

Note that application-group is also usable, but all it is going to do
is allow you to run in the context of a different interpreter within
whatever process the application would run in. This isn't ideal either,
but I just realised that allowing the WSGIAuth???Script directives in
.htaccess, wherever Auth overrides are allowed, has already opened up
that can of worms.

It isn't that allowing application-group is itself a security hole,
because running different users' code in the same process, as the same
user, is already a security problem and should never be done in shared
hosting. Specifically, code could already access other applications'
user data, or modify files writable by the Apache user that other
users' code relies on. The application-group option does make it easier
to access or modify things in a running Python process, but this was
already possible, albeit much harder, by loading a custom C extension
module that uses the Python C APIs to inject code or calls into another
sub interpreter.

So, application-group can be lived with, but process-group definitely
cannot. This would need to be addressed if were to make that change.

Graham

Graham Dumpleton

Mar 24, 2010, 6:52:44 AM
to mod...@googlegroups.com

A couple more things I forgot to mention. From memory, unless you
override application-group with WSGIHandlerScript, or set the
WSGIApplicationGroup directive, then although there is only the one
WSGI script handling requests, a new sub interpreter is created for
each unique resource matched.

You should therefore say something like:

WSGIHandlerScript my-wsgi-handler /some/path/application.wsgi application-group=%{GLOBAL}

or:

WSGIHandlerScript my-wsgi-handler /some/path/application.wsgi

and set:

WSGIApplicationGroup %{GLOBAL}

in context for where all resources are.

This will force use of same sub interpreter in process.
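
One way to check which sub interpreter a request actually lands in is a
throwaway handler that echoes what mod_wsgi puts in the WSGI environ.
The sketch below assumes the 'mod_wsgi.application_group' and
'mod_wsgi.process_group' keys are present, and that an empty
application group string means the main (%{GLOBAL}) interpreter.

```python
import os


def handle_request(environ, start_response):
    """Report where this request is being executed."""
    resource = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
    lines = [
        "pid: %d" % os.getpid(),
        "process group: %r" % environ.get("mod_wsgi.process_group"),
        "application group: %r" % environ.get("mod_wsgi.application_group"),
        "resource: %s" % resource,
    ]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


application = handle_request
```

Requesting two different URLs and comparing the reported application
group should show whether they share an interpreter.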

Also, you can probably make your configuration simpler or more precise
by still using mod_rewrite using something like:

WSGIHandlerScript my-wsgi-handler /some/path/application.wsgi

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule .* - [H=my-wsgi-handler]

In other words, any request that doesn't match a file based resource
should be handled by handler of type my-wsgi-handler.

Note, I have not tested this, so if you try it let me know the outcome.

This will save you having to exclude by extension and allows non
existent files of those extensions to still potentially be served by
Python web application so it could generate them rather than outright
returning a 404.

This would allow you, for example, to use the file system as a cache:
if the file is not present, the request passes through to the web
application, which generates it and returns it via wsgi.file_wrapper the
first time. Subsequent requests pick up the file you wrote to the cache
directory when it was generated. You just need to be careful to write
the file under a different name and then move it into place, so that a
partial file is not picked up by another request arriving while the
first is still generating it.
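
A minimal sketch of that write-then-move step in Python. The function
name and the file mode are illustrative choices; it relies on
os.replace() being atomic when the temporary file is on the same
filesystem as the final path.

```python
import os
import tempfile


def write_cache_file(final_path, data):
    """Write `data` so that readers never see a partial file."""
    directory = os.path.dirname(final_path) or "."
    # Create the temporary file in the target directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # mkstemp creates the file as 0600; widen it so the web server
        # can read it (adjust to your own setup).
        os.chmod(tmp_path, 0o644)
        # The complete file appears under its real name in one step.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


cache_dir = tempfile.mkdtemp()
target = os.path.join(cache_dir, "report.pdf")
write_cache_file(target, b"%PDF-1.4 generated content\n")
```

Until the rename happens the `!-f` test still fails for the real name,
so concurrent requests keep going to the application rather than being
served a half-written file.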

Graham
