[cherrypy-users] mod_proxy, WSGI and the SCRIPT_NAME value


Eric Larson

Jun 19, 2007, 10:12:24 AM
to cherryp...@googlegroups.com
Hi all,

I was using mod_proxy and apache for serving my cherrypy WSGI
application. I have been using a lot of paste middleware which in many
cases assumes the SCRIPT_NAME is set in the environ dict. I added a
small bit of middleware that takes the base URL given to tools.proxy
and gets a script name from it.

This is the third time I have run into this small difference and while
it has been fixable, it was tough (for me at least) to track down. Is
there any existing middleware or tools that might help get around this
problem without having to write my own fixes? Also, if adding some
functionality to tools.proxy that sets the SCRIPT_NAME would be useful,
I am more than happy to send a patch.

Thanks!

Eric

Robert Brewer

Jun 20, 2007, 1:44:49 PM
to cherryp...@googlegroups.com

I'm not sure what you're suggesting; a little code might help me
understand. :)

I *think* you're saying that you have a WSGI environ dict (where?)
that's missing the SCRIPT_NAME entry...? CherryPy's WSGI server always
sets that value, even if blank. On the WSGI application side, CherryPy
correctly assumes SCRIPT_NAME is blank if omitted (as the spec
dictates).

But I'm probably wrong, because that doesn't help me understand what
"add the SCRIPT_NAME to the tools.proxy" means; tools.proxy is for
adjusting request.base, which is the portion of the URI up to, but not
including, the SCRIPT_NAME.


Robert Brewer
System Architect
Amor Ministries
fuma...@amor.org

Eric Larson

Jun 22, 2007, 7:40:25 PM
to cherryp...@googlegroups.com
Sorry for being vague. When I said that the SCRIPT_NAME is missing, I
meant that it is blank.

The setup I am using is basically a rewrite rule in Apache that goes from:

http://hostname.com/my_app/ ---> My CherryPy Application

I set the proxy to use "http://hostname.com/my_app/" as the base url
in the tools.proxy. I set the tools.proxy.local to '' so it doesn't
just use the "X-Forwarded-Host" header. The problem is that it still
seems to get the X-Forwarded-Host header. The result is that the
hostname in the WSGI dict is set to "hostname.com" and the script_name
is left blank, which makes reconstructing the real base_url of my app
impossible.

My solution was to make a bit of middleware that just fixes up the
environ before it gets to my app. I take the base_url I am passing in
to the tools.proxy.base and get the tail of it and set the script_name
as that value. This makes the URL reconstruction
(http://www.python.org/dev/peps/pep-0333/#url-reconstruction) work for
me again.
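
For reference, here is a rough sketch of that PEP 333 URL reconstruction recipe, written in Python 3 spelling. The function name reconstruct_url and the environ values are made up for illustration; only the environ keys come from the spec. It shows how a blank SCRIPT_NAME drops the /my_app prefix even when the host is right.

```python
from urllib.parse import quote


def reconstruct_url(environ):
    """Rebuild the request URL from a WSGI environ, following the PEP 333 recipe."""
    url = environ['wsgi.url_scheme'] + '://'
    if environ.get('HTTP_HOST'):
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']
        default_port = '443' if environ['wsgi.url_scheme'] == 'https' else '80'
        if environ['SERVER_PORT'] != default_port:
            url += ':' + environ['SERVER_PORT']
    # SCRIPT_NAME is the mount point; PATH_INFO is the rest of the path.
    url += quote(environ.get('SCRIPT_NAME', ''))
    url += quote(environ.get('PATH_INFO', ''))
    if environ.get('QUERY_STRING'):
        url += '?' + environ['QUERY_STRING']
    return url


# Hypothetical environ behind the proxy: host is correct, SCRIPT_NAME is blank.
environ = {
    'wsgi.url_scheme': 'http',
    'HTTP_HOST': 'hostname.com',
    'SCRIPT_NAME': '',
    'PATH_INFO': '/entries/1',
    'QUERY_STRING': '',
}
without_prefix = reconstruct_url(environ)

# Same request once something has filled in SCRIPT_NAME.
environ['SCRIPT_NAME'] = '/my_app'
with_prefix = reconstruct_url(environ)
```

With the blank SCRIPT_NAME the result has no /my_app segment, so any link built from it points outside the proxied application.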

It is not clear to me why things don't work b/c it would seem setting
tools.proxy.local would do the right thing, but it didn't seem to work
for me so maybe it is a bug... With that said, if there isn't a bug,
is my fix to get the script_name set a bad idea?

I am really just trying to learn here, so if I missed something
obvious or a best practice please let me know. I have pasted my
middleware fix and my server config below to hopefully clear up any
confusion and reveal some silly mistake :)

Thanks!
Eric

Config:

import cherrypy

class Root:
    pass

if __name__ == '__main__':
    # app_conf and app are defined elsewhere (not shown in this post).
    global_conf = {'engine.autoreload_on': True,
                   'server.socket_port': 5000,
                   'tools.proxy.on': True,
                   'tools.proxy.base': app_conf['weblog_base_url'],
                   'tools.proxy.local': '',
                   }

    cherrypy.config.update({'global': global_conf})

    conf = {'/': {'tools.wsgiapp.on': True,
                  'tools.wsgiapp.app': app,
                  'tools.gzip.on': True,
                  'tools.trailing_slash.on': True}}

    cherrypy.tree.mount(Root(), '/', config=conf)
    cherrypy.server.quickstart()

    try:
        cherrypy.engine.start()
    except KeyboardInterrupt:
        cherrypy.engine.stop()

Quick Hack:

from urlparse import urlparse

class ScriptName(object):
    def __init__(self, app_conf, app=None):
        self.app_conf = app_conf
        self.app = app
        self.base = self.app_conf['weblog_base_url']
        self.url = urlparse(self.base)
        # Index 2 of the parsed URL is the path component.
        self.script_name = self.url[2].rstrip('/')

    def __call__(self, environ, start_response):
        # Only touch requests that came through the proxy.
        if environ.get('HTTP_X_FORWARDED_HOST'):
            if environ.get('SCRIPT_NAME', '') == '':
                environ['SCRIPT_NAME'] = self.script_name
        return self.app(environ, start_response)
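
To see what the hack does in isolation, here is a small self-contained sketch that ports it to Python 3 (urllib.parse instead of urlparse) and calls it with hand-built environ dicts. The echo_app stand-in and the call helper are hypothetical and exist only for this demonstration; a real environ carries many more keys.

```python
from urllib.parse import urlparse


class ScriptName(object):
    """Fill in a blank SCRIPT_NAME from the configured base URL for proxied requests."""

    def __init__(self, app_conf, app=None):
        self.app = app
        self.script_name = urlparse(app_conf['weblog_base_url']).path.rstrip('/')

    def __call__(self, environ, start_response):
        if environ.get('HTTP_X_FORWARDED_HOST'):
            if environ.get('SCRIPT_NAME', '') == '':
                environ['SCRIPT_NAME'] = self.script_name
        return self.app(environ, start_response)


def echo_app(environ, start_response):
    # Hypothetical stand-in for the real application: echoes SCRIPT_NAME.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [environ.get('SCRIPT_NAME', '').encode('utf-8')]


wrapped = ScriptName({'weblog_base_url': 'http://hostname.com/my_app/'}, echo_app)


def call(environ):
    # Minimal driver that ignores the status and headers.
    return b''.join(wrapped(environ, lambda status, headers: None))


proxied = call({'HTTP_X_FORWARDED_HOST': 'hostname.com', 'SCRIPT_NAME': ''})
direct = call({'SCRIPT_NAME': ''})
```

Only the request carrying X-Forwarded-Host gets /my_app as its SCRIPT_NAME; a direct request is left alone.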

fumanchu

Jun 22, 2007, 9:23:33 PM
to cherrypy-users
On Jun 22, 4:40 pm, "Eric Larson" <ionr...@gmail.com> wrote:
> Sorry for being vague. When I said that the SCRIPT_NAME is missing, I
> meant that it is blank.
>
> The basic setup I am using is basically a rewrite rule in apache to go from:
>
> http://hostname.com/my_app/ ---> My CherryPy Application
>
> I set the proxy to use "http://hostname.com/my_app/" as the base url
> in the tools.proxy. I set the tools.proxy.local to '' so it doesn't
> just use the "X-Forwarded-Host" header. The problem is that it still
> seems to get the X-Forwarded-Host header. The result is that the
> hostname in the WSGI dict is set to "hostname.com" and the script_name
> is left blank, which makes reconstructing the real base_url of my app
> impossible.
>
> My solution was to make a bit of middleware that just fixes up the
> environ before it gets to my app. I take the base_url I am passing in
> to the tools.proxy.base and get the tail of it and set the script_name
> as that value. This makes the URL reconstruction
> (http://www.python.org/dev/peps/pep-0333/#url-reconstruction) work for
> me again.
>
> It is not clear to me why things don't work b/c it would seem setting
> tools.proxy.local would do the right thing, but it didn't seem to work
> for me so maybe it is a bug... With that said, if there isn't a bug,
> is my fix to get the script_name set a bad idea?

Not if that's what you need for your next wsgiapp. However:

1) the wsgiapp Tool doesn't obey the WSGI spec (and cannot be fixed
to do so), and
2) most of CherryPy (including all the tools) isn't WSGI-aware (and
won't be because of point 1).

So...the best I can recommend is that you run your other app side-by-
side with your CherryPy handlers instead of nesting them. That is,
instead of the dispatch graph:

WSGI Server -> CP App -> WSGI app

you instead do:

WSGI Server -> Dispatcher -> CP App
                         `-> WSGI app
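
As a rough illustration of the side-by-side layout, here is a minimal prefix dispatcher in plain WSGI. This is only a sketch, not CherryPy's or paste's implementation; make_dispatcher, named_app, and the '/blog' mount point are all hypothetical names. It moves the matched prefix from PATH_INFO onto SCRIPT_NAME so that each app can still reconstruct its own base URL.

```python
def make_dispatcher(mounts, default_app):
    """Route to the app whose prefix matches; mounts maps prefix -> WSGI app."""
    # Try longer prefixes first so '/blog/admin' would win over '/blog'.
    ordered = sorted(mounts.items(), key=lambda item: len(item[0]), reverse=True)

    def dispatcher(environ, start_response):
        path = environ.get('PATH_INFO', '')
        for prefix, app in ordered:
            if path == prefix or path.startswith(prefix + '/'):
                # Shift the prefix from PATH_INFO to SCRIPT_NAME.
                environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                environ['PATH_INFO'] = path[len(prefix):]
                return app(environ, start_response)
        return default_app(environ, start_response)

    return dispatcher


def named_app(name):
    # Hypothetical stand-in app that reports who handled the request.
    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        body = '%s %s|%s' % (name, environ.get('SCRIPT_NAME', ''),
                             environ.get('PATH_INFO', ''))
        return [body.encode('utf-8')]
    return app


# 'cp' plays the CherryPy app, 'wsgi' the other WSGI app.
dispatcher = make_dispatcher({'/blog': named_app('wsgi')}, named_app('cp'))


def get(path):
    environ = {'SCRIPT_NAME': '', 'PATH_INFO': path}
    chunks = dispatcher(environ, lambda status, headers: None)
    return b''.join(chunks).decode('utf-8')
```

Here get('/blog/post') is handled by the 'wsgi' app with SCRIPT_NAME '/blog' and PATH_INFO '/post', while anything else falls through to the 'cp' app untouched.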

Eric Larson

Jun 23, 2007, 12:22:37 AM
to cherryp...@googlegroups.com

Hmm... To be honest I thought that was pretty much what I was doing?
When you say "WSGI Server" I thought I was using the CherryPy server
to serve my WSGI application.

WSGI Server (CherryPy) -> WSGI app

In this case I don't even have an actual CherryPy app (ie no class or
methods with things like @expose).

Would you mind clearing up where I am confused?

Thanks for all the help!

Eric

fumanchu

Jun 23, 2007, 11:40:59 AM
to cherrypy-users
On Jun 22, 9:22 pm, "Eric Larson" <ionr...@gmail.com> wrote:

> On 6/22/07, fumanchu <fuman...@amor.org> wrote:
> > 1) the wsgiapp Tool doesn't obey the WSGI spec (and cannot be fixed
> > to do so), and
> > 2) most of CherryPy (including all the tools) isn't WSGI-aware (and
> > won't be because of point 1).
>
> > So...the best I can recommend is that you run your other app side-by-
> > side with your CherryPy handlers instead of nesting them. That is,
> > instead of the dispatch graph:
>
> > WSGI Server -> CP App -> WSGI app
>
> > you instead do:
>
> > WSGI Server -> Dispatcher -> CP App
> >                          `-> WSGI app
>
> Hmm... To be honest I thought that was pretty much what I was doing?
> When you say "WSGI Server" I thought I was using the CherryPy server
> to serve my WSGI application.
>
> WSGI Server (CherryPy) -> WSGI app
>
> In this case I don't even have an actual CherryPy app (ie no class or
> methods with things like @expose).
>
> Would you mind clearing up where I am confused?

I'll do my best ;)

You might want to start by looking at http://www.cherrypy.org/wiki/WSGI#Visualmodel,
because I'll be using its terminology.

It sounds to me now like you just want the single link:

cherrypy.wsgiserver -> other WSGI Application

...where "other" could be a whole set of middleware and such, but no
CP components. If so, then you don't need any of the CP tools, because
they're all designed to go into effect only for CherryPy apps (methods
with @expose, etc). So the only remaining choices are whether you want
to use cherrypy.tree and cherrypy.server or not.

If you don't want to use either of them, your code would look
something like this:

import threading
from cherrypy import wsgiserver

s = wsgiserver.CherryPyWSGIServer(
    ('0.0.0.0', 5000), my_WSGI_nextapp)
threading.Thread(target=s.start).start()

cherrypy.server is actually a server manager; it will do the threading
for you, and adds error trapping, plus port open/closed checks. To
take advantage of that, you could write:

import cherrypy
from cherrypy import wsgiserver

s = wsgiserver.CherryPyWSGIServer(
    ('0.0.0.0', 5000), my_WSGI_nextapp)
cherrypy.server.socket_host = '0.0.0.0'
cherrypy.server.socket_port = 5000
cherrypy.server.quickstart(s)

cherrypy.tree is a WSGI dispatcher, but unlike paste's URLMap (which
dispatches on PATH_INFO), it dispatches on SCRIPT_NAME + PATH_INFO. By
default, cherrypy.server wraps cherrypy.wsgiserver and hands it
cherrypy.tree as the WSGI app. That wrapper also copies all of the
server.* config for you to the wsgiserver. You can use it like this:

import cherrypy

global_conf = {'server.socket_port': 5000,
               'server.socket_host': '0.0.0.0',
               }
cherrypy.config.update({'global': global_conf})
cherrypy.tree.graft(my_WSGI_nextapp, '/')
cherrypy.server.quickstart()

Note that in all the preceding, there are no tools. Tools run after
WSGI is done, and aren't designed for WSGI. Several of them have good
logic that you could use to make a bit of WSGI middleware that does
something similar, but note that some of them are very tricky to get
right, especially in their order of execution.

I also left out the engine and its autoreload feature, because I'm not
sure whether you're using CP 3.0 or 3.1alpha (trunk), and there are
large differences between the two.

Hope that helps!

Eric Larson

Jun 23, 2007, 3:07:20 PM
to cherryp...@googlegroups.com

That makes sense to me now. Thanks so much for clearing that up!

One more question... I had originally started using the setup I had
b/c it was the example in the CP book and also b/c I could use the
trailing_slash tool. Is there a way I can use that within my server
config (ie my global_conf)?

Thanks again for all the help! Btw, I have been reading the CP code a
bit and it is very easy to read and understand. I feel as though if I
find a bug I could actually submit a patch without much trouble. Of
course, I am doubting I will be finding any bugs anytime soon!

Best,

Eric

fumanchu

Jun 23, 2007, 3:25:23 PM
to cherrypy-users
On Jun 23, 12:07 pm, "Eric Larson" <ionr...@gmail.com> wrote:
> One more question... I had originally started using the setup I had
> b/c it was the example in the CP book and also b/c I could use the
> trailing_slash tool. Is there a way I can use that within my server
> config (ie my global_conf)?

Not as-is, because it operates on cherrypy.request attributes, and
you're not using cherrypy.request. The trailing_slash tool (like all
CP tools) only works with CherryPy handlers.

You could do something similar in WSGI middleware (and someone
probably already has). The HTTP redirect is easy; deciding whether
the URL is "supposed to" refer to an index page or not is the
tricky part.
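
A minimal sketch of what such middleware might look like, assuming the caller supplies the hard part as a callable. The names trailing_slash, is_index, and inner_app are hypothetical, and this is not the logic of CherryPy's trailing_slash tool. For brevity the Location header is a path only; a stricter version would rebuild the full absolute URL.

```python
def trailing_slash(app, is_index):
    """Redirect to path + '/' when is_index(path) says the URL names an index page."""
    def middleware(environ, start_response):
        path = environ.get('PATH_INFO', '')
        if path and not path.endswith('/') and is_index(path):
            # Include SCRIPT_NAME so the redirect stays inside the mount point.
            location = environ.get('SCRIPT_NAME', '') + path + '/'
            if environ.get('QUERY_STRING'):
                location += '?' + environ['QUERY_STRING']
            start_response('301 Moved Permanently',
                           [('Location', location), ('Content-Length', '0')])
            return [b'']
        return app(environ, start_response)
    return middleware


def inner_app(environ, start_response):
    # Hypothetical stand-in for the wrapped application.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'ok']


# Assumption for the demo: only '/archive' is an index page.
app = trailing_slash(inner_app, is_index=lambda path: path == '/archive')


def request(path):
    seen = {}

    def start_response(status, headers):
        seen['status'] = status
        seen['headers'] = dict(headers)

    environ = {'SCRIPT_NAME': '/my_app', 'PATH_INFO': path, 'QUERY_STRING': ''}
    body = b''.join(app(environ, start_response))
    return seen['status'], seen['headers'].get('Location'), body
```

The redirect depends on SCRIPT_NAME being correct, so behind a proxy this would need the same kind of SCRIPT_NAME fix-up discussed earlier in the thread.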

> Thanks again for all the help! Btw, I have been reading the CP code a
> bit and it is very easy to read and understand. I feel as though if I
> find a bug I could actually submit a patch without much trouble. Of
> course, I am doubting I will be finding any bugs anytime soon!

Thanks! We work hard at keeping the scope under control so that we can
spend more time making the API, code, and component model very clean.

Good luck with your project!
