WSGI environ

christian

Jan 28, 2006, 12:08:21 PM
to cherrypy-devel
Hi guys,

What would you all say to making the WSGI environ available inside of a
CherryPy app? I'm thinking something like:

cherrypy.request.environ

or

cherrypy.wsgi.environ

I am working on a subclass of _cpwsgi.WSGIServer that allows WSGI
middleware to be chained together with the CherryPy WSGI app. For some
WSGI middleware, like session middleware
(http://www.saddi.com/software/py-lib/#Session), it would be useful to
access the WSGI environment within page handlers (to actually access
the session).
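
A minimal sketch of the use case, using only plain WSGI callables. `SessionMiddleware`, the `example.session` key, `page_handler_app`, and `call_app` are hypothetical names chosen for illustration; they are not the API of the session middleware linked above or of CherryPy.

```python
class SessionMiddleware:
    """Hypothetical middleware that stores a session dict in the WSGI environ."""

    def __init__(self, app, environ_key="example.session"):
        self.app = app
        self.environ_key = environ_key
        self.sessions = {}  # in-memory store keyed by client address (illustration only)

    def __call__(self, environ, start_response):
        client = environ.get("REMOTE_ADDR", "unknown")
        environ[self.environ_key] = self.sessions.setdefault(client, {})
        return self.app(environ, start_response)


def page_handler_app(environ, start_response):
    """Stand-in for a page handler that needs the environ to reach the session."""
    session = environ["example.session"]
    session["hits"] = session.get("hits", 0) + 1
    body = ("hits: %d" % session["hits"]).encode("ascii")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(app, environ):
    """Invoke a WSGI app once and return (status, body bytes)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


wrapped = SessionMiddleware(page_handler_app)
```

Without access to the environ, the handler would have no way to reach the session object that the middleware created for the request.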

Anyhow, it seems trivial to add it, but I just want to get input from
you guys.

Christian
http://www.dowski.com

christian

Jan 29, 2006, 4:09:38 PM
to cherrypy-devel

Ok, I still want input, but I have a minor update.

I worked up a patch that implements the "cherrypy.request.environ"
feature _and_ updated the WSGI server to support WSGI middleware. I
also have CP setting the requested path to the PATH_INFO variable
instead of SCRIPT_NAME (will that cause problems?). All tests pass for
all servers.

The patch is at
http://projects.dowski.com/files/wsgi_filter/middleware_server_patch.diff.
It can be applied to 2.2beta / revision 949.

I'll post an example use later.

Christian
http://www.dowski.com

christian

Jan 29, 2006, 5:10:25 PM
to cherrypy-devel
Patch, example and other thoughts are now in a ticket.
http://www.cherrypy.org/ticket/455

Robert Brewer

Jan 30, 2006, 4:39:37 AM
to cherryp...@googlegroups.com
christian wrote:
> What would you all say to making the WSGI environ available inside of a
> CherryPy app? I'm thinking something like:
>
> cherrypy.request.environ

It's a good idea. It should probably be cherrypy.request.wsgi_environ, however, since it is functionality specific to wsgi.

> I worked up a patch that implements the "cherrypy.request.environ"
> feature _and_ updated the WSGI server to support WSGI middleware.
> I also have CP setting the requested path to the PATH_INFO variable
> instead of SCRIPT_NAME (will that cause problems?). All tests pass
> for all servers.

1. I strongly believe that the argument list for the request object shouldn't change. Instead, _cpwsgi should just attach the environ to the Request object before it calls "run" (that's why "run" isn't performed automatically in Request.init--so your server-side interface can muck about with the Request instance before handing control over to it). In other words, do it just like .login, .multithread and .multiprocess are done.

2. The SCRIPT_NAME and PATH_INFO munging has been asked for before and was denied (or at least delayed) for good reasons. Suffice to say that in CP 2, cherrypy.root will continue to be absolute, not relative to each app. That is, if you have an app mounted at "/users/dowski/myapp", cherrypy.root should map to "/", and "myapp/" should map to cherrypy.root.users.dowski.myapp.index. Therefore, SCRIPT_NAME should still be used to find the proper handler.

3. The WSGIServer should continue in its role as a server, and call a single wsgiApp. If someone wants to chain middleware, they should pass an "app" argument which has already wrapped the appropriate pieces. Unfortunately, that's harder than it should be. :( You have to subclass WSGIServer (or use a factory function) and set your own default for the "app" argument. That should be fixed, somehow.
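
Point 1 above can be sketched roughly as follows. `Request`, `run`, and `wsgi_gateway` are simplified stand-ins written for this illustration, not the actual CherryPy 2.x classes; the only idea taken from the message is that the server-side interface sets attributes on the request instance after construction and before calling `run`.

```python
class Request:
    """Simplified stand-in for a framework request object (hypothetical)."""

    def __init__(self, method, path):
        # The constructor signature stays fixed; no environ argument is added.
        self.method = method
        self.path = path

    def run(self):
        # A handler can look at whatever the server interface attached.
        env = getattr(self, "wsgi_environ", {})
        return "%s %s via %s" % (self.method, self.path,
                                 env.get("SERVER_SOFTWARE", "unknown server"))


def wsgi_gateway(environ):
    """Build the request, attach WSGI-specific data, then hand over control."""
    request = Request(environ["REQUEST_METHOD"], environ.get("PATH_INFO", "/"))
    request.wsgi_environ = environ  # attached after construction, before run()
    return request.run()
```

A non-WSGI server would simply skip the attribute assignment, and the request class itself would not need to know the difference.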


Robert Brewer
System Architect
Amor Ministries
fuma...@amor.org

christian

Jan 30, 2006, 10:14:50 AM
to cherrypy-devel
Robert Brewer wrote:
> christian wrote:
> > What would you all say to making the WSGI environ available inside of a
> > CherryPy app? I'm thinking something like:
> >
> > cherrypy.request.environ
>
> It's a good idea. It should probably be cherrypy.request.wsgi_environ, however, since it is functionality specific to wsgi.


Glad you agree that it would be useful. You are right about the name -
environ on its own could be confusing outside of the WSGI context.
cherrypy.request.wsgi_environ it shall be (barring any other
objections).

> > I worked up a patch that implements the "cherrypy.request.environ"
> > feature _and_ updated the WSGI server to support WSGI middleware.
> > I also have CP setting the requested path to the PATH_INFO variable
> > instead of SCRIPT_NAME (will that cause problems?). All tests pass
> > for all servers.
>
> 1. I strongly believe that the argument list for the request object shouldn't change. Instead, _cpwsgi should just attach the environ to the Request object before it calls "run" (that's why "run" isn't performed automatically in Request.init--so your server-side interface can muck about with the Request instance before handing control over to it). In other words, do it just like .login, .multithread and .multiprocess are done.

Duh. Good point. That touches a lot less code and still provides the
feature.

> 2. The SCRIPT_NAME and PATH_INFO munging has been asked for before and was denied (or at least delayed) for good reasons. Suffice to say that in CP 2, cherrypy.root will continue to be absolute, not relative to each app. That is, if you have an app mounted at "/users/dowski/myapp", cherrypy.root should map to "/", and "myapp/" should map to cherrypy.root.users.dowski.myapp.index. Therefore, SCRIPT_NAME should still be used to find the proper handler.

Hhhmm...I have read http://www.cherrypy.org/ticket/444 and the main
observation that I came away with was that you and Ian were operating
on different wavelengths or something ;-) Hopefully we can associate
the same access point here...

With my modified code, a request to
http://localhost:8080/users/dowski/myapp sets up the following:

environ["SCRIPT_NAME"] = ""
environ["PATH_INFO"] = "/users/dowski/myapp"

Which dispatches to cherrypy.root.users.dowski.myapp. A request to "/"
does the following:

environ["SCRIPT_NAME"] = ""
environ["PATH_INFO"] = "/"

That dispatches to cherrypy.root. That is pretty absolute, as you
required above.

Theoretically, something else could be dispatching the URIs, and you'd
have this:

environ["SCRIPT_NAME"] = "/path/to"
environ["PATH_INFO"] = "/users/dowski/myapp"

CP should then dispatch off of PATH_INFO. I suppose a cherrypy.url()
function would be needed to build URLs for redirects and such:

def url(path):
    return cherrypy.request.wsgi_environ.get('SCRIPT_NAME', '') + path
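
To make the split concrete, here is a small self-contained sketch. It takes the environ as an explicit argument instead of reading `cherrypy.request.wsgi_environ`, and `Node` and `find_handler` are hypothetical: a much simplified attribute walk, not CherryPy's real dispatcher.

```python
class Node:
    """Empty container used to build a toy object tree."""


def find_handler(root, environ):
    """Walk the tree using PATH_INFO only; SCRIPT_NAME is ignored."""
    node = root
    for name in environ.get("PATH_INFO", "").split("/"):
        if name:
            node = getattr(node, name)
    return node


def url(environ, path):
    """Build an externally visible URL path by prepending SCRIPT_NAME."""
    return environ.get("SCRIPT_NAME", "") + path


root = Node()
root.users = Node()
root.users.dowski = Node()
root.users.dowski.myapp = Node()

environ = {"SCRIPT_NAME": "/path/to", "PATH_INFO": "/users/dowski/myapp"}
```

With this environ, `find_handler(root, environ)` resolves to `root.users.dowski.myapp`, while `url(environ, "/users/dowski/myapp")` yields "/path/to/users/dowski/myapp" for use in redirects.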

Just dealing with PATH_INFO lends to simplification of
_cpwsgi.requestLine(). It can leave SCRIPT_NAME out of the equation.
Dispatching in CherryPy never needs to care about SCRIPT_NAME; all the
info it needs to find a resource is in PATH_INFO.

def requestLine(environ):
    """Rebuild first line of the request (e.g. "GET /path HTTP/1.0")."""

    # resource = environ.get('SCRIPT_NAME', '') + environ.get('PATH_INFO', '')
    resource = environ.get('PATH_INFO', '')
    .....

The test suite passes with that (requestLine) change too. What sort of
problem does this cause? Does it cause problems when trying to hook
CherryPy up to Apache via mod_python+wsgi?

Note that at no point am I advocating multiple CherryPy wsgiApps in the
same process. That has other hurdles that I am not interested in
clearing right now. What I am advocating is a single CherryPy wsgiApp
coexisting nicely with other WSGI applications.


> 3. The WSGIServer should continue in its role as a server, and call a single wsgiApp. If someone wants to chain middleware, they should pass an "app" argument which has already wrapped the appropriate pieces. Unfortunately, that's harder than it should be. :( You have to subclass WSGIServer (or use a factory function) and set your own default for the "app" argument. That should be fixed, somehow.

It does in fact continue in its role as a server and call a single
wsgiApp. It just does the wrapping in __init__().

If this doesn't fly in the core, that's fine. I'll release it as an
alternative server that gets its work done by subclassing the
necessary CherryPy classes.

Thanks,

Christian
http://www.dowski.com

Robert Brewer

Jan 31, 2006, 3:56:30 AM
to cherryp...@googlegroups.com
christian wrote:
> Just dealing with PATH_INFO lends to simplification of
> _cpwsgi.requestLine(). It can leave SCRIPT_NAME out of
> the equation. Dispatching in CherryPy never needs to
> care about SCRIPT_NAME; all the info it needs to find
> a resource is in PATH_INFO.
> ...
> The test suite passes with that (requestLine) change too.
> What sort of problem does this cause? Does it cause
> problems when trying to hook CherryPy up to Apache via
> mod_python+wsgi?

It does for me. This is the one thing that I don't understand (or I'm the only one that understands it ;). Let's take your example more fully:

environ["SCRIPT_NAME"] = "/path/to"
environ["PATH_INFO"] = "/users/dowski/myapp/thing/edit"

This is not what I would call "legal" values for SCRIPT_NAME and PATH_INFO. My assumption is that SCRIPT_NAME should be "/path/to/users/dowski/myapp", which is the "virtual location" of "the application" called "myapp". At every point in the WSGI chain, this should be the case. Assume you have CherryPy 2.1 and are connecting it to, say, modpython via WSGI:

<Location /path/to/users/dowski/myapp>
SetHandler python-program
PythonHandler wsgiref.modpython_gateway::handler
PythonOption application cherrypy._cpwsgi::wsgiApp
</Location>

My question is: how does the modpython WSGI gateway decide what value SCRIPT_NAME should have? As I see it, there are three scenarios:

1. The modpython WSGI gateway could use the value of SCRIPT_NAME that Apache hands to it. In our example, this would be "/path/to/users/dowski/myapp/thing", which is not "correct". This is the way it actually works at the moment, so CherryPy works around the problem by re-concatenating SCRIPT_NAME and PATH_INFO, and working with the original request line regardless of where the server broke it in two.

2. The modpython WSGI gateway could follow a convention: perhaps that SCRIPT_NAME is equivalent to, say, the Location or Directory element where SetHandler was defined.

3. The modpython gateway could be told what SCRIPT_NAME is, via another PythonOption directive.

Solutions #2 and #3 amount to setting a config value on the server side so it can tell the application side a vital piece of information: the application's "virtual location". CherryPy (possibly de facto) takes the position:

a. "why tell the server to tell the app when you can just tell the app?", and

b. "why send such config info on every request when you can just configure it once?"

Some people's answer to that seems to be: "what if you have 12 pieces of middleware between the two end points? Now you have to tell them all." To which I can only say: well, yeah. *Something* has to declare what those 12 pieces are and how they connect. And any such connection tool would have to account for middleware or apps which require some startup config. If my application uses a persistent database connection, I'm not going to send the password through the WSGI environ, nor am I going to re-read my app's config file on each request to get it. Whatever tool was used to connect the WSGI components together should be able to tell my middleware or app to "do your one-time startup stuff now". IMO the "virtual location" should be part of that. It certainly is for CherryPy.

christian

Jan 31, 2006, 8:30:59 AM
to cherrypy-devel
Thanks for continuing this dialog, Robert. Hopefully we can get to the
bottom of this.

Robert Brewer wrote:
> christian wrote:
> > Just dealing with PATH_INFO lends to simplification of
> > _cpwsgi.requestLine(). It can leave SCRIPT_NAME out of
> > the equation. Dispatching in CherryPy never needs to
> > care about SCRIPT_NAME; all the info it needs to find
> > a resource is in PATH_INFO.
> > ...
> > The test suite passes with that (requestLine) change too.
> > What sort of problem does this cause? Does it cause
> > problems when trying to hook CherryPy up to Apache via
> > mod_python+wsgi?
>
> It does for me. This is the one thing that I don't understand (or I'm the only one that understands it ;). Let's take your example more fully:
>
> environ["SCRIPT_NAME"] = "/path/to"
> environ["PATH_INFO"] = "/users/dowski/myapp/thing/edit"
>
> This is not what I would call "legal" values for SCRIPT_NAME and PATH_INFO. My assumption is that SCRIPT_NAME should be "/path/to/users/dowski/myapp", which is the "virtual location" of "the application" called "myapp". At every point in the WSGI chain, this should be the case.

I think there are two issues that are causing us to misunderstand each
other.

First, the above example that I gave was perhaps not clear enough. I
assumed a CherryPy object tree like:

cherrypy.root = AppRoot()
cherrypy.root.users = UserDispatcher()

So "/path/to" is not very indicative of the fact that it is the
location of the application ("but that's not the application," you say
- gimme a sec ;-).

The second issue leading to this misunderstanding is: what in the world
is an application? In this WSGI context, I call the wsgiApp the
application - thus, the entire CherryPy dispatcher, filter system,
framework, etc. *is the application*. Sure, it can host sub-apps, but
in the context of WSGI or mod_python, it's the app.

> Assume you have CherryPy 2.1 and are connecting it to, say, modpython via WSGI:
>
> <Location /path/to/users/dowski/myapp>
> SetHandler python-program
> PythonHandler wsgiref.modpython_gateway::handler
> PythonOption application cherrypy._cpwsgi::wsgiApp
> </Location>
>
> My question is: how does the modpython WSGI gateway decide what value SCRIPT_NAME should have?

Well, I would change the Location line to look like this:
<Location /path/to>

CherryPy would then dispatch the rest of the URL, as in my world ;-) it
would receive "/users/dowski/myapp/thing/edit" as the path info and
execute "edit".

Let me make a better example here (using /path/to was a bad idea I
think). Here are the desired SCRIPT_NAME and PATH_INFO vars:

environ['SCRIPT_NAME'] = "/blogs"
environ['PATH_INFO'] = "/dowski/posts/2006/01"

Here is the CP object tree for the app:
cherrypy.root = BlogHandler() # contains a default method

And a modpython configuration to match:

<Location /blogs>
SetHandler python-program
PythonHandler wsgiref.modpython_gateway::handler
PythonOption application cherrypy._cpwsgi::wsgiApp
</Location>

> As I see it, there are three scenarios:
>
> 1. The modpython WSGI gateway could use the value of SCRIPT_NAME that Apache hands to it. In our example, this would be "/path/to/users/dowski/myapp/thing", which is not "correct". This is the way it actually works at the moment, so CherryPy works around the problem by re-concatenating SCRIPT_NAME and PATH_INFO, and working with the original request line regardless of where the server broke it in two.

Yeah, that won't work.

> 2. The modpython WSGI gateway could follow a convention: perhaps that SCRIPT_NAME is equivalent to, say, the Location or Directory element where SetHandler was defined.

Hhhmm...that might work. It is the location of the CP mod_python
handler for the wsgiApp, after all.

> 3. The modpython gateway could be told what SCRIPT_NAME is, via another PythonOption directive.

That could work too. Looks like MoinMoin uses something like that:
http://linuxczar.net/index.py/HelpOnInstalling/ApacheWithModPython#head-aed66eb987149f6cd661b227731a96f58f14323a

> Solutions #2 and #3 amount to setting a config value on the server side so it can tell the application side a vital piece of information: the application's "virtual location". CherryPy (possibly de facto) takes the position:
>
> a. "why tell the server to tell the app when you can just tell the app?", and

It seems like, in a way, WSGI takes us back to CGI. Our
multifaceted/multilayered apps are now distilled down to a single
application callable - much like a single CGI script. The nice thing
about CGI is that it was simple and it worked (usually). You drop a
CGI script into cgi-bin, and it does its thing. Restructure so that
it's now elsewhere, and it continues to work. No need to inform it of
the changes. I think that's part of the idea with WSGI as well, and
forcing CP to know beforehand its own virtual location removes that
flexibility.

> b. "why send such config info on every request when you can just configure it once?"

Hhhmm... I guess because it's just an environment var. A bunch of them
get sent every request, and it's just a standard CGI one anyhow.

>
> Some people's answer to that seems to be: "what if you have 12 pieces of middleware between the two end points? Now you have to tell them all." To which I can only say: well, yeah. *Something* has to declare what those 12 pieces are and how they connect.

In my short experience messing with the CP WSGIServer and middleware,
with the SCRIPT_NAME and PATH_INFO changes, I didn't really have to
tell them anything. After they wrapped each other, I was left with a
single wsgi_app(environ, start_response) callable that just worked.

> And any such connection tool would have to account for middleware or apps which require some startup config. If my application uses a persistent database connection, I'm not going to send the password through the WSGI environ, nor am I going to re-read my app's config file on each request to get it. Whatever tool was used to connect the WSGI components together should be able to tell my middleware or app to "do your one-time startup stuff now".

Sure. The pieces of middleware I chained with CP (Paste's
EvalException and Saddi's sessions) both required config info. But
that was a one time deal. I passed them a wsgi_app and config info in
their __init__ methods and they returned a wsgi_app that was properly
configured and that wrapped up the passed in wsgi_app.
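
The wrapping pattern described here looks roughly like the sketch below. `AddHeader`, `CatchErrors`, `hello_app`, and `call_app` are hypothetical stand-ins written for this illustration; they are not Paste's EvalException or Saddi's session middleware, only the same construct-once, call-per-request shape.

```python
class AddHeader:
    """Hypothetical middleware configured once with a header to append."""

    def __init__(self, app, name, value):
        self.app = app
        self.header = (name, value)

    def __call__(self, environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            return start_response(status, list(headers) + [self.header], exc_info)
        return self.app(environ, patched_start_response)


class CatchErrors:
    """Hypothetical middleware configured once with an error message."""

    def __init__(self, app, message):
        self.app = app
        self.message = message.encode("utf-8")

    def __call__(self, environ, start_response):
        try:
            return self.app(environ, start_response)
        except Exception:
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")])
            return [self.message]


def hello_app(environ, start_response):
    if environ.get("PATH_INFO") == "/boom":
        raise RuntimeError("simulated failure")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


# Configuration happens here, once; the result is a single WSGI callable.
wsgi_app = CatchErrors(AddHeader(hello_app, "X-Demo", "1"), "something broke")


def call_app(app, path):
    """Invoke the app once and return (status, headers, body)."""
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"], seen["headers"] = status, headers

    body = b"".join(app({"PATH_INFO": path}, start_response))
    return seen["status"], seen["headers"], body
```

Nothing about the configuration travels through the environ; each layer keeps what it was given at construction time.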

> IMO the "virtual location" should be part of that. It certainly is for
> CherryPy.

Again, it's just a standard CGI environment variable to me. It's going
to be there anyhow, along with a bunch of others.

So, to summarize.

1. What is an app?
a. I call it the CP wsgiApp
b. You call it something mounted on the CP object tree

2. The mod_python WSGI gateway doesn't work with my SCRIPT_NAME and
PATH_INFO changes.
a. Is the gateway wrong?
b. Am I wrong? ;-)

Whew. That was way longer than I intended.

Christian
http://www.dowski.com

Ian Bicking

Jan 31, 2006, 11:34:50 AM
to cherryp...@googlegroups.com
christian wrote:
> Let me make a better example here (using /path/to was a bad idea I
> think). Here are the desired SCRIPT_NAME and PATH_INFO vars:
>
> environ['SCRIPT_NAME'] = "/blogs"
> environ['PATH_INFO'] = "/dowski/posts/2006/01"
>
> Here is the CP object tree for the app:
> cherrypy.root = BlogHandler() # contains a default method
>
> And a modpython configuration to match:
>
> <Location /blogs>
> SetHandler python-program
> PythonHandler wsgiref.modpython_gateway::handler
> PythonOption application cherrypy._cpwsgi::wsgiApp
> </Location>

In my experience stuff in <Location> doesn't change
SCRIPT_NAME/PATH_INFO, and it's hard (impossible?) for a handler to even
tell what <Location> it has been put in. With SCGI they added another
directive to make this more explicit:

SCGIMount /blogs localhost:4000

I believe the implementation is similar to Alias (the implementation for
that directive is actually quite readable). I don't know how it might
work in mod_python.

Another way to handle it, that at least the flup servers obey and could
be extended elsewhere, is a WSGI_SCRIPT_NAME variable, like:

<Location /blogs>
SetEnv WSGI_SCRIPT_NAME /blogs
...
</Location>

The server that receives the request should treat SCRIPT_NAME+PATH_INFO
as the complete path, and then strip WSGI_SCRIPT_NAME off the front to
form the new SCRIPT_NAME and PATH_INFO, and then delete WSGI_SCRIPT_NAME.
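
The three steps just described can be sketched as a small function. This is an illustration of the idea only, not flup's code; the name `apply_wsgi_script_name` is hypothetical, and leaving the values untouched when the prefix does not match is an assumption the message does not address.

```python
def apply_wsgi_script_name(environ):
    """Re-split SCRIPT_NAME/PATH_INFO around WSGI_SCRIPT_NAME, if present."""
    mount = environ.pop("WSGI_SCRIPT_NAME", None)  # deleted whether or not it matches
    if mount is None:
        return environ
    mount = mount.rstrip("/")
    # Treat SCRIPT_NAME + PATH_INFO as the complete path.
    full_path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
    if full_path == mount or full_path.startswith(mount + "/"):
        environ["SCRIPT_NAME"] = mount
        environ["PATH_INFO"] = full_path[len(mount):]
    # Assumption: if the prefix does not match, keep the original split.
    return environ
```

For example, a server-supplied split of "/blogs/dowski" plus "/posts/2006/01" with WSGI_SCRIPT_NAME set to "/blogs" comes out as SCRIPT_NAME "/blogs" and PATH_INFO "/dowski/posts/2006/01".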


--
Ian Bicking / ia...@colorstudy.com / http://blog.ianbicking.org

Robert Brewer

Jan 31, 2006, 2:04:25 PM
to cherryp...@googlegroups.com
christian wrote:
> environ on its own could be confusing outside of the WSGI context.
> cherrypy.request.wsgi_environ it shall be (barring any other
> objections).

Noticing your changesets on this, I think the wsgi_environ attribute
should simply be missing if you're not using WSGI. It will be anyway (!)
if you're using a custom HTTP server (i.e. one that is neither wsgi nor
_cphttpserver). Consumer code should either trust that the attribute is
present (and therefore raise an AttributeError if missing) or write:

wsgienv = getattr(cherrypy.request, "wsgi_environ", {})

This design should be followed for any data (not just WSGI data) which
passes through the CherryPy app server uninspected and unaltered: use
the dynamic nature of Python objects to pass it through from HTTP server
to the app, but don't require CherryPy to handle it or be aware of
it--that leads to muddy API's very quickly.
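
A short sketch of the consumer side, assuming a bare stand-in `Request` class and a hypothetical `remote_user` helper (neither is CherryPy code):

```python
class Request:
    """Bare stand-in for a request object; it defines no wsgi_environ."""


def remote_user(request):
    """Read REMOTE_USER if a WSGI environ was attached, else return None."""
    wsgienv = getattr(request, "wsgi_environ", {})
    return wsgienv.get("REMOTE_USER")


plain = Request()      # e.g. served by a non-WSGI HTTP server
via_wsgi = Request()
via_wsgi.wsgi_environ = {"REMOTE_USER": "dowski"}  # attached by the WSGI layer
```

The request class never mentions the attribute; only the WSGI layer that sets it and the consumer that reads it need to agree on the name.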

christian

Jan 31, 2006, 4:36:33 PM
to cherrypy-devel

I see your point. I was just, um, testing your reflexes. I'll make
the change, but geez, this is 4 changesets for what will amount to 1
line of code added to 1 file. How embarrassing... Maybe I'll just add
that change into the WSGI server changeset ;-)

So ... any more thoughts on the SCRIPT_NAME/PATH_INFO matter? What
about the WSGI server modifications (which kind of hinges on SN/PI)? I
think being able to use WSGI middleware is important.

Christian
http://www.dowski.com

christian

Jan 31, 2006, 4:41:39 PM
to cherrypy-devel
Ian Bicking wrote:
>
> Another way to handle it, that at least the flup servers obey and could
> be extended elsewhere, is a WSGI_SCRIPT_NAME variable, like:
>
> <Location /blogs>
> SetEnv WSGI_SCRIPT_NAME /blogs
> ...
> </Location>
>
> The server that receives the request should treat SCRIPT_NAME+PATH_INFO
> as the complete path, and then strip WSGI_SCRIPT_NAME off the front to
> form the new SCRIPT_NAME and PATH_INFO, and then delete WSGI_SCRIPT_NAME.

I would be cool with that, but I am not the author of the mod_python
WSGI gateway.

To me, it's just like mounting stuff in the CP object tree.
Apache/mod_python needs to be able to set the virtual path because it
is hosting the wsgiApp, just like CP needs to set the path to any
sub-apps it is hosting.

Christian
http://www.dowski.com

Robert Brewer

Jan 31, 2006, 5:19:54 PM
to cherryp...@googlegroups.com
christian wrote:
> So ... any more thoughts on the SCRIPT_NAME/PATH_INFO matter?
> What about the WSGI server modifications (which kind of hinges
> on SN/PI)? I think being able to use WSGI middleware is important.

I think so too; however, I still have two (complementary) reservations:

1. Not all WSGI servers output the "correct" SCRIPT_NAME.
2. I have yet to see a CherryPy app which fails (regardless of whether
middleware is used or not) because SCRIPT_NAME was incorrect. I have
seen examples of CP apps which would be *easier to construct* if you
could assume cherrypy.root == AppRoot() == SCRIPT_NAME, but that's not
the same as being broken.

There is a fundamental shift happening in the way CherryPy apps get
mounted. The old way (2.1 and earlier) was to set cherrypy.root =
AppRoot(). There was no facility for running multiple apps in the same
process, and if you wanted AppRoot.index to respond to a URL like
"/path/to/myapp/", then you used the VirtualPathFilter hack to trick CP
into thinking "/path/to/myapp/" was actually "/". This was fragile and
had some nasty corner cases (as I described at length on my blog at the
time).

The new way (2.2+) changes the meaning of cherrypy.root, and in fact the
meaning of the whole cherrypy dispatch tree. Rather than munge the
incoming URL to fit the ideal tree, we now munge the tree to fit the
reality of URL's. That is, cherrypy.root should no longer be thought of
as "my app"; instead, cherrypy.root and its dependents should now be
thought of as a scaffolding, rooted at the URL "/", onto which we attach
handlers. You attach a subtree of handlers for "myapp" now by using
cherrypy.tree.mount.
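
The "scaffolding" idea can be illustrated with a toy version. The `mount` function and `Scaffold` class below are hypothetical and are not the implementation of cherrypy.tree.mount; they only show intermediate placeholder nodes being created so that a handler subtree ends up at the position matching its URL.

```python
class Scaffold:
    """Placeholder node that exists only to carry mounted handlers."""


def mount(root, app_root, mount_point):
    """Attach app_root under root at mount_point, creating scaffolding."""
    names = [n for n in mount_point.split("/") if n]
    if not names:
        raise ValueError("this sketch does not replace the root itself")
    node = root
    for name in names[:-1]:
        if not hasattr(node, name):
            setattr(node, name, Scaffold())
        node = getattr(node, name)
    setattr(node, names[-1], app_root)
    return root


class Blog:
    def index(self):
        return "blog index"


root = mount(Scaffold(), Blog(), "/users/dowski/blog")
```

After the call, `root.users.dowski.blog.index()` is reachable by walking the full URL path from "/", with no rewriting of the incoming URL.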

This new design was necessary to support multiple apps in a single
process, while preserving the 2.x API. That is, you can still use a
naive 2.1 app in the 2.2 framework. Therefore, even if we decide someday
to trust SCRIPT_NAME, it won't be until CP 3, because the API change is
too large. It seems to me that your proposal for the WSGI server is
trying to improve the 2.1 design by munging the URL instead of the tree,
and is therefore both obsolete and too early at the same time. In CP 3
we may decide to go back to an API where you can write "cherrypy.root =
AppRoot()" and still have it play nice with multiple apps in a single
process; at that time, I think your proposal will naturally be
implemented. But it can't work in the 2.x line now IMO.

christian

Feb 1, 2006, 7:50:39 AM
to cherrypy-devel

Robert Brewer wrote:
> christian wrote:
> > So ... any more thoughts on the SCRIPT_NAME/PATH_INFO matter?
> > What about the WSGI server modifications (which kind of hinges
> > on SN/PI)? I think being able to use WSGI middleware is important.
>
> I think so too; however, I still have two (complementary) reservations:
>
> 1. Not all WSGI servers output the "correct" SCRIPT_NAME.

It's too bad that so much seems to hinge on SCRIPT_NAME/PATH_INFO
considering they are so arbitrary (at least as far as current servers
go).

It makes sense to me that SCRIPT_NAME should be the path to the WSGI
application, and PATH_INFO the path to the resource that the WSGI
application should dispatch. AFAIK, that application can change
SCRIPT_NAME & PATH_INFO as it sees fit. That is what happens in my
WSGIAppFilter. SCRIPT_NAME basically gets set to the CP object path up
to the hosted WSGIApp object, and the remainder of the path that does
not correspond to an object in the CP tree gets put in PATH_INFO. The
hosted WSGI application then dispatches based on PATH_INFO.
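
The behaviour described in this paragraph can be sketched in plain WSGI. This is not the WSGIAppFilter code; `host_at`, `echo_app`, `not_found`, and `get` are hypothetical names, and the mount point is passed in explicitly rather than derived from a CherryPy object tree.

```python
def host_at(mount_point, hosted_app, fallback_app):
    """Return a WSGI app that hands requests under mount_point to hosted_app."""
    mount_point = mount_point.rstrip("/")

    def dispatcher(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path == mount_point or path.startswith(mount_point + "/"):
            child = dict(environ)  # copy so the caller's environ is untouched
            # The consumed prefix moves to SCRIPT_NAME, the rest stays in PATH_INFO.
            child["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + mount_point
            child["PATH_INFO"] = path[len(mount_point):]
            return hosted_app(child, start_response)
        return fallback_app(environ, start_response)

    return dispatcher


def echo_app(environ, start_response):
    """Report the SCRIPT_NAME/PATH_INFO split this app was called with."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    text = "%s|%s" % (environ.get("SCRIPT_NAME", ""), environ.get("PATH_INFO", ""))
    return [text.encode("utf-8")]


def not_found(environ, start_response):
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


app = host_at("/users/dowski/wiki", echo_app, not_found)


def get(path, script_name=""):
    """Call the composed app once and return the response body."""
    environ = {"SCRIPT_NAME": script_name, "PATH_INFO": path}
    return b"".join(app(environ, lambda status, headers: None))
```

The hosted application only ever dispatches on the PATH_INFO it receives, so it keeps working wherever it is attached.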

> 2. I have yet to see a CherryPy app which fails (regardless of whether
> middleware is used or not) because SCRIPT_NAME was incorrect. I have
> seen examples of CP apps which would be *easier to construct* if you
> could assume cherrypy.root == AppRoot() == SCRIPT_NAME, but that's not
> the same as being broken.

Should "If it ain't broke don't fix it" be added to the Zen of Python?
;-)

> There is a fundamental shift happening in the way CherryPy apps get
> mounted. The old way (2.1 and earlier) was to set cherrypy.root =
> AppRoot(). There was no facility for running multiple apps in the same
> process, and if you wanted AppRoot.index to respond to a URL like
> "/path/to/myapp/", then you used the VirtualPathFilter hack to trick CP
> into thinking "/path/to/myapp/" was actually "/". This was fragile and
> had some nasty corner cases (as I described at length on my blog at the
> time).

I guess I don't see it as such a fundamental shift. Instead of
manually building:

cherrypy.root.users.dowski.blog = Blog()

Now we just do:

cherrypy.tree.mount(Blog(), '/users/dowski/blog')


> The new way (2.2+) changes the meaning of cherrypy.root, and in fact the
> meaning of the whole cherrypy dispatch tree. Rather than munge the
> incoming URL to fit the ideal tree, we now munge the tree to fit the
> reality of URL's. That is, cherrypy.root should no longer be thought of
> as "my app"; instead, cherrypy.root and its dependents should now be
> thought of as a scaffolding, rooted at the URL "/", onto which we attach
> handlers. You attach a subtree of handlers for "myapp" now by using
> cherrypy.tree.mount.

Ok.

> This new design was necessary to support multiple apps in a single
> process, while preserving the 2.x API. That is, you can still use a
> naive 2.1 app in the 2.2 framework. Therefore, even if we decide someday
> to trust SCRIPT_NAME, it won't be until CP 3, because the API change is
> too large.

What would the API change consist of? If CP is dispatching off of
PATH_INFO and receives a request for '/users/fumanchu/coolapp' does it
matter if it is hooked to the root like this

cherrypy.root.users.fumanchu.coolapp = CoolApp()

or like this

cherrypy.tree.mount(CoolApp(), '/users/fumanchu/coolapp')
?

I imagine the cherrypy.tree.mount_point() and cherrypy.tree.url()
functions would have to change to prepend SCRIPT_NAME (if it is
available), but I still don't see any API changes (hit me over the head
with one if you can ;-).

> It seems to me that your proposal for the WSGI server is
> trying to improve the 2.1 design by munging the URL instead of the tree,
> and is therefore both obsolete and too early at the same time.

Really, there is no munging of the URL going on. What happens is that
the WSGI server no longer assumes that it is hosting a single WSGI app
that controls the root path. I suppose this could be carried out
further to allow for multiple WSGI apps to be hosted by the WSGI server
at specified mount points, but that wasn't the result I was going for.
I just wanted to add the ability to use WSGI middleware with CP 2.2.
Releasing the ('/') root path from control of a single app in the WSGI
server allows for that.

> In CP 3
> we may decide to go back to an API where you can write "cherrypy.root =
> AppRoot()" and still have it play nice with multiple apps in a single
> process; at that time, I think your proposal will naturally be
> implemented. But it can't work in the 2.x line now IMO.

All of my sample apps that I have written to play with the middleware
changes work fine with tree.mount(). The test suite passes. I guess I
don't see why 2.2's "multiple apps in a single process" are broken by
my changes.

Christian
http://www.dowski.com

Ian Bicking

Feb 1, 2006, 12:44:17 PM
to cherryp...@googlegroups.com
christian wrote:
>>>So ... any more thoughts on the SCRIPT_NAME/PATH_INFO matter?
>>>What about the WSGI server modifications (which kind of hinges
>>>on SN/PI)? I think being able to use WSGI middleware is important.
>>
>>I think so too; however, I still have two (complementary) reservations:
>>
>>1. Not all WSGI servers output the "correct" SCRIPT_NAME.
>
>
> It's too bad that so much seems to hinge on SCRIPT_NAME/PATH_INFO
> considering they are so arbitrary (at least as far as current servers
> go).

There are several situations where the distinction is correct: from CGI,
from FastCgi, from SCGI with SCGIMount, and from CGI gateways to FastCGI
or SCGI. Why punish the correct implementations because of the buggy
implementations?

But that's not really the point either. SCRIPT_NAME and PATH_INFO
aren't arbitrary in WSGI, they are explicitly specified. It is the
responsibility of whatever is the gateway from the external protocol to
WSGI to fix that up, with configuration if necessary. But just because
many external protocols lose that information or get it wrong or simply
have no way to communicate it, doesn't mean anything. WSGI is not CGI
or FastCGI or whatever, it is only analogous to those systems. And when
you don't get the SCRIPT_NAME and PATH_INFO distinction right in WSGI
it's a bug.

If CP provides a configuration value to fix SCRIPT_NAME, to be used in a
deployment where SCRIPT_NAME gets lost or is incorrect, that'd be fine
and well. But it should assume it is correct by default.

>>There is a fundamental shift happening in the way CherryPy apps get
>>mounted. The old way (2.1 and earlier) was to set cherrypy.root =
>>AppRoot(). There was no facility for running multiple apps in the same
>>process, and if you wanted AppRoot.index to respond to a URL like
>>"/path/to/myapp/", then you used the VirtualPathFilter hack to trick CP
>>into thinking "/path/to/myapp/" was actually "/". This was fragile and
>>had some nasty corner cases (as I described at length on my blog at the
>>time).
>
>
> I guess I don't see it as such a fundamental shift. Instead of
> manually building:
>
> cherrypy.root.users.dowski.blog = Blog()
>
> Now we just do:
>
> cherrypy.tree.mount(Blog(), '/users/dowski/blog')

I can't see how it is fundamental either, but I wouldn't make the change
with tree.mount. Request.object_path should not be set to the complete
request path, it should be set to the path that is being processed, and
when called from wsgiApp that means PATH_INFO.

There's no fundamental shift there, it's just another (optional
keyword!) parameter added (probably to Request.run), that wsgiApp will
pass in. Call the argument scriptName.

That will fix this whole problem, and it's so freakin' easy and has no
backward compatibility problems I see (unless, I suppose, someone
subclasses Request with a fixed signature for run() -- and if they did,
this doesn't seem like the sort of backward compatibility problem that
is inappropriate for a point release).
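
A minimal sketch of what such an optional keyword could look like. The
Request class below is a toy stand-in, not CherryPy's Request; only the
argument name scriptName comes from the suggestion above, everything
else is assumed for illustration.

```python
class Request:
    """Toy stand-in for a framework request object; not CherryPy's Request."""

    def run(self, requestLine, scriptName=""):
        # requestLine looks like "GET /apps/cherrypy/projectapp/tags HTTP/1.1".
        method, path, protocol = requestLine.split(" ", 2)
        self.path = path.split("?", 1)[0]  # complete path, kept intact
        self.script_name = scriptName
        prefix = scriptName.rstrip("/")
        if prefix and (self.path == prefix or self.path.startswith(prefix + "/")):
            # Dispatch only on the part below the mount point (PATH_INFO).
            self.object_path = self.path[len(prefix):] or "/"
        else:
            # No scriptName given: behave exactly as before.
            self.object_path = self.path
        return self.object_path
```

Callers that never pass scriptName see no change, which is the
backward-compatibility argument being made here.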

Christian Wyglendowski

Feb 1, 2006, 1:17:18 PM2/1/06
to cherryp...@googlegroups.com
Ian Bicking wrote:
>
> christian wrote:
>>>> So ... any more thoughts on the SCRIPT_NAME/PATH_INFO matter?
>>>> What about the WSGI server modifications (which kind of hinges
>>>> on SN/PI)? I think being able to use WSGI middleware is important.
>>>
>>> I think so too; however, I still have two (complementary) reservations:
>>>
>>> 1. Not all WSGI servers output the "correct" SCRIPT_NAME.
>>
>>
>> It's too bad that so much seems to hinge on SCRIPT_NAME/PATH_INFO
>> considering they are so arbitrary (at least as far as current servers
>> go).
>
> There are several situations where the distinction is correct: from CGI,
> from FastCGI, from SCGI with SCGIMount, and from CGI gateways to FastCGI
> or SCGI. Why punish the correct implementations because of the buggy
> implementations?

I am not suggesting that.

> But that's not really the point either. SCRIPT_NAME and PATH_INFO
> aren't arbitrary in WSGI, they are explicitly specified.

That's true, but they wind up being arbitrary when the servers (Apache,
IIS, whatever) hand different values out for them. I guess whatever
gateway is sitting on top of that server needs to make it right, but the
original value is "arbitrary".

> It is the
> responsibility of whatever is the gateway from the external protocol to
> WSGI to fix that up, with configuration if necessary. But just because
> many external protocols lose that information or get it wrong or simply
> have no way to communicate it, doesn't mean anything. WSGI is not CGI
> or FastCGI or whatever, it is only analogous to those systems. And when
> you don't get the SCRIPT_NAME and PATH_INFO distinction right in WSGI
> it's a bug.

I guess my last sentence above is what you already mentioned here.

> If CP provides a configuration value to fix SCRIPT_NAME, to be used in a
> deployment where SCRIPT_NAME gets lost or is incorrect, that'd be fine
> and well. But it should assume it is correct by default.

I agree.

>>> There is a fundamental shift happening in the way CherryPy apps get
>>> mounted. The old way (2.1 and earlier) was to set cherrypy.root =
>>> AppRoot(). There was no facility for running multiple apps in the same
>>> process, and if you wanted AppRoot.index to respond to a URL like
>>> "/path/to/myapp/", then you used the VirtualPathFilter hack to trick CP
>>> into thinking "/path/to/myapp/" was actually "/". This was fragile and
>>> had some nasty corner cases (as I described at length on my blog at the
>>> time).
>>
>>
>> I guess I don't see it as such a fundamental shift. Instead of
>> manually building:
>>
>> cherrypy.root.users.dowski.blog = Blog()
>>
>> Now we just do:
>>
>> cherrypy.tree.mount(Blog(), '/users/dowski/blog')
>
> I can't see how it is fundamental either, but I wouldn't make the change
> with tree.mount. Request.object_path should not be set to the complete
> request path, it should be set to the path that is being processed, and
> when called from wsgiApp that means PATH_INFO.
>
> There's no fundamental shift there, it's just another (optional
> keyword!) parameter added (probably to Request.run), that wsgiApp will
> pass in. Call the argument scriptName.

My idea for this was to change requestLine() in _cpwsgi.py to only use
the PATH_INFO. That way, CP can continue doing its http-toolkit stuff
(Request gets a real http request line that corresponds to the resource)
and still jibe with WSGI.

> That will fix this whole problem, and it's so freakin' easy and has no
> backward compatibility problems I see (unless, I suppose, someone
> subclasses Request with a fixed signature for run() -- and if they did,
> this doesn't seem like the sort of backward compatibility problem that
> is inappropriate for a point release).
>

With my proposed changes, the entire test suite passes, CP dispatches
off PATH_INFO, and you can wrap the CP wsgiApp in middleware using the
builtin WSGI server.

I guess I need to write a new test that (somehow) gives CP a different
value for SCRIPT_NAME and see if it is able to correctly dispatch the
request (I think it will). I would have to update tree.mount_point and
tree.url to use SCRIPT_NAME, but that shouldn't be hard. I suppose
redirects would need to take it into account as well. Anything else?
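
For the URL and redirect side of this, a small sketch of prepending
SCRIPT_NAME when building links. app_url is a hypothetical helper, not
the existing tree.url; it only assumes a WSGI environ dict is at hand.

```python
def app_url(environ, path=""):
    """Build a server-relative URL for a path inside the mounted app.

    Prepends SCRIPT_NAME (when the gateway supplies one) so links and
    redirects keep working when the app is moved to another mount point.
    """
    script_name = environ.get("SCRIPT_NAME", "").rstrip("/")
    if not path.startswith("/"):
        path = "/" + path
    return script_name + path
```

A redirect would then use app_url(environ, "/login") for its Location
value instead of a hard-coded "/login".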

Christian
http://www.dowski.com


Ian Bicking

Feb 1, 2006, 1:29:33 PM2/1/06
to cherryp...@googlegroups.com
Christian Wyglendowski wrote:
>> There's no fundamental shift there, it's just another (optional
>> keyword!) parameter added (probably to Request.run), that wsgiApp will
>> pass in. Call the argument scriptName.
>
>
> My idea for this was to change requestLine() in _cpwsgi.py to only use
> the PATH_INFO. That way, CP can continue doing its http-toolkit stuff
> (Request gets a real http request line that corresponds to the resource)
> and still jibe with WSGI.

That makes it hard for applications to figure out where they came from,
since you are throwing away useful information. It's not necessary for
CP to cover up the actual request line, it just has to parse it
correctly, which means building object_path from PATH_INFO.

Though I suppose if you keep the WSGI environment, at least that would
give applications a chance to figure out where they came from.

>> That will fix this whole problem, and it's so freakin' easy and has no
>> backward compatibility problems I see (unless, I suppose, someone
>> subclasses Request with a fixed signature for run() -- and if they
>> did, this doesn't seem like the sort of backward compatibility problem
>> that is inappropriate for a point release).
>>
>
> With my proposed changes, the entire test suite passes, CP dispatches
> off PATH_INFO, and you can wrap the CP wsgiApp in middleware using the
> builtin WSGI server.

I suspect the tests don't account for any of the techniques people are
using to figure out where their app is mounted. Though they might be
trusting object_path for that as well, in which case they will also be
broken. I have no idea what techniques people are using, so it's hard
to know what will break the least. Another option, similar to
tree.mount but involving less fiddling with objects, would be to build a
complete object_map but jump through all the parts from SCRIPT_NAME
immediately.

Lacking a clear "where am I" API makes it much harder to predict how
applications determine this in practice.

Robert Brewer

Feb 1, 2006, 3:00:04 PM2/1/06
to cherryp...@googlegroups.com
christian wrote:

> Robert Brewer wrote:
> > This new design was necessary to support multiple apps in a single
> > process, while preserving the 2.x API. That is, you can still use a
> > naive 2.1 app in the 2.2 framework. Therefore, even if we
> > decide someday to trust SCRIPT_NAME, it won't be until CP 3,
> > because the API change is too large.
>
> What would the API change consist of? If CP is dispatching off of
> PATH_INFO and receives a request for '/users/fumanchu/coolapp' does it
> matter if it is hooked to the root like this
>
> cherrypy.root.users.fumanchu.coolapp = CoolApp()
>
> or like this
>
> cherrypy.tree.mount(CoolApp(), '/users/fumanchu/coolapp')
> ?
>
> I imagine the cherrypy.tree.mount_point() and cherrypy.tree.url()
> functions would have to change to prepend SCRIPT_NAME (if it is
> available), but I still don't see any API changes (hit me
> over the head with one if you can ;-).

New apps using tree.mount are not the issue. The desire is that existing
2.1 apps (that use "cherrypy.root = AppRoot()") will still work in 2.2
without changes. The only way for us to support 2.1 apps and (multiple)
2.2 apps with the same code is to have cherrypy.root == "/" always.

> > It seems to me that your proposal for the WSGI server is
> > trying to improve the 2.1 design by munging the URL instead
> > of the tree, and is therefore both obsolete and too early
> > at the same time.
>
> Really, there is no munging of the URL going on. What happens is that
> the WSGI server no longer assumes that it is hosting a single WSGI app
> that controls the root path.

It doesn't assume that now! Rather, the wsgiApp callable assumes it is
hosting multiple apps at various mount points, _and_ allows for problems
with bad SCRIPT_NAME's. Just because other WSGI implementations assume a
different "application callable" for each app doesn't mean CP has to.

> I just wanted to add the ability to use WSGI middleware with CP 2.2.
> Releasing the ('/') root path from control of a single app in the WSGI
> server allows for that.

> ...


> All of my sample apps that I have written to play with the middleware
> changes work fine with tree.mount(). The test suite passes.

> I guess I don't see why 2.2's "multiple apps in a single process"
> are broken by my changes.

Are we still talking about your patch for #455? I don't understand why
_cpwsgi doesn't do what you want already. I most definitely can
understand that _cpwsgiserver doesn't set SCRIPT_NAME and PATH_INFO per
the spec. That should be changed to properly set those values. That
would mean making the WSGIServer more configurable; you'd have to tell
it all of your app roots when you start it up (like every other WSGI
server is being forced to do). This would take a bit of work to get
server.start to pass CP's tree.mount_points to the WSGI server, while
still remaining decoupled. In other words, SCRIPT_NAME and PATH_INFO
should be fixed in _cpwsgiserver.HTTPRequest.parse_request, not in
_cpwsgi.CPHTTPRequest.parse_request.

Christian Wyglendowski

Feb 1, 2006, 5:30:00 PM2/1/06
to cherryp...@googlegroups.com
Ian Bicking wrote:

>
> Christian Wyglendowski wrote:
>> My idea for this was to change requestLine() in _cpwsgi.py to only use
>> the PATH_INFO. That way, CP can continue doing its http-toolkit stuff
>> (Request gets a real http request line that corresponds to the resource)
>> and still jibe with WSGI.
>
> That makes it hard for applications to figure out where they came from,
> since you are throwing away useful information. It's not necessary for
> CP to cover up the actual request line, it just has to parse it
> correctly, which means building object_path from PATH_INFO.
>
> Though I suppose if you keep the WSGI environment, at least that would
> give applications a chance to figure out where they came from.

Yeah, it would know where it came from because it could check (the new)
cherrypy.request.wsgi_environ var for ['SCRIPT_NAME']. But your idea
sounds like it would work as well. And now that I think about it, I'm
not so sure handing request a "false" request line is such a good idea.

Your idea of modifying the Request object is probably better, though I
don't know how another parameter passed to Request.run will be received.
Perhaps instead Request.processRequestLine should just set object_path
to the full request path minus SCRIPT_NAME (thus, PATH_INFO). It is
actually easier to go about it in that manner rather than setting
object_path directly to PATH_INFO.

Christian
http://www.dowski.com

Robert Brewer

Feb 1, 2006, 5:54:37 PM2/1/06
to cherryp...@googlegroups.com
Christian Wyglendowski wrote:
> Perhaps instead Request.processRequestLine should just set
> object_path to the full request path minus SCRIPT_NAME
> (thus, PATH_INFO).

If object_path == PATH_INFO, how will you patch cherrypy.config.get? It
inspects object_path to return the values specific to a given URL--if
you only look at PATH_INFO, then multiple apps' configs will start to
collide.

Ian Bicking

Feb 1, 2006, 6:03:12 PM2/1/06
to cherryp...@googlegroups.com
Robert Brewer wrote:
> Christian Wyglendowski wrote:
>
>>Perhaps instead Request.processRequestLine should just set
>>object_path to the full request path minus SCRIPT_NAME
>>(thus, PATH_INFO).
>
>
> If object_path == PATH_INFO, how will you patch cherrypy.config.get? It
> inspects object_path to return the values specific to a given URL--if
> you only look at PATH_INFO, then multiple apps' configs will start to
> collide.

In CherryPaste the configuration is relative to the application root, so
any configuration attached to /static when accessed through an
application that is rooted at /myapp (i.e., SCRIPT_NAME is "/myapp")
will apply to the resource accessed with the URL path /myapp/static.
You don't put "/myapp" in your configuration files anywhere. This is
certainly most appropriate for Paste; I don't know what you plan for
people to use in CherryPy generally. I think there's a problem where
configuration of the application relative to CherryPy is confused with
configuration of the application deployment, which causes some of the
problems -- if you are indicating how CherryPy should treat the
application then all paths should be relative. If you are indicating
how the application should act in a specific deployment, then you can go
either way -- full path or just relative. I use relative for both in
Paste, for symmetry (to the degree there's per-path configuration at
all, which usually isn't necessary).
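
A minimal sketch of application-relative configuration lookup, to show
why "/myapp" never has to appear in the config. This is neither Paste's
nor CherryPy's implementation; config_get and the dict layout are
assumptions made for the example.

```python
def config_get(config, environ, key, default=None):
    """Look up key for the current request, most specific section first.

    config maps app-relative paths ("/static", "/") to dicts of
    settings. Only PATH_INFO is consulted, so the result does not
    depend on where the app is mounted (SCRIPT_NAME).
    """
    path = environ.get("PATH_INFO", "") or "/"
    if not path.startswith("/"):
        path = "/" + path
    while True:
        section = config.get(path, {})
        if key in section:
            return section[key]
        if path == "/":
            return default
        # Drop the last segment: "/static/css" -> "/static" -> "/".
        path = path.rsplit("/", 1)[0] or "/"
```

The same config dict then serves the app whether it is deployed at
/myapp or anywhere else.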

Robert Brewer

Feb 1, 2006, 6:23:59 PM2/1/06
to cherryp...@googlegroups.com
Christian Wyglendowski wrote:
> Perhaps instead Request.processRequestLine should just set
> object_path to the full request path minus SCRIPT_NAME
> (thus, PATH_INFO).

and I replied:


> If object_path == PATH_INFO, how will you patch
> cherrypy.config.get? It inspects object_path to return
> the values specific to a given URL--if you only look at
> PATH_INFO, then multiple apps' configs will start to
> collide.

and Ian Bicking wrote:
> In CherryPaste the configuration is relative to the
> application root, so any configuration attached to
> /static when accessed through an application that is
> rooted at /myapp (i.e., SCRIPT_NAME is "/myapp")
> will apply to the resource accessed with the URL
> path /myapp/static. You don't put "/myapp" in your
> configuration files anywhere. This is certainly most
> appropriate for Paste...

Sure; my point is that this is one of the showstoppers for setting
object_path to PATH_INFO in the 2.x line. Changing the config keys from
absolute to relative paths is a solution that would have to wait for CP
3.

Christian Wyglendowski

Feb 1, 2006, 9:25:42 PM2/1/06
to cherryp...@googlegroups.com
> Christian Wyglendowski wrote:
>> Perhaps instead Request.processRequestLine should just set
>> object_path to the full request path minus SCRIPT_NAME
>> (thus, PATH_INFO).
>
> and Robert Brewer replied:

>> If object_path == PATH_INFO, how will you patch
>> cherrypy.config.get? It inspects object_path to return
>> the values specific to a given URL--if you only look at
>> PATH_INFO, then multiple apps' configs will start to
>> collide.
>
> and Ian Bicking noted:

>> In CherryPaste the configuration is relative to the
>> application root, so any configuration attached to
>> /static when accessed through an application that is
>> rooted at /myapp (i.e., SCRIPT_NAME is "/myapp")
>> will apply to the resource accessed with the URL
>> path /myapp/static. You don't put "/myapp" in your
>> configuration files anywhere. This is certainly most
>> appropriate for Paste...
>
> and Robert countered:

> Sure; my point is that this is one of the showstoppers for setting
> object_path to PATH_INFO in the 2.x line. Changing the config keys from
> absolute to relative paths is a solution that would have to wait for CP
> 3.

I think this could still work for CherryPy, and again, I think the issue
of what the "app" is that we are all talking about is getting muddied.
I think that Ian and I are calling the "app" the CP (or other) wsgiApp
(correct me if I am wrong, Ian). I think that Robert is calling the
"app" a CP application mounted on the CP tree *within* the wsgiApp
(correct me if I am wrong).

I feel like we are going in circles, and I think this is my last trip
around the merry-go-round. So here I go ... (wheee!)

#mycp.conf

[/blogapp]
#stuff for blogapp here

[/projectapp]
#stuff for projectapp here

#my_cp_app.py
cherrypy.tree.mount(BlogApp(), '/blogapp')
cherrypy.tree.mount(ProjectApp(), '/projectapp')
# or ...
# cherrypy.root = SomeRoot()
# cherrypy.root.blogapp = BlogApp()
# cherrypy.root.projectapp = ProjectApp()

With my patch in 455 (dropping the munged url in _cpwsgi.requestLine()
for the object_path approach now *way* above), a standard CP wsgi server
is started with cherrypy.server.start(). Someone requests
http://myserver/blogapp/posts/SCRIPT-NAME-and-PATH-INFO-hell. We have
this simplified WSGI environ:

SCRIPT_NAME = '' #empty because CP's wsgiApp is at the server root
PATH_INFO = '/blogapp/posts/SCRIPT-NAME-and-PATH-INFO-hell'

object_path gets set to PATH_INFO, config lookups should work and
everything is cool (I think).

Using another WSGI server, my CherryPy wsgiApp is mounted at
'/apps/cherrypy'. A request is made for
http://myotherserver/apps/cherrypy/projectapp/tags/filters. Acting as
it should, the WSGI server/gateway sets up the following simplified WSGI
environment:

SCRIPT_NAME = '/apps/cherrypy' #the path to the CP wsgiApp
PATH_INFO = '/projectapp/tags/filters'

Again, object_path set to PATH_INFO should cause no problems.

I don't really mind if this isn't included in 2.2. As a matter of fact,
I agree with you (Robert) that it shouldn't. I just want to do what is
right for CherryPy (and I know you do too) as far as WSGI
interoperability goes. I am *very* new to WSGI and relatively new to
the CP dev team, so I am going to submit to "rank" on this and shelve my
proposal for now.

Christian
http://www.dowski.com

Sylvain Hellegouarch

Feb 2, 2006, 2:56:45 AM2/2/06
to cherryp...@googlegroups.com

>
> Sure; my point is that this is one of the showstoppers for setting
> object_path to PATH_INFO in the 2.x line. Changing the config keys from
> absolute to relative paths is a solution that would have to wait for CP
> 3.

Indeed, but this is a change we do need, as Ian's way is far more flexible
and simpler :)

- Sylvain

Robert Brewer

Feb 2, 2006, 2:01:58 PM2/2/06
to cherryp...@googlegroups.com
Me:

> Sure; my point is that this is one of the showstoppers for
> setting object_path to PATH_INFO in the 2.x line. Changing
> the config keys from absolute to relative paths is a
> solution that would have to wait for CP 3.

Christian:


> I feel like we are going in circles, and I think this is my last trip
> around the merry-go-round. So here I go ... (wheee!)
>
> #mycp.conf
>
> [/blogapp]
> #stuff for blogapp here
>
> [/projectapp]
> #stuff for projectapp here
>
> #my_cp_app.py
> cherrypy.tree.mount(BlogApp(), '/blogapp')
> cherrypy.tree.mount(ProjectApp(), '/projectapp')
> # or ...
> # cherrypy.root = SomeRoot()
> # cherrypy.root.blogapp = BlogApp()
> # cherrypy.root.projectapp = ProjectApp()
>
> With my patch in 455 (dropping the munged url in
> _cpwsgi.requestLine() for the object_path approach
> now *way* above), a standard CP wsgi server is
> started with cherrypy.server.start(). Someone requests
> http://myserver/blogapp/posts/SCRIPT-NAME-and-PATH-INFO-hell.
> We have this simplified WSGI environ:
>
> SCRIPT_NAME = '' #empty because CP's wsgiApp is at the server root
> PATH_INFO = '/blogapp/posts/SCRIPT-NAME-and-PATH-INFO-hell'
>
> object_path gets set to PATH_INFO, config lookups should work and
> everything is cool (I think).

Yes, except the current CP WSGI server doesn't do this. Instead, you'll
get:

SCRIPT_NAME = '/blogapp/posts/SCRIPT-NAME-and-PATH-INFO-hell'
PATH_INFO = '' #_cpwsgiserver sets *all* PATH_INFO's to ''

[which is why I said _cpwsgiserver needs to be fixed, in my last post]
In this case, setting object_path to PATH_INFO won't work.

> Using another WSGI server, my CherryPy wsgiApp is mounted at
> '/apps/cherrypy'. A request is made for
> http://myotherserver/apps/cherrypy/projectapp/tags/filters.
> Acting as it should, the WSGI server/gateway sets up the
> following simplified WSGI environment:
>
> SCRIPT_NAME = '/apps/cherrypy' #the path to the CP wsgiApp
> PATH_INFO = '/projectapp/tags/filters'
>
> Again, object_path set to PATH_INFO should cause no problems.

The problem here is that you should expect:

SCRIPT_NAME = '/apps/cherrypy/projectapp'
PATH_INFO = '/tags/filters'

That's what I hear Ian saying that Paste and "other WSGI components that
obey the spec" will hand you. Correct me if I'm wrong. If true, then
setting object_path to PATH_INFO won't work; in order for it to work,
you'd have to tell your server to lie to Paste et al regarding what
SCRIPT_NAME should be.
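
The two splits describe the same URL at different depths of dispatch.
As a small illustration, the standard library's
wsgiref.util.shift_path_info moves one segment from PATH_INFO onto
SCRIPT_NAME, turning the first split into the second. The descend
wrapper is a hypothetical helper added only so the result is easy to
inspect.

```python
from wsgiref.util import shift_path_info


def descend(environ):
    """Shift one path segment on a copy of environ.

    Returns (segment, SCRIPT_NAME, PATH_INFO) after the shift.
    """
    env = dict(environ)  # leave the caller's environ untouched
    segment = shift_path_info(env)
    return segment, env["SCRIPT_NAME"], env["PATH_INFO"]


outer = {"SCRIPT_NAME": "/apps/cherrypy",
         "PATH_INFO": "/projectapp/tags/filters"}
segment, script_name, path_info = descend(outer)
# segment is "projectapp"; script_name is "/apps/cherrypy/projectapp";
# path_info is "/tags/filters".
```

Either split concatenates back to the same full path, so the question
is only which component is expected to do the shifting.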

I could be wrong, and SCRIPT_NAME really should be '/apps/cherrypy'
(because CP's wsgiApp is 'a single WSGI app' instead of 'the same
callable for multiple WSGI apps' as I designed it). But that still
doesn't address the problem that several WSGI servers in the field,
including CP's own builtin one, don't hand you the expected SCRIPT_NAME
and PATH_INFO.

I think I finally understand the use cases you're trying to support
here--you want the same behavior that the VirtualPathFilter provided:
that CherryPy could strip off and ignore a certain leading portion of
the URL. After much discussion and aborted code, we decided that this
isn't the way to go in the rest of the 2.x line (although it will
probably be true for CP 3). Instead, in 2.2+, you _must_ construct
cherrypy.root.apps.cherrypy.projectapp.tags.filters if you want that
method to respond to "/apps/cherrypy/projectapp/tags/filters". This was
the whole rationale behind having a Tree class and its "mount" method:
make that job easier. The only way around that is to have some external
code (e.g. mod_rewrite, mod_proxy) either rewrite the URL or proxy the
request (which is the same thing) before CP receives it.

> I don't really mind if this isn't included in 2.2. As a
> matter of fact, I agree with you (Robert) that it shouldn't.
> I just want to do what is right for CherryPy (and I know you
> do too) as far as WSGI interoperability goes. I am *very*
> new to WSGI and relatively new to the CP dev team, so I am
> going to submit to "rank" on this and shelve my proposal
> for now.

OK. I don't like to "pull rank"--I'd rather reach a consensus. But if
you feel that it's not worth the effort to reach a common understanding,
I hope you can hang on to your ideas until CP 3. I think what you've
proposed is the natural model for that future.

In the short term, I still think fixing _cpwsgiserver to output the
correct SCRIPT_NAME and PATH_INFO will provide you the solution you
desire for using WSGI middleware with CP. It means you have to shift
your thinking from "wsgiApp is a single WSGI app, mounted at X" to
"wsgiApp is the same callable for multiple WSGI apps, mounted
arbitrarily". If I'm not explaining that well enough, I'd like more
dialogue about it, because it will be very important for us to explain
that concept well as we proceed with CP 2.2.

Christian Wyglendowski

Feb 2, 2006, 9:16:59 PM2/2/06
to cherryp...@googlegroups.com
[snip my example]

I said:
>> object_path gets set to PATH_INFO, config lookups should work and
>> everything is cool (I think).

Robert said:
> Yes, except the current CP WSGI server doesn't do this. Instead, you'll
> get:
>
> SCRIPT_NAME = '/blogapp/posts/SCRIPT-NAME-and-PATH-INFO-hell'
> PATH_INFO = '' #_cpwsgiserver sets *all* PATH_INFO's to ''
>
> [which is why I said _cpwsgiserver needs to be fixed, in my last post]
> In this case, setting object_path to PATH_INFO won't work.

Right - the current one doesn't. My modified one does. However, I made
my modifications in _cpwsgi - I think you are right that the change
belongs in _cpwsgiserver.

I said:
>> Using another WSGI server....
>>
>> SCRIPT_NAME = '/apps/cherrypy' #the path to the CP wsgiApp
>> PATH_INFO = '/projectapp/tags/filters'
>>
>> Again, object_path set to PATH_INFO should cause no problems.

Robert said:
> The problem here is that you should expect:
>
> SCRIPT_NAME = '/apps/cherrypy/projectapp'
> PATH_INFO = '/tags/filters'
>
> That's what I hear Ian saying that Paste and "other WSGI components that
> obey the spec" will hand you. Correct me if I'm wrong.

I hope Ian will comment, but the way I understood it, SCRIPT_NAME is the
path up to the wsgiApp callable. '/projectapp/tags/filters' is a path
to a "sub-app" hosted within the CP wsgiApp that is dispatched by CP.

Robert said:
> I could be wrong, and SCRIPT_NAME really should be '/apps/cherrypy'
> (because CP's wsgiApp is 'a single WSGI app' instead of 'the same
> callable for multiple WSGI apps' as I designed it). But that still
> doesn't address the problem that several WSGI servers in the field,
> including CP's own builtin one, don't hand you the expected SCRIPT_NAME
> and PATH_INFO.

'The same callable for multiple WSGI apps' is an interesting way to put
it - doesn't seem quite right. I could see 'the same WSGI callable for
multiple CP apps', but not the other way around. Regarding broken
servers: IMO, those WSGI servers/gateways need to be fixed - CP
shouldn't hack around them.

Robert said:
> I think I finally understand the use cases you're trying to support
> here--you want the same behavior that the VirtualPathFilter provided:
> that CherryPy could strip off and ignore a certain leading portion of
> the URL.

Yes, it is similar. However, there is no extra config step within CP.
The app can be moved from /dev/cherrypy to /prod/cherrypy without CP
having to know about it. I guess that gets back to whether or not CP
needs to know beforehand the complete tree structure of the location
where it is mounted.

More Robert:


> After much discussion and aborted code, we decided that this
> isn't the way to go in the rest of the 2.x line (although it will
> probably be true for CP 3). Instead, in 2.2+, you _must_ construct
> cherrypy.root.apps.cherrypy.projectapp.tags.filters if you want that
> method to respond to "/apps/cherrypy/projectapp/tags/filters". This was
> the whole rationale behind having a Tree class and its "mount" method:
> make that job easier. The only way around that is to have some external
> code (e.g. mod_rewrite, mod_proxy) either rewrite the URL or proxy the
> request (which is the same thing) before CP receives it.

I realize this decision has been made for 2.2, but how does CP then
share a common root with other applications on any server platform? For
instance, hosting on Apache with mod_python, how could I have both
'/apps/cherrypy/fooapp' and '/apps/moinmoin/wiki'? Or sharing with a
PHP app - '/apps/cherrypy/barapp' and '/apps/wordpress/blog'?

More Christian:


>> I don't really mind if this isn't included in 2.2. As a
>> matter of fact, I agree with you (Robert) that it shouldn't.
>> I just want to do what is right for CherryPy (and I know you
>> do too) as far as WSGI interoperability goes. I am *very*
>> new to WSGI and relatively new to the CP dev team, so I am
>> going to submit to "rank" on this and shelve my proposal
>> for now.

Robert said:
> OK. I don't like to "pull rank"--I'd rather reach a consensus. But if
> you feel that it's not worth the effort to reach a common understanding,
> I hope you can hang on to your ideas until CP 3. I think what you've
> proposed is the natural model for that future.

I didn't feel like you were trying to "pull rank", I was just ready to
admit that I might not be seeing the whole picture. I would much rather
come to a common understanding as well.

Robert said:
> In the short term, I still think fixing _cpwsgiserver to output the
> correct SCRIPT_NAME and PATH_INFO will provide you the solution you
> desire for using WSGI middleware with CP.

That would be great. I'll look into it.

Robert said:
> It means you have to shift
> your thinking from "wsgiApp is a single WSGI app, mounted at X" to
> "wsgiApp is the same callable for multiple WSGI apps, mounted
> arbitrarily". If I'm not explaining that well enough, I'd like more
> dialogue about it, because it will be very important for us to explain
> that concept well as we proceed with CP 2.2.

Yeah, I am definitely not jiving with "wsgiApp [as] the same callable
for multiple WSGI apps". I just don't see CP's "apps" as WSGI apps.

Thanks for bearing with me on this.

Christian
http://www.dowski.com

Robert Brewer

Feb 3, 2006, 4:18:25 PM2/3/06
to cherryp...@googlegroups.com
Christian wrote:
> SCRIPT_NAME = '/apps/cherrypy' #the path to the CP wsgiApp
> PATH_INFO = '/projectapp/tags/filters'

and I countered with:


> The problem here is that you should expect:
>
> SCRIPT_NAME = '/apps/cherrypy/projectapp'
> PATH_INFO = '/tags/filters'

to which Christian answered:
> ...the way I understood it, SCRIPT_NAME is the path up to the
> wsgiApp callable. '/projectapp/tags/filters' is a path to a
> "sub-app" hosted within the CP wsgiApp that is dispatched by CP.

Phillip J. Eby wrote on web...@python.org (in a different discussion):
> My suggestion would be to add an extra WSGI key,
> maybe 'wsgi.application_root' to represent the
> "original application SCRIPT_NAME" for frameworks
> that have such a concept. Templates using Routes
> could then use that variable in place of SCRIPT_NAME.
> It seems to me that Zope request/response objects
> also need this information, in order to generate
> the magic URL0-URL9 and other such variables.
>
> The application root should of course be set at the entry point
> of the framework, so in the case of Routes, Routes could simply
> copy SCRIPT_NAME to application_root in environ if there isn't
> one already set. It could then simply use application_root
> first when generating URLs, and for that matter it could add
> extension APIs to the environ to allow accessing Routes
> APIs from embedded apps.
>
> ...this allows entire mini-applications with their own URL
> space to be embedded as "templates" within a containing
> application. Such apps can then rely on SCRIPT_NAME as
> being their root, even if they weren't originally written
> for embedding.
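
A hedged sketch of that suggestion as WSGI middleware. Note that
'wsgi.application_root' is only the key name floated in the quote above,
not part of PEP 333, and remember_application_root is a hypothetical
name.

```python
def remember_application_root(app):
    """Wrap a WSGI app so the original SCRIPT_NAME survives later shifting.

    'wsgi.application_root' is the proposed (non-standard) key; it is
    set once at the framework entry point and never overwritten.
    """
    def middleware(environ, start_response):
        environ.setdefault("wsgi.application_root",
                           environ.get("SCRIPT_NAME", ""))
        return app(environ, start_response)
    return middleware
```

Embedded apps can keep treating SCRIPT_NAME as their own root, while
URL generation in the containing framework reads the remembered value.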

IMO, Phillip's comments add support for my interpretation of SCRIPT_NAME
(as it relates to CherryPy): that it's perfectly OK for SCRIPT_NAME to
be something other than "the path up to the wsgiApp callable". When I
wrote wsgiApp, it was designed to be completely independent of the URL:
one callable for multiple "WSGI apps"; each "CP app", when invoked via
_cpwsgi, equates to one "WSGI app". I can see now that I should not have
called it "wsgiApp". :(

PEP 333 takes pains to say "the application object" or "the application
callable" in many, many places where it could have just said "the
application"--I think there's a reason, and it is that "the callable" is
supposed to be decoupled from "the application". One common way to
implement that for a framework is to make "the callable" a factory
function which returns "the real callable"; I chose in CP to use
polymorphism ("many apps -> one function") rather than use a factory
("many apps -> one function -> many functions").
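
To contrast the two shapes, a toy sketch follows. It is not the _cpwsgi
code; APPS, shared_callable and app_factory are hypothetical names, and
the "apps" are just strings so the example stays short.

```python
# Hypothetical registry: mount point -> name of the app living there.
APPS = {"/blogapp": "blog", "/projectapp": "projects"}


def respond(start_response, text):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [text.encode("utf-8")]


def shared_callable(environ, start_response):
    """Polymorphism: one callable for every app, chosen by URL per request."""
    path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
    for mount, name in APPS.items():
        if path == mount or path.startswith(mount + "/"):
            return respond(start_response, name)
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def app_factory(name):
    """Factory: return a separate callable bound to a single app."""
    def application(environ, start_response):
        return respond(start_response, name)
    return application
```

With the factory, each returned callable can be mounted by any WSGI
server on its own; with the shared callable, the mapping from URL to
app lives inside the framework.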

/soapbox off ;)

Christian Wyglendowski

Feb 3, 2006, 5:08:49 PM2/3/06
to cherryp...@googlegroups.com
at some point I said:
>> SCRIPT_NAME = '/apps/cherrypy' #the path to the CP wsgiApp
>> PATH_INFO = '/projectapp/tags/filters'
>

and Robert countered:


>> The problem here is that you should expect:
>>
>> SCRIPT_NAME = '/apps/cherrypy/projectapp'
>> PATH_INFO = '/tags/filters'
>

then I answered:


>> ...the way I understood it, SCRIPT_NAME is the path up to the
>> wsgiApp callable. '/projectapp/tags/filters' is a path to a
>> "sub-app" hosted within the CP wsgiApp that is dispatched by CP.

then Phillip J. Eby wrote on web...@python.org (in a different discussion):


>> My suggestion would be to add an extra WSGI key,
>> maybe 'wsgi.application_root' to represent the
>> "original application SCRIPT_NAME" for frameworks
>> that have such a concept. Templates using Routes
>> could then use that variable in place of SCRIPT_NAME.
>> It seems to me that Zope request/response objects
>> also need this information, in order to generate
>> the magic URL0-URL9 and other such variables.
>>
>> The application root should of course be set at the entry point
>> of the framework, so in the case of Routes, Routes could simply
>> copy SCRIPT_NAME to application_root in environ if there isn't
>> one already set. It could then simply use application_root
>> first when generating URLs, and for that matter it could add
>> extension APIs to the environ to allow accessing Routes
>> APIs from embedded apps.
>>
>> ...this allows entire mini-applications with their own URL
>> space to be embedded as "templates" within a containing
>> application. Such apps can then rely on SCRIPT_NAME as
>> being their root, even if they weren't originally written
>> for embedding.

which Robert followed with:


> IMO, Phillip's comments add support for my interpretation of SCRIPT_NAME
> (as it relates to CherryPy): that it's perfectly OK for SCRIPT_NAME to
> be something other than "the path up to the wsgiApp callable". When I
> wrote wsgiApp, it was designed to be completely independent of the URL:
> one callable for multiple "WSGI apps"; each "CP app", when invoked via
> _cpwsgi, equates to one "WSGI app". I can see now that I should not have
> called it "wsgiApp". :(

I've been following that web-sig discussion as well, and maybe I'm wrong
here, but I think he is referring to changing SCRIPT_NAME within a
wsgiApp for the benefit of a hosted wsgiApp. And that is totally cool.
Once we are inside of the CP dispatch process, if it can determine
that "projectapp" is the target CP-app in question and rewrite
SCRIPT_NAME from "/apps/cherrypy" to "/apps/cherrypy/projectapp" and
PATH_INFO to "/tags/filters", that is fine with me.

I actually do that in my WSGIAppFilter - I use
_cputil.get_object_trail() to determine what portion of PATH_INFO
corresponds to objects mounted on the tree. So SCRIPT_NAME becomes
SCRIPT_NAME + cp_tree_object_parts. The remainder of the path that
doesn't correspond to objects in the CP tree is set to PATH_INFO. Thus,
the hosted wsgi app gets a SCRIPT_NAME and PATH_INFO that are correct in
its context.
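To make the split concrete, here is a minimal sketch of the idea. It is not the actual WSGIAppFilter code and does not call _cputil.get_object_trail(); `split_path` and the `mounted` set of known tree paths are hypothetical stand-ins for whatever the real lookup returns.

```python
def split_path(script_name, path_info, mounted):
    """Move the leading PATH_INFO segments that match mounted objects
    onto SCRIPT_NAME and return (new_script_name, new_path_info)."""
    segments = [s for s in path_info.split("/") if s]
    consumed = []
    for seg in segments:
        candidate = "/" + "/".join(consumed + [seg])
        if candidate not in mounted:
            break  # first segment that is not part of the tree
        consumed.append(seg)
    rest = segments[len(consumed):]
    new_script_name = script_name + "".join("/" + s for s in consumed)
    new_path_info = "".join("/" + s for s in rest)
    return new_script_name, new_path_info

print(split_path("/apps/cherrypy", "/projectapp/tags/filters", {"/projectapp"}))
```

With the values from earlier in the thread this yields SCRIPT_NAME '/apps/cherrypy/projectapp' and PATH_INFO '/tags/filters'. The sketch drops trailing slashes, which a real implementation would have to preserve.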

> PEP 333 takes pains to say "the application object" or "the application
> callable" in many, many places where it could have just said "the
> application"--I think there's a reason, and it is that "the callable" is
> supposed to be decoupled from "the application". One common way to
> implement that for a framework is to make "the callable" a factory
> function which returns "the real callable"; I chose in CP to use
> polymorphism ("many apps -> one function") rather than use a factory
> ("many apps -> one function -> many functions").

I like the polymorphism approach. I just think that the
_cpwsgi.wsgiApp, as a single "application callable", should dispatch to
its multiple hosted wsgi apps with PATH_INFO (i.e., that should be the
object_path). From within, if it wants to rewrite SCRIPT_NAME and
PATH_INFO to reflect the true location of the internal app, that is
fine. It really is the only way that a correct SCRIPT_NAME and
PATH_INFO could be set since only CP knows the internal object structure
of the tree.

So here is what I am proposing, in a nutshell:

1. Have CP ignore SCRIPT_NAME for locating the object requested.
2. Use PATH_INFO to locate the requested object.
3. Optionally: rewrite SCRIPT_NAME to be the original SCRIPT_NAME + the
"found application" (however that is determined - I gave my
WSGIAppFilter example).
4. Use SCRIPT_NAME + path when issuing redirects and for other
URL-writing tasks.
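Points 1, 2 and 4 of the list above can be sketched roughly as follows. This is only an illustration under stated assumptions: `locate`, `app_url` and the `handlers` dict are hypothetical names, not existing CherryPy APIs.

```python
def locate(environ, handlers):
    # Points 1 and 2: only PATH_INFO is consulted; SCRIPT_NAME is ignored.
    return handlers.get(environ.get("PATH_INFO", "") or "/")

def app_url(environ, path):
    # Point 4: URLs handed back to the client are prefixed with SCRIPT_NAME.
    script_name = environ.get("SCRIPT_NAME", "").rstrip("/")
    return script_name + "/" + path.lstrip("/")

environ = {"SCRIPT_NAME": "/apps/cherrypy/projectapp",
           "PATH_INFO": "/tags/filters"}
handlers = {"/tags/filters": "filters_page"}  # hypothetical lookup table

print(locate(environ, handlers))
print(app_url(environ, "/tags/new"))
```

Because the lookup never touches SCRIPT_NAME, the same handlers work unchanged wherever the app is mounted; only the generated URLs change.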

> /soapbox off ;)

Me too :-)

Christian

Christian Wyglendowski

Feb 5, 2006, 11:22:52 AM2/5/06
to cherryp...@googlegroups.com
Alright, I took a different approach on this. As you suggested, I went
after _cpwsgiserver this time.

Robert Brewer wrote:
> Are we still talking about your patch for #455? I don't understand why
> _cpwsgi doesn't do what you want already. I most definitely can
> understand that _cpwsgiserver doesn't set SCRIPT_NAME and PATH_INFO per
> the spec. That should be changed to properly set those values.

I have updated #455 (http://www.cherrypy.org/ticket/455) with a new
patch. _cpwsgiserver should now set the correct SCRIPT_NAME and
PATH_INFO. It probably still needs some fine tuning, but I didn't want
to go much further on it until I got some feedback (yea or nay).

> That
> would mean making the WSGIServer more configurable; you'd have to tell
> it all of your app roots when you start it up (like every other WSGI
> server is being forced to do).

That is basically what I did. The default behavior still simply takes a
single app and serves it. There is now an add_application(path, app)
method that lets you mount an app at a different point. As I think
about it, maybe this behavior should mirror the cherrypy.tree object a
little more. Maybe server.add_application() should become server.mount()?

Anyhow, the bottom line is that _cpwsgiserver can now host multiple WSGI
apps at multiple locations. The mount point for the application becomes
its SCRIPT_NAME. When a URL is requested, the server determines by its
path which application should handle it.
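A rough sketch of that dispatch, written as a standalone WSGI callable rather than as the patched _cpwsgiserver (the `PathDispatcher` class and its `mount` method are hypothetical names, and the longest-prefix rule is an assumption about how ties would be resolved):

```python
class PathDispatcher:
    """Route each request to the app mounted at the longest matching prefix."""

    def __init__(self, default_app):
        self.mounts = {"": default_app}  # "" is the root mount point

    def mount(self, path, app):
        self.mounts[path.rstrip("/")] = app

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        # Try longer mount points first so '/a/b' wins over '/a'.
        for prefix in sorted(self.mounts, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                # The mount point becomes (part of) SCRIPT_NAME;
                # the remainder of the path stays in PATH_INFO.
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                environ["PATH_INFO"] = path[len(prefix):]
                return self.mounts[prefix](environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]


def show(environ, start_response):
    # Toy app that echoes the two variables it was given.
    start_response("200 OK", [("Content-Type", "text/plain")])
    body = "%s|%s" % (environ["SCRIPT_NAME"], environ["PATH_INFO"])
    return [body.encode("ascii")]
```

Mounting `show` at '/over/here' and requesting '/over/here/x' would hand the app SCRIPT_NAME '/over/here' and PATH_INFO '/x', while any other path falls through to the default app with an empty SCRIPT_NAME.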

> This would take a bit of work to get
> server.start to pass CP's tree.mount_points to the WSGI server, while
> still remaining decoupled.

Hhhmm...I wasn't really sure what you meant by this. I'm not sure why
the _cpwsgiserver would need to be aware of application locations within
the main wsgiApp callable.

> In other words, SCRIPT_NAME and PATH_INFO
> should be fixed in _cpwsgiserver.HTTPRequest.parse_request, not in
> _cpwsgi.CPHTTPRequest.parse_request.

Thanks for bearing with me on this still. Maybe even with this latest
patch this idea still isn't feasible, but I have at least got a better
understanding of the inner workings of CP in the process :-) What I
have put together passed all current tests, but would of course require
new tests if it is a direction we want to go.

Here is an example of using the new setup:

import cherrypy
from cherrypy._cpwsgi import wsgiApp
from somepackage import somemiddleware, some_simple_app

class Root:
    @cherrypy.expose
    def index(self):
        return "Hello, world!"

wrapped_app = somemiddleware(wsgiApp)

cherrypy.tree.mount(Root(), '/')

# tell CP to use our custom wrapped CP wsgiApp
cherrypy.server.cp_wsgi_app = wrapped_app

# add another wsgi app to host (maybe this is beyond the scope of CP?)
cherrypy.server.mount_wsgi_app('/over/here', some_simple_app)

cherrypy.server.start()

Christian Wyglendowski

Feb 6, 2006, 7:32:18 AM2/6/06
to cherryp...@googlegroups.com
Christian Wyglendowski wrote:
> # add another wsgi app to host (maybe this is beyond the scope of CP?)
> cherrypy.server.mount_wsgi_app('/over/here', some_simple_app)

Ok, I'll answer my own question. It *is* beyond the scope of CP. I
have removed it in my latest patch attached to the ticket.

Christian
http://www.dowski.com
