Mounting web2py at sub URL of web site and not root of web site.


Graham Dumpleton

Sep 4, 2009, 12:19:39 AM
to web2py-users
At the moment, web2py will only work if mounted at the root of the web
site. Thus, using Apache/mod_wsgi you can only say:

WSGIScriptAlias / /Users/grahamd/Testing/web2py/wsgihandler.py

You cannot say:

WSGIScriptAlias /subdir /Users/grahamd/Testing/web2py/wsgihandler.py

Since web2py seems to perform all URL construction for links through
URL, only one change to the web2py core appears to be required to make
this work. The change is to gluon.html.URL():


*** html.py.dist 2009-09-04 14:05:02.000000000 +1000
--- html.py 2009-09-04 14:06:48.000000000 +1000
***************
*** 182,188 ****
      if vars:
          other += '?%s' % urllib.urlencode(vars)

!     url = '/%s/%s/%s%s' % (application, controller, function, other)

      if regex_crlf.search(url):
          raise SyntaxError, 'CRLF Injection Detected'
--- 182,192 ----
      if vars:
          other += '?%s' % urllib.urlencode(vars)

!     if r and r.env.script_name:
!         url = '%s/%s/%s/%s%s' % (r.env.script_name, application,
!                                  controller, function, other)
!     else:
!         url = '/%s/%s/%s%s' % (application, controller, function, other)

      if regex_crlf.search(url):
          raise SyntaxError, 'CRLF Injection Detected'


Alas, because the patched code needs to read SCRIPT_NAME from the
original WSGI environment, this only works properly if 'request' is
also supplied in every case where URL is used to construct a URL.

For example, currently the 'welcome/views/default/index.html' page
says:

{{extend 'layout.html'}}

{{try:}}{{=H2(message)}}{{except:}}{{=BEAUTIFY(response._vars)}}
{{pass}}

{{=P(A(T("click here for the administrative interface"), _href=URL('admin','default','index')),_style="padding-top:1em;")}}
{{=P(A(T("click here for online examples"), _href=URL('examples','default','index')))}}

This needs to be changed to:

{{extend 'layout.html'}}

{{try:}}{{=H2(message)}}{{except:}}{{=BEAUTIFY(response._vars)}}
{{pass}}

{{=P(A(T("click here for the administrative interface"), _href=URL('admin','default','index', r=request)),_style="padding-top:1em;")}}
{{=P(A(T("click here for online examples"), _href=URL('examples','default','index', r=request)))}}

This ensures that URL() has access to the request object and so can
insert into the URL the value of SCRIPT_NAME, which represents the
mount point of the WSGI application.
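
The effect of the patch can be sketched outside web2py as follows. This is a simplified stand-in, not web2py's actual code: build_url and the SimpleNamespace request object are hypothetical, and only the branch on r.env.script_name is taken from the patch above.

```python
from types import SimpleNamespace


def build_url(application, controller, function, r=None):
    """Stand-in for the patched tail of URL(); not web2py's real code."""
    if r is not None and r.env.script_name:
        # Mounted below the site root: prefix the mount point.
        return '%s/%s/%s/%s' % (r.env.script_name, application,
                                controller, function)
    # No request supplied, or mounted at the root: behave as before.
    return '/%s/%s/%s' % (application, controller, function)


# Hypothetical request exposing only the attribute the patch reads.
request = SimpleNamespace(env=SimpleNamespace(script_name='/subdir'))

without_request = build_url('admin', 'default', 'index')
with_request = build_url('admin', 'default', 'index', r=request)
```

Without r=request the link still points at the site root, which is why the views have to be updated before the application can be mounted elsewhere.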

I don't know enough about web2py to know if there are other situations
where it would also be required to pass the request object to ensure
this is all done correctly. Internally I think the core code always
passes the request object through, so the problem is more likely to be
in the HTML templates for views.

Graham

Graham Dumpleton

Sep 4, 2009, 12:28:10 AM
to web2py-users


Yeah, a lot of stuff in the 'admin' application, for example, in
controllers as well as views, needs to pass through the request object
and currently doesn't do so. Until that is all cleaned up, web2py
won't be able to be mounted at an arbitrary mount point.

Even once web2py is cleaned up, the guideline for any user code would
be to always pass through the request object if you need this ability.

Graham

mdipierro

Sep 4, 2009, 2:11:39 AM
to web2py-users
Hi Graham,

I do not think this is necessary. You can just create a routes.py file
in the web2py folder and write in it

routes_in=(('/grahamd/(?P<any>.*)','/\g<any>'),)
routes_out=(('/(?P<any>.*)','/grahamd/\g<any>'),)

we have been using this for pycon registration for example.
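
As a rough illustration of what these two rules do, here is a sketch that uses only Python's re module rather than web2py's rewrite.py. The rewrite helper is hypothetical, anchoring each pattern at both ends is an assumption, and raw strings are used so that \g is not treated as a string escape.

```python
import re

# Patterns copied from the routes.py lines above, written as raw strings.
routes_in = ((r'/grahamd/(?P<any>.*)', r'/\g<any>'),)
routes_out = ((r'/(?P<any>.*)', r'/grahamd/\g<any>'),)


def rewrite(routes, path):
    """Rewrite path using the first (pattern, replacement) pair that matches.

    Hypothetical helper; the real matching rules live in web2py itself.
    """
    for pattern, replacement in routes:
        match = re.match('^%s$' % pattern, path)
        if match:
            # expand() substitutes \g<any> with the captured group.
            return match.expand(replacement)
    return path  # no rule matched: leave the path alone


incoming = rewrite(routes_in, '/grahamd/welcome/default/index')
outgoing = rewrite(routes_out, '/welcome/default/index')
```

The incoming rule strips the /grahamd prefix before dispatch and the outgoing rule puts it back on generated links, so the prefix is fixed in routes.py.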


Graham Dumpleton

Sep 4, 2009, 2:21:30 AM
to web2py-users


On Sep 4, 4:11 pm, mdipierro <mdipie...@cs.depaul.edu> wrote:
> Hi Graham,
>
> I do not think this is necessary. You can just create a routes.py file
> in the web2py folder and write in it
>
> routes_in=(('/grahamd/(?P<any>.*)','/\g<any>'),)
> routes_out=(('/(?P<any>.*)','/grahamd/\g<any>'),)
>
> we have been using this for pycon registration for example.

That would go against the WSGI way of doing things: a WSGI application
shouldn't have its mount point hardwired into its own routing code.

Graham

mdipierro

Sep 4, 2009, 2:26:52 AM
to web2py-users
I think we can modify your patch so that it does not require r=request
(which would break some apps).

I think request.env.script_name is the same for all applications
within one web2py installation (correct?).
In this case we can store it in settings and retrieve it from inside
URL.

What do you think?

Massimo


Graham Dumpleton

Sep 4, 2009, 5:59:41 AM
to web2py-users
On Sep 4, 4:26 pm, mdipierro <mdipie...@cs.depaul.edu> wrote:
> I think we can modify your patch so that it does not require r=request
> (which would break some apps)

Why would requiring r=request for URL calls break some applications?

The code for URL is:

application = controller = function = None
if r:
    application = r.application
    controller = r.controller
    function = r.function
if a:
    application = a
if c:
    controller = c
if f:
    if isinstance(f, str):
        function = f
    else:
        function = f.__name__

if not (application and controller and function):
    raise SyntaxError, 'not enough information to build the url'

So supplying it where it wasn't supplied previously shouldn't matter
that I can see, as for it to work previously the caller would have had
to provide a, c and f. If they are doing that, those values would still
override what is specified by the request object.
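
That argument can be checked with a small standalone sketch of the resolution logic quoted above. The resolve function and the SimpleNamespace request are hypothetical stand-ins written for Python 3, not web2py code.

```python
from types import SimpleNamespace


def resolve(a=None, c=None, f=None, r=None):
    """Mirror the a/c/f resolution shown above (hypothetical helper)."""
    application = controller = function = None
    if r:
        application = r.application
        controller = r.controller
        function = r.function
    # Explicit arguments always win over values taken from the request.
    if a:
        application = a
    if c:
        controller = c
    if f:
        function = f if isinstance(f, str) else f.__name__
    if not (application and controller and function):
        raise SyntaxError('not enough information to build the url')
    return application, controller, function


# Hypothetical request for the page currently being served.
request = SimpleNamespace(application='welcome', controller='default',
                          function='index')

explicit_only = resolve('admin', 'default', 'index')
explicit_with_request = resolve('admin', 'default', 'index', r=request)
```

Both calls resolve to the same application, controller and function, so adding r=request to a call that already passes a, c and f does not change which page the link targets.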

I could understand it breaking an overall application if that
application defined in or out routes which conflicted with it. One
approach is perhaps to disable the automatic insertion of SCRIPT_NAME
if in/out routes, or routes detectable as conflicting, are defined.

At the moment the way I see it is that if r=request is not used
everywhere, it just means an appliance is not relocatable as far as
mount point goes. It doesn't really mean that backward compatibility
has been broken, as the application would still work when web2py is at
the root of the site.

Thus, using r=request becomes a guideline rather than a requirement.
If people want relocatable applications, they should use it. If they
don't care, they don't have to. If they use in/out routes wrongly,
they could break relocatable applications.

> I think request.env.script_name is the same for all applications
> within one web2py installation (correct?)

Technically no. Apache/mod_wsgi, and possibly any WSGI stack where you
use external routing to web2py, allow you to do a lot of odd things.
Take for example the Apache configuration:

WSGIScriptAliasMatch ^/u([0-9][0-9]*)/myapp(/.*)?$ \
    /Users/grahamd/Testing/web2py/wsgihandler.py/myapp$2

<LocationMatch ^/u([0-9][0-9]*)/myapp(/.*)?$>
    WSGIApplicationGroup users
</LocationMatch>

What we have here is many sub URLs of the site mapping to the web2py
application instance. Because we have used WSGIApplicationGroup,
rather than there being an instance for each sub URL, all requests
still get routed through to a single web2py instance. The application
can change its behaviour depending on the apparent URL at which it was
mounted.

In the example I gave above, I used a sub URL of the form
'u([0-9][0-9]*)' as sort of indicative of a student number, but a more
practical case, one where people have specifically asked in the past
how to do it, is where the qualifier is a business name or company
section.

In other words, they want to give prominence to the distinguisher, be
it a student name, company name or section, as if each had its own
application instance. They don't actually; instead the single
application internally just shows different data based on the mount
point, i.e., SCRIPT_NAME.

So, URL of:

http://web2py.example.com/u1234/myapp

yields SCRIPT_NAME of:

/u1234

but URL of:

http://web2py.example.com/u1235/myapp

yields:

/u1235
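
A minimal sketch of an application that varies its output by mount point, using only the standard library's wsgiref helpers. The data dictionary, the application function and the call helper are all hypothetical, and the environment is built by hand rather than by Apache/mod_wsgi.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical per-mount-point data, keyed by SCRIPT_NAME.
DATA_BY_MOUNT_POINT = {
    '/u1234': 'records for student 1234',
    '/u1235': 'records for student 1235',
}


def application(environ, start_response):
    """Pick the data set from the mount point the server reports."""
    mount_point = environ.get('SCRIPT_NAME', '')
    body = DATA_BY_MOUNT_POINT.get(mount_point, 'unknown mount point')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [body.encode('utf-8')]


def call(script_name, path_info='/myapp'):
    """Invoke the application with a hand-built WSGI environment."""
    environ = {'SCRIPT_NAME': script_name, 'PATH_INFO': path_info}
    setup_testing_defaults(environ)  # fills in the other required keys

    def start_response(status, headers):
        pass  # the status and headers are not needed for this sketch

    return b''.join(application(environ, start_response)).decode('utf-8')


first = call('/u1234')
second = call('/u1235')
```

One application object serves both URLs; only the SCRIPT_NAME value supplied by the server differs between the two calls.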

It is quite possible that your in/out routes can serve the same
purpose, but that sort of defeats the idea that WSGI applications can
act as components as well. Thus, a web2py instance could be composed
with other WSGI components within a Pylons stack, where the Pylons
stack would deal with the external routing and the web2py application
should work without special configuration whatever the value of
SCRIPT_NAME.

> In this case we can store it in settings and retrieve it from inside
> URL.
>
> What do you think?

There are two issues around using gluon.settings.settings that I can
see.

The first is that you don't know the value of SCRIPT_NAME until the
first request is received, so setting it would need to be atomic
enough that, if multiple initial requests come in at the same time,
none of them sees a half-set state for the dictionary. I am not too
keen on per request settings in a global configuration. Django when
using mod_python does stupid stuff like that, setting os.environ
based on per request information.

The second is that the whole idea of a global dictionary like
gluon.settings.settings means that you can only have one web2py
instance running within a Python sub interpreter. This limits the
ability to compose multiple distinct instances of web2py together
using something like Pylons. But then the structure of your site
instance with gluon contained within it, sort of precludes that
anyway.

Some frameworks such as Pylons and Werkzeug get around this sort of
issue by using thread local storage to stash references to per request
information. It can get a bit weird when talking about multiple nested
WSGI components, an example being the StackedObjectProxy from Paste
that Pylons uses.
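
The thread local approach can be sketched with the standard library alone. This is not how Pylons, Werkzeug or Paste implement it; begin_request, url and handle are hypothetical names, and the point is only that each thread sees its own SCRIPT_NAME.

```python
import threading

_local = threading.local()  # attributes set here are private to each thread


def begin_request(environ):
    """Stash per request information for the current thread."""
    _local.script_name = environ.get('SCRIPT_NAME', '')


def url(application, controller, function):
    """Build a URL without being handed the request explicitly."""
    prefix = getattr(_local, 'script_name', '')
    return '%s/%s/%s/%s' % (prefix, application, controller, function)


def handle(environ, results, barrier):
    begin_request(environ)
    barrier.wait()  # both threads have stored their value before either reads
    results[environ['SCRIPT_NAME']] = url('myapp', 'default', 'index')


def demo():
    results = {}
    barrier = threading.Barrier(2)
    threads = [
        threading.Thread(target=handle,
                         args=({'SCRIPT_NAME': name}, results, barrier))
        for name in ('/u1234', '/u1235')
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


results = demo()
```

Even though both threads store their value before either builds a URL, each one gets back its own prefix, which a single global setting could not guarantee.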

Anyway, web2py isn't the first web framework to have this limitation.
You can't have multiple instances of Django applications either. It is
just the design model which has been chosen and there is nothing
particularly wrong with it.

Supporting multiple instances where each uses a different data area is
done quite successfully by Trac, which allows one Trac instance to
handle requests against multiple projects, where the project data
location is dictated by variables passed in through the WSGI
environment dictionary.

BTW, I am only up to the start of chapter 4 at this point, so my
understanding of web2py is still very limited. At the moment I am only
looking at it from the perspective of how well behaved a WSGI
application/component it is. Later, though, I will bring up some
issues I can see with large scale deployment and distribution of
individual appliances to many sites.

Graham

Joe Barnhart

Sep 5, 2009, 1:07:19 PM
to web...@googlegroups.com
+1 for Graham.

Backward compatibility is not broken because relocating the web2py site in the directory structure is a new feature.  If you want to take advantage of the new feature, you just re-code your URL functions.  Old installations work as they always did.  (Also - can we call w2p "enterprise" if it doesn't play nice with other WSGI apps?)

-- Joe B.

mdipierro

Sep 5, 2009, 1:55:08 PM
to web2py-users
I am not convinced. I had added this in trunk but I am taking it out
because this is not the right solution to the problem and because it
does break backward compatibility.

1) You can relocate web2py in a subfolder but you must use routes.py
2) The patch does break backward compatibility if a user is already
using routes.py for this purpose
3) I do not like the fact that web2py treats input URLs differently
than output URLs. If any url rewrite has to be made this should be
done at the level of routes.py (the configuration for the rewrite.py
module).

It is very easy

#in routes.py
routes_out=(('(?P<anything>.*)','/yourpath\g<anything>'),)

Massimo



mdipierro

Sep 5, 2009, 2:04:13 PM
to web2py-users
I should add that this is not a closed issue.

Graham has a valid point and we need to find a way that makes
everybody happy. I would like to hear an argument about why this does
not break backward compatibility (I think it does if people already
use routes for this purpose) and about what is wrong with using routes
for this.

Massimo

Joe Barnhart

Sep 5, 2009, 4:31:00 PM
to web...@googlegroups.com
What is the role of routes.py in a production environment?  In the book, you seemed to indicate that routes.py was not the solution of choice:

"All major web servers, for example Apache and lighttpd, also have the
ability to rewrite URLs. In a production environment we suggest having the
web server perform URL rewriting."

So are we supposed to use routes.py to fix issues like this, or not?  It seems conflicting to say now that routes.py is the accepted way to do URL rewriting in mod_wsgi under Apache.

-- Joe B.

mdipierro

Sep 5, 2009, 5:09:40 PM
to web2py-users
If the web server provides the functionality to rewrite the output
URLs, I'd use it.

If not, I would use routes.py.

What I am not comfortable with (but I may be missing something here)
is implementing the request.env.script_name in URL, since the output
of URL is already rewritten by the rewrite.py module based on routes.py.

In any case I think the comment you refer to is in the old manual.
routes has grown over time and I no longer discourage its use.

Massimo
