Correct URL paths and compromises (ticket #285)


Malcolm Tredinnick

Jul 5, 2008, 9:51:38 PM
to django-d...@googlegroups.com
I thought I'd sit down yesterday, do a final review of #285 and commit
it. I was naïve. It turned out to be all I did for the day.

There are some compromises needed somewhere, so it's time for audience
participation. When responding to this email, please try to keep in mind
that there are many, many different ways people install and use Django
and any solution has to work for all of them, not just your preferred
method. This also isn't a case where you can say "the specs say..",
since any specs aren't worth the paper they're printed on here. What
counts is what webservers used here in the real world actually do.
Fortunately, web servers mostly follow the specs, but that doesn't mean
ISPs do.

The good news: I have a patch that is mostly backwards compatible,
doesn't require intrusive code changes and even works with Apache's
mod_rewrite, thus avoiding counter-intuitive URLs when using apache
+fastcgi the way a lot of shared-hosting environments use it
(mod_rewrite plus a django.fcgi file). I tested it on a bunch of setups
(nginx, lighttpd, cherrypy's wsgi server, apache +
fastcgi/mod_python/mod_wsgi) and everything looks good.

Thus endeth the good news.

To understand the bad news, a quick intro to the problems we have and
some "ideal" solutions (there are some proposed solutions at the bottom
for the attention deficit types who want to skip ahead)...

Firstly, there's the "development" vs "production" differences with URL
prefixes. You don't always know at development time what prefix the
applications will be installed under. This shouldn't be a problem, since
web servers set SCRIPT_NAME to be the portion they are managing and (in
theory) PATH_INFO to be the bits that are passed to your application for
handling. So SCRIPT_NAME + PATH_INFO (+ QUERY_STRING) is the URL. Except
things are never that easy. Mod_python (and a few other Apache plugins)
have noticed that PATH_INFO isn't always set correctly by Apache,
mod_rewrite changes things around and so on. Still, we can work around
all of those, so let's safely assume we have a SCRIPT_NAME portion that
is the webserver prefix and PATH_INFO is the bit our Django apps care
about for URL parsing (it's true, we can derive those bits in all cases
with minimal amounts of hassle).
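
As a rough illustration of that split (a sketch only: a plain dict stands in
for the server environment, and the helper name full_url_path is made up for
this example), the pieces recombine like this:

```python
def full_url_path(environ):
    """Rebuild the requested URL path from CGI/WSGI-style variables."""
    script_name = environ.get("SCRIPT_NAME", "")  # part the web server owns
    path_info = environ.get("PATH_INFO", "")      # part the application parses
    query = environ.get("QUERY_STRING", "")
    path = script_name + path_info
    if query:
        path += "?" + query
    return path


# Hypothetical values for an app mounted under /site_prefix.
environ = {
    "SCRIPT_NAME": "/site_prefix",
    "PATH_INFO": "/foo/",
    "QUERY_STRING": "page=2",
}
print(full_url_path(environ))
```

The application only ever needs to parse PATH_INFO; SCRIPT_NAME matters
again when it has to build a URL to send back to the browser.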

Code needs to know how to construct proper URLs. Which means, amongst
other things, it needs to know SCRIPT_NAME so that that can be added as
a prefix. Portable code would also like to only have to work with
PATH_INFO, so that it is agnostic about the prefix under which it is
installed. Thus developers can have things under whatever prefix they
like and when installation happens, there isn't a dependency on the
final deployment URL.

Problem #1:
-----------

I suspect there are a number of installations around, particularly using
mod_python, that have Apache configuration files looking something like
this:

        <Location /admin/>
           PythonHandler django.core.handlers.modpython
           SetEnv DJANGO_SETTINGS_MODULE mysite.settings
           ...
        </Location>

        <Location /site_prefix/>
           PythonHandler django.core.handlers.modpython
           SetEnv DJANGO_SETTINGS_MODULE mysite.settings
           ...
        </Location>

Only *some* pieces under "/" are handled by Django, leaving the rest of
that namespace free for static documents, other scripts, etc. The fact
that mod_python passes through the URL as "/admin/foo/" and
"/site_prefix/foo/" means that the URL file (it's the same URLConf file
in both cases, since it's the same settings file) can differentiate
between the two.

Portable URL practices here would mean that the Django code shouldn't
care (or know much) about "/site_prefix/" and "/admin/" in those cases,
so there's no easy way to tell them apart. This is particularly
problematic with newforms-admin, since its urlpatterns entry is

"admin/(.*)$"

Strip the leading "admin/" and no other pattern is going to get a look
in.
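
A small sketch of that failure, using only the re module rather than
Django's resolver (the helper name matches_admin and the sample paths are
invented for illustration):

```python
import re

# The newforms-admin entry quoted above.
ADMIN_PATTERN = re.compile(r"admin/(.*)$")


def matches_admin(path):
    """Return True if the admin pattern would claim this path."""
    return ADMIN_PATTERN.search(path) is not None


# Path as the URLConf sees it when the full URL is passed through.
print(matches_admin("admin/auth/user/"))
# Same request once "/admin/" has been split off as SCRIPT_NAME.
print(matches_admin("auth/user/"))
```

With the prefix stripped, nothing left in the path identifies the request as
an admin request, so the admin pattern never matches.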

But that isn't the biggest problem...

Problem #2
-----------

The "{% url ... %}" template tag and the reverse() function. :-(

Both of these need to be aware of SCRIPT_NAME (or the equivalent) so
that they can put the right prefix onto the URLs they construct. Since
template rendering is independent of the current request, this is really
hard to work around. To the point that I don't have a solution that I'd
be comfortable including in the code. Any code that is intended to be at
all portable would need to be passing the URL prefix into every
HTTP-destined template rendering call, or else we'd need to have it
available in the thread's environment (the latter is the closest I've
come to finding happiness -- we already use the current thread's
environment for the active translation context, so this would be another
aspect like that).

Solutions(?)
============

Firstly, I'll say that we have to include something to fix the problem
with the wrong path being passed through. On everything except
mod_python, the SCRIPT_NAME is an important component of the path.
That's a solved problem, though. John Melesky's patches in #285 did most
of the work and I've shuffled things a little to make it more backwards
compatible and to handle mod_rewrite in the Apache case. So no problems
there. Take it as given that we present the proper full path in the
request object. That's a bug fix, nothing more or less.

The meta-problem is that to avoid problems #1 and #2 above, we need to
keep the URLConf files aware of the full path (solution #3 below tries
to avoid this, but it has a hole).

Solution #1
===========

Nothing changes in URLConf-land. Every time you install under a
different prefix, you need to edit your root URLConf (only). This means
that if you're writing code that is installation location agnostic, it's
going to look like this:

        SITE_PREFIX="/site_prefix/"

        urlpatterns = patterns('',
           ('^%s/foo/...' % SITE_PREFIX, ....),
           ...
        )

I've written a couple of sites like that and all that SITE_PREFIX stuff
hanging around is kind of noisy and interferes with the real point of
the code. But it gets the job done.

So no changes == minimally disruptive, but slightly messy long-term.

Solution #2
===========

We introduce a new second argument to patterns() which is the common
prefix to put before all the patterns in that call. This isn't hard on a
technical level and would be backwards compatible, if a little prone to
misreading of old code. The above example now becomes

        urlpatterns = patterns('', SITE_PREFIX,
           ('foo/...', ...),
           ...
        )

Less shouting all round.
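
To show roughly what that extra argument would do, here is a toy version in
plain Python. It is not the real patterns() function: prefixed_patterns,
resolve and the view names are all hypothetical, and views are just strings.

```python
import re


def prefixed_patterns(prefix, *entries):
    """Prepend one common prefix to every (regex, view_name) entry."""
    prefix = prefix.strip("/")
    compiled = []
    for regex, view_name in entries:
        body = regex.lstrip("^")
        if prefix:
            full = "^%s/%s" % (prefix, body)
        else:
            full = "^" + body
        compiled.append((re.compile(full), view_name))
    return compiled


def resolve(path, patterns):
    """Return the first view name whose regex matches, else None."""
    for regex, view_name in patterns:
        if regex.search(path):
            return view_name
    return None


SITE_PREFIX = "/site_prefix/"
urlpatterns = prefixed_patterns(
    SITE_PREFIX,
    (r"foo/$", "foo_view"),
    (r"bar/(\d+)/$", "bar_view"),
)
print(resolve("site_prefix/foo/", urlpatterns))
```

The prefix is written once per patterns() call instead of once per entry,
and different calls can still use different prefixes.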

Solution #3
===========

We shove the current SCRIPT_NAME prefix into the currently active
context, just as we do with the active locale. The reverse() function
knows to look there for the prefix (and if nothing's present it's the
same as an empty prefix). Using the current thread's context doesn't
make me deliriously happy, but I can live with it for something like
this.

This is the neatest solution from an ideal world perspective, since it
respects the design principles behind PATH_INFO and SCRIPT_NAME and
similar webserver-set environment variables.

Unfortunately, I don't see how to make something like the admin pattern
for newforms-admin work, then. Particularly under installations that do the
SCRIPT_NAME / PATH_INFO split. If we could find a way of saying "these
go to the admin path, these go to the foo path, these go to the blah
path", I'd probably like this solution a bit more.
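
A minimal sketch of the thread-context idea, assuming threading.local as the
storage. The names set_url_prefix, get_url_prefix and the ROUTES table are
invented here; the reverse() below is a stand-in, not Django's.

```python
import threading

_active = threading.local()


def set_url_prefix(prefix):
    """Record the SCRIPT_NAME prefix for the current thread."""
    _active.prefix = prefix.rstrip("/")


def get_url_prefix():
    """Return the current thread's prefix, or "" if none was set."""
    return getattr(_active, "prefix", "")


# Hypothetical name -> path table standing in for a real URLConf.
ROUTES = {"foo-detail": "/foo/%s/"}


def reverse(name, *args):
    """Build a URL, adding whatever prefix is active in this thread."""
    return get_url_prefix() + ROUTES[name] % args
```

A handler would call set_url_prefix() at the start of each request; code
running outside a request (a shell, a cron job) never sets it and so gets
unprefixed URLs.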

<End of solutions>

Personally, I prefer solution #2 and somehow I'll learn to live with the
fact there'll always be the ugliness of not being prefix-independent.
It's sad, but it might be the most pragmatic.

However, I'd like to hear some other well-considered opinions first in
case there's a possibility I've forgotten.

Regards,
Malcolm

Malcolm Tredinnick

Jul 5, 2008, 10:38:32 PM
to django-d...@googlegroups.com
No matter how careful I am with the "set everything out", something
always gets forgotten ...

On Sun, 2008-07-06 at 11:51 +1000, Malcolm Tredinnick wrote:
[...]


> Solution #3
> ===========
>
> We shove the current SCRIPT_NAME prefix into the currently active
> context, just as we do with the active locale. The reverse() function
> knows to look there for the prefix (and if nothing's present it's the
> same as an empty prefix). Using the current thread's context doesn't
> make me deliriously happy, but I can live with it for something like
> this.
>
> This is the neatest solution from an ideal world perspective, since it
> respects the design principles behind PATH_INFO and SCRIPT_NAME and
> similar webserver-set environment variables.
>
> Unfortunately, I don't see how to make something like the admin pattern
> for newforms-admin work, then. Particularly under installations do the
> SCRIPT_NAME / PATH_INFO split. If we could find a way of saying "these
> go to the admin path, these go to the foo path, these go to the blah
> path", I'd probably like this solution a bit more.

I forgot to mention that this solution would also allow people more
access to custom URLConf methods that are set on a per-request basis,
since the code seems to be encouraging people to use those (not
something I'm amazingly in love with, but we've already started down
that slope). So this would kill #3530 and #5034 at the same time (and if
we don't go this way, then we're effectively saying we wontfix those
tickets as well).

So if we can find a way to overcome problem #1 (and I also forgot to
mention that separate settings files that differed only by their
ROOT_URLCONF setting would be a solution to the multi-Location issue),
then this would graduate to my preferred solution.

Malcolm


Ken Arnold

Jul 5, 2008, 10:53:33 PM
to Django developers
On Jul 5, 9:51 pm, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:
> Solution #1
> ===========
>         SITE_PREFIX="/site_prefix/"
>
>         urlpatterns = patterns('',
>            ('^%s/foo/...' % SITE_PREFIX, ....),
>            ...
>         )
> Solution #2
> ===========
>         urlpatterns = patterns('', SITE_PREFIX,
>            ('foo/...', ...),
>            ...
>         )

-1 on adding another parameter to `patterns`; it's messy enough as-is.
(In hindsight, the second parameter should have been a tuple of
patterns anyway, but that's barely worth changing now.)

Fortunately there's a better solution for both of those, namely, the
one we're already using for `include`. The RegexURLResolver object
already has a `regex` variable that would handle this gracefully:
instead of:

RegexURLResolver(r'^/', urlconf)

you just do:

RegexURLResolver(r'^/'+SITE_PREFIX, urlconf)

Problem solved. Almost. Unfortunately, that line I just quoted is in
django.core.handlers.base and also django.core.urlresolvers. That is
to say, it's hard-coded in, in a way that's a good deal more difficult
to override than it should be. The source of the problem is the
*convention* that urlconf is a module that has a `urlpatterns`
variable. Making that be *configuration* instead would obviate the
problem.
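
To make the prefix idea concrete, here is a toy model in plain Python. It is
not Django's RegexURLResolver: ToyResolver and the view names are invented,
and it only mimics the "match a prefix, pass on the rest" behaviour.

```python
import re


class ToyResolver:
    """Match a prefix regex, then hand the remainder to child patterns."""

    def __init__(self, prefix_regex, patterns):
        self.prefix = re.compile(prefix_regex)
        self.patterns = [(re.compile(rx), view) for rx, view in patterns]

    def resolve(self, path):
        match = self.prefix.match(path)
        if match is None:
            return None
        rest = path[match.end():]  # what the child patterns get to see
        for regex, view in self.patterns:
            if regex.search(rest):
                return view
        return None


urlconf = [(r"^foo/$", "foo_view")]
SITE_PREFIX = "site_prefix/"

plain = ToyResolver(r"^/", urlconf)
prefixed = ToyResolver(r"^/" + SITE_PREFIX, urlconf)
print(prefixed.resolve("/site_prefix/foo/"))
```

The urlconf itself never mentions the prefix; only the root resolver's regex
changes between deployments.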

Proposed solution: introduce settings.root_resolver. Default it to
something like:

        def root_resolver():
            return RegexURLResolver(settings.RESOLVER_PREFIX,
                                    settings.ROOT_URLCONF)

(Too bad settings isn't a class, otherwise we could use a descriptor
and make it just settings.ROOT_RESOLVER, and take advantage of
inheritance, which also makes the whole notion of default settings
follow from ordinary OOP.)

Then you'd just put in your settings file:
RESOLVER_PREFIX = '^/site_prefix'
for Malcolm's particular example, or override root_resolver (notice
the OOP language) for more involved cases like having multiple
prefixes. See my recent post about refactoring the resolver for ideas
there.

I'm just using this issue as a way to advocate for more configuration,
but to be honest, you could also solve this issue with machinery
Django already provides, using a shim urlconf. I think that's even
described in the docs. Anyway, it would look like:

        site/
            urls_shim.py: patterns('',
                              ('^site_prefix', include(site.urls)),
                              ('^admin', include(django.contrib.admin.urls)))
            urls.py: everything else
            settings.py: ROOT_URLCONF = 'site.urls_shim'

I hope I'm actually understanding the issue and not needlessly
complicating things.

Regards, and thanks everyone for all the hard work.
-Ken

Malcolm Tredinnick

Jul 5, 2008, 11:06:13 PM
to django-d...@googlegroups.com

That assumes the prefix is the same for all the urlpatterns you'll ever
include. Which isn't the case in the problem #1 situation. Allowing the
prefix to be passed into patterns() means you can have different prefixes
for different collections of URLs.

I also intentionally didn't advocate a new setting here, since it isn't
necessary. People can put a setting for it in their settings file if
they want, but it's not compulsory (and might be too restrictive --
which is one of the things this thread is trying to work out).

Malcolm


alex....@gmail.com

Jul 5, 2008, 11:52:06 PM
to Django developers
I have an idea about the admin problem: what if we changed the admin
back to being a normal include and had admin_site.root actually be a
getter for an object that, for all purposes, acts like a URLConf file
(i.e. it has a urlpatterns attribute that contains a normal patterns
object)?

On Jul 5, 10:06 pm, Malcolm Tredinnick <malc...@pointy-stick.com>

Ivan Sagalaev

Jul 6, 2008, 8:21:46 AM
to django-d...@googlegroups.com
Malcolm Tredinnick wrote:
> <Location /admin/>
> PythonHandler django.core.handlers.modpython
> SetEnv DJANGO_SETTINGS_MODULE mysite.settings
> ...
> </Location>
>
> <Location /site_prefix/>
> PythonHandler django.core.handlers.modpython
> SetEnv DJANGO_SETTINGS_MODULE mysite.settings
> ...
> </Location>

I believe setting up Apache like this effectively means that SCRIPT_NAME
is empty and one wants to handle all those urls as PATH_INFO. I.e.
"/site_prefix/" here is not exactly some out-of-bound prefix because
"/admin/" would be "/site_prefix/admin/" otherwise.

So I think the solution to this problem is proper documentation. What do
you think?

> The "{% url ... %}" template tag and the reverse() function. :-(
>
> Both of these need to be aware of SCRIPT_NAME (or the equivalent) so
> that they can put the right prefix onto the URLs they construct. Since
> template rendering is independent of the current request, this is really
> hard to work around.

I think I can come up with something that looks like a compromise of all
your solutions :-)

Basically we can use the urlresolver itself to keep a SCRIPT_NAME. It can
be set by the HTTP handler on first request or on every request. It doesn't
matter much because we will document that a single Django project can
have only one site prefix. I think it's a reasonable "restriction"
because it's how it is in most cases.

To sum up:

- urlresolver is used instead of thread's environment
- no changes to urlconfs, resolver (and hence reverse) knows about
SCRIPT_NAME by itself

Malcolm Tredinnick

Jul 6, 2008, 8:37:55 AM
to django-d...@googlegroups.com

On Sun, 2008-07-06 at 16:21 +0400, Ivan Sagalaev wrote:
> Malcolm Tredinnick wrote:
> > <Location /admin/>
> > PythonHandler django.core.handlers.modpython
> > SetEnv DJANGO_SETTINGS_MODULE mysite.settings
> > ...
> > </Location>
> >
> > <Location /site_prefix/>
> > PythonHandler django.core.handlers.modpython
> > SetEnv DJANGO_SETTINGS_MODULE mysite.settings
> > ...
> > </Location>
>
> I believe setting up Apache like this effectively means that SCRIPT_NAME
> is empty and one wants to handle all those urls as PATH_INFO. I.e.
> "/site_prefix/" here is not exactly some out-of-bound prefix because
> "/admin/" would be "/site_prefix/admin/" otherwise.

No, SCRIPT_NAME would be more or less /site_prefix/ in the first case
and /admin/ in the second case. They aren't part of PATH_INFO. I say
"more or less" because sometimes the handler executable is included in
the SCRIPT_NAME passed from Apache and sometimes it isn't. I can't
remember which it is in the above case and it doesn't matter. We
"normalise" SCRIPT_NAME, since it's faked with mod_python in any case.
But we do have to be consistent. PATH_INFO does not include any
information from the portion of the URL used to determine which script
handles the request. We cannot change the meaning of these variables,
even though they are a little inconsistently applied.

> So I think the solution to this problem is proper documentation. What do
> you think?
>
> > The "{% url ... %}" template tag and the reverse() function. :-(
> >
> > Both of these need to be aware of SCRIPT_NAME (or the equivalent) so
> > that they can put the right prefix onto the URLs they construct. Since
> > template rendering is independent of the current request, this is really
> > hard to work around.
>
> I think I can come up with something that looks like a compromise of all
> your solutions :-)
>
> Basically we can use an urlresolver itself to keep a SCRIPT_NAME. It can
> be set by HTTTP handler on first request or on every request. It doesn't
> matter much because we will document that a single Django project can
> have only one site prefix. I think it's a reasonable "restriction"
> because it's how it is in most cases.

This sounds equivalent to putting it in the current execution thread's
storage, except it has problems: you are assuming it will be constant
across all requests. If we're going to assume that, it will be an extra
restriction on what happens now (it's not true in the multi-Location
case I gave).

I'm not against this idea (requiring the use of a different settings
file for each Location), but that's a decision to make deliberately.

Note also that your solution is a little more complex than it sounds on
the surface, since everything still has to work outside of the
request-response path, just using what is in the settings file. That's
why I mentioned originally that if nothing is set for SCRIPT_NAME (the
URL prefix bit) in the thread's environment, we would assume it to be
empty: that handles the case, for example, of calling reverse() at the
command prompt or somewhere else. You'd have to do the same in your
case, it appears, and then it's basically the same solution, just hiding
the data in a different place.

Regards,
Malcolm


Ivan Sagalaev

Jul 6, 2008, 9:24:09 AM
to django-d...@googlegroups.com
Malcolm Tredinnick wrote:
> No, SCRIPT_NAME would be more or less /site_prefix/ in the first case
> and /admin/ in the second case.

Ok, got your point. I now understand why you wanted SITE_PREFIX as a
parameter for patterns().

> But we do have to be consistent. PATH_INFO does not include any
> information from the portion of the URL used to determine which script
> handles the request.

Ideologically I agree. However in practice it's just more convenient in
this case to shove all these things under PATH_INFO. What are
disadvantages for this?

> This sounds equivalent to putting it in the current execution thread's
> storage, except it has problems: you are assuming it will be constant
> across all requests.

I just looked into core/handlers/base.py and saw that an instance of
RegexURLResolver is created for each request. So it actually doesn't have
to be constant across all of them. Before, I was under the impression that
the resolver is created only once.

> Note also that you solution is a little more complex than it sounds on
> the surface, since everything still has to work outside of the
> request-response path, just using what is in the settings file.

This is a trade-off then. If we rely on a setting, then a user has to keep
his settings in sync with the web-server config. If we rely on the
web server alone, reverse can't work outside of a request environment.

I think we can just add a new "script_name" parameter for reverse and
resolve. It will be filled by default from request environment
(whichever object you choose to serve as one) and it can be set in shell
explicitly.

Yuri Baburov

Jul 6, 2008, 9:51:10 AM
to django-d...@googlegroups.com
Hi Malcolm,

Solution #4:
make this choice an optional second argument of include(), but not patterns().
patterns are overloaded, and this approach combines advantages of
solution #1 and solution #2.

Solution #5
add map: root path -> urlconf

--
Best regards, Yuri V. Baburov, ICQ# 99934676, Skype: yuri.baburov,
MSN: bu...@live.com

Graham Dumpleton

Jul 6, 2008, 9:59:39 PM
to Django developers


On Jul 6, 10:21 pm, Ivan Sagalaev <man...@softwaremaniacs.org> wrote:
> Malcolm Tredinnick wrote:
> >         <Location /admin/>
> >            PythonHandler django.core.handlers.modpython
> >            SetEnv DJANGO_SETTINGS_MODULE mysite.settings
> >            ...
> >         </Location>
>
> >         <Location /site_prefix/>
> >            PythonHandler django.core.handlers.modpython
> >            SetEnv DJANGO_SETTINGS_MODULE mysite.settings
> >            ...
> >         </Location>
>
> I believe setting up Apache like this effectively means that SCRIPT_NAME
> is empty and one wants to handle all those urls as PATH_INFO. I.e.
> "/site_prefix/" here is not exactly some out-of-bound prefix because
> "/admin/" would be "/site_prefix/admin/" otherwise.
>
> So I think the solution to this problem is proper documentation. What do
> you think?

SCRIPT_NAME comes out to be odd things at times when using mod_python
and is not something that one could really rely upon.

The reason SCRIPT_NAME doesn't work in mod_python for Location
directive, is that SCRIPT_NAME derives from file based resource
matching. Thus, if using Location directive there is no file based
resource to match and so it cannot work out what leading part of URL
is SCRIPT_NAME.

If using mod_python 3.3, then from memory, in simple case one can
access:

req.hlist.location

and it should give you the Location directive mount point, which
should logically be the same as SCRIPT_NAME.

This may at least allow you to automate the mod_python 3.3 case,
whereas for older versions of mod_python you may have to do what all
WSGI adapters for mod_python do, which is require user to specify
SCRIPT_NAME using SetEnv.
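
A sketch of that fallback logic, written against stand-in objects rather
than a live mod_python request. It assumes the request exposes a dict-like
subprocess_env and, where available, hlist.location; guess_script_name and
the Fake classes are invented for this example.

```python
def guess_script_name(req, env_key="SCRIPT_NAME"):
    """Best-effort mount point for a mod_python-style request object."""
    # An explicit value (e.g. supplied via SetEnv) takes priority.
    env = getattr(req, "subprocess_env", None) or {}
    explicit = env.get(env_key)
    if explicit:
        return explicit.rstrip("/")
    # Otherwise fall back to the Location mount point, if it is exposed.
    hlist = getattr(req, "hlist", None)
    location = getattr(hlist, "location", None)
    if location:
        return location.rstrip("/")
    return ""


class FakeHList:
    """Stand-in for the handler list entry of a 3.3-style request."""
    location = "/site_prefix/"


class FakeRequest:
    """Stand-in request carrying only the attributes used above."""

    def __init__(self, env=None, hlist=None):
        self.subprocess_env = env or {}
        self.hlist = hlist


print(guess_script_name(FakeRequest(hlist=FakeHList())))
```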

Graham

Malcolm Tredinnick

Jul 6, 2008, 10:09:42 PM
to django-d...@googlegroups.com

On Sun, 2008-07-06 at 18:59 -0700, Graham Dumpleton wrote:
[...]

> If using mod_python 3.3, then from memory, in simple case one can
> access:
>
> req.hlist.location
>
> and it should give you the Location directive mount point, which
> should logically be the same as SCRIPT_NAME.
>
> This may at least allow you to automate the mod_python 3.3 case,
> whereas for older versions of mod_python you may have to do what all
> WSGI adapters for mod_python do, which is require user to specify
> SCRIPT_NAME using SetEnv.

Thanks, Graham. I knew about this for 3.3 and found some other
references you and other people had come up with on the modpython list
for earlier versions, too.

On an implementation level, I've gone with the SetEnv solution for
mod_python, after having read a bunch of the modpython list posts about
this and the source for various modpython releases. I've been carefully
avoiding basing this thread on too many specifics at that level: there
are these necessary technical difficulties that each implementation has
to overcome and they're not really relevant to the general solution.
I've managed to normalise everything fairly sensibly so that we have
"stuff that is used to determine that Django should be invoked" (which
we'll call SCRIPT_NAME) and "stuff that Django is passed to act
upon" (a.k.a. PATH_INFO) in each case and they are pretty uniform across
the board (and match the RFC definitions as much as practical).

Having to rummage around in different parts of req for mod_python 3.3
and somewhere else for earlier versions looked like a big hassle, so I
followed what everybody else is doing (and what Django apparently did in
the pre-historic cms days).

Regards,
Malcolm
