Expanding the flexibility of URL resolution

Marty Alchin

Oct 9, 2007, 2:26:25 PM

to django-d...@googlegroups.com

Apologies in advance: This is a long email, and I may tend to ramble.
Try to bear with me, please.

In testing out a framework for creating Netvibes widgets (just to see
if I could), I had what I thought to be a great idea for resolving
URLs to methods on a Widget instance. Essentially, each view a widget
provides would be just a method on the object, which was declared
using a decorator to supply a regex. This exact method isn't the
purpose of this email, but it helps to have some history.

import widgets

class ExampleWidget(widgets.Widget):
    template = 'widgets/example.html'

    pref = widgets.TextPreference()

    @widgets.urlpattern(r'^(?P<number>\d+)/$')
    def test(self, request, number):
        # Do something interesting here and return a response
        pass

Then I'd like the urls.py entry to look like this:

(r'^widget/', ExampleWidget()),

and have /widget/13/ be routed to ExampleWidget().test(request, number=13).
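
To make the idea concrete, here is a minimal, framework-free sketch. The `urlpattern` decorator and the `Widget.resolve()` method are hypothetical names chosen for illustration, not existing Django APIs, and the code is written for current Python rather than the Python of this thread's era. Note that the captured value arrives as the string "13", as regex groups always do.

```python
import re


def urlpattern(regex):
    """Hypothetical decorator: tag a method with a compiled regex."""
    def decorator(func):
        func.urlpattern = re.compile(regex)
        return func
    return decorator


class Widget:
    """Hypothetical base class that routes a path to a decorated method."""

    def resolve(self, path):
        # Inspect every attribute and keep those tagged by @urlpattern.
        for name in dir(self):
            method = getattr(self, name)
            pattern = getattr(method, "urlpattern", None)
            if pattern is None:
                continue
            match = pattern.search(path)
            if match:
                # Return the bound view plus its positional and keyword arguments.
                return method, (), match.groupdict()
        return None


class ExampleWidget(Widget):
    @urlpattern(r'^(?P<number>\d+)/$')
    def test(self, request, number):
        return "widget response for %s" % number


# The prefix '^widget/' is assumed to have been stripped already.
view, args, kwargs = ExampleWidget().resolve("13/")
print(view(None, *args, **kwargs))
```

The sketch only covers the method lookup; hooking such an object into urls.py is the part the rest of this email is about.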

Unfortunately, due to the way RegexURLResolver works, this could never
happen, since there's not a real, importable module that contains the
urlpatterns. Now, I could get around this by writing a custom object
with a specific key in sys.modules, then passing that key off to
RegexURLResolver, and it would probably work. I hope nobody wants me
to do that.

So, I considered alternatives and I came up with a fairly simple way
to go about it that could solve a few other issues that have been
raised in the past, and some that exist in a present/future alternate
dimension (the newforms-admin branch), but it's still a bit hackish,
and I wanted some input before I go too far with it.

Right now, in django.conf.urls.defaults, include() simply returns a
list containing a single item, the module name passed to it. This
seemed odd to me, until I noticed that patterns() checks to see if one
of the tuple items is a list, and if it is, passes it off to
RegexURLResolver. This in and of itself seemed quite hackish to me,
and thankfully my proposal addresses this as well.

Basically, I'd like to cut out the middle man -- the list -- and
simply have include() return a RegexURLResolver instance. patterns()
would then be able to use isinstance(t[1], RegexURLResolver) to make
its choice, which seems much cleaner to me. But the main advantage to
this is that patterns() would be able to accept anything that
subclasses RegexURLResolver, and process it accordingly.

The default RegexURLResolver would be used when include() is called,
but if my base Widget class subclassed RegexURLResolver, I could pass
an instance and all would work well.
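
A rough sketch of the proposed change, under stated assumptions: `RegexURLResolver` and `RegexURLPattern` below are simplified stand-ins rather than Django's actual classes, `Widget` is hypothetical, and the module names in the example urlconf are made up. It shows include() returning a resolver instance and patterns() choosing with isinstance(), with the regex applied after instantiation.

```python
import re


class RegexURLResolver:
    """Simplified stand-in for Django's class; only what the sketch needs."""

    def __init__(self, regex=None, urlconf_name=None):
        self.regex = re.compile(regex) if regex is not None else None
        self.urlconf_name = urlconf_name


class RegexURLPattern:
    """Simplified stand-in for a single regex-to-view entry."""

    def __init__(self, regex, view):
        self.regex = re.compile(regex)
        self.view = view


def include(urlconf_name):
    # Proposed behaviour: return a resolver instead of a one-item list.
    return RegexURLResolver(urlconf_name=urlconf_name)


def patterns(prefix, *tuples):
    entries = []
    for t in tuples:
        regex, target = t[0], t[1]
        if isinstance(target, RegexURLResolver):
            # Any subclass instance is accepted; its regex is set after the fact.
            target.regex = re.compile(regex)
            entries.append(target)
        else:
            if prefix and isinstance(target, str):
                target = prefix + '.' + target
            entries.append(RegexURLPattern(regex, target))
    return entries


class Widget(RegexURLResolver):
    """Hypothetical subclass, instantiated without a regex."""


urlpatterns = patterns(
    '',
    (r'^blog/', include('blog.urls')),      # made-up module name
    (r'^widget/', Widget()),
    (r'^$', 'site.views.index'),            # made-up view name
)
```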

I also did a bit of cursory research on the issue before writing this.
I've noticed requests for more flexible URL resolution, most notably
for dealing with subdomains. The technique I describe would allow a
snippet, say SubDomainURLResolver, to subclass RegexURLResolver and
look up the destination view from urlpatterns in subdomain-specific
files.

Also, there was some recent talk about the desire to route a URL to
different views based on the HTTP method used. This approach could
simplify that process, and may also help with the REST API project,
though I admit I have no research on that to back me up.

And yes, I know there is a problem with those scenarios; I'll get back
to that in a minute.

The newforms-admin connection comes in when I remember back to some
discussion on what ended up being #4516. That ticket proposes a
separate method for handling URL resolution, but with this new
approach, ModelAdmin and AdminSite could become RegexURLResolver
subclasses, allowing a more flexible approach, while also getting rid
of that if/elif/else block, and simultaneously removing the need to
specify (.*)$ in the admin's urlpattern, bringing it more in line with
how the existing admin's urlpattern works. Those last two are more
cosmetic, but they seem, to me at least, to be benefits.

Now for the known problems.

Technically this is backwards-incompatible for one specific case. If
anyone out there is specifying their included urlconfs as a
single-item list, instead of using include(), their code would break.
I expect this is a low number, and should probably be discouraged
anyway, but it's a definite possibility. The only way around it would
be to keep the existing type check, and just add an elif to handle the
RegexURLResolver subclasses. I'm not a fan of doing this, but it's a
simple way to maintain backwards compatibility, so there you have it.

Currently, RegexURLResolver is instantiated with the supplied regex as
one of its arguments. Since this approach would have to allow
instantiating it without the regex, it would have to be applied after
the fact, during processing in patterns(). For my test code, I just
set the attribute manually (both there and in the base handler), but
there might be a better way to do it.

Two of my benefit scenarios would rely on the resolver having access
to more information than it currently gets. In order to dispatch based
on subdomain or HTTP method, the resolve() method would need to
receive the HttpRequest object, not just the path. This should be a
fairly easy change, but probably isn't necessary for most uses. It'd
just be a nice-to-have to make this even more flexible.

I haven't yet considered how reverse lookups would work under this
approach. This is my first delve into URL resolution, so I'm still
fairly naive on exactly how that works. I definitely plan to do more
research on it and see if there are any shortcomings on that front.

I have some basic working code at the moment (it only takes about a
dozen lines total), but it's littered with debugging statements and abandoned
code, so I just wanted to get some feedback before I clean it up and
open a ticket. Sorry for the length of this email, and I appreciate
any feedback.

-Gul

Malcolm Tredinnick

Oct 9, 2007, 3:12:35 PM

to django-d...@googlegroups.com

Hey Marty,

I can cheat a little bit here because I've thought about this a lot in
the past. So this is slightly more than just a shot from the hip,
despite the quick response time. This is all very hypothetical, because
I've only ever done thought experiments here.

You'll want to get feedback from Adrian, too, because I know he's done
some thinking in this area but I'll let him explain his ideas.

On Tue, 2007-10-09 at 14:26 -0400, Marty Alchin wrote:
[...]

> I also did a bit of cursory research on the issue before writing this.
> I've noticed requests for more flexible URL resolution, most notably
> for dealing with subdomains. The technique I describe would allow a
> snippet, say SubDomainURLResolver, to subclass RegexURLResolver and
> look up the destination view from urlpatterns in subdomain-specific
> files.

Most of the variations people want to do that aren't of the "I want a
pony" variety have made me think that RegexURLResolver probably isn't
the base class of the refactoring. Instead, you pull out the pieces that
aren't reg-exp specific into a base class and that is the thing that is
subclassed. I'm not sure what the interface of this base class would
look like at the moment, but that's been my intuitive feeling about what
might have to change here.

This is sort of coupled to the resolution process. We have a string
presented to us (the URL) and we want to return a view and some
parameters -- positional and keyword arguments -- to pass to it. It's
that simple.

At the moment, the resolution process uses RegexURLResolver which is
tightly tied to RegexURLPattern. In idle moments I'd been thinking if we
could twist this around so that each item in the patterns() result has a
resolve() method to which we pass the string. We call them in order,
from first to last, and then stop the first time we get back a non-None
result -- in which case it will be the view and the calling parameters.

So the refactoring is probably closer to working out the sort of
interface a RegexURLPattern class should present (resolve() and
reverse() methods, pretty much) and then making the main loop call each
one in turn.
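
As an illustration only, the loop described above might look roughly like this. Every class and function name here is hypothetical, and the two item types exist just to show that the main loop does not care how an item matches.

```python
import re


class RegexPattern:
    """Illustrative line item: one regex mapped to one view."""

    def __init__(self, regex, view):
        self.regex = re.compile(regex)
        self.view = view

    def resolve(self, path):
        match = self.regex.search(path)
        if match is None:
            return None
        kwargs = match.groupdict()
        # Named groups become keyword arguments; otherwise use positional ones.
        args = () if kwargs else match.groups()
        return self.view, args, kwargs


class PrefixPattern:
    """Illustrative non-regex line item: matches a literal prefix."""

    def __init__(self, prefix, view):
        self.prefix = prefix
        self.view = view

    def resolve(self, path):
        if path.startswith(self.prefix):
            return self.view, (path[len(self.prefix):],), {}
        return None


def resolve(urlpatterns, path):
    # Ask each item in order; the first non-None answer wins.
    for item in urlpatterns:
        result = item.resolve(path)
        if result is not None:
            return result
    return None


def article(request, year):
    return "articles from %s" % year


def static(request, rest):
    return "static file %s" % rest


urlpatterns = [
    RegexPattern(r'^articles/(?P<year>\d{4})/$', article),
    PrefixPattern('static/', static),
]
```

With this shape, supporting a new kind of matching means writing a new item class; the loop itself stays unchanged.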

> Also, there was some recent talk about the desire to route a URL to
> different views based on the HTTP method used. This approach could
> simplify that process, and may also help with the REST API project,
> though I admit I have no research on that to back me up.

This always feels like helping people create an unmaintainable setup --
and to be discouraged for that reason -- but it's a two-sided thing.
Look at what is expected in WSGI environments and take guidance from
there. My thinking is along the lines of "if I use CherryPy's server
instead of Apache -- so I'm in a full WSGI setup -- how can I achieve
the same type of control? Is it even reasonable to think I should be
able to?"

Of course, there are going to be people who feel that everything should
be possible, but it is already possible to do everything Apache does -- by
using Apache! So what's a more balanced separation and what layers does
WSGI expect would handle these? We shouldn't be duplicating features
that already exist in a standards-compliant implementation of upstream
server software.

> The newforms-admin connection comes in when I remember back to some
> discussion on what ended up being #4516. That ticket proposes a
> separate method for handling URL resolution, but with this new
> approach, ModelAdmin and AdminSite could become RegexURLResolver
> subclasses, allowing a more flexible approach, while also getting rid
> of that if/elif/else block, and simultaneously removing the need to
> specify (.*)$ in the admin's urlpattern, bringing it more in line with
> how the existing admin's urlpattern works. Those last two are more
> cosmetic, but they seem, to me at least, to be benefits.

This is the sort of case that made me think the current class isn't the
true point of departure for generalisation.

> Now for the known problems.
>
> Technically this is backwards-incompatible for one specific case. If
> anyone out there is specifying their included urlconfs as a
> single-item list, instead of using include(), their code would break.

They'll learn to live with the disappointment and use include() instead.
I wouldn't worry too much about this case if I were you.

[...]

> I haven't yet considered how reverse lookups would work under this
> approach. This is my first delve into URL resolution, so I'm still
> fairly naive on exactly how that works. I definitely plan to do more
> research on it and see if there are any shortcomings on that front.

That's one of the interfaces that any class has to supply: a reverse()
method, given basically the same information that it is given now. Then
the subclass writer needs to work out how to generate the appropriate
URL.
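
A small sketch of what that could mean in practice, assuming a hypothetical `TemplatePattern` item that knows how to build its own URL from keyword arguments. None of these names come from Django; they only illustrate each item carrying its own reverse() logic.

```python
class TemplatePattern:
    """Hypothetical line item pairing a view with a URL template."""

    def __init__(self, template, view):
        self.template = template
        self.view = view

    def reverse(self, view, **kwargs):
        if view is not self.view:
            return None
        try:
            return self.template % kwargs
        except KeyError:
            # A required parameter is missing, so this item cannot build the URL.
            return None


def reverse(urlpatterns, view, **kwargs):
    # Mirror of the resolving loop: the first item that can answer wins.
    for item in urlpatterns:
        url = item.reverse(view, **kwargs)
        if url is not None:
            return url
    raise LookupError("no URL found for %r" % (view,))


def archive(request, year):
    return "archive for %s" % year


urlpatterns = [TemplatePattern('archive/%(year)s/', archive)]
print(reverse(urlpatterns, archive, year=2007))
```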

Look a bit closer at the separation between reg-exp specific stuff
and the more general interface here. My gut feeling is that it can be
teased apart at a slightly different place to remove the need, for
example, to have a RegexURLResolver without a reg-exp (which is
unfortunate design). At the moment, it feels like you're trying to force
flexibility into RegexURLResolver that twists it beyond its original
purpose and changes it to an entirely different thing. So don't look at
subclassing, look at calling it a different thing and possibly changing
the processing loop to move resolving responsibility more into the
individual line items, as mentioned above. Then the subclassing happens
at the line items -- and each line item can be a different sort of
resolver.

Regards,
Malcolm

Marty Alchin

Oct 9, 2007, 3:46:14 PM

to django-d...@googlegroups.com

On 10/9/07, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
> I can cheat a little bit here because I've thought about this a lot in
> the past. So this is slightly more than just a shot from the hip,
> despite the quick response time. This is all very hypothetical, because
> I've only ever done thought experiments here.

Well, I definitely appreciate the quick response! I figured somebody
had thought about this in the past, which is precisely why I brought
it up prior to submitting any code.

> You'll want to get feedback from Adrian, too, because I know he's done
> some thinking in this area but I'll let him explain his ideas.

The more, the merrier!

> Most of the variations people want to do that aren't of the "I want a
> pony" variety have made me think that RegexURLResolver probably isn't
> the base class of the refactoring. Instead, you pull out the pieces that
> aren't reg-exp specific into a base class and that is the thing that is
> subclassed. I'm not sure what the interface of this base class would
> look like at the moment, but that's been my intuitive feeling about what
> might have to change here.

I had wondered about that myself, but I thought it'd be better to
start with a minimally invasive approach and work from there. I'd
rather see a more comprehensive approach, I just didn't want to
overstep on the first attempt.

> This is sort of coupled to the resolution process. We have a string
> presented to us (the URL) and we want to return a view and some
> parameters -- positional and keyword arguments -- to pass to it. It's
> that simple.

I'm glad to hear it stated that way, because I like that idea.

> At the moment, the resolution process uses RegexURLResolver which is
> tightly tied to RegexURLPattern. In idle moments I'd been thinking if we
> could twist this around so that each item in the patterns() result has a
> resolve() method to which we pass the string. We call them in order,
> from first to last, and then stop the first time we get back a non-None
> result -- in which case it will be the view and the calling parameters.
>
> So the refactoring is probably closer to working out the sort of
> interface a RegexURLPattern class should present (resolve() and
> reverse() methods, pretty much) and then making the main loop call each
> one in turn.

Well, even my minor modifications do exactly that, at least for full
resolvers. It doesn't handle simple patterns any differently than the
current system, so I think I see what you're getting at here. If I'm
understanding correctly, an inheritance scheme might look something
like this:

URLResolver (either an actual class, or just a protocol to be duck-typed)
  - RegexPatternResolver (or whatever), for matching a single regex to
    a single view
  - RegexModuleResolver (or whatever), for matching a set of patterns
    in a module, after matching a specified regex prefix
  - WidgetResolver, for matching class-declared view methods

So that they're essentially all peers as far as the main URL
dispatching is concerned. Some might subclass others, but that would
be solely for sharing code, and wouldn't really have any functional
impact on the system.

Personally, I would vote for having a concrete URLResolver class to
work from, so that the dispatcher can check to see if it's an instance
of that and handle it properly. If it's not, it would be able to
handle it like it currently does, dropping the appropriate classes in
based on the strings provided. This would be solely for backwards
compatibility, of course, but I'd hate to make everybody change their
entire URL configuration just for this.
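
For illustration, a minimal sketch of that isinstance check with a backwards-compatible fallback. `URLResolver`, `RegexPatternResolver`, `normalize()` and `AlwaysIndex` are all hypothetical names taken from or invented for this discussion, not existing Django code.

```python
import re


class URLResolver:
    """Hypothetical common base: anything that can resolve a path."""

    def resolve(self, path):
        raise NotImplementedError


class RegexPatternResolver(URLResolver):
    """One regex mapped to one view."""

    def __init__(self, regex, view):
        self.regex = re.compile(regex)
        self.view = view

    def resolve(self, path):
        match = self.regex.search(path)
        if match is None:
            return None
        return self.view, (), match.groupdict()


def normalize(entry):
    """Accept a resolver as-is; wrap a legacy (regex, view) tuple."""
    if isinstance(entry, URLResolver):
        return entry
    regex, view = entry
    return RegexPatternResolver(regex, view)


def index(request):
    return "index"


class AlwaysIndex(URLResolver):
    """Toy custom resolver that matches every path."""

    def resolve(self, path):
        return index, (), {}


# Old-style tuples and new-style resolver objects can sit side by side.
entries = [normalize(e) for e in [(r'^$', index), AlwaysIndex()]]
```

Existing URL configurations would keep working because plain tuples are wrapped on the way in, while new resolver types pass through untouched.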

The one question I would add on this topic, then, is whether something
like RegexURLResolver would still be tied to RegexURLPattern. Could we
have it so that, after loading the specified module, it would loop
through those, again being agnostic with regard to which type of
resolver is used? I would assume this would not only be possible, but
the optimal case, but I'm no expert here.

> > Also, there was some recent talk about the desire to route a URL to
> > different views based on the HTTP method used. This approach could
> > simplify that process, and may also help with the REST API project,
> > though I admit I have no research on that to back me up.
>
> This always feels like helping people create an unmaintainable setup --
> and to be discouraged for that reason -- but it's a two-sided thing.
> Look at what is expected in WSGI environments and take guidance from
> there. My thinking is along the lines of "if I use CherryPy's server
> instead of Apache -- so I'm in a full WSGI setup -- how can I achieve
> the same type of control? Is it even reasonable to think I should be
> able to?"
>
> Of course, there are going to be people who feel that everything should
> be possible, but it is already possible to do everything Apache does -- by
> using Apache! So what's a more balanced separation and what layers does
> WSGI expect would handle these? We shouldn't be duplicating features
> that already exist in a standards-compliant implementation of upstream
> server software.

I'll admit, I'm not a big fan of anything that could be gained by
passing the request to the resolver. It wouldn't add much complexity
for the common case, but I agree that it would mostly allow people to
unnecessarily complicate things. It's not that difficult to set up
VirtualHosts specifying different settings files, each inheriting from
a common settings file and pointing to a different URLConf.
Likewise, it's not that difficult to use mod_rewrite to add the HTTP
method to the beginning (or end) of the URL, at which point they can
use the existing stuff to dispatch in that manner. I see little reason
to make that even easier.

> > The newforms-admin connection comes in when I remember back to some
> > discussion on what ended up being #4516. That ticket proposes a
> > separate method for handling URL resolution, but with this new
> > approach, ModelAdmin and AdminSite could become RegexURLResolver
> > subclasses, allowing a more flexible approach, while also getting rid
> > of that if/elif/else block, and simultaneously removing the need to
> > specify (.*)$ in the admin's urlpattern, bringing it more in line with
> > how the existing admin's urlpattern works. Those last two are more
> > cosmetic, but they seem, to me at least, to be benefits.
>
> This is the sort of case that made me think the current class isn't the
> true point of departure for generalisation.

Yeah, I agree.

> > Technically this is backwards-incompatible for one specific case. If
> > anyone out there is specifying their included urlconfs as a
> > single-item list, instead of using include(), their code would break.
>
> They'll learn to live with the disappointment and use include() instead.
> I wouldn't worry too much about this case if I were you.

Thank you for that! I really didn't want to worry about it, but I had
to at least mention it.

> > I haven't yet considered how reverse lookups would work under this
> > approach. This is my first delve into URL resolution, so I'm still
> > fairly naive on exactly how that works. I definitely plan to do more
> > research on it and see if there are any shortcomings on that front.
>
> That's one of the interfaces that any class has to supply: a reverse()
> method, given basically the same information that it is given now. Then
> the subclass writer needs to work out how to generate the appropriate
> URL.

I figured it would be along those lines. I just hadn't done enough
looking to be sure.

> Look a bit closer at the separation between reg-exp specific stuff
> and the more general interface here. My gut feeling is that it can be
> teased apart at a slightly different place to remove the need, for
> example, to have a RegexURLResolver without a reg-exp (which is
> unfortunate design). At the moment, it feels like you're trying to force
> flexibility into RegexURLResolver that twists it beyond its original
> purpose and changes it to an entirely different thing. So don't look at
> subclassing, look at calling it a different thing and possibly changing
> the processing loop to move resolving responsibility more into the
> individual line items, as mentioned above. Then the subclassing happens
> at the line items -- and each line item can be a different sort of
> resolver.

I'll definitely do so. For the record, my need would still have
included regex, so it wouldn't be quite so unfortunate, but it would
definitely have opened up the ability to have a mismatch there.

I have a few other things on my plate still, and this spawned from a
test project that I don't really care much about, but I'll try to keep
it under somewhat active development. Anybody else who's interested in
this, feel free to work on it without me. I'd hate to hold it up
because people think I'm the point man.

-Gul

Marty Alchin

Oct 9, 2007, 3:53:30 PM

to django-d...@googlegroups.com

On 10/9/07, Marty Alchin <gulo...@gamemusic.org> wrote:
> I'll admit, I'm not a big fan of anything that could be gained by
> passing the request to the resolver.

I forgot! After [4237] went in, people can specify a custom resolver
based on the request anyway. So once resolvers are a little more
generic, the people who want it shouldn't have any problem writing a
resolver/middleware combination approach. It wouldn't have any impact
on any refactoring we do, since we wouldn't have to do any special
handling for it. We don't have to support it, yet they still get it.
Everybody wins.