Review of DEP 201 - simplified routing syntax

367 views
Skip to first unread message

Aymeric Augustin

unread,
May 12, 2017, 6:33:28 AM5/12/17
to django-d...@googlegroups.com
Hello,

I reviewed the current version of DEP 201 as well as related discussions. I took notes and wrote down arguments along the way. I'm sharing them below. It may be useful to add some of these arguments to the DEP.

Sjoerd, Tom, I didn't want to edit your DEP directly, but if you agree with the items marked [Update DEP] below I can prepare a PR. I will now take a look at the pull requests implementing this DEP.


Should it live as a third-party package first?

The original motivation for this DEP was to make Django easier to use by people who aren't familiar with regexes.

While regexes are a powerful tool, notably for shell scripting, I find it counter-productive to make them a prerequisite for building a Django website. You can build a very nice and useful website with models, forms, templates, and the admin, without ever needing regexes — except, until now, for the URLconf!

Since we aren't going to say in the official tutorial "hey install this third-party package to manage your URLs", that goal can only be met by building the new system into Django.

Besides, I suspect many professional Django developers copy-paste regexes without a deep understanding of how they work. For example, I'd be surprised if everyone knew why it's wrong to match a numerical id in a URL with \d+ (answer at the bottom of this email).

Not only is the new system easier for beginners, but I think it'll also be adopted by experienced developers to reduce the risk of mistakes in URLpatterns, which are an inevitable downside of their syntax. Django can solve problems like avoiding \d+ for everyone.

Anecdote: I keep making hard-to-detect errors in URL regexes. The only URL regexes I wrote that can't be replicated with the new system are very dubious and could easily be replaced with a more explicit `if some_condition(request.path): raise Http404` in the corresponding view. I will be happy to convert my projects to the new system.

No progress was made in this area since 2008 because URL regexes are a minor annoyance. After you write them, you never see them again. I think that explains why no popular alternative emerged until now.

Since there's a significant amount of prior art in other projects, a strong consensus on DEP 201 being a good approach, and a fairly narrow scope, it seems reasonable to design the new system directly into Django.


What alternatives would be possible?

I have some sympathy with the arguments for a pluggable URL resolver system, similar to what I did for templates in Django 1.8. However I don't see this happening any time soon because there's too little motivation to justify the effort. As I explained above, developers tend to live with whatever the framework provides.

Of course, if someone wants to write a fully pluggable URL resolver, that's great! But there's no momentum besides saying that "it should be done that way". Furthermore, adding the new system shouldn't make it more difficult to move to a fully pluggable system. If anything, it will clean up the internals and prepare further work in the area. Some changes of this kind were already committed.

DEP 201 is mostly independent from the problem of allowing multiple views to match the same URL — that is, to resume resolving URL patterns if a view applies some logic and decides it can't handle a URL. This is perhaps the biggest complaint about the current design of the URL resolver. Solutions include hacking around the current design or changing it fundamentally.

This proposal doesn't change anything to the possibility of autogenerating URL structures, like DRF's routers do.

I'm aware of one realistic proposals for a more elaborate and perhaps cleaner system: Marten Kenbeek's dispatcher API refactor at: https://github.com/knbk/django/tree/dispatcher_api. I can't say if the implementation of DEP 201 will break that effort. Anyway we can't wait forever on a branch that has no schedule for completion.

To sum up, I don't see any bigger improvement that is likely to be implemented in the short run and would justify delaying DEP 201.


Which types should be supported?

Integers, slugs, UUIDs and arbitrary strings will cover the vast majority of cases. I haven't seen any other examples in all discussions.

I don't think it's reasonable to include floats in URLs. Reversing a URL could lead to atrocities such as /foo/0.300000000000000004/bar/. This doesn't look good.

I don't think the use case is sufficiently common to add supports for Decimal, which wouldn't suffer from that issue.

So I would remove float and add slug. [Update DEP]

I'm proposing the following regexes:

path = .+
string = [^/]+
int = [0-9]+  # negative integers are uncommon, I think they should be left to a custom converter
slug = [-a-zA-Z0-9_]  # see django.core.validators.slug_re — we must stick to the current definition.
uuid = [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}  # hyphens are mandatory to avoid multiple URLs resolving to the same view


How to handle the transition new API?

The plan to move imports to django.urls to facilitate the transition is a nice hack. Who doesn't like a nice hack? ;-)

It would be nice to specify a target for starting the deprecation of django.conf.urls.url. Deprecate in 3.1, remove in 4.1? [Update DEP]

Bikeshedding: I'm somewhat disturbed by the choice of path_regex instead of regex_path, because the thing is primarily a path, which happens to be implemented with a regex. I would like to suggest re_path, which reuses the name of the re module in Python to make a shorter name. [Update DEP]


How to manage leading and trailing slashes?

The second discussion in the mailing list determined that keeping the current system was fine: all URLs are relative (not starting with a slash) and the root URLconf is implicitly relative to /.

This is inconsistent with Flask, where URLs start with a leading slash, but Flask doesn't have relative URLs or includes.


Other bugs in the DEP

Failure to perform a type conversion against a captured string should result in an Http404 exception being raised.

I believe this should say "result in the pattern not matching". [Update DEP]

I'm not a fan of the CUSTOM_URL_CONVERTERS setting. I think it would be sufficient to document a `register_converter` function that users can call in their URLconf module — so the converter is registered before the URLconf is loaded. [Update DEP]

I think the section about "Preventing unintended errors" should be removed. If it turns out to be a real problem, then we can reconsider and add a check. [Update DEP]


Best regards,

-- 
Aymeric.


Answer to "why it's wrong to match an id in a URL with \d+"

Django's URL resolver matches URLs as str (Unicode strings), not bytes, after percent and UTF-8 decoding. As a consequence \d matches any character in Unicode character category Nd, for example, ١ which is 1 in Arabic (unless you also specified the ASCII flag on the regex).

Interestingly Python's int() function will still do the right thing: int('١') == 1. But that means you have multiple URLs resolving to the same page, which can be bad for SEO. In general it isn't a good practice to have unexpected URLs resolving by accident to an unintended view.

For this reason, I'm always using the more verbose [0-9] instead of \d in my URLconfs.

The Django admin suffers from this bug :-) See it for yourself at /admin/sites/site/١/.


Aymeric Augustin

unread,
May 12, 2017, 8:20:07 AM5/12/17
to django-d...@googlegroups.com
After getting approval from Tom on IRC, I updated the DEP according to my email below: https://github.com/django/deps/pull/41

The next steps are:

- account for any remaining feedback (I feel I made only minor changes compared to the last version, hopefully we can wrap this up quickly now)
- get approval from the technical board
- complete the implementation!

-- 
Aymeric.


Florian Apolloner

unread,
May 12, 2017, 12:43:34 PM5/12/17
to Django developers (Contributions to Django itself)


On Friday, May 12, 2017 at 12:33:28 PM UTC+2, Aymeric Augustin wrote:
Django's URL resolver matches URLs as str (Unicode strings), not bytes, after percent and UTF-8 decoding. As a consequence \d matches any character in Unicode character category Nd, for example, ١ which is 1 in Arabic (unless you also specified the ASCII flag on the regex)

Ha, I was thinking that you might get somewhere along the lines of this. That only became an issue with python 3 right? Before regex defaulted to re.ASCII.
 

Marten Kenbeek

unread,
May 12, 2017, 1:05:45 PM5/12/17
to Django developers (Contributions to Django itself)
That's not quite right. Django has actually been using the `re.UNICODE` flag since at least 1.0, so you'd have the same problem on Python 2. 

Aymeric Augustin

unread,
May 12, 2017, 2:34:56 PM5/12/17
to django-d...@googlegroups.com
On 12 May 2017, at 19:05, Marten Kenbeek <marte...@gmail.com> wrote:

That's not quite right. Django has actually been using the `re.UNICODE` flag since at least 1.0, so you'd have the same problem on Python 2. 


Since it was hardly noticed in a decade, it can't be too much of a problem in practice...

-- 
Aymeric.

Marten Kenbeek

unread,
May 13, 2017, 7:35:37 AM5/13/17
to Django developers (Contributions to Django itself)
The regex in `url()` can be translated using `ugettext_lazy()`, in which case the lazy translation happens when `resolve()` or `reverse()` is called. It seems the current `path()` implementation would force evaluation when the URLs are first loaded, so `resolve()` and `reverse()` would use a fixed language (whichever was active when they were loaded) rather than allowing lazy translations.

I'm not sure how often this feature is used, but it seems like something `path()` should support out of the box. 

On Friday, May 12, 2017 at 12:33:28 PM UTC+2, Aymeric Augustin wrote:

Sjoerd Job Postmus

unread,
May 13, 2017, 8:32:00 AM5/13/17
to Django developers (Contributions to Django itself)


On Saturday, May 13, 2017 at 1:35:37 PM UTC+2, Marten Kenbeek wrote:
The regex in `url()` can be translated using `ugettext_lazy()`, in which case the lazy translation happens when `resolve()` or `reverse()` is called. It seems the current `path()` implementation would force evaluation when the URLs are first loaded, so `resolve()` and `reverse()` would use a fixed language (whichever was active when they were loaded) rather than allowing lazy translations.

I'm not sure how often this feature is used, but it seems like something `path()` should support out of the box. 

I agree. It's an oversight which I think is solvable. The `LocaleRegexProvider` class already handles the caching.

I will check later, but I think all that's needed is to decorate the `path` function with `keep_lazy_str`. Does that seem correct to you?

Sjoerd Job Postmus

unread,
May 15, 2017, 2:14:04 AM5/15/17
to Django developers (Contributions to Django itself)
Thinking about it some more: that is not the solution.

Also, there are probably a couple of corner cases to consider (still: the same as for normal regular expressions):

* What if the "keys" (named parameters) of a translated path differs from that of the original path?
* What if the converters differ?

For now I think the best way forward is to assume the above corner cases do not happen, or otherwise fall out-of-scope.

Aymeric Augustin

unread,
May 21, 2017, 3:56:13 AM5/21/17
to django-d...@googlegroups.com
Hello,


Sjoerd has taken the lead on the implementation, please get in touch if you'd like to help!

Thanks,

-- 
Aymeric.


Chris Foresman

unread,
May 23, 2017, 4:27:50 PM5/23/17
to Django developers (Contributions to Django itself)
Huzzah! Looking forward to the new syntax landing.
Reply all
Reply to author
Forward
0 new messages