Considering removing support for ("iLmsu") regex groups in URLpatterns. Do you use them?


Tim Graham

Dec 16, 2016, 2:19:52 PM
to Django developers (Contributions to Django itself)
Python deprecated usage of flags not at the start of a regular expression [0], e.g. 'CaseInsensitive(?i)' instead of '(?i)CaseInsensitive'.

Deprecation warnings show up in a few URL tests that use (?i) to get case-insensitive matching of URLpatterns. However, because the URL resolver prefixes '^/' [or get_script_prefix()] to all patterns [1], the warning happens even if the regex group is at the start of a urlpattern, e.g.

/home/tim/code/django/django/urls/resolvers.py:464: DeprecationWarning: Flags not at the start of the expression ^\/(?i)CaseInsensiti (truncated)
  if re.search('^%s%s' % (re.escape(_prefix), pattern), candidate_pat % candidate_subs, re.UNICODE):
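
The effect of the prefix can be reproduced outside Django. The following is a minimal sketch, not Django code: compile_outcome is a hypothetical helper, and the prefixed pattern only approximates what the resolver builds. On the Python versions discussed here the prefixed pattern emits a DeprecationWarning; more recent Python releases raise re.error instead, so the helper reports either outcome.

```python
import re
import warnings


def compile_outcome(pattern):
    """Return 'ok', 'warning', or 'error' for compiling `pattern`.

    Hypothetical helper for illustration, not part of Django.
    """
    re.purge()  # clear re's cache so a repeated pattern is recompiled
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            re.compile(pattern)
        except re.error:
            return "error"
    if any(issubclass(w.category, DeprecationWarning) for w in caught):
        return "warning"
    return "ok"


user_pattern = r"(?i)CaseInsensitive"
# Roughly what the resolver ends up compiling: '^' + escaped prefix + pattern.
prefixed = "^" + re.escape("/") + user_pattern

print(compile_outcome(user_pattern))
print(compile_outcome(prefixed))
```

The first call reports that the pattern is fine on its own; the second shows that the same pattern is affected once something is placed in front of the flag group.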

A better sense of what's affected can be seen on my PR that removes support for the ignored groups [2]. All this landed in 2008 in Malcolm's rewrite of URL parsing [3].

I'm not sure if any of these groups are used in URLpatterns in the wild or if it's okay to proceed with the removal. To keep the feature, I imagine Django would need to do some extraction of flags from URLpatterns and put them at the start of patterns, but I'm not too sure.
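
As a rough illustration of the "extract the flags and put them at the start" idea, here is a minimal sketch. hoist_flags is a hypothetical name, and the matching is deliberately naive: it does not understand character classes or an escaped backslash directly before a group, so it should be read as a starting point rather than a finished implementation.

```python
import re

# Matches a global inline flag group such as (?i) or (?iu), unless the opening
# parenthesis is preceded by a backslash. Naive on purpose (see note above).
_FLAG_GROUP = re.compile(r"(?<!\\)\(\?([iLmsu]+)\)")


def hoist_flags(pattern):
    """Move inline flag groups to the start of `pattern` (hypothetical helper)."""
    flags = []
    for found in _FLAG_GROUP.findall(pattern):
        for letter in found:
            if letter not in flags:
                flags.append(letter)
    stripped = _FLAG_GROUP.sub("", pattern)
    if not flags:
        return stripped
    return "(?%s)%s" % ("".join(flags), stripped)


hoisted = hoist_flags(r"^/(?i)CaseInsensitive/$")
print(hoisted)  # (?i)^/CaseInsensitive/$
```

Note that hoisting makes the flag apply to the prefix as well, which is already how a global flag behaves wherever it appears in the pattern.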

Thanks for your feedback.

[0] http://bugs.python.org/issue22493
[1] https://github.com/django/django/blob/5d28fef8f9329e440ee67cefc900dbf89f4c524c/django/urls/resolvers.py#L464
[2] https://github.com/django/django/pull/7701
[3] https://github.com/django/django/commit/a63a83e5d88cd1696d1c40e89f254f69116c6800

Adam Johnson

Dec 18, 2016, 4:18:33 PM
to django-d...@googlegroups.com
Since they were used in several places in Django's test suite I feel like it's highly likely they're out there in use in the wild.

Also if Django were to remove it, it would both 1) be a bit surprising compared to the re module, as it's an implementation detail that the urlparser prefixes '^/' and 2) need a deprecation path that would require code to work in Python 3.6 that's equivalent to implementing the feature, so why not keep said code around?

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscribe@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/a1489dba-c602-4df4-87c4-3edf2ada702e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Adam

Tim Graham

Dec 18, 2016, 6:44:32 PM
to Django developers (Contributions to Django itself)
Only case-insensitive matching was tested in the URL tests, and none of this is documented. That's the only flag where I see a straightforward use case (but I avoid regexes of any complexity and didn't even know what the flags were for until I just looked them up), even if case-insensitive URLs aren't a good practice. From the W3C [0]: "URLs in general are case-sensitive (with the exception of machine names). There may be URLs, or parts of URLs, where case doesn't matter, but identifying these may not be easy. Users should always consider that URLs are case-sensitive."

I guess the "safest" option is to keep the code around and let this feature die when the deprecation ends in Python 3.7 (and meanwhile see if anyone notices the deprecation warning in their code and files a ticket about it). The only extra work compared to removing this now is silencing the Python deprecation warnings in the Django tests in the meantime.

[0] https://www.w3.org/TR/WD-html40-970708/htmlweb.html

Here are the other flag values, feel free to offer a use case if you have one:

re.I -- IGNORECASE -- Perform case-insensitive matching.

re.L -- LOCALE -- The use of this flag is discouraged as the locale mechanism is very unreliable, and it only handles one “culture” at a time anyway; you should use Unicode matching instead, which is the default in Python 3 for Unicode (str) patterns. This flag makes sense only with bytes patterns.

re.M -- MULTILINE -- When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.

re.S -- DOTALL -- Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.

re.U -- UNICODE -- Redundant in Python 3, since matching is Unicode by default for str patterns.
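
For a quick feel of what the three flags that still matter actually change, here is a small sketch with made-up strings. It only uses the standard re module; the sample paths are illustrative and not taken from Django's tests.

```python
import re

# re.I: letters match regardless of case.
assert re.match(r"home/$", "HOME/") is None
assert re.match(r"home/$", "HOME/", re.I)

# re.M: '^' also matches right after each newline.
assert re.search(r"^b", "a\nb") is None
assert re.search(r"^b", "a\nb", re.M)

# re.S: '.' also matches a newline.
assert re.match(r"a.b", "a\nb") is None
assert re.match(r"a.b", "a\nb", re.S)

# An inline group at the very start behaves like passing the flag.
assert re.match(r"(?i)home/$", "HOME/")
```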



Adam Johnson

Dec 19, 2016, 3:23:38 AM
to django-d...@googlegroups.com
> I guess the "safest" option is to keep the code around and let this feature die when the deprecation ends in Python 3.7 (and meanwhile see if anyone notices the deprecation warning in their code and files a ticket about it). The only extra work compared to removing this now is silencing the Python deprecation warnings in the Django tests in the meantime.

Sounds fair. Flags could always be added to url() as an extra parameter.




--
Adam

Sjoerd Job Postmus

Dec 20, 2016, 2:16:23 AM
to django-d...@googlegroups.com
On Mon, Dec 19, 2016 at 08:23:09AM +0000, Adam Johnson wrote:
> >
> > I guess the "safest" option is to keep the code around and let this
> > feature die when the deprecation ends in Python 3.7 (and meanwhile see if
> > anyone notices the deprecation warning in their code and files a ticket
> > about it). The only extra work compared to removing this now is silencing
> > the Python deprecation warnings in the Django tests in the meantime.
>
>
> Sounds fair. Flags could always be added to *url()* as an extra parameter.

How would that work with nested URL patterns? Would adding a flag also
apply that for the "parent" and/or "child" patterns? Would that also
work correctly for reverse?

Asking because I seriously don't know.

Adam Johnson

Dec 20, 2016, 5:21:46 AM
to django-d...@googlegroups.com
I think the current implementation means they affect all included children.




--
Adam

Shai Berger

Dec 20, 2016, 11:50:39 AM
to django-d...@googlegroups.com
I think part of Sjoerd's point was that the current implementation also means that
including the flag in a child affects parents -- but only with regard to said
child. So, if you have

url('a/', include("b"))

and in b:

url('b/$', blah),
url('c/$', bleh, flags=re.I),

then the valid urls include

a/c/
A/c/
a/C/
A/C/
a/b/

but not

A/b/

which is a bit odd.
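
The list above can be checked with plain re. This is a sketch under two assumptions: that the hypothetical flags= argument ends up applied to the whole combined pattern (parent prefix plus child), and that the combined patterns look like the hand-written ones below rather than anything Django actually generates.

```python
import re

# Combined patterns after joining the parent 'a/' with each child; the flags
# come from the child only (hypothetical flags= argument).
combined = [
    re.compile(r"^a/b/$"),        # url('b/$', blah)
    re.compile(r"^a/c/$", re.I),  # url('c/$', bleh, flags=re.I)
]


def resolves(path):
    """True if any combined pattern matches `path`."""
    return any(p.search(path) for p in combined)


for path in ["a/c/", "A/c/", "a/C/", "A/C/", "a/b/", "A/b/"]:
    print(path, resolves(path))
```

Only A/b/ fails, because the parent segment is case-insensitive solely when it is combined with the flagged child.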


Marten Kenbeek

Dec 20, 2016, 1:01:44 PM
to Django developers (Contributions to Django itself)
This issue is actually limited to reverse(). When resolving urls, each nested regex is matched against the url separately, so you can just apply the flags to the "local" regex pattern, and get:

a/c/
a/C/
a/b/

matching, but not:

A/c/
A/C/
A/b/
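
A simplified stand-in for level-by-level resolving shows where those two lists come from. resolve_nested is a hypothetical function, not Django's resolver: each level matches at the start of what is left of the path, with its own flags, and passes the remainder on.

```python
import re


def resolve_nested(path, levels):
    """Match `path` against a chain of (pattern, flags) pairs, level by level.

    Each level consumes the part it matched and hands the rest to the next
    level. A simplified sketch, not Django's code.
    """
    rest = path
    for pattern, flags in levels:
        match = re.match(pattern, rest, flags)
        if match is None:
            return False
        rest = rest[match.end():]
    return rest == ""


parent = (r"a/", 0)        # url('a/', include("b"))
child_b = (r"b/$", 0)      # url('b/$', blah)
child_c = (r"c/$", re.I)   # url('c/$', bleh, flags=re.I), flag kept local


def resolves(path):
    """True if the path resolves through the parent to either child."""
    return (resolve_nested(path, [parent, child_b])
            or resolve_nested(path, [parent, child_c]))


for path in ["a/c/", "a/C/", "a/b/", "A/c/", "A/C/", "A/b/"]:
    print(path, resolves(path))
```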

The behaviour for reverse() can be a problem. For example, consider the following patterns:

url(r'([A-Z]+)/', include('b'))

and in b:

url(r'([A-Z]+)/$', blah, name='blah', flags=re.I)

Since the flag can only apply to the combined pattern r'([A-Z]+)/([A-Z]+)/$', we can either incorrectly reject reverse('blah', args=('A', 'b')), or we can return an invalid url for reverse('blah', args=('a', 'b')).

This seems to be a problem with the current implementation as well, which would return an invalid url for reverse('blah', args=('a', 'b')), since (?i) applies to the whole pattern regardless of its position. 
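
Both failure modes can be shown with plain re. The sketch assumes that reverse() validates a candidate URL against the combined pattern, as the re.search line quoted in the first message suggests; the candidate strings are written by hand here instead of being produced by reverse().

```python
import re

parent = r"([A-Z]+)/"    # url(r'([A-Z]+)/', include('b'))
child = r"([A-Z]+)/$"    # url(r'([A-Z]+)/$', blah, name='blah', flags=re.I)
combined = "^" + parent + child


def resolver_accepts(url):
    """Level-by-level check: parent without flags, child with re.I."""
    head = re.match(parent, url)
    if head is None:
        return False
    return re.match(child, url[head.end():], re.I) is not None


# Option 1: apply re.I to the whole combined pattern.
# args=('a', 'b') passes the check, but the resulting URL does not resolve.
flagged_accepts = re.search(combined, "a/b/", re.I) is not None
print(flagged_accepts, resolver_accepts("a/b/"))  # True False

# Option 2: apply no flag to the combined pattern.
# args=('A', 'b') is rejected, although the resulting URL would resolve.
unflagged_accepts = re.search(combined, "A/b/") is not None
print(unflagged_accepts, resolver_accepts("A/b/"))  # False True
```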

re.I is probably the only flag we need to care about. re.U is always used, use of re.L is discouraged, and re.M/re.S only apply to multi-line strings. 

Tim Graham

Dec 21, 2016, 10:21:04 AM
to Django developers (Contributions to Django itself)
I learned a couple more things:
- The end of the Python deprecation is TBA (not in 3.7 as I stated before), perhaps not until Python 2.7 is unsupported (4 more years).
- Using (?i) in urlpatterns is promoted at http://stackoverflow.com/questions/1515634/case-insensitive-urls-for-django

If case-insensitive URL matching is going to become a more formal feature, maybe it makes sense to either 1) have a global "setting" that makes all patterns case-insensitive or 2) restrict the case-insensitive designation to urlpatterns in the ROOT_URLCONF (to avoid the ambiguous nesting problem that Marten mentioned).

Alternatively, we could decide that it's not Django's job to provide this (not sure if other frameworks do -- personally I don't see case-insensitive URLs as a best practice) and instead suggest configuring your front-end webserver (e.g. Apache mod_rewrite) to handle it such as by converting URLs to lowercase before passing them to Django. A Django middleware might also be able to do this, idea from [2].
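
To make the "lowercase before it reaches the URLconf" idea concrete, here is a minimal sketch written as plain WSGI middleware rather than a Django middleware class, so it runs without Django. lowercase_path_middleware and echo_app are hypothetical names. Lowercasing every path is a blunt tool: it would break any URL that carries case-sensitive data such as tokens or slugs.

```python
def lowercase_path_middleware(app):
    """Wrap a WSGI `app` so it only ever sees lowercase paths (sketch)."""

    def wrapped(environ, start_response):
        environ = dict(environ)  # leave the caller's environ untouched
        environ["PATH_INFO"] = environ.get("PATH_INFO", "").lower()
        return app(environ, start_response)

    return wrapped


def echo_app(environ, start_response):
    """Toy WSGI app that returns the path it was asked for."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["PATH_INFO"].encode("utf-8")]


app = lowercase_path_middleware(echo_app)
body = app({"PATH_INFO": "/Home/About/"}, lambda status, headers: None)
print(body)  # [b'/home/about/']
```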

related links:
https://groups.google.com/d/topic/django-users/X0MkPp-_R-Q/discussion "I would like to make urls of my site case-insensitive."
http://stackoverflow.com/questions/14814419/how-do-i-make-urls-case-insensitive-in-linux-server - "How do I make URLs case insensitive in Linux server"
[2] https://groups.google.com/d/topic/django-developers/UehV6WZhJTM/discussion - Case-insensitive URLS

Tim Graham

Dec 23, 2016, 10:22:40 AM
to Django developers (Contributions to Django itself)
I found a Flask thread [0] where Armin Ronacher said this about case-insensitive URLs:

"I think it's a horrible idea. It destroys your caching and creates multiple entries for search engines."

Adam Oakman added: "Agreed. My personal method is to code the 404 handler to look for uppercase characters and redirect to a lowercase equivalent to catch such cases."

I propose to deprecate usage of these regex groups and offer no replacement besides this advice.
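
The core of that advice fits in a few lines. This is a sketch only: lowercase_redirect_target is a hypothetical helper that a project's own 404 handler could call, and it is not tied to any Django API.

```python
def lowercase_redirect_target(path):
    """Return the lowercase form of `path` if it differs, else None."""
    lowered = path.lower()
    return lowered if lowered != path else None


print(lowercase_redirect_target("/Home/About/"))  # /home/about/
print(lowercase_redirect_target("/home/about/"))  # None
```

A 404 handler would redirect when the helper returns a path and fall through to the normal "not found" response when it returns None, so canonical lowercase URLs never trigger a redirect.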

[0] http://librelist.com/browser/flask/2011/6/24/case-insensitive-routing/#ea14dfeac18a440b8a792358691f15e0

Tim Graham

Dec 27, 2016, 1:19:54 PM
to Django developers (Contributions to Django itself)