For example, regexp like that
(https://docs.djangoproject.com/en/dev/topics/http/urls/#example):
{{{
url(r'^articles/(\d{4})/$', 'news.views.year_archive'),
}}}
will accept "articles/११११/". This is missleading for developer. It could
also be harmful. If developer will use string parsed like that in other
way than parsing to int(), it could lead to error or even security issue.
--
Ticket URL: <https://code.djangoproject.com/ticket/22378>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0
Comment:
Could you elaborate a bit on the problem and what change you think we
should make? When I try your example with something like `articles/१/`,
the value passed to the view is "1". If there are security implications
you should report them privately per
https://docs.djangoproject.com/en/1.6/internals/security/#reporting-
security-issues. Thanks.
--
Ticket URL: <https://code.djangoproject.com/ticket/22378#comment:1>
Comment (by aaugustin):
I think the suggestion is simply to use `[0-9]` which is unambiguous even
in an unicode world.
--
Ticket URL: <https://code.djangoproject.com/ticket/22378#comment:2>
Comment (by tomwys):
Yes, using `[0-9]` is fine. It would be great to change all `\d` to
`[0-9]` in documentation and tests related to urls.
ie. developer without knowledge about `\d` and `re.UNICODE` used in urls
can expect that following code ensures that digit is written to
`/tmp/otuput`.
{{{
from django.conf.urls import patterns, include, url
from django.contrib import admin
from django.http import HttpResponse
def view(request, arg):
open("/tmp/output", 'w').write(arg.encode('utf-8'))
return HttpResponse(arg)
urlpatterns = patterns('',
url(r'^(\d)/$', view),
)
}}}
This is not true for `localhost:8000/१/` (tested on cPython 2.7.5 and
Django master). If developer assume that file (or other storage like
radius, cache etc.) contains digit, but this is not true, it could cause
errors (or even security issues in extreme cases) in the future.
Also, even if it is passed to `int(...)` and converted to integer that
way, it creates multiple valid urls for single resource, without developer
knowing about it. This is minor, but unwanted behavior.
(and it is not easy to discover, most of Django developers are puzzled
when I talk with them about this problem
--
Ticket URL: <https://code.djangoproject.com/ticket/22378#comment:3>
* type: Uncategorized => Cleanup/optimization
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/22378#comment:4>
Comment (by mjtamlyn):
A related, but more complex issue is the matter of `\w` with `re.UNICODE`.
I can see that some projects may wish to accept characters like `è` in
URLs, others may be unwittingly doing so. I believe we had this discussion
recently regarding username validation.
I think a specific note about `\w` and `\d` in the URL documentation would
be advisable. In particular, the "correct" regex for capturing a slug
field should also be mentioned.
This also relates to some of my comments on
https://code.djangoproject.com/ticket/22218#comment:2 regarding providing
a non-regex based approach to URL resolving in core where novice
developers are not required to think about these problems.
--
Ticket URL: <https://code.djangoproject.com/ticket/22378#comment:5>
Comment (by tomwys):
Hurl approach is interesting, but it is not resolving issues with
re.UNICODE (but could after some modifications).
Also `[0-9]+` is not solving all problems, because you can add leading
zeros "0001". Correct regex would be `([1-9][0-9]*)` or `([1-9][0-9]*)|0`
if you want to support `0`. But no one would like to use regexps like that
in all of their urls.
Other approach would be to introduce more advanced parsing mechanism that
not only validates data, but also converts it to objects and back to
strings (something like to_python in models). There are several examples
of such parsers that would make sense:
* integer
* float
* decimal
* date
* time
* datetime
Also some of users may want to add more advanced custom parsers like:
* date range (where first date must be earlier)
* integer range (same as above)
* GPS coordinate
* list of slugs (ie. for tags)
Advantages of this approach:
* more robust - it is guaranteed to receive assumed value without
complicated regexps,
* more elastic - you can write your views and templates without assuming
in what format date will be stored in url,
* removes from views repetitive tasks like parsing date from url.
Example syntax:
{{{
url(r'^articles/(\d{4})/$', 'news.views.year_archive'),
# becomes:
url([r'^articles/', year, '/$'], 'news.views.year_archive')),
url(r'^(?P<question_id>\d+)/$', views.detail, name='detail'),
# becomes:
url(r'^', integer('question_id'), r'^/$'], views.detail, name='detail'),
url(r'^weeklyreports/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d+)/$',
'weeklyreports', name='weeklyreports'),
# becomes:
url([r'^weeklyreports/', date('report_date, 'YYYY/MM/DD'), r'/$'],
'weeklyreports', name='weeklyreports'),
}}}
This could be combined with Hurl approach.
--
Ticket URL: <https://code.djangoproject.com/ticket/22378#comment:6>
* owner: nobody => chriscauley
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/22378#comment:7>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"66ec9ee441618894c1ccebdcdd5eb4d7fbf4a6d3"]:
{{{
#!CommitTicketReference repository=""
revision="66ec9ee441618894c1ccebdcdd5eb4d7fbf4a6d3"
Fixed #22378 -- Updated \d to [0-9]+ in urlpatterns of docs and tests.
Thanks tomwys for the suggestion.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/22378#comment:8>