It's a bit sloppy to use a protocol-level message for an application-level
requirement, in other words, to reply to a well-formed HTTP request with
400 Bad Request. [http://tools.ietf.org/html/rfc3987#section-3.2 Section
3.2 of RFC 3987] proposes a solution to this problem. Basically, non-ASCII
bytes that do not form a valid UTF-8 sequence should remain URL-encoded.
This may not be trivial to implement, but it provides better error handling
and it's standardized.
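
To make the idea concrete, here is a minimal Python 3 sketch of the conversion. It is not Django's implementation: `uri_to_iri_sketch` and its helpers are hypothetical names, and the step of RFC 3987 section 3.2 that re-escapes characters unsuitable for IRIs (bidi controls and the like) is left out. It only unescapes unreserved ASCII and non-ASCII octets, then re-escapes any octet that is not part of a valid UTF-8 sequence.

```python
import re

# Unreserved ASCII characters (RFC 3986). Only these ASCII octets get unescaped.
_UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
_PERCENT = re.compile(rb"%([0-9A-Fa-f]{2})")


def _unescape(match):
    octet = int(match.group(1), 16)
    if octet >= 0x80 or octet in _UNRESERVED:
        return bytes([octet])
    # "%", reserved characters and other ASCII stay percent-encoded.
    return match.group(0)


def uri_to_iri_sketch(uri):
    """Hypothetical helper: a simplified take on RFC 3987 section 3.2."""
    raw = _PERCENT.sub(_unescape, uri.encode("ascii"))
    out = []
    pos = 0
    while pos < len(raw):
        try:
            out.append(raw[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are offsets into raw[pos:].
            out.append(raw[pos:pos + exc.start].decode("utf-8"))
            bad = raw[pos + exc.start:pos + exc.end]
            # Put the offending octets back in percent-encoded form.
            out.append("".join("%%%02X" % octet for octet in bad))
            pos += exc.end
    return "".join(out)
```

With this sketch, `/caf%C3%A9` becomes `/café`, while `/test/~%A9` comes back unchanged because `0xA9` on its own is not valid UTF-8.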
With this change, nonexistent but well-formed URLs will return a 404
instead of a 400. That's what people expect, as shown in the comments of
#5738 and #16541.
Django builds URLs according to
[http://tools.ietf.org/html/rfc3987#section-3.1 section 3.1 of RFC 3987].
With this change, URLs will round-trip cleanly through reversing and
resolving (that's one of the guarantees of RFC 3987) and Django will be
able to deal with legacy, non-UTF-8 URLs. I pursued these goals in #19468
with a more primitive technique (depending only on the encoding) and that
didn't work out.
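
As a rough illustration of the round trip for the well-formed case, the following Python 3 sketch uses only the standard library. `path_to_uri` is a hypothetical helper, not Django's URL-building code, and it assumes the input is a plain path with no query string or fragment.

```python
from urllib.parse import quote, unquote


def path_to_uri(path):
    # quote() encodes str input as UTF-8 before percent-encoding it;
    # "/" is kept as is so the path structure survives.
    return quote(path, safe="/")


iri_path = "/café/"
uri_path = path_to_uri(iri_path)   # percent-encoded, ASCII only
restored = unquote(uri_path)       # decodes the escapes as UTF-8 again
print(uri_path, restored == iri_path)
```

This only covers valid UTF-8. Legacy escapes such as `%A9` are exactly the case where a plain `unquote()` is not enough.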
--
Ticket URL: <https://code.djangoproject.com/ticket/19508>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:1>
Comment (by anubhav9042):
I am thinking of working on this.
I have some idea of what to do from visiting the links provided in the
summary.
Can anyone give me some ideas to begin with?
--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:2>
* cc: anubhav9042@… (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:3>
* owner: nobody => anubhav9042
* status: new => assigned
Comment:
I will be working on this in my GSoC project.
--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:4>
Comment (by anubhav9042):
Loic and I had a discussion on this today.
A few things worth mentioning:
- With URLs like `/test/~%A9`, we get a `400` when the project is
deployed, because `WSGIHandler` falls back to `400` in case of
`UnicodeDecodeError`.
- In development we get a `500`, because `StaticFilesHandler` is used and it
does not have that fallback.
- It cannot be reproduced in tests, because the `ClientHandler` again
raises `UnicodeDecodeError` in `get_path_info()`.
- The problem arises in
[https://github.com/django/django/blob/master/django/core/handlers/wsgi.py#L210
get_path_info()], where we decode the URL as UTF-8.
- When we pass a URL, it goes through `unquote()` in `urllib`, which
converts all percent-encodings, even those that should remain URL-encoded.
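
The decoding failure described above can be reproduced outside Django. This is a small sketch using Python 3's `urllib.parse` rather than the handler code itself; the variable names are only for illustration.

```python
from urllib.parse import unquote_to_bytes

path = "/test/~%A9"
# Every percent-encoding is expanded, including the lone 0xA9 octet.
raw = unquote_to_bytes(path)

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xA9 is a UTF-8 continuation byte and cannot start a sequence.
    failure = exc
else:
    failure = None

print(repr(raw), failure)
```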
I am working on this now and will report again soon.
--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:5>
* has_patch: 0 => 1
Comment:
https://github.com/django/django/pull/2919
--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:6>
Comment (by timgraham):
[https://github.com/django/django/pull/2932 Alternate PR] from Anubhav.
--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:7>
* owner: anubhav9042 => loic
Comment:
This ticket was recently mentioned in a ML thread:
https://groups.google.com/d/topic/django-developers/mS9-HXI4ljw/discussion
I amended the patch from Anubhav to make `uri_to_iri()` return unicode
rather than UTF-8 (Option 2 in PR comment
https://github.com/django/django/pull/2932/files#r15440287). This is
consistent with Werkzeug's implementation (refs
http://werkzeug.pocoo.org/docs/0.9/utils/#werkzeug.urls.uri_to_iri).
I made a new PR (https://github.com/django/django/pull/3350); feedback is
welcome.
--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:8>
Comment (by loic):
I reworked the implementation of the "repercent" step. Could anyone
experienced with unicode double-check it?
I still want to review our usage of the various quote/unquote functions
and their respective quirks in terms of input/return values (and their
discrepancies between PY2 and PY3).
--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:9>
* stage: Accepted => Ready for checkin
Comment:
Looks good, thanks!
--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:10>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"10b17a22bec2eaf44c3315614aea87c127caee46"]:
{{{
#!CommitTicketReference repository=""
revision="10b17a22bec2eaf44c3315614aea87c127caee46"
Fixed #19508 -- Implemented uri_to_iri as per RFC.
Thanks Loic Bistuer for helping in shaping the patch and Claude Paroz
for the review.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:11>