[Django] #19508: Implement URL decoding according to RFC 3987


Django

Dec 22, 2012, 6:13:04 AM
to django-...@googlegroups.com
#19508: Implement URL decoding according to RFC 3987
------------------------------------------------+------------------------
Reporter: aaugustin | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
------------------------------------------------+------------------------
Since #5738, when Django fails to decode a URL because it isn't valid
UTF-8, it returns an HTTP 400 error with no content.

It's a bit sloppy to use a protocol-level message for an application-level
requirement, in other words, to reply to a well-formed HTTP request with
400 Bad Request. [http://tools.ietf.org/html/rfc3987#section-3.2 Section
3.2 of RFC 3987] proposes a solution to this problem. Basically, non-ASCII
bytes that do not form a valid UTF-8 sequence should remain URL-encoded.
This may not be trivial to implement, but it provides better error handling
and it's standardized.
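
A minimal sketch of the section 3.2 idea, assuming Python 3 and covering only
the percent-decoding and re-encoding steps (the RFC's extra step for
characters such as bidi formatting marks is skipped). The names
`uri_to_iri_sketch`, `_unescape` and `_repercent` are made up for
illustration and are not Django code:

```python
import codecs
import re

# Unreserved ASCII characters (RFC 3986 section 2.3). Escapes for these,
# and for bytes >= 0x80, are decoded; everything else stays escaped.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
_PERCENT_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def _unescape(match):
    byte = int(match.group(1), 16)
    if byte >= 0x80 or byte in UNRESERVED:
        return bytes([byte])
    # Keep "%", reserved characters and disallowed ASCII percent-encoded.
    return match.group(0)


def _repercent(exc):
    # Decode error handler: put back the escape for each undecodable byte.
    bad = exc.object[exc.start:exc.end]
    return "".join("%%%02X" % b for b in bad), exc.end


codecs.register_error("repercent", _repercent)


def uri_to_iri_sketch(uri):
    raw = uri.encode("ascii")
    raw = _PERCENT_ESCAPE.sub(_unescape, raw)
    # Bytes that do not form valid UTF-8 end up URL-encoded again.
    return raw.decode("utf-8", errors="repercent")
```

With this sketch, `/caf%C3%A9` becomes `/café`, while `/test/~%A9` comes back
unchanged because `0xA9` alone is not valid UTF-8.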

With this change, well-formed URLs that don't match anything will return a
404 instead of a 400. That's what people expect, as shown in the comments
of #5738 and #16541.

Django builds URLs according to
[http://tools.ietf.org/html/rfc3987#section-3.1 section 3.1 of RFC 3987].
With this change, URLs will round trip cleanly through reversing and
resolving (that's one of the guarantees of RFC 3987) and Django will be
able to deal with legacy, non-UTF-8 URLs. I pursued these goals in #19468
with a more primitive technique (depending only on the encoding) and that
didn't work out.
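
For reference, the encoding direction (section 3.1) can be sketched with the
standard library. This assumes Python 3; the function name
`iri_to_uri_sketch` and the exact list of safe characters are assumptions
made for this example, not a statement of what Django ships:

```python
from urllib.parse import quote

# Characters left untouched. "%" is included so that existing escapes are
# not double-encoded; the exact list is an assumption for this sketch.
SAFE = "/#%[]=:;$&()+,!?*@'~"


def iri_to_uri_sketch(iri):
    # Non-ASCII characters are encoded as UTF-8, then percent-encoded.
    return quote(iri, safe=SAFE)
```

Because `%` is treated as safe, an escape such as `%A9` survives encoding, so
a legacy URL left encoded by the decoding step is not altered on the way
back.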

--
Ticket URL: <https://code.djangoproject.com/ticket/19508>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Dec 22, 2012, 6:22:28 AM
to django-...@googlegroups.com
#19508: Implement URL decoding according to RFC 3987
--------------------------------------+------------------------------------
Reporter: aaugustin | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by claudep):

* stage: Unreviewed => Accepted


--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:1>

Django

Mar 2, 2014, 5:54:07 AM
to django-...@googlegroups.com
#19508: Implement URL decoding according to RFC 3987
--------------------------------------+------------------------------------
Reporter: aaugustin | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by anubhav9042):

I am thinking of working on this.
I have some idea of what to do from reading the links provided in the
summary.

Can anyone give me some pointers on where to begin?

--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:2>

Django

Mar 2, 2014, 6:13:50 AM
to django-...@googlegroups.com
#19508: Implement URL decoding according to RFC 3987
--------------------------------------+------------------------------------
Reporter: aaugustin | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by anubhav9042):

* cc: anubhav9042@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:3>

Django

May 18, 2014, 2:17:01 AM
to django-...@googlegroups.com
#19508: Implement URL decoding according to RFC 3987
-------------------------------------+-------------------------------------
Reporter: aaugustin | Owner: anubhav9042
Type: Cleanup/optimization | Status: assigned
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by anubhav9042):

* owner: nobody => anubhav9042
* status: new => assigned


Comment:

I will be working on this as part of my GSoC project.

--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:4>

Django

Jul 14, 2014, 6:45:08 AM
to django-...@googlegroups.com
#19508: Implement URL decoding according to RFC 3987
-------------------------------------+-------------------------------------
Reporter: aaugustin | Owner: anubhav9042
Type: Cleanup/optimization | Status: assigned
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by anubhav9042):

Loic and I discussed this today.
A few things worth mentioning:

- With URLs like `/test/~%A9`, we get a `400` when the project is
deployed, because `WSGIHandler` falls back to `400` on
`UnicodeDecodeError`.
- In development we get a `500`, because `StaticFilesHandler` is used and it
does not have that fallback.
- It cannot be reproduced in tests because the `ClientHandler` again
raises `UnicodeDecodeError` in `get_path_info()`.
- The problem arises in
[https://github.com/django/django/blob/master/django/core/handlers/wsgi.py#L210
get_path_info()], where we decode the URL as UTF-8.
- When we pass a URL, it goes through `unquote()` from `urllib`, which
converts all percent-encodings, even those that should remain
URL-encoded.
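
The last two points can be reproduced outside Django with the standard
library. This is only an illustration, assuming Python 3's `urllib.parse`
(the ticket predates that setup); `decodes_as_utf8` is a helper name
invented for this example:

```python
from urllib.parse import unquote_to_bytes


def decodes_as_utf8(quoted_path):
    # Unquote every percent-escape, then attempt a strict UTF-8 decode.
    raw = unquote_to_bytes(quoted_path)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0xA9 on its own is not a valid UTF-8 sequence, so strict decoding fails.
bad_path_ok = decodes_as_utf8("/test/~%A9")

# Every escape is converted, including an encoded "/" that should stay encoded.
over_decoded = unquote_to_bytes("/a%2Fb")
```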

I am working on this now and will report back soon.

--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:5>

Django

Jul 14, 2014, 4:40:46 PM
to django-...@googlegroups.com
#19508: Implement URL decoding according to RFC 3987
-------------------------------------+-------------------------------------
Reporter: aaugustin | Owner: anubhav9042
Type: Cleanup/optimization | Status: assigned
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by anubhav9042):

* has_patch: 0 => 1


Comment:

https://github.com/django/django/pull/2919

--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:6>

Django

Aug 14, 2014, 1:50:26 PM
to django-...@googlegroups.com
#19508: Implement URL decoding according to RFC 3987
-------------------------------------+-------------------------------------
Reporter: aaugustin | Owner: anubhav9042
Type: Cleanup/optimization | Status: assigned
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by timgraham):

[https://github.com/django/django/pull/2932 Alternate PR] from Anubhav.

--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:7>

Django

Oct 13, 2014, 3:50:44 AM
to django-...@googlegroups.com
#19508: Implement URL decoding according to RFC 3987
--------------------------------------+------------------------------------
Reporter: aaugustin | Owner: loic
Type: Cleanup/optimization | Status: assigned
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by loic):

* owner: anubhav9042 => loic


Comment:

This ticket was recently mentioned in a mailing list thread:
https://groups.google.com/d/topic/django-developers/mS9-HXI4ljw/discussion

I amended Anubhav's patch to make `uri_to_iri()` return a unicode string
rather than UTF-8 encoded bytes (Option 2 in PR comment
https://github.com/django/django/pull/2932/files#r15440287). This is
consistent with Werkzeug's implementation (refs
http://werkzeug.pocoo.org/docs/0.9/utils/#werkzeug.urls.uri_to_iri).

I made a new PR, https://github.com/django/django/pull/3350; feedback is
welcome.

--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:8>

Django

Oct 14, 2014, 1:30:50 AM
to django-...@googlegroups.com
#19508: Implement URL decoding according to RFC 3987
--------------------------------------+------------------------------------
Reporter: aaugustin | Owner: loic
Type: Cleanup/optimization | Status: assigned
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by loic):

I reworked the implementation of the "repercent" step. Could anyone
experienced with Unicode double-check it?

I still want to review our usage of the various quote/unquote functions
and their respective quirks in terms of input and return values (and their
discrepancies between PY2 and PY3).
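
As a rough illustration of the kind of quirks meant here, this is how the
Python 3 standard-library functions behave for a few inputs. It only covers
Python 3's `urllib.parse` (the PY2 side is not shown), and the
`observations` dict is just a container for this example:

```python
from urllib.parse import quote, unquote, unquote_to_bytes

observations = {
    # quote() accepts str or bytes and always returns str.
    "quote_str": quote("\u00e9"),          # text is encoded as UTF-8 first
    "quote_bytes": quote(b"\xa9"),         # raw bytes are escaped as-is
    # unquote() returns str and, by default, replaces undecodable bytes
    # with U+FFFD instead of raising.
    "unquote_valid": unquote("%C3%A9"),
    "unquote_invalid": unquote("%A9"),
    "unquote_latin1": unquote("%A9", encoding="latin-1"),
    # unquote_to_bytes() returns the raw bytes with no decoding step.
    "unquote_to_bytes": unquote_to_bytes("%A9"),
}
```

Note that the silent replacement in `unquote()` loses the original byte,
which is why a bytes-level unquote is the safer starting point for a
"repercent" step.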

--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:9>

Django

Oct 15, 2014, 3:18:37 PM
to django-...@googlegroups.com
#19508: Implement URL decoding according to RFC 3987
-------------------------------------+-------------------------------------
Reporter: aaugustin | Owner: loic
Type: Cleanup/optimization | Status: assigned
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by claudep):

* stage: Accepted => Ready for checkin


Comment:

Looks good, thanks!

--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:10>

Django

Oct 15, 2014, 3:32:49 PM
to django-...@googlegroups.com
#19508: Implement URL decoding according to RFC 3987
-------------------------------------+-------------------------------------
Reporter: aaugustin | Owner: loic
Type: Cleanup/optimization | Status: closed
Component: HTTP handling | Version: master
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Ready for checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Loic Bistuer <loic.bistuer@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"10b17a22bec2eaf44c3315614aea87c127caee46"]:
{{{
#!CommitTicketReference repository=""
revision="10b17a22bec2eaf44c3315614aea87c127caee46"
Fixed #19508 -- Implemented uri_to_iri as per RFC.

Thanks Loic Bistuer for helping in shaping the patch and Claude Paroz
for the review.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19508#comment:11>
