on 1.4.10:
{{{
In [2]: reverse('test', args=['foo:bar'])
Out[2]: '/foo:bar'
}}}
but on 1.6.2:
{{{
In [2]: reverse('test', args=['foo:bar'])
Out[2]: '/foo%3Abar'
}}}
It would seem to me that this is a regression, as ":@-._~!$&'()*+,;=" are
all allowed unescaped in path segments AFAIK.
I'm bringing this up because it breaks certain OAuth 1 clients against
Bitbucket.
In some places we redirect to URLs whose path segment contains a ":".
Prior to us upgrading to 1.6 the response's location header preserved that
colon, but now it gets escaped, changing the URL (e.g.
https://api.bitbucket.org/2.0/repositories/david/django-
storages/pullrequests/51/diff redirecting to
https://api.bitbucket.org/2.0/repositories/david/django-
storages/diff/regadas/django-storages%3A069fd1d01fbf..f153a70ba254)
In OAuth 1, requests are signed, including the request URL, but the
RFC-5849 does not mandate any pre-processing of the URL. For several OAuth
clients (including requests-oauthlib and python-oauth2) that means they
compute the signature over a string that contains "%3A" instead of ":".
On the server however, the request path automatically gets unquoted before
it hits the middlewares and views. As our OAuth layer is a middleware that
reconstructs the signature, it ends up computing over ":", yielding a
different signature than the client, breaking authentication.
This might be addressable by changing these OAuth clients to perform
unquoting on the path segment, but a better solution would seem to make
`urlresolvers.py:RegexURLResolver` respect the reserved characters for
path segments and not escape what does not need to be escaped.
I'll follow up with a pull request, unless there are strong feelings, or
unwanted consequences of that approach.
--
Ticket URL: <https://code.djangoproject.com/ticket/22223>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0
Comment:
Pull request: https://github.com/django/django/pull/2408
--
Ticket URL: <https://code.djangoproject.com/ticket/22223#comment:1>
Comment (by aaugustin):
This change is indeed documented in the release notes. It's a consequence
of #13260. See [https://code.djangoproject.com/ticket/13260#comment:17 my
analysis] for details.
The change suggested here would still preserve the requirements of #13260,
which was primarily concerned with % characters in variable parts of URLs.
Now the question is -- what characters do we consider safe? By default
[http://docs.python.org/2/library/urllib.html#urllib.quote urllib.quote]
preserves `A-Za-z0-9_.-` and characters defined as safe, which default to
`/`.
Based on RFC 1738:
1. {{{ <>"#%{}|\^~[]`}}} are unsafe and must be encoded (that list
includes the SPACE character).
2. `;/?:@=&` are reserved and must be encoded unless they are used for
their special meaning.
3. `$-_.+!*'(),` are safe and need not be encoded.
We can certainly put the third set of characters in the safe list.
If characters from the second set end up unencoded in URLs generated by
Django, we start relying on user-agent quirks to re-encode them properly
in HTTP request lines. However, `/` is part of this list and considered
safe by the stdlib by default (which may not mean much; the stdlib
contains many unfortunate API choices).
Sinc the path segment always starts with a slash following the host and
ends at the end of the URL or with one of `?`, `;` or `#` (which is always
unsafe), we may choose to preserve `/:@=&`, that is, all of the second set
except for `?` and `;`. If we want to be more careful, we may choose to
preserve only `/` and `:` because `/` is safe by default and `:` is only
used to separate the protocol from the remainder of the URL. That would
resolve your problem.
Can you clarify how you came up with `/:@&=+$,`? If you're including some
characters from the third set above, you should probably include all of
them.
The fix should be backported to 1.6.x since it's a regression.
--
Ticket URL: <https://code.djangoproject.com/ticket/22223#comment:2>
Comment (by aaugustin):
Related bug filed against `urllib`: http://bugs.python.org/issue2637
See also `django.utils.encoding.iri_to_uri` whose docstring discusses RFC
3986, with different character sets from those I mentioned :(
See also `django.utils.html.smart_urlquote` which uses
`safe=b'!*\'();:@&=+$,/?#[]~'`. I'm not sure where that list comes from.
--
Ticket URL: <https://code.djangoproject.com/ticket/22223#comment:3>
* needs_better_patch: 0 => 1
* has_patch: 0 => 1
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/22223#comment:4>
Comment (by erik.van.zijst):
> Can you clarify how you came up with /:@&=+$,?
I took that from paragraph 3.3 of RFC 2396:
> 3.3. Path Component
>
> The path component contains data, specific to the authority (or the
> scheme if there is no authority component), identifying the resource
> within the scope of that scheme and authority.
>
> path = [ abs_path | opaque_part ]
>
> path_segments = segment *( "/" segment )
> segment = *pchar *( ";" param )
> param = *pchar
>
> pchar = unreserved | escaped |
> ":" | "@" | "&" | "=" | "+" | "$" | ","
Also, looking more closely at RFC 1738, it seems to me that the only
reserved characters in HTTP path segments are: "/;?". I'm not entirely
sure where you found those other character sets.
Btw, when I backport the change to 1.6.x, do you want me to create another
pull request against master, or are patches against older branches
automatically merged up?
--
Ticket URL: <https://code.djangoproject.com/ticket/22223#comment:5>
Comment (by aaugustin):
My comment is based on notes I made 5 or 6 years ago working on a very
similar problem.
We have too many RFCs...
Backporting is usually handled by the committer. This patch should apply
cleanly to 1.6.
--
Ticket URL: <https://code.djangoproject.com/ticket/22223#comment:6>
Comment (by erik.van.zijst):
Anything I need to do to the pull request at this point?
--
Ticket URL: <https://code.djangoproject.com/ticket/22223#comment:7>
* cc: denilsonsa@… (added)
Comment:
FYI, I had submitted a ticket asking for a way to customize the quoting
behavior. In my case, I wanted to quote `/`.
https://code.djangoproject.com/ticket/22589
Given, the different needs, maybe the list of safe/unsafe characters could
be supplied in urls.py, per-argument per-url? Just a quick idea:
{{{
url(r'^something/(?P<title>[^/]*)/something/else$', views.something,
name='something', safechars={ 'title': ':@=$-_.+!*'(),' }),
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/22223#comment:8>
* cc: apollo13 (added)
Comment:
@denilsonsa Something like that is certainly not going to happen.
--
Ticket URL: <https://code.djangoproject.com/ticket/22223#comment:9>
* cc: jorgecarleitao@… (added)
Comment:
I've reviewed the PR, and added a comment on github.
It seems that 2396 is obsolete by RFC 3986 (Jan. 2005) [1]:
In particular
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
[...]
2.3. Unreserved Characters
Characters that are allowed in a URI but do not have a reserved
purpose are called unreserved. These include uppercase and lowercase
letters, decimal digits, hyphen, period, underscore, and tilde.
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
They also have an appendix with a compiled information about this
http://tools.ietf.org/html/rfc3986#appendix-A
I'm no expert in this, just bringing your attention to it.
[1] http://tools.ietf.org/html/rfc3986#section-2
--
Ticket URL: <https://code.djangoproject.com/ticket/22223#comment:10>
* needs_better_patch: 1 => 0
Comment:
I think that we should indeed follow RFC 3986. See also
http://bugs.python.org/issue16285.
I've pushed a patch with my proposal, based on Erik's patch.
https://github.com/django/django/pull/2859
--
Ticket URL: <https://code.djangoproject.com/ticket/22223#comment:11>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/22223#comment:12>
* status: new => closed
* resolution: => fixed
Comment:
In [changeset:"e167e96cfea670422ca75d0b35fe7c4195f25b63"]:
{{{
#!CommitTicketReference repository=""
revision="e167e96cfea670422ca75d0b35fe7c4195f25b63"
Fixed #22223 -- Prevented over-escaping URLs in reverse()
And follow more closely the class of characters defined in the
RFC 3986.
Thanks Erik van Zijst for the report and the initial patch, and
Tim Graham for the review.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/22223#comment:13>