django.core.signing and safe characters


Ole Laursen

Jun 20, 2017, 6:18:21 AM
to Django developers (Contributions to Django itself)
Hi!

Maybe this has no practical implications, but this has been bugging me for a couple of years now, ever since I started using django.core.signing to generate tokens: if you take a look at

  https://github.com/django/django/blob/master/django/core/signing.py

the comment at the top says

   There are 65 url-safe characters: the 64 used by url-safe base64 and the ':'.
   These functions make use of all of them.
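Here is a quick stdlib sanity check of that count. It assumes the url-safe base64 alphabet from RFC 4648, section 5 (A-Z, a-z, 0-9, '-' and '_'). It also shows which of the 65 characters Python's urllib.parse.quote() would escape with its default settings. SIGNING_CHARS and quoted_by_urllib are names made up for this sketch:

```python
import string
from urllib.parse import quote

# url-safe base64 alphabet (RFC 4648, section 5): A-Z, a-z, 0-9, '-' and '_'
URLSAFE_B64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"
# The 65 characters the signing comment talks about (hypothetical name)
SIGNING_CHARS = URLSAFE_B64 + ":"


def quoted_by_urllib(chars):
    """Return the characters that quote() (default safe='/') percent-encodes."""
    return [c for c in chars if quote(c) != c]


print(len(SIGNING_CHARS))                # 65
print(quoted_by_urllib(SIGNING_CHARS))   # [':']
```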

Yet ':' is specifically listed as a reserved character:

  https://perishablepress.com/stop-using-unsafe-characters-in-urls/

The colon delimits the scheme, as in "https:", and encodeURIComponent(":") returns "%3A".

If I test a link like <a href="/:baz/?foo:=:bar"> in Firefox, the browser doesn't quote any of the colons, though. OTOH, if you use "foo:bar/" as a relative link, "foo:" is interpreted as a scheme. So it's not unconditionally safe.
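The same effect is easy to reproduce with Python's urllib.parse. The sketch below uses a made-up base URL and a hypothetical token shaped like "payload:timestamp:signature" (the payload is the url-safe base64 of '{"user":42}'). A bare relative reference loses its base because the text before the first ':' is parsed as a scheme. A "./" prefix or an absolute path keeps it under the base URL:

```python
from urllib.parse import urljoin, urlsplit

BASE = "https://example.com/reset/"          # hypothetical base URL
TOKEN = "eyJ1c2VyIjo0Mn0:1dNk7K:abc123"       # hypothetical signed token

# Bare relative reference: everything before the first ':' becomes the scheme.
print(urlsplit(TOKEN + "/").scheme)           # eyj1c2vyijo0mn0
print(urljoin(BASE, TOKEN + "/"))             # returned unchanged, base is ignored

# A leading "./" or an absolute path keeps the token inside the base URL.
print(urljoin(BASE, "./" + TOKEN + "/"))
print(urljoin(BASE, "/reset/" + TOKEN + "/"))
```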

Furthermore, the above page lists some more characters as safe:

  $-_.+!*'(),

Of these, only -_.!*'() are left unquoted by encodeURIComponent, and -_ (and perhaps .) are already taken by the signing code.
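The comparison can also be done mechanically. This sketch hard-codes the characters that encodeURIComponent leaves unescaped besides letters and digits (the ECMAScript "uriMark" set). It then checks each candidate from the page against Python's quote() with safe="", which, like encodeURIComponent, also escapes '/'. The constant and helper names are made up:

```python
from urllib.parse import quote

# Characters listed as "safe" on the page linked above
CANDIDATES = "$-_.+!*'(),"

# Left unescaped by encodeURIComponent besides letters and digits (ECMAScript uriMark)
JS_COMPONENT_SAFE = set("-_.!~*'()")


def left_alone_by_python(c):
    # safe="" makes quote() escape '/' too, mirroring encodeURIComponent
    return quote(c, safe="") == c


both = [c for c in CANDIDATES if c in JS_COMPONENT_SAFE and left_alone_by_python(c)]
print(both)  # ['-', '_', '.']
```

Only -, _ and . survive both. Since - and _ already belong to the url-safe base64 alphabet, . is the only new candidate.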

But in any case, the comment, although satisfying to read, is AFAICT incorrect?

I don't know if it is worth it to switch to another default separator (say *). There would need to be a fallback to : for some years at least.
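For illustration only, such a fallback could look roughly like the sketch below. It is not django.core.signing's code. It is a simplified, hypothetical HMAC-SHA256 signer (no timestamp, no salt) whose sign() emits the new separator and whose unsign() also accepts the legacy ':'. The names sign, unsign and _mac are made up:

```python
import base64
import hashlib
import hmac


def _mac(value: str, key: bytes) -> str:
    digest = hmac.new(key, value.encode(), hashlib.sha256).digest()
    # url-safe base64 without padding: only A-Z, a-z, 0-9, '-' and '_'
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()


def sign(value: str, key: bytes, sep: str = ".") -> str:
    """Hypothetical signer that writes the new default separator."""
    return f"{value}{sep}{_mac(value, key)}"


def unsign(token: str, key: bytes, seps=(".", ":")) -> str:
    """Try the new separator first, then fall back to the legacy ':'."""
    for sep in seps:
        value, found, sig = token.rpartition(sep)
        # compare bytes so non-ASCII values cannot trip compare_digest
        if found and hmac.compare_digest(sig.encode(), _mac(value, key).encode()):
            return value
    raise ValueError("bad signature")
```

Because the signature never contains '.' or ':', splitting at the last separator always isolates it, even when the value itself contains either character.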


Ole

Florian Apolloner

Jun 20, 2017, 6:35:45 AM
to Django developers (Contributions to Django itself)
Hi,


On Tuesday, June 20, 2017 at 12:18:21 PM UTC+2, Ole Laursen wrote:
> Yet, : is specifically mentioned as a reserved character:

It depends on the context. The assumption here is that the encoded data is always used as part of the path or query string, for which RFC 1738 says:

 Within the <path> and <searchpart> components, "/", ";", "?" are reserved. The "/" character may be used within HTTP to designate a hierarchical structure.

In that sense, ":" is not a reserved character.

> It is used for the scheme "https:". encodeURIComponent(":") returns "%3A".

The usefulness of that function seems questionable to me.
 
> If I do a test with a link like <a href="/:baz/?foo:=:bar"> in Firefox, the browser doesn't quote any of the colons, though.

This is correct behaviour since ":" is not reserved after all.
 
> OTOH, if you put in "foo:bar/" as a relative link, foo: is interpreted as a scheme. So it's not unconditionally safe.

That is true, but then again relative links are hardly useful for the cases where d.c.signing is used (usually in an email, or when providing a full link to a resource). Even if they were, I'd strongly suggest starting with an absolute path at least.
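For what it's worth, the newer generic URI syntax (RFC 3986, which updates RFC 1738) agrees with both points. ':' is an allowed path character (pchar) and may also appear in a query. However, the first segment of a relative-path reference (segment-nz-nc) must not contain it. The sketch below encodes just those two grammar rules as regular expressions. The helper names and the token are made up for illustration:

```python
import re

# RFC 3986, section 3.3: pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
_UNRESERVED = r"A-Za-z0-9\-._~"
_SUB_DELIMS = r"!$&'()*+,;="
_PCT = r"%[0-9A-Fa-f]{2}"

# segment = *pchar  (any path segment; ':' is allowed)
SEGMENT = re.compile(rf"(?:[{_UNRESERVED}{_SUB_DELIMS}:@]|{_PCT})*")
# segment-nz-nc: first segment of a relative-path reference, no ':' allowed
SEGMENT_NZ_NC = re.compile(rf"(?:[{_UNRESERVED}{_SUB_DELIMS}@]|{_PCT})+")


def ok_as_path_segment(text):
    return SEGMENT.fullmatch(text) is not None


def ok_as_leading_relative_segment(text):
    return SEGMENT_NZ_NC.fullmatch(text) is not None


token = "eyJ1c2VyIjo0Mn0:1dNk7K:abc123"  # hypothetical signed value
print(ok_as_path_segment(token))              # True
print(ok_as_leading_relative_segment(token))  # False
```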

> I don't know if it is worth it to switch to another default separator (say *). There would need to be a fallback to : for some years at least.

Given the "impact", it is IMO not worth changing.

Cheers,
Florian

Ole Laursen

Jun 21, 2017, 6:42:39 AM
to django-d...@googlegroups.com
Hi again!

I'm sorry if I gave the impression that I'm trying to nitpick
adherence to a standard. There was some discussion about it in the
comments in the link I provided, and it looks like there are different
interpretations, but that's not what I'm interested in.

What I'm addressing here is specifically the comment in the source and
the assumptions it is making, which I have found confusing in practice
because I bumped into the encoding issues.

Reading that comment, I expect to be able to get a token, construct a
URL with it, and get a nice URL with no percent goo in it. I don't
expect URL-safe strings to be encoded by standard APIs, either in the
browser or server-side.

>> It is used for the scheme "https:". encodeURIComponent(":") returns "%3A".
>
> The usefulness of that function seems questionable to me.

Could you elaborate why? What other function would you use client-side
to make sure that you generate a valid query string given a set of
parameters?

Python does the same, by the way:

$ python3
>>> import urllib.parse
>>> urllib.parse.quote(":")
'%3A'
>>> urllib.parse.urlencode({ ':foo': ':bar' })
'%3Afoo=%3Abar'

It looks like urllib is more eager when it comes to escaping: * is
also escaped by default, while . isn't. So it would probably have been
better if the signing code had picked . as the default separator.
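Python's urlencode() does accept a safe argument, so if the goal is simply a query string without percent goo, ':' can be exempted explicitly. A small sketch with a made-up token. Both forms decode to the same value with parse_qs():

```python
from urllib.parse import parse_qs, urlencode

token = "eyJ1c2VyIjo0Mn0:1dNk7K:abc123"  # hypothetical signed value

default = urlencode({"token": token})             # ':' becomes %3A
relaxed = urlencode({"token": token}, safe=":")   # ':' left as-is

print(default)   # token=eyJ1c2VyIjo0Mn0%3A1dNk7K%3Aabc123
print(relaxed)   # token=eyJ1c2VyIjo0Mn0:1dNk7K:abc123

# The receiving side gets the same value either way.
assert parse_qs(default) == parse_qs(relaxed) == {"token": [token]}
```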


Ole

Florian Apolloner

Jun 21, 2017, 9:26:29 AM
to Django developers (Contributions to Django itself)
Hi Ole,


On Wednesday, June 21, 2017 at 12:42:39 PM UTC+2, Ole Laursen wrote:
> I'm sorry if I gave the impression that I'm trying to nitpick
> adherence to a standard.

You absolutely did not; I guess my mail came across as tenser than I intended.
 
> What I'm addressing here is specifically the comment in the source and
> the assumptions it is making, which I have found confusing in practice
> because I bumped into the encoding issues.

There is certainly no problem with adjusting the comment to be more accurate.

>>> It is used for the scheme "https:". encodeURIComponent(":") returns "%3A".
>>
>> The usefulness of that function seems questionable to me.
>
> Could you elaborate why? What other function would you use client-side
> to make sure that you generate a valid query string given a set of
> parameters?

Sorry, I mixed it up with encodeURI. encodeURIComponent takes the "safe" route and escapes everything (at least as far as I understand it), because it does not know where the component ends up (unless "component" itself has a specific meaning that I am not aware of). This also makes the function somewhat ugly to use, because you escape more than you need to. On the other hand, if you use encodeURI:

encodeURI('http://google.com?:a=:asd')
"http://google.com?:a=:asd"

it does not escape ':' since it knows that there is no need to escape it.

> It looks like urllib is more eager when it comes to escaping, * is
> also escaped by default. . isn't. So it would probably have been
> better if the signing code had picked . as default separator.

I think this is a result of urlencode internally going through quote, which only leaves characters unescaped if they are safe in every context. Whether that is a good decision or not, I do not know. It would probably have been better to choose '.' as the delimiter, but I am not sure the effort to change it is worth it. In the end I fear that a stylistic change is just not worth it (which doesn't mean that the documentation shouldn't be accurate).

Cheers,
Florian

Ole Laursen

Jun 21, 2017, 10:07:25 AM
to django-d...@googlegroups.com
2017-06-21 15:26 GMT+02:00 Florian Apolloner <f.apo...@gmail.com>:
> Sorry I've mixed it up with encodeURI. encodeURIComponent takes the
> "safe" route and escapes everything (at least as far as I understand it)
> because it does not know where the component (unless "component" itself has
> a specific meaning that I am not aware of) ends up -- this also makes the
> function somewhat ugly to use because you escape more than you need to. On
> the other hand, if you use encodeURI:
>
> encodeURI('http://google.com?:a=:asd')
> "http://google.com?:a=:asd"
>
> it does not escape ':' since it knows that there is no need to escape it.

This is a side note, but actually encodeURI can't be used to encode
query string parameters. It won't quote = or & either, as you can see
from your example. You need to use encodeURIComponent on each of the
parts separately.

So of the pair, encodeURI is probably the one of questionable use, as
you said; I've personally never needed it.
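To make the difference concrete, here is a rough Python approximation of the two JavaScript functions built on urllib.parse.quote(). The helper names are hypothetical. The safe sets are taken from the characters the ECMAScript spec leaves unescaped: letters, digits and - _ . ! ~ * ' ( ) for encodeURIComponent, plus ; / ? : @ & = + $ , # for encodeURI. It is only meant for ordinary, well-formed strings:

```python
from urllib.parse import quote

# Unescaped by encodeURIComponent besides letters and digits (ECMAScript uriMark)
_COMPONENT_SAFE = "-_.!~*'()"
# encodeURI additionally leaves the reserved characters and '#' alone
_URI_SAFE = _COMPONENT_SAFE + ";/?:@&=+$,#"


def encode_uri_component(text: str) -> str:
    return quote(text, safe=_COMPONENT_SAFE)


def encode_uri(text: str) -> str:
    return quote(text, safe=_URI_SAFE)


print(encode_uri("http://google.com?:a=:asd"))  # http://google.com?:a=:asd
print(encode_uri_component(":"))                 # %3A

value = "a&b=c"
print("?q=" + encode_uri(value))            # ?q=a&b=c  (reads as two parameters)
print("?q=" + encode_uri_component(value))  # ?q=a%26b%3Dc
```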


Ole