For example,
{{{
smart_urlquote('http://example.com/?q=http%3A%2F%2Fexample.com%2F%3Fx%3D1%26q%3Ddjango')
}}}
returns
'http://example.com/?q=http://example.com/?x=1&q=django'
which has the query parameters {{{ {'q': ['http://example.com/?x=1',
'django']} }}}
where it should have returned the exact url back
'http://example.com/?q=http%3A%2F%2Fexample.com%2F%3Fx%3D1%26q%3Ddjango'
which has the query parameters {{{ {'q':
['http://example.com/?x=1&q=django']} }}}
I have written a test here -
https://github.com/ienzam/django/blob/master/tests/utils_tests/test_html.py#L203
{{{
# Ensure that embedded quoted url does not get unquoted
self.assertEqual(
quote('http://example.com/?q=http%3A%2F%2Fexample.com%2F%3Fx%3D1%26q%3Ddjango'),
'http://example.com/?q=http%3A%2F%2Fexample.com%2F%3Fx%3D1%26q%3Ddjango')
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/22267>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 0
* version: 1.5 => master
* needs_tests: => 0
* needs_docs: => 0
* stage: Unreviewed => Accepted
Comment:
It would seem the purpose of its current implementation is that a URL
which is already quoted is not quoted again, but an unquoted URL will be
quoted. The context in which this is applied, is django.utils.html.urlize.
This was added in #9655
(https://github.com/django/django/commit/e3a7bfccbb83712caf0645e4e33f5c03d9dc462b),
although at that time with a little more code than it's current version.
The characters you are expecting to be escaped are currently marked as
safe. Otherwise, even `http://example.com?q=` would already have the URL
encoding applied to it. At first sight, a solution seems to be to make
`smart_urlquote` even smarter, and only consider the currently listed
characters safe if they are not within a URL parameter themselves. I agree
that this is a bug after all.
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:1>
Comment (by anonymous):
This is the commit which introduced unquote
https://github.com/django/django/commit/b70c371fc1f18ea0c43b503122df3f311afc7105
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:2>
* cc: meenzam@… (added)
* has_patch: 0 => 1
Comment:
The simple fix will be to unquote and quote only the path of the url, I
have
https://github.com/ienzam/django/commit/692ac89689dd9ddf5895bfaf0181434c5daf81e9
This passes the current unit tests.
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:3>
* needs_better_patch: 0 => 1
Comment:
In that patch however, the conditions are altered: quoting is not
performed if any exception is raised and caught. This does not appear to
be part of the tests, so they still all succeed.
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:4>
* has_patch: 1 => 0
Comment:
Hmm, yes the commit is wrong, urlize() will not always escape.
Can you provide the intended behaviour of urlize()? It will then be easier
to fix when we have a set goal. Currently it is not totally clear what is
the correct behaviour of urlize() and smart_urlquote().
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:5>
Comment (by erikr):
The formal description is whatever is documented. However, developers may
depend on other details of its behaviour, which we should not change if it
is not necessary.
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:6>
Comment (by erikr):
By which I don't mean that this particular issue should not be fixed
anymore, in case I was ambiguous - it is definitely a bug.
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:7>
Comment (by meenzam@…):
Then I will suggest to return None from {{{smart_urlquote()}}} in case of
any exception, then {{{urlize()}}} will not think the string as an url and
will show text only.
Also not only path, but also query parameter and fragment should be quoted
(because they may contain ").
For query string, we may first {{{parse_qs()}}} and then individually
encode each key value pairs and remake the query string.
For fragment, We can do the same as path (unquoting and quoting again).
Not perfect solution though because some sites store query parameters in
fragments.
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:8>
Comment (by ienzam):
Wrote a new patch -
https://github.com/ienzam/django/commit/c22ce94588243da76b01e064ca36c9e63b0b54a2
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:9>
* component: Template system => Utilities
Comment:
That commit is built upon your previous patch, so it's very hard for me to
review. At first sight, I don't think that inline function is really
necessary, given how simple it is. But the rest I can't really review in
this way. Also, it would be good to have some tests for how urlize handles
the invalid URLs for which you now want to return None, as that behaviour
should not change.
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:10>
Comment (by erikr):
You might want to have a look at #22223 and
http://bugs.python.org/issue2637 regarding which characters should be safe
for which segments of the URL.
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:11>
Comment (by ienzam):
Read the bug and ticket.
From RFC 2396 ([http://www.ietf.org/rfc/rfc2396.txt]), valid characters
for different parts of the url are following:
path => {{{Alphanumeric and /;-_.!~*'():@&=+$,}}}
query and fragment => {{{Alphanumeric and /;-_.!~*'():@&=+$,?}}} (only
differs from path with an extra {{{?}}})
Can we not unquote and quote again because it can introduce some corner
cases.
We can check if the url is already quoted or not (which was done till
v1.4).
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:12>
* status: new => assigned
* owner: nobody => ienzam
* has_patch: 0 => 1
* needs_tests: 0 => 1
Comment:
Can you look at my patch? Merged all the changes into this one -
https://github.com/ienzam/django/commit/be378e404bedd169bb48ab945e5fd90e90a00d1e
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:13>
Comment (by meenzam@…):
Any update?
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:14>
Comment (by bendavis78):
I'm just curious, is there a precedent for returning None when there's an
error? Normally if there's an exception, it should bubble up, or a more
specific one should be raised.
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:15>
Comment (by ienzam):
If None is returned then
https://github.com/ienzam/django/blob/be378e404bedd169bb48ab945e5fd90e90a00d1e/django/utils/html.py#L274
"if url:" will be false and the invalid url will be treated as text.
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:16>
* needs_better_patch: 1 => 0
* needs_tests: 1 => 0
Comment:
I have chosen a slightly different approach, keeping the unquote/quote
process but on the separate parts as ienzam proposed.
https://github.com/django/django/pull/2902
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:17>
* needs_better_patch: 0 => 1
Comment:
PR has some test failures.
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:18>
* needs_better_patch: 1 => 0
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:19>
* stage: Accepted => Ready for checkin
Comment:
@meenzam, perhaps you'd like to review this as well before we merge it.
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:20>
Comment (by ienzam):
I will test the with the tests from here tomorrow -
https://github.com/ienzam/django/commit/be378e404bedd169bb48ab945e5fd90e90a00d1e
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:21>
Comment (by ienzam):
Thank you for doing this. This patch fixes all of the problems I was
facing.
One concern is though about the url: {{{ http://example.com/%C3%B6/ä/ }}}
The last part of the url {{{ ä }}} implies that this url is not quoted so
the full url should be quoted - {{{ http://example.com/%25C3%25B6/%C3%A4/
}}}
Currently it becomes: {{{ http://example.com/%C3%B6/%C3%A4/ }}}
But maybe this is not a real life scenario (hopefully).
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:22>
Comment (by claudep):
Interesting case, but I think this should be a separate issue. This ticket
is fixing the query string problem.
Your example might be solved by short-circuiting the `unquote` part if the
URL contains any non-safe character.
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:23>
* cc: kevin-brown (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:24>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"4b8a1d2c0d1a8c5107f3aef01597db78d2a2a5ce"]:
{{{
#!CommitTicketReference repository=""
revision="4b8a1d2c0d1a8c5107f3aef01597db78d2a2a5ce"
Fixed #22267 -- Fixed unquote/quote in smart_urlquote
Thanks Md. Enzam Hossain for the report and initial patch, and
Tim Graham for the review.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:25>
Comment (by Claude Paroz <claude@…>):
In [changeset:"b9d9287f59eb5c33dd8bc81179b4cf197fd54456"]:
{{{
#!CommitTicketReference repository=""
revision="b9d9287f59eb5c33dd8bc81179b4cf197fd54456"
Fixed urlize after smart_urlquote rewrite
Refs #22267.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:26>
Comment (by Claude Paroz <claude@…>):
In [changeset:"ec808e807ad711b993f199f6b4165ac6d0e1125b"]:
{{{
#!CommitTicketReference repository=""
revision="ec808e807ad711b993f199f6b4165ac6d0e1125b"
Fixed urlize regression with entities in query strings
Refs #22267.
Thanks Shai Berger for spotting the issue and Tim Graham for the
initial patch.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:27>
Comment (by Claude Paroz <claude@…>):
In [changeset:"ac07890f959c467b3fc9c6dd6d36aafc2eff1fcc"]:
{{{
#!CommitTicketReference repository=""
revision="ac07890f959c467b3fc9c6dd6d36aafc2eff1fcc"
[1.8.x] Fixed urlize regression with entities in query strings
Refs #22267.
Thanks Shai Berger for spotting the issue and Tim Graham for the
initial patch.
Backport of ec808e807 from master.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:28>
Comment (by Tim Graham <timograham@…>):
In [changeset:"7b1a67cce52e5c191fbfa1bca501c6f0222db019"]:
{{{
#!CommitTicketReference repository=""
revision="7b1a67cce52e5c191fbfa1bca501c6f0222db019"
Fixed escaping regression in urlize filter.
Now that the URL is always unescaped as of refs #22267,
we should re-escape it before inserting it into the anchor.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:29>
Comment (by Tim Graham <timograham@…>):
In [changeset:"aba74d6f1e221c10be2798f55971d710521cb404"]:
{{{
#!CommitTicketReference repository=""
revision="aba74d6f1e221c10be2798f55971d710521cb404"
[1.8.x] Fixed escaping regression in urlize filter.
Now that the URL is always unescaped as of refs #22267,
we should re-escape it before inserting it into the anchor.
Backport of 7b1a67cce52e5c191fbfa1bca501c6f0222db019 from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/22267#comment:30>