Should reverse() return a Unicode string?

Jon Dufresne

Sep 18, 2014, 4:01:21 PM
to django-d...@googlegroups.com
Hi,

In my Django application, I'm making a strong attempt to always deal
with Unicode strings at the application layer. Byte strings are only
handled at the protocol layer, when sending data out on the "wire". If
my application tests whether an object is a string, I'll use
isinstance(obj, unicode) (Python 2).

One gotcha that I noticed is that reverse() will always return a byte
string. Tracing this through the code, I see this happens during the
call to iri_to_uri(): this function creates a string consisting only
of ASCII characters, and all other characters are escaped.

Now, reverse() is often used to grab a URL and handle it at the
application layer. It is not reserved only for the protocol layer. An
example would be presenting a URL inside an HTML template (as an href
or as text), an email, or JSON.

In my opinion, reverse() should return a Unicode string, even if that
string consists only of ASCII characters. It is not until the string
hits the wire that it ought to be forced to bytes.
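
To illustrate the idea, here is a minimal sketch, assuming Python 3
(where str is the text type) and using only the standard library. The
helper escape_path() is hypothetical and is not Django's iri_to_uri();
it only shows that an escaped, ASCII-only URL can stay a text string
until it is encoded at the protocol boundary.

```python
from urllib.parse import quote


def escape_path(path):
    """Hypothetical stand-in for the escaping step (not Django's iri_to_uri())."""
    # Percent-encode non-ASCII characters as UTF-8; leave "/" untouched.
    return quote(path, safe="/")


# The result is still text (str), even though it holds only ASCII characters.
url = escape_path("/café/")

# Convert to bytes only when the URL is about to go out on the wire.
wire = url.encode("ascii")
```

At the application layer, url can be placed in a template, an email,
or JSON as ordinary text; only wire is a byte string.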

To verify this, I added a unit test to
"urlpatterns_reverse.tests.URLPatternReverse" that demonstrates the
behavior comes from Django itself and not from my application:

    def test_reverse_unicode(self):
        name, expected, args, kwargs = test_data[0]
        self.assertIsInstance(
            reverse(name, args=args, kwargs=kwargs),
            six.text_type)

What do you think? If others agree, I can file a bug and create a pull
request to fix this.

Thanks,
Jon

Carl Meyer

Sep 18, 2014, 5:11:39 PM
to django-d...@googlegroups.com
Hi Jon,
It makes sense to me that `reverse()` should return a text (unicode)
string. A URL may be "just bytes" on the network, but within the Django
context it is clearly text.

I'm a bit concerned about the backwards-compatibility implications,
particularly for Python 3 projects where `bytes` and `str` don't
silently interoperate. It would be really interesting to hear if anyone
on this list has a real-world Python 3 Django project handy and could
test the impact of this change.
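
As a rough illustration of why the types matter on Python 3, here is
a small standard-library sketch. The join_url() helper and the URL
values are made up for the example; it makes no claim about what
reverse() currently returns on Python 3, only that text and bytes
cannot be mixed there.

```python
def join_url(base, suffix):
    """Hypothetical application code that appends a query string to a URL."""
    return base + suffix


text_url = "/articles/"    # text string (str on Python 3)
byte_url = b"/articles/"   # byte string

# Text plus text works as expected.
joined = join_url(text_url, "?page=2")

# Bytes plus text raises TypeError on Python 3 instead of coercing silently.
try:
    join_url(byte_url, "?page=2")
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Code that happens to rely on one type or the other would surface this
kind of error when the return type changes, which is why testing
against a real project would be useful.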

Carl

Wim Feijen

Sep 18, 2014, 5:23:13 PM
to django-d...@googlegroups.com
Please do. :) 

- Wim

Tom Christie

Sep 19, 2014, 8:13:06 AM
to django-d...@googlegroups.com
One point of clarity is that we ought to return the same type for each of `reverse`, `request.path`, `request.get_full_path`, `request.path_info`, and the values in the `request.GET` dictionary. Given that, the answer is clearly "it should be a string".

It's also a little unclear to me what type is currently expected from values in the `request.META` dictionary. I believe that the `HTTP_*` keys currently return byte objects, but it's not documented, or clear to me if that's consistently true.

Jon Dufresne

Sep 19, 2014, 4:09:03 PM
to django-d...@googlegroups.com
On Fri, Sep 19, 2014 at 5:13 AM, Tom Christie <christ...@gmail.com> wrote:
> One point of clarity is that we ought to return the same type for each of
> `reverse`, `request.path`, `request.get_full_path`, `request.path_info`, and
> the values in the `request.GET` dictionary. Given that, the answer is
> clearly "it should be a string".

From my testing, these are all returning text strings except `reverse()`.

The pull request for the reverse() fix can be found here:
<https://github.com/django/django/pull/3239>