Unicode decoding error


Christophe Pettus

Apr 11, 2017, 11:18:02 AM
to Django users
I've run into the issue described in the code below, where (as far as I can tell) a natural use of __str__ in Python 2.7 results in a Unicode error. I'm not quite sure how to write this code to work properly on both Python 2 and Python 3; what am I missing?

(Note this issue happens on Python 2.7 regardless of the presence of the @python_2_unicode_compatible decorator.)

Models:

from django.db import models
from django.utils.encoding import python_2_unicode_compatible

@python_2_unicode_compatible
class A(models.Model):
    c = models.CharField(max_length=20)

    def __str__(self):
        return self.c

@python_2_unicode_compatible
class B(models.Model):
    a = models.ForeignKey(A)

    def __str__(self):
        return str(self.a)


Failure example:

>>> from test.models import A, B
>>> a = A(c=u'répairer')
>>> a.save()
>>> a.id
1
>>> a1 = A.objects.get(id=1)
>>> a1
<A: répairer>
>>> b = B(a_id=1)
>>> b.save()
>>> b.id
1
>>> b1 = B.objects.get(id=1)
>>> b1
<B: [Bad Unicode data]>
>>> print b1
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/xof/Documents/Dev/environments/peep/lib/python2.7/site-packages/django/utils/six.py", line 842, in <lambda>
    klass.__str__ = lambda self: self.__unicode__().encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
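The traceback can be reproduced without Django. The sketch below runs on Python 3 and mimics the two Python 2 steps involved: str(self.a) hands back the UTF-8 bytes produced by the decorator-generated A.__str__, and the decorator-generated B.__str__ then calls .encode('utf-8') on those bytes, which Python 2 carries out as an implicit ASCII decode followed by the encode. py2_encode_utf8 is a hypothetical helper written only to imitate that implicit step; it is not part of Django or six.

```python
def py2_encode_utf8(value):
    """Hypothetical helper mimicking Python 2's value.encode('utf-8').

    On Python 2, calling .encode() on a byte string first decodes it with the
    default ASCII codec; this function reproduces that implicit step.
    """
    if isinstance(value, bytes):
        value = value.decode("ascii")  # the implicit decode Python 2 performs
    return value.encode("utf-8")


# What str(self.a) returns on Python 2: the decorator-generated A.__str__
# has already encoded the text u'répairer' to UTF-8 bytes.
a_as_bytes = u"répairer".encode("utf-8")

# Text input is fine: this is what happens when B.__str__ returns text.
print(py2_encode_utf8(u"répairer"))  # b'r\xc3\xa9pairer'

# Bytes input fails the same way as the traceback above.
try:
    py2_encode_utf8(a_as_bytes)
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
```

The first call succeeds because its input is already text, which is what @python_2_unicode_compatible expects __str__() to return.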

--
-- Christophe Pettus
x...@thebuild.com

Tim Graham

Apr 11, 2017, 12:52:45 PM
to Django users
As documented, you must return text, not bytes, from __str__() when using @python_2_unicode_compatible. That means six.text_type(self.a) rather than str(self.a) (which returns bytes on Python 2).

Christophe Pettus

Apr 11, 2017, 1:06:46 PM
to django...@googlegroups.com
Thanks, and thanks for accepting my documentation change suggestion!

https://github.com/django/django/pull/8349

Mike Dewhirst

Apr 11, 2017, 7:34:51 PM
to django...@googlegroups.com
On 12/04/2017 2:52 AM, Tim Graham wrote:
> As documented
> <https://docs.djangoproject.com/en/1.11/ref/utils/#django.utils.encoding.python_2_unicode_compatible>
> you must return text and not bytes from __str__() when using
> @python_2_unicode_compatible. That means six.text_type(self.a)
> rather than str(self.a) (which returns bytes on Python 2).
Tim

Does this mean I should globally replace "str(" with "six.text_type("
in a 2/3 codebase?

???

Cheers

Mike


Antonis Christofides

Apr 12, 2017, 5:28:33 AM
to django...@googlegroups.com
> Does this mean I should globally replace "str(" with "six.text_type(" in a
> 2/3 codebase?
I don't think so; afaiu this must be done for the return value of __str__(), not
everywhere.

Antonis Christofides
http://djangodeployment.com

Christophe Pettus

Apr 12, 2017, 12:31:56 PM
to Django users

> On Apr 12, 2017, at 02:26, Antonis Christofides <ant...@djangodeployment.com> wrote:
>
>> Does this mean I should globally replace "str(" with "six.text_type(" in a
>> 2/3 codebase?
> I don't think so; afaiu this must be done for the return value of __str__(), not
> everywhere.

The rules as I understand them are:

1. Define __str__(), not __unicode__() on classes.
2. Decorate your class with @python_2_unicode_compatible.
3. Always return six.text_type from your __str__() function.
4. When casting a class instance to text, use six.text_type(), not str() (unicode() still works on Python 2, but it doesn't exist in Python 3).

In Python 3, this is all a no-op: The __str__() method returns Python 3's string class, which is Unicode.
In Python 2, the decorator uses your __str__() method as the class's __unicode__() method, and creates a new __str__() method that returns a UTF-8-encoded Python 2 string (str, not unicode) object.

Personally, I would prefer to use the Python 2 'unicode' type everywhere I can in Python 2, so casting everything to six.text_type (and using from __future__ import unicode_literals, etc.) would do that.
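The rules above can be sketched without Django. The decorator below follows the Python 2 behaviour described here (and the __str__ lambda visible in the six.py traceback earlier in the thread); it is named unicode_compatible_sketch to make clear it is an illustrative stand-in, not the Django/six function itself. text_type is a local stand-in for six.text_type, and the plain classes A and B are hypothetical substitutes for the models, so the snippet runs without Django. In a real Python 2 module you would typically also add from __future__ import unicode_literals at the top, as suggested above; here the one non-ASCII literal carries an explicit u prefix instead.

```python
# -*- coding: utf-8 -*-
import sys

PY2 = sys.version_info[0] == 2

# Stand-in for six.text_type: unicode on Python 2, str on Python 3.
# Only the selected branch is evaluated, so `unicode` is never looked up on Python 3.
text_type = unicode if PY2 else str  # noqa: F821


def unicode_compatible_sketch(klass):
    """Illustrative stand-in for @python_2_unicode_compatible (rules 1 and 2).

    On Python 3 the class is returned unchanged. On Python 2 the text-returning
    __str__ becomes __unicode__, and a new __str__ returns UTF-8 bytes.
    """
    if PY2:
        if "__str__" not in klass.__dict__:
            raise ValueError("%s doesn't define __str__()" % klass.__name__)
        klass.__unicode__ = klass.__str__
        klass.__str__ = lambda self: self.__unicode__().encode("utf-8")
    return klass


@unicode_compatible_sketch
class A(object):
    """Hypothetical plain-Python stand-in for the A model."""

    def __init__(self, c):
        self.c = c

    def __str__(self):
        return self.c  # rule 3: already text


@unicode_compatible_sketch
class B(object):
    """Hypothetical stand-in for the B model; `a` plays the foreign key."""

    def __init__(self, a):
        self.a = a

    def __str__(self):
        # Rules 3 and 4: cast with text_type, never str(), so the result is text.
        return text_type(self.a)


b = B(A(u"répairer"))
print(text_type(b))  # répairer
```

Swapping text_type(self.a) back to str(self.a) in B.__str__ reintroduces the original Python 2 failure, because str() then returns the UTF-8 bytes produced by the decorator-generated A.__str__.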

Mike Dewhirst

Apr 13, 2017, 3:26:02 AM
to django...@googlegroups.com
Thanks Christophe

Mike