[Django] #18063: repr() should count only ascii, not unicode


Django

Apr 5, 2012, 7:23:19 AM
to django-...@googlegroups.com
#18063: repr() should count only ascii, not unicode
-------------------------------+--------------------
Reporter: guettli | Owner: nobody
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 1.4
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------
repr() should only contain ASCII, not unicode. You get strange errors like
"unprintable Exception" if there are non-ASCII characters in the repr() of a
model (e.g. raise Exception(u'Failed obj-repr=%r' % obj)).
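A minimal Python 3 sketch of one way this kind of failure can arise. It assumes that the model's `__repr__` returns a UTF-8 encoded bytestring, which the later discussion in this ticket establishes. In Python 2, interpolating such a bytestring into a unicode format string like `u'Failed obj-repr=%r' % obj` makes the interpreter decode it with the default ASCII codec. Python 3 never does this implicitly, so the sketch performs the decode explicitly. The names `repr_bytes` and `format_failure` are hypothetical.

```python
# Hypothetical bytes that a Python 2 model's __repr__ could return: the
# model's text, UTF-8 encoded, with a non-ASCII character in it.
repr_bytes = "<Cat: Skittle (now Amélie)>".encode("utf-8")


def format_failure(raw_repr: bytes) -> str:
    """Hypothetical helper that does explicitly what Python 2 did implicitly
    for u'Failed obj-repr=%r' % obj: decode the bytestring repr as ASCII."""
    return "Failed obj-repr=%s" % raw_repr.decode("ascii")


try:
    format_failure(repr_bytes)
except UnicodeDecodeError as exc:
    # exc.start points at 0xc3, the first UTF-8 byte of "é".
    print("UnicodeDecodeError at byte", exc.start, hex(repr_bytes[exc.start]))
```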

The attached patch passes all unit tests on an up-to-date Django 1.4 SVN checkout.

--
Ticket URL: <https://code.djangoproject.com/ticket/18063>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Apr 11, 2012, 9:46:16 AM
to django-...@googlegroups.com
#18063: repr() should count only ascii, not unicode
-------------------------------+--------------------------------------
Reporter: guettli | Owner: nobody
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 1.4
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------
Changes (by guettli):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


Comment:

Reference to the Python Doc:
http://docs.python.org/reference/datamodel.html?highlight=repr#object.__repr__


{{{
The return value must be a string object
}}}
It must not be a unicode object in Python 2.x
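"String object" means different types in the two Python lines. In Python 2 it is the bytestring type `str`. In Python 3 it is the text type `str`, and the interpreter itself raises `TypeError` if `__repr__` returns bytes. Here is a small Python 3 sketch, using a hypothetical class `ReturnsBytes`:

```python
class ReturnsBytes:
    """Hypothetical class whose __repr__ returns a bytestring, which was
    legal in Python 2 but is rejected by Python 3."""

    def __repr__(self):
        return "Amélie".encode("utf-8")


try:
    repr(ReturnsBytes())
except TypeError as exc:
    # CPython's message says that __repr__ returned a non-string.
    print(exc)
```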

--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:1>

Django

Apr 19, 2012, 8:35:34 PM
to django-...@googlegroups.com
#18063: repr() should count only ascii, not unicode
-------------------------------------+-------------------------------------
Reporter: guettli | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by lukeplant):

* component: Uncategorized => Database layer (models, ORM)
* type: Uncategorized => Bug
* stage: Unreviewed => Accepted


--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:2>

Django

Apr 27, 2012, 3:20:45 AM
to django-...@googlegroups.com
#18063: repr() should return only ascii, not unicode
-------------------------------------+-------------------------------------
Reporter: guettli | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:3>

Django

Aug 20, 2012, 12:13:33 AM
to django-...@googlegroups.com
#18063: repr() should return only ascii, not unicode
-------------------------------------+-------------------------------------
Reporter: guettli | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by mrmachine):

* cc: real.human@… (added)


Comment:

Updated patch takes into account Python 3.x compatibility.

--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:4>

Django

Aug 20, 2012, 12:15:16 AM
to django-...@googlegroups.com
#18063: repr() should return only ascii, not unicode
-------------------------------------+-------------------------------------
Reporter: guettli | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by mrmachine):

* version: 1.4 => master


--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:5>

Django

Aug 20, 2012, 1:53:52 AM
to django-...@googlegroups.com
#18063: repr() should return only ascii, not unicode
-------------------------------------+-------------------------------------
Reporter: guettli | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by mrmachine):

Moved `encode()` out of try/except block.
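The patch itself is not reproduced in this thread. The following is a Python 3 sketch of the general shape such a `__repr__` might take, not the actual patch. The `[Bad Unicode data]` fallback mirrors Django's `Model.__repr__` of that period. The ASCII `'replace'` step is an assumption inferred from the `Am?lie` output shown later in this ticket. Because Python 3 requires `__repr__` to return text, the sketch decodes the ASCII bytes back to `str`. `Cat` is a hypothetical stand-in for a model.

```python
class Cat:
    """Hypothetical stand-in for a Django model; only __str__ and
    __repr__ are modelled here."""

    def __init__(self, name, adopted_name):
        self.name = name
        self.adopted_name = adopted_name

    def __str__(self):
        return "%s (now %s)" % (self.name, self.adopted_name)

    def __repr__(self):
        try:
            # Only the user-defined text conversion is guarded.
            text = str(self)
        except (UnicodeEncodeError, UnicodeDecodeError):
            text = "[Bad Unicode data]"
        # Encoding happens outside the try block, so the except clause only
        # covers errors raised by __str__. The 'replace' handler turns each
        # non-ASCII character into '?'.
        return ("<%s: %s>" % (type(self).__name__, text)).encode(
            "ascii", "replace"
        ).decode("ascii")
```

With this sketch, `repr(Cat("Skittle", "Amélie"))` produces the same lossy `<Cat: Skittle (now Am?lie)>` form that is criticised further down.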

--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:6>

Django

Aug 20, 2012, 2:51:29 AM
to django-...@googlegroups.com
#18063: repr() should return only ascii, not unicode
-------------------------------------+-------------------------------------
Reporter: guettli | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: master
(models, ORM) | Resolution: fixed
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 1 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by Simon Meers <simon@…>):

* status: new => closed
* resolution: => fixed


Comment:

In [3fce0d2a9162cf6e749a6de0b18890dea8955e89]:
{{{
#!CommitTicketReference repository=""
revision="3fce0d2a9162cf6e749a6de0b18890dea8955e89"
Fixed #18063 -- Avoid unicode in Model.__repr__ in python 2

Thanks guettli and mrmachine.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:7>

Django

Aug 21, 2012, 9:34:59 PM
to django-...@googlegroups.com
#18063: repr() should return only ascii, not unicode
-------------------------------------+-------------------------------------
Reporter: guettli | Owner: nobody
Type: Bug | Status: reopened
Component: Database layer | Version: master
(models, ORM) | Resolution:
Severity: Release blocker | Triage Stage:
Keywords: | Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by kmtracey):

* status: closed => reopened
* resolution: fixed =>
* severity: Normal => Release blocker
* stage: Accepted => Unreviewed


Comment:

I don't believe this fix is correct. More broadly, the problem description
is not correct. Nowhere in the referenced Python documentation
(http://docs.python.org/reference/datamodel.html?highlight=repr#object.__repr__)
does it say the (byte)string returned by `__repr__` must contain only
ASCII characters. Django was not returning unicode from `__repr__`, it was
returning a utf-8 encoded bytestring. That's perfectly legal Python. The
fact that some other bits of Python tools are unhelpful and deal with non-
ascii data by throwing up "unprintable exception object" when an exception
is raised involving a model instance with non-ASCII data in its repr
indicates a bug somewhere else. That bug should be fixed at its source,
not by removing non-ASCII characters from model instances' repr.

The referenced doc also states "This is typically used for debugging, so
it is important that the representation is information-rich and
unambiguous." This change has moved in the wrong direction on that score.
Consider before the change:

{{{
--> ./manage.py shell
Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import django
>>> django.get_version()
u'1.5.dev20120819181059'
>>> from ctrac.models import Cat
>>> Cat.objects.filter(adopted_name__startswith='Am')
[<Cat: Skittle (now Amélie)>]
>>> quit()
}}}

After the change:

{{{
--> ./manage.py shell
Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import django
>>> django.get_version()
u'1.5.dev20120821213816'
>>> from ctrac.models import Cat
>>> Cat.objects.filter(adopted_name__startswith='Am')
[<Cat: Skittle (now Am?lie)>]
>>>
}}}

That second output would have me concerned that my data had been corrupted,
when in fact it has not. It is just `__repr__` that is now corrupting it
on output.
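A short Python 3 sketch of why the second output is lossy. It assumes the `?` was produced by the `'replace'` error handler, which is an inference from the output rather than something stated in the ticket.

```python
original = "Amélie"

# Before the change: the repr carried UTF-8 bytes, which decode back exactly.
utf8_bytes = original.encode("utf-8")
print(utf8_bytes.decode("utf-8") == original)  # True

# After the change (assuming the 'replace' handler): every character that
# ASCII cannot represent becomes '?', so the original is lost ...
ascii_bytes = original.encode("ascii", "replace")
print(ascii_bytes)  # b'Am?lie'

# ... and the result is indistinguishable from text that really contained '?'.
print("Am?lie".encode("ascii", "replace") == ascii_bytes)  # True
```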

I believe this change should be reverted and the ticket closed either
wontfix or needsinfo. needsinfo would be to investigate under what
conditions the real problem (unprintable exception objects) occurs and to
see if there is anything that Django is doing to cause it (though I rather
suspect it is a base Python problem).

Marking release blocker because this has introduced a regression in
functionality from previous release.

--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:8>

Django

Aug 21, 2012, 9:44:02 PM
to django-...@googlegroups.com
#18063: repr() should return only ascii, not unicode
-------------------------------------+-------------------------------------
Reporter: guettli | Owner: nobody
Type: Bug | Status: reopened
Component: Database layer | Version: master
(models, ORM) | Resolution:
Severity: Release blocker | Triage Stage:
Keywords: | Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by DrMeers):

Hmm, I think you are correct here, Karen; thanks for investigating this
further.

--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:9>

Django

Aug 21, 2012, 9:50:27 PM
to django-...@googlegroups.com
#18063: repr() should return only ascii, not unicode
-------------------------------------+-------------------------------------
Reporter: guettli | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: master
(models, ORM) | Resolution: fixed
Severity: Release blocker | Triage Stage:
Keywords: | Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Meers <simon@…>):

* status: reopened => closed
* resolution: => fixed


Comment:

In [dfe63a52effab2c8b5f72a6aceb8646f03d490bb]:
{{{
#!CommitTicketReference repository=""
revision="dfe63a52effab2c8b5f72a6aceb8646f03d490bb"
Revert "Fixed #18063 -- Avoid unicode in Model.__repr__ in python 2"

This reverts commit 3fce0d2a9162cf6e749a6de0b18890dea8955e89.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:10>

Django

Aug 21, 2012, 9:52:26 PM
to django-...@googlegroups.com
#18063: repr() should return only ascii, not unicode
-------------------------------------+-------------------------------------
Reporter: guettli | Owner: nobody
Type: Bug | Status: reopened
Component: Database layer | Version: master
(models, ORM) | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by DrMeers):

* status: closed => reopened
* resolution: fixed =>

* severity: Release blocker => Normal


Comment:

Auto-"fixed" by git commit; reopening to mark as wontfix/needsinfo

--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:11>

Django

Aug 21, 2012, 9:53:43 PM
to django-...@googlegroups.com
#18063: repr() should return only ascii, not unicode
-------------------------------------+-------------------------------------
Reporter: guettli | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: master
(models, ORM) | Resolution: needsinfo
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by DrMeers):

* status: reopened => closed
* resolution: => needsinfo


Comment:

See Karen's notes above.

--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:12>

Django

Aug 29, 2012, 3:43:33 AM
to django-...@googlegroups.com
#18063: repr() should return only ascii, not unicode
-------------------------------------+-------------------------------------
Reporter: guettli | Owner: nobody
Type: Bug | Status: reopened
Component: Database layer | Version: master
(models, ORM) | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by guettli):

* status: closed => reopened

* resolution: needsinfo =>


Comment:

The Python documentation says about repr:

{{{
... so it is important that the representation is information-rich and
unambiguous
}}}

It should be **unambiguous**. For me a utf8 byte string is ambiguous. It
could be a pure binary string, or it could be another encoding like
latin1. You get strange UnicodeErrors if you pass around utf8 byte strings
in Python 2.x. It is hard to get to the root of the problem, especially if
you are new to Python.
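The ambiguity described here can be shown in a few lines of Python 3. A bytestring carries no record of its encoding, so the same bytes decode without error under both UTF-8 and Latin-1 but yield different text.

```python
data = "Amélie".encode("utf-8")  # b'Am\xc3\xa9lie'

# The same bytes decode "successfully" under both codecs, with different text.
as_utf8 = data.decode("utf-8")
as_latin1 = data.decode("latin-1")
print(as_utf8)    # Amélie
print(as_latin1)  # AmÃ©lie
```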

I understand that Karen is worried about [<Cat: Skittle (now Am?lie)>]
looking strange and broken. But this output is much better than a unicode
exception without any usable output.

--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:13>

Django

Aug 29, 2012, 4:04:36 AM
to django-...@googlegroups.com
#18063: repr() should return only ascii, not unicode
-------------------------------------+-------------------------------------
Reporter: guettli | Owner: nobody
Type: Bug | Status: reopened
Component: Database layer | Version: master
(models, ORM) | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by guettli):

I read the documentation again:
http://docs.python.org/reference/datamodel.html?highlight=repr#object.__repr__

I admit that this is not a good solution:
{{{
[<Cat: Skittle (now Am?lie)>]
}}}

What do you think about this solution? It can be used to recreate the
object, as suggested in the above link.

{{{
[<Cat: Skittle (now Am\xc3\xa9lie)>]
}}}
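The escaped form proposed here can be produced in Python 3 with the standard `'backslashreplace'` error handler, which has worked for decoding since Python 3.5. This is a sketch of one possible approach, not a patch from the ticket. The helper names `escaped_ascii` and `unescape` are hypothetical.

```python
def escaped_ascii(text: str) -> str:
    r"""Hypothetical helper: UTF-8 encode the text, then render each non-ASCII
    byte as a \xNN escape, giving the 'Am\xc3\xa9lie' form proposed above."""
    return text.encode("utf-8").decode("ascii", "backslashreplace")


def unescape(escaped: str) -> str:
    r"""Hypothetical inverse of escaped_ascii(): turn the \xNN escapes back
    into bytes and decode them as UTF-8. Assumes the original text had no
    literal backslashes, since backslashreplace leaves those unescaped."""
    raw = escaped.encode("ascii").decode("unicode_escape").encode("latin-1")
    return raw.decode("utf-8")


shown = escaped_ascii("Skittle (now Amélie)")
print(shown)            # Skittle (now Am\xc3\xa9lie)
print(unescape(shown))  # Skittle (now Amélie)
```

Unlike the `?` substitution, these escapes are ASCII-only and still reversible. The trade-off is that they are harder to read than plain UTF-8 output.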

--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:14>

Django

Sep 2, 2012, 11:02:10 AM
to django-...@googlegroups.com
#18063: repr() should return only ascii, not unicode
-------------------------------------+-------------------------------------
Reporter: guettli | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: master
(models, ORM) | Resolution: wontfix
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by kmtracey):

* status: reopened => closed

* resolution: => wontfix


Comment:

Django code that generates an exception when handed a bytestring which
contains non-ASCII characters is broken. '''That''' is the code that
should be fixed. I strongly believe changing Model `__repr__` to avoid
putting non-ASCII characters in it is the wrong approach. It is
sidestepping a problem rather than fixing the source (or sources) of the
problem.

Bytestrings are ambiguous, yes. You need to rely on some other information
to know how to properly decode bytestrings. However, in Python 2
`__repr__` must return a bytestring, therefore Django must encode to
something. The overall approach taken by Django, consistently, when
unicode support was added, was to "assume utf-8" wherever a bytestring has
to be decoded/encoded and the "correct" encoding is unknown. Returning a
utf-8 encoded bytestring from `__repr__` is consistent with this overall
approach. I don't believe it should be changed.

I'm closing this wontfix. If you can lay out a case (or cases) where non-
ASCII data in `__repr__` leads to exceptions caused by Django code, then
we should fix those issues. But I don't believe the fixes will require
changing the behavior of `__repr__`, and the original description and
discussion so far here have focused solely on `__repr__`, which is the
wrong place to look. Therefore please open a new ticket (or tickets) to
address issues found where Django code cannot correctly handle non-ASCII
data in `__repr__`.

--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:15>

Django

Sep 2, 2012, 1:47:50 PM
to django-...@googlegroups.com
#18063: repr() should return only ascii, not unicode
-------------------------------------+-------------------------------------
Reporter: guettli | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: master
(models, ORM) | Resolution: wontfix
Severity: Normal | Triage Stage:
Keywords: | Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by claudep):

Completely agree with Karen.

However, now that we tend to generalize unicode literals, we should take
care not to use repr in other constructed strings (as in #17566), or we
can easily get a UnicodeDecodeError. Python 3, here we come!

--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:16>

Django

May 7, 2018, 5:40:11 AM
to django-...@googlegroups.com
#18063: repr() should return only ascii, not unicode
-------------------------------------+-------------------------------------
Reporter: Thomas Güttler | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Thomas Güttler):

For the record, there is a corresponding question on Stack Overflow:
https://stackoverflow.com/questions/46726926/unicodedecodeerror-using-django-and-format-strings

--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:17>
