--
Ticket URL: <https://code.djangoproject.com/ticket/18063>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0
Comment:
Reference to the Python Doc:
http://docs.python.org/reference/datamodel.html?highlight=repr#object.__repr__
{{{
The return value must be a string object
}}}
It must not be a unicode object in Python 2.x.
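Python 3 enforces this requirement at runtime. `repr()` raises `TypeError` when `__repr__` returns anything other than `str`, including `bytes`. Python 2 was looser and accepted a bytestring, which is the case the rest of this ticket is about. Here is a minimal Python 3 sketch. The classes and the `try_repr` helper are hypothetical:

```python
class BytesRepr:
    """Hypothetical class whose __repr__ returns bytes, as a Python 2 model might."""

    def __repr__(self):
        return "<BytesRepr: Am\u00e9lie>".encode("utf-8")


class TextRepr:
    """Hypothetical class whose __repr__ returns text, which Python 3 requires."""

    def __repr__(self):
        return "<TextRepr: Am\u00e9lie>"


def try_repr(obj):
    """Return repr(obj), or a 'TypeError: ...' string if __repr__ returned a non-str."""
    try:
        return repr(obj)
    except TypeError as exc:
        return "TypeError: %s" % exc
```

With these definitions, `try_repr(TextRepr())` returns the text unchanged, while `try_repr(BytesRepr())` reports a `TypeError`.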
--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:1>
* component: Uncategorized => Database layer (models, ORM)
* type: Uncategorized => Bug
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:2>
--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:3>
* cc: real.human@… (added)
Comment:
Updated patch takes into account Python 3.x compatibility.
--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:4>
* version: 1.4 => master
--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:5>
Comment (by mrmachine):
Moved `encode()` out of try/except block.
--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:6>
* status: new => closed
* resolution: => fixed
Comment:
In [3fce0d2a9162cf6e749a6de0b18890dea8955e89]:
{{{
#!CommitTicketReference repository="" revision="3fce0d2a9162cf6e749a6de0b18890dea8955e89"
Fixed #18063 -- Avoid unicode in Model.__repr__ in python 2
Thanks guettli and mrmachine.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:7>
* status: closed => reopened
* resolution: fixed =>
* severity: Normal => Release blocker
* stage: Accepted => Unreviewed
Comment:
I don't believe this fix is correct. More broadly, the problem description
is not correct. Nowhere in the referenced Python documentation
(http://docs.python.org/reference/datamodel.html?highlight=repr#object.__repr__)
does it say the (byte)string returned by `__repr__` must contain only
ASCII characters. Django was not returning unicode from `__repr__`; it was
returning a utf-8 encoded bytestring. That's perfectly legal Python. Some
other Python tools handle non-ASCII data unhelpfully: they report
"unprintable exception object" when an exception is raised involving a
model instance with non-ASCII data in its repr. That indicates a bug
somewhere else. That bug should be fixed at its source, not by removing
non-ASCII characters from model instances' repr.
The referenced doc also states "This is typically used for debugging, so
it is important that the representation is information-rich and
unambiguous." This change has moved in the wrong direction on that score.
Consider before the change:
{{{
--> ./manage.py shell
Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import django
>>> django.get_version()
u'1.5.dev20120819181059'
>>> from ctrac.models import Cat
>>> Cat.objects.filter(adopted_name__startswith='Am')
[<Cat: Skittle (now Amélie)>]
>>> quit()
}}}
After the change:
{{{
--> ./manage.py shell
Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import django
>>> django.get_version()
u'1.5.dev20120821213816'
>>> from ctrac.models import Cat
>>> Cat.objects.filter(adopted_name__startswith='Am')
[<Cat: Skittle (now Am?lie)>]
>>>
}}}
That second output would make me worry that my data had been corrupted,
when in fact it has not. It is just `__repr__` that is now corrupting it
on output.
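Plain string encoding reproduces the difference between the two sessions. Here is a minimal Python 3 sketch. It assumes the patched `__repr__` turned every non-ASCII character into `?`. The `Am?lie` output suggests something like the `"replace"` error handler, but the actual patch is not shown here. The helper names are hypothetical:

```python
ADOPTED = "Skittle (now Am\u00e9lie)"  # the value actually stored in the database


def utf8_bytes(text):
    """Pre-patch behaviour: encode as UTF-8, which keeps every character."""
    return text.encode("utf-8")


def ascii_replaced_bytes(text):
    """Assumed post-patch behaviour: each non-ASCII character becomes '?'."""
    return text.encode("ascii", "replace")


def round_trips(encoded, encoding):
    """True if decoding the bytes gives back the original stored value."""
    return encoded.decode(encoding) == ADOPTED
```

Only the UTF-8 bytes decode back to the stored value. Once `é` has become `?`, nothing in the repr allows the original character to be recovered.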
I believe this change should be reverted and the ticket closed as either
wontfix or needsinfo. needsinfo would mean investigating under what
conditions the real problem (unprintable exception objects) occurs and
checking whether Django is doing anything to cause it (though I rather
suspect it is a base Python problem).
Marking as release blocker because this has introduced a regression in
functionality from the previous release.
--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:8>
Comment (by DrMeers):
Hmm, I think you are correct here, Karen; thanks for investigating this
further.
--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:9>
* status: reopened => closed
* resolution: => fixed
Comment:
In [dfe63a52effab2c8b5f72a6aceb8646f03d490bb]:
{{{
#!CommitTicketReference repository="" revision="dfe63a52effab2c8b5f72a6aceb8646f03d490bb"
Revert "Fixed #18063 -- Avoid unicode in Model.__repr__ in python 2"
This reverts commit 3fce0d2a9162cf6e749a6de0b18890dea8955e89.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:10>
* status: closed => reopened
* resolution: fixed =>
* severity: Release blocker => Normal
Comment:
Auto-"fixed" by git commit; reopening to mark as wontfix/needsinfo
--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:11>
* status: reopened => closed
* resolution: => needsinfo
Comment:
See Karen's notes above.
--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:12>
* status: closed => reopened
* resolution: needsinfo =>
Comment:
The Python documentation says about repr:
{{{
... so it is important that the representation is information-rich and
unambiguous
}}}
It should be **unambiguous**. For me a utf8 byte string is ambiguous. It
could be a pure binary string, or it could be in another encoding such as
latin1. You get strange UnicodeErrors if you pass utf8 byte strings around
in Python 2.x. It is hard to get to the root of the problem, especially if
you are new to Python.
I understand that Karen is worried that [<Cat: Skittle (now Am?lie)>]
looks strange and broken. But this output is much better than a unicode
exception without any usable output.
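The ambiguity is easy to demonstrate because the bytes themselves do not record which encoding produced them. Here is a minimal Python 3 sketch that decodes the UTF-8 bytes for `Amélie` in three ways. The `decode_as` helper is hypothetical:

```python
RAW = b"Am\xc3\xa9lie"  # UTF-8 bytes for "Am\u00e9lie"


def decode_as(raw, encoding):
    """Decode raw bytes with the given codec, reporting failures instead of raising."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return "UnicodeDecodeError: %s" % exc.reason
```

Latin-1 maps every byte value to a character, so decoding as latin-1 never fails. A wrong guess therefore produces mojibake (`AmÃ©lie`) without any error. ASCII rejects the same bytes outright.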
--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:13>
Comment (by guettli):
I read the documentation again:
http://docs.python.org/reference/datamodel.html?highlight=repr#object.__repr__
I admit that this is not a good solution:
{{{
[<Cat: Skittle (now Am?lie)>]
}}}
What do you think about this solution? It can be used to recreate the
object, as suggested in the link above.
{{{
[<Cat: Skittle (now Am\xc3\xa9lie)>]
}}}
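The escaped form proposed here comes from writing each non-ASCII byte of the UTF-8 encoding as a `\xNN` escape. The result is pure ASCII and can be turned back into the original text. Here is a minimal Python 3 sketch. `escaped_repr_text` and `recover` are hypothetical helpers. The `"backslashreplace"` error handler on decode needs Python 3.5 or later. `recover` also assumes the text contains no quotes or backslashes of its own:

```python
import ast


def escaped_repr_text(text):
    """Encode as UTF-8, then show each non-ASCII byte as a \\xNN escape (ASCII-only result)."""
    return text.encode("utf-8").decode("ascii", "backslashreplace")


def recover(escaped):
    """Parse the escaped text as a bytes literal and decode it back to the original string.

    Assumes the text contains no quotes or backslashes other than the \\xNN escapes.
    """
    raw = ast.literal_eval("b'" + escaped + "'")
    return raw.decode("utf-8")
```

For example, `escaped_repr_text("Skittle (now Amélie)")` gives `Skittle (now Am\xc3\xa9lie)`, matching the proposal above, and `recover` turns that back into `Skittle (now Amélie)`.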
--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:14>
* status: reopened => closed
* resolution: => wontfix
Comment:
Django code that generates an exception when handed a bytestring which
contains non-ASCII characters is broken. '''That''' is the code that
should be fixed. I strongly believe changing Model `__repr__` to avoid
putting non-ASCII characters in it is the wrong approach. It is
sidestepping a problem rather than fixing the source (or sources) of the
problem.
Bytestrings are ambiguous, yes. You need to rely on some other information
to know how to properly decode bytestrings. However, in Python 2
`__repr__` must return a bytestring, therefore Django must encode to
something. The overall approach taken by Django, consistently, when
unicode support was added, was to "assume utf-8" wherever a bytestring has
to be decoded/encoded and the "correct" encoding is unknown. Returning a
utf-8 encoded bytestring from `__repr__` is consistent with this overall
approach. I don't believe it should be changed.
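Here is a rough Python 3 sketch of that "assume utf-8" approach for a Python 2 style `__repr__`. It builds the `<Class: text>` representation as text, then encodes it as UTF-8. A Python 3 `__repr__` must return `str`, so this is written as a standalone function returning `bytes`. `Cat` is modelled on the shell session earlier in this ticket. `Cat`, `py2_style_repr`, and the `[Bad Unicode data]` fallback are illustrative assumptions, not Django's actual code:

```python
class Cat:
    """Hypothetical model-like class modelled on the shell session above."""

    def __init__(self, name, adopted_name):
        self.name = name
        self.adopted_name = adopted_name

    def __str__(self):
        return "%s (now %s)" % (self.name, self.adopted_name)


def py2_style_repr(obj, encoding="utf-8"):
    """Build '<Class: text>' and encode it, since a Python 2 __repr__ returned bytes."""
    try:
        text = str(obj)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Fallback used only if converting the object to text fails.
        text = "[Bad Unicode data]"
    # Encoding happens outside the try block, so an encoding error would
    # propagate instead of being masked by the fallback above.
    return ("<%s: %s>" % (type(obj).__name__, text)).encode(encoding)
```

Decoding the result with UTF-8 gives back `<Cat: Skittle (now Amélie)>` exactly, which matches the information-rich output shown before the patch.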
I'm closing this wontfix. If you can lay out a case (or cases) where non-
ASCII data in `__repr__` leads to exceptions caused by Django code, then
we should fix those issues. But I don't believe the fixes will require
changing the behavior of `__repr__`, and the original description and
discussion so far here has focused solely on `__repr__`, which is the
wrong place to look. Therefore please open a new ticket (or tickets) to
address issues found where Django code cannot correctly handle non-ASCII
data in `__repr__`.
--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:15>
Comment (by claudep):
Completely agree with Karen.
However, now that we are moving towards using unicode literals everywhere,
we should take care not to use repr in other constructed strings (as in
#17566), or we can easily end up with a !UnicodeDecodeError. Python 3,
here we come!
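The failure comes from Python 2's implicit coercion. When a bytestring, such as a UTF-8 `__repr__` result, is interpolated into a unicode string, Python 2 decodes it with the default ASCII codec. That fails as soon as the bytes contain non-ASCII data. Here is a rough Python 3 emulation. `py2_interpolate` only imitates that implicit decode, and all three helpers are hypothetical:

```python
def py2_interpolate(template, raw):
    """Roughly mimic Python 2's u"..." % bytestring: the bytes are decoded as ASCII first."""
    return template % raw.decode("ascii")


def py3_interpolate(template, raw):
    """Python 3: %s on a bytes object inserts its repr (b'...') instead of raising."""
    return template % (raw,)


def fails_with_unicode_error(template, raw):
    """True if the Python 2 style interpolation raises UnicodeDecodeError."""
    try:
        py2_interpolate(template, raw)
    except UnicodeDecodeError:
        return True
    return False
```

Python 3 removes the implicit decode, so the same interpolation no longer raises. Instead, it quietly embeds `b'...'` in the result.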
--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:16>
Comment (by Thomas Güttler):
Just for the record, there is a related question:
https://stackoverflow.com/questions/46726926/unicodedecodeerror-using-django-and-format-strings
--
Ticket URL: <https://code.djangoproject.com/ticket/18063#comment:17>