{{{
# coding=utf-8
from __future__ import absolute_import, unicode_literals

import sys

import pytest
from django.core.management.base import OutputWrapper
from django.utils.encoding import smart_bytes


def test_bad_unicode_case_names():
    bad_name = smart_bytes(u'£')
    ow = OutputWrapper(sys.stdout)
    with pytest.raises(UnicodeDecodeError):
        ow.write(bad_name)
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/26731>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0
Old description:
> In a management command in Python 2.7, if you include unicode characters
> when writing to stdout (with self.stdout.write) you will get a
> UnicodeDecodeError
>
> {{{
> # coding=utf-8
> from __future__ import absolute_import, unicode_literals
>
> import sys
>
> import pytest
> from django.core.management.base import OutputWrapper
> from django.utils.encoding import smart_bytes
>
> def test_bad_unicode_case_names():
>     bad_name = smart_bytes(u'£')
>     ow = OutputWrapper(sys.stdout)
>     with pytest.raises(UnicodeDecodeError):
>         ow.write(bad_name)
> }}}
New description:
In a management command on Python 2.7, if you include non-ASCII characters
when writing to stdout (with `self.stdout.write`), you will get a
`UnicodeDecodeError`:
{{{
# coding=utf-8
from __future__ import absolute_import, unicode_literals

import sys

import pytest
from django.core.management.base import OutputWrapper
from django.utils.encoding import smart_bytes


def test_bad_unicode_names():
    bad_name = smart_bytes(u'£')
    ow = OutputWrapper(sys.stdout)
    with pytest.raises(UnicodeDecodeError):
        ow.write(bad_name)
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/26731#comment:1>
Comment (by timgraham):
How do you end up with a situation where you cast a unicode string with
non-ASCII characters to bytes?
--
Ticket URL: <https://code.djangoproject.com/ticket/26731#comment:2>
Comment (by dhobbs):
The string came from the db. The actual error came from
`django/core/management/base.py`, line 111, in `write`.
--
Ticket URL: <https://code.djangoproject.com/ticket/26731#comment:3>
Comment (by timgraham):
So the broken code is
`self.stdout.write('{}'.format(possibly_unicode_string_from_db))` without
`unicode_literals`?
--
Ticket URL: <https://code.djangoproject.com/ticket/26731#comment:4>
Comment (by claudep):
Apart from the content of a `BinaryField`, I don't see how any non-ASCII
bytestring can come from the database.
--
Ticket URL: <https://code.djangoproject.com/ticket/26731#comment:5>
Comment (by timgraham):
The issue is that the non-ASCII Unicode string from the database is
coerced into the bytestring `'{}'` (basically the same situation as
#21933).
--
Ticket URL: <https://code.djangoproject.com/ticket/26731#comment:6>
Comment (by dhobbs):
It's also compounded by the fact that `sys.stdout.write` copes with it but
`self.stdout.write` doesn't.
--
Ticket URL: <https://code.djangoproject.com/ticket/26731#comment:7>
Comment (by timgraham):
It's because `OutputWrapper`'s default `ending` is `u'\n'` so we end up
comparing bytestring to Unicode in `msg.endswith(ending)`. I'll leave it
up to Claude or another Unicode expert about the correct resolution for
this.
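
A minimal sketch of the comparison described above, written so that it runs on Python 3. The helper `implicit_py2_decode` is hypothetical: it makes explicit the ASCII decoding that Python 2 performs implicitly when a bytestring meets a Unicode string, and it is not part of Django.

```python
# Runs on Python 3. The explicit decode() stands in for the implicit
# bytestring -> unicode promotion that Python 2 does with the ASCII codec.
def implicit_py2_decode(data):
    """Hypothetical helper: mimic Python 2's implicit ASCII decoding."""
    return data.decode("ascii")


msg = "£".encode("utf-8")  # b'\xc2\xa3', the same bytes smart_bytes(u'£') returns
ending = "\n"              # a Unicode ending, as in the comparison above

try:
    implicit_py2_decode(msg).endswith(ending)
    outcome = "ok"
except UnicodeDecodeError as exc:
    outcome = "UnicodeDecodeError: " + exc.reason

# A pure-ASCII bytestring survives the same promotion without error.
ascii_ok = implicit_py2_decode(b"plain").endswith(ending)
```

Under these assumptions the non-ASCII message fails before any output is written, while an ASCII-only bytestring goes through, which would explain why the problem only shows up with some data.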
--
Ticket URL: <https://code.djangoproject.com/ticket/26731#comment:8>
Comment (by claudep):
@dhobbs It's still a bit mysterious to us how you got the non-ASCII
bytestring; that *might* be the bug in the first place. Could you elaborate
a bit on your use case?
--
Ticket URL: <https://code.djangoproject.com/ticket/26731#comment:9>
Comment (by timgraham):
`'{}'.format(possibly_unicode_string_from_db)` gives `str` on Python 2.
--
Ticket URL: <https://code.djangoproject.com/ticket/26731#comment:10>
Comment (by claudep):
{{{
>>> print('{}'.format(u'un café ?'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 6: ordinal not in range(128)
}}}
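
The traceback above can be reproduced on Python 3 by making the coercion explicit. This is only a sketch: `implicit_py2_encode` is a hypothetical helper that stands in for what Python 2 does when a Unicode value is formatted into a bytestring template.

```python
# Runs on Python 3. The explicit encode() stands in for the implicit
# unicode -> str coercion that Python 2 performs with the ASCII codec.
def implicit_py2_encode(text):
    """Hypothetical helper: mimic Python 2's implicit ASCII encoding."""
    return text.encode("ascii")


try:
    implicit_py2_encode("un café ?")
    failure = None
except UnicodeEncodeError as exc:
    # Record the offending character and its position in the string.
    failure = (exc.object[exc.start], exc.start)

# Formatting into a text (Unicode) template needs no coercion at all.
safe = "{}".format("un café ?")
```

On Python 2 the equivalent of the last line is a Unicode template, either `u'{}'` or a plain literal under `unicode_literals`.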
--
Ticket URL: <https://code.djangoproject.com/ticket/26731#comment:11>
Comment (by timgraham):
I'm using this management command:
{{{
# -*- coding: utf-8 -*-
from django.core.management.base import BaseCommand

from polls.models import Question


class Command(BaseCommand):
    def handle(self, *args, **options):
        # On Python 2, str.format() on a bytestring template calls the model's __str__.
        v = 'Output: {}'.format(Question.objects.latest('id'))
        print(type(v))
        print(v)
        self.stdout.write(v)
}}}
with a question with some non-ASCII chars in the name.
--
Ticket URL: <https://code.djangoproject.com/ticket/26731#comment:12>
* type: Uncategorized => Bug
* component: Uncategorized => Core (Management commands)
* stage: Unreviewed => Accepted
Comment:
Wow, I realize now that `format` or `%` (mod) are calling the `__str__` of
the model. Please, Python 3, come soon!
--
Ticket URL: <https://code.djangoproject.com/ticket/26731#comment:13>
* has_patch: 0 => 1
Comment:
[https://github.com/django/django/pull/6755 PR]
--
Ticket URL: <https://code.djangoproject.com/ticket/26731#comment:14>
* needs_better_patch: 0 => 1
Comment:
Tests aren't passing on Windows.
--
Ticket URL: <https://code.djangoproject.com/ticket/26731#comment:15>
* keywords: => py2
Comment:
If someone is interested in the fix that Claude proposed, they'll need to
debug the Windows test failures and propose an updated patch.
--
Ticket URL: <https://code.djangoproject.com/ticket/26731#comment:16>
* status: new => closed
* resolution: => wontfix
Comment:
Closing due to the end of Python 2 support in master in a couple of weeks.
--
Ticket URL: <https://code.djangoproject.com/ticket/26731#comment:17>