[Django] #26093: makemessages messes up unicode characters on Python 3


Django

Jan 18, 2016, 8:13:47 AM
to django-...@googlegroups.com
#26093: makemessages messes up unicode characters on Python 3
--------------------------------------+---------------------
Reporter: sephii | Owner: nobody
Type: Bug | Status: new
Component: Internationalization | Version: 1.9
Severity: Normal | Keywords: python3
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+---------------------
If you run the makemessages command on Python 3 (tested on Python 3.4.2,
Django 1.9.1) and you have strings that contain Unicode characters, they
are incorrectly escaped or even stripped out of the generated PO file.

For example, in a template with {% trans "hello world" %} (where the space
is the Unicode character U+202F), you end up with the msgid
"hello\\u202fworld", so the original string is no longer recognized as a
translation key. With the non-breaking space character (U+00A0), the
character disappears completely and the msgid becomes "helloworld".

The same works fine on Python 2: the Unicode characters are preserved in
the resulting PO file.

--
Ticket URL: <https://code.djangoproject.com/ticket/26093>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Jan 18, 2016, 1:52:10 PM
to django-...@googlegroups.com
#26093: makemessages messes up unicode characters on Python 3
-------------------------------------+-------------------------------------
Reporter: sephii | Owner: nobody
Type: Bug | Status: new
Component: Internationalization | Version: 1.9
Severity: Normal | Resolution:
Keywords: python3 | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by claudep):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


Comment:

I was not able to reproduce. It might be nice to provide a test in the
Django test suite to ensure the behavior is correct (or not!).

--
Ticket URL: <https://code.djangoproject.com/ticket/26093#comment:1>

Django

Jan 18, 2016, 1:55:51 PM
to django-...@googlegroups.com
#26093: makemessages messes up unicode characters on Python 3
--------------------------------------+------------------------------------
Reporter: sephii | Owner: nobody
Type: Bug | Status: new
Component: Internationalization | Version: 1.9
Severity: Normal | Resolution:
Keywords: python3 | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by claudep):

* stage: Unreviewed => Accepted


Comment:

I may have spoken too soon. There is no problem with É, for example, but
with a non-breaking space, makemessages outputs:
`./templates/ext_edit.html.py:25: invalid multibyte sequence`.

--
Ticket URL: <https://code.djangoproject.com/ticket/26093#comment:2>

Django

Jan 19, 2016, 12:46:48 PM
to django-...@googlegroups.com
#26093: makemessages messes up unicode characters on Python 3
--------------------------------------+------------------------------------
Reporter: sephii | Owner: nobody
Type: Bug | Status: new
Component: Internationalization | Version: 1.9
Severity: Normal | Resolution:
Keywords: python3 | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by claudep):

The failing test on Python 3:
https://github.com/django/django/compare/master...claudep:26093?expand=1

--
Ticket URL: <https://code.djangoproject.com/ticket/26093#comment:3>

Django

Jan 19, 2016, 2:30:53 PM
to django-...@googlegroups.com
#26093: makemessages messes up unicode characters on Python 3
--------------------------------------+------------------------------------
Reporter: sephii | Owner: nobody
Type: Bug | Status: new
Component: Internationalization | Version: 1.9
Severity: Normal | Resolution:
Keywords: python3 | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by claudep):

This issue is related to the way xgettext interprets escape sequences in
Python source files.
`u'sequence: \xa0'` (note the prefix) is interpreted as ending with a
Unicode non-breaking space (correct).
`'sequence: \xa0'` (without the prefix) is interpreted as ending with a
\xa0 byte (which is not valid UTF-8).
There are not many characters that %r outputs as an escape, but the
non-breaking space is still an important use case.

So xgettext is still interpreting strings in the Python 2 way, as it
cannot differentiate between Python versions by simply reading the source
file.
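
The behavior described above can be checked with a short Python 3 sketch.
It only illustrates which characters `%r` escapes and why a Python 2-style
reading of `\xa0` is not valid UTF-8; it does not call xgettext, and the
`samples` dict and `is_valid_utf8` helper are names made up for this
illustration.

```python
# Python 3 only. Characters mentioned in this ticket; keys are just labels.
samples = {
    "E acute": "\u00c9",
    "no-break space": "\u00a0",
    "narrow no-break space": "\u202f",
}


def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for label, char in samples.items():
    # %r escapes the two non-printable spaces but leaves É as-is.
    print(label, "%r" % ("x" + char))

# Reading '\xa0' the Python 2 way yields the single byte 0xA0 ...
py2_style = b"sequence: \xa0"
# ... whereas the Python 3 text encodes to the two bytes 0xC2 0xA0.
py3_style = "sequence: \u00a0".encode("utf-8")
print(is_valid_utf8(py2_style), is_valid_utf8(py3_style))
```

This matches the earlier observation that É causes no problem while the
non-breaking space does: only the latter is written out as an escape
sequence that xgettext then has to interpret.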

A possible workaround would be to force outputting the `u''` prefix on
Python 3 when we templatize templates during the extraction process.
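
A minimal sketch of that idea, assuming the literal is produced with
`repr()`. The helper name `as_unicode_literal` is hypothetical and this is
not the code from the eventual patch; it only shows that a `u`-prefixed
literal still round-trips on Python 3 (3.3 and later accept the prefix).

```python
import ast


def as_unicode_literal(text):
    """Hypothetical helper: repr() of a str with an explicit u prefix."""
    return "u" + repr(text)


literal = as_unicode_literal("sequence: \u00a0")
print(literal)

# The prefixed literal evaluates back to the original text on Python 3.
round_tripped = ast.literal_eval(literal)
print(round_tripped == "sequence: \u00a0")
```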

--
Ticket URL: <https://code.djangoproject.com/ticket/26093#comment:4>

Django

Jan 23, 2016, 4:16:17 AM
to django-...@googlegroups.com
#26093: makemessages messes up unicode characters on Python 3
--------------------------------------+------------------------------------
Reporter: sephii | Owner: nobody
Type: Bug | Status: new
Component: Internationalization | Version: 1.9
Severity: Normal | Resolution:
Keywords: python3 | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by claudep):

* has_patch: 0 => 1


Comment:

[https://github.com/django/django/pull/6018 PR]

--
Ticket URL: <https://code.djangoproject.com/ticket/26093#comment:5>

Django

Jan 23, 2016, 7:19:46 AM
to django-...@googlegroups.com
#26093: makemessages messes up unicode characters on Python 3
-------------------------------------+-------------------------------------
Reporter: sephii | Owner: nobody
Type: Bug | Status: new
Component: Internationalization | Version: 1.9
Severity: Normal | Resolution:
Keywords: python3 | Triage Stage: Ready for checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/26093#comment:6>

Django

Jan 23, 2016, 8:02:18 AM
to django-...@googlegroups.com
#26093: makemessages messes up unicode characters on Python 3
-------------------------------------+-------------------------------------
Reporter: sephii | Owner: nobody
Type: Bug | Status: closed
Component: Internationalization | Version: 1.9
Severity: Normal | Resolution: fixed
Keywords: python3 | Triage Stage: Ready for checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Claude Paroz <claude@…>):

* status: new => closed
* resolution: => fixed


Comment:

In [changeset:"104eddbdf6c31984b5afbdf5477267570de6d0f4" 104eddb]:
{{{
#!CommitTicketReference repository=""
revision="104eddbdf6c31984b5afbdf5477267570de6d0f4"
Fixed #26093 -- Allowed escape sequences extraction by gettext on Python 3

Thanks Sylvain Fankhauser for the report and Tim Graham for the review.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/26093#comment:7>
