Seen on German Windows7 SP1 with 64bit Python 3.4.1 and gettext 0.18.1.
When you have '''an existing .po file''' with translations, e.g.
{{{
msgid ""
msgstr ""
"Project-Id-Version: \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2014-03-03 10:44+0100\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: \n"
"Language-Team: \n"
"Language: de\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
msgid "Size"
msgstr "Größe"
}}}
and then run
{{{
manage.py makemessages --no_location --no_wrap -l de
}}}
to update the .po file, you get '''a corrupted .po file''':
{{{
msgid ""
msgstr ""
"Project-Id-Version: \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2014-03-03 10:44+0100\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: \n"
"Language-Team: \n"
"Language: de\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
msgid "Size"
msgstr "Größe"
}}}
== Investigation ==
Setting environment variables like LANG, LANGUAGE, LC_ALL, LC_MESSAGES,
etc. has no effect on the outcome. Also calling chcp 65001 on the Windows
console does not fix the problem.
However, it seems like the described behavior was introduced as a side-
effect with this commit
[https://github.com/django/django/commit/dbb48d2bb99a5f660cf2d85f137b8d87fc12d99f]:
All file accesses in ''makemessages.py'' were changed to explicitly use
utf-8, but the stdout of the gettext binaries (like ''msgmerge'' etc.) are
still interpreted with the Windows encoding cp1252, as ''popen_wrapper''
sets ''universal_newline=True'' which in turn uses the encoding returned
by ''locale.getpreferredencoding()''.
As a test I patched ''popen_wrapper'' to interpret output of external
processes with utf-8 encoding instead of
''locale.getpreferredencoding()'':
{{{
def popen_wrapper_utf8(args,
os_err_exc_type=django.core.management.base.CommandError):
"""Monkey-patch for django.core.management.utils.popen_wrapper"""
try:
p = Popen(args, shell=False, stdout=PIPE, stderr=PIPE,
close_fds=os.name != 'nt', universal_newlines=True)
except OSError as e:
six.reraise(os_err_exc_type, os_err_exc_type('Error executing %s:
%s' %
(args[0],
e.strerror)), sys.exc_info()[2])
output, errors = p.communicate()
# Additional utf-8 decoding
output =
output.encode(locale.getpreferredencoding(False)).decode('utf-8')
#
return (
output,
force_text(errors, DEFAULT_LOCALE_ENCODING, strings_only=True),
p.returncode
)
}}}
This has the desired effect and prevents the corruption of the .po files.
I also tried changing the value returned by
''locale.getpreferredencoding()'' to "utf-8", but that seems impossible on
Windows, as Python uses the win32 API ''GetACP()'', which according to
MSDN [http://msdn.microsoft.com/en-
us/library/windows/desktop/dd318070(v=vs.85).aspx] only returns ANSI
codepages and thus will never return "utf-8".
--
Ticket URL: <https://code.djangoproject.com/ticket/23271>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 0
* needs_docs: => 0
* severity: Normal => Release blocker
* needs_tests: => 0
* stage: Unreviewed => Accepted
Comment:
I'm guilty for the regression, but I'm afraid I will not be able to debug
this issue as I don't have access to Windows machines.
If noone else can fix it, I can revert the patch in the 1.7 branch to
remove the release blocking for 1.7.
--
Ticket URL: <https://code.djangoproject.com/ticket/23271#comment:1>
Comment (by andrewgodwin):
I am also without a dev environment. If we have to remove the patch from
1.7, and we think we can do it cleanly, then it's an option.
--
Ticket URL: <https://code.djangoproject.com/ticket/23271#comment:2>
* version: 1.7-rc-2 => master
Comment:
[cdfefbec721b59695e28] has been reverted on the 1.7.x branch. I've applied
[67870137b9e1f1] instead to fix #22686, but I'd like to keep the more
extensive fix on master. The release blocking flag is now only applicable
to master.
--
Ticket URL: <https://code.djangoproject.com/ticket/23271#comment:3>
Comment (by timgraham):
Ramiro said he would have some time to look at this before 1.8 alpha.
--
Ticket URL: <https://code.djangoproject.com/ticket/23271#comment:4>
* has_patch: 0 => 1
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/23271#comment:5>
* status: new => closed
* resolution: => fixed
Comment:
In [changeset:"6fb9dee470d57882e378247fd2706d5f9867b5f9"]:
{{{
#!CommitTicketReference repository=""
revision="6fb9dee470d57882e378247fd2706d5f9867b5f9"
Fixed #23271 -- Don't corrupt PO files on Windows when updating them.
Make sure PO catalog text fetched from gettext programs via standard
output isn't corrupted by mismatch between assumed (UTF-8) and real
(CP1252) encodings. This can cause mojibake to be written when creating
or updating PO files.
Also fixes #23311.
Thanks to contributor with Trac nick 'danielmenzel' for the report,
excellent research and fix.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/23271#comment:6>
* status: closed => new
* has_patch: 1 => 0
* resolution: fixed =>
* stage: Ready for checkin => Accepted
Comment:
Python 3 looks good, but now I have test failures on Python 2 and Windows.
Can you reproduce?
--
Ticket URL: <https://code.djangoproject.com/ticket/23271#comment:7>
Comment (by Ramiro Morales <cramm0@…>):
In [changeset:"002a8ffe478b7520a64c9176f9f640633f643b9c"]:
{{{
#!CommitTicketReference repository=""
revision="002a8ffe478b7520a64c9176f9f640633f643b9c"
Fixed breakage by 6fb9dee4 under Python2+Windows.
Refs #23271
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/23271#comment:8>
* status: new => closed
* resolution: => fixed
--
Ticket URL: <https://code.djangoproject.com/ticket/23271#comment:9>
* status: closed => new
* resolution: fixed =>
Comment:
Regression fails on my machine (Windows 7 64, Django development version
head, with
{{{
locale.getpreferredencoding()=='cp1255'
}}}
{{{
======================================================================
ERROR: test_po_file_encoding_when_updating
(i18n.test_extraction.BasicExtractorTests)
Update of PO file doesn't corrupt it with non-UTF-8 encoding on
Python3+Windows (#23271)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\data\dev\django\django\tests\i18n\test_extraction.py", line
407, in test_po_file_encoding_when_updating
management.call_command('makemessages', locale=['pt_BR'], verbosity=0)
File "C:\data\dev\django\django\django\core\management\__init__.py",
line 118, in call_command
return command.execute(*args, **defaults)
File "C:\data\dev\django\django\django\core\management\base.py", line
398, in execute
output = self.handle(*args, **options)
File
"C:\data\dev\django\django\django\core\management\commands\makemessages.py",
line 322, in handle
self.write_po_file(potfile, locale)
File
"C:\data\dev\django\django\django\core\management\commands\makemessages.py",
line 455, in write_po_file
msgs, errors, status = gettext_popen_wrapper(args)
File
"C:\data\dev\django\django\django\core\management\commands\makemessages.py",
line 39, in
gettext_popen_wrapper
stdout, stderr, status_code = popen_wrapper(args,
os_err_exc_type=os_err_exc_type)
File "C:\data\dev\django\django\django\core\management\utils.py", line
27, in popen_wrapper
output, errors = p.communicate()
File "C:\Python34\lib\subprocess.py", line 959, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "C:\Python34\lib\subprocess.py", line 1234, in _communicate
stdout = stdout[0]
IndexError: list index out of range
}}}
Also, this seems also to be the source of #21928 which isn't resolved for
other people.
The problem is that Popen still tries to encode msgmerge output stream
using DEFAULT_LOCALE_ENCODING, when the data is actually in UTF-8.
A way around it is to call Popen with universal_newlines=true, and then
take care of line ending inside gettext_popen_wrapper. I'll post a patch
soon.
(BTW the current solution is for Windows only. Linux uses have to set
undocumented environment variables. I think we should strive to solve this
for Linux users as well).
--
Ticket URL: <https://code.djangoproject.com/ticket/23271#comment:10>
Comment (by daphshez):
Well, I patched the code and sent a pull request at
[https://github.com/django/django/pull/4532]
This is my first attempt to contribute code and I seem to have messed up
with the process. I am sorry about it. I hope my fix will get considered
anyway...
--
Ticket URL: <https://code.djangoproject.com/ticket/23271#comment:11>
* status: new => closed
* resolution: => fixed
Comment:
In [changeset:"57202a112a966593857725071ecd652a87c157fb" 57202a11]:
{{{
#!CommitTicketReference repository=""
revision="57202a112a966593857725071ecd652a87c157fb"
Fixed #23271 -- Fixed makemessages crash/test failure for some locales.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/23271#comment:12>
Comment (by Tim Graham <timograham@…>):
In [changeset:"c45fd57f682499af0e72f163b87b261dfaf0fbbc" c45fd57f]:
{{{
#!CommitTicketReference repository=""
revision="c45fd57f682499af0e72f163b87b261dfaf0fbbc"
[1.8.x] Fixed #23271 -- Fixed makemessages crash/test failure for some
locales.
Backport of 57202a112a966593857725071ecd652a87c157fb from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/23271#comment:13>