{{{
CommandError: errors happened while running msguniq
C:\dev\xxx\locale\django.pot:2738: C:\dev\xxx\locale\django.pot: input is
not valid in "ASCII" encoding
}}}
This is because some of my translatable strings contain non-ASCII
characters. I've checked the code in `makemessages.py` and found the
culprit:
{{{
for line in pot_lines:
    if not found and not header_read:
        found = True
        line = line.replace('charset=CHARSET', 'charset=UTF-8')
    if not line and not found:
        header_read = True
    lines.append(line)
}}}
Since `found` is set to `True` on the first iteration, the charset is never
updated: the `charset=CHARSET` placeholder is usually on line 17 of the POT
file, well after the first line.
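To make the failure concrete, here is a minimal standalone sketch, not the actual patch. `set_charset_buggy` wraps the loop quoted above in a function; `set_charset_sketch` is one possible correction that only sets `found` once the placeholder has actually been seen. Both function names and the sample lines are made up for illustration.

```python
def set_charset_buggy(pot_lines):
    """The loop quoted above: `found` flips to True on the very first line."""
    lines = []
    found, header_read = False, False
    for line in pot_lines:
        if not found and not header_read:
            found = True
            line = line.replace('charset=CHARSET', 'charset=UTF-8')
        if not line and not found:
            header_read = True
        lines.append(line)
    return lines


def set_charset_sketch(pot_lines):
    """Hypothetical correction: mark `found` only when the placeholder is
    present, and stop looking once the header's blank line has been passed."""
    lines = []
    found, header_read = False, False
    for line in pot_lines:
        if not found and not header_read:
            if 'charset=CHARSET' in line:
                found = True
                line = line.replace('charset=CHARSET', 'charset=UTF-8')
        if not line and not found:
            header_read = True
        lines.append(line)
    return lines


# Shortened, made-up POT content; a real header is longer.
SAMPLE_POT = [
    '# SOME DESCRIPTIVE TITLE.',
    'msgid ""',
    'msgstr ""',
    '"Content-Type: text/plain; charset=CHARSET\\n"',
    '',
    'msgid "charset=CHARSET"',
    'msgstr ""',
]

buggy = set_charset_buggy(SAMPLE_POT)
fixed = set_charset_sketch(SAMPLE_POT)
```

With the quoted loop the header line comes back unchanged, so `msguniq` still sees `charset=CHARSET`. With the sketch only the header line is rewritten, and the translatable string that happens to contain the same text is left alone.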
--
Ticket URL: <https://code.djangoproject.com/ticket/29452>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* owner: nobody => Bartosz Grabski
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/29452#comment:1>
* type: Uncategorized => Bug
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/29452#comment:2>
* has_patch: 0 => 1
Comment:
PR: https://github.com/django/django/pull/9997
--
Ticket URL: <https://code.djangoproject.com/ticket/29452#comment:3>
* needs_better_patch: 0 => 1
Comment:
I would add a test for this.
--
Ticket URL: <https://code.djangoproject.com/ticket/29452#comment:4>
Comment (by Bartosz Grabski):
Will do.
--
Ticket URL: <https://code.djangoproject.com/ticket/29452#comment:5>
Comment (by Ramiro Morales):
I'm the one who introduced this bug in
6ab0d1358fc78077064aab88a4fb0a47ca116391. ''Mea culpa''.
I can contribute a test case (also attached):
{{{
#!diff
diff --git a/tests/i18n/commands/templates/test.html b/tests/i18n/commands/templates/test.html
index cac034e..3868dc1 100644
--- a/tests/i18n/commands/templates/test.html
+++ b/tests/i18n/commands/templates/test.html
@@ -105,3 +105,5 @@ Plural for a `trans` and `blocktrans` collision case
{% endblocktrans %}
{% trans "Non-breaking space :" %}
+
+{% trans "Nón-ÁSCÍÏ text" %}
diff --git a/tests/i18n/test_extraction.py b/tests/i18n/test_extraction.py
index d9ce3b4..e7557fc 100644
--- a/tests/i18n/test_extraction.py
+++ b/tests/i18n/test_extraction.py
@@ -394,6 +394,14 @@ class BasicExtractorTests(ExtractorTests):
             po_contents = fp.read()
             self.assertMsgStr("Größe", po_contents)
+    def test_pot_charset_header_is_utf8(self):
+        self.assertFalse(os.path.exists(self.POT_FILE))
+        management.call_command('makemessages', locale=[LOCALE], verbosity=0, keep_pot=True)
+        self.assertTrue(os.path.exists(self.POT_FILE))
+        with open(self.POT_FILE, 'r', encoding='utf-8') as fp:
+            contents = fp.read()
+        self.assertIn(r'; charset=UTF-8\n"', contents)
+
 class JavascriptExtractorTests(ExtractorTests):
}}}
Problem is I can't reproduce the error condition.
In the added test case:
* There is a translatable literal with non-ASCII characters (in a template
file)
* The intermediate POT file is created (and preserved for examination)
* When the POT file is created, the header `"Content-Type: text/plain;
charset=?????\n"` is verified and it already has the `UTF-8` value for the
charset.
Am I missing something? How come the created POT file has a
`"Content-Type: text/plain; charset=CHARSET\n"` header?
* Is it the fact that the OP is running on Windows?
* Does this happen when extracting literals from .py files? JavaScript?
--
Ticket URL: <https://code.djangoproject.com/ticket/29452#comment:6>
* Attachment "29452-test.diff" added.
Test case
--
Comment (by Claude Paroz):
I think you could just unit test the write_pot_file method with some
content like:
{{{
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE\'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2018-06-07 17:21+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <L...@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"
#: django/contrib/gis/apps.py:8
msgid "GIS"
msgstr ""
}}}
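A unit test along those lines might look roughly like the sketch below. `fix_pot_charset` is a hypothetical stand-in for the header rewrite, used here so the example runs on its own; a real test would call `write_pot_file` instead, whose exact signature is not shown in this ticket. The POT content is a shortened version of the sample above.

```python
import unittest

# Shortened POT content with the charset placeholder still in place.
POT_CONTENT = '\n'.join([
    'msgid ""',
    'msgstr ""',
    '"MIME-Version: 1.0\\n"',
    '"Content-Type: text/plain; charset=CHARSET\\n"',
    '"Content-Transfer-Encoding: 8bit\\n"',
    '',
    '#: django/contrib/gis/apps.py:8',
    'msgid "GIS"',
    'msgstr ""',
])


def fix_pot_charset(msgs):
    """Hypothetical stand-in: rewrite the placeholder in the header only.

    The header is taken to be everything before the first blank line.
    """
    header, sep, body = msgs.partition('\n\n')
    return header.replace('charset=CHARSET', 'charset=UTF-8') + sep + body


class PotCharsetTests(unittest.TestCase):
    def test_charset_placeholder_is_replaced(self):
        output = fix_pot_charset(POT_CONTENT)
        self.assertIn('"Content-Type: text/plain; charset=UTF-8\\n"', output)
        self.assertNotIn('charset=CHARSET', output)
        # Entries after the header must come through untouched.
        self.assertIn('msgid "GIS"', output)
```

Feeding the content in as a string avoids running `xgettext` at all, so the test does not depend on platform or on which file type the literals were extracted from.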
--
Ticket URL: <https://code.djangoproject.com/ticket/29452#comment:7>
Comment (by Bartosz Grabski):
Thanks Claude, that was actually my idea. Will do.
Ramiro: thanks for the tip.
--
Ticket URL: <https://code.djangoproject.com/ticket/29452#comment:8>
Comment (by Bartosz Grabski):
Updated PR with test: https://github.com/django/django/pull/9997
--
Ticket URL: <https://code.djangoproject.com/ticket/29452#comment:9>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"2bc014750adb093131f77e4c20bc17ba64b75cac" 2bc01475]:
{{{
#!CommitTicketReference repository=""
revision="2bc014750adb093131f77e4c20bc17ba64b75cac"
Fixed #29452 -- Fixed makemessages setting charset of .pot files.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/29452#comment:10>
Comment (by Claude Paroz):
As this was a regression (read comment:6), I'd be willing to backport this
to the 2.1 branch. Any opposition?
--
Ticket URL: <https://code.djangoproject.com/ticket/29452#comment:11>
Comment (by Tim Graham <timograham@…>):
In [changeset:"c7d59825d738650e87173d4f9c8781b2e8b8c0c5" c7d59825]:
{{{
#!CommitTicketReference repository=""
revision="c7d59825d738650e87173d4f9c8781b2e8b8c0c5"
[2.1.x] Fixed #29452 -- Fixed makemessages setting charset of .pot files.
Backport of 2bc014750adb093131f77e4c20bc17ba64b75cac from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/29452#comment:12>