When initializing the DB with a YAML fixture that contains Russian characters,
like this:
{{{
- model: testapp.City
  fields:
    name: Санкт-Петербург
}}}
or Unicode escape sequences, like this:
{{{
- model: testapp.City
  fields:
    name: "\u040c\u00ae\u045e\u00ae\u0431\u0401\u040e\u0401\u0430\u0431\u0404"
}}}
garbage appears in the 'name' column.
It seems that this happens because the fixture file is not opened with
utf-8 encoding: the call on line 122 of
'django/core/management/commands/loaddata.py' is missing the parameter
'encoding="utf-8"'.
Related Python discussion:
https://mail.python.org/pipermail/python-ideas/2013-June/021230.html
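
The suspected failure mode can be reproduced outside Django. The following is a minimal sketch, not the code in loaddata.py: it assumes a platform whose default text encoding is cp1251 and simulates that by passing the encoding explicitly. The helper names `read_fixture_text` and `demo` are made up for this illustration.

```python
import os
import tempfile

CITY = "Санкт-Петербург"


def read_fixture_text(path, encoding):
    """Read a file in text mode, decoding it with the given encoding."""
    with open(path, "r", encoding=encoding) as handle:
        return handle.read()


def demo():
    """Write a UTF-8 fixture, then read it back with two different encodings."""
    fd, path = tempfile.mkstemp(suffix=".yaml")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(("name: " + CITY + "\n").encode("utf-8"))
        # Assumption: cp1251 stands in for a non-UTF-8 platform default.
        wrong = read_fixture_text(path, "cp1251")
        right = read_fixture_text(path, "utf-8")
    finally:
        os.remove(path)
    return wrong, right


if __name__ == "__main__":
    wrong, right = demo()
    print(repr(wrong))
    print(repr(right))
```

Only the second read returns the original city name; the first one decodes every two-byte UTF-8 sequence as two unrelated cp1251 characters.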
--
Ticket URL: <https://code.djangoproject.com/ticket/22399>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* keywords: loaddata utf-8 => loaddata utf-8 python3
* needs_better_patch: => 0
* needs_docs: => 0
* needs_tests: => 0
* stage: Unreviewed => Accepted
Comment:
`encoding="utf-8"` is a Python 3 addition to the `open()` function (and
only makes sense when reading the file in text mode).
I think that for best compatibility with the other open methods (gzip, zip,
bzip), it would be easier to simply force opening the file in binary mode
('rb'); the deserializing step should then automatically take care of
decoding the file as 'utf-8'. Could you test whether using
`fixture = open_method(fixture_file, 'rb')` solves your issue?
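
As a rough illustration of why binary mode is the common denominator, here is a sketch under stated assumptions: the `OPENERS` table, `read_fixture` and `roundtrip` are hypothetical names, and the explicit `.decode("utf-8")` stands in for whatever the deserializer does. The plain, gzip and bz2 openers all accept 'rb' and all return bytes.

```python
import bz2
import gzip
import os
import tempfile

# Hypothetical table, loosely modelled on the idea of one opener per
# compression format. Zip is left out because it is opened differently.
OPENERS = {None: open, "gz": gzip.GzipFile, "bz2": bz2.BZ2File}


def read_fixture(path, cmp_fmt):
    """Open the file in binary mode and decode the bytes as UTF-8."""
    opener = OPENERS[cmp_fmt]
    with opener(path, "rb") as handle:
        return handle.read().decode("utf-8")


def roundtrip(text):
    """Write text with every opener and read it back through read_fixture."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for cmp_fmt, opener in OPENERS.items():
            suffix = "." + cmp_fmt if cmp_fmt else ""
            path = os.path.join(tmp, "city.yaml" + suffix)
            with opener(path, "wb") as handle:
                handle.write(text.encode("utf-8"))
            results[cmp_fmt] = read_fixture(path, cmp_fmt)
    return results


if __name__ == "__main__":
    for cmp_fmt, value in roundtrip("name: Санкт-Петербург\n").items():
        print(cmp_fmt, repr(value))
```

Because the decoding happens in one place, the result no longer depends on the platform's default text encoding.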
--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:1>
* stage: Accepted => Unreviewed
Comment:
This fixes the first case (characters in the yaml file), but does not fix
the second (unicode escaped sequences).
--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:2>
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:3>
Comment (by claudep):
As for the escaped sequence, what are you expecting? Looking at your
proposed sequence, the result really is "Ќ®ў®бЁЎЁабЄ"... (\u040c = Ќ,
\u00ae = ®, etc.)
--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:4>
Comment (by bacilla):
Oh, you're right, it is my fault. The correct string is
'\u041d\u043e\u0432\u043e\u0441\u0438\u0431\u0438\u0440\u0441\u043a' and
it works perfectly.
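
The ticket does not say how the wrong sequence was produced. One plausible explanation, offered only as an assumption, is that the text was written as cp866 and then read as cp1251 before being escaped. The sketch below checks that this pairing reproduces the garbled sequence exactly; `misdecode` is a name made up for the illustration.

```python
# The sequence from the original report and the corrected one.
GARBLED = "\u040c\u00ae\u045e\u00ae\u0431\u0401\u040e\u0401\u0430\u0431\u0404"
INTENDED = "\u041d\u043e\u0432\u043e\u0441\u0438\u0431\u0438\u0440\u0441\u043a"


def misdecode(text, written_as="cp866", read_as="cp1251"):
    """Encode text with one code page and decode it with another."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    print(INTENDED)
    print(misdecode(INTENDED))
    print(misdecode(INTENDED) == GARBLED)
```

If that assumption holds, the escaped sequence was already wrong before loaddata ever read the fixture, which matches the observation that the corrected string loads fine.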
--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:5>
Comment (by claudep):
OK, then there is an obvious fix, always reading in binary mode:
{{{
diff --git a/django/core/management/commands/loaddata.py b/django/core/management/commands/loaddata.py
index 44583bd..44946fe 100644
--- a/django/core/management/commands/loaddata.py
+++ b/django/core/management/commands/loaddata.py
@@ -125,7 +125,7 @@ class Command(BaseCommand):
         for fixture_file, fixture_dir, fixture_name in self.find_fixtures(fixture_label):
             _, ser_fmt, cmp_fmt = self.parse_name(os.path.basename(fixture_file))
             open_method = self.compression_formats[cmp_fmt]
-            fixture = open_method(fixture_file, 'r')
+            fixture = open_method(fixture_file, 'rb')
             try:
                 self.fixture_count += 1
                 objects_in_fixture = 0
}}}
Or a more elaborate patch that tries to take advantage of reading in text
mode on Python 3:
{{{
diff --git a/django/core/management/commands/loaddata.py b/django/core/management/commands/loaddata.py
index 44583bd..5938770 100644
--- a/django/core/management/commands/loaddata.py
+++ b/django/core/management/commands/loaddata.py
@@ -14,7 +14,7 @@ from django.core.management.base import BaseCommand, CommandError
 from django.core.management.color import no_style
 from django.db import (connections, router, transaction, DEFAULT_DB_ALIAS,
     IntegrityError, DatabaseError)
-from django.utils import lru_cache
+from django.utils import lru_cache, six
 from django.utils.encoding import force_text
 from django.utils.functional import cached_property
 from django.utils._os import upath
@@ -76,13 +76,14 @@ class Command(BaseCommand):
         self.models = set()

         self.serialization_formats = serializers.get_public_serializer_formats()
+        kwargs = {'encoding': 'utf-8'} if six.PY3 else {}
         self.compression_formats = {
-            None: open,
-            'gz': gzip.GzipFile,
-            'zip': SingleZipReader
+            None: (open, kwargs),
+            'gz': (gzip.GzipFile, kwargs),
+            'zip': (SingleZipReader, {}),
         }
         if has_bz2:
-            self.compression_formats['bz2'] = bz2.BZ2File
+            self.compression_formats['bz2'] = (bz2.BZ2File, kwargs)

         with connection.constraint_checks_disabled():
             for fixture_label in fixture_labels:
@@ -124,8 +125,8 @@ class Command(BaseCommand):
         """
         for fixture_file, fixture_dir, fixture_name in self.find_fixtures(fixture_label):
             _, ser_fmt, cmp_fmt = self.parse_name(os.path.basename(fixture_file))
-            open_method = self.compression_formats[cmp_fmt]
-            fixture = open_method(fixture_file, 'r')
+            open_method, kwargs = self.compression_formats[cmp_fmt]
+            fixture = open_method(fixture_file, 'rb', **kwargs)
             try:
                 self.fixture_count += 1
                 objects_in_fixture = 0
}}}
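
One detail worth checking with the text-mode idea: on Python 3 the built-in `open()` raises `ValueError` when an `encoding` is passed together with a binary mode. A possible alternative, sketched here as an assumption and not as part of either patch, is to keep opening in 'rb' and wrap the handle in `io.TextIOWrapper`. The helpers `open_text`, `binary_mode_rejects_encoding` and `demo` are hypothetical.

```python
import gzip
import io
import os
import tempfile


def open_text(opener, path, encoding="utf-8"):
    """Open path in binary mode with opener and wrap it for text decoding."""
    return io.TextIOWrapper(opener(path, "rb"), encoding=encoding)


def binary_mode_rejects_encoding(path):
    """Return True if open() refuses an encoding argument in binary mode."""
    try:
        open(path, "rb", encoding="utf-8").close()
    except ValueError:
        return True
    return False


def demo(text="name: Новосибирск\n"):
    """Read the same UTF-8 payload back from a plain and a gzip file."""
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "city.yaml")
        zipped = plain + ".gz"
        with open(plain, "wb") as handle:
            handle.write(text.encode("utf-8"))
        with gzip.GzipFile(zipped, "wb") as handle:
            handle.write(text.encode("utf-8"))
        rejected = binary_mode_rejects_encoding(plain)
        with open_text(open, plain) as handle:
            from_plain = handle.read()
        with open_text(gzip.GzipFile, zipped) as handle:
            from_gzip = handle.read()
    return rejected, from_plain, from_gzip


if __name__ == "__main__":
    print(demo())
```

The wrapper gives every opener the same text interface, at the cost of one more layer around each file object.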
--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:6>
* status: new => closed
* resolution: => fixed
Comment:
In [changeset:"ed532a6a1ee675432940e69cec866b52aca96575"]:
{{{
#!CommitTicketReference repository="" revision="ed532a6a1ee675432940e69cec866b52aca96575"
Fixed #22399 -- Forced fixture reading in binary mode
This might help on systems where default encoding is not UTF-8 (and
on Python 3).
Thanks bacilla for the report.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:7>
Comment (by Claude Paroz <claude@…>):
In [changeset:"8d7023dc714acc957fac7ef422ccee4d83429b09"]:
{{{
#!CommitTicketReference repository="" revision="8d7023dc714acc957fac7ef422ccee4d83429b09"
[1.7.x] Fixed #22399 -- Forced fixture reading in binary mode
This might help on systems where default encoding is not UTF-8 (and
on Python 3).
Thanks bacilla for the report.
Backport of ed532a6a1 from master.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:8>
Comment (by Claude Paroz <claude@…>):
In [changeset:"275811a93c1e5bc6505605967cf2da01f1c038fe"]:
{{{
#!CommitTicketReference repository="" revision="275811a93c1e5bc6505605967cf2da01f1c038fe"
Adapted fixture read mode to file type
Binary mode added in ed532a6a1e is not supported by ZipFile.
Refs #22399.
}}}
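
The ZipFile limitation mentioned in that commit message can be seen with the standard library alone. This is a minimal sketch, not the committed Django code: `zip_accepts_mode`, `read_single_member`, `demo` and the member name are all made up for the illustration. It shows that `zipfile.ZipFile` rejects 'rb' as a mode, while 'r' works and still returns the member's content as bytes.

```python
import os
import tempfile
import zipfile

FIXTURE_NAME = "city.yaml"  # hypothetical member name inside the archive


def zip_accepts_mode(path, mode):
    """Return True if zipfile.ZipFile can be constructed with this mode."""
    try:
        zipfile.ZipFile(path, mode).close()
    except (ValueError, RuntimeError):
        # The exception type differs between Python versions.
        return False
    return True


def read_single_member(path):
    """Return the raw bytes of the first member of the archive."""
    with zipfile.ZipFile(path, "r") as archive:
        return archive.read(archive.namelist()[0])


def demo(text="name: Новосибирск\n"):
    """Build a one-file archive and probe it with both modes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "city.zip")
        with zipfile.ZipFile(path, "w") as archive:
            archive.writestr(FIXTURE_NAME, text.encode("utf-8"))
        return (
            zip_accepts_mode(path, "rb"),
            zip_accepts_mode(path, "r"),
            read_single_member(path).decode("utf-8"),
        )


if __name__ == "__main__":
    print(demo())
```

Since the archive member already comes back as bytes under 'r', a per-format read mode keeps the "decode in one place" behaviour without forcing 'rb' on every opener.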
--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:9>
Comment (by Claude Paroz <claude@…>):
In [changeset:"13340df76984d019ff9d4612ed6f38507546aade"]:
{{{
#!CommitTicketReference repository="" revision="13340df76984d019ff9d4612ed6f38507546aade"
[1.7.x] Adapted fixture read mode to file type
Binary mode added in ed532a6a1e is not supported by ZipFile.
Refs #22399.
Backport of 275811a93 from master.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:10>