[Django] #22399: loaddata doesn't work correctly when importing utf-8 encoded files


Django

Apr 7, 2014, 10:22:19 AM
to django-...@googlegroups.com
#22399: loaddata doesn't work correctly when importing utf-8 encoded files
--------------------------------------------+----------------------------
Reporter: bacilla | Owner: nobody
Type: Bug | Status: new
Component: Core (Management commands) | Version: 1.6
Severity: Normal | Keywords: loaddata utf-8
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------------+----------------------------
Environment: Windows 7, Python 3.3, Django 1.6.2, PyYAML 3.11

When initializing the DB with a YAML fixture that contains Russian characters,
like this:

{{{
- model: testapp.City
  fields:
    name: Санкт-Петербург
}}}

or Unicode escape sequences, like this:

{{{
- model: testapp.City
  fields:
    name: "\u040c\u00ae\u045e\u00ae\u0431\u0401\u040e\u0401\u0430\u0431\u0404"
}}}

garbage appears in the 'name' column.

It seems that this happens because the fixture file isn't opened with
UTF-8 encoding: see line 122 of the source file
'django/core/management/commands/loaddata.py' (missing parameter
'encoding="utf-8"').
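
A minimal sketch of the failure mode described above. It assumes the fixture is stored as UTF-8 and that the platform default is a legacy Windows code page; cp1251 is only an illustrative choice, since the real default depends on the system locale.

```python
import locale

# UTF-8 bytes as they would sit on disk in the fixture file.
raw = "Санкт-Петербург".encode("utf-8")

# In text mode without an explicit encoding, Python 3's open() decodes with
# the locale's preferred encoding, which is often not UTF-8 on Windows.
default_encoding = locale.getpreferredencoding(False)

# Simulate the mismatch with cp1251 (an assumed example code page).
garbled = raw.decode("cp1251")

# Decoding with the encoding the file was actually written in.
correct = raw.decode("utf-8")

# The wrong codec yields one character per byte instead of the original text.
print(garbled == correct)
```

Each Cyrillic letter occupies two bytes in UTF-8, so a single-byte code page turns every letter into two unrelated characters.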

Related Python discussion:
https://mail.python.org/pipermail/python-ideas/2013-June/021230.html

--
Ticket URL: <https://code.djangoproject.com/ticket/22399>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Apr 12, 2014, 6:11:38 AM
to django-...@googlegroups.com
Changes (by claudep):

* keywords: loaddata utf-8 => loaddata utf-8 python3
* needs_better_patch: => 0
* needs_docs: => 0
* needs_tests: => 0
* stage: Unreviewed => Accepted


Comment:

`encoding="utf-8"` is a Python 3 addition to the `open()` function (it
only makes sense when reading the file in text mode).

I think that for best compatibility with the other open methods (gzip, zip,
bzip), it would be easier to simply force opening the file in binary mode
('rb'); the deserializing step should then automatically take care of
decoding the file as UTF-8. Could you test whether using `fixture =
open_method(fixture_file, 'rb')` solves your issue?
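
The idea in this comment can be sketched roughly as follows. The file names are hypothetical, and the explicit `decode("utf-8")` call stands in for whatever decoding the deserializer performs; this is an illustration, not Django's code.

```python
import gzip
import os
import tempfile

text = "name: Санкт-Петербург\n"

decoded = []
with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "cities.yaml")
    gz_path = os.path.join(tmp, "cities.yaml.gz")

    # Write the same UTF-8 payload as a plain file and as a gzip file.
    with open(plain_path, "wb") as f:
        f.write(text.encode("utf-8"))
    with gzip.GzipFile(gz_path, "wb") as f:
        f.write(text.encode("utf-8"))

    # Both openers accept 'rb' and return bytes, so the platform's default
    # text encoding never gets involved.
    for opener, path in ((open, plain_path), (gzip.GzipFile, gz_path)):
        with opener(path, "rb") as f:
            data = f.read()
        decoded.append(data.decode("utf-8"))  # explicit, locale-independent

print(decoded[0] == decoded[1] == text)
```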

--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:1>

Django

Apr 12, 2014, 8:10:03 AM
to django-...@googlegroups.com
Changes (by bacilla):

* stage: Accepted => Unreviewed


Comment:

This fixes the first case (characters in the YAML file), but doesn't fix
the second (Unicode escape sequences).

--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:2>

Django

Apr 12, 2014, 9:46:06 AM
to django-...@googlegroups.com
Changes (by claudep):

* stage: Unreviewed => Accepted


--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:3>

Django

Apr 12, 2014, 10:28:54 AM
to django-...@googlegroups.com

Comment (by claudep):

As for the escaped sequence, what are you expecting? Looking at your
proposed sequence, the result really is "Ќ®ў®бЁЎЁабЄ" (\u040c = Ќ,
\u00ae = ®, etc.).

--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:4>

Django

Apr 12, 2014, 11:45:39 AM
to django-...@googlegroups.com

Comment (by bacilla):

Oh, you're right, it is my fault. The right string is
'\u041d\u043e\u0432\u043e\u0441\u0438\u0431\u0438\u0440\u0441\u043a' and
it works perfectly.
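
The thread does not say how the wrong escape sequence came about. One plausible explanation, offered here only as an assumption, is that the intended text was encoded as cp866 (the DOS Cyrillic code page) and then read back as cp1251 before the escapes were generated. A small check of that hypothesis:

```python
# The sequence from the original report and the corrected one given above.
reported = "\u040c\u00ae\u045e\u00ae\u0431\u0401\u040e\u0401\u0430\u0431\u0404"
corrected = "\u041d\u043e\u0432\u043e\u0441\u0438\u0431\u0438\u0440\u0441\u043a"

# Assumption: the text was encoded as cp866 and then decoded as cp1251.
round_trip = corrected.encode("cp866").decode("cp1251")

# True means the reported sequence is consistent with that mix-up.
print(round_trip == reported)
```

If the check holds, the escapes in the original report were already garbled before `loaddata` read them, which is why the binary-mode change made no difference for that case.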

--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:5>

Django

Apr 13, 2014, 2:36:31 PM
to django-...@googlegroups.com

Comment (by claudep):

OK, then there is an obvious fix, always reading in binary mode:
{{{
diff --git a/django/core/management/commands/loaddata.py b/django/core/management/commands/loaddata.py
index 44583bd..44946fe 100644
--- a/django/core/management/commands/loaddata.py
+++ b/django/core/management/commands/loaddata.py
@@ -125,7 +125,7 @@ class Command(BaseCommand):
         for fixture_file, fixture_dir, fixture_name in self.find_fixtures(fixture_label):
             _, ser_fmt, cmp_fmt = self.parse_name(os.path.basename(fixture_file))
             open_method = self.compression_formats[cmp_fmt]
-            fixture = open_method(fixture_file, 'r')
+            fixture = open_method(fixture_file, 'rb')
             try:
                 self.fixture_count += 1
                 objects_in_fixture = 0
}}}

Or a more elaborate patch that tries to take advantage of reading in text
mode on Python 3:
{{{
diff --git a/django/core/management/commands/loaddata.py b/django/core/management/commands/loaddata.py
index 44583bd..5938770 100644
--- a/django/core/management/commands/loaddata.py
+++ b/django/core/management/commands/loaddata.py
@@ -14,7 +14,7 @@ from django.core.management.base import BaseCommand, CommandError
 from django.core.management.color import no_style
 from django.db import (connections, router, transaction, DEFAULT_DB_ALIAS,
     IntegrityError, DatabaseError)
-from django.utils import lru_cache
+from django.utils import lru_cache, six
 from django.utils.encoding import force_text
 from django.utils.functional import cached_property
 from django.utils._os import upath
@@ -76,13 +76,14 @@ class Command(BaseCommand):
         self.models = set()
 
         self.serialization_formats = serializers.get_public_serializer_formats()
+        kwargs = {'encoding': 'utf-8'} if six.PY3 else {}
         self.compression_formats = {
-            None: open,
-            'gz': gzip.GzipFile,
-            'zip': SingleZipReader
+            None: (open, kwargs),
+            'gz': (gzip.GzipFile, kwargs),
+            'zip': (SingleZipReader, {}),
         }
         if has_bz2:
-            self.compression_formats['bz2'] = bz2.BZ2File
+            self.compression_formats['bz2'] = (bz2.BZ2File, kwargs)
 
         with connection.constraint_checks_disabled():
             for fixture_label in fixture_labels:
@@ -124,8 +125,8 @@ class Command(BaseCommand):
         """
         for fixture_file, fixture_dir, fixture_name in self.find_fixtures(fixture_label):
             _, ser_fmt, cmp_fmt = self.parse_name(os.path.basename(fixture_file))
-            open_method = self.compression_formats[cmp_fmt]
-            fixture = open_method(fixture_file, 'r')
+            open_method, kwargs = self.compression_formats[cmp_fmt]
+            fixture = open_method(fixture_file, 'rb', **kwargs)
             try:
                 self.fixture_count += 1
                 objects_in_fixture = 0
}}}
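
For comparison, a standalone sketch of the text-mode idea on Python 3.3+, outside Django. The `TEXT_OPENERS` mapping and `read_fixture_text` helper are hypothetical names for illustration. It uses mode `"rt"` because the built-in `open()` rejects an `encoding` argument in binary mode, and it uses `gzip.open()` and `bz2.open()` because those accept `encoding` in text mode.

```python
import bz2
import gzip
import os
import tempfile

# Hypothetical mapping from compression suffix to an opener that supports
# text mode with an explicit encoding.
TEXT_OPENERS = {None: open, "gz": gzip.open, "bz2": bz2.open}


def read_fixture_text(path, cmp_fmt=None):
    """Return the file contents as str, always decoding as UTF-8."""
    opener = TEXT_OPENERS[cmp_fmt]
    with opener(path, "rt", encoding="utf-8") as f:
        return f.read()


payload = "name: Санкт-Петербург\n"
raw = payload.encode("utf-8")

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for cmp_fmt, opener in TEXT_OPENERS.items():
        suffix = "" if cmp_fmt is None else "." + cmp_fmt
        path = os.path.join(tmp, "cities.yaml" + suffix)
        # The same openers also accept "wb" for writing the raw bytes.
        with opener(path, "wb") as f:
            f.write(raw)
        results[cmp_fmt] = read_fixture_text(path, cmp_fmt)

print(all(value == payload for value in results.values()))
```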

--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:6>

Django

Apr 18, 2014, 11:56:02 AM
to django-...@googlegroups.com
Changes (by Claude Paroz <claude@…>):

* status: new => closed
* resolution: => fixed


Comment:

In [changeset:"ed532a6a1ee675432940e69cec866b52aca96575"]:
{{{
#!CommitTicketReference repository=""
revision="ed532a6a1ee675432940e69cec866b52aca96575"
Fixed #22399 -- Forced fixture reading in binary mode

This might help on systems where default encoding is not UTF-8 (and
on Python 3).
Thanks bacilla for the report.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:7>

Django

Apr 18, 2014, 11:56:54 AM
to django-...@googlegroups.com

Comment (by Claude Paroz <claude@…>):

In [changeset:"8d7023dc714acc957fac7ef422ccee4d83429b09"]:
{{{
#!CommitTicketReference repository=""
revision="8d7023dc714acc957fac7ef422ccee4d83429b09"
[1.7.x] Fixed #22399 -- Forced fixture reading in binary mode

This might help on systems where default encoding is not UTF-8 (and
on Python 3).
Thanks bacilla for the report.

Backport of ed532a6a1 from master.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:8>

Django

Apr 18, 2014, 12:55:32 PM
to django-...@googlegroups.com

Comment (by Claude Paroz <claude@…>):

In [changeset:"275811a93c1e5bc6505605967cf2da01f1c038fe"]:
{{{
#!CommitTicketReference repository=""
revision="275811a93c1e5bc6505605967cf2da01f1c038fe"
Adapted fixture read mode to file type

Binary mode added in ed532a6a1e is not supported by ZipFile.
Refs #22399.
}}}
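
A short illustration of why the read mode has to depend on the file type. It uses an in-memory archive with a hypothetical member name. `zipfile.ZipFile` takes archive modes such as `"r"` or `"w"` rather than file modes, so `"rb"` is rejected (with `RuntimeError` on older Python 3 releases, `ValueError` on newer ones).

```python
import io
import zipfile

# Build a small zip archive in memory holding one UTF-8 encoded fixture.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("cities.yaml", "name: Санкт-Петербург\n".encode("utf-8"))

# "rb" is not a valid ZipFile mode, so construction fails.
buf.seek(0)
try:
    zipfile.ZipFile(buf, "rb")
    rejected = False
except (ValueError, RuntimeError):
    rejected = True

# Mode "r" works; read() returns bytes, which are then decoded explicitly.
buf.seek(0)
with zipfile.ZipFile(buf, "r") as zf:
    text = zf.read(zf.namelist()[0]).decode("utf-8")

print(rejected)
```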

--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:9>

Django

Apr 18, 2014, 1:25:28 PM
to django-...@googlegroups.com

Comment (by Claude Paroz <claude@…>):

In [changeset:"13340df76984d019ff9d4612ed6f38507546aade"]:
{{{
#!CommitTicketReference repository=""
revision="13340df76984d019ff9d4612ed6f38507546aade"
[1.7.x] Adapted fixture read mode to file type

Binary mode added in ed532a6a1e is not supported by ZipFile.
Refs #22399.

Backport of 275811a93 from master.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/22399#comment:10>
