Does anyone know why this would happen? I'm really at my wits end about this
one. It seems rather standard to me?
Best regards,
Niklas
-----------------------------------
The code:
class Resource(models.Model):
text = models.TextField(default=u'', max_length=200, core=True)
f = open('%s/testdata/utf8.html' % os.path.dirname(__file__), 'rb') # Open
utf8-encoded file in binary mode (for window's benefit)
# utf8.html is a saved html dump from
http://www.columbia.edu/~kermit/utf8.html
utf8_string = f.read()
f.close()
exotic_text = unicode(utf8_string,'utf-8')
r = Resource(text = exotic_text )
r.save()
--------------------------------------
The exception
Exception raised:
Traceback (most recent call last):
File
"S:\nc\Projects\Inventive\Ticmate\svn\django-trunk\django\test\_doctest.py",
line 1267, in __run
compileflags, 1) in test.globs
File "<doctest ticmate.djangorelated.apps.resources.tests[84]>", line
1, in <module>
(r, is_created) = o.get_or_create_resource(u'os',
languages.by_iso['RU'].id, type = Resource.TYPE_HTML, text = exotic_text)
File
"S:\nc\Projects\Inventive\Ticmate\svn\ticmate\trunk\source\python\ticmate\dj
angorelated\apps\resources\models.py", line 159, in get_or_create_resource
return Resource.objects.get_or_create(language_id = language_id,
ownership = p, defaults=defaults)
File
"S:\nc\Projects\Inventive\Ticmate\svn\django-trunk\django\db\models\manager.
py", line 72, in get_or_create
return self.get_query_set().get_or_create(**kwargs)
File
"S:\nc\Projects\Inventive\Ticmate\svn\django-trunk\django\db\models\query.py
", line 291, in get_or_create
obj.save()
File
"S:\nc\Projects\Inventive\Ticmate\svn\django-trunk\django\db\models\base.py"
, line 264, in save
','.join(placeholders)), db_values)
File "C:\Python25\Lib\site-packages\MySQLdb\cursors.py", line 168, in
execute
if not self._defer_warnings: self._warning_check()
File "C:\Python25\Lib\site-packages\MySQLdb\cursors.py", line 82, in
_warning_check
warn(w[-1], self.Warning, 3)
File "c:\python25\lib\warnings.py", line 62, in warn
globals)
File "c:\python25\lib\warnings.py", line 102, in warn_explicit
raise message
Warning: Incorrect string value: '\xF0\x90\x8C\xBC\xF0\x90...' for
column 'text' at row 1
Hi, I'm trying to store a large utf-8 string with many different language
chars into a MySQL utf-8 table but get a strange exception: Incorrect string
value.
Does anyone know why this would happen? I'm really at my wits end about this
one. It seems rather standard to me?
Best regards,
Niklas
-----------------------------------
The code:
class Resource(models.Model):
text = models.TextField(default=u'', max_length=200, core=True)
f = open('%s/testdata/utf8.html' % os.path.dirname(__file__), 'rb') # Open
utf8-encoded file in binary mode (for window's benefit)
# utf8.html is a saved html dump from
http://www.columbia.edu/~kermit/utf8.html
utf8_string = f.read()
f.close()
exotic_text = unicode(utf8_string,'utf-8')
r = Resource(text = exotic_text )
r.save()
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 |
# -*- coding: utf-8 -*- from django.db import models class Utf8Test(models.Model): ''' I'm developing on Windows but the db host is on FreeBSD: * My max_length of text field is now max_length=256000 and the loaded file is below 64k * The SQL shows a utf-8 table platform._sys_version() ('2.5.1', 'r251:54863', 'Apr 18 2007 08:51:08', 'MSC v.1310 32 bit (Intel)') platform._syscmd_ver() 'Microsoft Windows', 'XP', '5.1.2600') MySQLdb MySQL-python-1.2.2.win32-py2.5.exe MySQL server Server version: 5.0.45 FreeBSD port: mysql-server-5.0.45_1 Django svn This is the SQL output by cursor.execute("show create table foo_utf8test;") during a unit test 'CREATE TABLE `foo_utf8test` (`id` int(11) NOT NULL auto_increment, `text` longtext NOT NULL, PRIMARY KEY (`id`)) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=utf8') >>> import types >>> import os LOAD UTF-8 ENCODED FILE >>> f = open('%s/../resources/testdata/utf8.html' % os.path.dirname(__file__), 'rb') # Open utf8-encoded file in binary mode (for window's benefit) # utf8.html is a saved html dump from http://www.columbia.edu/~kermit/utf8.html >>> utf8_string = f.read() >>> f.close() >>> stored_unicode_text = unicode(utf8_string,'utf-8') CREATE MODEL OBJECT AND SAVE >>> u = Utf8Test() >>> u.text = stored_unicode_text >>> u.save() RELOAD OBJECT FROM DB >>> u2 = Utf8Test.objects.get(id = 1) >>> u2.id == u.id True >>> retrieved_text = u2.text COMPARE DATA >>> type(retrieved_text) == types.UnicodeType True >>> type(stored_unicode_text) == types.UnicodeType True >>> retrieved_text == stored_unicode_text True ''' text = models.TextField(default=u'', max_length=256000) #====================================================================== #FAIL: Doctest: ticmate.djangorelated.apps.foo.models.Utf8Test #---------------------------------------------------------------------- #Traceback (most recent call last): # File "S:\nc\Projects\Inventive\Ticmate\svn\django-trunk\django\test\_doctest.py", line 2180, in runTest # raise self.failureException(self.format_failure(new.getvalue())) #AssertionError: Failed doctest test for ticmate.djangorelated.apps.foo.models.Utf8Test # File "S:\nc\Projects\Inventive\Ticmate\svn\ticmate\trunk\source\python\ticmate\djangorelated\apps\foo\models.py", line 10, in Utf8Test # #---------------------------------------------------------------------- #File "S:\nc\Projects\Inventive\Ticmate\svn\ticmate\trunk\source\python\ticmate\djangorelated\apps\foo\models.py", line 42, in ticmate.djangorelated.apps.foo.models.Utf8Test #Failed example: # u.save() #Exception raised: # Traceback (most recent call last): # File "S:\nc\Projects\Inventive\Ticmate\svn\django-trunk\django\test\_doctest.py", line 1267, in __run # compileflags, 1) in test.globs # File "<doctest ticmate.djangorelated.apps.foo.models.Utf8Test[8]>", line 1, in <module> # Destroying test database... # u.save() # File "S:\nc\Projects\Inventive\Ticmate\svn\django-trunk\django\db\models\base.py", line 264, in save # ','.join(placeholders)), db_values) # File "C:\Python25\Lib\site-packages\MySQLdb\cursors.py", line 168, in execute # if not self._defer_warnings: self._warning_check() # File "C:\Python25\Lib\site-packages\MySQLdb\cursors.py", line 82, in _warning_check # warn(w[-1], self.Warning, 3) # File "c:\python25\lib\warnings.py", line 62, in warn # globals) # File "c:\python25\lib\warnings.py", line 102, in warn_explicit # raise message # Warning: Incorrect string value: '\xF0\x90\x8C\xBC\xF0\x90...' for column 'text' at row 1 #---------------------------------------------------------------------- #File "S:\nc\Projects\Inventive\Ticmate\svn\ticmate\trunk\source\python\ticmate\djangorelated\apps\foo\models.py", line 47, in ticmate.djangorelated.apps.foo.models.Utf8Test #Failed example: # u2.id == u.id #Expected: # True #Got: # False #---------------------------------------------------------------------- #File "S:\nc\Projects\Inventive\Ticmate\svn\ticmate\trunk\source\python\ticmate\djangorelated\apps\foo\models.py", line 57, in ticmate.djangorelated.apps.foo.models.Utf8Test #Failed example: # retrieved_text == stored_unicode_text #Expected: # True #Got: # False # # #---------------------------------------------------------------------- #Ran 7 tests in 3.141s # #FAILED (failures=1) |
From: django...@googlegroups.com [mailto:django...@googlegroups.com] On Behalf Of Karen Tracey
Sent: den 21 mars 2008 20:49
To: django...@googlegroups.com
Subject: Re: Problem storing utf-8 in MySQL
Hi, KarenIt's definitely a utf-8 table and I've increased the max_length but still seem to get the same problem. I've compiled information on the platform and a docstring test to easily reproduce this problem on which is included below or more easily browsed at:Please let me know if I can provide you with better information and thanks a lot for looking at this since I'm really stumped at the moment :-)
From: django...@googlegroups.com [mailto:django...@googlegroups.com] On Behalf Of Karen Tracey
Sent: den 21 mars 2008 20:49
To: django...@googlegroups.com
Subject: Re: Problem storing utf-8 in MySQL
One of the reasons MySQL doesn't support 4-byte UTF-8 codepoints is
because they're not needed for pretty much all practical purposes.
Except for some very ancient scripts and a few of the very rare CJK
characters, everything you'll need is inside the Basic Multilingual
Plane (which requires only three bytes in UTF-8). Not having access to
those rare CJK codepoints isn't going to be a showstopper; they're rare
for a reason.
Malcolm
--
Save the whales. Collect the whole set.
http://www.pointy-stick.com/blog/
On Sun, 2008-04-13 at 13:28 -0700, Rob Hudson wrote:
[...]
> For those of us having a very similar problem elsewhere, what's a good
> general solution?
Depends on the general problem. :-)
Do you mean that you are coming across a case of trying to store 4-byte
UTF-8 sequences in MySQL? Or do you mean that you've managed to create a
database table with the incorrect encoding and only realised this after
you've stored data in the table? Or is there a third possibility?
Regards,
Malcolm
--
Why be difficult when, with a little bit of effort, you could be
impossible.
http://www.pointy-stick.com/blog/
Might work. Depends if it really is all latin1 encoded or really UTF-8
that has been shoved into a latin1 table (PHP has known problems with
trying to do that, apparently).
This thread -- [1] -- might be of some interest to you if you're
looking to re-encode your data. Both the suggestions there are
essentially the same process.
Regards,
Malcolm
--
Remember that you are unique. Just like everyone else.
http://www.pointy-stick.com/blog/