I may be joining the translation discussion shortly; I have a site that is using Russian, French, English and Indonesian.
I'm importing pages from WordPress (pages, not blog entries) and I get the dreaded "UnicodeDecodeError: 'ascii' codec can't decode byte..." error when I save the RichTextPage object created from the WordPress page. It is raised in the Mezzanine code that joins up the titles and in the code that gets the 'description_from_content'.
USE_I18N is True in settings.
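As far as I can tell, the error boils down to something like the sketch below. It assumes the WordPress fields arrive as UTF-8 encoded byte strings (I haven't confirmed that for every field), and `raw_title` is just a stand-in for page['post_title']. The explicit ASCII decode mimics what Python 2 does implicitly when a byte string is mixed with unicode.

```python
# -*- coding: utf-8 -*-
# Sketch only: assumes the imported title is a UTF-8 encoded byte string.
raw_title = u"Привет".encode("utf-8")  # stand-in for page['post_title']

error = None
try:
    # Python 2 performs this ASCII decode implicitly when bytes meet unicode.
    raw_title.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the encoding the data is assumed to use keeps the Cyrillic.
title = raw_title.decode("utf-8")
```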
Obviously I don't want to lose the Cyrillic characters, and I need to get these pages imported. I've tried various options, and the best one so far seems to be using kitchen's to_unicode and to_bytes, e.g.:
from kitchen.text.converters import to_bytes, to_unicode
...
def import_page(self, page, pages):
    title = to_unicode(page['post_title'])
    self.vprint("BEGIN Importing page '{0}'".format(to_bytes(title)), 1)
    mezz_page = self.get_or_create(RichTextPage, title=title)
    if page['post_parent'] > 0:  # there is a parent
        mezz_page.parent = self.get_mezz_page(page['post_parent'], pages)
    mezz_page.created = page['post_modified']
    mezz_page.updated = page['post_modified']
    mezz_page.content = to_unicode(page['post_content'])
    mezz_page.save()
The parent handling is a work in progress, but this works for the content and title: it retains the Cyrillic characters correctly. However, it seems unwieldy. Is this approach a good one, or should I be doing something else?
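For comparison, the only alternative I can think of is decoding every text field once, right after the page is read, so the rest of the importer only ever sees unicode. This is a rough sketch, not tested against Mezzanine: `ensure_text` and `decode_fields` are hypothetical names I made up, UTF-8 is an assumption about the source data, and undecodable bytes are replaced rather than raising.

```python
# -*- coding: utf-8 -*-
def ensure_text(value, encoding="utf-8"):
    """Hypothetical helper: decode byte strings, pass anything else through."""
    if isinstance(value, bytes):
        # "replace" swaps undecodable bytes for U+FFFD instead of raising.
        return value.decode(encoding, "replace")
    return value


def decode_fields(page, fields=("post_title", "post_content")):
    """Return a copy of the page dict with the named fields decoded."""
    decoded = dict(page)
    for name in fields:
        if name in decoded:
            decoded[name] = ensure_text(decoded[name])
    return decoded
```

With that in place, import_page could call `page = decode_fields(page)` on its first line and use page['post_title'] and page['post_content'] directly afterwards.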