About multilingual models

Marc Garcia

Aug 17, 2009, 2:54:25 PM

to django-d...@googlegroups.com, djang...@googlegroups.com

Hi folks,

in the end I had no time to start coding on multilingual models as part of
my GSoC project. I did some more analysis of the problem and the possible
solutions; let me share it with you.

Basically, I came to the conclusion that there are two different
approaches, both valid, each more suitable depending on the
website. Let me name these methods "model based" and "gettext like".

In short, the model based idea is to define in every model the
structure for translating the necessary fields. The gettext like method
would implement a catalog, and the translations would be decoupled from
the models.

Let's explain both methods in more detail:

model based method
---------------------------------

This method is especially interesting for websites where all translations
are provided at the same time. The idea is that there is no main
language, and we don't want to fall back to another language if a string
has no value for the current language. Imagine you have a virtual
shop built in Django, and you sell products to the US and China. I don't
think it's useful to display Chinese texts to Americans, or English
texts to Chinese users. The person entering data in Django will probably
have the product name and description in both languages on paper, in
Excel... or in some other medium, so it makes more sense to fill in all
the data (in all languages) at the same time than to enter the product
in English and then translate it somewhere else.

In this case the admin should allow filling all translations at the same
time, and if a field is required, it should be required for all languages.

I would use the following syntax to let Django know that we want
this field translated:

class MyModel(models.Model):
    my_i18n_field = models.CharField(max_length=32, translate=True)

The main advantage of this method is that the translate property lives
together with the field definition. This makes it easy to see whether a
field will be translated once the models are written.

From the database point of view, I would create an extra table for every
model, with the following structure:

* id
* main_table_id
* language_code
* field1
* field2
* ...

So, to get the data it would be necessary to join both tables, filtering
by the current language code. That would make it easy to filter, sort or
search by any of the translated fields.
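
A minimal sketch of that join, using Python's sqlite3 module rather than
the Django ORM. The product tables, their columns and the products_for
helper are hypothetical names chosen for illustration; only id,
main_table_id and language_code come from the structure above.

```python
import sqlite3

# In-memory database with one main table and its per-language translation table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE product_translation (
        id INTEGER PRIMARY KEY,
        main_table_id INTEGER REFERENCES product(id),
        language_code TEXT,
        name TEXT,
        UNIQUE (main_table_id, language_code)
    );
    INSERT INTO product (id, price) VALUES (1, 9.5), (2, 3.0);
    INSERT INTO product_translation (main_table_id, language_code, name) VALUES
        (1, 'en', 'Tea'), (1, 'zh', '茶'),
        (2, 'en', 'Cup');
""")


def products_for(language_code):
    """Return (id, name, price) rows translated into language_code, sorted by name."""
    return conn.execute(
        """
        SELECT p.id, t.name, p.price
        FROM product p
        JOIN product_translation t ON t.main_table_id = p.id
        WHERE t.language_code = ?
        ORDER BY t.name
        """,
        (language_code,),
    ).fetchall()
```

Because the join is an inner join, a product without a row for the
requested language is simply left out, which matches the "no fallback to
another language" behaviour described for this method.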

gettext like method
-------------------------------

This method would be more suitable for websites where we provide
content in one language and then want to offer it in as many
languages as possible. Imagine a kind of wiki. We write articles in
English, and then we allow users, or hire translators, to make these
articles available in other languages.

In this case we pretty much emulate the way gettext works. We provide
the content in the main language (in the admin, for example), and then
translators access that content to provide translations. In some cases
it won't be strictly like gettext, where you usually don't care much
what the text is used for. It would be great to be able to show a link
on every article saying "translate it to your language" if it is not
translated yet.

While the syntax of the other method would also work for marking fields
as translatable, in this case I would choose something more decoupled
from the models. I would use a syntax closer to the admin one:
specifying, outside the models, which models we want to translate and
which fields. The main advantage of this syntax is that we can translate
fields from existing applications without modifying them.

class MyModelTranslation(multilingual.Translation):
    translate = ('my_i18n_field',)

multilingual.register(MyModel, MyModelTranslation)
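
For illustration only, here is one way such a registry could work in
plain Python. Nothing below is an existing module: Translation, register
and translatable_fields are hypothetical stand-ins for the proposed
multilingual API, and MyModel is a plain class standing in for a real
Django model.

```python
class Translation:
    """Hypothetical base class: subclasses list the field names to translate."""
    translate = ()


# Maps a model class to the tuple of field names marked as translatable.
_registry = {}


def register(model, options):
    """Record which fields of `model` are translatable, without touching the model."""
    if not issubclass(options, Translation):
        raise TypeError("options must subclass Translation")
    if model in _registry:
        raise ValueError(f"{model.__name__} is already registered")
    _registry[model] = tuple(options.translate)


def translatable_fields(model):
    """Return the registered field names, or an empty tuple for unknown models."""
    return _registry.get(model, ())


class MyModel:  # stand-in for a model from an existing application
    my_i18n_field = "hello"
    other_field = 1


class MyModelTranslation(Translation):
    translate = ('my_i18n_field',)


register(MyModel, MyModelTranslation)
```

The point of the design is that MyModel itself is never edited: all
translation metadata lives in the registry, in the same spirit as
registering a ModelAdmin with the admin site.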

A database structure to support this functionality could be just a
single table named "catalog" where all translations are stored. It would
be like a .po file:

* language_code
* msgid
* msgstr

It would also be interesting to store information about the places
where each string is used:

* msgid
* model/field/id
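
A rough sketch of a lookup against such a catalog table, again with
sqlite3 instead of the Django ORM. The translate function and the sample
rows are assumptions made for illustration; falling back to the msgid
when no translation exists mirrors what gettext itself does.

```python
import sqlite3

# In-memory catalog with the three columns listed above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE catalog (
        language_code TEXT NOT NULL,
        msgid TEXT NOT NULL,
        msgstr TEXT NOT NULL,
        PRIMARY KEY (language_code, msgid)
    );
    INSERT INTO catalog VALUES ('es', 'Hello', 'Hola');
""")


def translate(msgid, language_code):
    """Return the stored translation, or the original msgid if none exists."""
    row = conn.execute(
        "SELECT msgstr FROM catalog WHERE language_code = ? AND msgid = ?",
        (language_code, msgid),
    ).fetchone()
    return row[0] if row else msgid
```

Note that every translated value needs its own lookup keyed by the
original string, and the original text is the only link back to the
model data.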

There are two important problems with this structure. The first is that
filtering and sorting by translatable fields would be almost impossible.
Searching would be possible (but slow). The second is that we would have
to store all values as strings, or only allow translating strings,
because the same column would be used to store every translation in the
system.

The main advantage of this method is that it is quite easy to decouple
the whole translation engine from Django. An existing application could
be set up to allow translating database content in minutes, without
modifying the existing code.

----------------------------------------

These are my thoughts about that. Both ideas still need more discussion
and improvements.

Regards,
Marc

Branko Vukelic

Aug 18, 2009, 4:04:39 AM

to djang...@googlegroups.com

Hi,

I'm still having trouble wrapping my mind around the second approach.

And also, I don't think it's a matter of how you use the system. The
first solution would just work for your second problem. Here's an
example.

(Correct me if I'm wrong, but:)

1. you create the first model (say in English)
2. there are 3 more languages, but since you didn't create any models
for them, the required field validation is not triggered
3. later on you add 3 translations

Also, your first approach can also be made pluggable, just like the
second one. Here's an example:

http://code.google.com/p/django-pluggable-model-i18n/

Personally, I think the 2-table solution with the 'pluggable' syntax
as per the URL above would be the best solution:

Example of the pluggable syntax:

# translations.py
import translations

class ItemTranslation(translations.ModelTranslation):
    fields = ('description', 'title')

translations.register(Item, ItemTranslation)

Best regards,

--
Branko

eml: bg.b...@gmail.com
alt: fox2...@yahoo.co.uk
blg1: http://sudologic.blogspot.com/
img: http://picasaweb.google.com/bg.branko
twt: http://www.twitter.com/foxbunny/

Marian Andre

Sep 18, 2009, 10:38:44 AM

to djang...@googlegroups.com

On Tue, Aug 18, 2009 at 10:04 AM, Branko Vukelic <bg.b...@gmail.com> wrote:
>
> Hi,
>
> I'm still having trouble wrapping my mind around the second approach.
>
> And also, I don't think it's a matter of how you use the system. The
> first solution would just work for your second problem. Here's an
> example.
>

Hi,
Well, I would say that the second approach is used mainly when preparing
a complete website in different languages at once.
I can easily imagine the first approach as a solution to the requirement
of providing a continuously updated website in different languages,
e.g. a site with articles/posts in more than one language.