About multilingual models

Marc Garcia

Aug 17, 2009, 2:54:25 PM

to django-d...@googlegroups.com, djang...@googlegroups.com

Hi folks,

In the end I had no time to start coding multilingual models as part of
my GSoC project. I did do some more analysis of the problem and of
possible solutions, so let me share it with you.

Basically, I came to the conclusion that there are two different
approaches, both valid, each more suitable for a different kind of
website. Let me call these methods "model based" and "gettext like".

In summary, the model based idea is to define, in every model, the
structure for translating the necessary fields. The gettext like method
would implement a catalog, and the translations would be decoupled from
the models.

Let's explain both methods in more detail:

model based method
---------------------------------

This method is especially interesting for websites where all translations
are provided at the same time. The idea is that there is no main
language, and we don't want to fall back to another language when a
string has no value in the current one. Imagine you have an online shop
built in Django, and you sell products to the US and China. I don't
think it's useful to display Chinese texts to Americans, or English
texts to Chinese users. The person entering data in Django will probably
have the product name and description in both languages on paper, in
Excel, or in some other medium, so it makes more sense to fill in all the
data (in all languages) at the same time than to enter the product in
English and then translate it somewhere else.

In this case the admin should allow filling in all translations at the
same time, and if a field is required, it should be required for all
languages.

For this I would use the following syntax to tell Django that we want a
field translated:

from django.db import models

class MyModel(models.Model):
    my_i18n_field = models.CharField(max_length=32, translate=True)  # proposed option

The main advantage of this method is that the translate property sits
together with the field definition. This makes it easy to see from the
model code whether a field will be translated or not.

On the database side I would create an extra table for every model,
with the following structure:

* id
* main_table_id
* language_code
* field1
* field2
* ...

So, to get the data it would be necessary to join both tables, filtering
by the current language code. That would make it easy to filter, sort or
search by any of the translated fields.
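
To make the join concrete, here is a minimal sketch using Python's
sqlite3 module. The product and product_translation tables, their
columns, the sample rows and the products_for helper are all assumptions
made for the example; they are not part of the proposal itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        id INTEGER PRIMARY KEY,
        price REAL NOT NULL              -- untranslated field stays on the main table
    );
    CREATE TABLE product_translation (
        id INTEGER PRIMARY KEY,
        main_table_id INTEGER NOT NULL REFERENCES product(id),
        language_code TEXT NOT NULL,
        name TEXT NOT NULL,              -- plays the role of "field1"
        UNIQUE (main_table_id, language_code)
    );
""")
conn.executemany("INSERT INTO product (id, price) VALUES (?, ?)", [(1, 9.5), (2, 3.0)])
conn.executemany(
    "INSERT INTO product_translation (main_table_id, language_code, name) VALUES (?, ?, ?)",
    [(1, "en", "Tea"), (1, "zh", "茶"), (2, "en", "Cup"), (2, "zh", "杯子")],
)

def products_for(language_code):
    """Return (id, name, price) rows for one language, sorted by the translated name."""
    return conn.execute(
        """
        SELECT p.id, t.name, p.price
        FROM product AS p
        JOIN product_translation AS t ON t.main_table_id = p.id
        WHERE t.language_code = ?
        ORDER BY t.name
        """,
        (language_code,),
    ).fetchall()

print(products_for("en"))  # [(2, 'Cup', 3.0), (1, 'Tea', 9.5)]
```

Because the translated name is an ordinary column, the ORDER BY and any
WHERE clause on it run inside the database. A product with no row for
the requested language simply does not appear, which matches the "no
main language" idea above.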

gettext like method
-------------------------------

This method would be more suitable for websites where we provide content
in one language and then want to offer it in as many languages as
possible. Imagine a kind of wiki. We write articles in English, and then
we allow users, or hire translators, to make these articles available in
other languages.

In this case we pretty much emulate the way gettext works. We provide
the content in the main language (in the admin, for example), and then
translators access that content to provide translations. In some cases
it won't be strictly like gettext, where you usually don't care much
what the text is used for. It would be great to be able to show a link
on every article saying "translate it to your language" if it has not
been translated yet.

While the syntax of the other method would also work for marking fields
as translatable, in this case I would choose something more decoupled
from the models. I would use a syntax closer to the admin one:
specifying, outside the models, which models we want to translate and
which fields. The main advantage of this syntax is that we can translate
fields from existing applications without modifying them.

class MyModelTranslation(multilingual.Translation):
    translate = ('my_i18n_field',)

multilingual.register(MyModel, MyModelTranslation)
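
For illustration, a registry like this can be sketched in a few lines of
plain Python. Everything here is hypothetical: no multilingual module is
assumed to exist, MyModel is a plain stand-in class rather than a real
Django model, and translated_fields is a helper invented for the example.

```python
class Translation:
    """Base class for translation options; subclasses list the fields to translate."""
    translate = ()

# Maps a model class to the tuple of field names registered as translatable.
_registry = {}

def register(model, options):
    """Record which fields of `model` are translatable, without touching the model."""
    missing = [name for name in options.translate if not hasattr(model, name)]
    if missing:
        raise ValueError(f"{model.__name__} has no field(s): {', '.join(missing)}")
    _registry[model] = tuple(options.translate)

def translated_fields(model):
    """Return the registered translatable field names, or () if none were registered."""
    return _registry.get(model, ())

class MyModel:
    """Stand-in for an existing model from an application we do not modify."""
    my_i18n_field = ""
    other_field = ""

class MyModelTranslation(Translation):
    translate = ("my_i18n_field",)

register(MyModel, MyModelTranslation)
print(translated_fields(MyModel))  # ('my_i18n_field',)
```

The point of the design is that MyModel itself carries no translation
information; the registry is the only place that knows about it.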

A database structure to support this could be a single table named
"catalog" where all translations are stored. It would be like a .po
file:

* language_code
* msgid
* msgstr

It would also be interesting to store information about the places
where each string is used:

* msgid
* model/field/id

There are two important problems with this structure. The first is that
filtering and sorting by translatable fields will be almost impossible.
Searching would be possible (but slow). The second is that we would have
to store all values as strings, or only allow translating string fields,
because the same column would be used to store every translation in the
system.
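
A rough sketch of the catalog lookup, again with sqlite3, shows where
the sorting cost comes from. The article table, the sample strings and
the translate and titles_sorted helpers are assumptions for the example.
Falling back to the msgid when no translation exists is borrowed from
gettext and is also an assumption here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE catalog (
        language_code TEXT NOT NULL,
        msgid TEXT NOT NULL,
        msgstr TEXT NOT NULL,            -- every translated value is stored as text
        PRIMARY KEY (language_code, msgid)
    );
""")
conn.executemany("INSERT INTO article (id, title) VALUES (?, ?)", [(1, "Water"), (2, "Bread")])
conn.executemany(
    "INSERT INTO catalog (language_code, msgid, msgstr) VALUES (?, ?, ?)",
    [("es", "Water", "Agua"), ("es", "Bread", "Pan")],
)

def translate(msgid, language_code):
    """Return the msgstr for msgid, or msgid itself when no translation is stored."""
    row = conn.execute(
        "SELECT msgstr FROM catalog WHERE language_code = ? AND msgid = ?",
        (language_code, msgid),
    ).fetchone()
    return row[0] if row else msgid

def titles_sorted(language_code):
    """Translate every title with one catalog lookup each, then sort in Python."""
    titles = [title for (title,) in conn.execute("SELECT title FROM article")]
    return sorted(translate(title, language_code) for title in titles)

print(titles_sorted("es"))  # ['Agua', 'Pan']
print(titles_sorted("en"))  # ['Bread', 'Water']
```

In this sketch the article query knows nothing about languages, so the
ordering has to be computed after the translations are looked up, one
string at a time.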

The main advantage of this method is that it is quite easy to decouple
the whole translation engine from Django. An existing application could
be set up to allow translating database content in minutes, without
modifying the existing code.

----------------------------------------

These are my thoughts on the subject. Both ideas still need more
discussion and improvement.

Regards,
Marc

David Danier

Aug 19, 2009, 3:28:27 AM

to django-d...@googlegroups.com

Hi Marc, hi list,

First, to put my email into context: I wrote a similar email some time
ago, which lists some more options for doing model translations and
offers a kind of hybrid data model as a solution:
http://groups.google.com/group/django-developers/browse_thread/thread/ca5987ea80120c63/cfffc43b9ec29738?#cfffc43b9ec29738

> Basically, I came to the conclusion that there are two different
> approaches, both valid, each more suitable for a different kind of
> website. Let me call these methods "model based" and "gettext like".

I would like to dismiss the gettext-like approach, as it causes too much
trouble. Starting with not being usable inside QuerySets, it may not
give Django users what they need and want when talking about model
translations. A big problem I currently see is that changes to the
original text may result in a lost translation if you only save the
msgid<->msgstr relation in the database (as the msgid "changes"). So you
would need to include hints like the model, its pk and the field name
(as you suggest, too), but this makes the approach kind of hacky.
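
The lost-translation problem can be shown with two plain dictionaries.
This is only a sketch: the dictionary layouts, the sample strings and
the translate_field helper are invented for the illustration.

```python
# Catalog keyed only by (language, msgid), as in a .po file.
catalog = {("de", "Hello world"): "Hallo Welt"}

def translate(msgid, language_code):
    """Return the translation, or the original text if none is stored."""
    return catalog.get((language_code, msgid), msgid)

# Alternative keyed by where the string lives: (language, model, field, pk).
catalog_by_location = {("de", "article", "title", 1): "Hallo Welt"}

def translate_field(model, field, pk, language_code, original):
    """Look up a translation by location, falling back to the original text."""
    return catalog_by_location.get((language_code, model, field, pk), original)

title = "Hello world"
before = translate(title, "de")    # translation is found

title = "Hello, world"             # an editor fixes the punctuation
after = translate(title, "de")     # msgid changed, so the lookup misses

# Keyed by location, the translation survives the edit (but may now be stale).
by_location = translate_field("article", "title", 1, "de", title)

print(before, "|", after, "|", by_location)
# Hallo Welt | Hello, world | Hallo Welt
```

Keying by location keeps the translation attached to the row, at the
price of no longer being a pure msgid catalog.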

What I think makes a gettext-like approach impossible is the need to
translate fields you use to _find_ a row in the database. A slug might
need to be translated, for example. So you should be able to query by a
slug, depending on your current language setting. A gettext-like
approach will almost certainly fail here.

> model based method
> ---------------------------------
>
> This method is especially interesting for websites where all translations
> are provided at the same time. [...]

> In this case the admin should allow filling all translations at the same
> time, and if a field is required, it should be required for all languages.

I don't think this is what makes a model based approach interesting.
It's the "searchability".

How you present your underlying data structure (database or gettext) to
the user/admin should not be coupled to the data structure. You could
make translations optional in both cases, with the one limitation that
the user/admin must create at least one translation when using a model
based approach (even this might not be needed for data-driven models).

> In this case I would specify this syntax to let Django know that we want
> this field translated:
>
> class MyModel(models.Model):
>     my_i18n_field = models.CharField(max_length=32, translate=True)
>
> The main advantage of this method is that the translate property sits
> together with the field definition. This makes it easy to see from the
> model code whether a field will be translated or not.

I still don't like putting the information about the translated fields
inside the model. Why not use the registry-based approach you used for
your gettext-like idea? You can create dynamic models in Django and so
create the additional table that way. django-pluggable-model-i18n does
this, and it seems to work.

> On the database side I would create an extra table for every model,
> with the following structure:
> * id
> * main_table_id
> * language_code
> * field1
> * field2
> * ...
> So, to get the data it would be necessary to join both tables, filtering
> by the current language code. That would make it easy to filter, sort or
> search by any of the translated fields.

The big problem here, from a usability perspective, is the way you need
to search the translated fields. Having to think about where the field
is stored (model or translation), having to do the join yourself, and
still not being able to fetch the translation itself together with the
model doesn't seem to be a real solution.

Example:
Book.objects.get(translation__language='en', translation__field='...')
-> just fetches the book, without the translation
   (fetching the translation will cause an additional query, for every book)
-> needing to decide where your field lives may cause errors
   (and will make adding translated fields more complex)
-> you will always need to specify the language, this may cause errors, too

That's why I suggested an easier usage in my original email:
Book.objects.get(field='...')
-> should know that "field" is translated and use the translation table
-> should join the translation and select the translated fields
   (see original email about how this join needs to be done)
-> should use the language of the request or some other default language
   without the need to tell it so
   (sites usually only display one translation at a time; of course
   there needs to exist some way to get all translations)
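
As a rough illustration of this lookup behaviour, here is a sketch
outside the ORM, using sqlite3. All names are hypothetical: the book and
book_translation tables, the TRANSLATED_FIELDS and BOOK_FIELDS sets, and
the activate and get_book helpers. The module-level language variable
stands in for the language of the request. The plain inner join is a
simplification and not necessarily the join described in the original
email.

```python
import sqlite3

BOOK_FIELDS = {"id", "isbn"}            # columns on the main table
TRANSLATED_FIELDS = {"title", "slug"}   # columns on the translation table
_current_language = "en"                # stand-in for the per-request language

def activate(language_code):
    """Set the language used by later get_book() calls."""
    global _current_language
    _current_language = language_code

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, isbn TEXT NOT NULL);
    CREATE TABLE book_translation (
        book_id INTEGER NOT NULL REFERENCES book(id),
        language_code TEXT NOT NULL,
        title TEXT NOT NULL,
        slug TEXT NOT NULL,
        PRIMARY KEY (book_id, language_code)
    );
    INSERT INTO book VALUES (1, '978-0');
    INSERT INTO book_translation VALUES (1, 'en', 'The Trial', 'the-trial');
    INSERT INTO book_translation VALUES (1, 'de', 'Der Process', 'der-process');
""")

def get_book(**filters):
    """Fetch one book together with its translation in a single query.

    The caller does not say which table a field lives in, and does not
    pass the language; both are resolved here.
    """
    conditions, params = ["t.language_code = ?"], [_current_language]
    for field, value in filters.items():
        if field in TRANSLATED_FIELDS:
            alias = "t"
        elif field in BOOK_FIELDS:
            alias = "b"
        else:
            raise ValueError(f"unknown field: {field}")
        # Field names come from the fixed sets above, values are bound as parameters.
        conditions.append(f"{alias}.{field} = ?")
        params.append(value)
    sql = (
        "SELECT b.id, b.isbn, t.language_code, t.title, t.slug "
        "FROM book AS b JOIN book_translation AS t ON t.book_id = b.id "
        "WHERE " + " AND ".join(conditions)
    )
    rows = conn.execute(sql, params).fetchall()
    if len(rows) != 1:
        raise LookupError(f"expected exactly one book, found {len(rows)}")
    return dict(rows[0])

activate("de")
print(get_book(slug="der-process")["title"])  # Der Process
```

The same call with the language set to "en" finds nothing for that slug,
which is the language-dependent slug lookup discussed earlier. In Django
the active language would more likely come from
django.utils.translation.get_language() than from a module variable.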