I'm writing this email because I will soon need some kind of future-proof
model translation in django. I tried many solutions and came up with two
of my own (not released anywhere), but nothing seems to fit all the needs
translations might impose.
So I thought about how model translations in django could work, and how
this should be used by the developer. I am sending this email directly to
Marc, because he is currently working on some i18n stuff (GSoC). Gonzalo
seems to be interested in an official solution
(http://groups.google.com/group/django-developers/msg/0be7de2c154aa49d?utoken=lffASzEAAABAWiol0apP9nFxQklB-_Jkc1-9iUSAb45YspdDeB6HhPWDe8H4lvYuxY4dyvcAHxqMKN2o9YZfQ4UFyks_AAUi),
so you might be interested in some discussion about this. Perhaps
Malcolm should be asked about how my proposal fits into the django ORM,
as he has (re)written most of the ORM (-> queryset-refactor).
I know Django 1.1 is on its way and developers might not have the time
to really think about this now, but I needed to write it down. If
someone answers and we can start working on this for post-1.1, that
would be great, of course. I, personally, don't need to wait. Anyway,
according to the bugtracker there's not much delaying Django 1.1:
http://code.djangoproject.com/query?status=new&status=assigned&status=reopened&component=!Translations&component=!Documentation&milestone=1.1&order=priority
First, to summarize the needs. It has to ...
a) be fast, no heavy database-load should be needed
b) work for third-party apps, as third-party apps become unusable if you
need i18n and they don't support it (this also means the existing
third-party code must still work for translated models)
c) be transparent for the user, you shouldn't need to think where your
fields come from
d) support missing translations without skipping the whole database-row,
if you for example JOIN two tables (optional)
e) be searchable through the normal ORM, translated fields should not be
hidden in some serialized format
f) be extendable, new fields shouldn't be a big problem
g) convert existing data without a hassle (like sync_transmeta_db)
h) keep context, if I fetch a German model from the database all
relations should fetch German models, too (default-context should come
from request and/or settings)
i) be optional, not every project needs model translations, this is
especially important for third-party-apps, supporting translations
should not mean you have to use it
j) be generic, different people have different needs
k) support translating central fields (like slug), this goes
hand-in-hand with easy searchability of the translated fields
l) support getting all translations for every object in the database
m) integrate well into the admin
All existing solutions fail at some of these points, and my own two
solutions failed many of these points, too. So first, let's take a look
at what exists:
1. Put every translation in extra fields inside one big table:
I'm not a fan of this, as it tends to get messy, especially for big sites
with many translations. Developers need to think where their fields come
from (Book.objects.filter(title_de=...)). Does not support third party apps.
2. Put all translated fields into their own model, use normal ORM for access.
If you look at this from a database perspective this is the best
solution. But the django ORM does not really support you here. Access to
the fields needs an extra query for every object. Even worse: if you use
some translated fields in your query, you have to do this extra query
anyway (Book.objects.filter(translations__slug='foobar',
translations__language='de'): to get the German translation you have to
execute an extra query, although the JOIN was already done).
3. Use serialization to save all fields into one big BLOB.
Do I need to say anything here? Not searchable, no way to really work
with the translations on the DB-layer.
4. Special case: Create a separate model for translations (like 2), but use
the original table for some kind of DEFAULT_LANGUAGE. (pluggable-model-i18n)
I kind of like this, not only because you are able to enhance
third-party-apps. Independently of that I think translations.register()
seems to be a nice idea. But(!) this complicates things too much. You
have to choose whether you want a field out of the DEFAULT_LANGUAGE or
not for every field access/query.
5. gettext like approach.
Save a translation for every possible string in the database, fall back to
gettext. The DB might explode here, as the DB load would be heavy, really heavy.
So, what to make of the ideas people put into their code here? All
projects have some advantages, while bringing some disadvantages. From
a DB perspective the second solution seems to be best, from the
usability perspective the fourth solution looks great. The first
solution really shows its advantages when you look at queries
(Book.objects.filter(name_en='this is easy')).
First I want to throw one of my own solutions in, as I think this is
interesting when looking at the admin. What I did was reverse the way
we currently look at solution two, meaning the translation itself becomes
the model you work with and the rest is put into some "common" model:
    class BookCommon(models.Model):
        some_field = models.CharField(...)

    class Book(models.Model):
        common = models.ForeignKey(BookCommon)
        language = models.CharField(...)
        translated_field = ...
What's interesting here are two things:
* We can optimize DB-access using select_related():
    Book.objects.select_related('common')
...so we can cache the common attributes, a great benefit
* The admin lists every translation for every object, common fields
need to be "copied" to BookForm. New translations can be created by just
changing the language and hitting "save as".
Anyway, this solution did not work well, as relations between objects
become a mess now (you normally have a ForeignKey to/from the Common-Model,
so you are left with all problems of solution two when accessing related
objects). Additionally you cannot access objects that have no
translation in your current language, as
Book.objects.filter(language='de') filters out all rows that don't
have a translation. But I think this shows how the second solution, which
is good from a DB perspective, can be enhanced to be more usable. It is
also the solution that brought me to my current idea, even if not directly.
So, we have this nice solution that your database likes, but we have no
way of using it in a way we can benefit from. Making your database
happy just isn't enough, but how could we use solution two the right
way when just dealing with SQL? The answer is pretty easy, use a JOIN:
    SELECT ... FROM book INNER JOIN book_trans ON
    (book.id=book_trans.book_id AND book_trans.language='en') WHERE ...
Let's explain this; most of it is quite obvious. But why is the language
inside the JOIN, could we not just use WHERE language='en'? That works
if we want to force our result to have a translation in the selected
language. If you don't need this (for example because the fields are
optional, so the whole translation is optional) you cannot use WHERE.
But you can actually just change the query above to use a LEFT OUTER JOIN:
    SELECT ... FROM book LEFT OUTER JOIN book_trans ON
    (book.id=book_trans.book_id AND book_trans.language='en') WHERE ...
Now you get a result even if there is no row for the translation in
book_trans. I think this is something model translations need to
support; solution two kind of supports this, too.
Now let's dig into what pluggable-model-i18n does, as we want to support
third-party apps, right? Why not just _remove_ the fields from the
original model and dynamically create a new translation-model? This way
you could change your database-layout if and only if needed, while
keeping defining models easy. I think of some kind of
translation.register() like pluggable-model-i18n uses. One example:
    class SomeObj(models.Model):
        foo = models.CharField(...)
        bar = models.CharField(...)

    class SomeObjTranslation(translation.ModelTranslation):
        class Meta:
            fields = ('foo',)

    translation.register(SomeObj, SomeObjTranslation)

could be converted to:

    class SomeObj(models.Model):
        bar = models.CharField(...)

    class SomeObjTranslation(models.Model):
        language = models.CharField(max_length=5,
                                    choices=settings.LANGUAGES,
                                    default=settings.LANGUAGE_CODE)
        object = models.ForeignKey(SomeObj, related_name='translations')
        foo = models.CharField(...)

        class Meta:
            # only one translation per language and object
            unique_together = (('language', 'object'),)
This way you can change third-party apps without needing the developers
to even _care_ about translations, while still keeping the new models
pretty easy. Of course you have to provide some kind of convert-script
that manages to copy all values from the old table to the new one, but
that should not be a big problem.
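A rough sketch of what such a convert-script could do, again with sqlite3 so it runs standalone. The table names, the object_id column and the helper copy_translated_columns() are hypothetical; removing the old column from the original table is deliberately left out.

```python
import sqlite3


def copy_translated_columns(conn, table, trans_table, fields, language):
    """Copy the translated columns of every row into the translation table.

    Table and column names are hypothetical and are interpolated directly
    into the SQL, so they must come from trusted code, never from user input.
    """
    columns = ", ".join(fields)
    conn.execute(
        f"INSERT INTO {trans_table} (object_id, language, {columns}) "
        f"SELECT id, ?, {columns} FROM {table}",
        (language,),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE someobj (id INTEGER PRIMARY KEY, foo TEXT, bar TEXT);
    CREATE TABLE someobj_translation (
        id INTEGER PRIMARY KEY,
        object_id INTEGER REFERENCES someobj(id),
        language TEXT,
        foo TEXT,
        UNIQUE (language, object_id)
    );
    INSERT INTO someobj (id, foo, bar) VALUES (1, 'hello', 'x'), (2, 'world', 'y');
""")

# Existing values become the translation for the assumed original language.
copy_translated_columns(conn, "someobj", "someobj_translation", ["foo"], "en")
rows = conn.execute(
    "SELECT object_id, language, foo FROM someobj_translation ORDER BY object_id"
).fetchall()
```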
Now let's look at what usage should look like. As we want third-party apps
to keep working, no fields should be "renamed" ('field' ->
'translations__field'). In addition I think a query normally should get
one translation of your object; this could be some DEFAULT_LANGUAGE or
the request language by default. The model-object itself needs to allow
transparent access to the fields, too. As obj.title will be translated
you could say your object represents one language at a time. I would
suggest adding obj.switch_language('en') to load and transparently
replace all attributes (while keeping the old ones in some cache).
Saving the objects must of course save all translations, too.
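The switch_language() idea could behave roughly like the following pure-Python sketch. TranslatedObject is a toy stand-in, not a Django model, and the constructor signature is an assumption made for this example; a real implementation would load missing translations from the database instead of a dict.

```python
class TranslatedObject:
    """Toy stand-in for a model instance that shows one language at a time."""

    def __init__(self, translated_fields, translations, language, **common):
        self._translated_fields = tuple(translated_fields)
        # Per-language cache of translated values, e.g. {'en': {'title': ...}}.
        self._cache = {lang: dict(values) for lang, values in translations.items()}
        self.language = None
        for name, value in common.items():
            setattr(self, name, value)
        self.switch_language(language)

    def switch_language(self, language):
        # Keep the values of the language we are leaving, including unsaved edits.
        if self.language is not None:
            self._cache[self.language] = {
                name: getattr(self, name) for name in self._translated_fields
            }
        # A missing translation yields None for every translated field.
        values = self._cache.get(language, {})
        for name in self._translated_fields:
            setattr(self, name, values.get(name))
        self.language = language


book = TranslatedObject(
    ("title",),
    {"en": {"title": "The Trial"}, "de": {"title": "Der Prozess"}},
    "en",
    isbn="isbn-1",
)
```

Here book.title always refers to the current language, while book.isbn is a common attribute that is not affected by switching.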
Now you might notice that this sounds familiar. Yes, model inheritance
works similarly. We have fields that are transparently rewritten to the
right table on queries, save() UPDATEs many tables and attributes just
live in one object (obj.parent_field, instead of
obj.parent.parent_field). If you look at this right, you will see that
the proposed translations are something like models using "reverse
inheritance", meaning behavior is like with inheritance, but the
semantics are reversed. The biggest difference is the changed JOIN, but
django should provide most of the techniques for this, even if they need
to be enhanced.
So, what about the other stuff django model translation should provide:
a) fast: Only one JOIN involved, as you only need one language most of
the time. Otherwise it's like solution two.
b) third-party-apps: Work like a charm, no fields changed and - because
of some default language in the query - only return objects that are in
the current site language.
c) transparent: If done like inheritance, this should be like
inheritance, so perfectly transparent.
d) missing translation: Supported by using LEFT JOIN.
e) searchable: Like inheritance.
f) extendable: Like normal models, south, django-evolution or similar
perhaps needed.
g) convert: Script needed for this, like sync_transmeta_db does.
h) keep context: Relations need to mind obj.language, should definitely
be possible.
i) optional: If you don't use translation.register() no translation is
done, not even the table is created.
j) generic: If you have some use-case I'm missing tell me.
k) central fields: slug can be translated, access is as simple as when
using inheritance.
l) all translations: Just leave the "AND xxx.language='yy'" out of the
JOIN and you get every translation. Similar to using Book.objects.all()
with my approach.
m) admin: Like solution two, I think people have come up with something
here. I still like the idea of viewing every possible translation and
being able to edit it like one distinct object. But there might be
better solutions.
I have attached a sample usage example, perhaps this gives you some
more detail on the API I suggest.
I have looked into the code and think implementing this should be
possible, but needs some changes in the django ORM itself. It should be
possible to implement this by creating some TranslationQuery-object, but
you would have to copy a lot of code to keep behavior in sync with the
normal Query.
If you read down until here, thank you. I know this is a lot of text
(hey, it only took me about 2 hours to write this down, after thinking
about a solution for the last weeks). I would like to get some input on
this topic, about what you think model translations could look like.
Marc, I don't know if you have some proposal of your own. Perhaps we can
share ideas and even start implementing this together. I am willing to
spend some time with this topic, because I need some solution flexible
enough (aka "fits my needs") for a client. Additionally I think django
would very much benefit from an official solution on this topic.
Greetings, David Danier
I'm working on some other i18n parts right now, but I'll review your
email before working on model translations.
Thanks for sharing your idea, and I'll be back with comments soon.
Cheers,
Marc
I read some of the code around Queryset, Model Inheritance and the
Query-object itself. I _think_ the parts where django needs some
adjustments can be limited to two patches:
1. conditional joins:
Add the ability to use more complex conditions for the ON-clause in
JOINs, meaning allow JOINs to be more than just "left_table.left_field =
right_table.right_field". Perhaps WhereNode
(django.db.models.sql.where.WhereNode) can be used, as this seems pretty
generic (used for WHERE and HAVING already). Not sure about the
dependencies WhereNode has on JOINs, so perhaps this ends in a
chicken-and-egg problem.
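To illustrate what a "conditional join" means at the SQL level, here is a small standalone sketch. The Join class is hypothetical and is not Django's internal join representation; it only shows an ON clause built from the usual column equality plus extra parametrized conditions.

```python
from dataclasses import dataclass, field


@dataclass
class Join:
    """Hypothetical join description, not part of Django's ORM."""
    left_table: str
    left_column: str
    right_table: str
    right_column: str
    join_type: str = "LEFT OUTER JOIN"
    # Extra (column, value) conditions on the right table for the ON clause.
    extra: list = field(default_factory=list)

    def as_sql(self):
        conditions = [
            f"{self.left_table}.{self.left_column} = "
            f"{self.right_table}.{self.right_column}"
        ]
        params = []
        for column, value in self.extra:
            conditions.append(f"{self.right_table}.{column} = %s")
            params.append(value)
        on_clause = " AND ".join(conditions)
        return f"{self.join_type} {self.right_table} ON ({on_clause})", params


sql, params = Join(
    "book", "id", "book_trans", "book_id", extra=[("language", "en")]
).as_sql()
```

The values are returned separately as parameters, so the language never has to be interpolated into the SQL string itself.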
2. foreign (model) fields:
Add the ability to use fields from other tables as if they are present
in the current Model. Model inheritance currently uses this. Foreign
fields by design always need some JOIN related to them, so this will
depend on conditional joins. If this gets implemented perhaps model
inheritance can be rewritten to use foreign fields, as this looks like a
more generic approach.
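On the Python side, a "foreign field" could look roughly like a descriptor that forwards attribute access to a related object. This is only a sketch of the attribute behaviour: ForeignField and the toy Book class are hypothetical, and the query side (the JOIN that would load the related row) is not covered here.

```python
from types import SimpleNamespace


class ForeignField:
    """Descriptor exposing an attribute of a related object on the owner."""

    def __init__(self, relation, name):
        self.relation = relation  # attribute holding the related object
        self.name = name          # attribute to expose from that object

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        related = getattr(instance, self.relation)
        # No related row (e.g. missing translation) reads as None.
        return None if related is None else getattr(related, self.name)

    def __set__(self, instance, value):
        # Writing requires a loaded related object.
        setattr(getattr(instance, self.relation), self.name, value)


class Book:
    title = ForeignField("translation", "title")

    def __init__(self, isbn, translation):
        self.isbn = isbn
        self.translation = translation


book = Book("isbn-1", SimpleNamespace(title="Der Prozess"))
untranslated = Book("isbn-2", None)
```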
Did I miss something?
Greetings, David Danier
From the database perspective it is similar, meaning it uses the same
database structure. What I tried to write down was mostly some usage and
API ideas to solve some things which pop up when using
django-multilingual and others:
* Make it possible to use third party apps in i18n environments even if
the app was not designed to do this (This idea was stolen from
pluggable-model-i18n.)
* Don't add too much overhead for db performance and others (One JOIN,
nothing more, this JOIN should be transparent to the user. This idea was
stolen from model inheritance.)
* Support getting results if no translation is available (sometimes you
don't need to have a translation, for example if all fields are
optional. This is possible in most model translation projects, even if
it involves hammering the database with extra queries for each
translation there. Conditional JOIN solves this in my proposal.)
In conclusion I try to use the database structure django-multilingual
proposes (which should be the best for the job anyway), keep usage as
simple as using model inheritance (keep working with translations as
simple as possible) while using a register approach to keep this
application independent (that's some kind of killer feature).
Hope this helps to see the differences here. Perhaps the file I attached
helps to see some usage examples.
One big advantage of my proposal over any existing solution is the
possibility to use third party apps without changing their code. I still
think this is very important as developers should not need to worry
about internationalization when writing third party apps, because you
should not need to use some complex database layout if you don't need
translations.
pluggable-model-i18n solves this, too, but it has some
limitations/flaws. Using the pluggable-model-i18n you cannot optimize
the SQL query when using translations and you run into many choices
where to find a value, which are most significant if you want to query
your database by some translated field (slug is translated:
Book.objects.get(slug=...) vs. Book.objects.get(translation__slug=...,
translation__language=...)). These are the two most significant
disadvantages, others might appear when using pluggable-model-i18n in a
production environment.
Greetings, David Danier
First, sorry for my late answer, I missed your email somehow.
The SomeObjTranslation-model is what should be dynamically created by
some registry. This registry is a little more difficult than your
suggestion, but should keep things simple enough:
-------------------------------------------------------------
class SomeObjTranslation(translation.ModelTranslation):
    class Meta:
        fields = ('some_field',)

translation.register(SomeObj, SomeObjTranslation)
-------------------------------------------------------------
Putting this into its own class makes adding new attributes easier.
Using the Meta subclass allows future ModelTranslations to add/override
fields to/of the original model and keeps this in sync with normal
models (ModelTranslation could be a subclass of models.Model, using its
own metaclass).
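The registry part could be sketched in plain Python as follows. Everything here is an assumption for illustration: the real ModelTranslation would subclass models.Model with its own metaclass, and register() would also have to create the translation model, which this toy version does not do.

```python
class ModelTranslation:
    """Toy base class; the proposal would derive this from models.Model."""

    class Meta:
        fields = ()


_registry = {}


def register(model, translation_cls):
    """Remember which fields of `model` should be translated."""
    if model in _registry:
        raise ValueError(f"{model.__name__} is already registered")
    fields = tuple(translation_cls.Meta.fields)
    missing = [name for name in fields if not hasattr(model, name)]
    if missing:
        raise ValueError(f"{model.__name__} has no field(s): {', '.join(missing)}")
    _registry[model] = translation_cls


def translated_fields(model):
    """Return the translated field names, or () for unregistered models."""
    translation_cls = _registry.get(model)
    return tuple(translation_cls.Meta.fields) if translation_cls else ()


# Plain classes stand in for Django models in this sketch.
class SomeObj:
    some_field = None
    other_field = None


class SomeObjTranslation(ModelTranslation):
    class Meta:
        fields = ("some_field",)


register(SomeObj, SomeObjTranslation)
```

A model that is never registered has no translated fields at all, which matches the "optional" requirement from the original proposal.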
Greetings, David Danier
Hi David, sorry for the late answer.
I like your proposal! You simplified some over-designed or complex
stuff about my original proposal and added some very neat ideas, good
work.
> 4. Special case: Create an own model for translations (like 2), but use
> the original table for some kind of DEFAULT_LANGUAGE. (pluggable-model-i18n)
> I kind of like this, not only because you are able to enhance
> third-party-apps. Independent from that I think translations.register()
> seems to be a nice idea. But(!) this complicates things too much. You
> have to choose whether you want a field out of the DEFAULT_LANGUAGE or
> not for every field access/query.
This is the only thing I think deserves a second thought. It might
complicate things a little bit, but it's very nice to be able to have
the model's default language translation loaded without needing the
JOIN. You don't have to check if it's the default language or not; what
this does is check if the translation is loaded (cached). Defaults
will always be loaded, and this should be seen as an optimization for
the common case.
If users have all model translations configured using
DEFAULT_LANGUAGE as default language and this matches the most used
lang (as it should), the content i18n overhead won't be that big most
of the time; this is very important. Also, it could optionally be used
as an efficient fallback for missing translations (a very common
configuration).
I look forward to hearing more ideas and continue this discussion here
and on IRC and since I will have more free time to work on this now, I
will start to study some django internals and take out the useful
stuff from pluggable-model-i18n.
Greetings,
--
Gonzalo Saavedra
> [...] sorry for the late answer.
Sorry for my late answer, too.
> I like your proposal! You simplified some over-designed or complex
> stuff about my original proposal and added some very neat ideas, good
> work.
Thanks!
>> 4. Special case: Create an own model for translations (like 2), but use
>> the original table for some kind of DEFAULT_LANGUAGE. (pluggable-model-i18n)
>> I kind of like this, not only because you are able to enhance
>> third-party-apps. Independent from that I think translations.register()
>> seems to be a nice idea. But(!) this complicates things too much. You
>> have to choose whether you want a field out of the DEFAULT_LANGUAGE or
>> not for every field access/query.
>
> This is the only thing I think deserves a second thought. It might
> complicate things a little bit, but it's very nice to be able to have
> the model's default language translation loaded without needing the
> JOIN. You don't have to check if it's the default language or not; what
> this does is check if the translation is loaded (cached). Defaults
> will always be loaded, and this should be seen as an optimization for
> the common case.
I thought some time about this when I wrote my original email. Because
of the way I proposed how translations should work I don't see any real
drawback when having to use the JOIN in every query:
* If you don't use the translations, no JOIN will be done, as the model
is not changed at all (like with pluggable-model-i18n)
* When using translations you cannot assume that the language you
first started your project with will be used most. I can think of a German
site adding an English translation, which will probably be used most
after some time. If you add even more translations the benefit
gets smaller and smaller. For a site that supports ten languages
you save the JOIN on only 10% of the queries (assuming requests are evenly
distributed across languages). Not all scenarios allow you to choose the
"right" language as a default.
* I see this as similar to using model inheritance. If I need it, I have
to live with a (little) overhead. Trying to put optimization in here,
which might not even be really useful in all cases, is not a great idea
I think.
> If users have all model translations configured using
> DEFAULT_LANGUAGE as default language and this matches the most used
> lang (as it should), the content i18n overhead won't be that big most
> of the time; this is very important. Also, it could optionally be used
> as an efficient fallback for missing translations (a very common
> configuration).
The language fallback is one of the points I don't have a nice solution
for so far. But I don't think having one DEFAULT_LANGUAGE as the
fallback language is enough.
Think about some German site, for example. What this site could do is add
an English translation (the most common step for non-English sites, as
English really increases the potential userbase). Now after a while
the same site might want to add a Spanish translation. As some content
is still only available in German this site might want to use a
multi-level fallback for their Spanish version: Spanish -> English ->
German (aka "Spanish falls back to English, English falls back to German").
Other sites may not even need a fallback, for example when only listing
contents that fit the user language (objects that have no translation
get filtered out) or only having translated fields that are not
mandatory (objects that have no translation get None for every
translated field).
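The multi-level fallback described above can be sketched as a small lookup over a per-language mapping. The FALLBACKS dict and both helper functions are hypothetical; an empty mapping gives the "no fallback" behaviour where a missing translation simply stays missing.

```python
def fallback_chain(language, fallbacks):
    """Return the languages to try, in order, guarding against cycles."""
    chain = [language]
    while chain[-1] in fallbacks and fallbacks[chain[-1]] not in chain:
        chain.append(fallbacks[chain[-1]])
    return chain


def resolve(translations, language, fallbacks):
    """Pick the first available translation along the fallback chain."""
    for candidate in fallback_chain(language, fallbacks):
        if candidate in translations:
            return candidate, translations[candidate]
    return None, None


# "Spanish falls back to English, English falls back to German".
FALLBACKS = {"es": "en", "en": "de"}

only_german = {"de": {"title": "Der Prozess"}}
found_language, found_values = resolve(only_german, "es", FALLBACKS)

# Without any configured fallback the object has no translation at all.
no_fallback = resolve(only_german, "es", {})
```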
All these reasons drove me to the conclusion that we can support
users when they need some fallback, but we may not be able to provide a
solution that works for all use cases. Because of this I dismissed the
idea of using a DEFAULT_LANGUAGE, which is used for this only because
it's already in the result of the SQL-query.
> I look forward to hearing more ideas and continue this discussion here
> and on IRC and since I will have more free time to work on this now, I
> will start to study some django internals and take out the useful
> stuff from pluggable-model-i18n.
One thing I'm not sure about how to do is tracking different versions of
the translations. A very common scenario is to have an object with some
translations. If you update the object itself (including one default
translation) all other translations of the model get marked as outdated.
This way you can track whenever an update to some translation is needed.
pluggable-model-i18n could easily solve this by just using some
modified-field in the base-model and all translations. To decide whether
your translation is outdated you only need to compare its modified-field
with the one of the base object. Having two tables this is not as easy,
because you don't have a base translation. You could of course just mark
the first translation as "is_base=True".
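The is_base idea could be checked along these lines. This is a sketch under assumptions: the dict layout, the "modified" key and the sample dates are made up for illustration, and in a real project the values would come from the translation rows.

```python
from datetime import datetime


def outdated_languages(translations):
    """Return languages whose translation is older than the base translation.

    `translations` maps a language code to a dict with the hypothetical
    keys "is_base" (bool) and "modified" (datetime).
    """
    base = [t for t in translations.values() if t["is_base"]]
    if len(base) != 1:
        raise ValueError("exactly one translation must be marked is_base")
    base_modified = base[0]["modified"]
    return sorted(
        lang for lang, t in translations.items()
        if not t["is_base"] and t["modified"] < base_modified
    )


translations = {
    "de": {"is_base": True, "modified": datetime(2009, 6, 10)},
    "en": {"is_base": False, "modified": datetime(2009, 6, 12)},
    "es": {"is_base": False, "modified": datetime(2009, 6, 1)},
}
stale = outdated_languages(translations)
```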
Anyway, I'm not sure if this deserves too much attention. When writing my
proposal the idea was to implement translation.ModelTranslation (see
idea.txt) as a subclass of models.Model (this is one of the reasons I
used a Meta subclass). When doing so the translation could itself add
fields that are not included in the original model (or perhaps even
overwrite some attributes in the original model?):
>>> class SomeObj(models.Model):
>>>     foo = models.CharField(...)
>>>     bar = models.CharField(...)
>>>
>>> class SomeObjTranslation(translation.ModelTranslation):
>>>     is_base = models.BooleanField(default=False)
>>>     class Meta:
>>>         fields = ('foo',)
This way the translation system stays very extendable and may be used
for some version-tracking-scenario while not supporting this itself.
btw.: This is another reason why I want to always use a JOIN.
Greetings, David Danier