Model translation


hejsan

Aug 4, 2010, 2:58:40 PM
to Django developers
Hi.
As promised I hereby bump this old thread about model translation:
http://groups.google.com/group/django-developers/browse_thread/thread/c6fdd3abea0c7f0e/8cd990e2e1f98e22?lnk=gst&q=hejsan#8cd990e2e1f98e22

I hope there is time to discuss this now even if it will not be
targeted before the 2.0 release or later.

Best,
hejsan

Jacob Kaplan-Moss

Aug 4, 2010, 3:17:05 PM
to django-d...@googlegroups.com
Hi Hejsan --

Thanks for the reminder, and for your round-up of the solutions in the
previous thread.

Having used all three of the projects you looked at, and having spent
a bunch of time thinking about the problem, I've come to the
conclusion that nothing's really fully baked enough to consider as an
addition to Django itself. So an obvious prereq is a candidate piece
of software, and none of the existing tools feel all that ready.

Further, I'm far from convinced that Django *needs* to ship a model
translation library. I'm not convinced it's possible to do in a way
that'll work for every -- or even most -- cases.

I'd be a lot more interested in a patch or patches making the creation
of tools like django-multilingual, -transmeta, and -modeltranslation
easier, but I'd be fairly against including any one of them in Django.

Jacob

stefanw

Aug 5, 2010, 3:13:25 AM
to Django developers
Hi Hejsan,

we discussed this topic at the sprints of DjangoCon.eu some time ago.
There is a page in the wiki for this topic where we summarized some
ideas:
http://code.djangoproject.com/wiki/ModelInterNationalization

Instead of one of the existing solutions (which all have serious
drawbacks), I am in favor of approach number 4 on the wiki page:
http://code.djangoproject.com/wiki/ModelInterNationalization#Multilingualmodelwithoneobjectperlanguage4

It avoids schema changes when a language is added and doesn't need
JOINs for translating content. It also has drawbacks, but I would
argue they are manageable. Please have a look at the proposal and
its API discussion.

I agree with Jacob that ModelTranslation will not necessarily be
something for the Django core. We should rather identify issues or
existing tickets which need to be resolved in order to go forward. The
wiki page would be a good place to start collecting tickets (there's
already one).

Cheers
Stefan

JK Laiho

Aug 5, 2010, 6:22:03 AM
to Django developers
Hi all,

Having popped my head in to the previous model translation thread in
December, I'll do so here as well. I apologize for the length of this
post, but the issue is complex, so it can't really be helped.

Last time around I mentioned having some ideas on how to maybe do
model translation in a different way than the currently available
alternatives. In the intervening time, I've started hacking on a proof-
of-concept type project, tentatively named django-modelinguistic, but
it's only partially functional and nowhere near a releasable state.

I'd like to present some general considerations here for public
scrutiny, as well as describe the approach django-modelinguistic is
currently taking. The project, while having started promisingly, has
been stuck for a good while due to my limited understanding of Django
internals.

First, here's an incomplete list of things a theoretical optimal model
translation approach should achieve (with the assumption that it's a
reusable app instead of a Django core component, in line with what
Jacob said):

1. It should Just Work as a drop-in component in any existing project,
no matter what apps that project is composed of, with minimal
configuration. It must not be mandatory to build your app from scratch
with model translation in mind. You need to be able to translate the
models of translation-unaware third party reusable apps as well as
your own.

2. It must not require changes to existing models. No extra fields,
nothing. One obvious approach is the admin-style (and django-
modeltranslation-style) registration of models, where translation
functionality is added dynamically to live alongside the untranslated
bits in some way.

3. Reads need to be transparent by default. Fetching the data of a
translated model field should return the language version
corresponding to the active language. In case a model instance doesn't
have translated data for a field in the active language, it must
gracefully fall back to the default language. Of course, sometimes
you'll want to retrieve a specific language version regardless of the
active language, so that must also be possible.

4. Writes need to be intuitive by default. Creating new model
instances and updating existing ones must work sensibly and without
breaking translation-unaware apps.

5. It must work well with schema migration tools, which in practice
means South.

6. It needs to integrate well with contrib.admin.

Some specific issues and examples follow.

Regarding point 1: it's unlikely that any translation solution could
really work with all existing projects and combinations of third-party
apps, especially those that do some funky model-level hackery
themselves. I have a feeling that the best one can do is to attempt an
80/20 solution that works in the common case. For example, the use of
raw SQL is one thing that a translation solution based around the ORM
really can't work around in any way that I can see.


Regarding point 2: crucially, you don't want to start tweaking the
model classes of third-party apps that you've probably installed into
a virtual environment with pip and have no desire to fork. You need to
be able to translate them, but altering their models is not the way to
go. Maintaining your translation-related model changes with upstream
changes would be horrible.


Regarding point 3: some examples are in order. Say we have a model
class called Animal with a "name" CharField, and the default language
is English. The instance with a PK of 1 is a dog, thus "name" equals
"Dog" in the default language.

The "name" field of Animal is then marked for translation into Swedish
and Finnish, and the dog instance is updated with new language
versions using whatever mechanism is appropriate (TBD).

After this, if you activate Swedish, Animal.objects.get(pk=1).name
will return "Hund". Activate Finnish, and it'll return "Koira".

In the case of filtering, if the active language is Finnish,
Animal.objects.filter(name="Koira") should return the correct Animal
instance. This probably means that .filter(name="Dog") will return an
empty set when the active language is not English (workarounds to get
the correct object through any language version may be possible).

Should you want a specific language version instead of the active one,
that can be done with a custom manager that the translation app can
provide for registered models. An example of this follows later.
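
A minimal sketch of the read behaviour described for point 3, written
in plain Python rather than against Django. Everything here is a
hypothetical stand-in: activate() mimics the role of Django's language
activation, and TranslatedField, _translations and in_language() are
invented names, not part of any existing app.

```python
# Plain-Python sketch; no Django involved. All names are hypothetical.
DEFAULT_LANGUAGE = "en"
_active_language = DEFAULT_LANGUAGE


def activate(language):
    """Set the active language (stand-in for a real activation call)."""
    global _active_language
    _active_language = language


class TranslatedField:
    """Descriptor returning the active-language value, else the default."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        values = instance._translations[self.name]
        # Fall back to the default language when no translation exists.
        return values.get(_active_language, values[DEFAULT_LANGUAGE])


class Animal:
    name = TranslatedField()

    def __init__(self, **translations):
        # e.g. Animal(name={"en": "Dog", "sv": "Hund"})
        self._translations = translations

    def in_language(self, field, language):
        """Explicit lookup that ignores the active language."""
        return self._translations[field][language]


dog = Animal(name={"en": "Dog", "sv": "Hund", "fi": "Koira"})
activate("sv")
print(dog.name)  # Hund
activate("de")
print(dog.name)  # Dog (no German version, so the default is used)
```

The explicit in_language() lookup plays the part of the custom manager
mentioned above: it returns one specific language version regardless
of which language is active.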


Regarding point 4: this is TBD as far as my forays into the topic and
django-modelinguistic go. I haven't yet thought through the
relationship of the active language and what gets written where.


Regarding point 5: I had a discussion about this with Andrew Godwin on
the South Users mailing list. I'll summarize the main points here. At
work, we've used django-modeltranslation on a few sites that use the
same internally developed apps, but different project-level language
configurations. South migrations are app-level, and if you know django-
modeltranslation, you may guess where this is going.

Two of the sites (call them A and B) use Finnish and English, and one
of them (C) only uses Finnish. A is the master site against which the
main development is done, including migrations. The same migrations
apply cleanly on B, but fail on C.

The reason? Imagine a model called Product with a CharField called
"name" that is marked for translation. With django-modeltranslation's
dynamic field generation approach, Product has the fields "name",
"name_fi" and "name_en" on A and B, but just "name" and "name_fi" on
C. The migrations are done on A and therefore refer to "name_en",
which doesn't exist on C. South quite obviously doesn't like this, and
porting new stuff from A to C always means nasty hackery.

In our case, we could just have django-modeltranslation also create
"name_en" on C and just leave it empty for all model instances, but
that's beside the point: the problem is that with django-
modeltranslation, project-level language settings affect app-level
table schemas and therefore South migrations. This is bad for reusable
apps in general, and a proper model translation approach can't do
this. For the Product model, the translation data simply cannot live
as dynamically generated name_* fields in the appname_product database
table.
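
The mismatch between sites A and C can be illustrated with a few lines
of plain Python. This is only a sketch of the naming scheme described
above; translated_columns() is a hypothetical helper, not a function
of django-modeltranslation or South.

```python
def translated_columns(field, languages):
    """Column names a suffix-per-language scheme would generate for one field."""
    return [field] + [f"{field}_{code}" for code in languages]


# Sites A and B are configured with Finnish and English, site C with Finnish only.
site_a = translated_columns("name", ["fi", "en"])
site_c = translated_columns("name", ["fi"])

# Columns that a migration written on site A may reference but site C lacks.
missing_on_c = sorted(set(site_a) - set(site_c))
print(missing_on_c)  # ['name_en']
```

Because the column list is a function of project-level settings, the
same app ends up with a different schema on each site.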


Regarding point 6: this is really hard. Good translation interfaces
are not trivial to create. One of django-modeltranslation's advantages
is that the translated fields are visible to the add/change view of a
model instance: "name_fi" and "name_en" are right there along with
"name". We've hacked a DOM-altering active language switching UI into
the change view using custom admin JS/CSS so that only one name field
is visible at a time, and it works OK. But if the translation data is
to live outside the main model table, a completely different approach
is needed. If Django is to be modified in a way to make translation
apps feasible, some sort of admin hooks for translation interfaces may
be necessary.

So that's the ideal, theoretical solution. More requirements for such
a beast probably exist, but those are the ones I could think of right
now.

The long-dormant django-modelinguistic is not anywhere near that. In
its current larval stage it achieves parts of goals 1, 2, and 3. This
post is already too long, but I'll describe the current approach and
an alternative that seems interesting but which I don't know how to
do.

Modelinguistic relies on an admin-like registration approach. It
creates language-specific copies of all the registered model classes,
replaces their managers (custom ones, too) with descriptors that can
retrieve correct language versions transparently. It also adds a
"callable descriptor" (a wrapper around a "manager factory" callable,
really), used like this: Animal.translated_objects('fi').get(...),
which gets you a Finnish Animal object regardless of what the active
language is. Animal.objects.get() would get you the active language
version transparently, as would Animal.my_custom_manager.get().

The translated model class copies and the original managers live in a
global translation registry dictionary keyed by the original model
class. Thanks to ModelBase metaclass magic, the type() invocations that
create the class copies register the new models in Django's app cache,
through which they can be seen by South, syncdb, sqlall etc.

In the database, the model copies live as suffixed extra tables.
animals_animal is the default English table, animals_animal_fi its
Finnish version that may or may not have translated data in it. All
the fields are copied, not just the translated ones, which is
wasteful, unfortunately.

So, if you do Animal.objects.get(pk=1) with Finnish active, you
actually get an Animal_fi instance, with all the untranslated field
data the same as in the Animal instance, but the translated field
data, well, translated. Yes, you need not even mention the problems of
writing and updating data across these table copies. I know.
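
For readers unfamiliar with the type() trick, here is a rough
plain-Python sketch of the registry idea. It is not the actual
django-modelinguistic code: the classes are ordinary Python classes
rather than Django models, the copies are made by subclassing for
brevity, and register(), translation_registry and table_name are
names invented for this illustration.

```python
# Hypothetical stand-ins; real Django models would go through ModelBase.
translation_registry = {}


class Animal:
    table_name = "animals_animal"


def register(model, languages):
    """Create one suffixed class per language and record them in the registry."""
    copies = {}
    for code in languages:
        # type(name, bases, namespace) builds a new class at runtime.
        copies[code] = type(
            f"{model.__name__}_{code}",
            (model,),
            {"table_name": f"{model.table_name}_{code}"},
        )
    translation_registry[model] = copies
    return copies


register(Animal, ["fi", "sv"])
finnish = translation_registry[Animal]["fi"]
print(finnish.__name__, finnish.table_name)  # Animal_fi animals_animal_fi
```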

That's django-modelinguistic right now. It's got a bunch of TDD
developed code that works in a very limited set of read-only
circumstances. I hate how hacky it is, and I hate not being capable of
making it better. I probably won't ever complete it, but if someone is
interested in the approach, I can publish the code somewhere for what
little it's worth as a jumping-off point. The good part is that it can
be dropped in with existing code and won't require model changes.


But.

Jacob mentioned the possibility of making changes to Django to make
model translation apps feasible. One thing that could *possibly*
enable a more elegant translation solution would be the ability of
inherited models to shadow the fields of their parents.

OneToOneField is almost there. I'd try and subclass it to allow for
shadowing, but the code of related fields is too complex and I don't
understand it. But I love how the OneToOne relation between, say,
auth.User and a Customer model that inherits from it enables
transparent access to User fields through a Customer instance.

Assuming the shadowing-enabled subclass of OneToOneField was called
ShadowingOneToOneField, something like this could happen:

--

>>> class Animal(models.Model):
...     name = models.CharField(max_length=255)
...     trinomial_name = models.CharField(max_length=255)

>>> class AnimalTranslationOptions(TranslationOptions):
...     translated_fields = ('name',)

>>> register(Animal, AnimalTranslationOptions)

# The register() function living in the hypothetical translation app
# would create an in-memory model in the app cache that corresponds to
# a model like this, represented in the database as the
# animals_animal_fi table:
#
# class Animal_fi(models.Model):
#     name = models.ShadowingOneToOneField(Animal)

>>> animal = Animal.objects.create(name='Dog',
...     trinomial_name="Canis lupus familiaris")

# ... time passes, the Animal instance gets a Finnish and Swedish
# translation for the "name" field, perhaps through a custom admin
# interface ...

>>> activate('en-us')
>>> animal = Animal.objects.get(name='Dog')
>>> animal.name
"Dog"
>>> activate('fi')
>>> animal.name
"Koira"
>>> activate('sv')
>>> animal.name
"Hund"
>>> animal.trinomial_name # not marked for translation, so not in Swedish here
"Canis lupus familiaris"
>>> from django.ponies import pony; pony.fly()
"Whee!"
--

There would need to be a lot of descriptor action or something going
on there so that "name" would resolve to either Animal.name,
Animal_fi.name or Animal_sv.name depending on the active language.

Sadly, I'm not sure if the South migration problem described earlier
is solvable with this approach, either.

Anyway, no need to pile on me calling me stupid for all the
shortcomings that my ideas inevitably have :-). Just throwing things
out there, maybe someone smarter will be inspired to create something
that actually works.

In a perfect world, databases wouldn't suck this much as a means of
holding a variable number of translated versions of a column's data.
Instead, a TRANSLATED_VARCHAR(255) column called "name" could have any
number of translations stored along with the default language, all of
which could be 255 characters long, and you could access them with
standard syntax: "SELECT `name` IN 'fi' FROM animals_animal WHERE
id=1;" or something, and the ORM could just work with that. One can
dream. Perhaps NoSQL databases and their Django backends will make
something like this possible one day.

- JK Laiho

JK Laiho

Aug 5, 2010, 7:43:07 AM
to Django developers
Oops. A mistake here:

# class Animal_fi(models.Model):
# name = models.ShadowingOneToOneField(Animal)

The "name" field wouldn't be a ShadowingOneToOneField, but a CharField
like that in the original Animal model. We'd rather need a model
inheritance-like pointer field to be the ShadowO2O.

- JK Laiho

Beres Botond

Aug 5, 2010, 2:09:16 PM
to Django developers
Hi JK,

Actually there is a model translation app which uses a very similar
approach to what you describe and already covers a good chunk of your
6 points.
A few months ago I needed to add dynamic translation to a fairly large
project, I looked into most of the existing model translation apps,
some of them mentioned in this thread,
but I didn't really like any of them. Then I accidentally stumbled
upon http://github.com/citylive/django-datatrans. I immediately took a
liking to its approach and decided to use it.
It was pretty smooth to integrate, ran into a few issues but nothing
too serious.
Of course it's not perfect, it needs a test suite (!!) and it still
needs a lot of work to be considered "fully baked" but it looks
promising to me.

Botond


On Aug 5, 1:22 pm, JK Laiho <jkla...@iki.fi> wrote:

> That's django-modelinguistic right now. It's got a bunch of TDD
> developed code that works in a very limited set of read-only
> circumstances. I hate how hacky it is, and I hate not being capable of
> making it better. I probably won't ever complete it, but if someone is
> interested in the approach, I can publish the code somewhere for what
> little it's worth as a jumping-off point. The good part is that it can
> be dropped in with existing code and won't require model changes.
>
> But.
>
> Jacob mentioned the possibility of making changes to Django to make
> model translation apps feasible. One thing that could *possibly*
> enable a more elegant translation solution would be the ability of
> inherited models to shadow the fields of their parents.
>
> OneToOneField is almost there. I'd try and subclass it to allow for ...

JK Laiho

Aug 6, 2010, 2:55:30 AM
to Django developers
> Actually there is a model translation app which uses a very similar
> approach to what you describe and already covers a good chunk of your
> 6 points.

Huh. That's interesting, I hadn't heard of that. Will take a look.
Thanks!

- JK Laiho

JK Laiho

Aug 6, 2010, 5:01:22 AM
to Django developers
On preview, django-datatrans looks really good, and the approach is
certainly better than any of the existing implementations, including
my abortive one. Still need to give it a run for its money properly to
see what issues remain. Whatever they are, they're probably solvable.
I'm not a betting man, but I'd be surprised if that didn't grow into
the de facto model translation approach in time.

I'm just glad I don't have to think about the model translation
problem anymore, I was exhausted just writing that monster post
yesterday :-)

- JK Laiho

Beres Botond

Aug 6, 2010, 9:17:52 AM
to Django developers
Your monster post was very useful, gave me a few ideas. Also it's a
good set of requirements to compare against for things that should be
fulfilled by a potential de facto model translation approach/app.
Combined with the excellent wiki article mentioned in this thread
(http://code.djangoproject.com/wiki/ModelInterNationalization) it's a
good basis for evaluating existing model translation apps, to see what
is missing or "wrong".
Regarding django-datatrans, I've been planning to take a more serious
look at it ever since I started using it. This thread might just be
the "kick in the butt" needed, we'll see :).

A few issues I'm aware of:
- _pre_save handler can cause problems on concurrent inserts
- integration with Django admin (or any other functionality that
presents the "translatable" fields in an editable context in the
frontend - for ex. in a form text input) - the editable widget will
get filled with the translated string and if you save it, the
*original* will get overwritten. On the "admin" of datatrans, there is
no problem, since it works directly with KeyValue-s. This is usually
not a problem on frontend because in most cases it makes no sense to
have translated content for values which are editable in frontend by
users. And admin is a controlled environment, so it's not a
"showstopper" (at least in my experience so far). But a transparent
solution is definitely needed.

In any case I don't think we'll see any model translation app in
django.contrib any time in the foreseeable future, taking for example
South as a reference on its progress in this regard (considering it
is miles ahead in terms of maturity/stability/usage etc. than any
modeltrans app and it has pretty much achieved its de facto status in
its problem domain).
I'm not saying this is a bad thing, just an observation of facts. As
long as a 3rd party app is well known, easy to integrate with existing
projects, it's stable/mature, well maintained etc., I don't think it
makes very much difference if it is actually within django.contrib or
not.

Botond

Jef Geskens

Aug 6, 2010, 10:06:07 AM
to Django developers
Hi

I am the original developer of django-datatrans, and Gert Van Gool
(gvangool) also contributes some code.
Datatrans has been developed for MobileVikings.com, a cellphone
operator (MVNO) in Belgium that I work for.
Yes, our entire business depends on Python/Django, even the backend
calls to our network supplier,
customer support tools, ...

We needed a simple way to translate models without messing with
existing infrastructure.
I took concepts from different existing model translation apps like
django-multilingual, django-modeltranslation, etc.
Some of them looked very promising but none of them were really
usable. I like the registration approach of django-multilingual but
it changes the structure of your database. As a telco operator, we
cannot afford the risk of changing the database structure of
live, constantly changing data.
We needed a simple, easily deployable, backwards-compatible app that is
friendly to existing data. This is why I came up with a
system where we have a single dictionary containing all the
translations of our data.

Thanks for all the attention to this app, and yes it needs some kick
in the butt indeed.

Beres, to answer some of your concerns:

> Regarding django-datatrans, I've been planning to take a more serious
> look at it ever since I started using it. This thread might just be
> the "kick in the butt" needed, we'll see :).

That's why I put it on github, so others can have a look at it and
contribute to it.
We find it very useful at our company and, although it lacks some
serious tests (working on it...),
we can safely say it is stable, as our business depends on it.

>
> A few issues I'm aware of:
>  -  _pre_save handler can cause problems on concurrent inserts

Yes, I'm looking for an alternative way to handle this. Maybe some
sort of locking mechanism will help...

>  -  integration with Django admin (or any other functionality that
> presents the "translatable" fields in an editable context in the
> frontend - for ex. in a form text input) - the editable widget will
> get filled with the translated string and if you save it, the
> *original* will get overwritten.

This has been, believe it or not, taken care of ;-)
It may sound confusing at first, but Datatrans is smart enough to see
that,
when you are not in the original language, it actually modifies the
_translation_, and not the original data.
So when your current language is Dutch, when you edit something that
has been datatransed, you edit the Dutch value.
When your current language is English, and it is the default, you edit
the original data, and it marks the translation as fuzzy
so a translator can verify if the translation is still valid, in the
same fashion as poedit.
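
The write rule described above can be sketched in plain Python. This is
an illustration of the stated behaviour only, not Datatrans code: the
TranslationStore class, its dictionary layout and the sample strings
are all assumptions made for the example.

```python
DEFAULT_LANGUAGE = "en"


class TranslationStore:
    """Toy key-value store: original string -> {language: (value, fuzzy)}."""

    def __init__(self):
        self.translations = {}

    def save(self, record, field, new_value, active_language):
        original = record[field]
        if active_language != DEFAULT_LANGUAGE:
            # Editing in a translated language only touches the translation.
            entry = self.translations.setdefault(original, {})
            entry[active_language] = (new_value, False)
            return
        # Editing in the default language changes the original and flags
        # existing translations as fuzzy so a translator can re-check them.
        old = self.translations.pop(original, {})
        self.translations[new_value] = {
            lang: (value, True) for lang, (value, _) in old.items()
        }
        record[field] = new_value


store = TranslationStore()
plan = {"title": "Cheap calls"}
store.save(plan, "title", "Goedkoop bellen", "nl")  # Dutch active
store.save(plan, "title", "Cheaper calls", "en")    # default language active
print(plan["title"])                        # Cheaper calls
print(store.translations["Cheaper calls"])  # {'nl': ('Goedkoop bellen', True)}
```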


I will follow this discussion closely from now on.

-Jef Geskens
Software Engineer at MobileVikings.com

David Danier

Aug 7, 2010, 4:34:06 PM
to django-d...@googlegroups.com
Hi all,

sorry if this gets very long, but I will try to write down my current
opinion and experience with so called "model translation". I have put up
multiple sites using translatable content and I have written some apps
to help me doing so (none of which are public so far, as I hated my
first approaches after a few days/weeks and I'm not sure about these
apps so far). Anyway I tried to focus on some kind of 80%-solution,
which does only work for some (most) cases, but only needed 20% of the
work (http://en.wikipedia.org/wiki/Pareto_principle). I'll write a
little about these solutions at the end, but it will not be the focus of
this email.

Currently I think there are two completely different approaches for
doing model translations:

1. You have an object for each language. This object contains some
language-attribute which makes it easy to filter stuff. Admin and views
are no-brainers.

Of course this solution is only suitable for some special cases. News
for example might only exist in one language, so its perfectly fine (and
even preferred) to have different content for each language.

Language switching might become a problem, if you want to link to the
current object in a different language. Fixing this can be done by
adding some kind of group model, to group all the translations of one
object into one translation group. This can even be done using generic
foreign keys, which makes this an easy and reusable solution. I created
one generic app for this, but so far this is not public and I'm pretty
sure it misses many things.

Of course having one object for each language will become nasty if you
need common data to be equal across all of an object's translations. You could
sync this, but...

2. Having some kind of common data, which needs to be equal for every
translation should really be solved inside the database. There exist
many solutions for this, which all fix some problems and create new ones. The
three most common solutions use language-suffixes, putting translations
into separate models or using some dict/pickle approach.

First I will try to write down what I think is important for providing a
solid solution.

a) Getting out of the users way

If I want to fetch some object I don't want to care about translations.
This is even true if I need to filter/order by some translated
attribute. I don't want to write stuff like (cur_lang ==
translation.get_language()):
Entry.objects.get(**{'slug_%s' % cur_lang: 'something'})
or:
Entry.objects.get(common__language=cur_lang, common__slug='something')
What I want is the plain old:
Entry.objects.get(slug='something')
or perhaps:
Entry.objects.localize().get(slug='something')

The same thing is true for accessing the attributes, but most approaches
solve this, so no need to bring this up.

btw dict/pickle solutions fail to provide access to the data in the
query, regardless of how hard you try. So they fail big here.

b) There should not be too much overhead involved

You currently can choose between loading all translations, needing an
additional SQL query or unpickling some data. None of this is ideal. I
personally think a JOIN could be acceptable, but of course this also is
some overhead.

c) It should allow (not support) special cases

Sometimes you need strange things, such as a field that is optional in one
language while being required in some other language. There might even be
fields that do not need to exist at all for one out of 10 languages. The
needs might of course be much simpler. As these cases are somewhat
esoteric they should not be a show-stopper for model translations. But
having heard about them might prevent some solution from being
tightened up too much.

This btw is one thing about the "put all common data into its own model
and JOIN away" I don't like. All common data needs to follow the same
rules, this may not be possible in all cases.

d) Managing languages should be easy

I don't think this needs to be the huge problem everybody likes to call
it. For me south solves this pretty well. If we get something like south
to be in core or the so called "official solution" managing changes in
translations becomes easy.

e) One might add it should be possible to add translations for third
party apps or create translations for your apps without changing the
basis. I think this is only part true.

As this adds a ton of new dependencies and side effects, I personally
think you should be able to do something to use translatable models. Of
course the changes you need should be as minimal as possible.

Third-party-apps are a special case as you probably cannot maintain your
own copy. But I think really thinking about third-party-apps should be
done when a solid solution is ready. Trying to solve all problems at
once just makes you go crazy. (Of course keeping third-party-apps in
mind is preferred)

f) Model relations should not become too nasty. Creating a translatable
Many2Many/ForeignKey inside your translations will get ugly with most
solutions. I think currently this is only easy when adding all fields to
one model (suffix). But reverse relations will suck this way. Usecase:
Every translation needs different tags.

=> So this only leaves adding fields with language suffixes, at least if
we look at what Django currently provides (I'll write down some things
about this later). It is easy, fast enough and yes, it will eat your
RAM.

So, about the other solutions: dict/pickle fails for using the fields
inside queries, so for me they are out of question. I think many people
use them (I like to call this the "gettext" approach), but I certainly
do not see any point in this. Of course it might probably be handy when
translating legacy apps.

What about "putting common data into its own model"? I like this
solution, I even like this solution so much I tried to implement it
several times. BUT you cannot get it to use a nice query. Most of the
time you will need to fetch the translation inside a separate query as
select_related() cannot fetch the translation even if the JOIN is unique
(qs.filter(common__language='xx') will create a unique JOIN). This
certainly could be improved.

Of course there's the thing about "getting into my way", which
every implementation using multiple models currently has. I don't
think we should need to think about the different models here. Actually
model inheritance solves this, so perhaps the best approach might be to
get model inheritance more generic, so it could be used for other
things, too. Allowing users to define their own JOINs while keeping all
attributes inside the same object and not needing to do something
special inside queries is definitely a nice feature (and there might be
more use-cases for this, versioning models for example).

SQL could be something like:
SELECT ... FROM entry LEFT OUTER JOIN entry_trans ON
(entry.id=entry_trans.entry_id AND entry_trans.language='en') WHERE ...

I don't know the Django internals well enough, but if this could be done,
external model translation should be possible without much hassle.
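
To make the JOIN concrete, here is a small runnable sketch using
Python's built-in sqlite3 module. The entry and entry_trans tables
follow the SQL above; the slug and title columns and the sample rows
are assumptions added for the example, and a LEFT OUTER JOIN is used
so that entries without a translation in the requested language still
appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entry (id INTEGER PRIMARY KEY, slug TEXT);
    CREATE TABLE entry_trans (
        entry_id INTEGER REFERENCES entry(id),
        language TEXT,
        title TEXT,
        UNIQUE (entry_id, language)
    );
    INSERT INTO entry VALUES (1, 'hello'), (2, 'bye');
    INSERT INTO entry_trans VALUES (1, 'en', 'Hello'), (1, 'de', 'Hallo'),
                                   (2, 'de', 'Tschuess');
""")

# The language condition lives in the ON clause, so at most one
# translation row is joined per entry and untranslated entries are kept.
rows = conn.execute(
    """
    SELECT entry.id, entry.slug, entry_trans.title
    FROM entry
    LEFT OUTER JOIN entry_trans
        ON entry.id = entry_trans.entry_id AND entry_trans.language = ?
    ORDER BY entry.id
    """,
    ("en",),
).fetchall()
print(rows)  # [(1, 'hello', 'Hello'), (2, 'bye', None)]
```

Entry 2 has no English row, so its title comes back as NULL (None),
which is where a fallback to a default language could hook in.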


Other Django enhancements:

Add some LanguageField! Why not add some Language field? This should be
pretty easy. Currently I use some field in every project which basically
only is a CharField with predefined max_length. This would certainly
make things easier and allow multiple (third-party-)apps to share
some generics.

Virtual fields? Adding support for some kind of virtual fields might
enhance things. This just came into my mind, so it might be wrong.

Extendible QuerySets: I prefer to put new filters into QuerySets (and
adding a Manager for each new method), so I can choose to use
Entry.localized.all() or Entry.objects.localize() how I want. Adding
these methods to the QuerySet also allows to use it with related
managers (User.entries.localize()), which really is great. But having
some Manager for every possible QuerySet while allowing stacking of
QuerySets gets complicated fast. This probably only is true if you need
to add parameters to your Manager which get passed to the QuerySet.


Further problems:

Language selection: This is about how Django detects the user language
and how the user is able to select the language. Django could provide
more defaults here, which might be detecting the language based on the
request path, request domain or some other practical information. I
currently use the request path for translations.
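
As a rough illustration of path-based detection, the following
plain-Python sketch splits a language prefix off a request path. The
function name, its parameters and the "/de/news/" URL layout are
assumptions made for this example; this is not an existing Django API.

```python
def language_from_path(path, supported, default):
    """Return (language, remaining_path) for paths like '/de/news/'."""
    first, _, rest = path.lstrip("/").partition("/")
    if first in supported:
        return first, "/" + rest
    # No recognised prefix: keep the path and use the default language.
    return default, path


print(language_from_path("/de/news/", {"en", "de"}, "en"))  # ('de', '/news/')
print(language_from_path("/news/", {"en", "de"}, "en"))     # ('en', '/news/')
```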

Only having the option to change language by cookie is bad for most
cases. Every public site needs to provide different URLs, so people can
link to one translation, search engines can crawl all translations, ...

...which brings me to i18n URLs. I currently have an urls.py for every
translation and use something like {% url foo_bar language=... %}. This
could certainly be improved, I think.


My solution(s):

Currently I have two apps that help me do translations.

The first one allows me to group translations together. This is only
useful when you have different content for each language (->
language attribute). The app itself is pretty simple, but helps me keep
translations organized (admin integration) and improves the user
experience (language links go directly to translations).

The second app is my solution for adding language-suffix fields to a
model. It is as simple as it gets: it provides no help with adding
these fields, so you have to define all fields yourself (which is
useful, as the fields may have different options and even types). The
app provides a class to implement the "access the right attribute" glue:
    name_en = CharField(...)
    name = I18NAttribute()
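
The "access the right attribute" glue can be sketched as a Python
descriptor. This is only an illustration of the idea, not the code of
the app described above: the module-level _current_language variable
stands in for Django's active language, and Entry is a plain class
instead of a model.

```python
_current_language = "en"  # stand-in for the active language (assumption)


def set_language(code):
    """Switch the language used by all I18NAttribute lookups."""
    global _current_language
    _current_language = code


class I18NAttribute:
    """Redirect access to '<name>_<language>' on the same instance."""

    def __set_name__(self, owner, name):
        # Remember the attribute name this descriptor was assigned to.
        self.name = name

    def _target(self):
        return "%s_%s" % (self.name, _current_language)

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self._target())

    def __set__(self, instance, value):
        setattr(instance, self._target(), value)


class Entry:
    # In a real model, name_en and name_de would be CharFields.
    name = I18NAttribute()

    def __init__(self, name_en, name_de):
        self.name_en = name_en
        self.name_de = name_de
```

Reading entry.name then returns name_en or name_de depending on the
active language, and assigning to entry.name writes to the same
suffixed attribute.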

In addition I have developed a QuerySet which provides a
localize() method that does the following:
* If the model has a language field, it just returns
  filter(language=cur_lang)
* If you have I18NAttributes inside your model, it rewrites
  calls to filter()/order_by() to use the right field:
    filter(name='...') -> filter(name_xx='...')
    filter(name__contains='...') -> filter(name_xx__contains='...')
    order_by('name') -> order_by('name_xx')
  ...models.Q filters do not work, of course
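
The rewriting step can be sketched without Django at all, since it
only touches the lookup names. The function names and the i18n_fields
argument below are hypothetical; a real localize() would apply the
same renaming before handing the arguments to filter()/order_by().

```python
def localize_lookups(lookups, i18n_fields, language):
    """Rewrite filter() keyword lookups to use language-suffixed fields.

    Only the field part before the first '__' is renamed, so lookup
    types such as '__contains' are preserved.
    """
    rewritten = {}
    for key, value in lookups.items():
        field, sep, rest = key.partition("__")
        if field in i18n_fields:
            field = "%s_%s" % (field, language)
        rewritten[field + sep + rest] = value
    return rewritten


def localize_ordering(names, i18n_fields, language):
    """Rewrite order_by() arguments, keeping a leading '-' for descending."""
    result = []
    for name in names:
        prefix = "-" if name.startswith("-") else ""
        field = name[len(prefix):]
        if field in i18n_fields:
            field = "%s_%s" % (field, language)
        result.append(prefix + field)
    return result
```

Lookups on fields that are not in i18n_fields pass through unchanged.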

These apps are as simple as I could make them, but they both helped
me a lot more than any full-blown solution. This is why I think we
should create better tools for doing such things inside Django instead
of trying to provide one solution that solves everything.

I hope I haven't missed anything essential. Model translation really
touches most parts of Django (urls.py, QuerySet, views and of
course models). I have intentionally left out some aspects because they
are not relevant to most users (for example translated content and
full-text search (haystack)).


Thanks for reading this far,
David Danier

Anssi Kaariainen

Aug 7, 2010, 7:38:31 PM8/7/10
to Django developers
On Aug 7, 11:34 pm, David Danier <goliath.mailingl...@gmx.de> wrote:

> What about "putting common data into its own model"? I like this
> solution, I even like this solution so much I tried to implement it
> several times. BUT you cannot get it to use a nice query. Most of the
> time you will need to fetch the translation inside a separate query as
> select_related() cannot fetch the translation even if the JOIN is unique
> (qs.filter(common__language='xx') will create a unique JOIN). This
> certainly could be improved.
*SNIP*
> SQL could be something like:
> SELECT ... FROM entry OUTER JOIN entry_trans ON
> (entry.id=entry_trans.entry_id AND entry_trans.language='en') WHERE ...
>
> I don't know the Django internals well enough, but if this could be done,
> external model translation should be possible without much hassle.

I have hacked (for learning purposes) the Django queryset to allow
arbitrary joins. The API was something like this:
qs.join(qs_or_model, allow_null=False, additional_filters). If
allow_null is set to True, the query will generate a left outer
join; otherwise it will generate a normal join. The additional filters
are applied to the join condition, not to the WHERE condition. The
qs_or_model could be a reverse related model, or just a queryset which
will be joined using the additional_filters.
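
Why it matters whether the filter goes into the join condition or the
WHERE condition can be shown with plain SQL. The sketch below uses
Python's sqlite3 module and the entry/entry_trans tables from the
quoted query; the columns and sample rows are made up for
illustration, and none of this is the API of the patch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, slug TEXT)")
conn.execute(
    "CREATE TABLE entry_trans (entry_id INTEGER, language TEXT, title TEXT)"
)
conn.executemany("INSERT INTO entry VALUES (?, ?)",
                 [(1, "hello"), (2, "german-only")])
conn.executemany("INSERT INTO entry_trans VALUES (?, ?, ?)",
                 [(1, "en", "Hello"), (1, "de", "Hallo"),
                  (2, "de", "Nur deutsch")])

# Language filter inside the join condition: entries without an English
# translation are kept, with NULL (None) for the translated column.
in_join = conn.execute("""
    SELECT entry.slug, entry_trans.title
    FROM entry LEFT OUTER JOIN entry_trans
      ON (entry.id = entry_trans.entry_id AND entry_trans.language = 'en')
    ORDER BY entry.id
""").fetchall()

# Language filter in WHERE: rows with a NULL language are filtered out,
# so the outer join effectively behaves like an inner join.
in_where = conn.execute("""
    SELECT entry.slug, entry_trans.title
    FROM entry LEFT OUTER JOIN entry_trans
      ON (entry.id = entry_trans.entry_id)
    WHERE entry_trans.language = 'en'
    ORDER BY entry.id
""").fetchall()
```

Here in_join contains both entries, while in_where only contains the
entry that has an English translation.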

I don't know if I still have the code lying around somewhere, but as I
said it was an ugly hack that was just barely working for the easy
cases. There are plenty of corner cases, so doing it properly isn't an
easy task. However, it was surprisingly easy to write the
proof-of-concept join method once you got the hang of the sql.query
stuff. It is well written but, to my taste, not sufficiently commented.

As to the model translations in core, I am of the opinion that it
shouldn't be done. There are too many different requirements. For
example I live in a country where there are two official languages, so
translations inside the table are perfectly sensible in this setting.
But we also create software which needs to be usable EU wide, which
means over ten languages easily. That means a complex model would have
somewhere around 200 db columns, which is not acceptable.

- Anssi

stefanw

Aug 8, 2010, 4:58:59 AM8/8/10
to Django developers
Hi,

> 1. You have an object for each language. This object contains some
> language-attribute which makes it easy to filter stuff. Admin and views
> are no-brainers.

I think this is the right approach. A sample API is open for
discussion here:
http://code.djangoproject.com/wiki/ModelInterNationalization#Multilingualmodelwithoneobjectperlanguage4

Two points against other approaches:
1. In my opinion JOINs and serialization are a no-go for just querying
a translatable model. Of course you can cache results of JOINs, but
then you could just use a different solution.

2. The model has to scale with the number of languages. Sure, there's
a finite number of languages and your project might only need three of
them, but others may need 50. Having a db table for each language or
having a column for each translatable value for each language
becomes a mess at the db level, even if it is abstracted nicely
behind South and magic. There are use cases where the db must be
accessible to other non-Django applications and good db design is
important for that.

I know that hacking that stuff in Django is fun and it might also
work, but have a look at it from the non-abstracted side. It's not
nice.

Adding a language field and a grouping id for the same objects in
different languages to a model solves many problems.
1. Queries stay easy.
2. An object does not have to exist in every language.
3. Falling back to other languages if an object doesn't exist in the
desired language is easy.
4. The database stays simple.

Disadvantage: duplicated non-translatable fields across rows. This is
not nice. However, syncing is still a no-brainer and scales with the
number of languages, since you can update all non-translatable fields
for the same objects in one query: UPDATE stuff SET ... WHERE group_id=123.
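
A minimal sketch of this layout, using Python's sqlite3 module. The
table name "stuff" and group_id come from the text above; the title
and price columns, the sample rows and the get_translation helper are
assumptions added for illustration. It shows the single-query sync of
a non-translatable field and a simple fallback to another language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stuff (
        id INTEGER PRIMARY KEY,
        group_id INTEGER,   -- same value for all translations of one object
        language TEXT,
        title TEXT,         -- translatable
        price INTEGER       -- non-translatable, duplicated per row
    )
""")
conn.executemany(
    "INSERT INTO stuff (group_id, language, title, price) VALUES (?, ?, ?, ?)",
    [(123, "en", "Chair", 10), (123, "de", "Stuhl", 10),
     (124, "en", "Table", 50)],
)

# Sync a non-translatable field across all languages in one query.
conn.execute("UPDATE stuff SET price = ? WHERE group_id = ?", (12, 123))


def get_translation(conn, group_id, language, fallback="en"):
    """Return (language, title, price), preferring the requested language."""
    return conn.execute(
        "SELECT language, title, price FROM stuff "
        "WHERE group_id = ? AND language IN (?, ?) "
        "ORDER BY language = ? DESC LIMIT 1",
        (group_id, language, fallback, language),
    ).fetchone()
```

Object 124 has no German row, so asking for "de" returns its English
row instead.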

Think of it as denormalization, something we do all the time anyways.
This duplication can be abstracted away and the database still stays
clean.

There are definitely use cases where another approach might be faster
(only three languages, one model). But we should go for the general
use cases, with many languages.

Concerning third party apps and their models
These apps might add data to objects of my tables via Generic Foreign
Keys. This isn't a problem for translation, because the third-party
object "inherits" the language from my translated object.

Concerning Ease of Integration
I don't think it is realistic to demand that no code has to be modified
for translation to work. A registration approach is nice, but when you
build in translation of models as an afterthought there are many
things you need to change anyway. Plug-and-play can't really be a
goal; you never know if it really works with a third-party app.

A translatable model is a design decision. It should be really easy to
integrate a solution (like inheriting your models from e.g.
TranslationModel), but code changes are probably unavoidable when you
decide on translatable models too late in the development process.

If you have other ideas/apps, please contribute to the wiki page,
because this thread will end some time and will then be lost in the
mailing list archives.

Django Wiki on Model Internationalization:
http://code.djangoproject.com/wiki/ModelInterNationalization

Getting the ideas in a structured way, listing solutions and even
linking to this thread, saves us from having this discussion over and
over again.

Cheers
Stefan

Anssi Kaariainen

Aug 9, 2010, 7:41:08 AM8/9/10
to Django developers


On Aug 8, 2:38 am, Anssi Kaariainen <akaar...@cc.hut.fi> wrote:

> I don't know if I still have the code somewhere lying around, but as I
> said it was an ugly hack that was just barely working for the easy
> cases.

If somebody is interested there is a patch available at
http://users.tkk.fi/akaariai/join.diff

Sorry, no tests, no documentation...

A sample project is available at http://users.tkk.fi/akaariai/left_join.tar.bz2

There is one example query in tester/models.py. The API is slightly
verbose... I said it was an ugly hack ;)

- Anssi

hejsan

Aug 17, 2010, 12:05:20 PM8/17/10
to Django developers
How about this:
The django main code implements a KISS version of model translation
that works for most cases.

It will still be possible for others that need more sophisticated ways
to use one of the plethora of third party apps.

This is in line with many of the contrib apps, for example most people
can make flatpages work for them, but many prefer to use django-cms or
django-page-cms or whatever to get every possible bell and whistle.

Do not forget, to quote the djangoproject.com contrib app site:
Django aims to follow Python’s “batteries included” philosophy. It
ships with a variety of extra, optional tools that solve common Web-
development problems.

And to further rub that point in, on the frontpage of
djangoproject.com Internationalization is boasted as a main feature:
"Django has full support for multi-language applications, letting you
specify translation strings and providing hooks for language-specific
functionality."

Currently this is, as I see it, false and I felt rather let down after
I started learning django and found out that in reality only static
text could be translated.

For most countries other than the English speaking ones this is one of
the most important features.

My proposal:
I say we go with the "Multilingual model with one object per language"
way, as stefanw proposed above, mainly because it seems to be the
simplest one to adapt to the ORM's query language, and I don't see why
it wouldn't comply with cases 1, 2, 3, 4 and 6 of JK Laiho's
considerations. As for number 5, working well with South, I have no
idea if it does, but since South's only purpose in life is to migrate
Django apps, then I guess South would simply have to adapt to changes
in the django core.

I propose a setting be added, in line with USE_I18N and USE_L10N:
USE_MODEL_I18N # false by default

And another tuple setting specifying which models should be
translated:
I18N_MODELS = (
    "myblog.entry",
    "django.contrib.flatpages",
    ...
)

If USE_MODEL_I18N is true then django.db.models.Model adds two fields:
language and group_id. I don't think I care if it adds these fields to
all models in the project or just the ones specified by I18N_MODELS.

Admin support and widgetry can be added gradually, because out of the
box you just create a new object instance to create a new
translation (Keep It Simple, Stupid), and filtering in the admin is
trivial.

The only blazingly obvious caveat I can see is that
modelname.objects.all() returns every possible language. This means
for example that the generic views in their current form list all
languages.
To solve this, either change the functionality of
modelname.objects.all() to return only objects from the current locale
if USE_MODEL_I18N is True, or add a localized() function as David Danier
mentions above:

If a modelname.objects.localized() filtering function is added then
backwards compatibility becomes trivial, by just replacing for example
{% for entry in articles %}
with
{% for entry in articles.localized %}

in your templates. The localized function should probably do nothing
if USE_MODEL_I18N is False.
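
The intended behaviour of localized() can be sketched in plain Python.
USE_MODEL_I18N is the setting proposed above and does not exist in
Django; the function signature and the use of dicts in place of model
instances are assumptions for illustration only.

```python
USE_MODEL_I18N = True  # proposed setting, hypothetical


def localized(objects, current_language, use_model_i18n=None):
    """Keep only objects in the current language.

    Does nothing (returns all objects) when model i18n is switched off.
    """
    if use_model_i18n is None:
        use_model_i18n = USE_MODEL_I18N
    if not use_model_i18n:
        return list(objects)
    return [obj for obj in objects if obj["language"] == current_language]
```

With the setting off, existing code that calls localized() keeps
seeing every object, which is what makes the switch backwards
compatible.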


I haven't seen django-datatrans before. Does it play well with the
Django ORM querying style? The other registration based apps don't.


- hejsan

Jef Geskens

Aug 19, 2010, 5:20:08 AM8/19/10
to Django developers
> I haven't seen django-datatrans before. Does it play well with the
> Django ORM querying style? The other registration based apps don't.
>
> - hejsan

It does nothing at the ORM layer; it only comes into action on the model
instances, when accessing their fields.

django-datatrans was kept very simple: it mimics the behavior of the
gettext po-files, like the i18n app from django. It looks for
translatable strings and puts them in a lookup table, which can in turn
be easily translated. Instead of putting {% trans "" %} around all your
strings, you say "this model contains these translatable fields", and
datatrans scrapes the contents of those fields and makes them
translatable. In fact, our first model translation solution was a
script that scraped the model content and put all translatable strings
in a .txt file with {% blocktrans %} around them.
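
The general idea of a registration plus lookup table can be sketched
in a few lines of plain Python. This is not the datatrans
implementation or its API: the names registry, lookup, register,
collect_strings and translate, and the PricePlan class, are all made
up for illustration.

```python
registry = {}  # model class -> tuple of translatable field names
lookup = {}    # (language, original string) -> translated string


def register(model, fields):
    """Declare which fields of a model hold translatable strings."""
    registry[model] = tuple(fields)


def collect_strings(instances):
    """Gather the distinct original strings that need translating."""
    found = set()
    for obj in instances:
        for field in registry.get(type(obj), ()):
            found.add(getattr(obj, field))
    return sorted(found)


def translate(original, language):
    """Return the translation, or the original string if none exists."""
    return lookup.get((language, original), original)


class PricePlan:
    def __init__(self, name):
        self.name = name


register(PricePlan, ["name"])
plans = [PricePlan("Basic"), PricePlan("Pro"), PricePlan("Basic")]
to_translate = collect_strings(plans)  # each distinct string appears once
lookup[("nl", "Basic")] = "Basis"      # a translator fills in the table
```

Like gettext, the table is keyed by the original string, so identical
values only need to be translated once and untranslated strings fall
back to the original.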

Creating this app proved more efficient for our company than
restructuring and possibly rewriting our apps as well as third-party
apps. The only disadvantage is the extra lookup, but by using django's
caching mechanism we try to overcome that inefficiency. Django's i18n
app and gettext also have to look up and cache translations. I don't
see the difference. Our ISP website, mobilevikings.com, is required to
be completely multilingual, in three languages. In the release
process of a new feature, we have a team of translators that uses the
included admin tool from datatrans intensively.

I think that if you want to create a website with 30 or more
languages, the approach of one-instance-for-each-language is more
usable indeed. But if you have a huge, live system with a lot of apps
that work together, and only need to support a handful of languages,
django-datatrans might be a good option.

I suggest you take a look at the source, especially here:

http://github.com/citylive/django-datatrans/blob/master/datatrans/utils.py

Mathieu Leduc-Hamel

Aug 19, 2010, 5:44:06 AM8/19/10
to django-developers
Hi Jef,

I really like the approach you took with your datatrans application. Do you have an example application of your solution? I would like to implement it in our solution but I need to convince my colleagues! ;-)

Does that mean that with your solution, these translated fields will be in the general .po file of your project and will be translated like any other static strings?




Mathieu Leduc-Hamel

Aug 19, 2010, 5:46:48 AM8/19/10
to django-developers
Oh no, I think I didn't understand your points correctly.

You created a lookup table, so does that mean you have to translate the different field values in one specific section of the Django admin backend which contains every translatable field of your project?

Jef Geskens

Aug 20, 2010, 2:50:55 AM8/20/10
to Django developers

On Aug 19, 11:46 am, Mathieu Leduc-Hamel <marra...@gmail.com> wrote:

> I really like the approach you took with your datatrans application. Do
> you have an example application of your solution? I would like to
> implement it in our solution but I need to convince my colleagues! ;-)

You can see a live application running at http://mobilevikings.com,
information like price plans is translated by datatrans.

For example code, there is only the registration which is illustrated
in the README file.

Here are some screenshots of the admin module:

http://dl.dropbox.com/u/634220/datatrans-overview.png

and

http://dl.dropbox.com/u/634220/datatrans-detail.png


> Oh no, I think I didn't understand your points correctly.
>
> You created a lookup table, so does that mean you have to translate the
> different field values in one specific section of the Django admin
> backend which contains every translatable field of your project?
>

Yes. In fact, because datatrans is meant to be transparent, you can
even translate in the default Django admin itself, provided that you
can easily switch languages from within the admin. Just override its
base template. If your current language (say English) is the default,
you just change the original strings and the translations are marked
as fuzzy. If your current language is not the default, for example
German, the translation will be modified.

I think that a management command for just generating a .po file from
your registered models is also a nice option, especially when
translating .po files is already a part of the release procedure. It's
also a nice fallback, and proven infrastructure. We're considering
adding support for that in datatrans.