Multilingual content


Piotr Majewski

Jul 27, 2008, 12:45:02 PM
to Django developers
Hi,
I found django-multilingual and transdb very useful, but those
projects don't fully integrate with Django.
I was thinking, and I found a way to introduce multilingual content to
Django with very few lines of code and little intervention in django.db.
Actually, all it needs is a boolean multilingual flag: if this
flag is set to True, the field will be "multilingual".

The design is very simple.
Given a list of languages and language codes, we can create extra
columns for "translation" data.

Instead of having:
id, name
we would have:
id, name_en, name_de, name_fr

The API would look like this (default language English):

from django.db import models

class Article(models.Model):
    body = models.TextField(multilingual=True)

>>> a = Article.objects.create(body='I am a berliner', body_de='Ich bin ein berliner')
>>> a.body
u'I am a berliner'
>>> a.body_de
u'Ich bin ein berliner'


This can be done by:
* adding the multilingual flag to the Field.__init__() kwargs;
* modifying Field.contribute_to_class() slightly;
* in contribute_to_class(), checking whether multilingual is set to True: if not, proceed the old way; if so, call cls._meta.add_field(FIELD) for each translation FIELD (the list of fields would be generated dynamically; the only things that differ are the name of the field and the column it refers to);
* finally, adding a field that refers to the current language.
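A minimal sketch of the expansion step in plain Python, without Django: a class decorator plays the part of the modified contribute_to_class(). It assumes a hard-coded LANGUAGES tuple and a thread-local current language; MultilingualField, CurrentLanguageAlias, multilingual_model and set_language are hypothetical names, not Django APIs.

```python
import threading

# Hypothetical stand-ins: enabled language codes and the default language.
LANGUAGES = ("en", "de", "fr")
DEFAULT_LANGUAGE = "en"

# Per-thread "current language" (the extra current-language piece above).
_state = threading.local()


def set_language(code):
    _state.language = code


def get_language():
    return getattr(_state, "language", DEFAULT_LANGUAGE)


class CurrentLanguageAlias:
    """Data descriptor: obj.<name> reads/writes obj.<name>_<current language>."""

    def __init__(self, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, f"{self.name}_{get_language()}")

    def __set__(self, obj, value):
        setattr(obj, f"{self.name}_{get_language()}", value)


class MultilingualField:
    """Hypothetical marker playing the role of Field(multilingual=True)."""

    def contribute_to_class(self, cls, name):
        # One attribute per language (name_en, name_de, ...), defaulting to None.
        for code in LANGUAGES:
            setattr(cls, f"{name}_{code}", None)
        # The bare name becomes an alias for the current language's attribute.
        setattr(cls, name, CurrentLanguageAlias(name))


def multilingual_model(cls):
    """Class decorator that expands every MultilingualField declared on cls."""
    for name, value in list(vars(cls).items()):
        if isinstance(value, MultilingualField):
            value.contribute_to_class(cls, name)
    return cls


@multilingual_model
class Article:
    body = MultilingualField()

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)
```

With this, `Article(body='I am a berliner', body_de='Ich bin ein berliner').body` reads `body_en` while the current language is English, and `body_de` after `set_language('de')`.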


This way, no major modification is needed.

The only thing that has to be considered is the admin layout.

About the Design.

I'm fully aware that models with many multilingual fields and
multiple languages will generate many columns, but:

* MySQL: there is a hard limit of 4096 columns per table
* PostgreSQL: maximum number of columns in a table: 1600
* SQLite: (don't know)
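A quick way to check the arithmetic: each multilingual field costs one column per enabled language, so the total is plain columns + multilingual fields × languages. The limits below are the figures quoted above plus SQLite's default compile-time SQLITE_MAX_COLUMN of 2000; in practice the effective maximum can be lower because of row-size and storage-engine limits, so treat them as upper bounds. The helper names are hypothetical.

```python
# Limits as quoted above; SQLite's default compile-time SQLITE_MAX_COLUMN is 2000.
COLUMN_LIMITS = {"mysql": 4096, "postgresql": 1600, "sqlite": 2000}


def columns_needed(plain_columns, multilingual_fields, languages):
    """Columns in one table if each multilingual field gets one column per language."""
    return plain_columns + multilingual_fields * len(languages)


def fits(backend, plain_columns, multilingual_fields, languages):
    """True if the table stays within the backend's nominal column limit."""
    needed = columns_needed(plain_columns, multilingual_fields, languages)
    return needed <= COLUMN_LIMITS[backend]


# A typical site: 4 plain columns, 5 translated fields, 4 languages -> 24 columns.
print(columns_needed(4, 5, ("en", "de", "fr", "pl")))
```

Even an extreme case of 40 translated fields across 100 locales (1 + 40 × 100 = 4,001 columns) would exceed the PostgreSQL figure, though not MySQL's nominal 4096.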

Also, adding a language will require altering the table, but that
can be done with a simple ALTER statement or a basic Python script.
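The "basic Python script" could be as small as this sketch, shown with the standard-library sqlite3 module: it reads the table's existing columns with PRAGMA table_info and issues one ALTER TABLE ... ADD COLUMN per missing <field>_<code> column. The add_language helper and the article table are hypothetical; on MySQL or PostgreSQL you would read the existing columns from information_schema instead.

```python
import sqlite3


def add_language(conn, table, multilingual_fields, code):
    """Add one TEXT column per multilingual field for a newly enabled language.

    Columns that already exist are skipped, so the script is safe to re-run.
    Table and field names are interpolated directly: they must come from
    trusted model definitions, never from user input.
    """
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for field in multilingual_fields:
        column = f"{field}_{code}"
        if column not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} TEXT")
            added.append(column)
    return added
```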

Fetching big tables could be inefficient, but we don't have to fetch all
languages ("translations"); this way we fetch the same amount of data
as if the model were single-language.
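A sketch of fetching only one language's columns, assuming the <field>_<code> naming above and using sqlite3 for illustration: each selected column is aliased back to the bare field name, so a row has the same shape it would have for a single-language model. select_language is a hypothetical helper, and its table and field names must come from trusted model definitions.

```python
import sqlite3


def select_language(conn, table, plain_fields, multilingual_fields, code):
    """Fetch rows reading only the columns for one language.

    Each multilingual column is aliased back to its bare name (body_de AS body),
    so callers see the same shape as a single-language model.
    """
    columns = list(plain_fields) + [f"{f}_{code} AS {f}" for f in multilingual_fields]
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    return conn.execute(sql).fetchall()
```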

+ For performance: instead of having an extra "translation" table (the
way django-multilingual works right now) and having to do inner joins,
you get the same data without the joins, faster.

You can fetch data in languages other than the current one by
adding the _LANGCODE suffix.

No extra hooks are required to do find(name), order_by(name_en), etc.
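To illustrate the point about find() and order_by(): with one column per language, filtering or ordering on a particular language is just a WHERE or ORDER BY on an ordinary column, with no join. The product table and its rows are made up for this sqlite3 sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name_en TEXT, name_de TEXT)")
conn.executemany(
    "INSERT INTO product (id, name_en, name_de) VALUES (?, ?, ?)",
    [(1, "Window", "Fenster"), (2, "Apple", "Apfel"), (3, "House", "Haus")],
)

# Ordering by one language is a plain ORDER BY on an ordinary column (no join).
by_english = [row[0] for row in conn.execute("SELECT id FROM product ORDER BY name_en")]
by_german = [row[0] for row in conn.execute("SELECT id FROM product ORDER BY name_de")]

# Searching in one language is a plain WHERE clause on that language's column.
matches = conn.execute("SELECT id FROM product WHERE name_de = ?", ("Apfel",)).fetchall()

print(by_english, by_german, matches)  # [2, 3, 1] [2, 1, 3] [(2,)]
```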


I think this is so simple that it can be easily integrated into
django.db.

What are your ideas?
Django developers have to discuss the "multilingual" content issue because
in the near future it will be vital for projects like Django.

Kind regards,

Piotr Majewski

alex....@gmail.com

Jul 27, 2008, 2:43:02 PM
to Django developers
Amongst other design issues, which I will write up in full, another
just occurred to me. With the current django model scheme, you would
need to create every field immediately, which means either django needs
to create something like 50 fields, or the user needs to specify which
languages the field should be available in. This is because django
itself does not handle altering tables at present, and I don't think
we want a solution that implicitly requires the user to do something
that django itself doesn't handle.

Piotr Majewski

Jul 27, 2008, 3:40:30 PM
to Django developers
Adding a language is the same issue as adding a field to a model:
Django handles neither of them.
And moreover, how often do you add a language to a website? Very rarely.
What do you mean by:
"With the current django model scheme, you would
need to create every field immediately, which means either django needs
to create something like 50 fields, or the user needs to specify which
languages the field should be available in."
?

On 27 Jul, 20:43, "alex.gay...@gmail.com" <alex.gay...@gmail.com>
wrote:

alex....@gmail.com

Jul 27, 2008, 4:23:39 PM
to Django developers
OK, I'm an idiot, ignore that post. I assume by default it would create
fields for whatever you have in settings.LANGUAGES?

Piotr Majewski

Jul 27, 2008, 5:13:14 PM
to Django developers
Yes,

for example if you have
name = models.TextField(multilingual=True)

and in settings.py a list or tuple of available languages
('de','en','fr','ru','pl')

it will create these columns in the model's table:

name_de
name_en
name_fr
name_ru
name_pl
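A sketch of how those column names, and a matching CREATE TABLE, could be derived from the language tuple, using sqlite3 only to check that the generated DDL is valid. AVAILABLE_LANGUAGES, multilingual_columns and create_table_sql are hypothetical names. Note that Django's actual settings.LANGUAGES holds (code, name) pairs, so the codes would come from something like `[code for code, name in settings.LANGUAGES]`.

```python
import sqlite3

# The language tuple from the message above (hypothetical setting name).
AVAILABLE_LANGUAGES = ("de", "en", "fr", "ru", "pl")


def multilingual_columns(field, languages):
    """One column name per enabled language, in settings order."""
    return [f"{field}_{code}" for code in languages]


def create_table_sql(table, multilingual_fields, languages):
    """DDL for a table whose multilingual fields are expanded into columns."""
    columns = ["id INTEGER PRIMARY KEY"]
    for field in multilingual_fields:
        columns += [f"{name} TEXT" for name in multilingual_columns(field, languages)]
    return f"CREATE TABLE {table} ({', '.join(columns)})"
```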

On 27 Jul, 22:23, "alex.gay...@gmail.com" <alex.gay...@gmail.com>
wrote:

Malcolm Tredinnick

Jul 27, 2008, 9:27:25 PM
to django-d...@googlegroups.com

On Sun, 2008-07-27 at 14:13 -0700, Piotr Majewski wrote:
> Yes,
>
> for example if you have
> name = models.TextField(multilingual=True)
>
> and in settings.py a list or tuple of available languages
> ('de','en','fr','ru','pl')
>
> it will create these columns in the model's table:
>
> name_de
> name_en
> name_fr
> name_ru
> name_pl

So if I have an application that is only going to be translated into a
couple of languages, it's still going to create 100 columns in the
database when Django's settings contains 100 locales? That's not very
nice.

This plan also means that every single time we add a new language to,
e.g., Django, or some open source application accepts a new translation,
you need to alter the database. You're basically denormalising the
database here in a very explicit fashion. It's exactly equivalent to
trying to store an array in the database by enumerating all the columns
and these sorts of problems are the reason that isn't a good idea. The
one-many approach used by the current packages at the database level is
a lot more maintainable in this respect.
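For comparison, a minimal sketch of the one-to-many layout Malcolm describes, using sqlite3: a separate translation table with one row per (object, language) pair, so a new language means new rows rather than new columns. The article_translation table and body_for helper are illustrative names, not the actual schema used by django-multilingual or transdb.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE article (id INTEGER PRIMARY KEY);
    -- One row per (article, language): adding a language adds rows, not columns.
    CREATE TABLE article_translation (
        article_id INTEGER NOT NULL REFERENCES article(id),
        language   TEXT    NOT NULL,
        body       TEXT,
        PRIMARY KEY (article_id, language)
    );
    """
)
conn.execute("INSERT INTO article (id) VALUES (1)")
conn.executemany(
    "INSERT INTO article_translation (article_id, language, body) VALUES (?, ?, ?)",
    [(1, "en", "I am a berliner"), (1, "de", "Ich bin ein berliner")],
)


def body_for(conn, article_id, language):
    """Return one article's body in one language, or None if untranslated."""
    row = conn.execute(
        "SELECT t.body FROM article a JOIN article_translation t "
        "ON t.article_id = a.id WHERE a.id = ? AND t.language = ?",
        (article_id, language),
    ).fetchone()
    return row[0] if row else None
```

Adding French is then just another INSERT into article_translation, with no schema change; the cost is a join on reads, which is exactly the trade-off the rest of the thread argues about.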

Regards,
Malcolm


alex....@gmail.com

Jul 27, 2008, 9:34:29 PM
to Django developers
Here is my opinion of how it should look, because IMO that is too
magical (if not actual magic, then at least implicit: it is doing
something you aren't telling it to do). http://dpaste.com/67798/ I'm
not sure what the performance of this would be, since it either
requires some joins or a query for each translation you try to get.

On Jul 27, 8:27 pm, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:
> On Sun, 2008-07-27 at 14:13 -0700, Piotr Majewski wrote:
> > Yes,
>
> > for example if you have
> > name = models.TextField(multilingual=True)
>
> > and in settings.py a list or tuple of available languages
> > ('de','en','fr','ru','pl')

David Cramer

Jul 28, 2008, 12:31:22 AM
to Django developers
Taking from Curse's setup, we simply inserted one row per
language. There's a pretty big difference, though, between localization and
internationalization. This solution seems to focus on localization,
where everything gets translated.

I believe that, for the solution above, storing them the way it does is
good, but it'd make more sense to just provide LocalizedTextField
and LocalizedCharField, as that's less magical and doesn't
require any crazy stuff.

For internationalization, however, it makes more sense to just have a
language field, which is a key in settings.LANGUAGES (e.g. enUS or
something), because you're most likely not going to be translating
each field, but rather provide a manager which gets rows for the
current user's language (using threadlocals).
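A rough sketch of the internationalization variant David describes: each row carries a language code, and a manager-like object returns only rows in the current user's language, which is kept in a thread-local. In Django this would more likely be a custom Manager whose queryset filters on the language field; the plain-Python Post, CurrentLanguageManager and activate names here are hypothetical.

```python
import threading
from dataclasses import dataclass

_local = threading.local()


def activate(code):
    """Remember the current user's language for this thread."""
    _local.language = code


def current_language(default="en"):
    return getattr(_local, "language", default)


@dataclass
class Post:
    title: str
    language: str  # a code from the site's language list, e.g. "en" or "de"


class CurrentLanguageManager:
    """Stand-in for a model manager that only returns rows written in the
    current thread's language."""

    def __init__(self, rows):
        self.rows = rows

    def all(self):
        code = current_language()
        return [row for row in self.rows if row.language == code]
```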

On Jul 27, 8:34 pm, "alex.gay...@gmail.com" <alex.gay...@gmail.com>
wrote:
> Here is my opinion of how it should look, because IMO that is too
> magical (if not actual magic, then at least implicit: it is doing
> something you aren't telling it to do). http://dpaste.com/67798/ I'm

David Cramer

Jul 28, 2008, 12:32:12 AM
to Django developers
Oh, and a comment regarding Alex's response. It's a very good
solution for a true relational DB, but doing things like this can
cause massive slowdowns if you have a lot of content. The table sizes
get to be outrageous and joins become impossible on them :)

On Jul 27, 8:34 pm, "alex.gay...@gmail.com" <alex.gay...@gmail.com>
wrote:
> Here is my opinion of how it should look, because IMO that is too
> magical (if not actual magic, then at least implicit: it is doing
> something you aren't telling it to do). http://dpaste.com/67798/ I'm

Piotr Majewski

Jul 28, 2008, 2:00:22 AM
to Django developers
No,
I don't think any website will support 100 languages. Generally, websites
use about 2-4 languages at most. There should be a tuple or a list in
settings.py saying which languages to enable.
This will create 4 columns per field. And again, I see adding a language
as the same as modifying the model: both need custom SQL.

Piotr Majewski

Jul 28, 2008, 2:31:58 AM
to Django developers
Has django-dev made any decision on the design of multilingual content
support?

And Malcolm, using denormalized tables isn't so bad, and IMHO in this
case it is a good solution. Let me tell you why.

+ Generally, websites will support 2 languages; even websites like
booking.com support at most 12 languages, so we don't get the
100 unused columns you mentioned.
+ Denormalizing will bring a great performance bonus.
+ No major hacks are needed to implement this in Django.
+ Adding a language isn't something that will be done very often; in fact,
I think that most websites will never change their list of
supported languages.

Now, why is custom SQL (altering) acceptable when you modify a model, but
bad when you add a language? To me it is the same.

The approach Alex presented can't currently be implemented in Django
(look at the django.db code).

Malcolm Tredinnick

Jul 28, 2008, 1:23:46 PM
to django-d...@googlegroups.com

On Sun, 2008-07-27 at 23:00 -0700, Piotr Majewski wrote:
> No,
> I don't think any website will support 100 languages. Generally, websites
> use about 2-4 languages at most. There should be a tuple or a list in
> settings.py saying which languages to enable.
> This will create 4 columns per field. And again, I see adding a language
> as the same as modifying the model: both need custom SQL.

This shouldn't be necessary. You're imposing too high a burden on the
users. Whilst you are obviously comfortable using this for yourself,
it's not an approach that should be incorporated into Django. We like to
use the "relational" part of relational databases.

Your situation immediately runs into problems with any distributed
application. Person A releases Rocket-Powered-App-1.0 and Person B
installs it and wants to use it in their website that supports 5
languages. Now they have to alter some of the models of
Rocket-Powered-App to install four or five extra columns, which
completely cuts across the ability to use unaltered upstream code and
synchronise with the released version from time to time. Databases are
*designed* to store multi-valued content in related tables. Ignoring
that feature is saying you don't want to use a relational database. You
might as well store your content in flat files.

Regards,
Malcolm


Piotr Majewski

Jul 28, 2008, 1:35:52 PM
to Django developers
So maybe this approach is bad; what do you propose?
There are 2 things that need to be discussed:

1. API
2. Storage

1. I think that currently it is impossible to add a multilingual flag to
a field.
IMHO there should be a separate "translation" class with an FK to the parent
object, but there should also be an API that would enable referring to
the translation object by:

Article.title
Article.title['en'] or Article.title_en

Currently we can create a table and use:

Article.translations[0].title

I was also thinking of creating a "reference field" or something similar,
which would refer to, for example, custom SQL, an extra join, or just another
field in a related object.
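A plain-Python sketch of the access patterns listed above, on an instance (article.title, article.title['en'], article.title_en), assuming a thread-local current language. Because a plain string cannot also be indexed by language code, title returns a small dict subclass whose str() is the current-language text. Translations, TranslatedField and set_language are hypothetical names, and in Django the values would come from the related translation rows rather than an in-memory dict.

```python
import threading

_state = threading.local()


def set_language(code):
    _state.language = code


def get_language():
    return getattr(_state, "language", "en")


class Translations(dict):
    """Language code -> text; str() yields the text for the current language."""

    def __str__(self):
        return self.get(get_language(), "")


class TranslatedField:
    """Descriptor: obj.<name> is a Translations mapping stored on the instance."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        store = obj.__dict__.setdefault("_translations", {})
        return store.setdefault(self.name, Translations())


class Article:
    title = TranslatedField()

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, e.g. for title_en or title_de.
        field, sep, code = attr.rpartition("_")
        if sep and isinstance(getattr(type(self), field, None), TranslatedField):
            return getattr(self, field).get(code)
        raise AttributeError(attr)
```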

2. As Malcolm said before, column storing is bad, so the only way is
to keep translations relational, in a separate table.
What should the schema of that table be?

Manuel Saelices

Jul 28, 2008, 2:25:52 PM
to Django developers
First, sorry for my bad English...

I think users usually don't want to "install" a language. On websites,
the supported languages are usually defined at design time. Of
course, in production a customer may add more languages, but that is the same
case as adding more fields to a model: an ALTER TABLE alone is
enough, as in the case of adding fields.

Also, given the precondition of pre-established languages, putting them in table
columns does not break DB normalization, because all language data
*refers* to a single object. For example, it makes no sense for another
model to have a foreign key only to the English version of an object; the
link will be to the object itself. If you need a model to point only
to one language version of another model, you will have to
use a related table to keep normalization.

On the other hand, I think that using a related table for storage
could be catastrophic from a scaling point of view. Think of a table with
100,000 objects in 5 languages: *every* SQL query will have to join a
100,000-row table with a 500,000-row table. This is
unacceptable (of course you can denormalize, but denormalization
implies creating the 5 columns in the main table again). Of course, if you
are deploying a small website... using a related table has the advantage of no
predefined language limit.

And for the record, about transdb and django-multilingual:

* django-multilingual stores translation data in related tables.
* transdb uses a "mixed" approach, for simplicity. It stores only
translated char and text fields, using a dictionary in one column:
{'es': 'Mi noticia', 'en': 'My news item'}
* I don't know of any Django app that creates columns to store
the data.
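A sketch of the transdb-style "dictionary in one column" storage, using sqlite3. The message doesn't say how transdb serializes the dictionary, so JSON here is an assumption, as are the news table, the helper names and the English fallback.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT)")


def save_title(conn, news_id, translations):
    """Store every translation of one field as a single serialized dict."""
    conn.execute(
        "INSERT OR REPLACE INTO news (id, title) VALUES (?, ?)",
        (news_id, json.dumps(translations)),
    )


def load_title(conn, news_id, language, fallback="en"):
    """Decode the dict and pick one language, falling back if it is missing."""
    (raw,) = conn.execute("SELECT title FROM news WHERE id = ?", (news_id,)).fetchone()
    translations = json.loads(raw)
    return translations.get(language, translations.get(fallback))
```

The trade-off is that the database sees one opaque string, so filtering or ordering on a single language cannot use an ordinary column index.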

Regards,
Manuel Saelices


Piotr Majewski

Jul 28, 2008, 6:44:19 PM
to Django developers
Still, I think "column storing" with inconvenient language additions
is better than nothing, especially since my idea isn't hard to
implement and does not need any changes in Django.

Joost Cassee

Jul 28, 2008, 7:42:53 PM
to Django developers
On 28 jul, 19:23, Malcolm Tredinnick <malc...@pointy-stick.com> wrote:

> Your situation immediately runs into problems with any distributed
> application. Person A releases Rocket-Powered-App-1.0 and Person B
> installs it and wants to use it in their website that supports 5
> languages. Now they have to alter some of the models of
> Rocket-Powered-App to install four or five extra columns, which
> completely cuts across the ability to use unaltered upstream code and
> synchronise with the released version from time to time. Databases are
> *designed* to store multi-valued content in related tables. Ignoring
> that feature is saying you don't want to use a relational database. You
> might as well store your content in flat files.

Just to make one point clear: in Piotr's proposal the list of
languages would not be stored 'implicitly' in the model. He means the
columns would be dynamically generated during syncdb, using
settings.LANGUAGES. Changing the language list would require a
database schema change, though. Depending on the implementation it
could be something that django_evolution would handle automatically.
So the specific problem you mention above would not exist, but there
are related issues.

I'd also like to link to the current discussion of similar issues on
the django-multilingual mailing list:
http://groups.google.com/group/django-multilingual/browse_thread/thread/afd7c37efff7dd4f


Regards,

Joost