[play x] Advice on how to handle internationalization of error messages with words that need inflection


sas

Oct 17, 2012, 11:00:12 AM
to play-fr...@googlegroups.com
This is not strictly Play related, but I guess many of you may have had the same trouble.

I'm trying to internationalize error messages like this:

validate.empty={0} not specified.
validate.duplicate=There already exists a {0} with the {1} "{2}".

If I have a duplicate entity named "idea" the message I get is

There already exists a idea with the name "new idea"

for example

The problem is that it should say "an idea" instead of "a idea"

In Spanish the problem is even worse, because nouns have gender, and you have to use different articles (instead of "a" you use "una" / "un").

How do you handle this kind of situation? Is there some kind of inflector library that can handle different languages? And would you integrate it with the i18n.Messages functions?

thanks a lot

saludos

sas

notalifeform

Oct 17, 2012, 4:09:30 PM
to play-fr...@googlegroups.com
Hi,

I was thinking of implementing something like the 'quant' function that is available in Perl's Locale::Maketext,

e.g.

"[quant,_1,file,files,No files] matched your query.\n"

Of course, it will only be able to deal with quantities.
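For comparison, the JDK's own java.text.MessageFormat can already do something close to 'quant' through its choice format. This is only a minimal sketch: the class name QuantDemo and the pattern text are made up for illustration, and a choice format selects by numeric range, so it does not cover languages with richer plural rules.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Sketch: quantity-dependent wording with the JDK's built-in choice format.
class QuantDemo {
    // Hypothetical pattern, modeled on the Maketext example above.
    static final String PATTERN =
        "{0,choice,0#No files|1#One file|1<{0,number,integer} files} matched your query.";

    // Formats the pattern for the given number of matching files.
    static String matched(int count) {
        return new MessageFormat(PATTERN, Locale.ENGLISH).format(new Object[] { count });
    }

    public static void main(String[] args) {
        for (int n : new int[] { 0, 1, 5 }) {
            System.out.println(matched(n));
        }
    }
}
```

The same pattern string could live in a messages file, since it is plain MessageFormat syntax.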

for genders you could maybe do something like

[gender,_1,un,una]

and keep this information in your messages file too:

idea=idée
idea.gender=female

Building this logic into a Play plugin would not be rocket science
(of course, there are languages that need more sophisticated rules than this).
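A rough sketch of that gender lookup in plain Java, outside of any Play plugin. Everything here is an assumption for illustration: the class GenderDemo, the helper withArticle, the "<key>.gender" convention, and the Spanish sample entries. It only shows that the article can be chosen from data stored next to the translation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Sketch of the [gender,_1,un,una] idea: the noun's gender is stored in the
// messages file next to its translation. All key names are hypothetical.
class GenderDemo {
    // Parses messages-file text (key=value lines) into a Properties object.
    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory reader
        }
        return p;
    }

    // Picks the article matching "<key>.gender"; falls back to masculine
    // and to the raw key when no entry exists.
    static String withArticle(Properties messages, String key, String masculine, String feminine) {
        String noun = messages.getProperty(key, key);
        String gender = messages.getProperty(key + ".gender", "male");
        return ("female".equals(gender) ? feminine : masculine) + " " + noun;
    }

    public static void main(String[] args) {
        Properties es = load("idea=idea\nidea.gender=female\nproblema=problema\nproblema.gender=male\n");
        System.out.println(withArticle(es, "idea", "un", "una"));
        System.out.println(withArticle(es, "problema", "un", "una"));
    }
}
```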

I must admit it feels a bit like reinventing the wheel, but I haven't come across a solution yet that seems easy to integrate myself...

if you find any decent solution - please let me know.

cheers,

Robert

Marconi

Oct 17, 2012, 5:23:52 PM
to play-fr...@googlegroups.com
If you know the words beforehand, you could adopt a simpler strategy:

en:
idea=an idea
fact=a fact

pt-BR:
idea=uma idéia
fact=um fato

But I believe that the problem discussed here is that the words are not known beforehand, i.e., they may come from user input. This is a very tricky problem to solve, indeed. Even if you had a list of nouns and gender, there could be ambiguities. For instance, in Brazilian Portuguese "a cara" means "the face", but "o cara" means "the guy" ("a" and "o" are the feminine and masculine definite articles, respectively).

I really don't have a good answer to give you, but I can point you to a few resources that may or may not do a little better than Java messages:

http://icu-project.org/
http://code.google.com/p/gettext-commons/
http://www.gnu.org/savannah-checkouts/gnu/gettext/manual/html_node/Plural-forms.html
http://angularjs.org/#create-components

Tom Carchrae

Oct 17, 2012, 5:54:01 PM
to play-fr...@googlegroups.com
I replaced the messages String.format function with a template render.  No doubt it costs a lot more CPU but it lets me do all sorts of crazy stuff that string format could not handle.  It was also annoying to explain to client/user that some templates work like this, and these message files are different.  Now it all uses the same template engine.  

I mention this because if you have this kind of logic, you probably want to use it in your templates as well.  So, grab the hammer and smash the world. ;)

Tom




sas

Oct 17, 2012, 7:39:12 PM
to play-fr...@googlegroups.com
The list of words is known beforehand, but I want to use the same entries for

validate.empty={0} not specified.
validate.duplicate=There already exists a {0} with the {1} "{2}".

In the first case it would be "Idea not specified", and in the second "There already exists an idea ..."

With that solution I would have to define

idea=idea
anIdea=an idea

which would make everything more complicated than necessary...

James Roper

Oct 18, 2012, 12:33:40 AM
to play-fr...@googlegroups.com
Internationalisation is hard.  If you could translate any and every phrase just by replacing all the dynamic parts with place holders, then learning a new language would just be a simple case of mapping words from one language to another, and voilà!  You could speak that language.  But as we all know, learning a language isn't that simple, so you can't translate any and every phrase just by replacing all the dynamic parts with place holders.

So when it comes to issues like this, you have two options:

1) Minimise the number of dynamic parts.  This means having keys like:

validate.duplicate.idea=An idea already exists with the name "{0}"

and so on for each thing.  Doing this gives you the most natural language, but is more work.  If the list of things is itself dynamic, it gets even harder: you end up needing a translation screen in your app for providing translations for each thing (for an example, see JIRA's admin section... ever wonder why it was so complex?  This is one reason.)

2) Refer to the dynamic parts in a "distant" way, ie, don't use them directly as part of the sentence.  I don't know the context of your app, but that might mean doing something like:

validate.duplicate=A thing of type "{0}" already exists with the property "{1}" set to "{2}"

Now this should probably translate well to any language, and is very simple to maintain.  But, now the language used in your app is more awkward.
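The two options can also be combined: look for a hand-written message per thing first, and fall back to the "distant" wording when none exists. A minimal sketch, assuming a plain in-memory map instead of Play's Messages, with a made-up class DuplicateMessages and helper duplicateName:

```java
import java.text.MessageFormat;
import java.util.HashMap;
import java.util.Map;

// Sketch: prefer a specific, natural-sounding key (option 1) and fall back
// to the generic "distant" phrasing (option 2) when none is defined.
class DuplicateMessages {
    static final Map<String, String> MESSAGES = new HashMap<>();
    static {
        MESSAGES.put("validate.duplicate.idea",
            "An idea already exists with the name \"{0}\"");
        MESSAGES.put("validate.duplicate",
            "A thing of type \"{0}\" already exists with the property \"{1}\" set to \"{2}\"");
    }

    // Builds the duplicate-name error for the given type of thing.
    static String duplicateName(String type, String name) {
        String specific = MESSAGES.get("validate.duplicate." + type);
        if (specific != null) {
            return MessageFormat.format(specific, name);
        }
        return MessageFormat.format(MESSAGES.get("validate.duplicate"), type, "name", name);
    }

    public static void main(String[] args) {
        System.out.println(duplicateName("idea", "new idea"));
        System.out.println(duplicateName("fact", "new fact"));
    }
}
```

Translators can then add specific keys only for the things where the generic wording reads too awkwardly.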

So there you go.  Internationalisation is hard, it can be done, but you have to make trade offs.

sas

Oct 18, 2012, 9:21:46 AM
to play-fr...@googlegroups.com
Excellent answer, James.

I think there's also a third option that's a little more involved:

build a more intelligent, language-specific system that would adjust (inflect) the parts that need it.

For example, 

validate.duplicate=There already exists a {0} with the name "{1}".

In English, the function that replaces the placeholder should recognize that {0} is preceded by an "a", and replace it with "an" if the value to insert starts with a vowel.

In Spanish it would be

validate.duplicate=Ya existe un {0} con el nombre "{1}".

In that case the function should replace "un" with "una" depending on the gender of {0}

I know it's not that easy (there are always exceptions that should be handled by some getGender function), but it would give us the best of both worlds.
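For the English case, a rough sketch of such a function could look like this. The class ArticleFormat and its methods are hypothetical names, not part of Play or the JDK, and the vowel check works on spelling only, so words like "hour" or "user" are exactly the kind of exception mentioned above.

```java
import java.text.MessageFormat;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: rewrite "a {n}" to "an {n}" in the pattern when the argument for
// that placeholder starts with a vowel letter, then format as usual.
class ArticleFormat {
    // Matches the article "a"/"A" directly before a simple placeholder such as {0}.
    private static final Pattern A_BEFORE_PLACEHOLDER =
        Pattern.compile("\\b([aA]) \\{(\\d+)\\}");

    static String format(String pattern, Object... args) {
        Matcher m = A_BEFORE_PLACEHOLDER.matcher(pattern);
        StringBuffer adjusted = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(2));
            String value = index < args.length ? String.valueOf(args[index]) : "";
            String article = startsWithVowel(value) ? m.group(1) + "n" : m.group(1);
            m.appendReplacement(adjusted,
                Matcher.quoteReplacement(article + " {" + index + "}"));
        }
        m.appendTail(adjusted);
        return MessageFormat.format(adjusted.toString(), args);
    }

    // Spelling-based check only; pronunciation exceptions are not handled.
    static boolean startsWithVowel(String s) {
        return !s.isEmpty() && "aeiouAEIOU".indexOf(s.charAt(0)) >= 0;
    }

    public static void main(String[] args) {
        String pattern = "There already exists a {0} with the name \"{1}\".";
        System.out.println(format(pattern, "idea", "new idea"));
        System.out.println(format(pattern, "fact", "new fact"));
    }
}
```

A Spanish version would need a gender lookup for the noun instead of a spelling rule.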

Nevertheless, for the time being I think I'll go with option 2

Thanks a lot for your help

Saludos

Sas

Marconi

Oct 18, 2012, 6:36:52 PM
to play-fr...@googlegroups.com
On Thursday, October 18, 2012 12:33:40 AM UTC-4, James Roper wrote:
Internationalisation is hard.  If you could translate any and every phrase just by replacing all the dynamic parts with place holders, then learning a new language would just be a simple case of mapping words from one language to another, and voilà!  You could speak that language.  But as we all know, learning a language isn't that simple, so you can't translate any and every phrase just by replacing all the dynamic parts with place holders.

Hi James,

While I completely agree with you that internationalization is hard and localization is even harder,  I do not think you are portraying a fair scenario above.

We are not talking about full automatic translation. No one is suggesting creating message files containing:

this=This
belongs=belongs
to=to

And expecting Messages("{0} {1} {2} {3} {4}.", i18n.this, object.name, i18n.belongs, i18n.to, person.name) to produce "This book belongs to John." across all spoken languages, given a proper messages file. The entire sentence is still to be translated wholesale by a human:

message=This {0} belongs to {1}.

Usually, dynamic substitution of placeholders works reasonably well because often most placeholders are nouns. The problems are limited then to properly handling number (plurals) and gender and the impact they have on surrounding articles, adjectives, verbs, and pronouns.

That is not, by any means, an easy and simple task, no doubt about it. But it is also nowhere near the complexity of something as difficult as full automatic translation of human languages. And it does not imply that languages could be translated by word mapping. The problem of number and gender agreement is much more limited in scope.

Of course, languages have all kinds of bizarre nuances. For instance, in English the possessive pronoun agrees with the subject, while in Portuguese (and maybe in French and Spanish, too?) it agrees with the object. Yes, as weird as it may sound, in some languages inanimate objects have gender, too:

A girl and her book. A girl and her bike.
A boy and his book. A boy and his bike.

A garota e seu livro. A garota e sua bicicleta.
O garoto e seu livro. O garoto e sua bicicleta.

There is no way to handle this scenario without knowing both subject and object genders and using language-specific conjugation engines, even if one of the pairs girl-boy or book-bike isn't a dynamic placeholder. No messages file format and i18n engine I know of is expressive enough to handle such cases.
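To make the amount of required information concrete, here is a small hand-written sketch of the four sentences above. The class PossessiveDemo, the Gender enum, and both methods are invented for illustration; the point is only that the English version needs one gender and the Portuguese version needs two.

```java
// Sketch: what a formatter has to know to get the possessive right.
class PossessiveDemo {
    enum Gender { FEMININE, MASCULINE }

    // English: the possessive follows the gender of the owner.
    static String english(String owner, Gender ownerGender, String thing) {
        String possessive = ownerGender == Gender.FEMININE ? "her" : "his";
        return "A " + owner + " and " + possessive + " " + thing + ".";
    }

    // Portuguese: the article follows the owner, while the possessive
    // follows the gender of the thing owned.
    static String portuguese(String owner, Gender ownerGender, String thing, Gender thingGender) {
        String article = ownerGender == Gender.FEMININE ? "A" : "O";
        String possessive = thingGender == Gender.FEMININE ? "sua" : "seu";
        return article + " " + owner + " e " + possessive + " " + thing + ".";
    }

    public static void main(String[] args) {
        System.out.println(english("girl", Gender.FEMININE, "bike"));
        System.out.println(portuguese("garota", Gender.FEMININE, "livro", Gender.MASCULINE));
        System.out.println(portuguese("garoto", Gender.MASCULINE, "bicicleta", Gender.FEMININE));
    }
}
```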

Of course, the list of problems and weirdness goes on and on. Think about "her book" and "I like her". Even though they are spelled the same, "her" and "her" are technically two different words. The first is equivalent to "his book", the second to "I like him".

My best piece of non-technical advice is: learn grammar. And learn it well, pretty well. Understanding how your own native language works is the fundamental first step to good i18n. And if you can, try to also learn a second language as best as you can.

(My apologies if I didn't get all grammar technical terms right, I'm still trying to learn English as best as I can.)

Marconi

Oct 18, 2012, 8:06:16 PM
to play-fr...@googlegroups.com
BTW, I agree with the rest of your message. My comments regard only your first paragraph.

Brian Smith

Oct 18, 2012, 8:28:16 PM
to play-fr...@googlegroups.com
If you really need enhanced internationalization that handles plurality and other more complex language constructs, and prefer not to live with some of the workarounds mentioned already, you might take a look at icu4j.  It offers rich replacements for, and extensions of, the core Java i18n APIs.



It should be fairly straightforward to integrate this into Play in much the same way as the existing Messages wrapper.

regards

Brian




Dave

Oct 19, 2012, 3:25:58 AM
to play-fr...@googlegroups.com
I think the simplest way is to avoid articles and other words that inflect, which makes the sentence easier to translate.
You could disregard 'name' and reformulate.
I think there is always a way to avoid inflected words.


There already exists a idea with the name "new idea"
There already exists a {0} with the {1} "{2}"
into
"idea" and "new idea" are duplicates in name
"{0}" and "{1}" are duplicates in {2}.
or
"new idea" already exists as "idea" in name
"{0}" already exists as "{1}" in {2}

So basically it comes down to: avoid details





On Wednesday, October 17, 2012, 17:00:12 UTC+2, sas wrote:

James Roper

Oct 21, 2012, 9:21:07 PM
to play-fr...@googlegroups.com
The point I was attempting to make was that you can't expect, given any arbitrary phrase with dynamic parts, to be able to translate it cleanly to every language.  If you could, then automatic translation would be trivial.  Rather, you have to structure your phrases to allow them to be cleanly translated.  And I'm well aware of how complex the grammatical rules can be, I speak German (badly) :)
