I'd lean towards #1 myself, and add a way to specify a set of plural
rule text at the same time for a given locale if I want to override
the default platform set of rules. This would allow a significant
amount of variability between locales without requiring any programmer
intervention.
For instance:
@DefaultText("{0,plural, zero{No balls found!}, one{One ball found},
two{Two balls found, great}, other{More than two balls!}}", "zero: n
is 0; one: n is 1; two: n is 2")
A translation tool can provide a more structured editor for plural
forms if desired, but tools that don't can just allow the translator
to edit the string in-line. I imagine that the intent of the plural
form can be communicated well enough in the description that the
translator won't damage the string.
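To make the rule-text idea concrete, here is a minimal sketch of how the second annotation argument above could be interpreted. It assumes the only syntax is the one shown in the example ("form: n is K" clauses separated by semicolons); the class name SimplePluralRuleText and the fallback to "other" are my own assumptions, not part of any proposal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parses rule text such as "zero: n is 0; one: n is 1"
// and picks a plural form name for a given count.
class SimplePluralRuleText {
    private final Map<String, Integer> exactMatches = new LinkedHashMap<>();

    SimplePluralRuleText(String ruleText) {
        for (String clause : ruleText.split(";")) {
            String[] parts = clause.split(":", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed clause: " + clause);
            }
            String form = parts[0].trim();
            String condition = parts[1].trim();
            // Only "n is <integer>" is understood in this sketch.
            if (!condition.startsWith("n is ")) {
                throw new IllegalArgumentException("Unsupported condition: " + condition);
            }
            int value = Integer.parseInt(condition.substring("n is ".length()).trim());
            exactMatches.put(form, value);
        }
    }

    // Returns the first form whose condition matches, or "other" if none does.
    String select(int n) {
        for (Map.Entry<String, Integer> e : exactMatches.entrySet()) {
            if (e.getValue() == n) {
                return e.getKey();
            }
        }
        return "other";
    }

    public static void main(String[] args) {
        SimplePluralRuleText rules =
            new SimplePluralRuleText("zero: n is 0; one: n is 1; two: n is 2");
        for (int n = 0; n <= 3; n++) {
            System.out.println(n + " -> " + rules.select(n));
        }
    }
}
```

A real rule language would need more than equality tests (ranges, modulo), so treat this only as an illustration of the override mechanism.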
Wouldn't support for cascading property names be simpler than the
long lists quoted above?
I.e.
how-many-balls-0 searches for how-many-balls-0 > how-many-balls-n >
how-many-balls.
and
how-many-balls-1 searches for how-many-balls-1 > how-many-balls-n >
how-many-balls.
or something like that. The long list quoted above seems like a lot of
noise and in effect is a simple language in itself.
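A rough sketch of the cascade described above, as I read it: try the exact count, then the "-n" entry, then the bare key. The class and method names are hypothetical, and a plain Map stands in for the properties file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of cascading property names.
class CascadingKeys {
    // Candidate keys, most specific first: "<base>-<count>", "<base>-n", "<base>".
    static List<String> candidates(String base, int count) {
        List<String> keys = new ArrayList<>();
        keys.add(base + "-" + count);
        keys.add(base + "-n");
        keys.add(base);
        return keys;
    }

    // Returns the first candidate present in the properties, or null if none is.
    static String lookup(Map<String, String> properties, String base, int count) {
        for (String key : candidates(base, count)) {
            String value = properties.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("how-many-balls-0", "No balls found!");
        props.put("how-many-balls-1", "One ball found");
        props.put("how-many-balls-n", "Several balls found");
        System.out.println(lookup(props, "how-many-balls", 0));
        System.out.println(lookup(props, "how-many-balls", 5));
    }
}
```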
On Dec 2, 2007 3:54 PM, mP <miroslav...@gmail.com> wrote:
> Wouldn't support for cascading property names be simpler than the
> long lists quoted above?
> I.e.
> how-many-balls-0 searches for how-many-balls-0 > how-many-balls-n >
> how-many-balls.
> and
> how-many-balls-1 searches for how-many-balls-1 > how-many-balls-n >
> how-many-balls.
> or something like that. The long list quoted above seems like a lot of
> noise and in effect is a simple language in itself.
I don't see how this solves the problem at hand. What is needed is that each translated language is able to describe different sets of rules for choosing the correct plural form, and the corresponding text for each form. Each language has different rules. For example, Arabic (using right-to-left numbers):
0 - none form
1 - singular form
2 - dual form
3-10 - few form
11-99 - many form
other - plural form
Russian uses the singular form if the count ends in 1 but not 11, the few form for numbers ending in 2-4 except for 12-14, and the plural for everything else.
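For readers who prefer code, the two rule sets just described can be written as small functions. This is only a sketch of the simplified rules as stated in this message (non-negative counts assumed); the class name and the form names "none", "one", "two", "few", "many", "other" are labels chosen for illustration.

```java
// Sketch of the Arabic and Russian rules as described in this message.
class PluralRuleSketch {
    // Arabic, following the table above. Assumes n >= 0.
    static String arabic(int n) {
        if (n == 0) return "none";
        if (n == 1) return "one";
        if (n == 2) return "two";
        if (n >= 3 && n <= 10) return "few";
        if (n >= 11 && n <= 99) return "many";
        return "other";
    }

    // Russian: singular if the count ends in 1 but not 11, few if it ends
    // in 2-4 but not 12-14, plural ("other") for everything else. Assumes n >= 0.
    static String russian(int n) {
        int lastDigit = n % 10;
        int lastTwo = n % 100;
        if (lastDigit == 1 && lastTwo != 11) return "one";
        boolean endsInTwoToFour = lastDigit >= 2 && lastDigit <= 4;
        boolean endsInTwelveToFourteen = lastTwo >= 12 && lastTwo <= 14;
        if (endsInTwoToFour && !endsInTwelveToFourteen) return "few";
        return "other";
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, 5, 11, 12, 21, 22, 100, 111};
        for (int n : samples) {
            System.out.println(n + ": arabic=" + arabic(n) + " russian=" + russian(n));
        }
    }
}
```

The Russian function shows why a finite list of exact values or ranges is not enough: the pattern repeats every hundred.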
So, in a property file for English, I would have:
widgets={0} widgets.
widgets[one]=A widget.
Arabic rules might result in (with English text):
widgets={0} widgets.
widgets[none]=No widgets.
widgets[one]=A widget.
widgets[two]=Both widgets.
widgets[few]={0} widgets, which are few.
widgets[many]={0} widgets which are many.
Japanese doesn't distinguish plural forms, and would have only widgets={0} widgets.
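The property entries above could be resolved roughly as follows: look for key[form] and fall back to the plain key. This is a sketch under my own assumptions (a Map in place of the properties file, a hypothetical PluralPropertyLookup class, and plain text substitution for {0} instead of a real message formatter).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup for entries such as "widgets[one]" with a fallback to "widgets".
class PluralPropertyLookup {
    static String format(Map<String, String> props, String key, String form, int count) {
        String pattern = props.get(key + "[" + form + "]");
        if (pattern == null) {
            pattern = props.get(key); // default entry, used when no form-specific text exists
        }
        if (pattern == null) {
            throw new IllegalArgumentException("No entry for " + key);
        }
        // Plain substitution keeps the sketch locale-independent.
        return pattern.replace("{0}", Integer.toString(count));
    }

    public static void main(String[] args) {
        Map<String, String> english = new HashMap<>();
        english.put("widgets", "{0} widgets.");
        english.put("widgets[one]", "A widget.");
        System.out.println(format(english, "widgets", "one", 1));
        System.out.println(format(english, "widgets", "other", 5));
    }
}
```

With this shape, a language without plural distinctions needs only the default entry, and the form name is chosen by the locale's rule rather than by the translator.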
There are many other languages with very unusual (to an English speaker :) plural rules, and we need to be able to support all of them equally well. GNU gettext has been doing this for a long time, but one limitation is that it numbers the states: there is no convention across apps for what those numbers mean, and existing translations break if new states are added, such as when the plural rules for Arabic are corrected. ChoiceFormat can't solve the problem because you would need an infinite list of rules, since the patterns repeat (for example, in Russian and Polish, among others).
--
John A. Tamplin
Software Engineer, Google
> I don't see how this solves the problem at hand. What is needed is that each translated language is able to describe different sets of rules for choosing the correct plural form, and the corresponding text for each form. Each language has different rules. For example, Arabic (using right-to-left numbers):
> 0 - none form
> 1 - singular form
> 2 - dual form
> 3-10 - few form
> 11-99 - many form
> other - plural form
In this case the property names are a good match. You can easily express
-0 for none
-x01 for one
-x1 for 11, 21, 31 .. 91.
I don't see how my proposal fails to express all these numbering cases. Some languages will have a single entry, as in Japanese, or two for English, or six for Arabic.
For Arabic you would need the following entries:
hello-0=No widgets
hello-1=One widget.
hello-2=Two widgets
hello-x=Few widgets
hello-xx=Many widgets
hello-xxx=Lots of widgets
x becomes a wildcard for any digit.
--
On Dec 4, 2007 3:34 AM, John Tamplin <j...@google.com> wrote:
> On Dec 3, 2007 4:26 AM, Miroslav Pokorny <miroslav...@gmail.com> wrote:
> > I don't see how this solves the problem at hand. What is needed is that each translated language is able to describe different sets of rules for choosing the correct plural form, and the corresponding text for each form. Each language has different rules. For example, Arabic (using right-to-left numbers):
> > 0 - none form
> > 1 - singular form
> > 2 - dual form
> > 3-10 - few form
> > 11-99 - many form
> > other - plural form
> In this case the property names are a good match. You can easily express
> -0 for none
> -x01 for one
> -x1 for 11, 21, 31 .. 91.
> I don't see how my proposal fails to express all these numbering cases. Some languages will have a single entry, as in Japanese, or two for English, or six for Arabic.
Exactly how does your proposal choose the proper string to use based on the value? Are you suggesting for Arabic you have -3 -4 -5 -6 -7 -8 -9 -10 entries that all contain the same string?
--
John A. Tamplin
Software Engineer, Google
mP
On Dec 3, 2007 3:25 PM, Miroslav Pokorny <miroslav...@gmail.com> wrote:
> For Arabic you would need the following entries
> hello-0=No widgets
> hello-1=One widget.
> hello-2=Two widgets
> hello-x=Few widgets
> hello-xx=Many widgets
> hello-xxx=Lots of widgets
> x becomes a wildcard for any digit.
I don't believe this can represent all the necessary plural forms -- for example, in this case you would be choosing the Many widgets form for 10, which is incorrect. Aside from that, you are now requiring the translators, who generally are linguists not programmers, to encode the plural rules in the name of the string.
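To illustrate the mismatch, here is one possible reading of the wildcard scheme in code: exact keys for 0, 1 and 2, and otherwise one "x" per decimal digit. The class name and this interpretation are assumptions on my part; the proposal itself does not spell out the matching algorithm.

```java
// Hypothetical reading of the "x is any digit" key scheme, for non-negative counts.
class DigitWildcardSketch {
    // Returns the key suffix that would be looked up for a count.
    static String suffixFor(int n) {
        if (n >= 0 && n <= 2) {
            return Integer.toString(n); // hello-0, hello-1, hello-2
        }
        int digits = Integer.toString(n).length();
        StringBuilder suffix = new StringBuilder();
        for (int i = 0; i < digits; i++) {
            suffix.append('x'); // one wildcard per digit
        }
        return suffix.toString();
    }

    public static void main(String[] args) {
        int[] samples = {0, 3, 9, 10, 11, 99, 100};
        for (int n : samples) {
            System.out.println(n + " -> hello-" + suffixFor(n));
        }
    }
}
```

Under this reading 10 maps to hello-xx (the "Many" text), while the Arabic table quoted earlier puts 3-10 in the "few" form.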
I'm not sure why you are wanting to simplify the solution below the point where it no longer solves the problem at hand. Plural forms have been around for a while and people have developed ways to handle them properly, and they all involve some form of arbitrary expression support to choose the proper
--
John A. Tamplin
Software Engineer, Google
On Dec 3, 2007 11:49 PM, Miroslav Pokorny <miroslav...@gmail.com> wrote:
> And how is using annotation any easier ? The last thing a non programmer
> should be touching is a *.java.
I think John is suggesting that all of the rules for every locale be
written once, so that no one need write the rules. Granted, anything
that can be encoded as an annotation can be encoded as a property.
Annotations have the advantage of working better with Java tool chains
(refactoring, etc) plus GWT generators out of the box, whereas
property files would require a little extra step to trigger the
generator.
> I think John is suggesting that all of the rules for every locale be
> written once, so that no one need write the rules.
With properties files you would only be writing the rules once. Each language would get its own properties file and have the same entries with their own set of plural entries.
Refactoring has nothing to do with this. This isn't a code artefact like a method.
It's text, pure and simple. You can't list all the rules for all the languages for each method on the message definition interface.
I think this proposal would put a lot of power in the hands of the
developer regarding i18n. The general direction is definitely good.
Here are my initial thoughts for discussion:
* The distinction between @Description and @Meaning is a bit fuzzy. I
noticed that one translates to <note annotates="source"> while the
other translates to <note annotates="general"> in your XLIFF example.
However, I am not sure that the distinction would be clear to the GWT
developer. Also, it does not seem necessary to have both. If one
description may not be seen by the translator, what good is it? It
might as well just be an inline comment in the source code. However,
having both options opens up the possibility that a GWT developer
could expect his @Descriptions to be shown to the translator, without
realizing that only Meanings may be shown to the translator.
* What happens if @DefaultText is not specified for one of the message
functions? Would the lack of this annotation lead to a compiler error
or warning? Would the function default to return an empty string or
some default text? If so, would message key collision be an issue?
* My first reaction upon seeing your example code was something
like... "What the hell?" I have never seen code with such a liberal
use of annotations, though this may speak more to my lack of
experience than anything. I do not know whether this is a real
problem, but I wonder if this format is a bit too foreign to the
average Java developer. Does anyone else think this is a concern?
(If not, just ignore this comment, because it may just be me.)
Plural Forms
While supporting plural forms is not planned initially, here are some ideas as to how they might be supported. We feel that more work needs to be done before we understand the problem well enough to choose a solution, but it seems the proposed annotation framework would be sufficiently flexible to allow a reasonable extension for plural forms in the future.
John, can you please tell me why my idea is worse?
Where and how does it fail?
Surely it's simpler than what you have proposed.
With your approach the person doing the translation work needs to ask
a coder to verify what "few" or "many" are. This is overly complex:
half the message is in one place and the definition of these plural
count words is in another class. Why can't they be together?
What purpose does the @PluralText annotation serve? Shouldn't the
text to be inserted in the plural form be determined by the PluralRule
implementation?
Putting the plural rules in the properties file means that the rules
are duplicated in a mutable form all over the place. Leaving them as
annotations in the GWT library code means that they're essentially
immutable and they're only in one place. Sounds like a case of DRY to
me and is, for me, an argument in favour of annotations all by itself,
but it also lowers the maintenance burden. If there's a bug in the
plural rules, then storing them in the properties files means updating
all the properties files, whereas storing them in the GWT library code
means upgrading and recompiling.
> If the translator ends up printing the list and then goes to do their
> work why not just include the selectors in the properties file, even
> if a coder setup the rules for them...
Seems like a recipe for disaster to me. A translator can't break the
build by scribbling on a print-out. He could break the build if he
inadvertently/maliciously changes one of the plural-rules properties.
Ian
> Putting the plural rules in the properties file means that the rules
> are duplicated in a mutable form all over the place. Leaving them as
> annotations in the GWT library code means that they're essentially
> immutable and they're only in one place. Sounds like a case of DRY to
> me and is, for me, an argument in favour of annotations all by itself,
> but it also lowers the maintenance burden. If there's a bug in the
> plural rules, then storing them in the properties files means updating
> all the properties files, whereas storing them in the GWT library code
> means upgrading and recompiling.
My main idea was to keep it simple and move the "rules" out of Java
annotations into a text file. Your idea of defining the plural rules
once (in your case, in annotations) and mapping them to selectors is
perfectly valid...
I will rephrase my question: why can't the plural selectors be defined
in the properties file?
one=1
two=2
few=3-10
many=11+
How do translators know which plural selectors are valid without looking at
the code or getting someone to look for them?
If the translator ends up printing the list and then goes off to do their
work, why not just include the selectors in the properties file, even
if a coder set up the rules for them...
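For illustration, selector entries like one=1, few=3-10 and many=11+ could be evaluated along these lines. The three spec shapes (exact value, inclusive range, open-ended "N+") are inferred from the example entries only, and the RangeSelectors class is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical evaluator for selector specs such as "1", "3-10" and "11+".
class RangeSelectors {
    // True if n satisfies the spec.
    static boolean matches(String spec, int n) {
        String s = spec.trim();
        if (s.endsWith("+")) {
            return n >= Integer.parseInt(s.substring(0, s.length() - 1));
        }
        int dash = s.indexOf('-');
        if (dash > 0) {
            int low = Integer.parseInt(s.substring(0, dash));
            int high = Integer.parseInt(s.substring(dash + 1));
            return n >= low && n <= high;
        }
        return n == Integer.parseInt(s);
    }

    // Returns the first selector name whose spec matches, or "other" if none does.
    static String select(Map<String, String> selectors, int n) {
        for (Map.Entry<String, String> e : selectors.entrySet()) {
            if (matches(e.getValue(), n)) {
                return e.getKey();
            }
        }
        return "other";
    }

    public static void main(String[] args) {
        Map<String, String> selectors = new LinkedHashMap<>();
        selectors.put("one", "1");
        selectors.put("two", "2");
        selectors.put("few", "3-10");
        selectors.put("many", "11+");
        int[] samples = {0, 1, 7, 10, 11, 250};
        for (int n : samples) {
            System.out.println(n + " -> " + select(selectors, n));
        }
    }
}
```

A plain range syntax like this cannot express rules that depend on the last digits of the count, such as the Russian rule described earlier in the thread, so it would need extending before it could cover every locale.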
I got some time to chew on your specification carefully. In your example,
@GenerateFile( Xliff.class, "MyMessages.xliff")
should be
@Generate(Xliff.class, "MyMessages.xliff")
to be consistent with the earlier specification.
This part:
@DefaultText("The amount due is {0,number,currency}.")
String amountDue(@Example("$5.00") @Replace("{0,number,currency}" amount);
is also unclear to me. Is there a typo here? After stripping the annotations, the Java code should remain valid, right? The above code does not seem to be.
John, did you consider dropping the key generator part? The user can either provide a key or use the key GWT generates. I can't foresee a use case where a user needs to generate keys in their own way. That only creates more incompatibilities among GWT code. If everybody uses the same scheme, sharing messages among applications will be a breeze.
To be more restrictive, I don't see that we would ever want to support message formats besides property files and XLIFF. One option is to leave that choice to the tools. Again, that will benefit message sharing.
> John, did you consider dropping the key generator part? The user can either provide a key or use the key GWT generates. I can't foresee a use case where a user needs to generate keys in their own way. That only creates more incompatibilities among GWT code. If everybody uses the same scheme, sharing messages among applications will be a breeze.
Anyone who has an existing translation system that aggregates translation strings across multiple applications (including non-GWT cases) will need that functionality; this is the case internally at Google.
Sorry I am so late in jumping in on this. Thanks for taking the time to put this RR together. Looks like you put a lot of thought into it.
-I generally like the specification that you have come up with, but I would prefer it if the developer could use fewer annotations where possible. With that in mind, here are my comments/questions:
-Is the @Generate annotation optional? If not, I think it should be. The default behavior would be to generate a properties file in the same directory in which the MyMessages interface is found. If you agree with this change, the behavior when the @Lookup annotation is unspecified should be changed so that .properties files take precedence over .xlf files.
-We should get rid of the @Description annotation. If the MyMessages interface is being generated by some tool, the description of each method can be inserted as a javadoc comment.
Take these next two comments with a grain of salt, as I am not a huge fan of the way parameter-level annotations look:
-Instead of having the @Optional annotation, why not force the user to declare another interface method with the alternative number of parameters? This would lead to less ambiguity.
-Instead of having the @Example annotation, could we have the @DefaultText look something like: Access Denied: {John} does not have access to {autoexec.bat}.
@John: Is there some way that we could make it easy to specify that the keys be generated using a default hash function that we've come up with? Maybe if the user specifies @GenerateKeysUsing with no arguments, then it uses a default hash function that we've come up with. Otherwise, if they pass in an argument, then we use the key generator that they specified?
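As a strawman for what such a default could look like: hash the default text and use the hex digest as the key. The choice of MD5, the hex encoding, and the DefaultKeySketch class are all assumptions for the sake of discussion; nothing in this thread fixes the hash function or its inputs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical default key generator: upper-case hex MD5 of the default text.
class DefaultKeySketch {
    static String keyFor(String defaultText) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(defaultText.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02X", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(keyFor("The amount due is {0,number,currency}."));
    }
}
```

A content-derived key means identical default text yields the same key in every application, which helps with sharing; it also means two messages with the same text but different meanings would collide unless the meaning is hashed in as well.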