RR: i18n generator and annotations


John Tamplin

Nov 30, 2007, 7:20:15 PM
to Google Web Toolkit Contributors, Shanjian Li, Emily Crutcher, Rajeev Dayal
Now that we have annotation support, here is a proposal for using annotations in the i18n generator.

Goals
  • Support a richer expression of the text to be translated.  I.e., we need a way of specifying the meaning of the text to the translator (for example, orange can be a fruit or a color), examples for placeholders, ways to specify that certain arguments need not be present in the translated string, etc.
  • Support interaction with standards-based translation tools.
  • Support both automatic generation of translation source files from the Java source and vice-versa.
  • Enable future support of different plural forms.
Non-Goals
  • Add support for plural forms.  However, the flexibility of the annotation interface should make this easy to support, and ideas for a potential future implementation are included.
Overview of Current Process
  1. User writes a Java subinterface of Messages or Constants containing methods for each string to be translated.
  2. User writes default strings for each method and stores them in the default properties file.
    (It is possible that the authoritative version is kept in some proprietary format file and the Java and properties files are generated from that.)
  3. Properties file is sent off for translation to various other locales.
  4. Translated properties files are stored in the proper place with the proper name.
  5. GWT compilation process invokes the i18n generator, which pulls in the strings for each compiled locale.
Proposed Changes
The above process would still be supported.  In addition:
  • Annotations could be used to store the default translation strings, as well as additional metadata such as meanings, descriptions, placeholders, etc
  • Generation of XLIFF or properties files (and extensible to other formats) from the annotated Java files would be supported - these files would then be sent for translation
  • Translated files would be accepted in XLIFF or properties files (and extensible to other formats)
  • The i18n generator would read these translated files during the compilation process
Advantages
  • One fewer file to keep synchronized
  • Better expressiveness for translation metadata
  • Support for standards-based translation tools
  • Easily extensible for new formats
  • Potential use of the messages interface outside of GWT, using reflection and java.lang.reflect.Proxy at runtime
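
To illustrate the last point, here is a minimal sketch of how a messages interface might be served outside of GWT with java.lang.reflect.Proxy. The @DefaultText declaration, its RUNTIME retention, DemoMessages, and RuntimeMessages are all assumptions made for this sketch, not part of the proposal, and MessageFormat stands in for whatever formatting the generator would really do.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.text.MessageFormat;

// Hypothetical stand-in for the proposed @DefaultText annotation.
// RUNTIME retention is an assumption; reflection cannot see it otherwise.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface DefaultText {
  String value();
}

// Hypothetical messages interface used only for this sketch.
interface DemoMessages {
  @DefaultText("red")
  String red();

  @DefaultText("Access denied: {0} does not have access to {1}")
  String accessDenied(String user, String file);
}

final class RuntimeMessages {
  // Builds a proxy whose methods format their own @DefaultText with the call arguments.
  // Only annotated interface methods are handled; anything else is rejected.
  static <T> T create(Class<T> iface) {
    InvocationHandler handler = (self, method, args) -> {
      DefaultText text = method.getAnnotation(DefaultText.class);
      if (text == null) {
        throw new UnsupportedOperationException("No @DefaultText on " + method.getName());
      }
      Object[] params = (args == null) ? new Object[0] : args;
      return MessageFormat.format(text.value(), params);
    };
    Object instance =
        Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    return iface.cast(instance);
  }
}
```

A real implementation would look up the translated text for the current locale first and only fall back to the default text.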
Proposed Annotations
These annotations would all be kept in the com.google.gwt.i18n.client.annotations package.
  • Interface Level
    These annotations are applied to the Messages/Constants interface
    • @DefaultLocale(String localeName)
      Specifies that text in this file is of the specified locale.  If not specified, the default is en_US for compatibility with existing behavior.
    • @GeneratedFrom(String filename)
      Indicates that this file was generated from the supplied file.  Note that it is not required that this file name be resolvable at compile time, as this file may have been generated on a different machine etc -- if the generator does check the source file, such as for staleness, it must not give any warning if the file is not present or if the name is not resolvable.
    • @GenerateKeysUsing(Class<? extends KeyGenerator> generator)
      Requests that the keys for each method be generated with the specified generator (see below).  If this annotation is not supplied, it will be treated exactly as if it were supplied with MethodName.class.  By specifying a class literal, this will be extensible to other formats not in the GWT namespace -- the user just has to make sure the specified class is on the class path at compilation time.  This allows integration with non-standard or internal tools that may use their own hash function to coalesce duplicate translation strings between multiple applications, or that otherwise need specific keys for compatibility with external tools.
    • @Generate(Class<? extends MessageCatalogFormat> format, String filename)
      Requests that a message catalog file be generated during the compilation process.  Exact semantics of non-absolute pathnames to be determined.  For example, this could generate an XLIFF or properties file based on the information contained in this file.
    • @Lookup(Class<? extends MessageCatalogFinder> finder)
      Specifies that translated messages are found by the specified MessageCatalogFinder.  If this annotation is not supplied, .xlf and .properties files (in that order) are searched for on the class path.  Each MessageCatalogFinder may define additional annotations that provide search parameters, such as a directory prefix, a database name, etc.
  • Method Level
    These annotations are applied to individual methods and contain information about a single message.
    • @DefaultText(String)
      The default text to use if no better locale match is found.  The default text is expected to be in the default locale (see @DefaultLocale).  The format of the text is identical to the current properties file format for the Messages/Constants interface, except that Java quoting is used instead of property file quoting (i.e., \" inside strings vs \-escaping #, !, trailing spaces, etc).  Also, the encoding matches that of the source file, which will typically be UTF-8.
    • @Description(String)
      A description of the text.  Note that this is not included in a hash of the text and depending on the file format may not be included in a way visible to a translator.
    • @Key(String)
      Specifies the key to use in the external format for this particular method.  If not supplied, it will be generated based on the @GenerateKeysUsing annotation above.
    • @Meaning(String)
      Supplies a meaning associated with this text.  This information is provided to the translator to distinguish between different possible translations -- for example, orange might have meaning supplied as "the fruit" or "the color".
  • Parameter Level
    These annotations are applied to a single parameter in a method with arguments (and are therefore only relevant to Messages subinterfaces).
    • @Example(String)
      An example for this variable.  Many translation tools will show this to the translator in place of the placeholder -- i.e., Hello {0} with @Example("John") will show as Hello John with "John" highlighted to indicate it should not be translated.
    • @Optional
      Indicates that this parameter need not be present in all translations.  If this annotation is not supplied, it is a compile time error if the translated string being compiled does not include the parameter.
    • @Replace(String)
      Indicates that this parameter will replace the specified string in the default text.  If not specified, it defaults to {0}, {1}, etc based on the position of the parameter in the argument list.
  • Related Classes/Interfaces
    • KeyGenerator {
        String generateKey(String className, String methodName, String text, String meaning);
      }
      Supplied implementations for MethodName (which just returns methodName), FullyQualifiedMethodName (which just returns className + "." + methodName), and MD5 (which returns the MD5 hash of text and meaning as a hex string).
    • MessageCatalogFormat {
        // to be determined, but API sufficient to implement any reasonable message
        // catalog format
      }
      Supplied implementations for Java property files and XLIFF; perhaps GNU gettext if there is interest.
    • MessageCatalog {
        String getMessage(String key);
      }
      Supports lookup of an individual message by its key.
    • MessageCatalogFinder {
         MessageCatalogFinder(Class <? extends Messages> messages); // constructor called via reflection
         String[] localesAvailable(); // could be added later
         MessageCatalog getCatalog(String locale);
      }
      Knows how to find a catalog for a given locale and to enumerate available locales.  This could involve lookup in a database or an aggregated message file, for example.
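
As a rough sketch of the three supplied KeyGenerator implementations described above (class names here are hypothetical; the proposal only names them MethodName, FullyQualifiedMethodName and MD5). For MD5 this assumes the hash covers the UTF-8 bytes of the text followed directly by the meaning, which matches the 'orangethe color' shape of the XLIFF ids in the example below, but the exact input and encoding are not specified in the proposal.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

interface KeyGenerator {
  String generateKey(String className, String methodName, String text, String meaning);
}

// Key is just the method name (the proposed default).
final class MethodNameKeyGenerator implements KeyGenerator {
  public String generateKey(String className, String methodName, String text, String meaning) {
    return methodName;
  }
}

// Key is className + "." + methodName.
final class FullyQualifiedMethodNameKeyGenerator implements KeyGenerator {
  public String generateKey(String className, String methodName, String text, String meaning) {
    return className + "." + methodName;
  }
}

// Key is the MD5 hash of text and meaning as a lowercase hex string.
final class Md5KeyGenerator implements KeyGenerator {
  public String generateKey(String className, String methodName, String text, String meaning) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      md5.update(text.getBytes(StandardCharsets.UTF_8));
      if (meaning != null) {
        // Assumption: meaning is appended directly after the text, with no separator.
        md5.update(meaning.getBytes(StandardCharsets.UTF_8));
      }
      StringBuilder hex = new StringBuilder();
      for (byte b : md5.digest()) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // MD5 is required to be present on every Java platform.
      throw new IllegalStateException(e);
    }
  }
}
```

Because the MD5 key ignores the class and method names, identical text with an identical meaning gets the same key in every interface, which is what allows duplicate strings to be coalesced across applications.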
Example

package foo;

import com.google.gwt.i18n.client.annotations.*;

@DefaultLocale("en_US")
@Generate(Xliff.class, "MyMessages.xliff")
@GenerateKeysUsing(KeyGenerator.MD5.class)
public interface MyMessages extends com.google.gwt.i18n.client.Messages {
  @DefaultText("red")
  String red();

  @DefaultText("orange")
  @Meaning("the color")
  String orange();

  @DefaultText("Access denied: {0} does not have access to {1}")
  @Description("The specified user does not have access to the specified file")
  String accessDenied(@Example("user name") String user, @Example("file name") String file);

  @DefaultText("The amount due is {0,number,currency}.")
  String amountDue(@Example("$5.00") @Replace("{0,number,currency}") double amount);
}

This would instruct the generator to produce a MyMessages.xliff file in a to-be-determined location.  A meaning is supplied for orange to clarify to the translator that the color is desired, not the fruit.  A description, perhaps useful to explain why the message is being used or what it is trying to convey, is added on accessDenied.  Note that no @Replace annotations are needed there, as the default values of {0} and {1} are sufficient.  For amountDue, the string to be replaced by the supplied argument is given explicitly.  This would generate an XLIFF file containing:

<?xml version="1.0"?>
<xliff version="1.2">
 <file original="foo/MyMessages.java" source-language="en-US" datatype="x-gwt-java">
  <body>
    <trans-unit id="{md5 hash of 'red'}">
      <source>red</source>
    </trans-unit>
    <trans-unit id="{md5 hash of 'orangethe color'}">
      <source>orange</source>
      <note annotates="source">the color</note>
    </trans-unit>
    <trans-unit id="{md5 hash of 'Access denied: {0} does not have access to {1}'}">
      <source>Access denied: <ph id="0">user name</ph> does not have access to <ph id="1">file name</ph></source>
      <note annotates="general">The specified user does not have access to the specified file</note>
    </trans-unit>
    <trans-unit id="{md5 hash of 'The amount due is {0,number,currency}.'}">
      <source>The amount due is <ph id="0">$5.00</ph>.</source>
    </trans-unit>
  </body>
 </file>
</xliff>
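
For illustration, the step that turns a default text plus its @Example values into the <source> element shown in the XLIFF above could look roughly like this. XliffSource and toSource are hypothetical names, and the sketch assumes the default {0}, {1}, ... placeholders (no custom @Replace strings) and ignores MessageFormat quoting rules.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XliffSource {
  // Matches {0}, {1}, and formatted placeholders such as {0,number,currency}.
  private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\d+)(?:,[^}]*)?\\}");

  // examples[i] is the @Example text for parameter i; if none is available,
  // the raw placeholder is shown inside the <ph> element instead.
  static String toSource(String defaultText, String... examples) {
    StringBuilder out = new StringBuilder("<source>");
    Matcher m = PLACEHOLDER.matcher(defaultText);
    int last = 0;
    while (m.find()) {
      out.append(escape(defaultText.substring(last, m.start())));
      int index = Integer.parseInt(m.group(1));
      String shown = index < examples.length ? examples[index] : m.group();
      out.append("<ph id=\"").append(index).append("\">")
          .append(escape(shown)).append("</ph>");
      last = m.end();
    }
    out.append(escape(defaultText.substring(last))).append("</source>");
    return out.toString();
  }

  // Minimal XML escaping for element content.
  private static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }
}
```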


Plural Forms
While supporting plural forms is not planned initially, here are some ideas as to how they might be supported.  We feel that more work needs to be done before we understand the problem well enough to choose a solution, but it seems the proposed annotation framework would be sufficiently flexible to allow a reasonable extension for plural forms in the future.
  1. Use @DefaultText as above, using ICU's PluralFormat.   For example, {0,plural,singular{1 widget} other{# widgets}}.  The advantage is no new annotations or other support is required, but the disadvantage is that it exposes the rules to the translator, who might not know what to translate or might damage the format.  ChoiceFormat could also be used, but is insufficient to implement proper plural forms in many languages.
  2. @PluralText(Class<? extends PluralFormRule> rule, String[] texts)
    @PluralText(String[] texts) // using a default rule

    Calls the specified class at compile time to determine which plural form to use.  Each text in the array is a string exactly like @DefaultText above, and is translated separately -- the number of strings required will depend on the locale and the plural rules supported.  GWT could supply a standard rule class with implementations for most locales, making it easy for users to support plural forms in many languages directly.
    One disadvantage would be that translators might need to create an additional plural form, and tool support may make that harder than it should be.  For example, properties files have no explicit support for plural forms and would likely use something like key[0], key[1], key[2], etc. for the plural forms.  The English (for example) text supplied to the translator might only use two plural forms, so for Polish the translator would have to know to add a key[2] in addition to key[0] and key[1].  Alternatively, an additional array of strings naming each plural case could be passed, and the PluralFormRule would return a string identifying the case, which might make it a bit easier to tie the rule result to the plural form.
    One complication is that the PluralFormRule will need to be both accessible at compile time and compiled to JS, since for non-constants the rule must be evaluated at runtime.  A final complication is dealing with keys in this case -- perhaps the base key should only be the "other" text in the default locale, and then the set of keys would be key (for anything not matching more specifically), key[zero], key[one], key[paucal], etc.  This would also provide compatibility with non-plural form usage of the same string, and in fact we could just use @DefaultText for that string.
  3. @PluralText(String rule, String[] texts)
    In this case, the rule would be specified as an expression similar to GNU gettext.  The rule would go into the translated strings just like any other, and would be retrieved from the translated text and applied to the value to select the proper form.  Compared to the above, the advantage is that simple rules are easy to specify without having to generate a subclass for each locale which delegates to a common implementation, but the downside is that the rules have to be processed by a translator (or shown with comments so they leave it alone, and then a programmer "translates" the rules to the desired locale).  Also, the expression language for the rule would necessarily be limited compared to supplying a Java class.
I personally would lean towards #2 above and supplying a standard rule for almost every locale (a rule would not get used unless its locale were in the locale property's set at compile time).  In fact, we might want to consider common things used across many applications (such as thousands and decimal separators, text direction, default currency symbol, etc) and supply implementations of them by default -- the default plural rule could just be one thing supplied by this standard interface.  If so, and this standard rule is used, the exporter could include boilerplate comments to the translator indicating which plural forms to use for which situation.

Comments?

--
John A. Tamplin
Software Engineer, Google

Matthew Mastracci

Nov 30, 2007, 10:40:19 PM
to Google Web Toolkit Contributors
I'd lean towards #1 myself, and add a way to specify a set of plural
rule text at the same time for a given locale if I want to override
the default platform set of rules. This would allow a significant
amount of variability between locales without requiring any programmer
intervention.

For instance:

@DefaultText("{0,plural, zero{No balls found!}, one{One ball found},
two{Two balls found, great}, other{More than two balls!}}", "zero: n
is 0; one: n is 1; two: n is 2")

A translation tool can provide a more structured editor for plural
forms if desired, but tools that don't can just allow the translator
to edit the string in-line. I imagine that the intent of the plural
form can be communicated well enough in the description that the
translator won't damage the string.

John Tamplin

Nov 30, 2007, 11:19:04 PM
to Google-Web-Tool...@googlegroups.com
On Nov 30, 2007 10:40 PM, Matthew Mastracci <mmas...@gmail.com> wrote:
> I'd lean towards #1 myself, and add a way to specify a set of plural
> rule text at the same time for a given locale if I want to override
> the default platform set of rules.  This would allow a significant
> amount of variability between locales without requiring any programmer
> intervention.
>
> For instance:
>
> @DefaultText("{0,plural, zero{No balls found!}, one{One ball found},
> two{Two balls found, great}, other{More than two balls!}}", "zero: n
> is 0; one: n is 1; two: n is 2")

It would need to be a different annotation, due to the limits on arguments.  If you have more than one argument, they all have to be named, ie @PluralText(text="{0,plural,...}", rules="zero: n==0; one: n==1; two: n==2").  I also like the idea of the "other" case being in @DefaultText so plural rules can be easily added to something that "mostly" works as a refinement.
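
To make the named-argument point concrete: in Java, only a single element called value() can be given positionally, so a two-argument annotation could be declared roughly as follows. This is only a sketch; the element names text and rules follow the example above, while the RUNTIME retention, BallMessages, and PluralTextReader are assumptions added so the sketch can be exercised with reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical declaration; every element must be named at the use site
// because there is more than one and none is the special value() element.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface PluralText {
  String text();
  String rules() default "";
}

// Hypothetical messages interface using the annotation.
interface BallMessages {
  @PluralText(
      text = "{0,plural, zero{No balls found!}, one{One ball found}, "
          + "two{Two balls found, great}, other{More than two balls!}}",
      rules = "zero: n==0; one: n==1; two: n==2")
  String ballsFound(int count);
}

final class PluralTextReader {
  // Returns the rules element of the named method's @PluralText, or null if absent.
  static String rulesOf(Class<?> iface, String methodName) {
    for (Method m : iface.getMethods()) {
      PluralText p = m.getAnnotation(PluralText.class);
      if (p != null && m.getName().equals(methodName)) {
        return p.rules();
      }
    }
    return null;
  }
}
```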

Also, the more I think about it the more I like the idea of providing standard plural rules through deferred binding and using type-safe enums for extensibility.  I.e., PluralRuleChooser { PluralRule[] validValues(); PluralRule choose(int n); }  PluralRuleChooser01n { PluralRule[] validValues() { return new PluralRule[] { PluralRule.ZERO, PluralRule.ONE, PluralRule.OTHER }; } PluralRule choose(int n) { return n == 0 ? PluralRule.ZERO : (n == 1 ? PluralRule.ONE : PluralRule.OTHER); } }  PluralRuleChooser_en extends PluralRuleChooser01n {}  etc.
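
Spelled out as compilable Java, that idea might look like the sketch below. It uses the type-safe enum pattern (a class with constants, not a Java enum) on the assumption that this is what allows new forms to be added later; the int parameter and the toString names are also assumptions, and the deferred-binding step that would select PluralRuleChooser_en for the en locale is not shown.

```java
// Type-safe enum pattern: a class with constants rather than a Java enum,
// so applications can subclass it to add forms that were not anticipated.
class PluralRule {
  static final PluralRule ZERO = new PluralRule("zero");
  static final PluralRule ONE = new PluralRule("one");
  static final PluralRule OTHER = new PluralRule("other");

  private final String name;

  protected PluralRule(String name) {
    this.name = name;
  }

  @Override
  public String toString() {
    return name;
  }
}

interface PluralRuleChooser {
  // The plural forms this chooser can return.
  PluralRule[] validValues();

  // Selects the plural form for the count n.
  PluralRule choose(int n);
}

// Shared rule for locales that distinguish 0, 1, and everything else.
class PluralRuleChooser01n implements PluralRuleChooser {
  public PluralRule[] validValues() {
    return new PluralRule[] {PluralRule.ZERO, PluralRule.ONE, PluralRule.OTHER};
  }

  public PluralRule choose(int n) {
    return n == 0 ? PluralRule.ZERO : (n == 1 ? PluralRule.ONE : PluralRule.OTHER);
  }
}

// Per-locale class that simply delegates to the shared rule.
class PluralRuleChooser_en extends PluralRuleChooser01n {}
```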

If you need to add a new plural form that we didn't include, you would be able to do so, and as long as we provided implementations for most languages (which would mostly delegate to a small set of rules) then the typical programmer would never have to worry about the rules.  We already do this to some extent for NumberFormat and DateTimeFormat.

> A translation tool can provide a more structured editor for plural
> forms if desired, but tools that don't can just allow the translator
> to edit the string in-line.  I imagine that the intent of the plural
> form can be communicated well enough in the description that the
> translator won't damage the string.

I have limited personal experience with dealing with translators (in a previous job we never bothered with plural forms), but the general rule seems to be that translators are linguists and not programmers, and showing them any sort of code will lead to translation errors.  In our case, we can detect some of those errors at compile time rather than runtime, but there will still be some that slip through.

Note that the plural forms are almost certainly not going to be in 1.5 (and in fact only a subset of this might make it, given the current localizable generator architecture), so this is really a future discussion.  While I want to make sure that we can implement whatever we come up with for plurals without changing the simple cases, the primary focus is on making sure the rest of it is solid.

mP

Dec 2, 2007, 3:54:13 PM
to Google Web Toolkit Contributors
Wouldn't support for cascading property names be simpler than the
long lists quoted above?

I.e.

how-many-balls-0 searches for how-many-balls-0 > how-many-balls-0 >
how-many-balls-n > how-many-balls.

and

how-many-balls-1 searches for how-many-balls-1 > how-many-balls-n >
how-many-balls.

or something like that. The long list quoted above seems like a lot of
noise and in effect is a simple language in itself.



John Tamplin

Dec 2, 2007, 4:25:22 PM
to Google-Web-Tool...@googlegroups.com
On Dec 2, 2007 3:54 PM, mP <miroslav...@gmail.com> wrote:
> Wouldn't support for cascading property names be simpler than the
> long lists quoted above?
>
> I.e.
>
> how-many-balls-0 searches for how-many-balls-0 > how-many-balls-0 >
> how-many-balls-n > how-many-balls.
>
> and
>
> how-many-balls-1 searches for how-many-balls-1 > how-many-balls-n >
> how-many-balls.
>
> or something like that. The long list quoted above seems like a lot of
> noise and in effect is a simple language in itself.

I don't see how this solves the problem at hand.  What is needed is that each translated language is able to describe different sets of rules for choosing the correct plural form, and the corresponding text for each form.  Each language has different rules.  For example, Arabic (using right-to-left numbers):
  0 - none form
  1 - singular form
  2 - dual form
  3-10 - few form
  11-99 - many form
   other - plural form
Russian uses the singular form if the count ends in 1 but not 11, the few form for numbers ending in 2-4 except for 12-14, and the plural for everything else.
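
As a sketch, the Russian rule just described could be coded like this (RussianPlurals and the Form names are hypothetical, and only non-negative counts are considered):

```java
final class RussianPlurals {
  enum Form { SINGULAR, FEW, PLURAL }

  // Implements the rule as stated above for n >= 0.
  static Form choose(long n) {
    long lastDigit = n % 10;
    long lastTwoDigits = n % 100;
    if (lastDigit == 1 && lastTwoDigits != 11) {
      return Form.SINGULAR; // 1, 21, 31, 101, ... but not 11, 111, ...
    }
    boolean endsIn2To4 = lastDigit >= 2 && lastDigit <= 4;
    boolean endsIn12To14 = lastTwoDigits >= 12 && lastTwoDigits <= 14;
    if (endsIn2To4 && !endsIn12To14) {
      return Form.FEW; // 2-4, 22-24, ... but not 12-14, 112-114, ...
    }
    return Form.PLURAL; // everything else, including 0 and 5-20
  }
}
```

Note that the pattern repeats indefinitely (21, 31, 41, ... all take the singular), so a finite table of ranges cannot express it.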

So, in a property file for English, I would have:
widgets={0} widgets.
widgets[one]=A widget.

Arabic rules might result in (with English text):
widgets={0} widgets.
widgets[none]=No widgets.
widgets[one]=A widget.
widgets[two]=Both widgets.
widgets[few]={0} widgets, which are few.
widgets[many]={0} widgets which are many.

Japanese doesn't distinguish plural forms, and would have only widgets={0} widgets.
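
A minimal sketch of how such keys might be resolved, assuming the plural form name has already been chosen by a per-locale rule and that the bare key is the fallback (PluralLookup is a hypothetical helper, not an existing GWT class):

```java
import java.util.Properties;

final class PluralLookup {
  // Looks up key[form] first and falls back to the bare key,
  // which holds the "other" text for the locale.
  static String lookup(Properties props, String key, String form) {
    String specific = (form == null) ? null : props.getProperty(key + "[" + form + "]");
    return (specific != null) ? specific : props.getProperty(key);
  }
}
```

With the English entries above, the form "one" resolves to "A widget." and any other form falls back to "{0} widgets.", so a locale like Japanese needs only the bare key.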

There are many other languages with very unusual (to an English speaker :) plural rules, and we need to be able to support all of them equally well.  GNU gettext has been doing this for a long time, but one limitation is that it numbers the states: there is no convention across apps for what those numbers mean, and existing translations break if new states are added, such as when the plural rules for Arabic are corrected.  ChoiceFormat can't solve the problem because you would need an infinite list of rules, since the patterns repeat (for example, in Russian and Polish, among others).

Miroslav Pokorny

Dec 3, 2007, 4:26:43 AM
to Google-Web-Tool...@googlegroups.com
Firstly, it sounds like you should be working at the UN.

On Dec 3, 2007 8:25 AM, John Tamplin <j...@google.com> wrote:

> I don't see how this solves the problem at hand.  What is needed is that each translated language is able to describe different sets of rules for choosing the correct plural form, and the corresponding text for each form.  Each language has different rules.  For example, Arabic (using right-to-left numbers):
>   0 - none form
>   1 - singular form
>   2 - dual form
>   3-10 - few form
>   11-99 - many form
>   other - plural form

In this case the property names are a good match. You can easily express
-0 for none
-x01  for one
-x1 for 11, 21, 31 .. 91.

I don't see how my proposal fails to express all these numbering cases. Some languages will have a single entry, as per Japanese, or 2 for English, or 6 for Arabic.
 

> Russian uses the singular form if the count ends in 1 but not 11, the few form for numbers ending in 2-4 except for 12-14, and the plural for everything else.
>
> So, in a property file for English, I would have:
> widgets={0} widgets.
> widgets[one]=A widget.
>
> Arabic rules might result in (with English text):
> widgets={0} widgets.
> widgets[none]=No widgets.
> widgets[one]=A widget.
> widgets[two]=Both widgets.
> widgets[few]={0} widgets, which are few.
> widgets[many]={0} widgets which are many.
>
> Japanese doesn't distinguish plural forms, and would have only widgets={0} widgets.

Yup.


--
mP

John Tamplin

Dec 3, 2007, 11:34:10 AM
to Google-Web-Tool...@googlegroups.com
On Dec 3, 2007 4:26 AM, Miroslav Pokorny <miroslav...@gmail.com> wrote:
>> I don't see how this solves the problem at hand.  What is needed is that each translated language is able to describe different sets of rules for choosing the correct plural form, and the corresponding text for each form.  Each language has different rules.  For example, Arabic (using right-to-left numbers):
>>   0 - none form
>>   1 - singular form
>>   2 - dual form
>>   3-10 - few form
>>   11-99 - many form
>>   other - plural form
>
> In this case the property names are a good match. You can easily express
> -0 for none
> -x01  for one
> -x1 for 11, 21, 31 .. 91.
>
> I don't see how my proposal fails to express all these numbering cases. Some languages will have a single entry, as per Japanese, or 2 for English, or 6 for Arabic.

Exactly how does your proposal choose the proper string to use based on the value?  Are you suggesting for Arabic you have -3 -4 -5 -6 -7 -8 -9 -10 entries that all contain the same string?

Miroslav Pokorny

Dec 3, 2007, 3:25:49 PM
to Google-Web-Tool...@googlegroups.com
For Arabic you would need the following entries:

hello-0=No widgets
hello-1=One widget.
hello-2=Two widgets
hello-x=Few widgets
hello-xx=Many widgets
hello-xxx=Lots of widgets

x becomes a wildcard for any digit.
--
mP

Miroslav Pokorny

Dec 3, 2007, 3:26:10 PM
to Google-Web-Tool...@googlegroups.com
On Dec 4, 2007 7:25 AM, Miroslav Pokorny <miroslav...@gmail.com> wrote:
> For Arabic you would need the following entries:
>
> hello-0=No widgets
> hello-1=One widget.
> hello-2=Two widgets
> hello-x=Few widgets
> hello-xx=Many widgets
> hello-xxx=Lots of widgets
>
> x becomes a wildcard for any digit.


--
mP

John Tamplin

Dec 3, 2007, 4:55:39 PM
to Google-Web-Tool...@googlegroups.com
On Dec 3, 2007 3:25 PM, Miroslav Pokorny <miroslav...@gmail.com> wrote:
> For Arabic you would need the following entries:
>
> hello-0=No widgets
> hello-1=One widget.
> hello-2=Two widgets
> hello-x=Few widgets
> hello-xx=Many widgets
> hello-xxx=Lots of widgets
>
> x becomes a wildcard for any digit.

I don't believe this can represent all the necessary plural forms -- for example, in this case you would be choosing the Many widgets form for 10, which is incorrect.  Aside from that, you are now requiring the translators, who generally are linguists not programmers, to encode the plural rules in the name of the string.
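
To show the problem with 10 concretely, here is a sketch of one possible reading of the digit-mask scheme: an exact key such as hello-2 wins, otherwise the key with one x per digit is used. The matching semantics, DigitMaskLookup, and arabicExample are my assumptions, since the scheme was not specified beyond "x becomes a wildcard for any digit".

```java
import java.util.HashMap;
import java.util.Map;

final class DigitMaskLookup {
  // Assumed semantics for non-negative n: exact match first,
  // then a mask with one 'x' per decimal digit of n.
  static String select(Map<String, String> entries, String key, int n) {
    String digits = Integer.toString(n);
    String exact = entries.get(key + "-" + digits);
    if (exact != null) {
      return exact;
    }
    StringBuilder mask = new StringBuilder();
    for (int i = 0; i < digits.length(); i++) {
      mask.append('x');
    }
    return entries.get(key + "-" + mask);
  }

  // The entries proposed for Arabic in the quoted message.
  static Map<String, String> arabicExample() {
    Map<String, String> entries = new HashMap<>();
    entries.put("hello-0", "No widgets");
    entries.put("hello-1", "One widget.");
    entries.put("hello-2", "Two widgets");
    entries.put("hello-x", "Few widgets");
    entries.put("hello-xx", "Many widgets");
    entries.put("hello-xxx", "Lots of widgets");
    return entries;
  }
}
```

Under this reading, 3 through 9 select "Few widgets" but 10 selects "Many widgets", whereas the Arabic rule given earlier puts 3-10 in the few form.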

I'm not sure why you are wanting to simplify the solution below the point where it no longer solves the problem at hand.  Plural forms have been around for a while and people have developed ways to handle them properly, and they all involve some form of arbitrary expression support to choose the proper form.

Miroslav Pokorny

Dec 4, 2007, 2:49:49 AM
to Google-Web-Tool...@googlegroups.com
On Dec 4, 2007 8:55 AM, John Tamplin <j...@google.com> wrote:

> I don't believe this can represent all the necessary plural forms -- for example, in this case you would be choosing the Many widgets form for 10, which is incorrect.  Aside from that, you are now requiring the translators, who generally are linguists not programmers, to encode the plural rules in the name of the string.

And how is using annotations any easier?  The last thing a non-programmer should be touching is a *.java file.

I think a properties file, which is really a text file in the above format, is easier to work with: it's got no code and is just text.  There is no need for multiple classes to hold annotated values, just a single interface with methods that map back to entries in the properties file.
 

> I'm not sure why you are wanting to simplify the solution below the point where it no longer solves the problem at hand.  Plural forms have been around for a while and people have developed ways to handle them properly, and they all involve some form of arbitrary expression support to choose the proper

My example was rushed, but it shows that you can specify multiple entries in a properties file as a way to map plural rules to messages.  IMHO this is a lot more understandable than the nasty long annotation thing that you presented above.
If the current idea doesn't support all combos, I'm sure it can be extended to support ranges of numbers instead of plain old number masks.
 

--
mP

Ray Cromwell

Dec 4, 2007, 3:12:23 AM
to Google-Web-Tool...@googlegroups.com
On Dec 3, 2007 11:49 PM, Miroslav Pokorny <miroslav...@gmail.com> wrote:
>
>
> And how is using annotation any easier ? The last thing a non programmer
> should be touching is a *.java.
>

I think John is suggesting that all of the rules for every locale be
written once, so that no one need write the rules. Granted, anything
that can be encoded as an annotation can be encoded as a property.
Annotations have the advantage of working better with Java tool chains
(refactoring, etc) plus GWT generators out of the box, whereas
property files would require a little extra step to trigger the
generator.

Miroslav Pokorny

Dec 4, 2007, 3:27:42 AM
to Google-Web-Tool...@googlegroups.com
On Dec 4, 2007 7:12 PM, Ray Cromwell <cromw...@gmail.com> wrote:

> On Dec 3, 2007 11:49 PM, Miroslav Pokorny <miroslav...@gmail.com> wrote:
>> And how is using annotation any easier ? The last thing a non programmer
>> should be touching is a *.java.
>
> I think John is suggesting that all of the rules for every locale be
> written once, so that no one need write the rules. Granted, anything

With properties files you would only be writing the rules once. Each language would get its own properties file and have the same entries with their own set of plural entries.
 

> that can be encoded as an annotation can be encoded as a property.
> Annotations have the advantage of working better with Java tool chains
> (refactoring, etc) plus GWT generators out of the box, whereas

Refactoring has nothing to do with this. This isn't a code artifact like a method. It's text, pure and simple; you can't list all the rules for all the languages for each method on the message definition interface.
 

> property files would require a little extra step to trigger the
> generator.

No, you would have an interface that is simply a placeholder.

Say we have an interface; let's call it MessageInterface.

I might create a subinterface called foo.Bar which extends MessageInterface.

For foo.Bar I will have some properties files called
foo.Bar.properties - this is the default
foo.Bar_en.properties - holds English
foo.Bar_es.properties - holds Spanish, and so on, as per resource bundle naming.

--
mP

John Tamplin

Dec 4, 2007, 10:25:11 AM
to Google-Web-Tool...@googlegroups.com
On Dec 4, 2007 3:27 AM, Miroslav Pokorny <miroslav...@gmail.com> wrote:
>> I think John is suggesting that all of the rules for every locale be
>> written once, so that no one need write the rules. Granted, anything
>
> With properties files you would only be writing the rules once. Each language would get its own properties file and have the same entries with their own set of plural entries.

Where in the properties files would you put the rules used to choose each plural form?  Your proposal of a simple suffix is not sufficient to meet the needs of all the languages, so if you use property files you would have to have an additional property or a separate property file to specify the plural form rules.  As properties files are not structured, the only way to do this is via naming convention, and even then you will likely confuse translators who receive a file containing "code" for choosing the rules.  If you put them in a different file, you will have to choose that file by naming convention and filter out those files from what you send to the translator.

Aside from these problems, properties files do not allow us to supply the metadata necessary to allow translators to do a correct translation.  Even if we impose a structured comment format, other translation tools are not going to follow that format so much of the data will not make it to the translator, and even if it does the best we can hope for is that our structured comment simply shows up as a general comment in the translation tool.
 
Refactoring has nothing to do with this. This isnt a code artefact like a method.

Say you want to move a message from one bundle to another, isn't that using your refactoring tools?  If the metadata is in annotations, they will get moved to the new interface, but if it is in a properties file you will have to go move that yourself.
 
Its text pure and simple you cant list all the rules for all the lingos for each method on the message definition interface.
 
You aren't.  The idea is you are already writing the Java code, and rather than having to specify the default text in a different file which has to be separately maintained and does not support structured metadata, you keep that data in-line with the source.  While you could write other Java files to contain the translated text, the intention is that you would still use property files or XLIFF files (and maybe GNU gettext .po files) to hold the translated strings. 

Regardless, all of this discussion has been on the one piece which I indicated would definitely not make 1.5, the plural forms.  The intent of mentioning them was to show that we have sufficient flexibility to add them in the future without changing anything else about the API.  Can we discuss the merits of the rest of the proposal?

bba...@gmail.com

Dec 4, 2007, 7:21:31 PM12/4/07
to Google Web Toolkit Contributors
John,

I think this proposal would put a lot of power in the hands of the
developer regarding i18n. The general direction is definitely good.
Here are my initial thoughts for discussion:

* The distinction between @Description and @Meaning is a bit fuzzy. I
noticed that one translates to <note annotates="source"> while the
other translates to <note annotates="general"> in your XLIFF example.
However, I am not sure that the distinction would be clear to the GWT
developer. Also, it does not seem necessary to have both. If one
description may not be seen by the translator, what good is it? It
might as well just be an inline comment in the source code. However,
having both options opens up the possibility that a GWT developer
could expect his @Descriptions to be shown to the translator, without
realizing that only Meanings may be shown to the translator.

* What happens if @DefaultText is not specified for one of the message
functions? Would the lack of this annotation lead to a compiler error
or warning? Would the function default to return an empty string or
some default text? If so, would message key collision be an issue?

* My first reaction upon seeing your example code was something
like... "What the hell?" I have never seen code with such a liberal
use of annotations, though this may speak more to my lack of
experience than anything. I do not know whether this is a real
problem, but I wonder if this format is a bit too foreign to the
average Java developer. Does anyone else think this is a concern?
(If not, just ignore this comment, because it may just be me.)

Brett Bavar
bba...@google.com

On Dec 4, 7:25 am, "John Tamplin" <j...@google.com> wrote:
> indicated would definitely *not* make 1.5, the plural forms. The intent of

John Tamplin

Dec 4, 2007, 8:47:03 PM12/4/07
to Google-Web-Tool...@googlegroups.com
On Dec 4, 2007 4:21 PM, bba...@gmail.com <bba...@gmail.com> wrote:
I think this proposal would put a lot of power in the hands of the
developer regarding i18n.  The general direction is definitely good.
Here are my initial thoughts for discussion:

* The distinction between @Description and @Meaning is a bit fuzzy.  I
noticed that one translates to <note annotates="source"> while the
other translates to <note annotates="general"> in your XLIFF example.
However, I am not sure that the distinction would be clear to the GWT
developer.  Also, it does not seem necessary to have both.  If one
description may not be seen by the translator, what good is it?  It
might as well just be an inline comment in the source code.  However,
having both options opens up the possibility that a GWT developer
could expect his @Descriptions to be shown to the translator, without
realizing that only Meanings may be shown to the translator.

I included the distinction for importing data from sources that have descriptions that clearly aren't meanings but are intended for the programmer selecting the correct method.  They could be imported as Javadoc comments or moved into the meaning, although I worry some of them I have seen might confuse the translator.
 
* What happens if @DefaultText is not specified for one of the message
functions?  Would the lack of this annotation lead to a compiler error
or warning?  Would the function default to return an empty string or
some default text?  If so, would message key collision be an issue?

The compiler will give an error if it can't find a default translation, just as if you left one out of a properties file.  You could have them in different sources (ie, some in annotations and some in property files), but I wouldn't expect that is a typical use.
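
A minimal sketch of the check described above, assuming a runtime-reflection stand-in for what the generator would do at compile time. The nested `DefaultText` annotation, the `SampleMessages` interface, and the `missingDefaults` helper are hypothetical names for illustration, not the actual GWT types. A method counts as covered if it has default text in either source (annotation or properties file).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

final class DefaultTextCheck {
  // Hypothetical stand-in for the proposed annotation; not the real GWT type.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface DefaultText {
    String value();
  }

  // Hypothetical message interface mixing both sources of default text.
  interface SampleMessages {
    @DefaultText("Hello")
    String hello();      // default text comes from the annotation

    String goodbye();    // default text expected in the properties file

    String orphan();     // no default anywhere: should be reported
  }

  /** Returns the sorted names of methods that have no default text in either source. */
  static List<String> missingDefaults(Class<?> iface, Properties defaults) {
    List<String> missing = new ArrayList<>();
    for (Method m : iface.getDeclaredMethods()) {
      boolean annotated = m.isAnnotationPresent(DefaultText.class);
      boolean inProperties = defaults.getProperty(m.getName()) != null;
      if (!annotated && !inProperties) {
        missing.add(m.getName());
      }
    }
    Collections.sort(missing);
    return missing;
  }

  public static void main(String[] args) {
    Properties defaults = new Properties();
    defaults.setProperty("goodbye", "Goodbye");
    // Only "orphan" lacks a default in both the annotations and the properties.
    System.out.println(missingDefaults(SampleMessages.class, defaults));
  }
}
```

A real generator would report each missing entry as a compile error rather than returning a list.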

* My first reaction upon seeing your example code was something
like... "What the hell?"  I have never seen code with such a liberal
use of annotations, though this may speak more to my lack of
experience than anything.  I do not know whether this is a real
problem, but I wonder if this format is a bit too foreign to the
average Java developer.  Does anyone else think this is a concern?
(If not, just ignore this comment, because it may just be me.)

Users would still be able to use property files, but of course they can't take advantage of the additional metadata.

mP

Dec 5, 2007, 5:18:32 PM12/5/07
to Google Web Toolkit Contributors



On Dec 5, 12:47 pm, "John Tamplin" <j...@google.com> wrote:
> On Dec 4, 2007 4:21 PM, bba...@gmail.com <bba...@gmail.com> wrote:
>
> > I think this proposal would put a lot of power in the hands of the
> > developer regarding i18n. The general direction is definitely good.
> > Here are my initial thoughts for discussion:
>

Yup.

> > * The distinction between @Description and @Meaning is a bit fuzzy. I
> > noticed that one translates to <note annotates="source"> while the
> > other translates to <note annotates="general"> in your XLIFF example.
> > However, I am not sure that the distinction would be clear to the GWT
> > developer. Also, it does not seem necessary to have both. If one
> > description may not be seen by the translator, what good is it? It
> > might as well just be an inline comment in the source code. However,
> > having both options opens up the possibility that a GWT developer
> > could expect his @Descriptions to be shown to the translator, without
> > realizing that only Meanings may be shown to the translator.
>
> I included the distinction for importing data from sources that have
> descriptions that clearly aren't meanings but are intended for the
> programmer selecting the correct method. They could be imported as Javadoc
> comments or moved into the meaning, although I worry some of them I have
> seen might confuse the translator.
>
> > * What happens if @DefaultText is not specified for one of the message
> > functions? Would the lack of this annotation lead to a compiler error
> > or warning? Would the function default to return an empty string or
> > some default text? If so, would message key collision be an issue?
>

The generator should complain, as this is obviously a mistake; there is
nothing worse than default values if something is missing. Java
doesn't support default values. If you want to have default values for
a method, you generally overload the method and pass default values to
the "main" method.

void doSomething(int a, int b) { /* ... */ }  // the "main" method
void doSomething(int a) {
  doSomething(a, 42);  // 42 becomes the default for b
}

> The compiler will give an error if it can't find a default translation, just
> as if you left one out of a properties file. You could have them in
> different sources (ie, some in annotations and some in property files), but
> I wouldn't expect that is a typical use.
>
> * My first reaction upon seeing your example code was something
>
> > like... "What the hell?" I have never seen code with such a liberal
> > use of annotations, though this may speak more to my lack of

The annotations listing is way too long.

> > experience than anything. I do not know whether this is a real
> > problem, but I wonder if this format is a bit too foreign to the
> > average Java developer. Does anyone else think this is a concern?
> > (If not, just ignore this comment, because it may just be me.)
>
> Users would still be able to use property files, but of course they can't
> take advantage of the additional metadata.
>

Properties files are much cleaner.



> --
> John A. Tamplin
> Software Engineer, Google

Firstly, I want to clear up exactly where our difference of opinion
lies, because I think something got lost in translation. I will make
this as simple as possible for the sake of brevity.

John wishes to use a variety of annotations to specify plural rules:
for n use this form of the message, for n+1 use that, for n+2 through
n+10 use something else again.

I, on the other hand, would prefer these same rules to be expressed as
separate entries within a properties file. I am suggesting (I have
improved on the way to express numbers again) using new entries for
each plural form. Basically the default message would simply be the
message key

/**
* @key = apple
*/
void somethingAboutApples( int count );

# properties file entries.
apple={0} apple.
apple-0=no apples // this gets selected when parameter 0 is 0
apple-1=one apple // this gets selected when parameter 0 is 1
apple-2-5=a few apples // this message is used when there are 2-5 apples.
apple-#=a few more apples // this message gets used when there are 6-9 apples.
apple-##=a few more more apples // this message gets used when there are 10-19 apples.
apple-*=lots of apples

We can use edit masks, ranges etc in the suffix of the key to match a
particular plural count. Naturally each of these forms would need a
weight so that more specific forms override more generic, eg a literal
number should take precedence over *. The available chars one can use
to express a plural number can easily be expanded.

Specifying plural rules and the matching message is not a programming
task; it is part of the role of the person who also translates or
builds the messages themselves. This is why I am arguing that the rule
and the message should be together.

The only part that should be done by a developer is the Java interface,
because there has to be some bridge between Java and your message
bundles.

I am lost with regard to the refactoring point; I don't see how it
applies. The message key annotation binds the method to something in
the properties file; it doesn't matter what happens in Java land as
long as this bridge is not lost.

John Tamplin

Dec 5, 2007, 6:02:22 PM12/5/07
to Google Web Toolkit Contributors, Shanjian Li, Emily Crutcher, Rajeev Dayal
On Nov 30, 2007 4:20 PM, John Tamplin <j...@google.com> wrote:
Plural Forms
While supporting plural forms is not planned initially, here are some ideas as to how they might be supported.  We feel that more work needs to be done before we understand the problem well enough to choose a solution, but it seems the proposed annotation framework would be sufficiently flexible to allow a reasonable extension for plural forms in the future.

After thinking about it some more, playing with the implementation, and trying to use it in some real apps, I have changed a bit on how to solve plurals.

Example:

@DefaultLocale("en_US")
public interface MyMessages extends Messages {
  @DefaultText("{0} {1}")
  @PluralText({"one", "A {1}"})
  String countItems(@PluralCount int count, String item);
}

All that is necessary to denote a plural form is the @PluralCount annotation, and that argument can be any of the arguments to the function rather than just the first one (as long as it is of integral type).  The actual text can be stored in the annotations using @DefaultText and @PluralText, or they can be stored in properties or XLIFF files.  A missing plural form would be a compile-time warning (and the default text would be used), and a plural form not used in the current locale being compiled would also be a warning.

@PluralCount takes an optional parameter for the PluralRule implementation to use (as a class literal) -- if not supplied, the default rule will be used which should be sufficient for most needs.  This would be maintained as part of GWT just like the DateTimeFormatter classes.  PluralRule implements Localizable (and therefore gets generated using deferred binding) and returns a string for a given count to select the plural form to use.
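
As a rough illustration of the optional class-literal parameter described above, here is a minimal sketch. All types are hypothetical stand-ins declared locally (the nested `PluralRule`, `DefaultRule`, and `PluralCount` are not the real GWT classes), and it uses runtime reflection where the actual generator would resolve everything at compile time through deferred binding. The `select` method name and the empty string meaning "use the default text" are assumptions.

```java
import java.lang.annotation.*;
import java.lang.reflect.Method;

final class PluralCountSketch {
  // Hypothetical rule interface: maps a count to the name of a plural form.
  interface PluralRule {
    String select(int count);
  }

  // Assumed default rule: English-like, "one" for 1, otherwise the default text ("").
  static final class DefaultRule implements PluralRule {
    @Override
    public String select(int count) {
      return count == 1 ? "one" : "";
    }
  }

  // The optional parameter is a class literal that falls back to DefaultRule.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.PARAMETER)
  @interface PluralCount {
    Class<? extends PluralRule> value() default DefaultRule.class;
  }

  interface MyMessages {
    String countItems(@PluralCount int count, String item);
  }

  /** Finds the @PluralCount argument and returns the selected form name, or "" if none. */
  static String selectForm(Method method, Object... args) {
    Annotation[][] all = method.getParameterAnnotations();
    for (int i = 0; i < all.length; i++) {
      for (Annotation a : all[i]) {
        if (a instanceof PluralCount) {
          try {
            PluralRule rule = ((PluralCount) a).value().getDeclaredConstructor().newInstance();
            return rule.select(((Number) args[i]).intValue());
          } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
          }
        }
      }
    }
    return "";
  }

  /** Convenience lookup of the sample method, wrapping the checked exception. */
  static Method countItemsMethod() {
    try {
      return MyMessages.class.getMethod("countItems", int.class, String.class);
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Because the annotated argument is located by scanning the parameter annotations, it can sit at any position in the signature, which matches the statement that the count need not be the first argument.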

When used with a properties file, the keys would look like:
countItems={0} {1}
countItems[one]=A {1}

Or for Arabic plural forms using English text:
countItems={0} {1}
countItems[zero]=No {1}
countItems[one]=A {1}
countItems[two]=Both {1}
countItems[few]={0} {1}, which are few
countItems[many]={0} {1}, which are many

Similar two-part encoding would be done in other formats.
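
The key convention above, together with the stated fallback to the default text when a plural form is missing, can be sketched as follows. This is only an illustration using `java.util.Properties` and `java.text.MessageFormat`; the `PluralKeyLookup` class and its `parse` and `pattern` helpers are hypothetical names, and the real generator would do this resolution at compile time.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.Properties;

final class PluralKeyLookup {
  /** Parses properties-file text into a Properties object. */
  static Properties parse(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  /** Returns the pattern for key[form] if present, otherwise the default pattern for key. */
  static String pattern(Properties props, String key, String form) {
    if (form != null && !form.isEmpty()) {
      String specific = props.getProperty(key + "[" + form + "]");
      if (specific != null) {
        return specific;
      }
    }
    return props.getProperty(key);
  }

  public static void main(String[] args) {
    Properties props = parse("countItems={0} {1}\ncountItems[one]=A {1}\n");
    // "one" has its own entry; "few" does not, so the default text is used.
    System.out.println(MessageFormat.format(pattern(props, "countItems", "one"), "1", "widget"));
    System.out.println(MessageFormat.format(pattern(props, "countItems", "few"), "3", "widgets"));
  }
}
```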

mP

Dec 5, 2007, 9:54:44 PM12/5/07
to Google Web Toolkit Contributors


On Dec 6, 10:02 am, "John Tamplin" <j...@google.com> wrote:
> On Nov 30, 2007 4:20 PM, John Tamplin <j...@google.com> wrote:
>
> > *Plural Forms*
John, can you please tell me why my idea is worse?

Where and how does it fail?

Surely it's simpler than what you have proposed.

With your approach the person doing the translation work needs to ask
a coder to verify what "few" or "many" are. This is overly complex,
putting half the message in one place and the definition of these plural
count words in another class. Why can't they be together?

bba...@gmail.com

Dec 5, 2007, 9:58:23 PM12/5/07
to Google Web Toolkit Contributors
What purpose does the @PluralText annotation serve? Shouldn't the
text to be inserted in the plural form be determined by the PluralRule
implementation?

Brett Bavar
bba...@google.com

On Dec 5, 3:02 pm, "John Tamplin" <j...@google.com> wrote:
> On Nov 30, 2007 4:20 PM, John Tamplin <j...@google.com> wrote:
>
> > *Plural Forms*

Matthew Mastracci

Dec 5, 2007, 10:27:21 PM12/5/07
to Google Web Toolkit Contributors
mP,

I suggest you read the link below to see why you need placeholders
instead of just numbers, masks or ranges. The real problem is that
the effective set of ranges you'd need is infinite. This pretty much
rules out any sort of plural rules that use anything but full numeric
expressions (including modulus). I had originally considered
this to be a good idea when I posted earlier this month, but some
research on the topic (and the complex plural forms of European languages)
has settled that for me.

http://doc.trolltech.com/qq/qq19-plurals.html

As an example, Polish uses this for its first, singular form:

n == 1

This rule for its plural form:

n % 10 >= 2
&& n % 10 <= 4
&& (n % 100 < 10
|| n % 100 > 20)

And yet another form for its remaining cases.
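
The Polish rules above translate directly into code. In this minimal sketch the form names "one", "few", and "many" are assumed labels, since the message only describes the conditions. It shows why a fixed list of ranges cannot work: 22, 102, and 1003 all take the same form as 2, while 12 does not.

```java
final class PolishPluralSketch {
  /** Returns an assumed form name for a non-negative count, following the rules quoted above. */
  static String select(int n) {
    if (n == 1) {
      return "one";                    // singular form
    }
    int mod10 = n % 10;
    int mod100 = n % 100;
    if (mod10 >= 2 && mod10 <= 4 && (mod100 < 10 || mod100 > 20)) {
      return "few";                    // 2-4, 22-24, 32-34, ... but not 12-14
    }
    return "many";                     // all remaining cases, including 0 and 5-21
  }

  public static void main(String[] args) {
    for (int n : new int[] {0, 1, 2, 5, 12, 22, 102, 112}) {
      System.out.println(n + " -> " + select(n));
    }
  }
}
```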

John Tamplin

Dec 6, 2007, 1:24:54 AM12/6/07
to Google-Web-Tool...@googlegroups.com
On Dec 5, 2007 6:54 PM, mP <miroslav...@gmail.com> wrote:
John can you please tell me why my idea is worse ?

I believe I have covered my objections in this thread.
 
Where and how does it fail, ?

For one, I do not believe the proposed mechanism for choosing a plural form is sufficient for all languages.  Second, by putting the "code" for choosing the plural form into the property name, you are leaving it up to the translator to properly encode the messages.  Translators know languages, not some arbitrary code for describing the plural form to use.  Also, your suggestion would lead to duplicated strings, which means extra work for translators, more cost for companies paying for translations, and more likelihood of error.  Finally, the coding of which plural forms to use for each message is duplicated across all the messages and all the different translation units, when it will be very rare that you need non-default plural rules -- much better to define and maintain them in exactly one place (especially one that is maintained as part of the toolkit with an i18n team behind it as opposed to every application).
 
Surely its simpler than what you have proposed.

Sure.  Not supporting plural forms is even simpler, but doesn't solve the problem.  Everything should be as simple as possible but no simpler.
 
With your approach the person doing the translation work needs to ask
a coder to verify what "few" or "many" are. This is overly complex
separating half the message in one and the definition of these plural
count words in another class. Why cant they be together ?
 
See above.

John Tamplin

Dec 6, 2007, 1:27:03 AM12/6/07
to Google-Web-Tool...@googlegroups.com
On Dec 5, 2007 6:58 PM, bba...@gmail.com <bba...@gmail.com> wrote:

What purpose does the @PluralText annotation serve?  Shouldn't the
text to be inserted in the plural form be determined by the PluralRule
implementation?

The plural rule selects which plural form is used.  The PluralText is the default text for those plural forms.  The point of DefaultText, PluralText, Meaning, and Description is to let you maintain the default translation in the Java file rather than in a separate properties file (especially since properties files have no way of encoding the additional metadata), but keeping the text in a properties file would certainly work if you wanted to do it that way.

mP

Dec 6, 2007, 4:31:29 PM12/6/07
to Google Web Toolkit Contributors


On Dec 6, 5:24 pm, "John Tamplin" <j...@google.com> wrote:
> On Dec 5, 2007 6:54 PM, mP <miroslav.poko...@gmail.com> wrote:
>
> > John can you please tell me why my idea is worse ?
>
> I believe I have covered my objections in this thread.
>
> > Where and how does it fail, ?
>
> For one, I do not believe the proposed mechanism for choosing a plural form
> is sufficient for all languages. Second, by putting the "code" for choosing
> the plural form into the property name, you are leaving that up to the
> translator to properly encode the messages. Translators know languages, not

My main idea was to keep it simple and move the
"rules" out of Java annotations into a text file. Your idea of
defining the plural rules once (in your case in annotations) and
mapping them to selectors is perfectly valid...

I will rephrase my question: why can't the plural selectors be defined
in the properties file?
one=1
two=2
few=3-10
many=11+

(perhaps all these should have some prefix or something to make them
easy to distinguish from regular message entries.)

... and later in the properties...
apple[one]=one apple.
apple[two]=two apples.
apple[few]=a few apples.
apple[many]=many apples.

> some arbitrary code for describing the plural form to use. Also, your
> suggestion would lead to duplicated strings, both requiring extra work for
> translators, more costs for companies paying for translations, and more
> likelihood of error. Finally, the coding of which plural forms to use for

How do they know which plural selectors are valid without looking at
the code or getting someone to look for them ?

If the translator ends up printing the list and then goes to do their
work why not just include the selectors in the properties file, even
if a coder setup the rules for them...

> each message are duplicated across all the messages and all the different
> translation units, when it will be very rare that you need non-default
> plural rules -- much better to define and maintain them in exactly one place
> (especially one that is maintained as part of the toolkit with an i18n team
> behind it as opposed to every application).
>
> > Surely its simpler than what you have proposed.
>
> Sure. Not supporting plural forms is even simpler, but doesn't solve the
> problem. Everything should be as simple as possible but no simpler.
>

:)

Ian Petersen

Dec 6, 2007, 4:43:10 PM12/6/07
to Google-Web-Tool...@googlegroups.com
On Dec 6, 2007 4:31 PM, mP <miroslav...@gmail.com> wrote:
> I will rephrase my question why cant the plural selectors be defined
> in the properties file ?

Putting the plural rules in the properties file means that the rules
are duplicated in a mutable form all over the place. Leaving them as
annotations in the GWT library code means that they're essentially
immutable and they're only in one place. Sounds like a case of DRY to
me and is, for me, an argument in favour of annotations all by itself,
but it also lowers the maintenance burden. If there's a bug in the
plural rules, then storing them in the properties files means updating
all the properties files, whereas storing them in the GWT library code
means upgrading and recompiling.

> If the translator ends up printing the list and then goes to do their
> work why not just include the selectors in the properties file, even
> if a coder setup the rules for them...

Seems like a recipe for disaster to me. A translator can't break the
build by scribbling on a print-out. He could break the build if he
inadvertently/maliciously changes one of the plural-rules properties.

Ian

--
Tired of pop-ups, security holes, and spyware?
Try Firefox: http://www.getfirefox.com

John Tamplin

Dec 6, 2007, 5:45:07 PM12/6/07
to Google-Web-Tool...@googlegroups.com
On Dec 6, 2007 1:43 PM, Ian Petersen <ispe...@gmail.com> wrote:
Putting the plural rules in the properties file means that the rules
are duplicated in a mutable form all over the place.  Leaving them as
annotations in the GWT library code means that they're essentially
immutable and they're only in one place.  Sounds like a case of DRY to
me and is, for me, an argument in favour of annotations all by itself,
but it also lowers the maintenance burden.  If there's a bug in the
plural rules, then storing them in the properties files means updating
all the properties files, whereas storing them in the GWT library code
means upgrading and recompiling.

Just a small correction: note that the rules are not annotations themselves, but simply referenced by an annotation.  In what I currently have, PluralRule is an interface, and DefaultRule (and DefaultRule_en, DefaultRule_ar, etc) are implementations of that interface.  The PluralCount annotation can reference a different PluralRule implementation if desired, but the default would be the GWT-supplied DefaultRule implementation.  If you need to define some custom plural rule, simply implement PluralRule yourself and reference it in the PluralCount annotation.  If there is demand, there could be a DefaultPluralRule annotation that changes the default for all methods.
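
A minimal sketch of that arrangement, under several assumptions: the `select` method name, the form names, the empty string meaning "use the default text", and the `defaultRuleFor` lookup (a plain runtime stand-in for what deferred binding would choose per compiled locale) are all hypothetical. `NoneAwareRule` is an invented example of a custom rule an application might supply.

```java
final class PluralRuleSketch {
  // Hypothetical shape of the interface: count in, plural form name out.
  interface PluralRule {
    String select(int count);
  }

  // Locale-neutral fallback: always use the default text.
  static class DefaultRule implements PluralRule {
    @Override
    public String select(int count) {
      return "";
    }
  }

  // English variant: a distinct form only for a count of exactly 1.
  static class DefaultRule_en extends DefaultRule {
    @Override
    public String select(int count) {
      return count == 1 ? "one" : "";
    }
  }

  // Invented custom rule: adds a "none" form on top of the English rule.
  static class NoneAwareRule extends DefaultRule_en {
    @Override
    public String select(int count) {
      return count == 0 ? "none" : super.select(count);
    }
  }

  /** Runtime stand-in for locale-based selection; accepts names like "en" or "en_US". */
  static PluralRule defaultRuleFor(String locale) {
    String language = locale.split("_")[0];
    if (language.equals("en")) {
      return new DefaultRule_en();
    }
    return new DefaultRule();
  }
}
```

Keeping the rules in ordinary classes means a fix to one rule is made in one place and picked up on the next compile, which is the maintenance argument made earlier in the thread.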

John Tamplin

Dec 6, 2007, 6:27:17 PM12/6/07
to Google-Web-Tool...@googlegroups.com
On Dec 6, 2007 1:31 PM, mP <miroslav...@gmail.com> wrote:
My main principal idea was trying to keep it simple and move the
"rules" out of java annotations into a text file.  Your idea of
defining the plural rules once,(in your case in annotations )to
selectors thing is perfectly valid...

I will rephrase my question why cant the plural selectors be defined
in the properties file ?
one=1
two=2
few=3-10
many=11+

As I have said many times, I do not believe the simple forms you have proposed are sufficient to cover all languages.  It is true that you could define expressions that implement the plural rules and store those expressions in the property file.  You would also need a better naming scheme to avoid conflicts, but those are solvable.  You could also invent a way of delegating to common implementations (for example, many languages share common plural rules and it would be nice to implement and debug them once rather than n times).  Finally, you could decide to put them into one common property file rather than duplicate them in every app property file.  However, I do not see what you have accomplished other than shoe-horning the functionality into an inadequate file format.  Why invent an expression language with delegation when we already have such a language, and all our Java tools know how to use it rather than needing new tools?
 
How do they know which plural selectors are valid without looking at
the code or getting someone to look for them ?
 
The comments in the file (again probably not as useful in property files as they would be in more structured translation sources) would indicate which form, plus if you choose the proper names for the plural forms of each language they probably already know.  Ie, if I see message and message[singular] it is pretty obvious to me what is intended.

As an example of an XLIFF snippet that might go to a translator:

<?xml version="1.0"?>
<xliff version="1.2">
 <file original="org/example/MyMessages.java" source-language="en-US"
    target-language="ar-AR" datatype="x-gwt-java">
  <body>
    <trans-unit id="widgetCount">
      <source>There are <ph id="0">WIDGET_COUNT</ph> widgets</source>
    </trans-unit>
    <group restype="x-gwt-plural">
      <trans-unit id="widgetCount[none]">
        <source>There are <ph id="0">WIDGET_COUNT</ph> widgets </source>
        <note annotates="target">Count is 0</note>
      </trans-unit>
      <trans-unit id="widgetCount[one]">
        <source>There are <ph id="0">WIDGET_COUNT</ph> widgets </source>
        <note annotates="target">Count is 1</note>
      </trans-unit>
      <trans-unit id="widgetCount[two]">
        <source>There are <ph id="0">WIDGET_COUNT</ph> widgets </source>
        <note annotates="target">Count is 2</note>
      </trans-unit>
      <trans-unit id="widgetCount[few]">
        <source>There are <ph id="0">WIDGET_COUNT</ph> widgets </source>
        <note annotates="target">Count is between 3 and 10</note>
      </trans-unit>
      <trans-unit id="widgetCount[many]">
        <source>There are <ph id="0">WIDGET_COUNT</ph> widgets </source>
        <note annotates="target">Count is between 11 and 99</note>
      </trans-unit>
    </group>
  </body>
 </file>
</xliff>


The target notes would probably be in Arabic, as would the actual names of the plural forms.  Also, the default text could be in the group with the plural forms, but the above structure mirrors the gettext representation in XLIFF.

If the translator ends up printing the list and then goes to do their
work why not just include the selectors in the properties file, even
if a coder setup the rules for them...

I fail to see what printing the file has to do with anything (not to mention that it seems unlikely in today's world).

I have to ask -- have you ever actually used any translation service or localized an application?  If you have a use case that my proposal doesn't address or is overly complex for, please post an actual example.  Otherwise, I do not believe continuing this discussion is fruitful.

mP

Dec 10, 2007, 7:55:24 PM12/10/07
to Google Web Toolkit Contributors


On Dec 7, 8:43 am, "Ian Petersen" <ispet...@gmail.com> wrote:
> On Dec 6, 2007 4:31 PM, mP <miroslav.poko...@gmail.com> wrote:
>
> > I will rephrase my question why cant the plural selectors be defined
> > in the properties file ?
>
> Putting the plural rules in the properties file means that the rules
> are duplicated in a mutable form all over the place. Leaving them as
> annotations in the GWT library code means that they're essentially
> immutable and they're only in one place. Sounds like a case of DRY to
> me and is, for me, an argument in favour of annotations all by itself,
> but it also lowers the maintenance burden. If there's a bug in the
> plural rules, then storing them in the properties files means updating
> all the properties files, whereas storing them in the GWT library code
> means upgrading and recompiling.
>
> > If the translator ends up printing the list and then goes to do their
> > work why not just include the selectors in the properties file, even
> > if a coder setup the rules for them...
>
> Seems like a recipe for disaster to me. A translator can't break the
> build by scribbling on a print-out. He could break the build if he
> inadvertently/maliciously changes one of the plural-rules properties.
>

Either way the translator has to have some knowledge of what's going
on. They, after all, will be describing the rules for a developer to
translate into the annotated rules (John) or text in properties files
(my proposal).

We can't expect that they will just edit the messages portion, the
developer will do their bit, and magically there will never be problems
because a developer is involved.

John Tamplin

Dec 10, 2007, 10:49:14 PM12/10/07
to Shanjian Li, Google Web Toolkit Contributors, Emily Crutcher, Rajeev Dayal
On Dec 10, 2007 9:27 PM, Shanjian Li <shan...@google.com> wrote:
I got some time to chew your specification carefully. In your example,

@GenerateFile( Xliff.class, "MyMessages.xliff")
should be
@Generate(Xliff.class, "MyMessages.xliff")
to be consistent with earlier specification.

That is what is currently in place in changes/jat/i18n, other than changing the class literal to be a string containing a fully-qualified class name (since the MessageCatalogFormat implementation is almost certainly not translatable code and is only needed at generate time rather than in the compiled code).
 
This part:
  @DefaultText("The amount due is {0,number,currency}.")
  String amountDue(@Example("$5.00") @Replace("{0,number,currency}" amount);

is also unclear to me. Is there any typo here? After stripping annotation, Java code should remain valid, right? The above code does not seem so.

There should be a double in there for the type.  Also, I have since decided that @Replace is unnecessary as in all cases the generator can properly figure out which strings in the source text correspond with the argument.
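
With the missing `double` added and `@Replace` dropped, the method would read roughly as in the sketch below. The nested `DefaultText` and `Example` annotations are hypothetical stand-ins declared locally so the example compiles on its own, and `render` uses `java.text.MessageFormat` at runtime purely to show what the `{0,number,currency}` placeholder produces; the generator itself would emit equivalent code at compile time.

```java
import java.lang.annotation.*;
import java.text.MessageFormat;
import java.util.Locale;

final class AmountDueSketch {
  // Hypothetical stand-ins for the proposed annotations; not the real GWT types.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface DefaultText {
    String value();
  }

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.PARAMETER)
  @interface Example {
    String value();
  }

  interface MyMessages {
    @DefaultText("The amount due is {0,number,currency}.")
    String amountDue(@Example("$5.00") double amount);
  }

  /** Reads the default text from the annotation and formats it for the given locale. */
  static String render(double amount, Locale locale) {
    String pattern;
    try {
      pattern = MyMessages.class
          .getMethod("amountDue", double.class)
          .getAnnotation(DefaultText.class)
          .value();
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
    return new MessageFormat(pattern, locale).format(new Object[] {amount});
  }

  public static void main(String[] args) {
    System.out.println(render(5.0, Locale.US));
  }
}
```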
 
John, did you consider dropping the key generator part? The user can either provide a key or use the key GWT generates. I can't foresee a use case where users need to generate keys in their own way. That only creates more incompatibilities among GWT code. If everybody is using the same scheme, sharing messages among applications will be a breeze.

Anyone who has an existing translation system that aggregates translation strings across multiple applications (including non-GWT cases) will need that functionality, such as internal to Google.
 
To be more restrictive, I don't see that we would ever want to support message formats besides property files and XLIFF. One option is to leave that choice to the tools. Again, that will benefit message sharing.

Supporting other formats only through translation to properties and xliff files is certain to lose information.  For example, supporting GNU gettext that way will lead to loss of plural form information unless we carefully structure XLIFF comments.  In addition we need to be able to support our internal format.

Rajeev Dayal

Dec 13, 2007, 3:17:41 PM
to John Tamplin, Shanjian Li, Google Web Toolkit Contributors, Emily Crutcher
Hi John,

Sorry I am so late in jumping in on this. Thanks for taking the time to put this RR together. Looks like you put a lot of thought into it.

-I generally like the specification that you have come up with, but I would prefer it if the developer could use fewer annotations where possible. With that in mind, here are my comments/questions:

-Is the @Generate annotation optional? If not, I think it should be. The default behavior would be to  generate a properties file in the same directory in which the MyMessages interface is found. If you agree with this change, the behavior when the @Lookup annotation is unspecified should be changed so that .properties files take precedence over .xlf files.

-We should get rid of the @Description annotation. If the MyMessages interface is being generated by some tool, the description of each method can be inserted as a Javadoc comment.

Take these next two comments with a grain of salt, as I am not a huge fan of the way parameter-level annotations look:

-Instead of having the @Optional annotation, why not force the user to declare another interface method with the alternative number of parameters? This would lead to less ambiguity.

-Instead of having the @Example annotation, could we have the @DefaultText look something like: Access Denied: {John} does not have access to {autoexec.bat}.

Finally, some inline comments on the discussion that you and Shanjian were having:

John, did you consider dropping the key generator part? The user can either provide a key or use the key GWT generates. I can't foresee a use case where users need to generate keys in their own way. That only creates more incompatibilities among GWT code. If everybody uses the same scheme, sharing messages among applications will be a breeze.

Anyone who has an existing translation system that aggregates translation strings across multiple applications (including non-GWT cases) will need that functionality, such as the one used internally at Google.

@Shanjian: But there are two cases that result when the user does not specify @key: either the user wants the method name to be used as the key, or the user wants GWT to generate a key for them.

@John: Is there some way that we could make it easy to specify that the keys be generated using a default hash function that we've come up with? Maybe if the user specifies @GenerateKeysUsing with no arguments, then it uses a default hash function that we've come up with. Otherwise, if they pass in an argument, then we use the key generator that they specified?


Thanks,
Rajeev
 




John Tamplin

Dec 13, 2007, 3:29:44 PM
to Rajeev Dayal, Shanjian Li, Google Web Toolkit Contributors, Emily Crutcher
On Dec 13, 2007 3:17 PM, Rajeev Dayal <rda...@google.com> wrote:
Sorry I am so late in jumping in on this. Thanks for taking the time to put this RR together. Looks like you put a lot of thought into it.

-I generally like the specification that you have come up with, but I would prefer it if the developer could use fewer annotations where possible. With that in mind, here are my comments/questions:

-Is the @Generate annotation optional? If not, I think it should be. The default behavior would be to  generate a properties file in the same directory in which the MyMessages interface is found. If you agree with this change, the behavior when the @Lookup annotation is unspecified should be changed so that .properties files take precedence over .xlf files.

The problem is what if the .java file was itself generated from some other source -- in that case you do not want to generate an output file.  @Generate, as currently specified, is optional in that if it isn't supplied no file is generated.
 
-We should get rid of the @Description annotation. If the MyMessages interface is being generated by some tool, the description of each method can be inserted as a Javadoc comment.

We discussed this face to face, but basically it is necessary in order to supply a description to external formats that have a description field.  It isn't required, so if the user doesn't have a need for it they don't need to use it.
 
Take these next two comments with a grain of salt, as I am not a huge fan of the way parameter-level annotations look:

You don't have to use them, other than @Optional if you really need that functionality and @PluralCount if you need plural forms.
 
-Instead of having the @Optional annotation, why not force the user to declare another interface method with the alternative number of parameters? This would lead to less ambiguity.

What if some translations need it and others don't?  Then you would force the developer to know which version to use.  There have been requests both internally and externally (such as at the GWT conference) for the ability to not give an error if a parameter is not used in some translation.
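
To make the request concrete, here is a rough sketch of the kind of check a generator might run on each translated string: report any argument the translation never references, unless that argument has been marked optional. The class and method names are hypothetical, and the regular expression is a simplification that ignores MessageFormat quoting rules; it is not the generator's actual logic.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OptionalArgCheck {
  // Matches the argument index at the start of a placeholder such as {0} or {1,number}.
  private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\d+)");

  // Returns the indexes of arguments that the text never references
  // and that are not listed in the optional set.
  static Set<Integer> missingRequired(String text, int argCount, Set<Integer> optional) {
    Set<Integer> used = new TreeSet<Integer>();
    Matcher m = PLACEHOLDER.matcher(text);
    while (m.find()) {
      used.add(Integer.parseInt(m.group(1)));
    }
    Set<Integer> missing = new TreeSet<Integer>();
    for (int i = 0; i < argCount; i++) {
      if (!used.contains(i) && !optional.contains(i)) {
        missing.add(i);
      }
    }
    return missing;
  }
}
```

With two declared arguments and nothing marked optional, a translation such as "{0} shared a photo." would be reported as missing argument 1; marking argument 1 optional makes the same translation acceptable.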
 
-Instead of having the @Example annotation, could we have the @DefaultText look something like: Access Denied: {John} does not have access to {autoexec.bat}.

Then how does it know which parameter to fill in (note that the order is frequently changed in the translated strings), what special formatting should be supplied, etc?
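
The reordering problem can be illustrated with java.text.MessageFormat from the JRE, used here only as a stand-in for whatever code the generator emits. The second pattern is an invented English rewording standing in for a translation that swaps the argument order; numbered placeholders keep each value in the right slot, which literal examples like {John} could not do.

```java
import java.text.MessageFormat;

final class PlaceholderOrderSketch {
  static final String DEFAULT_TEXT = "Access Denied: {0} does not have access to {1}.";

  // Invented rewording that stands in for a translation with swapped argument order.
  static final String REORDERED_TEXT = "Access to {1} is denied for {0}.";

  // Argument 0 is always the user and argument 1 is always the file,
  // no matter where the placeholders appear in the pattern.
  static String render(String pattern, String user, String file) {
    return MessageFormat.format(pattern, user, file);
  }

  public static void main(String[] args) {
    System.out.println(render(DEFAULT_TEXT, "John", "autoexec.bat"));
    System.out.println(render(REORDERED_TEXT, "John", "autoexec.bat"));
  }
}
```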

@John: Is there some way that we could make it easy to specify that the keys be generated using a default hash function that we've come up with? Maybe if the user specifies @GenerateKeysUsing with no arguments, then it uses a default hash function that we've come up with. Otherwise, if they pass in an argument, then we use the key generator that they specified?

Sure, we could have it default to MD5 if @GenerateKeysUsing is specified but no argument is supplied.  In that case, the name probably needs to change.
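
As a rough idea of what an MD5 default could look like, the sketch below hashes the default text with the JRE's MessageDigest and hex-encodes the result. Exactly what gets hashed (the text alone, or the text plus its meaning), the UTF-8 encoding, and the lowercase hex form are all assumptions made for illustration; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5KeySketch {
  // Derives a key from the default text: MD5 over its UTF-8 bytes, as lowercase hex.
  static String keyFor(String defaultText) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      byte[] digest = md5.digest(defaultText.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        hex.append(String.format("%02x", b & 0xff));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is not available", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(keyFor("The amount due is {0,number,currency}."));
  }
}
```

Because the key depends only on the hashed input, two applications that share the same default text would arrive at the same key, which is the property that makes sharing translations across applications possible.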