Localization support


Alexandre poirot

Oct 31, 2011, 11:42:04 AM
to mozilla-labs-jetpack
Hi Developers, localizers, and any addon contributors,

I've just published a description of our work on supporting localization in jetpack:
  http://blog.techno-barje.fr/post/2011/10/31/jetpack-localization/

I'm looking for feedback to make sure we strike the right balance between
ease for developers and simplicity for localizers.

We are about to land a 100% local way to localize addons,
then we will open an online tool similar to babelzilla but for jetpack addons.


Feel free to comment on this thread if you have any questions or concerns!

You may follow this work on bug 691782:
https://bugzilla.mozilla.org/show_bug.cgi?id=691782


++
Alex

Jeff Griffiths

Oct 31, 2011, 3:20:26 PM
to mozilla-la...@googlegroups.com
If you are attending MozCamp this year in either Berlin or Kuala
Lumpur, we will be giving a session on SDK localization and would be
happy to get feedback from the community on our approach.

cheers, Jeff

> --
> You received this message because you are subscribed to the Google
> Groups "mozilla-labs-jetpack" group.
> To post to this group, send email to mozilla-la...@googlegroups.com.
> To unsubscribe from this group, send email to
> mozilla-labs-jet...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/mozilla-labs-jetpack?hl=en.

Jeff Griffiths

Oct 31, 2011, 4:23:53 PM
to mozilla-la...@googlegroups.com, Alexandre poirot
On 11-10-31 11:42 AM, Alexandre poirot wrote:
> Hi Developers, localizers, and any addon contributors,
>
> I've just published a description of our work on supporting localization
> in jetpack:
> http://blog.techno-barje.fr/post/2011/10/31/jetpack-localization/
>
> I'm looking for feedback in order to ensure choosing the right balance
> between ease for developers and simplicity for localizers.

My concern is that you are enshrining a convention into the process (
const _ = require("l10n").get; ) that is not very descriptive, *and* has
some confusing history to it. Had you considered something slightly
longer like:

var l10n = require("l10n").get;
var cm = require("context-menu");
cm.Item({
  label: l10n("My Menu Item"),
  context: cm.URLContext("*.mozilla.org")
});

var localized = l10n("Translate this!");

I know, it's 3 more bytes! It is also a bit more self-explanatory. On a
more personal note I have a particular allergy to using punctuation.
Saying 'underscore' when explaining the code to someone both feels
awkward to me, and also has no connotative link with localization.

Jeff

Bogomil Shopov

Nov 1, 2011, 5:18:04 AM
to mozilla-la...@googlegroups.com
On 31 October 2011 20:20, Jeff Griffiths <jgrif...@mozilla.com> wrote:
If you are attending MozCamp this year in either Berlin or Kuala Lumpur, we will be giving a session on SDK localization and would be happy to get feedback from the community on our approach.


Looking forward to it :)

--
Bogomil "Bogo" Shopov
Open(Web) Thinker and doer.
http://talkweb.eu


++
Education is the most powerful weapon which you can use to change the world.
Nelson Mandela

Jeff Griffiths

Nov 1, 2011, 12:08:43 PM
to mozilla-la...@googlegroups.com
Ah, you're attending! excellent news, it will be good to meet in person.

cheers, Jeff

Bogomil Shopov

Nov 7, 2011, 9:56:56 AM
to mozilla-la...@googlegroups.com
On 1 November 2011 17:08, Jeff Griffiths <jgrif...@mozilla.com> wrote:
Ah, you're attending! excellent news, it will be good to meet in person.

Sure, we can do that :)

Mark Hammond

Nov 9, 2011, 1:39:46 AM
to mozilla-la...@googlegroups.com

I'm inclined to agree with this - while I've gotten used to _() in
gettext, it still leaves a bad taste in my mouth. I guess one
compromise would be to ensure any tools looking for the _() pattern are
configurable, so people are free to use whatever convention they like.

Also, from the blog:

| console.log(_("Hello %s", "alex"));

I'm wondering if using positional strings will work in all cases - eg,
consider a phrase that has 2 (or more) substitutions, but some locales
require the order of those substitutions to be different than English.
IOW, I'm wondering if something like:

...("Hello %(name)s", {name: "alex"});

might be necessary?
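
To make the ordering concern concrete, here is a minimal sketch of named substitution. The `formatNamed` helper is hypothetical (it is not the SDK's API), and the second string is a made-up stand-in for a translation whose word order differs from the English source.

```javascript
// Hypothetical helper: replace %(key)s placeholders with values from `args`.
function formatNamed(template, args) {
  return template.replace(/%\((\w+)\)s/g, function (match, key) {
    // Leave unknown placeholders untouched so missing data is easy to spot.
    return Object.prototype.hasOwnProperty.call(args, key)
      ? String(args[key])
      : match;
  });
}

// Source string, and a made-up "translation" that needs the opposite order.
var english = "%(user)s sent %(count)s files";
var reordered = "%(count)s files were sent by %(user)s";

var values = { user: "alex", count: 3 };
console.log(formatNamed(english, values));
console.log(formatNamed(reordered, values));
```

Because each placeholder names its value, the localizer can move placeholders around freely without the calling code changing. A bare "%s" gives no way to express that.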

And finally, will plurals be supported? eg,
https://github.com/Mardak/restartless/tree/examples/l10nDialogs supports
plurals directly which seems nicer than forcing 2 different string IDs
which are logically the same message.

Mark

Alexandre Poirot

Nov 9, 2011, 9:31:18 AM
to mozilla-la...@googlegroups.com
On Nov 9, 2011, at 07:39, Mark Hammond <skippy....@gmail.com> wrote:

The key point is gaining automatic compatibility with existing
l10n tools/libraries.
We will be able to build a jetpack-specific parser for our own tools. It
can be quite tricky to implement, but it is about as complex as parsing
require statements.
So I don't expect any other generic l10n tool to support this
flexible behavior.
Compared to traditional addons, we want tools that build templates and
synchronize locale files by parsing source code for keys to
translate. This idea comes from gettext.

>
> Also, from the blog:
>
> | console.log(_("Hello %s", "alex"));
>
> I'm wondering if using positional strings will work in all cases -
> eg, consider a phrase that has 2 (or more) substitutions, but some
> locales require the order of those substitutions to be different
> than English. IOW, I'm wondering if something like:
>
> ...("Hello %(name)s", {name: "alex"});
>
> might be necessary?

I deliberately avoided discussing any "advanced" localization features,
as they might pollute this initial discussion about the overall localization
pattern.
Here we have many options: %1s, #1$s, %name, %(name), ...
Again, the main question here is how disruptive we would like to be.
From what I've seen, %1s seems to be the most common scheme. But your
proposal is obviously cleaner and may help localizers figure out
what a string means. Do you know if this syntax is used in any l10n
format?

>
> And finally, will plurals be supported? eg, https://github.com/Mardak/restartless/tree/examples/l10nDialogs
> supports plurals directly which seems nicer than forcing 2
> different string IDs which are logically the same message.

I'm currently working on it. I should end up having something similar
to your implementation.
I'll post an update when it is ready.

Mark Hammond

Nov 9, 2011, 5:50:06 PM
to mozilla-la...@googlegroups.com
On 10/11/2011 1:31 AM, Alexandre Poirot wrote:
> On Nov 9, 2011, at 07:39, Mark Hammond <skippy....@gmail.com> wrote:

The ability to reuse existing tools which support only _() is certainly
compelling, but I'm not sure how practical that will be. Are you saying
above that we will need to develop our own parser anyway? If so, then
we can probably build more flexibility into that. OTOH, if you are
saying existing parsers could work for Jetpack, I'd have to concede that
would be a big enough win to justify losing the flexibility.

> So I don't expect any other generic l10n tool to support this
> flexible behavior.
> Compared to traditional addons, we want tools that build templates and
> synchronize locale files by parsing source code for keys to translate.
> This idea comes from gettext.
>
>>
>> Also, from the blog:
>>
>> | console.log(_("Hello %s", "alex"));
>>
>> I'm wondering if using positional strings will work in all cases - eg,
>> consider a phrase that has 2 (or more) substitutions, but some locales
>> require the order of those substitutions to be different than English.
>> IOW, I'm wondering if something like:
>>
>> ...("Hello %(name)s", {name: "alex"});
>>
>> might be necessary?
>
> I deliberately avoided discussing any "advanced" localization features, as
> they might pollute this initial discussion about the overall localization pattern.
> Here we have many options: %1s, #1$s, %name, %(name), ...
> Again, the main question here is how disruptive we would like to be.
> From what I've seen, %1s seems to be the most common scheme. But your
> proposal is obviously cleaner and may help localizers figure out what
> a string means. Do you know if this syntax is used in any l10n format?

To be honest, I just stole that syntax from Python's string
interpolation - but my point wasn't about the specific syntax, but
instead was to question whether a "positional" scheme as originally
proposed would work OK.

IOW, a simple "%s" might not be good enough. "%1s" would be an
improvement and would address my specific concern, as would most of the
other suggestions (some of which I prefer more than the others, but this
isn't about my personal preferences ;)

>> And finally, will plurals be supported? eg,
>> https://github.com/Mardak/restartless/tree/examples/l10nDialogs
>> supports plurals directly which seems nicer than forcing 2 different
>> string IDs which are logically the same message.
>
> I'm currently working on it. I should end up having something similar to
> your implementation.
> I'll post an update when it is ready.

Just to be clear, that isn't my implementation (but it is an
implementation Firefox Share (aka F1) shamelessly stole :)

And while we are asking for ponies, one great feature would be the
ability to use this stuff in content scripts - such scripts can't
require() modules but may still present (part of) the UI for a Jetpack
and thus need l10n support.

Mark

Alexandre poirot

Nov 10, 2011, 1:44:36 PM
to mozilla-la...@googlegroups.com
2011/11/9 Mark Hammond <skippy....@gmail.com>
On 10/11/2011 1:31 AM, Alexandre Poirot wrote:

We are going to build the official implementation in cfx/Python. But cfx is far from convenient when you are building a server-side application. I'd like to encourage people to hack around the SDK and to build communities around custom tools. This scheme is very common in localization. Just look at how many online localization applications exist around Mozilla:
  https://l10n.mozilla-community.org/narro/
  http://adofex.clear.com.ua/
  http://babelzilla.org/
  https://localize.mozilla.org/
I'm quite sure there are others!
In my blog post, I tried to highlight the two key advantages of gettext:
1. Easy to parse. It is simple to write a parser (or take an existing one) that reads source code in order to generate or update locale files with new keys to translate. You just have to fetch all _(" ... ") calls from your JavaScript files.
2. We can put hard-coded localized strings in source code. This simplifies localization of small addons by not requiring a locale file. (We can still use IDs, which are a much better practice and should be strongly recommended for any big addon.)
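
As an illustration of the "easy to parse" point, here is a rough sketch of a key extractor. `extractKeys` is a hypothetical helper, not part of cfx or the SDK, and it assumes keys are double-quoted literals passed as the first argument to `_`. A real tool would more likely use a proper JavaScript parser.

```javascript
// Hypothetical extractor: collect the string literal passed to _("...") calls.
// Only double-quoted literals with JSON-compatible escapes are handled.
function extractKeys(source) {
  var pattern = /\b_\(\s*"((?:[^"\\]|\\.)*)"/g;
  var keys = [];
  var match;
  while ((match = pattern.exec(source)) !== null) {
    // JSON.parse decodes escapes such as \" inside the literal.
    var key = JSON.parse('"' + match[1] + '"');
    // Report each key once, in order of first appearance.
    if (keys.indexOf(key) === -1) keys.push(key);
  }
  return keys;
}

var sample = [
  'const _ = require("l10n").get;',
  'console.log(_("Hello %s", "alex"));',
  'var label = _("My Menu Item");',
  'console.log(_("Hello %s", "bob"));'
].join("\n");

console.log(extractKeys(sample));
```

The fixed `_` name is what keeps the pattern this small. Allowing arbitrary aliases would mean tracking which variable holds `require("l10n").get`.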

Having said that, I'll weigh all of this to find a balance between developers, localizers and the community!
And I think MozCamp is going to help me make such decisions.


> To be honest, I just stole that syntax from Python's string interpolation - but my point wasn't about the specific syntax, but instead was to question whether a "positional" scheme as originally proposed would work OK.
>
> IOW, a simple "%s" might not be good enough.  "%1s" would be an improvement and would address my specific concern, as would most of the other suggestions (some of which I prefer more than the others, but this isn't about my personal preferences ;)


I've implemented the %1s pattern:
https://github.com/ochameau/addon-sdk/blob/localization/packages/addon-kit/tests/test-l10n.js#L50
It is a first draft. I really like the %name pattern. It looks more human!
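
For readers who have not seen the indexed scheme, a minimal sketch of how %1s-style substitution could behave. `formatIndexed` is a hypothetical helper written for this illustration. It is not the code in the linked test file, whose exact semantics may differ.

```javascript
// Hypothetical helper: %1s is the first extra argument, %2s the second, etc.
function formatIndexed(template) {
  var args = Array.prototype.slice.call(arguments, 1);
  return template.replace(/%(\d+)s/g, function (match, index) {
    var value = args[Number(index) - 1];
    // Keep the placeholder visible when no argument was supplied for it.
    return value === undefined ? match : String(value);
  });
}

console.log(formatIndexed("%1s sent %2s files", "alex", 3));
// A translation can swap the placeholders without touching the call site.
console.log(formatIndexed("%2s files were sent by %1s", "alex", 3));
```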
 

> And while we are asking for ponies, one great feature would be the ability to use this stuff in content scripts - such scripts can't require() modules but may still present (part of) the UI for a Jetpack and thus need l10n support.


We really need l10n support in content scripts. Unfortunately, it is far from simple, for multiple reasons:
- as you said, content scripts can't require modules. But we can easily expose `_` through `self`. (Again, if we allow using something other than `_`, it will complicate the parsing of keys to translate.)
- content script files are not identifiable. We build a dependency graph for all CommonJS modules, but we don't know which files are content scripts at "compile time". So fetching l10n keys will be harder for CS.
- e10s: CS are going to be executed in another process. For performance reasons, we will have to identify the keys for each CS in order to load only the necessary keys in each process. This may end up complicating locale files.
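
To sketch the last point: one way to load only the necessary keys would be to cut a per-script subset out of the full locale table and hand it over as plain JSON. Everything below is hypothetical (`subsetLocale`, `makeGet`, and the sample table are made up for illustration). How the payload would actually reach the content script is left open.

```javascript
// Hypothetical: pick only the listed keys out of the full locale table.
function subsetLocale(fullLocale, keys) {
  var subset = {};
  keys.forEach(function (key) {
    if (Object.prototype.hasOwnProperty.call(fullLocale, key)) {
      subset[key] = fullLocale[key];
    }
  });
  return subset;
}

// Hypothetical: build a `_` lookup from a (possibly partial) table.
// Falls back to the key itself when no translation is present.
function makeGet(table) {
  return function (key) {
    return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : key;
  };
}

var fullLocale = {
  "Hello": "Bonjour",
  "My Menu Item": "Mon element de menu",
  "Translate this!": "Traduisez ceci !"
};

// Keys assumed to have been found in one particular content script.
var payload = JSON.stringify(subsetLocale(fullLocale, ["Hello", "Not translated"]));
var _ = makeGet(JSON.parse(payload));
console.log(payload);
console.log(_("Hello"), "/", _("Not translated"));
```

Serializing to JSON keeps the payload process-agnostic, which matters if the content script ends up in another process.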

 
Alexandre Poirot

Nov 17, 2011, 1:04:42 PM
to mozilla-labs-jetpack
I've just posted another note on my blog:
http://blog.techno-barje.fr/post/2011/11/17/jetpack-localization-yaml/
(I'm using my blog as it allows me to better show code and examples
with nice layout and syntax highlighting!)

I had to address some issues reported during MozCamp.
The JSON format was a bad idea, so I'm now suggesting YAML instead.
This simple format seems to fit localization needs perfectly, while
addressing the issues with JSON.


Again, please jump into the discussion if you have some feedback to
share!

On Nov 10, 19:44, Alexandre poirot <poirot.a...@gmail.com> wrote:
> I started working on plurals handling: https://github.com/ochameau/addon-sdk/blob/localization/packages/addo...

Myk Melez

Nov 17, 2011, 1:56:02 PM
to mozilla-la...@googlegroups.com, Alexandre Poirot
On 2011-11-17 10:04 AM, Alexandre Poirot wrote:
> JSON format was a bad idea, so I'm suggesting now to use YAML.
> This simple format seems to fit perfectly localization needs, while
> addressing issue of JSON.
A third option would be to use sanitized JS, i.e. something like JSON
but with explicit support for multiline strings and comments. That
avoids shipping yet another parser and dealing with YAML's various
intricacies (especially ones that don't obviously map to JS
representations, like mapping types that preserve key order) while still
enabling the features that localizers want and need.

-myk

Jeff Griffiths

Nov 17, 2011, 2:04:19 PM
to mozilla-la...@googlegroups.com
Looks great! And ( from what I remember of the discussion last weekend
in Berlin ) this seems to cover the concerns people there had.

I assume there is some speed trade-off in parsing yaml vs JSON?

Jeff

Edward Lee

Nov 17, 2011, 2:24:06 PM
to mozilla-la...@googlegroups.com
On Thursday, November 17, 2011 10:04:42 AM UTC-8, Alexandre Poirot wrote:
http://blog.techno-barje.fr/post/2011/11/17/jetpack-localization-yaml/
Your blog post uses this syntax of a named plural identifier:

pluralString:
  one: "%s telechargement"
  other: "%s telechargements"
 
How does this scale to different languages that have 0 or 6 plural forms? Is the identifier just sugar and ignored?

Jeff Griffiths

Nov 17, 2011, 4:49:51 PM
to mozilla-la...@googlegroups.com

I had wondered about choosing some editor-friendly format that we then
compile down to standards-compliant JSON when we run or package. The
packager would strip out comments and add newline characters, etc. We
could support YAML and package the YAML data into JSON.

We would be shipping a parser in the SDK, sure, but the YAML parser would
be used by the packager at build time, not by Firefox at runtime.

Jeff

David Ascher

Nov 17, 2011, 4:51:53 PM
to mozilla-la...@googlegroups.com, Jeff Griffiths
Trying again w/ the right email address.

I think YAML as an input format makes sense for people authoring in text
editors. I think JSON as the format that the SDK reads makes sense (and
is trivial to generate from web authoring environments, etc.). Having a
simple tool to go from one to the other makes sense too.

--da

Myk Melez

Nov 17, 2011, 4:54:06 PM
to mozilla-la...@googlegroups.com, Jeff Griffiths
On 2011-11-17 1:49 PM, Jeff Griffiths wrote:
> I had wondered about choosing some editor-friendly format that we then
> compile down to standards-compliant JSON when we run or package. The
> packager would strip out comments and add newline characters, etc. We
> could support YAML and package the YAML data into JSON.
Indeed, but we could also just use JSON with comments.

> We would be shipping a parser in the SDK, sure, but the YAML parser would
> be used by the packager at build time, not by Firefox at runtime.

Parsing YAML at build time still means landing and maintaining another
parser, even if we aren't doing the parsing at runtime.

-myk

Alexandre poirot

Nov 17, 2011, 5:01:49 PM
to mozilla-la...@googlegroups.com
2011/11/17 Jeff Griffiths <jgrif...@mozilla.com>

> Looks great! And ( from what I remember of the discussion last weekend in Berlin ) this seems to cover the concerns people there had.
>
> I assume there is some speed trade-off in parsing yaml vs JSON?
 
Not that much. As I just said to Myk, cfx reads these YAML files and builds JSON files that are shipped in the XPI and read by the l10n module.
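
A toy sketch of that build-time step, to show why the runtime cost stays small: the YAML is converted once, and only JSON is read afterwards. This is JavaScript for illustration only (cfx itself is Python). `yamlSubsetToJson` is hypothetical and handles just a tiny YAML subset: `key: "value"` lines, one level of nesting, and `#` comment lines. The sample keys are assumptions, not the blog post's actual file layout. A real packager would use a full YAML parser.

```javascript
// Hypothetical build-time converter for a tiny YAML subset.
// Values must be double-quoted strings (they are decoded with JSON.parse).
function yamlSubsetToJson(yamlText) {
  var result = {};
  var currentGroup = null;
  yamlText.split("\n").forEach(function (rawLine) {
    var line = rawLine.replace(/\s+$/, "");
    // Skip blank lines and comment lines.
    if (line === "" || /^\s*#/.test(line)) return;
    var match = /^(\s*)([^:\s][^:]*):\s*(.*)$/.exec(line);
    if (!match) throw new Error("Unsupported line: " + rawLine);
    var indented = match[1].length > 0;
    var key = match[2].trim();
    var value = match[3];
    // An unindented line always closes the previous nested group.
    if (!indented) currentGroup = null;
    if (value === "") {
      // `key:` with no value opens a nested group (e.g. plural forms).
      currentGroup = result[key] = {};
    } else if (indented && currentGroup) {
      currentGroup[key] = JSON.parse(value);
    } else {
      result[key] = JSON.parse(value);
    }
  });
  return JSON.stringify(result);
}

var yamlSource = [
  "# French locale (made-up sample)",
  'Hello %s: "Bonjour %s"',
  "pluralString:",
  '  one: "%s telechargement"',
  '  other: "%s telechargements"'
].join("\n");

var shippedJson = yamlSubsetToJson(yamlSource);
// At runtime, only a standard JSON parse is needed.
console.log(JSON.parse(shippedJson));
```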

Alexandre poirot

Nov 17, 2011, 5:10:14 PM
to mozilla-la...@googlegroups.com
2011/11/17 Edward Lee <edi...@gmail.com>

I gave you the answer on IRC, but some other folks may be interested too.

There is a list of all possible plural forms: zero, one, two, few, many, other.
Then, for each language, you have to define a specific subset of those.
For English, you only have one and other. For Arabic, you have all of them!
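
A small sketch of how category selection could work for integer counts. The rule table and the `pluralize` helper are hypothetical and simplified (three languages, integers only). The rules follow my reading of the CLDR tables and should be checked against them. This is not the work-in-progress code on github.

```javascript
// Hypothetical, simplified rule table for integer counts, using the
// category names listed above.
var pluralRules = {
  en: function (n) { return n === 1 ? "one" : "other"; },
  fr: function (n) { return n === 0 || n === 1 ? "one" : "other"; },
  ar: function (n) {
    var mod100 = n % 100;
    if (n === 0) return "zero";
    if (n === 1) return "one";
    if (n === 2) return "two";
    if (mod100 >= 3 && mod100 <= 10) return "few";
    if (mod100 >= 11 && mod100 <= 99) return "many";
    return "other";
  }
};

// Pick the form for `count`, falling back to "other", then substitute %s.
function pluralize(language, forms, count) {
  var category = pluralRules[language](count);
  var template = Object.prototype.hasOwnProperty.call(forms, category)
    ? forms[category]
    : forms.other;
  return template.replace("%s", String(count));
}

var frenchForms = { one: "%s telechargement", other: "%s telechargements" };
console.log(pluralize("fr", frenchForms, 1));
console.log(pluralize("fr", frenchForms, 5));
console.log(pluralRules.ar(11), pluralRules.ar(100));
```

Under this scheme the identifiers in the locale file are not sugar: they are the lookup keys, and a language simply lists the categories its rule can return.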

You can find more information about this, here:
  http://cldr.unicode.org/index/cldr-spec/plural-rules


FYI: My current work in progress on github doesn't implement plurals correctly.

Jeff Griffiths

Nov 17, 2011, 5:40:13 PM
to Myk Melez, mozilla-la...@googlegroups.com
On 11-11-17 1:54 PM, Myk Melez wrote:
> On 2011-11-17 1:49 PM, Jeff Griffiths wrote:
>> I had wondered about choosing some editor-friendly format that we then
>> compile down to standards-compliant JSON when we run or package. The
>> packager would strip out comments and add newline characters, etc. We
>> could support YAML and package the YAML data into JSON.
> Indeed, but we could also just use JSON with comments.

Real JSON ( as opposed to what the Mozilla codebase tolerates ) does not
support comments. If someone was going to create some other tool to
handle our particular JSON strain, they could not use JSON libraries in
other runtimes ( eg Python, PHP, etc )
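
A quick sketch of both sides of this point: a standards-compliant parser rejects comments, and a preprocessing pass is needed before any standard JSON library can read such a file. `stripJsonComments` is a hypothetical helper written for this example. It handles `//` and `/* */` comments only, not multiline strings.

```javascript
// Hypothetical preprocessor: remove // and /* */ comments that occur outside
// string literals, so the result can be handed to a standard JSON parser.
function stripJsonComments(text) {
  var out = "";
  var i = 0;
  var inString = false;
  while (i < text.length) {
    var ch = text[i];
    var next = text[i + 1];
    if (inString) {
      out += ch;
      if (ch === "\\") {
        // Copy the escaped character verbatim (\" must not end the string).
        out += next === undefined ? "" : next;
        i += 2;
        continue;
      }
      if (ch === '"') inString = false;
      i += 1;
    } else if (ch === '"') {
      inString = true;
      out += ch;
      i += 1;
    } else if (ch === "/" && next === "/") {
      // Line comment: skip to the end of the line, keeping the newline.
      while (i < text.length && text[i] !== "\n") i += 1;
    } else if (ch === "/" && next === "*") {
      // Block comment: skip past the closing */ (or to the end if unterminated).
      var end = text.indexOf("*/", i + 2);
      i = end === -1 ? text.length : end + 2;
    } else {
      out += ch;
      i += 1;
    }
  }
  return out;
}

var annotated = [
  "{",
  "  // Shown in the toolbar tooltip",
  '  "Hello %s": "Bonjour %s", /* greeting */',
  '  "url": "http://example.com/a//b"',
  "}"
].join("\n");

var rejected = false;
try { JSON.parse(annotated); } catch (e) { rejected = true; }
console.log("standard parser rejects comments:", rejected);
console.log(JSON.parse(stripJsonComments(annotated)));
```

Any tool in another runtime would need an equivalent pass of its own, which is the portability cost described above.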

>> We would be shipping a parser in the SDK, sure, but the YAML parser would
>> be used by the packager at build time, not by Firefox at runtime.
> Parsing YAML at build time still means landing and maintaining another
> parser, even if we aren't doing the parsing at runtime.

That is definitely the draw-back.

Irakli Gozalishvili

Nov 17, 2011, 8:51:14 PM
to mozilla-la...@googlegroups.com, Alexandre Poirot

On Thursday, 2011-11-17 at 10:56 , Myk Melez wrote:

On 2011-11-17 10:04 AM, Alexandre Poirot wrote:
The JSON format was a bad idea, so I'm now suggesting we use YAML.
This simple format seems to fit localization needs perfectly, while
addressing the issues of JSON.
A third option would be to use sanitized JS, i.e. something like JSON
but with explicit support for multiline strings and comments.

It might be worth looking at the quasis (quasi-literals) that are planned for ES.next, and also checking how soon they are going to ship in SpiderMonkey.

 
That avoids shipping yet another parser and dealing with YAML's various
intricacies (especially ones that don't obviously map to JS
representations, like mapping types that preserve key order) while still
enabling the features that localizers want and need.

-myk


Irakli Gozalishvili

Nov 18, 2011, 2:32:15 PM
to mozilla-la...@googlegroups.com, Alexandre Poirot
I just ran into the Get Localization GitHub hook https://github.com/mozilla/addon-sdk/admin/hooks which seems like a very nice way to get localizations. It works with the online service http://www.getlocalization.com/

It might be worth looking at! 

Alexandre poirot

Nov 21, 2011, 1:02:50 PM
to mozilla-la...@googlegroups.com
2011/11/17 Jeff Griffiths <jgrif...@mozilla.com>

On 11-11-17 1:54 PM, Myk Melez wrote:
On 2011-11-17 1:49 PM, Jeff Griffiths wrote:
I had wondered about choosing some editor-friendly format that we then
compile down to standards-compliant JSON when we run or package. The
packager would strip out comments and add newline characters, etc. We
could support YAML and package the YAML data into JSON.
Indeed, but we could also just use JSON with comments.

Real JSON (as opposed to what the Mozilla codebase tolerates) does not support comments. If someone were to create some other tool to handle our particular JSON strain, they could not use the JSON libraries in other runtimes (e.g. Python, PHP, etc.).
 
I think Myk is suggesting that we add comments as string values, as Chrome extensions do:
http://src.chromium.org/viewvc/chrome/trunk/src/chrome/common/extensions/docs/examples/extensions/news_i18n/_locales/en/messages.json?content-type=text/plain
That way we end up using 100% regular JSON.
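
For readers who have not seen that layout, here is a rough sketch of what "comments as string values" looks like. The `message` and `description` fields follow the Chrome `messages.json` convention; the sample content and the `load_messages` helper are invented for illustration.

```python
import json

# Chrome-style entry: the translator note lives in a "description" string,
# so the file stays valid for any strict JSON parser.
chrome_style = '''{
  "hello": {
    "message": "Bonjour",
    "description": "Greeting shown on the panel"
  }
}'''

def load_messages(text):
    # Keep only the translated strings and drop the translator notes.
    data = json.loads(text)
    return {key: entry["message"] for key, entry in data.items()}

messages = load_messages(chrome_style)
print(messages)
```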
But this format would keep half of our localizers from even looking at Jetpack. I received clearly negative feedback about it. Most localizers do not have any development background; some of them are simply people fluent in multiple languages, passionate and willing to contribute. So requiring them to understand a structured data format like JSON, with nested objects, is not realistic.
I'm convinced that you do not need to know what an object or an attribute is in order to write the kind of YAML file I suggested:
  # comments
  key: value
As long as we do not use advanced l10n features like plurals, it will be as simple as property files, and even easier when you need multiline strings. You do not have to learn what `{` or `,` mean. Above all, compared to the Chrome format, you can't destroy the whole file by omitting a single comma. Debugging such a complex JSON file can be tricky when all you have is what the JSON parser tells you!
So yes, I agree they will have to learn something in any case, but I think YAML avoids the noise, so localizers won't feel they are writing code or anything complex: with all the issues addressed (comments, multiline), YAML can be way simpler than JSON.
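
To show how little machinery the flat `# comments` / `key: value` subset needs, here is a small line-based reader. It is deliberately not a real YAML parser (no multiline strings, no plurals), and `read_flat_locale` is a hypothetical name; the point is that one bad line is reported by number instead of invalidating the whole file.

```python
def read_flat_locale(text):
    """Read '# comment' and 'key: value' lines; not a real YAML parser."""
    entries, problems = {}, []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Blank lines and comments are simply skipped.
        if not stripped or stripped.startswith("#"):
            continue
        key, sep, value = stripped.partition(":")
        if not sep or not key.strip():
            # Remember the faulty line number and keep reading the rest.
            problems.append(number)
            continue
        entries[key.strip()] = value.strip()
    return entries, problems

sample = """# greeting shown on the panel
hello: Bonjour
this line has no separator
bye: Au revoir
"""

entries, problems = read_flat_locale(sample)
print(entries)
print(problems)
```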

Then there is the "yet another new format" argument, and I tend to agree with it: even if we start using JSON, we will still end up with a new format of our own. That brings me back to property files. They are almost compatible with the Jetpack l10n approach. The main issue is the gettext-style feature where a key can be something other than an identifier, with spaces and special characters. Actually, Mozilla's implementation doesn't follow the original specification: it allows any character in a key except `:` and `=`, so we can use it and simply warn during compilation when a key is invalid.
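
The compile-time warning could be as simple as the following sketch. It only encodes the rule stated above (no `:` or `=` in a key); the `check_keys` function is hypothetical and is not part of the packager.

```python
import warnings

# Characters that cannot appear in a property-file key, per the rule above.
FORBIDDEN = (":", "=")

def check_keys(keys):
    # Return the usable keys and warn about each invalid one.
    valid = []
    for key in keys:
        bad = [char for char in FORBIDDEN if char in key]
        if bad:
            warnings.warn(f"invalid l10n key {key!r}: contains {bad}")
        else:
            valid.append(key)
    return valid

usable = check_keys(["Hello world!", "a=b", "time: now"])
print(usable)
```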

Finally, one interesting fact. Chris Hofmann gave a talk about l10n tools during MozCamp.
They found that 50% of localizations were done locally, with a simple text editor.
The most productive contributors tend to use this pattern, as it offers far more control when doing many translations.
The other 50% use one of the various online tools available, and the list of Mozilla-specific l10n tools is impressive!

Erik Vold

Nov 24, 2011, 11:08:56 PM
to mozilla-la...@googlegroups.com
So, I've got a belated problem with JSON, YAML, and anything that is not property files: there appears to be internal Mozilla code that depends on properties files, like http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensions/AddonManager.jsm#260

So I'll need properties files to be generated from whatever format is used, at the least.

--
Erik Vergobbi Vold

Email: erik...@gmail.com
Website: http://erikvold.com/

Alexandre Poirot

Nov 29, 2011, 10:27:33 AM
to mozilla-labs-jetpack
On 25 nov, 05:08, Erik Vold <erikvv...@gmail.com> wrote:
> So, I've got a belated problem with json, yaml, and anything not property
> files, which is that there appears to be internal Mozilla code that depends
> on properties files, like http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensi...

>
> So I'll need properties files to be generated from whatever format is used,
> at the least.

You are right, we will have to ship property files in the final XPI,
but they can be generated automatically, like install.rdf.
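
A rough sketch of what that generation step might look like at build time follows. Everything here is an assumption for illustration: the `to_properties` name, the idea that translations arrive as a flat dict, and the minimal escaping (backslashes and newlines only), which may not match what the real properties parser expects.

```python
def to_properties(entries):
    # Serialize a flat {key: translation} dict as key=value lines.
    lines = []
    for key in sorted(entries):
        # Assumed escaping: protect backslashes first, then newlines.
        value = entries[key].replace("\\", "\\\\").replace("\n", "\\n")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"

print(to_properties({"hello": "Bonjour", "bye": "Au\nrevoir"}))
```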

Alexandre Poirot

Nov 29, 2011, 10:30:31 AM
to mozilla-labs-jetpack
I've created an etherpad note that summarizes my various proposals.
I'd like to use this etherpad to help us decide what format we will be
using:
https://jetpack.etherpad.mozilla.org/localization