Gaia & Localization

Vivien

Jan 24, 2012, 8:51:10 PM

to Mozilla B2G mailing list

Gaia is built on top of web technologies, and there is currently no
standard for how to translate a website.

The traditional tools used to translate the Mozilla Platform are .dtd
and .properties.
.dtd files are used to translate XML documents and do not fit well with HTML.
.properties files are a common format and probably work for 90% (random
number) of the cases you may encounter on a web page.

To cover 99% of the use cases (random number again) the l10n team has
come up with a proposal called l20n.
It's not a standard (yet?), but they are working with Henri Sivonen to
integrate it into the platform and to use it for regular HTML.

The current proposal for Gaia is located at
https://github.com/andreasgal/gaia/pull/274 .

Basically the idea is to add 3 attributes (l10n-id, l10n-args,
l10n-path) on nodes to translate them.
Only the first one is required in most cases. The others are for more
complex cases.

In the future platform version, localization will involve shadow DOM, but the
current proposal is just a shim JS library, so nodes will be
translated on the fly.
The localization format, called LOL, is described at
http://wiki.mozilla.org/L20n.

If you have any opinions or concerns about l10n, please feel free to
jump into the discussion.

Vivien.

lkcl luke

Jan 25, 2012, 12:21:49 AM

to Vivien, Mozilla B2G mailing list

On Wed, Jan 25, 2012 at 1:51 AM, Vivien <2...@vingtetun.org> wrote:

> In the future platform version, localization involve shadow DOM but the
> current proposal is just a shim JS library and so nodes will be translated
> on the fly.
> The localization format, called LOL is described on
> http://wiki.mozilla.org/L20n.
>
> If you have any opinion about l10n or any concerns please feel free to jump
> in the discussion.

it would be superb to have a proper solution.

GWT's internationalisation support has hard-coded implementations for
each and every world language, i think the code-base of the
internationalisation support libraries - written entirely in java -
stands at a whopping and ridiculous 80,000 lines of code.

in pyjamas we created a pattern which comprises a base class that is
over-ridden on a per-language basis. it's... clunky, it means having
to stick to some rules that are either easily misunderstood, ignored
or broken, and it's about 20 lines of code and in no way covers things
like date formats, number formats, all the rules about spelling
pluralisations correctly etc. which is what GWT's internationalisation
supports on a massive and comprehensive basis.

it also doesn't help that browsers do not support unicode. correct
emulation of the python "unicode" data type in pyjs is flat-out
impossible.

so, whilst i'm not offering any solutions for this, i'm definitely
pointing out that solutions are needed.

l.

Robert Kaiser

Jan 25, 2012, 9:49:21 AM

to mozilla...@lists.mozilla.org

Vivien wrote:

> To cover 99% of the use cases (random number again) the l10n team has
> come with a proposal called l20n.

As a side note, in comparison to .dtd/.properties or even gettext/.po
and any other L10n infrastructure we have been looking into over the
years, the new l20n architecture has a ton more flexibility for
localizers to make things work really well in their language - and
should be easier for developers to handle as well (esp. in cases like
"<foo> is requesting to read <bar> contacts" where <foo> is an app and
<bar> a number).
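
To make the "<foo> is requesting to read <bar> contacts" case concrete, here is a minimal sketch of why a count argument needs per-locale logic. It is not L20n syntax or its API: the catalog layout, the {app}/{n} placeholder convention and the function names are assumptions made up for illustration.

```javascript
// Hypothetical per-locale catalogs: each locale supplies its own plural rule
// and one message variant per plural category.
const catalogs = {
  en: {
    plural: (n) => (n === 1 ? 'one' : 'other'),
    readContacts: {
      one: '{app} is requesting to read {n} contact',
      other: '{app} is requesting to read {n} contacts',
    },
  },
  pl: {
    // Polish integer plural categories: one, few, many.
    plural: (n) => {
      if (n === 1) return 'one';
      const mod10 = n % 10;
      const mod100 = n % 100;
      if (mod10 >= 2 && mod10 <= 4 && !(mod100 >= 12 && mod100 <= 14)) return 'few';
      return 'many';
    },
    readContacts: {
      one: '{app} chce odczytać {n} kontakt',
      few: '{app} chce odczytać {n} kontakty',
      many: '{app} chce odczytać {n} kontaktów',
    },
  },
};

// Pick the variant for the count, then substitute the named placeholders.
function formatMessage(locale, id, args) {
  const catalog = catalogs[locale];
  const template = catalog[id][catalog.plural(args.n)];
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    name in args ? String(args[name]) : match);
}

console.log(formatMessage('en', 'readContacts', { app: 'Dialer', n: 1 }));
console.log(formatMessage('pl', 'readContacts', { app: 'Dialer', n: 5 }));
```

The developer only passes the app name and the number; which sentence variant is used is decided entirely on the localizer's side.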

So, where no standard exists yet and something needs to be implemented
from scratch anyway, l20n should be the best available solution for
developers and localizers alike.

(And we have plans to move Firefox and other Mozilla stuff over to l20n
as well, we'll see how fast all that progresses.)

It would of course be awesome if this could lead to a real standard.

Robert Kaiser

Chris Jones

Jan 26, 2012, 12:55:24 AM

to Vivien, Boris Zbarsky, Olli Pettay, Mozilla B2G mailing list

----- Original Message -----
> From: "Vivien" <2...@vingtetun.org>
> To: "Mozilla B2G mailing list" <dev...@lists.mozilla.org>
> Sent: Tuesday, January 24, 2012 5:51:10 PM
> Subject: [b2g] Gaia & Localization
>
> If you have any opinion about l10n or any concerns please feel free
> to
> jump in the discussion.
>

I'd like to understand the engine a bit better. A fully worked-out example somewhere, showing everything from LOL source to modifying text nodes in the DOM, would help immensely.

My general understanding of the system so far is,

0. Apps load <script src="l10n-engine.js"> (or whatever it's called). This is the code that accepts input strings/properties/args/locale and produces localized strings. Eventually this code will live in Gecko.
a) l10n-engine sets up some magical DOM event listeners. More on this below.

1. Apps load their own l10n data as <script src="l10n.js"> (or whatever). This may be LOL source, or some compiled IR. (For the purposes of this discussion it doesn't matter, yet.)

2. onload or whatever event fires
a) l10n-engine.js callback is invoked
b) searches the DOM for elements with special l10n attributes
c) for each such node |n|,
[attributes of |n|] --> l10n-engine --> localized string --> [some text node of |n|]
For the purposes of this discussion, what actually happens in the l10n-engine implementation doesn't matter too much, yet. It can be a black box: string/args/locale in, string out.

Is this a reasonable approximation?
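
Step 2 of the outline above can be sketched roughly as follows. This is only an illustration of the described flow, not the shim from the pull request: localizeTree, fakeNode and the translate callback are hypothetical names, and treating l10n-args as a JSON object is an assumption the thread does not confirm.

```javascript
// Sketch of step 2: find nodes carrying l10n attributes and fill in their text.
// `root` is anything with querySelectorAll (a Document in a browser);
// `translate(id, args)` stands in for the "black box" engine.
function localizeTree(root, translate) {
  let localized = 0;
  for (const node of root.querySelectorAll('[l10n-id]')) {
    const id = node.getAttribute('l10n-id');
    const rawArgs = node.getAttribute('l10n-args');
    // Assumption: l10n-args carries a JSON object; the thread does not say.
    const args = rawArgs ? JSON.parse(rawArgs) : {};
    const text = translate(id, args);
    if (typeof text === 'string') {
      node.textContent = text;
      localized += 1;
    }
  }
  return localized;
}

// Minimal stand-ins so the sketch also runs outside a browser.
function fakeNode(attrs) {
  return {
    textContent: '',
    getAttribute: (name) => (name in attrs ? attrs[name] : null),
  };
}
const nodes = [
  fakeNode({ 'l10n-id': 'greeting', 'l10n-args': '{"name":"Ada"}' }),
  fakeNode({ 'l10n-id': 'missing' }),
];
const fakeRoot = { querySelectorAll: () => nodes };

const strings = { greeting: (args) => `Bonjour, ${args.name} !` };
const translate = (id, args) => (strings[id] ? strings[id](args) : undefined);

console.log(localizeTree(fakeRoot, translate), nodes[0].textContent);
```

In a page, the same function would be called as localizeTree(document, translate) from a load handler; nodes whose id has no translation are left untouched.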

If so, I have some questions

A. If this is going to become a web standard, it needs a fallback impl for backwards-compat. To decide whether to use the fallback impl, apps need some kind of feature test for "native" l10n support. What should that feature test look like?

B. For step (0) above, what DOM event(s) trigger localization? To me (not an expert!), there seem to be a few options
i) Listen to DOM-mutation events. Robust, precise, but (as I understand) makes for omgslow in the general case (from web content). For an engine-native l10n impl, this might be performant enough.
ii) Pages manually request l10n. Fragile, but pages "pay-as-they-go" for l10n. Ugly in that it leaves open the possibility of "FOUS" (flash of unlocalized strings).
iii) Mix of (i) and (ii): localize on "load" event, explicit requests thereafter. Might be an 80% solution, but still fragile in general, and still vulnerable to "FOUS".

bz, any thoughts/guidance on ^^^?

C. Concerning step (2b): if the "fallback" l10n impl has to search the DOM for localizable nodes, can it be done performantly for large documents? Is querySelectorAll() for a magical l10n attribute fast enough? (For an engine-native impl, this isn't much of a concern.)

I have some more questions about what I've been assuming is an "l10n black box" above, but I'll ask those in a followup post.

Cheers,
Chris

Chris Jones

Jan 26, 2012, 1:26:23 AM

to Vivien, Axel Hecht, Zbigniew Braniecki, Mozilla B2G mailing list

----- Original Message -----
> From: "Vivien" <2...@vingtetun.org>
> To: "Mozilla B2G mailing list" <dev...@lists.mozilla.org>
> Sent: Tuesday, January 24, 2012 5:51:10 PM
> Subject: [b2g] Gaia & Localization
>
> If you have any opinion about l10n or any concerns please feel free
> to
> jump in the discussion.
>

In my last post, I referred to the l10n engine as this: given DOM node |n|,

[attributes of |n|] --> l10n-engine --> localized string --> [some text node of |n|]

From Vivien's original post, "attributes of |n|" appear to be (l10n-id, l10n-args, l10n-path). I don't understand what these are. Again, a fully worked-out example would help in understanding these.

Based on previous discussion, I think there are two proposals for how the l10n engine should work

1. Offline, an l10n "compiler" takes LOL as input, emits small/fast compiled-l10n.js to translate strings. Web apps load <[something] src="compiled-l10n.js">. This sets up event listeners etc. for l10n magic.

2. Online, web apps load their LOL directly, let's say <link rel="l10n" href="my-l10n.lol">. If we're using a fallback l10n engine, it reads my-l10n.lol and sets up an engine appropriately. If there's native support, all the magic happens in the rendering engine.

Is this accurate?

Some questions

A. This is probably extremely naive, but how are translations from separate locales packaged up? Are they all included in the same LOL file, or is there one LOL file per locale?
i) If there's one locale per LOL file, how is a page supposed to choose the right LOL file? Load the right one dynamically? Based on what, a future browser API? For the future in which there's built-in support for HTML l10n, how is the browser supposed to know which LOL files are available to choose from?
ii) If all locales are packaged in the same file, is there evidence that this will scale to >100 locales?

B. Is (1) above ever useful in engines that natively support LOL/whatever? I tend to think not, since all downloading the script will accomplish is waste network bandwidth and slow down load time.

C. In engines that natively support LOL, is there any benefit to having an "l10n intermediate representation" (IR) compiled from LOL, that browsers can load directly? Based on the history of web standards, the answer to this seems to be "probably not", assuming that LOL is a good format. Source > bytecode.

I have some further questions about the l10n algorithms themselves, but that's not too important yet, I don't think. I'd like to understand the bigger picture first.

Cheers,
Chris

Chris Jones

Jan 26, 2012, 1:50:42 AM

to Vivien, Axel Hecht, Zbigniew Braniecki, Mozilla B2G mailing list

----- Original Message -----
> From: "Chris Jones" <cjo...@mozilla.com>
> To: "Vivien" <2...@vingtetun.org>
> Cc: "Axel Hecht" <pi...@mozilla.com>, "Zbigniew Braniecki" <zbran...@mozilla.com>, "Mozilla B2G mailing list"
> <dev...@lists.mozilla.org>
> Sent: Wednesday, January 25, 2012 10:26:23 PM
> Subject: Re: [b2g] Gaia & Localization
>
> ii) If all locales are packaged in the same file, is there
> evidence that this will scale to >100 locales?
>

Convincing evidence of this would be an l10n package for all Firefox locales. As I recall from previous Fennec work, the 30 or so locales supported by xul-fennec at the time amounted to 100KB (compressed? I don't remember ...). That size is borderline, but probably palatable for installed web apps.

Cheers,
Chris

Chris Jones

Jan 26, 2012, 4:03:30 AM

to Olli.Pettay, Boris Zbarsky, Vivien, Mozilla B2G mailing list

----- Original Message -----
> From: "Olli.Pettay" <ope...@mozilla.com>
> To: "Chris Jones" <cjo...@mozilla.com>
> Cc: "Vivien" <2...@vingtetun.org>, "Mozilla B2G mailing list" <dev...@lists.mozilla.org>, "Boris Zbarsky"
> <bzba...@mit.edu>
> Sent: Thursday, January 26, 2012 12:14:31 AM
> Subject: Re: [b2g] Gaia & Localization
>

> On 01/26/2012 06:55 AM, Chris Jones wrote:
> > ----- Original Message -----

> >> From: "Vivien"<2...@vingtetun.org> To: "Mozilla B2G mailing
> >> list"<dev...@lists.mozilla.org> Sent: Tuesday, January 24, 2012
> >> 5:51:10 PM Subject: [b2g] Gaia& Localization

> > B. For step (0) above, what DOM event(s) trigger localization? To
> > me
> > (not an expert!), there seem to be a few options i) Listen to
> > DOM-mutation events. Robust, precise, but (as I understand) makes
> > for omgslow in the general case (from web content). For an
> > engine-native l10n impl, this might be performant enough.

> Mutation events are going away, and shouldn't be used for anything.
> MutationObserver might work, but we don't have performance data
> about it yet (sorry, my implementation is late. I'll finalize it
> right after getting my CC patches landed, I hope).
>
> If it is known beforehand which attributes l10n should handle,
> MutationObserver might work fast enough, since there
> is a way to filter out other attribute changes.
>
>

Yes, the attributes will be known beforehand. Does the filter also pass through adds and removes of elements with the "interesting" attributes? I.e., will the observer get a notification when <span l10n-id="localize_me"></span> is added to the DOM?

Cheers,
Chris

Chris Jones

Jan 26, 2012, 5:01:00 AM

to Olli.Pettay, Boris Zbarsky, Vivien, Mozilla B2G mailing list

----- Original Message -----
> From: "Olli.Pettay" <ope...@mozilla.com>
> To: "Chris Jones" <cjo...@mozilla.com>
> Cc: "Vivien" <2...@vingtetun.org>, "Mozilla B2G mailing list" <dev...@lists.mozilla.org>, "Boris Zbarsky"
> <bzba...@mit.edu>
> Sent: Thursday, January 26, 2012 1:33:19 AM
> Subject: Re: [b2g] Gaia & Localization
>

> No. That would indeed require l10n stuff to listen for
> all the node additions/removals.
>

OK. I'm not sure we have much of a use case for watching *changes* to the l10n attributes, but I also haven't seen a proposal for how something like changing a localizable button label, e.g., would work.

Cheers,
Chris

Fabien Cazenave

Jan 26, 2012, 6:08:55 AM

to Chris Jones, Mozilla B2G mailing list

Ugh, we can't see who Chris is replying to in this dev-b2g thread.
Sorry if my post is irrelevant in the whole conversation.

On 26/01/2012 07:50, Chris Jones wrote:
> ----- Original Message -----

>> From: "Chris Jones"<cjo...@mozilla.com>
>> To: "Vivien"<2...@vingtetun.org>

>> Cc: "Axel Hecht"<pi...@mozilla.com>, "Zbigniew Braniecki"<zbran...@mozilla.com>, "Mozilla B2G mailing list"
>> <dev...@lists.mozilla.org>

>> Sent: Wednesday, January 25, 2012 10:26:23 PM

>> Subject: Re: [b2g] Gaia& Localization
>>

>> ii) If all locales are packaged in the same file, is there
>> evidence that this will scale to>100 locales?
>>
>
> Convincing evidence of this would be an l10n package for all Firefox locales. As I recall from previous Fennec work, the 30 or so locales supported by xul-fennec at the time amounted to 100KB (compressed? I don't remember ...). That size is borderline, but probably palatable for installed web apps
>

I don't think we want to package all locales in a single file! That
would be tricky for l10n contributors, and adding a new locale would be
more difficult.

From a webapp perspective, I think we should declare an l10n resource
in a <link> element, e.g.:
<link rel="resource" type="text/l10n" href="data.l10n" />

This "data.l10n" could be either:
• a default localization file + "data.l10n.[lang]" files
• an index (manifest?) pointing to other localization files

The first case is very simple: e.g. for a browser whose
navigator.language is "fr", we'd try to load "data.l10n.fr" and default
to "data.l10n" if not found.
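
The first case above could be sketched like this. The helper names and the fetchText loader are hypothetical, and stripping the region subtag (trying "fr" after "fr-FR") is an extra assumption that goes slightly beyond what is described.

```javascript
// Build the list of URLs to try, most specific first.
function l10nCandidates(baseUrl, language) {
  const candidates = [];
  const parts = language ? language.split('-') : [];
  while (parts.length > 0) {
    candidates.push(`${baseUrl}.${parts.join('-')}`);
    parts.pop();
  }
  candidates.push(baseUrl); // default localization file
  return candidates;
}

// Try each candidate in turn; `fetchText(url)` is a hypothetical loader that
// resolves to the file contents or rejects when the file is missing.
async function loadL10n(baseUrl, language, fetchText) {
  for (const url of l10nCandidates(baseUrl, language)) {
    try {
      return { url, text: await fetchText(url) };
    } catch (err) {
      // not found: fall through to the next, less specific candidate
    }
  }
  throw new Error(`no l10n resource found for ${baseUrl}`);
}

console.log(l10nCandidates('data.l10n', 'fr-FR'));
```

In a page, the language argument would be navigator.language and the base URL would come from the href of the <link> element.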

The main advantage of the second case (manifest) is that it'd be easy to
know (from content JS) which locales are supported, and it'd allow good
flexibility in how l10n files are stored (e.g. one directory per locale).
The main drawback is that *two* requests would be needed to access an
l10n resource file.

As a side note: at the moment, we can only access navigator.language
from content JS. It'd be nice to propose an API to get the list of the
browser's accepted languages (= as stored in the "intl.accept_languages"
pref) so that a client-side library could select a more appropriate
locale when the one corresponding to navigator.language is not found.
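
Given such a list of accepted languages, the selection itself is small. The sketch below is only an illustration: pickLocale is a hypothetical helper and the list of available locales is assumed to come from something like the manifest mentioned above. (Browsers later exposed a list of this kind as navigator.languages.)

```javascript
// Pick the first accepted language that the app has a localization for.
// Matching is case-insensitive; a regional tag such as "pt-BR" also
// matches a plain "pt" localization.
function pickLocale(accepted, available, fallback) {
  const byLower = new Map(available.map((tag) => [tag.toLowerCase(), tag]));
  for (const tag of accepted) {
    const parts = tag.toLowerCase().split('-');
    while (parts.length > 0) {
      const match = byLower.get(parts.join('-'));
      if (match !== undefined) return match;
      parts.pop();
    }
  }
  return fallback;
}

console.log(pickLocale(['de-AT', 'fr', 'en'], ['en', 'fr'], 'en'));
```

With only navigator.language available, the accepted list has a single entry and the function degrades to the simple lookup-or-default behaviour.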

--
:kazé

Vivien

Jan 26, 2012, 9:12:50 AM

to Mozilla B2G mailing list, pi...@mozilla.com, zbran...@mozilla.com, hsiv...@mozilla.com

On 26/01/2012 06:55, Chris Jones wrote:
> ----- Original Message -----

> I'd like to understand the engine a bit better. A fully worked-out example somewhere, showing everything from LOL source to modifying text nodes in the DOM, would help immensely.
>
> My general understanding of the system so far is,
>
> 0. Apps load<script src="l10n-engine.js"> (or whatever it's called). This is the code that accepts input strings/properties/args/locale and produces localized strings. Eventually this code will live in Gecko.
> a) l10n-engine sets up some magical DOM event listeners. More on this below.
>
> 1. Apps load their own l10n data as <script src="l10n.js"> (or whatever). This may be LOL source, or some compiled IR. (For the purposes of this discussion it doesn't matter, yet.)
>
> 2. onload or whatever event fires
> a) l10n-engine.js callback is invoked
> b) searches the DOM for elements with special l10n attributes

> c) for each such node |n|,

> [attributes of |n|] --> l10n-engine --> localized string --> [some text node of |n|]

> For the purposes of this discussion, what actually happens in the l10n-engine implementation doesn't matter too much, yet. It can be a black box: string/args/locale in, string out.
>
> Is this a reasonable approximation?

It's a reasonable approximation of how the *shim* library acts. (Except
that |[attributes of |n|] --> l10n-engine --> localized string -->
[some text node of |n|]| is really |[attributes of |n|] --> l10n-engine -->
localized string --> [some text node of |n| or attribute of |n|]|.)

afaik in the Gecko version the HTML parser is supposed to take |l10n-id,
l10n-args, l10n-path| into account and translate that into shadow DOM
directly.

>
> If so, I have some questions
>
> A. If this is going to become a web standard, it needs a fallback impl for backwards-compat. To decide whether to use the fallback impl, apps need some kind of feature test for "native" l10n support. What should that feature test look like?

The shim library would use data-l10n-id, data-l10n-path, data-l10n-args
instead of l10n-id, l10n-path, l10n-args. I assume the Gecko
implementation would also expose some code to let you translate a string
from JavaScript directly. (Pike, Gandalf?)
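
A library that wants to accept both spellings could read the attributes through a small helper. This is a sketch under assumptions: getL10nAttribute is a hypothetical name, and giving the unprefixed form precedence is an arbitrary choice, not something the proposal specifies.

```javascript
// Read an l10n attribute, accepting both the proposed native name
// (l10n-id) and the shim's data- prefixed name (data-l10n-id).
function getL10nAttribute(node, name) {
  const native = node.getAttribute(`l10n-${name}`);
  if (native !== null) return native;
  return node.getAttribute(`data-l10n-${name}`);
}

// Stand-in element so the sketch runs outside a browser.
const shimStyleNode = {
  getAttribute: (attr) => (attr === 'data-l10n-id' ? 'hello' : null),
};
console.log(getL10nAttribute(shimStyleNode, 'id'));
```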

>
> B. For step (0) above, what DOM event(s) trigger localization? To me (not an expert!), there seem to be a few options
> i) Listen to DOM-mutation events. Robust, precise, but (as I understand) makes for omgslow in the general case (from web content). For an engine-native l10n impl, this might be performant enough.

> ii) Pages manually request l10n. Fragile, but pages "pay-as-they-go" for l10n. Ugly in that it leaves open the possibility of "FOUS" (flash of unlocalized strings).
> iii) Mix of (i) and (ii): localize on "load" event, explicit requests thereafter. Might be an 80% solution, but still fragile in general, and still vulnerable to "FOUS".
>

As I have said above I assume the parser should take those attributes
into account in the native implementation. In some ways LOL (or whatever
name) is a language like CSS.

afaik for the moment the shim library localizes on the 'load' event of the
web page. This leaves the possibility of FOUS, and it can also cause many
reflows.

Vivien

Jan 26, 2012, 9:12:55 AM

to Mozilla B2G mailing list, Axel Hecht, Zbigniew Braniecki

On 26/01/2012 07:26, Chris Jones wrote:
> ----- Original Message -----

>> From: "Vivien"<2...@vingtetun.org>
>> To: "Mozilla B2G mailing list"<dev...@lists.mozilla.org>
>> Sent: Tuesday, January 24, 2012 5:51:10 PM
>> Subject: [b2g] Gaia& Localization
>>

>> If you have any opinion about l10n or any concerns please feel free
>> to
>> jump in the discussion.
>>

> In my last post, I referred to the l10n engine as this: given DOM node |n|,

>
> [attributes of |n|] --> l10n-engine --> localized string --> [some text node of |n|]
>

> From Vivien's original post, "attributes of |n|" appear to be (l10n-id, l10n-args, l10n-path). I don't understand what these are. Again, a fully worked-out example would help in understanding these.

From what I've understood of the format so far:
- l10n-id: a unique identifier for the string to localize
- l10n-path: ... an ... XPath expression (which will not work in the
web world, but it's a leftover from XUL). This can be replaced by a CSS
selector. I'm not really sure how useful it is. It seems to be used
to retrieve some nodes once the content has been localized, but
that's unclear to me.
- l10n-args: arguments to pass to the translating
engine to give more information about the context (I may be wrong;
the l10n guys know much more than me here)

> Based on previous discussion, I think there are two proposals for how the l10n engine should work
>
> 1. Offline, an l10n "compiler" takes LOL as input, emits small/fast compiled-l10n.js to translate strings. Web apps load<[something] src="compiled-l10n.js">. This sets up event listeners etc. for l10n magic.
>
> 2. Online, web apps load their LOL directly, let's say<link rel="l10n" href="my-l10n.lol">. If we're using a fallback l10n engine, it reads my-l10n.lol and sets up an engine appropriately. If there's native support, all the magic happens in the rendering engine.
>
> Is this accurate?

This is what I have understood so far.

Olli.Pettay

Jan 26, 2012, 4:33:19 AM

to Chris Jones, Boris Zbarsky, Vivien, Mozilla B2G mailing list

On 01/26/2012 10:03 AM, Chris Jones wrote:
> ----- Original Message -----

>> From: "Olli.Pettay"<ope...@mozilla.com> To: "Chris
>> Jones"<cjo...@mozilla.com> Cc: "Vivien"<2...@vingtetun.org>, "Mozilla
>> B2G mailing list"<dev...@lists.mozilla.org>, "Boris Zbarsky"
>> <bzba...@mit.edu> Sent: Thursday, January 26, 2012 12:14:31 AM

>> Subject: Re: [b2g] Gaia& Localization
>>

>> On 01/26/2012 06:55 AM, Chris Jones wrote:
>>> ----- Original Message -----
>>>> From: "Vivien"<2...@vingtetun.org> To: "Mozilla B2G mailing
>>>> list"<dev...@lists.mozilla.org> Sent: Tuesday, January 24,
>>>> 2012 5:51:10 PM Subject: [b2g] Gaia& Localization

>>> B. For step (0) above, what DOM event(s) trigger localization?
>>> To me (not an expert!), there seem to be a few options i) Listen
>>> to DOM-mutation events. Robust, precise, but (as I understand)
>>> makes for omgslow in the general case (from web content). For
>>> an engine-native l10n impl, this might be performant enough.

>> Mutation events are going away, and shouldn't be used for
>> anything. MutationObserver might work, but we don't have
>> performance data about it yet (sorry, my implementation is late.
>> I'll finalize it right after getting my CC patches landed, I
>> hope).
>>
>> If it is known beforehand which attributes l10n should handle,
>> MutationObserver might work fast enough, since there is a way to
>> filter out other attribute changes.
>>
>>
>
> Yes, the attributes will be known beforehand. Does the filter also
> pass through adds and removes of elements with the "interesting"
> attributes? I.e., will the observer get a notification when<span
> l10n-id="localize_me"></span> is added to the DOM?

No. That would indeed require l10n stuff to listen for
all the node additions/removals.
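
The two points above (an attribute filter for changes, plus listening for node additions) can be combined in one observer. The sketch below uses the MutationObserver API as it was later standardized, so it is not the implementation being discussed here; collectL10nTargets, watchL10n and the relocalize callback are hypothetical names.

```javascript
const L10N_ATTRS = ['l10n-id', 'l10n-args'];

// Pure helper: given MutationRecord-like objects, list the elements that
// need (re-)localization.
function collectL10nTargets(records) {
  const targets = new Set();
  for (const record of records) {
    if (record.type === 'attributes') {
      targets.add(record.target);
    } else if (record.type === 'childList') {
      for (const node of record.addedNodes) {
        if (node.nodeType !== 1) continue; // skip text and comment nodes
        if (node.hasAttribute('l10n-id')) targets.add(node);
        for (const child of node.querySelectorAll('[l10n-id]')) targets.add(child);
      }
    }
  }
  return [...targets];
}

// Browser-only wiring; MutationObserver is not available in plain Node.js.
function watchL10n(root, relocalize) {
  const observer = new MutationObserver((records) => {
    collectL10nTargets(records).forEach(relocalize);
  });
  observer.observe(root, {
    attributes: true,
    attributeFilter: L10N_ATTRS, // ignore unrelated attribute changes
    childList: true,             // still needed to see added elements
    subtree: true,
  });
  return observer;
}
```

Because text nodes are skipped, writing the localized text into an element does not cause the observer to re-queue that element. Every subtree insertion is still visited, which is the cost mentioned above.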

-Olli

>
> Cheers, Chris

Olli.Pettay

Jan 26, 2012, 3:14:31 AM

to Chris Jones, Boris Zbarsky, Vivien, Mozilla B2G mailing list

On 01/26/2012 06:55 AM, Chris Jones wrote:
> ----- Original Message -----
>> From: "Vivien"<2...@vingtetun.org> To: "Mozilla B2G mailing
>> list"<dev...@lists.mozilla.org> Sent: Tuesday, January 24, 2012
>> 5:51:10 PM Subject: [b2g] Gaia& Localization
>>

>> If you have any opinion about l10n or any concerns please feel
>> free to jump in the discussion.
>>
>

> I'd like to understand the engine a bit better. A fully worked-out
> example somewhere, showing everything from LOL source to modifying
> text nodes in the DOM, would help immensely.
>
> My general understanding of the system so far is,
>
> 0. Apps load<script src="l10n-engine.js"> (or whatever it's called).
> This is the code that accepts input strings/properties/args/locale
> and produces localized strings. Eventually this code will live in
> Gecko. a) l10n-engine sets up some magical DOM event listeners. More
> on this below.
>
> 1. Apps load their own l10n data as<script src="l10n.js"> (or
> whatever). This may be LOL source, or some compiled IR. (For the
> purposes of this discussion it doesn't matter, yet.)
>
> 2. onload or whatever event fires a) l10n-engine.js callback is
> invoked b) searches the DOM for elements with special l10n
> attributes c) for each such node |n|, [attributes of |n|] -->
> l10n-engine --> localized string --> [some text node of |n|] For
> the purposes of this discussion, what actually happens in the
> l10n-engine implementation doesn't matter too much, yet. It can be a
> black box: string/args/locale in, string out.
>
> Is this a reasonable approximation?
>

> If so, I have some questions
>
> A. If this is going to become a web standard, it needs a fallback
> impl for backwards-compat. To decide whether to use the fallback
> impl, apps need some kind of feature test for "native" l10n support.
> What should that feature test look like?
>

> B. For step (0) above, what DOM event(s) trigger localization? To me
> (not an expert!), there seem to be a few options i) Listen to
> DOM-mutation events. Robust, precise, but (as I understand) makes
> for omgslow in the general case (from web content). For an
> engine-native l10n impl, this might be performant enough.
Mutation events are going away, and shouldn't be used for anything.
MutationObserver might work, but we don't have performance data
about it yet (sorry, my implementation is late. I'll finalize it
right after getting my CC patches landed, I hope).

If it is known beforehand which attributes l10n should handle,
MutationObserver might work fast enough, since there
is a way to filter out other attribute changes.

-Olli

> ii) Pages
> manually request l10n. Fragile, but pages "pay-as-they-go" for l10n.
> Ugly in that it leaves open the possibility of "FOUS" (flash of
> unlocalized strings). iii) Mix of (i) and (ii): localize on "load"
> event, explicit requests thereafter. Might be an 80% solution, but
> still fragile in general, and still vulnerable to "FOUS".
>

Chris Jones

Jan 26, 2012, 4:46:48 PM

to Vivien, pi...@mozilla.com, zbran...@mozilla.com, hsiv...@mozilla.com, Mozilla B2G mailing list

----- Original Message -----
> From: "Vivien" <2...@vingtetun.org>
> To: "Mozilla B2G mailing list" <dev...@lists.mozilla.org>

> Cc: pi...@mozilla.com, zbran...@mozilla.com, hsiv...@mozilla.com
> Sent: Thursday, January 26, 2012 6:12:50 AM
> Subject: Re: [b2g] Gaia & Localization
>

> It's a reasonable approximation of how the *shim* library acts.
> (Except
> that |[attributes of |n|] --> l10n-engine --> localized string -->
> [some text node of |n|]| is |[attributes of |n|] --> l10n-engine -->
> localized string --> [some text node of |n| or attribute of [n]])
>
> afaik in the Gecko version the html parser is supposed to take
> |l10n-id,
> l10n-args, l10n-path| into account and translate that into shadow
> DOM
> directly.
>

The HTML parser can't be entirely responsible for that, because dynamically added and changed elements need to be (re-)localized. Right?

Talk of a "shadow DOM" scares me a bit, but I'm not worried about making this fast in Gecko, however we do it.

> >
> > If so, I have some questions
> >
> > A. If this is going to become a web standard, it needs a fallback
> > impl for backwards-compat. To decide whether to use the
> > fallback impl, apps need some kind of feature test for "native"
> > l10n support. What should that feature test look like?

> The shim library would use data-l10n-id, data-l10n-path,
> data-l10n-args
> instead of l10n-id, l10n-path, l10n-args. I assume the Gecko
> implementation would also expose some code to let you translate a
> string
> from JavaScript directly. (Pike, Gandalf?)
>

D'oh, I thought about that question but forgot to write it down. I don't have a concrete use case for this in mind. It seems like we can always work around this by creating dummy DOM elements and reading their text node(s).

If we don't expose a JS interface to directly localize strings, then we still need another feature test.

> >
> > B. For step (0) above, what DOM event(s) trigger localization?
> > To me (not an expert!), there seem to be a few options
> > i) Listen to DOM-mutation events. Robust, precise, but (as I
> > understand) makes for omgslow in the general case (from web
> > content). For an engine-native l10n impl, this might be
> > performant enough.

> > ii) Pages manually request l10n. Fragile, but pages
> > "pay-as-they-go" for l10n. Ugly in that it leaves open the
> > possibility of "FOUS" (flash of unlocalized strings).
> > iii) Mix of (i) and (ii): localize on "load" event, explicit
> > requests thereafter. Might be an 80% solution, but still
> > fragile in general, and still vulnerable to "FOUS".
> >

> As I have said above I assume the parser should take those attributes
> into account in the native implementation. In some ways LOL (or
> whatever
> name) is a language like CSS.
>

These are approaches the shim library can use. An in-engine implementation can do whatever it wants.

> afaik for the moment the shim library localize on 'load' event of
> the
> webpage. This leave the possibility of FOUS and also it can create
> many
> reflows.
>

I don't understand where the "many reflows" come from (should be just one, right?), but yeah this seems suboptimal.

Cheers,
Chris

Zbigniew Braniecki

Jan 26, 2012, 6:19:00 PM

to Vivien, Axel Hecht, Mozilla B2G mailing list

> Vivien <mailto:2...@vingtetun.org>
> January 26, 2012 3:12 PM

> On 26/01/2012 07:26, Chris Jones wrote:
>> ----- Original Message -----
>>> From: "Vivien"<2...@vingtetun.org>
>>> To: "Mozilla B2G mailing list"<dev...@lists.mozilla.org>

>>> Sent: Tuesday, January 24, 2012 5:51:10 PM
>>> Subject: [b2g] Gaia& Localization
>>>
>>> If you have any opinion about l10n or any concerns please feel free
>>> to
>>> jump in the discussion.
>>>

>> In my last post, I referred to the l10n engine as this: given DOM

>> node |n|,
>>
>> [attributes of |n|] --> l10n-engine --> localized string -->
>> [some text node of |n|]
>>

>> From Vivien's original post, "attributes of |n|" appear to be
>> (l10n-id, l10n-args, l10n-path). I don't understand what these are.
>> Again, a fully worked-out example would help in understanding these.
> From what I've understood of the format so far:
> - l10n-id: is a unique identifier for the string to localize
> - l10n-path: is ... an ... xpath expression (that will not work in
> the web world but that's a leftover from XUL). This can be replaced by
> a css selector. I'm not really sure how useful it is. It seems like
> this is done to retrieve some nodes once the content has been
> localized but that's unclear to me.
> - l10n-args: those are some arguments to pass to the translating
> engine to give more information about the context (I can be wrong,
> l10n-guys know much more than me here)

Let me clarify.

There are three node attributes that L20n uses on a localizable
node:
- l10n-id: binds the node to a localizable entity
- l10n-args: defines local arguments passed by the developer to
the localizer in order to provide localization context (user name, some
numbers, gender etc.)
- l10n-attrs: defines the list of attributes that can be
localized. By default a set of attributes like title, value
or placeholder will be whitelisted, but attributes like style, href
or src will not be unless they are added to this attribute
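
The whitelist behaviour described for l10n-attrs might look like the sketch below. It is not the l20n.js code: the helper names are hypothetical, the default list contains only the three attributes named above, and reading l10n-attrs as a whitespace-separated list is an assumption.

```javascript
// Attributes localizable by default, per the description above.
const DEFAULT_LOCALIZABLE = ['title', 'value', 'placeholder'];

// Assumption: l10n-attrs is a whitespace-separated list of extra attribute names.
function allowedAttributes(node) {
  const extra = (node.getAttribute('l10n-attrs') || '').split(/\s+/).filter(Boolean);
  return new Set([...DEFAULT_LOCALIZABLE, ...extra]);
}

// Apply only the whitelisted attributes from a localized entity and
// return the names that were rejected.
function applyLocalizedAttributes(node, localizedAttrs) {
  const allowed = allowedAttributes(node);
  const rejected = [];
  for (const [name, value] of Object.entries(localizedAttrs)) {
    if (allowed.has(name)) node.setAttribute(name, value);
    else rejected.push(name);
  }
  return rejected;
}
```

The point of the whitelist is that a localization cannot rewrite href, src or style unless the developer has opted in on that specific node.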

We also have a separate concept, called DOM Fragment, which may be
localized as well. If a Node with l10n-id contains a subtree of nodes,
they may be localized together. In that case the HTML code contains only
the structure of the DOM Fragment, with the attributes that are not
localizable (styles, classes, urls etc.), and the localized values are
themselves DOM Fragments that will be matched to the HTML DOM Fragment
structure.

Here we want to allow localizers to reorder nodes within the DOM
Fragment, which is a common request from localizers. If all nodes
within the DOM Fragment have IDs assigned, the localizer will just use
the ID of the node and our algorithm will match the nodes using it. A
common case, though, is that the DOM Fragment contains multiple <strong>,
<p> and <span> elements that have no ID. In that case localizers may
request that an ID be added, or use another attribute - l10n-path - to
point to a node in the HTML structure, for example saying that the first
<span> in their localization is the third <span> in the original HTML
structure.
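
The matching step can be sketched roughly as follows. This is an illustration
only: nodes are plain objects, and the l10n-path syntax is assumed to be a
simplified "tag[n]" form (1-based, XPath-like), which is not necessarily what
L20n uses. matchNodes is a hypothetical name.

```javascript
// `source` is the ordered list of child elements in the original HTML.
// `localized` is the list of elements the localizer wrote, in their order.
// Returns, for each localized element, the source element it maps to
// (or null when no match is found).
function matchNodes(source, localized) {
  return localized.map((loc) => {
    // Preferred: match by ID when the localizer used one.
    if (loc.id) {
      return source.find((n) => n.id === loc.id) || null;
    }
    // Fallback: assumed path form "span[3]" = third <span> in the source.
    if (loc.path) {
      const m = /^(\w+)\[(\d+)\]$/.exec(loc.path);
      if (!m) return null;
      const sameTag = source.filter((n) => n.tag === m[1]);
      return sameTag[Number(m[2]) - 1] || null;
    }
    return null;
  });
}
```

For example, a localized first <span> carrying the path "span[3]" would be
matched to the third <span> of the original structure, while a <strong> with
an ID is matched by that ID regardless of position.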

Since we're all geeks and we speak code, please, review the last three
examples here: http://zbraniecki.github.com/l20n/ :)

>> Based on previous discussion, I think there are two proposals for how
>> the l10n engine should work
>>
>> 1. Offline, an l10n "compiler" takes LOL as input, emits small/fast
>> compiled-l10n.js to translate strings. Web apps load <[something]
>> src="compiled-l10n.js">. This sets up event listeners etc. for l10n
>> magic.
>>
>> 2. Online, web apps load their LOL directly, let's say <link
>> rel="l10n" href="my-l10n.lol">. If we're using a fallback l10n
>> engine, it reads my-l10n.lol and sets up an engine appropriately. If
>> there's native support, all the magic happens in the rendering engine.
>>
>> Is this accurate?
>
> This is what I have understood so far.
>

More or less.

I believe that we will want to use the l20n.js/l20n-xml.js libraries only
for projects that require cross-engine support. For native code we will
support L20n in Gecko and we will only link to .lol resources, which
should be compiled and cached on the fly.

Cheers,
g.

Zbigniew Braniecki

Jan 26, 2012, 6:21:44 PM1/26/12

to Vivien, pi...@mozilla.com, Mozilla B2G mailing list, hsiv...@mozilla.com

> Vivien <mailto:2...@vingtetun.org>
> January 26, 2012 3:12 PM

> On 26/01/2012 06:55, Chris Jones wrote:
>> ----- Original Message -----

>> I'd like to understand the engine a bit better. A fully worked-out
>> example somewhere, showing everything from LOL source to modifying
>> text nodes in the DOM, would help immensely.
>>
>> My general understanding of the system so far is,
>>
>> 0. Apps load <script src="l10n-engine.js"> (or whatever it's
>> called). This is the code that accepts input
>> strings/properties/args/locale and produces localized strings.
>> Eventually this code will live in Gecko.
>> a) l10n-engine sets up some magical DOM event listeners. More on
>> this below.
>>
>> 1. Apps load their own l10n data as <script src="l10n.js"> (or
>> whatever). This may be LOL source, or some compiled IR. (For the
>> purposes of this discussion it doesn't matter, yet.)
>>
>> 2. onload or whatever event fires
>> a) l10n-engine.js callback is invoked
>> b) searches the DOM for elements with special l10n attributes

>> c) for each such node |n|,

>> [attributes of |n|] --> l10n-engine --> localized string
>> --> [some text node of |n|]

>> For the purposes of this discussion, what actually happens in the
>> l10n-engine implementation doesn't matter too much, yet. It can be a
>> black box: string/args/locale in, string out.
>>
>> Is this a reasonable approximation?
>
> It's a reasonable approximation of how the *shim* library acts.
> (Except that |[attributes of |n|] --> l10n-engine --> localized
> string --> [some text node of |n|]| is |[attributes of |n|] -->
> l10n-engine --> localized string --> [some text node of |n| or
> attribute of |n|]|)
>
> afaik in the Gecko version the html parser is supposed to take
> |l10n-id, l10n-args, l10n-path| into account and translate that into
> shadow DOM directly.
>

Yes.

>>
>> If so, I have some questions
>>
>> A. If this is going to become a web standard, it needs a fallback
>> impl for backwards-compat. To decide whether to use the fallback
>> impl, apps need some kind of feature test for "native" l10n support.
>> What should that feature test look like?
> The shim library would use data-l10n-id, data-l10n-path,
> data-l10n-args instead of l10n-id, l10n-path, l10n-args. I assume the
> Gecko implementation would also expose some code to let you translate
> a string from JavaScript directly. (Pike, Gandalf?)

According to Henri the strategy is like this:

for the JS library we go for data-l10n-*, for native support we go for
l10n-*.

That way, native support will not collide with the JS library, as they
will be triggered by different attributes.
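
A rough sketch of what this split means for the JS library: it reacts only to
data-l10n-id and leaves plain l10n-id nodes for a native implementation. The
node objects, the entity map and the function name shimLocalize are stand-ins
for illustration, not the l20n.js API.

```javascript
// Each node is a plain object { attrs: {...}, text: '...' } standing in
// for a DOM element. `entities` is a Map of entity id -> localized string.
// Returns the number of nodes that were localized.
function shimLocalize(nodes, entities) {
  let count = 0;
  for (const node of nodes) {
    // The shim looks at data-l10n-id only; plain l10n-id is ignored here.
    const id = node.attrs['data-l10n-id'];
    if (id === undefined || !entities.has(id)) continue;
    node.text = entities.get(id);
    count += 1;
  }
  return count;
}
```

A page written for the shim would therefore keep working unchanged if the
engine later gained native support for the unprefixed attributes.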

>
>>
>> B. For step (0) above, what DOM event(s) trigger localization? To
>> me (not an expert!), there seem to be a few options
>> i) Listen to DOM-mutation events. Robust, precise, but (as I
>> understand) makes for omgslow in the general case (from web
>> content). For an engine-native l10n impl, this might be performant
>> enough.
>> ii) Pages manually request l10n. Fragile, but pages
>> "pay-as-they-go" for l10n. Ugly in that it leaves open the
>> possibility of "FOUS" (flash of unlocalized strings).
>> iii) Mix of (i) and (ii): localize on "load" event, explicit
>> requests thereafter. Might be an 80% solution, but still fragile in
>> general, and still vulnerable to "FOUS".
>>
> As I have said above I assume the parser should take those attributes
> into account in the native implementation. In some ways LOL (or
> whatever name) is a language like CSS.
>

> afaik for the moment the shim library localizes on the 'load' event of
> the webpage. This leaves the possibility of FOUS and it can also create
> many reflows.

For the JS library we do not have a strategy to deal with dynamically
injected content yet. I'm open to any suggestions :)

Cheers,
g.

Zbigniew Braniecki

Jan 26, 2012, 6:26:47 PM1/26/12

to Chris Jones, pi...@mozilla.com, Vivien, hsiv...@mozilla.com, Mozilla B2G mailing list

> Chris Jones <mailto:cjo...@mozilla.com>
> January 26, 2012 10:46 PM

> ----- Original Message -----
>>
>
> The HTML parser can't be entirely responsible for that, because dynamically added and changed elements need to be (re-)localized. Right?

Yes. Henri is in charge of the strategy for adding L20n support to
HTML5/XUL at the native level so that it catches dynamically injected
content.

> Talk of a "shadow DOM" scares me a bit but I'm not worried about making this fast in gecko, however we do it.

Welcome to the club. :) I'm scared by shadow DOM as much as by the
concept of us using XBL2 once it's there ;)

Anyway, for now the native implementation does the same thing as the JS
lib: it expands attributes and injects the Node value.

Shadow DOM will introduce a new universe of challenges (like explaining
to web devs why they don't see the website text in the source code, and
figuring out the new API for getting to it. For now sicking suggested
something like HTMLElement.l10n.getAttributes and
HTMLElement.l10n.textValue - Yay!).

>
>> afaik for the moment the shim library localizes on the 'load' event
>> of the webpage. This leaves the possibility of FOUS and it can also
>> create many reflows.
>>
>

> I don't understand where the "many reflows" come from (should be just one, right?), but yeah this seems suboptimal.

We are aware of the potential reflow issue, but we decided (we - Henri,
Sicking, Bsmedberg, jst) that we don't want to block the parser with
l10n resource loading, so the native implementation collects localizable
nodes and translates them asynchronously once the resources are loaded.
We don't flicker for some unknown reason. :)
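
The "collect now, translate later" approach can be sketched like this. It is a
simplified, synchronous model over plain objects, not the Gecko code: add()
plays the role of the parser meeting a localizable node, and resourcesLoaded()
the moment the l10n resource arrives. All names are hypothetical.

```javascript
class PendingLocalizer {
  constructor() {
    this.entities = null; // stays null until the resources have loaded
    this.pending = [];    // nodes seen before the resources arrived
  }

  // Called for each localizable node; never waits for resources.
  add(node) {
    if (this.entities) this.translate(node);
    else this.pending.push(node);
  }

  // Called once with a Map of entity id -> localized string.
  resourcesLoaded(entities) {
    this.entities = entities;
    const queued = this.pending;
    this.pending = [];
    queued.forEach((node) => this.translate(node));
  }

  translate(node) {
    if (this.entities.has(node.id)) node.text = this.entities.get(node.id);
  }
}
```

All queued nodes are translated in one pass when the resources arrive, and
nodes added afterwards are translated immediately. Whether that single pass is
visible to the user depends on when the resources load relative to first
paint.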

Cheers,
g.

Chris Jones

Jan 27, 2012, 1:59:56 PM1/27/12

to Zbigniew Braniecki, Vivien, hsiv...@mozilla.com, Boris Zbarsky, pi...@mozilla.com, Olli.Pettay, Mozilla B2G mailing list

----- Original Message -----
> From: "Zbigniew Braniecki" <zbran...@mozilla.com>
> To: "Vivien" <2...@vingtetun.org>
> Cc: pi...@mozilla.com, "Mozilla B2G mailing list" <dev...@lists.mozilla.org>, hsiv...@mozilla.com
> Sent: Thursday, January 26, 2012 3:21:44 PM
> Subject: Re: [b2g] Gaia & Localization
>

> >> B. For step (0) above, what DOM event(s) trigger localization? To
> >> me (not an expert!), there seem to be a few options
> >> i) Listen to DOM-mutation events. Robust, precise, but (as I
> >> understand) makes for omgslow in the general case (from web
> >> content). For an engine-native l10n impl, this might be performant
> >> enough.
> >> ii) Pages manually request l10n. Fragile, but pages
> >> "pay-as-they-go" for l10n. Ugly in that it leaves open the
> >> possibility of "FOUS" (flash of unlocalized strings).
> >> iii) Mix of (i) and (ii): localize on "load" event, explicit
> >> requests thereafter. Might be an 80% solution, but still fragile in
> >> general, and still vulnerable to "FOUS".
> >>
> > As I have said above I assume the parser should take those
> > attributes into account in the native implementation. In some ways
> > LOL (or whatever name) is a language like CSS.
> >
> > afaik for the moment the shim library localizes on the 'load' event
> > of the webpage. This leaves the possibility of FOUS and it can also
> > create many reflows.
>

> For the JS library we do not have a strategy to deal with dynamically
> injected content yet. I'm open to any suggestions :)
>

Another case where we need this is if the user switches locale. We need to re-localize open windows (lazily please!).
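
One way the "lazily" part could look, as a sketch only: switching locale just
records the new value, and a window is re-localized the next time it is shown.
The class, its methods and the localize callback are all assumptions for
illustration. Nothing like this exists in the shim library discussed here.

```javascript
class LazyRelocalizer {
  constructor(locale) {
    this.locale = locale;
    // window -> locale it was last localized for (null = never localized)
    this.windows = new Map();
  }

  register(win) {
    this.windows.set(win, null);
  }

  // Cheap: no window is touched when the user switches locale.
  setLocale(locale) {
    this.locale = locale;
  }

  // Call when a window becomes visible. `localize(win, locale)` does the
  // actual work. Returns true if the window had to be re-localized.
  show(win, localize) {
    if (this.windows.get(win) === this.locale) return false;
    localize(win, this.locale);
    this.windows.set(win, this.locale);
    return true;
  }
}
```

The cost of a locale switch is then paid per window, and only for windows the
user actually returns to.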

Cheers,
Chris

Chris Jones

Jan 27, 2012, 2:01:53 PM1/27/12

to Zbigniew Braniecki, pi...@mozilla.com, Vivien, hsiv...@mozilla.com, Mozilla B2G mailing list

----- Original Message -----
> From: "Zbigniew Braniecki" <zbran...@mozilla.com>

> To: "Chris Jones" <cjo...@mozilla.com>
> Cc: "Vivien" <2...@vingtetun.org>, pi...@mozilla.com, hsiv...@mozilla.com, "Mozilla B2G mailing list"
> <dev...@lists.mozilla.org>
> Sent: Thursday, January 26, 2012 3:26:47 PM
> Subject: Re: [b2g] Gaia & Localization
>

> >> afaik for the moment the shim library localizes on the 'load' event
> >> of the webpage. This leaves the possibility of FOUS and it can also
> >> create many reflows.
> >
> > I don't understand where the "many reflows" come from (should be
> > just one, right?), but yeah this seems suboptimal.
>
> We are aware of the potential reflow issue, but we decided (we - Henri,
> Sicking, Bsmedberg, jst) that we don't want to block the parser with
> l10n resource loading, so the native implementation collects
> localizable nodes and translates them asynchronously once the
> resources are loaded.

That's a fair tradeoff. It's similar to the problem we have with @font-face, and similar heuristics apply.

> We don't flicker for some unknown reason. :)
>

This will be timing dependent.

Cheers,
Chris