DOM overlays: security vs. use-cases

Staś Małolepszy

Sep 13, 2013, 10:45:50 AM
to tools...@lists.mozilla.org
(CC'ing Pascal and Flod, since I'm not sure if you're on the list;
please see my question at the end of the email.)

DOM overlays allow developers to create self-contained entities
(instead of splitting them into multiple parts which would hardcode
a specific order) by using HTML inside of localizable strings.

This gives localizers control over the ordering of child nodes.
However, because of how we currently implement them, localizers are
also free to create new HTML markup, which ends up in the running
application.

In this email I'd like to explore alternative ways of
implementing this feature without sacrificing security.


Authoritative localized DOM
---------------------------

The way we currently implement DOM overlays treats the localized DOM as
the authoritative DOM.

So if the source HTML looks like this:

<p data-l10n-id="para" data-l10n-overlay>
<span class="red"></span>
</p>

And the translation looks like this:

<para """
A paragraph using <strong>new</strong> and
<span>existing</span> DOM nodes.
""">

The result will be this:

<p data-l10n-id="para" data-l10n-overlay>
A paragraph using <strong>new</strong> and
<span class="red">existing</span> DOM nodes.
</p>

(Notice how the span element inherited the class from the source HTML.)

This gives a lot of flexibility to the localizers, but obviously
introduces security concerns. Should we trust the translated DOM?
Should we whitelist tags that can be added, or blacklist ones that we
don't want added? The same goes for attributes.

Our current implementation is also naïve. It wipes out the source DOM
and replaces it with the localized one. This, in turn, creates
problems for third-party libraries and frameworks which rely on
bindings to particular DOM nodes to implement their logic (e.g.
AngularJS, Backbone). (Thanks to Michał for pointing this out in bug
907851).


Authoritative source DOM
------------------------

An alternative would be to treat the source DOM as the authoritative
one. This means that we keep the source DOM and disallow adding new
HTML elements which are not present in it.

This is more in line with what Paul Theriault suggested for Gaia's
templating micro-library which was recently moved into shared/. To
quote Paul:

I propose that we limit usage to setting:

1. Attribute values
2. values in text nodes

If that is possible, then it is very easy to change template.js to
interpolate using DOM methods, for which we no longer need to
sanitize or worry about quoting. I really like this approach since
it means the HTML structure defined in the template can't be
modified by data, so it makes it much easier to review.

https://groups.google.com/d/msg/mozilla.dev.gaia/wQ8JLkYi7EI/2zktQUIzqaMJ

Such an approach would help us avoid innerHTML (which is frowned upon;
see https://bugzil.la/901470) and with a proper implementation we could
even re-use existing nodes so we don't break any third-party bindings
(I'm not sure how hard this would be, however, so I wouldn't make this
a must-have for 1.0).

So the following source HTML:

<p data-l10n-id="para">
<span class="red"></span>
</p>

…plus the following translation:

<para """
A paragraph using <strong>new</strong> and
<span>existing</span> DOM nodes.
""">

…would produce:

<p data-l10n-id="para">
A paragraph using and
<span class="red">existing</span> DOM nodes.
</p>

The strong element is removed completely since it's not found in the
source DOM. Also note that in this scenario we don't need
data-l10n-overlay at all, I think, since the presence of HTML markup
inside the node determines if it accepts HTML in the translation.

(OTOH, we might want to add data-l10n-attrs or maybe data-l10n-allow
which lets developers control which attributes can be overwritten by
the translation.)


DOM re-ordering
---------------

I would still like to allow re-ordering of existing elements:

<p data-l10n-id="para">
<span class="red"></span>
<span class="green"></span>
</p>

<para """
<span data-l10n-path="span[2]">Green is first</span>
<span data-l10n-path="span[1]">Red is second</span>
""">

Output:

<p data-l10n-id="para">
<span class="green">Green is first</span>
<span class="red">Red is second</span>
</p>

My gut feeling is that re-ordering alone covers most of the use-cases
where diverging from the source HTML is needed.
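
A rough sketch of how a data-l10n-path value such as "span[2]" could be
resolved against the source children. This is not L20n's actual code:
resolvePath and reorder are hypothetical names, plain objects stand in
for DOM nodes so the sketch runs outside a browser, and only the simple
"tag[n]" form used in the example above is handled.

```javascript
// Resolve a path like "span[2]" to the matching source child, or null.
function resolvePath(sourceChildren, path) {
  const match = /^([a-z][a-z0-9]*)\[(\d+)\]$/i.exec(path);
  if (!match) return null;
  const [, tag, position] = match;
  // XPath positions are 1-based and count only siblings with the same tag.
  const sameTag = sourceChildren.filter(child => child.tag === tag.toLowerCase());
  return sameTag[Number(position) - 1] || null;
}

// Emit source nodes in the order the translation asks for, taking only
// the text from the translation. Paths that do not resolve are skipped.
function reorder(sourceChildren, translatedChildren) {
  const result = [];
  for (const { path, text } of translatedChildren) {
    const sourceNode = resolvePath(sourceChildren, path);
    // A real implementation would move the existing node, not copy it.
    if (sourceNode) result.push({ ...sourceNode, text });
  }
  return result;
}

const source = [
  { tag: "span", attrs: { class: "red" } },
  { tag: "span", attrs: { class: "green" } },
];
const translated = [
  { path: "span[2]", text: "Green is first" },
  { path: "span[1]", text: "Red is second" },
];
const reordered = reorder(source, translated);
```

Here reordered lists the green span first and the red span second, and
each keeps the class it had in the source.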

So it looks like the authoritative source DOM model might alleviate
some of the security issues of the current solution but still allow us
to provide a useful feature for the localizers. It seems like it would
be a sane first step which is appropriate for 1.0.

In the future, we might benefit from new sanitization APIs, like the
one discussed here:

https://groups.google.com/forum/#!topic/mozilla.dev.webapi/wDFM_T9v7Tc


Is re-ordering enough?
----------------------

Security questions aside, what is the best thing for the localizers?
Pascal and Flod, from your experience with working on Mozilla.org, how
often do localizers need or want to add new HTML markup? How often do
they only re-order the existing elements?

What do other people in the list think about this?

-stas

--
@stas

Staś Małolepszy

Sep 13, 2013, 12:02:18 PM
to tools...@lists.mozilla.org, flo...@mozilla.com, Pascal Chevrel, flod, Francesco
It would actually help if I CC'ed the people I say I CC…

Pascal, Flod--you can find the full thread here:

https://groups.google.com/forum/#!topic/mozilla.tools.l10n/9JvxgTUwIrk

Quoting Staś Małolepszy (2013-09-13 16:45:50)
> (CC'ing Pascal and Flod, since I'm not sure if you're on the list;
> please see my question at the end of the email.)
>
> Security questions aside, what is the best thing for the localizers?
> Pascal and Flod, from your experience with working on Mozilla.org, how
> often do localizers need or want to add new HTML markup? How often do
> they only re-order the existing elements?

--
@stas

Francesco Lodolo

Sep 13, 2013, 12:04:34 PM
> …plus the following translation:
> <para """
> A paragraph using <strong>new</strong> and
> <span>existing</span> DOM nodes.
> """>
>
> …would produce:
>
> <p data-l10n-id="para">
> A paragraph using and
> <span class="red">existing</span> DOM nodes.
> </p>

I personally don't like this solution. Think about Italian: one common rule would be to use italics for foreign words (to be honest, we don't use it very often).

Take for example a string like this: "A completely new look". In Italian it could become "Un <em>look</em> completamente nuovo".

So I'm more in favor of whitelisting some "for sure not dangerous" HTML tags than stripping everything away.

Francesco

Pascal Chevrel

Sep 13, 2013, 12:29:36 PM
to mozilla-t...@lists.mozilla.org
Le 13/09/2013 16:45, Staś Małolepszy a écrit :

> Is re-ordering enough?
> ----------------------
>
> Security questions aside, what is the best thing for the localizers?
> Pascal and Flod, from your experience with working on Mozilla.org, how
> often do localizers need or want to add new HTML markup? How often do
> they only re-order the existing elements?

From my experience:
- It is unusual for localizers to add HTML tags. The most commonly added
tag is <br>, used to fix design issues specific to the locale.
Occasionally they use the <sup> or <em> tags to respect typography, e.g.
in French: M<sup>me</sup> for Madame instead of Mme.
- It is usual for localizers to add HTML entities that are needed for
typography in their language (&nbsp; &thinsp; &hellip; &mdash;...) or
that are needed in bidi contexts when RTL text is mixed with LTR text
(&lrm; &rlm;).

Reordering of tags is not unusual at all as it depends on the grammar.
That said, it's not usual to have several tags in the same sentence; the
most common case is several links, I think.

Localizers may also want to *remove* tags that are not relevant in
their translation, for example:

Please consult the <abbr>FAQ</abbr> on our <a>Website</a>.
Notre <a>site Web</a> vous fournira les réponses à vos questions.

Generally speaking, I'd say that if we can preserve HTML entities added
by localizers, that's already great; if we can whitelist a few tags (and
maybe allow the developer to decide on this list), that would be even
better.

In PHP, there is a function for that, strip_tags(), which is handy.

pascal

Pascal Chevrel

Sep 13, 2013, 12:34:17 PM
to mozilla-t...@lists.mozilla.org
Le 13/09/2013 16:45, Staś Małolepszy a écrit :
> So the following source HTML:
>
> <p data-l10n-id="para">
> <span class="red"></span>
> </p>
>
> …plus the following translation:
>
> <para """
> A paragraph using <strong>new</strong> and
> <span>existing</span> DOM nodes.
> """>
>
> …would produce:
>
> <p data-l10n-id="para">
> A paragraph using and
> <span class="red">existing</span> DOM nodes.
> </p>
>

I don't like that because the sentence loses all meaning. If we have to
sanitize by removing tags, we shouldn't remove the text node, especially
since a tag is often the indicator that the text it contains is important
and/or needs special attention (<a> <em> <strong> <abbr>...).

If the developer can't offer a whitelist of tags, then the text should
be sanitized, not removed:

<p data-l10n-id="para">
A paragraph using new and
<span class="red">existing</span> DOM nodes.
</p>
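
To make the difference between removing and sanitizing concrete, here is
a minimal sketch that unwraps disallowed elements and keeps their text.
The tree shape (strings for text nodes, { tag, children } objects for
elements) and the function names are assumptions made for illustration;
real code would walk DOM nodes instead.

```javascript
// Keep allowed elements, and replace any other element with its children
// so that the text it wrapped survives.
function unwrapDisallowed(nodes, allowedTags) {
  const out = [];
  for (const node of nodes) {
    if (typeof node === "string") {
      out.push(node);
    } else if (allowedTags.has(node.tag)) {
      out.push({ tag: node.tag, children: unwrapDisallowed(node.children, allowedTags) });
    } else {
      // Drop the element itself but keep whatever it contained.
      out.push(...unwrapDisallowed(node.children, allowedTags));
    }
  }
  return out;
}

// Concatenate all text in a list of nodes.
function textOf(nodes) {
  return nodes.map(n => (typeof n === "string" ? n : textOf(n.children))).join("");
}

const translation = [
  "A paragraph using ",
  { tag: "strong", children: ["new"] },
  " and ",
  { tag: "span", children: ["existing"] },
  " DOM nodes.",
];
// Only <span> is present in the source HTML of the example.
const sanitized = unwrapDisallowed(translation, new Set(["span"]));
```

With this input, textOf(sanitized) still reads "A paragraph using new and
existing DOM nodes.", while the strong element itself is gone.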

Pascal

Staś Małolepszy

Sep 13, 2013, 2:04:33 PM
to Pascal Chevrel, mozilla-t...@lists.mozilla.org
Thanks Pascal & Flod, those were just the kind of replies I was hoping
for!

So it looks like from the localizers' POV, we should:

1. allow HTML entities,

2. allow the addition of a few typographical tags, like em and sup,

3. not force the source tags to be in the translation (allow the
removal of tags),

4. allow tag reordering.

Interestingly enough, just ensuring #1 works on the client side will
require us to use innerHTML (or other, similar methods, which may be
faster, like insertAdjacentHTML). Unless we want to have rules to
replace the most common ones, which I wouldn't want us to do.

An alternative would be to encourage localizers to use Unicode instead
of HTML entities.
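
For reference, the entities Pascal listed correspond to the Unicode
characters below. The sketch shows a one-off conversion that a localizer
or a tool might run over a translation file; it is not meant as a set of
runtime rules in L20n, and entitiesToUnicode is a hypothetical name.

```javascript
const ENTITY_TO_UNICODE = new Map([
  ["&nbsp;", "\u00A0"],   // no-break space
  ["&thinsp;", "\u2009"], // thin space
  ["&hellip;", "\u2026"], // horizontal ellipsis
  ["&mdash;", "\u2014"],  // em dash
  ["&lrm;", "\u200E"],    // left-to-right mark
  ["&rlm;", "\u200F"],    // right-to-left mark
]);

// Replace the entities listed above; anything else is left untouched.
function entitiesToUnicode(text) {
  return text.replace(/&[a-z]+;/g, entity =>
    ENTITY_TO_UNICODE.has(entity) ? ENTITY_TO_UNICODE.get(entity) : entity
  );
}
```

Entities outside the table, such as &amp;, pass through unchanged, so
markup-significant characters are not silently altered.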

We'll still need to sanitize the localized DOM. Item #2 looks pretty
straightforward. We can have a default whitelist with the most common
elements and their attributes (e.g. title). So this would be okay:

<foo """
Read the <abbr title="Frequently Asked Questions">FAQ</abbr>.
""">

While this wouldn't:

<foo """
Read the <abbr onclick="alert(1)">FAQ</abbr>.
""">

The developer could add an element to the whitelist by adding it in the
source HTML.
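
A minimal sketch of the attribute check, assuming a default whitelist
that contains only title (the one attribute named above) and assuming
that a disallowed attribute is dropped rather than the whole translation
being rejected. Neither assumption is settled in this thread, and
filterAttributes is a hypothetical name.

```javascript
// Assumed default whitelist for illustration only.
const ALLOWED_ATTRIBUTES = new Set(["title"]);

// Return a copy of the translated attributes with everything that is
// not on the whitelist removed.
function filterAttributes(translatedAttrs, allowed = ALLOWED_ATTRIBUTES) {
  const kept = {};
  for (const [name, value] of Object.entries(translatedAttrs)) {
    // Attribute names are case-insensitive in HTML, so compare in lower case.
    const lower = name.toLowerCase();
    if (allowed.has(lower)) kept[lower] = value;
  }
  return kept;
}

const okAttrs = filterAttributes({ title: "Frequently Asked Questions" });
const strippedAttrs = filterAttributes({ onclick: "alert(1)" });
```

The title on the first abbr element survives, while the onclick handler
on the second one is discarded.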

In order to allow control over attributes, we should re-introduce
data-l10n-attrs and have a sane default value for it: title,
placeholder, accesskey, alt, value (more?). We should think about what
happens to href, src, and cite when they are defined on entities and
on child nodes in an entity.

For instance, href defined on an a element in the source HTML can be
inherited by a corresponding a element in the translation. What
happens if the translation redefines it, though? For instance, the
translation might link to a localized resource instead of the English
one.

I listed #3 and #4 explicitly just to make sure we don't forget about
them.

Does this sound about right?

-stas


--
@stas

Staś Małolepszy

Sep 13, 2013, 2:07:49 PM
to Pascal Chevrel, mozilla-t...@lists.mozilla.org
Quoting Pascal Chevrel (2013-09-13 18:34:17)
> I don't like that because the sentence loses all meaning, if we have
> to sanitize by removing tags, we shouldn't remove the text node.
> Especially since a tag is often the indicator that the text contained
> is important and/or needs special attention (<a> <em> <strong>
> <abbr>...).
>
> If the developer can't offer a whitelist of tags, then the text should
> be sanitized, not removed:
>
> <p data-l10n-id="para">
> A paragraph using new and
> <span class="red">existing</span> DOM nodes.
> </p>

Yes, you're right. I don't know what I was thinking.

-stas

--
@stas

Staś Małolepszy

Sep 19, 2013, 11:29:27 AM
to tools...@lists.mozilla.org
Quoting Staś Małolepszy (2013-09-13 16:45:50)
> In this email I'd like to explore alternative ways of
> implementing this feature without sacrificing security.

I've been thinking for the past three days about DOM overlays in the
context of security, web-compatibility and viability for 1.0. What
follows is a rather long email, but I'll greatly appreciate your input.

I've tried to start with minimal goals and a set of constraints, and
find an implementation which would be a good minimal viable product for
1.0.


Goals
-----

1. Make it possible to localize content in HTML elements and
attributes without forcing developers to split strings into pre-
and post- parts (definitely a bad practice). For instance, it
should be possible for an <a> element to be a part of the
translation.

2. Make it possible for localizers to apply text-level semantics to
the translations and make use of HTML entities. For instance,
it should be possible for a localizer to use a <sup> element in
"M<sup>me</sup>" (an abbreviation of French "Madame").


Constraints
-----------

1. Make the whole system secure and don't trust translations by
default, allowing only a safe set of attributes on HTML elements. For
instance, the localizer should not be able to add an onclick
handler or overwrite the target of href or src without the
developer or the localization engineer knowingly allowing it.

2. Don't break the Web; in particular, until we get more feedback
and gather more data, we should not break any two-way bindings
the third-party libraries might have set up on existing DOM
nodes.


Minimal viable product
----------------------

For 1.0, I suggest that we only support a safe set of text-level
semantical elements and a safe set of translatable attributes.
A detailed specification is provided at the bottom of this email. For
example, if the source HTML looks like this:

<p data-l10n-id="noTagsHere"></p>

The translation can be:

<noTagsHere "M<sup>me</sup> means Madame.">

And the result will be:

<p data-l10n-id="noTagsHere">
M<sup>me</sup> means Madame.
</p>

Note that data-l10n-overlay is not needed at all and we can remove it
from our spec. It's always possible to use the whitelisted elements,
as well as HTML entities.

Additionally, any elements not on the whitelist can be explicitly
allowed by the developer by putting them in the source HTML. Only the
default whitelisted attributes found in the translation will be
preserved. For instance, if the source HTML looks like this:

<p data-l10n-id="aIsAllowed">
<a href="http://mozilla.org"
class="btn-back"
title="Mozilla"></a>
</p>

The translation can be:

<aIsAllowed """
Go back to
<a href="http://myevilwebsite.com"
onclick="alert(1)"
title="Back to the homepage">Mozilla.org</a>
""">

And the result will be:

<p data-l10n-id="aIsAllowed">
Go back to <a href="http://mozilla.org"
class="btn-back"
title="Back to the homepage">Mozilla.org</a>
</p>

The title attribute is on the whitelist and will be overwritten by the
translation. The href and onclick attributes are not on the whitelist
and cannot be overwritten.
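
The inheritance rule for a single element can be sketched as follows.
This is an illustration under assumptions, not the implementation:
overlayAttributes is a hypothetical name, attributes are plain objects
rather than DOM attributes, and the whitelist is reduced to title.

```javascript
// Merge attributes for one element: the source always wins, except for
// whitelisted attributes, which the translation may overwrite or add.
function overlayAttributes(sourceAttrs, translatedAttrs, whitelist) {
  // Start from the source so href, class, etc. always survive.
  const result = { ...sourceAttrs };
  for (const name of Object.keys(translatedAttrs)) {
    if (whitelist.has(name)) result[name] = translatedAttrs[name];
  }
  return result;
}

const merged = overlayAttributes(
  { href: "http://mozilla.org", class: "btn-back", title: "Mozilla" },
  {
    href: "http://myevilwebsite.com",
    onclick: "alert(1)",
    title: "Back to the homepage",
  },
  new Set(["title"])
);
```

In merged, href and class come from the source, title comes from the
translation, and onclick never appears.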

In the future, I suggest we support ITS's translateRules to let
developers specify which elements and which attributes should be added
to the whitelist, globally or locally. See the notes about data
attributes below and http://www.w3.org/TR/its20/#html5-global-approach.

Lastly, for the above inheritance to work, I suggest we remove the
reordering mechanism which uses data-l10n-path. This will be
a limitation of 1.0, but I still think that the advantages of this
solution greatly outweigh it.


Another example
---------------

Source HTML:

<label data-l10n-id="fullName">
<input name="fn">
</label>
<label data-l10n-id="age">
<input name="age" type="number" min="0">
</label>
<input data-l10n-id="submit" type="submit">

L20n translation:

<fullName """
Full name: <input placeholder="First Last">
""">
<age """
Age: <input> <small>(optional)</small>
""">
<submit
value: "Submit">

Result:

<label data-l10n-id="fullName">
Full name: <input name="fn" placeholder="First Last">
</label>
<label data-l10n-id="age">
Age: <input name="age" type="number" min="0">
<small>(optional)</small>
</label>
<input data-l10n-id="submit" type="submit" value="Submit">


Whitelisted elements
--------------------

I based this on the section of the HTML5 WD on text-level semantics:

http://www.w3.org/html/wg/drafts/html/master/text-level-semantics.html#text-level-semantics

Elements which are always allowed in translations:

a, em, strong, small, s, cite, q, dfn, abbr, data, time, code, var,
samp, kbd, sub, sup, i, b, u, mark, ruby, rt, rp, bdi, bdo, span,
br, wbr

Note that it's OK for an a element to appear in the translation because
the href attribute is not on the whitelist (see below).
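
As a sketch, the element check could combine the list above with the
elements found in the source HTML. The function name and the idea of
passing the source tag names as an array are assumptions for
illustration.

```javascript
// Text-level elements that are always allowed in translations,
// copied from the list above.
const WHITELISTED_ELEMENTS = new Set([
  "a", "em", "strong", "small", "s", "cite", "q", "dfn", "abbr", "data",
  "time", "code", "var", "samp", "kbd", "sub", "sup", "i", "b", "u",
  "mark", "ruby", "rt", "rp", "bdi", "bdo", "span", "br", "wbr",
]);

// An element is allowed if it is on the default whitelist or if the
// developer put an element with the same tag name in the source HTML.
// sourceTagNames is expected to hold lower-case tag names.
function isElementAllowed(tagName, sourceTagNames = []) {
  const tag = tagName.toLowerCase();
  return WHITELISTED_ELEMENTS.has(tag) || sourceTagNames.includes(tag);
}
```

For example, isElementAllowed("sup") holds anywhere, isElementAllowed("img")
holds only when the source node contains an img element, and
isElementAllowed("iframe") fails unless the developer added an iframe to
the source.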


Whitelisted attributes
----------------------

Except for global attributes, all other attributes are
context-sensitive. I used the list available in the section of the
HTML5 WD on the "translate" attribute.

http://www.w3.org/html/wg/drafts/html/master/dom.html#attr-translate

The key to reading the annotations is as follows:

( ) (no mark) I suggest adding this attribute to the whitelist.
* I suggest adding it post-1.0.
+ Attribute is not listed in the HTML5 working draft section on
translatable attributes, but I think it would make sense to add it in 1.0.
? Attribute is not listed in the HTML5 working draft, but we might want
to consider adding it after 1.0 (to HTML5 and L20n).


Global attributes:

title
aria-label
+ accesskey
* aria-valuetext
* style
? data-
? dir

Notes:

accesskey is currently not translatable; I filed a bug; if it is
accepted, I'd like to add it to the list of supported attributes.
https://www.w3.org/Bugs/Public/show_bug.cgi?id=23284

aria-valuetext is interesting because it can have multiple text
values depending on the value of the input; I suggest to add it
later.
http://www.w3.org/TR/wai-aria/states_and_properties#aria-valuetext

style must be parsed and recursively processed (e.g. for the values
of 'content' properties); I suggest to add it later.

data-* should not be translatable by default; developers can take
advantage of ITS translateRules, which as of ITS 2.0 can be linked:

<link href="translateRules.xml" rel="its-rules">

or inlined:

<script type=application/its+xml id=ru1>
<its:rules version="2.0" xmlns:its="http://www.w3.org/2005/11/its"
xmlns:h="http://www.w3.org/1999/xhtml">
<its:translateRule selector="@*[starts-with(name(), 'data-')]"
translate="yes"/>
</its:rules>
</script>

See http://www.w3.org/TR/its20/#html5-global-approach. I propose
that we consider adding ITS support after 1.0.

dir attribute is not listed as translatable, either as a global
attribute or when used on the bdi and bdo elements. I'm not sure
if it should. I looked at projects/mozilla.com/trunk/{ar,fa,he} on
svn and neither uses "dir" even once. I tentatively marked it as
"?" for now, which means that we will consider adding support for
it after 1.0.

a, area

download
? href
? hreflang

Notes:

href and hreflang are currently not translatable, but I found an
interesting discussion about making them 'localizable'. This has
been slated for the next version of ITS, though. It looks like it
would be best to raise this topic again with Norbert. See:
http://www.w3.org/International/track/issues/217

To avoid security risks, I suggest we disallow href and hreflang in
translations in 1.0.

q, blockquote

? cite

Notes:

if the localizer translates a quote by choosing a different,
perhaps more culturally relevant quote, the cite attribute should be
changed correspondingly; I think this is rare enough that we can push
it after 1.0. Also see the discussion about 'localizable'
attributes mentioned at a, area.

input

+ value

Notes:

value should only be translatable on inputs with type: button,
submit, reset. Currently, the HTML5 draft doesn't take this into
account. I filed a bug about it:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=23283

menuitem, menu, optgroup, option, track

label

area, img, input

alt

input, textarea

placeholder

th

abbr

meta

* content

Notes:

content attribute should only be translatable if the name attribute
specifies a metadata name whose value is known to be translatable;
I suggest we implement this after 1.0

iframe

* srcdoc

Notes:

srcdoc must be parsed and recursively processed, see:
http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#attr-iframe-srcdoc
I suggest we implement this after 1.0.
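
Put together, the 1.0 part of the list above (entries with no mark or
with "+") could be expressed as a lookup like the one below. This is only
a sketch of the proposal as written in this email: the names are
hypothetical, tag and attribute names are assumed to be lower case, and
the input type is passed in explicitly because no DOM is involved.

```javascript
// Attributes allowed on any element (accesskey is the "+" entry).
const GLOBAL_ATTRIBUTES = ["title", "aria-label", "accesskey"];

// Context-sensitive attributes, keyed by element name.
const ELEMENT_ATTRIBUTES = new Map([
  ["a", ["download"]],
  ["area", ["download", "alt"]],
  ["img", ["alt"]],
  ["input", ["alt", "placeholder", "value"]],
  ["textarea", ["placeholder"]],
  ["menuitem", ["label"]],
  ["menu", ["label"]],
  ["optgroup", ["label"]],
  ["option", ["label"]],
  ["track", ["label"]],
  ["th", ["abbr"]],
]);

// value is only translatable on these input types.
const VALUE_INPUT_TYPES = ["button", "submit", "reset"];

function isAttributeAllowed(tag, attr, inputType) {
  if (GLOBAL_ATTRIBUTES.includes(attr)) return true;
  const perElement = ELEMENT_ATTRIBUTES.get(tag) || [];
  if (!perElement.includes(attr)) return false;
  if (tag === "input" && attr === "value") {
    return VALUE_INPUT_TYPES.includes(inputType);
  }
  return true;
}
```

Under these rules alt is accepted on img while src is not, href is
rejected on a, and value is accepted on a submit input but not on a text
input.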

* * *

Thanks for reading this far,
-stas

--
@stas

Michał Gołębiowski

Sep 19, 2013, 12:35:47 PM
Thanks for the post, Staś! As you know, I was hit by l20n's current implementation replacing whole nodes while overlaying, so preserving the original nodes makes me happy.

I don't understand why data-* attributes are disallowed by default. What's the harm? Since they won't ever be used for anything defined in Web standards, they're just strings so they should be perfectly safe.

Staś Małolepszy

Sep 19, 2013, 1:30:52 PM
to Michał Gołębiowski, tools...@lists.mozilla.org


Michał Gołębiowski pisze:
> I don't understand why data-* attributes are disallowed by default. What's the harm? Since they won't ever be used for anything defined in Web standards, they're just strings so they should be perfectly safe.

My rationale is that if your application uses data-* for storing data that's needed for it to work, then it might be unsafe to let localizers tamper with this data.

Typing this from my mobile, I'll add an example tomorrow.

Staś Małolepszy

Sep 20, 2013, 6:03:50 AM
to Michał Gołębiowski, tools...@lists.mozilla.org
Quoting Michał Gołębiowski (2013-09-19 20:13:04)
> Ah, right, I get it. White-listing the attributes from the app seems
> a great idea then.

I'm not sure if I follow. In my above email, I suggested that
developers could only have some control over the whitelist of elements.
Attributes would always be subject to the global whitelist. So if you
add an <img> element in the source of the node to be localized, the
localizer can place an <img> element in the translation to indicate its
position, but cannot put any attributes on it except for the
whitelisted ones (e.g. alt). Do you think this approach is fine?

Here's another example, using data-* attributes:

Source HTML:

<div data-l10n-id="spaceship" class="spaceship"
data-ship-id="92432" data-weapons="laser 2" data-shields="50%"
data-x="30" data-y="10" data-z="90">
<img src="ship.png">
</div>

Translation:

<spaceship """
<img alt="Image of the spaceship" src="evil.png">
Spaceship<sup>2</sup>
<iframe src="evil.html"></iframe>
"""
dataWeapons: "Ion cannon"
dataShields: "100%"
title: "This is your spaceship"
>

Result:

<div data-l10n-id="spaceship" class="spaceship"
data-ship-id="92432" data-weapons="laser 2" data-shields="50%"
data-x="30" data-y="10" data-z="90"
title="This is your spaceship">
<img src="ship.png" alt="Image of the spaceship">
Spaceship<sup>2</sup>
</div>

As you can see, none of the data-* attributes was overwritten by the
translation. <img> is allowed, because it's present in the source
HTML, but only the alt attribute is taken from the translation. <sup>
is allowed because it's on the whitelist of text-level semantical
elements. <iframe> is not on the whitelist and it's not in the source
HTML, so it's not allowed in the final result.

> Also, a local approach might be safer (and more component-style):
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#html5-its-local-markup

That would be nice, but unfortunately, I don't think it's possible.
According to [1]:

All data categories defined in Section 8: Description of Data
Categories and having local implementation may be used in HTML with
the exception of the Translate, Directionality and Language
Information data categories.

And according to [2], the Translate data category is realized in HTML5
only by the "translate" attribute (which only relates to elements, and
not attributes). The only way you can have control over specific
attributes currently appears to be via translateRules, which are
a global realization of the Translate category.

[1] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#html5-local-attributes
[2] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#list-of-elements-and-attributes

-stas

--
@stas

Michał Gołębiowski

Sep 20, 2013, 9:37:17 AM
Well, I need a way to translate some data-* attributes. I can declare allowed ones somewhere on the client-side if that's needed, but I need to have a way. E.g. I need to be able to translate the `data-original-title` attribute used by Bootstrap for its tooltips. I'm sure there are other reasons for this need; generally, it will be needed everywhere where data-* attributes are used to carry text to be processed and displayed by some jQuery plugin etc.

If that is possible to do somehow, then I'm happy. ;)

Staś Małolepszy

Sep 24, 2013, 4:14:29 AM
to tools...@lists.mozilla.org
Quoting Staś Małolepszy (2013-09-19 17:29:27)
> accesskey is currently not translatable; I filed a bug; if it is
> accepted, I'd like to add it to the list of supported attributes.
> https://www.w3.org/Bugs/Public/show_bug.cgi?id=23284

The HTML spec bug got wontfixed, and the recommended way is to use ITS.

> data-* should not be translatable by default; developers can
> take advantage of ITS translateRules, which as of ITS 2.0 can be
> linked
>
> <link href="translateRules.xml" rel="its-rules">
>
> or inlined:
>
> <script type=application/its+xml id=ru1>
> <its:rules version="2.0" xmlns:its="http://www.w3.org/2005/11/its"
> xmlns:h="http://www.w3.org/1999/xhtml">
> <its:translateRule selector="@*[starts-with(name(), 'data-')]"
> translate="yes"/>
> </its:rules>
> </script>
>
> See http://www.w3.org/TR/its20/#html5-global-approach. I propose
> that we consider adding ITS support after 1.0.

I'm starting to think that maybe, for 1.0, we could support a very
small subset of ITS (translateRule) like in the example above. This
would give developers fine-grained control over what to
translate, which might be needed for data-* attributes (see Michał's
email in this thread for examples), or other attributes like accesskey.

To simplify the start-up logic, only the inline version could be
supported, as another <link> would add an async call in our code.

I'd prefer this approach over any custom extensions like
data-l10n-attrs etc.

> input
>
> + value
>
> Notes:
>
> value should only be translatable on inputs with type: button,
> submit, reset. Currently, the HTML5 draft doesn't take this into
> account. I filed a bug about it:
> https://www.w3.org/Bugs/Public/show_bug.cgi?id=23283

This bug was accepted and fixed, so we can implement 'value' as part of
the spec now.

-stas

--
@stas

Zbigniew Braniecki

Sep 24, 2013, 3:03:06 PM
Great work Stas!

Three notes:

1) Not using l10n-overlay seems like a performance hit.

2) Requiring entities to be listed in the source to whitelist them for translation sounds fishy.

3) Our path attribute looks kind of similar to ITS's selector. Can we either use it or set a plan to use it for L20n.Next? I find this a very important feature of the L20n overlay mechanism.

g.

Staś Małolepszy

Sep 24, 2013, 7:24:28 PM
to Zbigniew Braniecki, tools...@lists.mozilla.org
Quoting Zbigniew Braniecki (2013-09-24 21:03:06)
> Great work Stas!
>
> Three notes:
>
> 1) Not using l10n-overlay seems like a performance hit.

We still need to allow the whitelisted elements even in simple cases,
so I don't think we can avoid some perf hit. We might try to optimize,
e.g. if there aren't any child nodes in the source, but I'd like to
focus on getting things right in the first place.

>
> 2) Requiring entities to be listed in the source to whitelist them
> for translation sounds fishy.

Well, this is really the core of the overlay mechanism. If you have an
<a> element present in the source, localizers can overlay its
whitelisted attributes, so now <a> is effectively whitelisted itself as
well.

> 3) our path attribute looks kind of similar to ITS's selector. Can we
> either use it or set a plan to use it for L20n.Next? I find this
> a very important feature of L20n overlay mechanism.

Right, both are XPath, although differently rooted. What would you
like to use it for, though?

I think it's worth exploring reordering, but not in 1.0. According to
Pascal and Flod, it's not as critical as we used to think.
Adding it *and* satisfying the requirement of not replacing nodes would
add complexity to the implementation.

-stas

--
@stas

Zbigniew Braniecki

Sep 27, 2013, 1:26:41 AM
On Tuesday, September 24, 2013 4:24:28 PM UTC-7, Staś Małolepszy wrote:

> We still need to allow the whitelisted elements even in simple cases,
> so I don't think we can avoid some perf hit. We might try to optimize,
> e.g. if there aren't any child nodes in the source, but I'd like to
> focus on getting things right in the first place.

Wait, what I'm referring to is node.textContent vs. the whole DOM manipulation.

I would prefer to limit rich node localization to explicit cases, most nodes should be localized without node overlaying.
I would be very worried about perf hit otherwise.

> > 2) Requiring entities to be listed in the source to whitelist them
> > for translation sounds fishy.
> Well, this is really the core of the overlay mechanism. If you have an
> <a> element present in the source, localizers can overlay its
> whitelisted attributes, so now <a> is effectively whitelisted itself as
> well.

But in the current approach I can also introduce my own tags within my localization, like <strong> <i> or <em>.


> > 3) our path attribute looks kind of similar to ITS's selector. Can we
> > either use it or set a plan to use it for L20n.Next? I find this
> > a very important feature of L20n overlay mechanism.

> Right, both are XPath, although differently rooted. What would you
> like to use it for, though?
>
> I think it's worth exploring reordering, but not in 1.0. According to
> Pascal and Flod, it's not as critical as we used to think.

Ok, I would think that reordering will be crucial.

Cheers,
g.

Staś Małolepszy

Sep 27, 2013, 3:10:12 AM
to Zbigniew Braniecki, tools...@lists.mozilla.org
Quoting Zbigniew Braniecki (2013-09-27 07:26:41)
> On Tuesday, September 24, 2013 4:24:28 PM UTC-7, Staś Małolepszy
> wrote:
>
> > We still need to allow the whitelisted elements even in simple
> > cases, so I don't think we can avoid some perf hit. We might try
> > to optimize, e.g. if there aren't any child nodes in the source,
> > but I'd like to focus on getting things right in the first place.
>
> Wait, what I'm referring to is node.textContent vs. the whole DOM
> manipulation.
>
> I would prefer to limit rich node localization to explicit cases,
> most nodes should be localized without node overlaying.

So this is what this thread is about :) I don't think it should be the
developer's decision to allow or forbid all HTML in the translation.
Flod and Pascal give good examples, for instance:

HTML: <p data-l10n-id="newLook"></p>
en-US: <newLook "A completely new look">
it: <newLook "Un <em>look</em> completamente nuovo">

I want this to work.

> I would be very worried about perf hit otherwise.

To combat this, I think we can try to introduce heuristics, like using
textContent if there's no "<" character in the translation. I marked
this as a TODO in my WIP in https://bugzil.la/921169.
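
The heuristic could look roughly like the sketch below. The function
names are hypothetical. The check for "&" is an added assumption that is
not part of the idea as stated above: it is included so that
translations containing HTML entities are not written out literally by
textContent.

```javascript
// Decide whether a translation can be assigned with textContent or
// needs the slower, markup-aware overlay path.
function needsMarkupParsing(translation) {
  // "<" signals possible elements; "&" (an assumption here) signals
  // possible entities such as &nbsp;.
  return translation.includes("<") || translation.includes("&");
}

function chooseStrategy(translation) {
  return needsMarkupParsing(translation) ? "overlay" : "textContent";
}
```

With the newLook example, the en-US string would take the textContent
path and the Italian string with <em> would go through the overlay
mechanism.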

> > > 2) Requiring entities to be listed in the source to whitelist
> > > them for translation sounds fishy.
> > Well, this is really the core of the overlay mechanism. If you
> > have an <a> element present in the source, localizers can overlay
> > its whitelisted attributes, so now <a> is effectively whitelisted
> > itself as well.
>
> But in the current approach I can also introduce my own tags within
> my localization, like <strong> <i> or <em>.

On master, you can only do this if the developer remembered to add
data-l10n-overlay to the node in the source. If they didn't, the
localizer can't use their own HTML like <strong>, nor HTML entities like
&nbsp;.

Like I said above, I don't think this should be the developer's decision.

> > > 3) our path attribute looks kind of similar to ITS's selector.
> > > Can we either use it or set a plan to use it for L20n.Next?
> > > I find this a very important feature of L20n overlay mechanism.
>
> > Right, both are XPath, although differently rooted. What would
> > you like to use it for, though?

Interestingly, ITS also allows CSS selectors. I think I recall Pike
preferring those over XPath. For now, I want to remove l10n-path, so
localizers won't need to write those selectors. It's for internal use
(the overlay mechanism uses XPath in my WIP) and for the developers who
wish to use ITS.

-stas

--
@stas