Quoting Staś Małolepszy (2013-09-13 16:45:50)
> In this email I'd like to explore other alternative ways of
> implementing this feature without sacrifices to the security.
I've been thinking for the past three days about DOM overlays in the
context of security, web compatibility and viability for 1.0. What
follows is a rather long email, but I'd greatly appreciate your input.
I've tried to start with minimal goals and a set of constraints, and
find an implementation which would be a good minimal viable product for
1.0.
Goals
-----
1. Make it possible to localize content in HTML elements and
attributes without forcing developers to split strings into pre-
and post- parts (definitely a bad practice). For instance, it
should be possible for an <a> element to be a part of the
translation.
2. Make it possible for localizers to apply text-level semantics to
the translations and make use of HTML entities. For instance,
it should be possible for a localizer to use a <sup> element in
"M<sup>me</sup>" (an abbreviation of the French "Madame").
Constraints
-----------
1. Make the whole system secure: don't trust translations by
default, and allow only a safe set of attributes on HTML elements.
For instance, the localizer should not be able to add an onclick
handler or overwrite the target of href or src without the
developer or the localization engineer knowingly allowing it.
2. Don't break the Web; in particular, until we get more feedback
and gather more data, we should not break any two-way bindings
the third-party libraries might have set up on existing DOM
nodes.
Minimal viable product
----------------------
For 1.0, I suggest that we only support a safe set of text-level
semantic elements and a safe set of translatable attributes.
A detailed specification is provided at the bottom of this email. For
example, if the source HTML looks like this:
<p data-l10n-id="noTagsHere"></p>
The translation can be:
<noTagsHere "M<sup>me</sup> means Madame.">
And the result will be:
<p data-l10n-id="noTagsHere">
M<sup>me</sup> means Madame.
</p>
Note that data-l10n-overlay is not needed at all and we can remove it
from our spec. It's always possible to use the whitelisted elements,
as well as HTML entities.
Additionally, any elements not on the whitelist can be explicitly
allowed by the developer by putting them in the source HTML. Only the
default whitelisted attributes found in the translation will be
preserved. For instance, if the source HTML looks like this:
<p data-l10n-id="aIsAllowed">
<a href="http://mozilla.org"
class="btn-back"
title="Mozilla"></a>
</p>
The translation can be:
<aIsAllowed """
Go back to
<a href="http://myevilwebsite.com"
onclick="alert(1)"
title="Back to the homepage">Mozilla.org</a>
""">
And the result will be:
<p data-l10n-id="aIsAllowed">
Go back to <a href="http://mozilla.org"
class="btn-back"
title="Back to the homepage">Mozilla.org</a>
</p>
The title attribute is on the whitelist and will be overwritten by the
translation. The href and onclick attributes are not on the whitelist
and cannot be overwritten.
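The attribute rule above can be sketched in a few lines. This is only an illustration in Python, not the L20n implementation: the function name merge_attributes and the two-entry whitelist are hypothetical, chosen to mirror the aIsAllowed example.

```python
# Hypothetical subset of the default attribute whitelist (assumption:
# only the global attributes title and aria-label, as proposed below).
TRANSLATABLE_ATTRS = {"title", "aria-label"}


def merge_attributes(source_attrs, translated_attrs, whitelist=TRANSLATABLE_ATTRS):
    """Return the attributes of the overlaid element.

    Every source attribute is kept. An attribute coming from the
    translation is applied only if its name is on the whitelist, so it
    can overwrite title but never href, and it cannot add onclick.
    """
    result = dict(source_attrs)
    for name, value in translated_attrs.items():
        if name in whitelist:
            result[name] = value
    return result


source = {"href": "http://mozilla.org", "class": "btn-back", "title": "Mozilla"}
translation = {
    "href": "http://myevilwebsite.com",
    "onclick": "alert(1)",
    "title": "Back to the homepage",
}
print(merge_attributes(source, translation))
```

With the inputs above, href and class keep their source values, onclick is discarded, and only title takes the translated value.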
In the future, I suggest we support ITS's translateRules to let
developers specify which elements and which attributes should be added
to the whitelist, globally or locally. See the notes about data
attributes below and
http://www.w3.org/TR/its20/#html5-global-approach.
Lastly, for the above inheritance to work, I suggest we remove the
reordering mechanism which uses data-l10n-path. This will be
a limitation of 1.0, but I still think that the advantages of this
solution greatly outweigh it.
Another example
---------------
Source HTML:
<label data-l10n-id="fullName">
<input name="fn">
</label>
<label data-l10n-id="age">
<input name="age" type="number" min="0">
</label>
<input data-l10n-id="submit" type="submit">
L20n translation:
<fullName """
Full name: <input placeholder="First Last">
""">
<age """
Age: <input> <small>(optional)</small>
""">
<submit
value: "Submit">
Result:
<label data-l10n-id="fullName">
Full name: <input name="fn" placeholder="First Last">
</label>
<label data-l10n-id="age">
Age: <input name="age" type="number" min="0">
<small>(optional)</small>
</label>
<input data-l10n-id="submit" type="submit" value="Submit">
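To make the pairing in this example concrete, here is a rough Python sketch of how child elements from the translation could be matched against the source. Several things here are my assumptions rather than part of the proposal: elements are paired by tag name in document order (since data-l10n-path is gone), an unmatched element that is not on the whitelist is dropped, and source elements missing from the translation are not handled. The names overlay_children and allowed_attrs, and the abbreviated whitelists, are hypothetical.

```python
# Abbreviated, hypothetical whitelists for the sake of the example.
ALLOWED_ELEMENTS = {"a", "em", "strong", "small", "sub", "sup", "span"}
GLOBAL_ATTRS = {"title", "aria-label"}
ELEMENT_ATTRS = {"input": {"alt", "placeholder"}, "textarea": {"placeholder"}}


def allowed_attrs(tag):
    """Translatable attributes for a tag: global ones plus tag-specific ones."""
    return GLOBAL_ATTRS | ELEMENT_ATTRS.get(tag, set())


def overlay_children(source_children, translated_children):
    """Overlay translated child elements onto source child elements.

    Both arguments are lists of (tag, attrs) pairs in document order.
    """
    unused = list(source_children)
    result = []
    for tag, attrs in translated_children:
        # Keep only the attributes a localizer is allowed to set.
        safe = {k: v for k, v in attrs.items() if k in allowed_attrs(tag)}
        match = next((child for child in unused if child[0] == tag), None)
        if match is not None:
            # Element allowed by the developer: source attributes win,
            # whitelisted translated attributes are layered on top.
            unused.remove(match)
            result.append((tag, {**match[1], **safe}))
        elif tag in ALLOWED_ELEMENTS:
            # Whitelisted text-level element introduced by the localizer.
            result.append((tag, safe))
        # Anything else is dropped (an assumption of this sketch).
    return result


age_source = [("input", {"name": "age", "type": "number", "min": "0"})]
age_translation = [("input", {}), ("small", {})]
print(overlay_children(age_source, age_translation))
```

For the age entity this keeps the source input with all of its attributes and lets the localizer's small element through unchanged.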
Whitelisted elements
--------------------
I based this on the section of the HTML5 WD on text-level semantics:
http://www.w3.org/html/wg/drafts/html/master/text-level-semantics.html#text-level-semantics
Elements which are always allowed in translations:
a, em, strong, small, s, cite, q, dfn, abbr, data, time, code, var,
samp, kbd, sub, sup, i, b, u, mark, ruby, rt, rp, bdi, bdo, span,
br, wbr
Note that it's OK for an a element to appear in the translation because
the href attribute is not on the whitelist (see below).
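As a rough illustration of how such an element whitelist might be enforced on a translation string, here is a sketch using Python's standard html.parser. It is not how L20n does it; the class name TranslationSanitizer, the function sanitize and the choice to keep the text of dropped elements are all assumptions of mine. Only the two unmarked global attributes are allowed here, which is why an a element in a translation loses its href.

```python
from html import escape
from html.parser import HTMLParser

# Element whitelist as listed above.
ALLOWED_ELEMENTS = {
    "a", "em", "strong", "small", "s", "cite", "q", "dfn", "abbr", "data",
    "time", "code", "var", "samp", "kbd", "sub", "sup", "i", "b", "u",
    "mark", "ruby", "rt", "rp", "bdi", "bdo", "span", "br", "wbr",
}
# Assumption: only the unmarked global attributes are allowed here.
ALLOWED_ATTRS = {"title", "aria-label"}
VOID_ELEMENTS = {"br", "wbr"}


class TranslationSanitizer(HTMLParser):
    """Re-serialize a translation, keeping only whitelisted markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_ELEMENTS:
            return  # drop the tag itself; its text content is kept
        kept = []
        for name, value in attrs:
            if name in ALLOWED_ATTRS:
                safe_value = escape(value or "", quote=True)
                kept.append(f' {name}="{safe_value}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_ELEMENTS and tag not in VOID_ELEMENTS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Entities were decoded by the parser; re-escape them on output.
        self.out.append(escape(data, quote=False))


def sanitize(translation):
    parser = TranslationSanitizer()
    parser.feed(translation)
    parser.close()
    return "".join(parser.out)


print(sanitize('Go <a href="http://myevilwebsite.com" onclick="alert(1)" '
               'title="Home">back</a>'))
```

The a element survives, but only its title attribute does; href and onclick never reach the DOM.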
Whitelisted attributes
----------------------
Except for global attributes, all other attributes are
context-sensitive. I used the list available in the section of the
HTML5 WD on the "translate" attribute.
http://www.w3.org/html/wg/drafts/html/master/dom.html#attr-translate
The key to reading the annotations is as follows:
(no mark) I suggest adding this attribute to the whitelist.
* I suggest adding it post-1.0.
+ The attribute is not listed in the HTML5 working draft section on
translatable attributes, but I think it would make sense to add it
in 1.0.
? The attribute is not listed in the HTML5 working draft, but we might
want to consider adding it after 1.0 (to HTML5 and L20n).
Global attributes:
title
aria-label
+ accesskey
* aria-valuetext
* style
? data-
? dir
Notes:
accesskey is currently not translatable; I filed a bug; if it is
accepted, I'd like to add it to the list of supported attributes.
https://www.w3.org/Bugs/Public/show_bug.cgi?id=23284
aria-valuetext is interesting because it can have multiple text
values depending on the value of the input; I suggest adding it
later.
http://www.w3.org/TR/wai-aria/states_and_properties#aria-valuetext
style must be parsed and recursively processed (e.g. for the values
of 'content' properties); I suggest adding it later.
data-* should not be translatable by default; developers can take
advantage of ITS translateRules, which as of ITS 2.0 can be linked:
<link href="translateRules.xml" rel="its-rules">
or inlined:
<script type=application/its+xml id=ru1>
<its:rules version="2.0" xmlns:its="http://www.w3.org/2005/11/its"
xmlns:h="http://www.w3.org/1999/xhtml">
<its:translateRule selector="@*[starts-with(name(), 'data-')]"
translate="yes"/>
</its:rules>
</script>
See http://www.w3.org/TR/its20/#html5-global-approach. I propose
that we consider adding ITS support after 1.0.
The dir attribute is not listed as translatable, either as a global
attribute or when used on the bdi and bdo elements. I'm not sure
if it should be. I looked at projects/mozilla.com/trunk/{ar,fa,he}
on svn and none of them uses "dir" even once. I tentatively marked
it as "?" for now, which means that we will consider adding support
for it after 1.0.
a, area
download
? href
? hreflang
Notes:
href and hreflang are currently not translatable, but I found an
interesting discussion about making them 'localizable'. This has
been slated for the next version of ITS, though. It looks like it
would be best to raise this topic again with Norbert. See:
http://www.w3.org/International/track/issues/217
To avoid security risks, I suggest we disallow href and hreflang in
translations in 1.0.
q, blockquote
? cite
Notes:
If the localizer translates a quote by choosing a different,
perhaps more culturally relevant quote, the cite attribute should be
changed correspondingly; I think this is rare enough that we can
push it to after 1.0. Also see the discussion about 'localizable'
attributes mentioned at a, area.
input
+ value
Notes:
value should only be translatable on inputs with type: button,
submit, reset. Currently, the HTML5 draft doesn't take this into
account. I filed a bug about it:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=23283
menuitem, menu, optgroup, option, track
label
area, img, input
alt
input, textarea
placeholder
th
abbr
meta
* content
Notes:
The content attribute should only be translatable if the name
attribute specifies a metadata name whose value is known to be
translatable; I suggest we implement this after 1.0.
iframe
* srcdoc
Notes:
srcdoc must be parsed and recursively processed, see:
http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#attr-iframe-srcdoc
I suggest we implement this after 1.0.
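For reference, the 1.0 part of the attribute list above (unmarked and "+" entries only) could be encoded as a lookup along these lines. This is a Python sketch under my own naming; is_translatable, GLOBAL_ATTRS, ELEMENT_ATTRS and BUTTON_TYPES are hypothetical, and including the "+" entries assumes the two W3C bugs are resolved favorably.

```python
# Unmarked and "+" global attributes from the list above.
GLOBAL_ATTRS = {"title", "aria-label", "accesskey"}

# Context-sensitive attributes, keyed by element name. Entries marked
# "*" or "?" (href, hreflang, cite, content, srcdoc, ...) are left out.
ELEMENT_ATTRS = {
    "a": {"download"},
    "area": {"download", "alt"},
    "img": {"alt"},
    "input": {"alt", "placeholder"},  # value is handled separately
    "textarea": {"placeholder"},
    "menuitem": {"label"},
    "menu": {"label"},
    "optgroup": {"label"},
    "option": {"label"},
    "track": {"label"},
    "th": {"abbr"},
}

# value is only translatable on these input types.
BUTTON_TYPES = {"button", "submit", "reset"}


def is_translatable(tag, attr, element_attrs=None):
    """Decide whether a translation may set attr on a tag element.

    element_attrs holds the attributes of the source element; it is
    needed for the input/value rule, which depends on the type.
    """
    element_attrs = element_attrs or {}
    if attr in GLOBAL_ATTRS:
        return True
    if tag == "input" and attr == "value":
        return element_attrs.get("type", "").lower() in BUTTON_TYPES
    return attr in ELEMENT_ATTRS.get(tag, set())


print(is_translatable("input", "value", {"type": "submit"}))
print(is_translatable("input", "value", {"type": "text"}))
print(is_translatable("a", "href"))
```

Keeping the table as plain data should make it easy to extend later, for example from ITS translateRules, without touching the lookup itself.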
* * *
Thanks for reading this far,
-stas
--
@stas