Social contract in the L20n world

Zibi Braniecki

Aug 25, 2015, 3:11:24 PM

to mozilla-t...@lists.mozilla.org

Hi all,

In https://bugzilla.mozilla.org/show_bug.cgi?id=1193059 we encountered a difference in perspective between our current way of thinking and the L20n approach to handling changes to en-US strings, which I think is worth discussing.

Historically, the en-US entity has been the reference point for translated entities, and we have forced an entity ID change whenever the en-US string changed.

The canonical example goes like this:

1) Entity <foo "Hello World"> in en-US gets changed to <foo2 "Good morning World">
2) Localizers see the obsolete entity "foo" and the new entity "foo2", and they know they need to translate the new one and remove the old one
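In this model the entity ID is the only change signal a tool has. A minimal Python sketch of that check, assuming each en-US snapshot is a plain dict from entity ID to string (`diff_ids` is a hypothetical helper, not the actual compare-locales implementation):

```python
def diff_ids(old: dict, new: dict) -> tuple:
    """Return (obsolete, added) entity IDs between two en-US snapshots."""
    obsolete = set(old) - set(new)  # localizers should remove these
    added = set(new) - set(old)     # localizers must translate these
    return obsolete, added


old_en = {"foo": "Hello World"}
new_en = {"foo2": "Good morning World"}
obsolete, added = diff_ids(old_en, new_en)
print(sorted(obsolete), sorted(added))  # ['foo'] ['foo2']
```

Since only IDs are compared, a copy change that keeps its ID produces no signal at all, which is why the ID had to change.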

It brought us to an extreme because we needed to be cautious, and there were cases where non-semantic changes to en-US triggered ID changes. Example:

1) Entity <foo "Whats up {{$user}}"> in en-US gets a spelling fix to <foo "What's up {{$user}}"> and a comma added
2) We demand an ID update.

I'd argue that even in the .properties model we should *not* change the ID in this scenario, because the semantic meaning of the entity <foo> remains the same and all existing translations remain valid without any changes.

Another example is the bug linked above, where we switched to using a placeable in en-US but the value of the entity didn't change.

Now, in the L20n world that of course goes *much* further. One of the paradigm changes we bring with L20n is breaking the 1-1 coupling between en-US and localizations, and one aspect of that is that en-US may have a different form than the localizations.
Example:

1) An en-US string is added:

<foo "There are {{$number}} messages">

2) Localizations translate the entity <foo>

3) The en-US copy is updated to read better for the "zero" case:

<foo[$number == 0 ? 'zero' : 'other'] {
zero: "There are no new messages",
other: "There are {{$number}} messages"
}>
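To illustrate that both shapes can serve the same call site, here is a rough Python sketch. It assumes a simplified in-memory model, not L20n's real runtime: a value is either a plain string or a `(selector, variants)` pair, and `{{$name}}` placeables are filled by plain string replacement. The German string is a hypothetical translation.

```python
def interpolate(template: str, args: dict) -> str:
    """Replace each {{$name}} placeable with the matching caller argument."""
    for name, value in args.items():
        template = template.replace("{{$" + name + "}}", str(value))
    return template


def format_entity(entity, args: dict) -> str:
    """Format an entity that is either a plain string or a
    (selector, variants) pair, where selector picks a variant key."""
    if isinstance(entity, str):
        return interpolate(entity, args)
    selector, variants = entity
    return interpolate(variants[selector(args)], args)


# en-US after step 3: a hash with a special "zero" variant.
en_us_foo = (
    lambda args: "zero" if args["number"] == 0 else "other",
    {"zero": "There are no new messages",
     "other": "There are {{$number}} messages"},
)
# A localization that kept a single string still satisfies the same call.
de_foo = "Es gibt {{$number}} Nachrichten"

for number in (0, 3):
    print(format_entity(en_us_foo, {"number": number}),
          "|", format_entity(de_foo, {"number": number}))
```

The calling code passes the same ID and the same `number` argument either way; only the locale's own data decides whether a zero variant exists.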

I believe that in the L20n paradigm this should *not* warrant an entity ID change.

The rationale is this:

An entity with the ID "foo" is a social contract between the developer and the localizer. The developer is saying: "foo" will be displayed in place X in the UI, I will provide you a variable "$number" which will be an integer, and you should provide me a string that says how many new messages there are.

In some languages "There are {{$number}} messages" may work well for all cases; in others, a special case for zero may look better.

en-US is becoming just one of the languages, and we should bind the social contract not to the en-US copy, but to the semantic meaning and location of the entity.

In the same way that en-US may be a hash with a plural index while de is a plain string, en-US may also change from a string to a hash or back, and that shouldn't trigger automatic invalidation of translations.
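One way to make the contract explicit is to record it separately from any locale's copy and validate every locale, en-US included, against it. A minimal Python sketch; the `Contract` record, its fields and the location string are hypothetical, and only `{{$name}}` placeables are recognized:

```python
import re
from dataclasses import dataclass

# Matches placeables such as {{$number}} and captures the variable name.
PLACEABLE = re.compile(r"\{\{\s*\$(\w+)\s*\}\}")


@dataclass(frozen=True)
class Contract:
    """Hypothetical record of what the developer promises for one entity."""
    entity_id: str
    location: str          # where in the UI the string is displayed
    variables: frozenset   # variables the code will pass in


def used_variables(value) -> set:
    """Collect $variables used by a plain string or by any variant of a hash."""
    texts = [value] if isinstance(value, str) else list(value.values())
    return {name for text in texts for name in PLACEABLE.findall(text)}


def honors(contract: Contract, value) -> bool:
    # Only the variables are checked: wording and shape (string vs. hash)
    # are left to each locale, en-US included.
    return used_variables(value) <= contract.variables


foo = Contract("foo", "message list header", frozenset({"number"}))
print(honors(foo, "There are {{$number}} messages"))              # True
print(honors(foo, {"zero": "There are no new messages",
                   "other": "There are {{$number}} messages"}))   # True
print(honors(foo, "{{$user}}, you have {{$number}} messages"))    # False
```

Switching en-US from a string to a hash leaves `foo` unchanged, so nothing in this check invalidates existing translations.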

What do you guys think?
zb.

Francesco Lodolo [:flod]

Aug 25, 2015, 3:43:40 PM

> It brought us to an extreme because we needed to be cautious, and there
> were cases where non-semantic changes to en-US triggered ID changes. Example:
>
> 1) Entity <foo "Whats up {{$user}}"> in en-US gets a spelling fix to
> <foo "What's up {{$user}}"> and a comma added
> 2) We demand an ID update.
>

To clarify: we do *not* ask to change ID for this kind of modifications.

Typos, internal consistency matters like punctuation or case, and other
minor changes are explicitly called out as exceptions to that rule.
https://developer.mozilla.org/en-US/docs/Mozilla/Localization/Localization_content_best_practices#Changing_existing_strings

I won't bug developers who do change the ID in these cases (at most I
might point out that it wasn't needed), while I will if the string changes
without a new ID and the change is not trivial.

> I believe that in the L20n paradigm this should *not* warrant an entity ID change.
>

I agree, in the L20n world. The flexibility that L20n brings to the table
comes at a price: localizers can break things in a ton of new creative
ways, and it's up to tools and localizers to be able to deal with that. But
we're not there yet.

In an L20n scenario, the only change that would require a new ID would be
the removal of support for $number.
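Expressed as a check, that rule compares the variables the code provides before and after a change. A hypothetical Python helper, assuming the set of provided variables is known for each version of the entity:

```python
def needs_new_id(old_vars: set, new_vars: set) -> bool:
    """Hypothetical rule: a new ID is required only when the code stops
    providing a variable that existing translations may rely on.
    Rewording en-US, or turning a string into a hash, changes neither set."""
    return bool(set(old_vars) - set(new_vars))


print(needs_new_id({"number"}, {"number"}))          # False: copy-only change
print(needs_new_id({"number"}, {"number", "user"}))  # False: new variable offered
print(needs_new_id({"number"}, set()))               # True: $number dropped
```

Adding a variable is treated as harmless here because existing translations simply don't use it.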

> en-US is becoming one of the languages and we should not bind the social
> contract to the en-US copy, but to the semantic meaning and location of the
> entity.
>
>

We need to agree on "semantic meaning". Sometimes English strings,
especially in Gaia, are so poorly worded that we ask for a new ID to
make sure localizers are aware of the change (the meaning was there, just
explained terribly), so they can decide whether they want to apply similar
changes. I think that's still a valid reason to change the ID.

Francesco

Axel Hecht

Aug 25, 2015, 4:28:17 PM

What flod said.

Axel

Robert Kaiser

Aug 26, 2015, 3:18:46 PM

Zibi Braniecki wrote:

> I believe that in the L20n paradigm this should *not* warrant an entity ID change.

For me, the most important rule has always been:

We change the ID for every change that needs attention from localizers.
If we don't change IDs, tools will assume everything is fine and
localizers will not look at or change this string.

Now, either we have tools that alert localizers of changes that may be
interesting to them to react to (if en-US makes the "zero" case better,
a number of other localizations might want to do the same) or we
probably still need to follow a similar rule set.

So, what do we have on the tools side that changes the game
significantly there?

KaiRo

Zibi Braniecki

Aug 28, 2015, 11:32:52 PM

On Tuesday, August 25, 2015 at 12:43:40 PM UTC-7, Francesco Lodolo [:flod] wrote:
> Typos, internal consistency matters like punctuation or case, and other
> minor changes are explicitly called out as exceptions to that rule.
> https://developer.mozilla.org/en-US/docs/Mozilla/Localization/Localization_content_best_practices#Changing_existing_strings
>
> I won't bug developers who do change the ID in these cases (at most I
> might point out that it wasn't needed), while I will if the string changes
> without a new ID and the change is not trivial.

I remember multiple cases where we asked developers to change the ID despite the fact that the change was trivial.

The rationale then was that "since the en-US localization was buggy, maybe some locales followed it, so it's better to update the ID so that localizers can update their translations".

IMHO that's the trap we're falling into.

> But we're not there yet.

We're actively moving toward that with every aspect of the ecosystem. I started this thread to catch up in this regard.

> In an L20n scenario, the only change that would require a new ID would be
> the removal of support for $number.

Agree.

> We need to agree on "semantic meaning". Sometimes English strings,
> especially in Gaia, are so poorly worded that we ask for a new ID to
> make sure localizers are aware of the change (the meaning was there, just
> explained terribly), so they can decide whether they want to apply similar
> changes. I think that's still a valid reason to change the ID.

I remember there was a time when we considered asking developers to provide the source copy in what is called Basic English[1].

Then we would create a proper en-US based on it, and all localizations would base their translations on the Basic English, not on en-US.

Maybe it's time to revisit that and bind the social contract to the Basic English?

zb.

[1] https://en.wikipedia.org/wiki/Basic_English

Zibi Braniecki

Aug 28, 2015, 11:35:47 PM

On Wednesday, August 26, 2015 at 12:18:46 PM UTC-7, Robert Kaiser wrote:
> Now, either we have tools that alert localizers of changes that may be
> interesting to them to react to (if en-US makes the "zero" case better,
> a number of other localizations might want to do the same) or we
> probably still need to follow a similar rule set.

That's precisely the trap that I'm trying to avoid. "Maybe locales will want to improve their translations when they see that en-US improves" is a fallacy in the world where we break 1-1 matching.

The string has one meaning. en-US can work on it the way it wants, and as long as the meaning, the variables and the placement stay the same, other locales should work on theirs independently.

It's almost like saying "we should alert localizers when the German version of that string changes, because maybe German localizers fine-tuned it in a way that others might want to copy".

I believe we should not. Neither for de, nor for en-US. I'm asking to deprioritize en-US and stop assuming that every typo en-US has will be followed to the letter by localizers, because it shouldn't be.

zb.

Ricardo Palomares Martínez

Aug 30, 2015, 3:42:36 PM

[Resending, as I mistakenly sent it just to Zibi]

On 29/08/15 at 05:35, Zibi Braniecki wrote:

> On Wednesday, August 26, 2015 at 12:18:46 PM UTC-7, Robert Kaiser wrote:

>> Now, either we have tools that alert localizers of changes that may be
>> interesting to them to react to (if en-US makes the "zero" case better,
>> a number of other localizations might want to do the same) or we
>> probably still need to follow a similar rule set.
>

> That's precisely the trap that I'm trying to avoid. "Maybe locales will want to improve their translations when they see that en-US improves" is a fallacy in the world where we break 1-1 matching.

This is a pretty interesting problem. I plan to support L20n some time
in the future in the tool I'm writing to replace MozillaTranslator,
and breaking the 1:1 relation (not between IDs, but between indexes)
requires that tools check every string in the whole translation file
set for ID+index combinations used as variables, and then look up the
ID to see if the index is defined.
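A rough sketch of that scan in Python. The reference syntax `{{ entityId.variantKey }}` is an assumption for illustration, not necessarily the exact L20n grammar, and the translation is modeled as a dict whose values are strings or dicts of variants:

```python
import re

# Hypothetical reference syntax: {{ entityId.variantKey }}
REF = re.compile(r"\{\{\s*(\w+)\.(\w+)\s*\}\}")


def texts_of(value) -> list:
    """All strings held by an entity: the string itself or every variant."""
    return [value] if isinstance(value, str) else list(value.values())


def broken_refs(entities: dict) -> list:
    """Return (referring_id, target_id, key) for every ID+index reference whose
    target entity is missing or does not define that variant key."""
    problems = []
    for entity_id, value in entities.items():
        for text in texts_of(value):
            for target, key in REF.findall(text):
                target_value = entities.get(target)
                if not isinstance(target_value, dict) or key not in target_value:
                    problems.append((entity_id, target, key))
    return problems


localized = {
    "brandName": {"short": "Fx", "long": "Firefox"},
    "title": "About {{ brandName.long }}",
    "tip": "Get {{ brandName.full }}",  # 'full' is not defined
}
print(broken_refs(localized))  # [('tip', 'brandName', 'full')]
```

Because every entity is scanned, an ID change in a widely referenced entity surfaces as many broken references at once, which is the cost described next.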

If you change the ID in en-US (which, Zibi, can't be disregarded as
the reference for localizers), then you may be forcing localizers to
change A LOT of different strings, not just the ID-changed L20n object
itself. Do that in a shared module (dom, netwerk, security, toolkit)
and you're requiring localizers to review the entire translation.

In contrast, if the tool is able to "see" that an en-US string's content
has changed, it can present it to the localizer, and he/she will be
able to decide whether that involves deeper changes in the localized ID and
its usages across the whole translation.
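One way a tool could catch copy changes without relying on IDs is to store, per translated entity, a fingerprint of the en-US value the translation was based on. A minimal Python sketch; the metadata layout and function names are hypothetical:

```python
import hashlib
import json


def fingerprint(value) -> str:
    """Stable SHA-256 of an en-US value (a string or a dict of variants)."""
    canonical = json.dumps(value, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_review(current_en: dict, translated_from: dict) -> list:
    """IDs whose en-US value changed since the translation was made.

    translated_from maps entity ID -> fingerprint of the en-US value that the
    existing translation was based on (hypothetical per-locale metadata).
    """
    return sorted(
        entity_id
        for entity_id, old_fp in translated_from.items()
        if entity_id in current_en and fingerprint(current_en[entity_id]) != old_fp
    )


old_en = {"foo": "There are {{$number}} messages", "bar": "Hello"}
metadata = {k: fingerprint(v) for k, v in old_en.items()}
new_en = {
    "foo": {"zero": "There are no new messages",
            "other": "There are {{$number}} messages"},
    "bar": "Hello",
}
print(needs_review(new_en, metadata))  # ['foo']
```

This is close in spirit to gettext's "fuzzy" flag: the localizer gets a prompt to look, while the ID and the existing translation stay in place.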

> It's almost like saying "we should alert localizers when the German
> version of that string changes, because maybe German localizers
> fine-tuned it in a way that others might want to copy".

It is not the same, because the German localizers are tuning just for
their own language. An en-US "tune" may or may not have an impact on the
rest of the localizations; it could even have an impact on some of them
while not being relevant for others.

> I believe we should not. Neither for de, nor for en-US. I'm asking to
> deprioritize en-US and stop assuming that every typo en-US has will
> be followed to the letter by localizers, because it shouldn't be.

Unless you find a way to get tools to distinguish whether a change in the
en-US strings is due to a typo or to a semantic change, you can't
"deprioritize" en-US, because that's what all localizations follow.
en-US IS the reference for the rest of the localizations.

But I'm not contradicting myself. en-US shouldn't change IDs even for
semantic changes, but tools (starting with the L10n dashboard) must be
able to catch string changes, added and removed indexes and custom data,
etc., and warn localizers about such changes, all without having to
change the ID.

JM2C

--
Proyecto NAVE
Mozilla Localization Project, es-ES Team
http://www.proyectonave.es/
Diaspora: rick...@diasp.eu

Axel Hecht

Aug 31, 2015, 9:07:55 AM

I think it's important to clarify that you're wrong.

Typos in en-US do not warrant ID changes, and we need to make sure to
not say that. People get even more confused.

That said.

In theory, we need semantic versioning per string. The semantics of
these need to be defined in two contexts:

- programs, or other localized assets

- tools to localize

Programs:
The localized asset has basically two choices: Use the localized string,
or use something else.
In the old world, that question has to be answered strictly at build time.
In the L20n world, that question can be answered at run time, but also
optimized at build time.
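A sketch of the run-time choice in Python, assuming a fallback chain of `(locale, bundle)` pairs and a caller-supplied validity check; both the chain layout and the `{{$var}}` check are assumptions for illustration:

```python
import re

PLACEABLE = re.compile(r"\{\{\$(\w+)\}\}")


def resolve(entity_id, chain, is_usable):
    """Return the first (locale, value) in the fallback chain whose value
    exists and passes is_usable, or None if no locale qualifies."""
    for locale, bundle in chain:
        value = bundle.get(entity_id)
        if value is not None and is_usable(value):
            return locale, value
    return None  # the caller decides what to show instead


def uses_only(provided):
    """Build a check that accepts strings using only the provided variables."""
    def check(value):
        return set(PLACEABLE.findall(value)) <= provided
    return check


chain = [
    ("de", {"foo": "{{$user}}: {{$number}} Nachrichten"}),  # relies on $user
    ("en-US", {"foo": "There are {{$number}} messages"}),
]
print(resolve("foo", chain, uses_only({"number"})))
```

Here the German string is skipped because the code no longer provides `$user`, and en-US is used instead; a build step could run the same check ahead of time.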

Tools:
IIRC, Pootle does mark strings fuzzy if the English copy changes. As it has
the text in there verbatim, that's one way of "versioning".
For the source-code guys, depending on how they do it, they might also
get "fuzzy" notifications. E.g., if you step through patches affecting
locales/en-US, you will hit copy changes, which you can examine via diff.

I see one main reason to change IDs:

- previously existing translations break, in particular in ways that
compare-locales doesn't catch.

That's "yeah, we just word this completely differently now", too.

And "completely differently" is just a human decision, I'm afraid.

Fixing English grammar, on the other hand, shouldn't trigger a change in
localizers' heads at all, and it's kinda unfortunate that po-based
workflows probably do.

Axel

zbran...@mozilla.com

Sep 28, 2015, 4:20:05 PM

On Monday, August 31, 2015 at 6:07:55 AM UTC-7, Axel Hecht wrote:
> I think it's important to clarify that you're wrong.

(...)

> Typos in en-US do not warrant ID changes, and we need to make sure to
> not say that. People get even more confused.

I am getting confused because I fix a typo in en-US and get scolded for not updating the string ID! :)
(wink, wink :flod! ;))

>
> That said.
>
>
> In theory, we need semantic versioning per string. The semantics of
> these need to be defined in two contexts:
>
> - programs, or other localized assets
>
> - tools to localize

agree.

>
> Programs:
> The localized asset has basically two choices: Use the localized string,
> or use something else.
> In the old world, that question has to be answered strictly at build time.
> In the L20n world, that question can be answered at run time, but also
> optimized at build time.

agree.

> I see one main reason to change IDs:
>
> - previously existing translations break, in particular in ways that
> compare-locales doesn't catch.
>
> That's "yeah, we just word this completely differently now", too.
>
> And "completely differently" is just a human decision, I'm afraid.
>
> Fixing English grammar, on the other hand, shouldn't trigger a change in
> localizers' heads at all, and it's kinda unfortunate that po-based
> workflows probably do.

agree.

I would love for us to get back to the conversation about semantic versioning of strings. I still don't know whether it should be a docstring, a separate metadata file or a three-way diff, but I see huge value in moving from a binary world of "present"/"missing" to a world with a full gradient: from "you need to translate this string", through "you may want to check this string, but it's OK if you don't", to "that string is totally OK despite a comma being fixed in en-US".
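A minimal Python sketch of how per-string semantic versions could map onto that gradient. The `MAJOR.MINOR.PATCH` convention here (major = contract broken, minor = meaning reworded, patch = typo or punctuation) is a hypothetical choice, not an agreed format:

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    TRANSLATE = "you need to translate this string"
    CHECK = "you may want to check this string, but it's OK if you don't"
    OK = "that string is totally OK"


def parse(version: str) -> tuple:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def action_for(translated_against: Optional[str], current: str) -> Action:
    """Compare the version a translation was made against with the current
    en-US version of the same string (hypothetical convention:
    MAJOR = contract broken, MINOR = meaning reworded, PATCH = typo fix)."""
    if translated_against is None:  # never translated: "missing"
        return Action.TRANSLATE
    old, new = parse(translated_against), parse(current)
    if new[0] != old[0]:
        return Action.TRANSLATE
    if new[1] != old[1]:
        return Action.CHECK
    return Action.OK


print(action_for("1.0.0", "1.0.1").value)  # comma fixed in en-US
```

Where the version lives (docstring, metadata file, or derived from a three-way diff) is left open here, as in the discussion.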

zb.