localizing ubiquity

"mitcho (Michael 芳貴 Erlewine)"

May 21, 2009, 12:37:22 AM

to ubiqui...@googlegroups.com

Hi all,

Yesterday Jono and I discussed some of the remaining Big Issues to
resolve before making Parser 2 the default parser for Ubiquity [1] and
it was very clear that deciding on an approach for making Ubiquity
commands and nountypes localizable is our number one priority right
now. We got a good conversation started in our meeting but as it was
getting long, we're moving this conversation back to the listhost.
Here's a recap of our discussion today:

[1] http://mitcho.com/blog/projects/big-issues-and-small-issues-with-parser-2/

--

# OUR GOAL

We would like to make two types of data localizable:

1. Ubiquity commands. These are Ubiquity "verbs" which may take
certain arguments and whose actions can be previewed and executed.
2. Ubiquity nountypes. Each nountype defines a class of argument
strings which may be accepted as an argument to a verb. For example,
built-in nountypes include number, language, URL, date.

Note that "localization of commands" and "localization of nountypes"
are two fundamentally different things. The commands require some
localized strings (the verb "names", messages in the preview and
execute functions, etc.) but a localization should not be able to change the
command's fundamental preview and execute actions (logic). Nountype
localization, however, requires updated logic: for example,
noun_type_date may accept different sets of strings when running in
different languages, due to the differing date formats of the locales.
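
To make the nountype case concrete, here is a rough sketch of what locale-dependent nountype logic could look like. Everything in it (the dateNounType object, the patterns table, the suggest signature) is made up for illustration and is not the actual noun_type_date.

```javascript
// Hypothetical sketch: a date nountype whose accepted formats depend on the
// locale. None of these names come from the real Ubiquity source.
var dateNounType = {
  patterns: {
    // en: month/day/year, e.g. "5/21/2009"
    en: { regex: /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/, order: ["month", "day", "year"] },
    // ja: year, month, day with kanji suffixes, e.g. "2009年5月21日"
    ja: { regex: /^(\d{4})年(\d{1,2})月(\d{1,2})日$/, order: ["year", "month", "day"] }
  },

  // Returns {year, month, day} if the text is a date in this locale, else null.
  suggest: function (text, locale) {
    var pattern = this.patterns[locale];
    if (!pattern) return null;
    var match = pattern.regex.exec(text);
    if (!match) return null;
    var result = {};
    pattern.order.forEach(function (field, i) {
      result[field] = parseInt(match[i + 1], 10);
    });
    return result;
  }
};
```

The point is only that the accepted strings and the parsing logic both change per locale, which plain string replacement cannot express.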

We can break down the question of "how to make these localizable" into
two subproblems:

1. What will be the data structure of localized commands/nountypes
within Ubiquity?
2. How do we distribute/share these localizations?

I (mitcho) believe these subproblems are orthogonal.

# THE DATA STRUCTURE QUESTION: two approaches

There are broadly two approaches to the data structure question:
gettext-style string replacement vs. a unified object.

## gettext-style string replacement

In this approach, a verb might look like {name: _('move'),...} and the
underscore function uses the base string (here, 'move') as a key and
replaces it with the active locale's version at runtime. This
"dictionary" could be provided in the regular gettext formats (po or mo)
or in JSON.

PROS:
1. People are used to it (esp. in the unix world).
2. Cleanly separates strings from logic.
3. Doesn't require (much) knowledge of JS.

CONS:
1. Requires (unless we use some magic) command authors to use _() to
make strings localizable.
2. Doesn't allow localization of logic (js).
3. Some things are complicated: How would you gettext an array of
options, say? eval(_("[list]"))? _("list").split('|')? Would we use
templates for messages like "translating the selection from (source
language) to (goal language)"?
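
A minimal sketch of this approach, including one possible answer to the template question in CON 3. The dictionary layout, the activeLocale variable and the format() helper are all assumptions for illustration, not existing Ubiquity APIs, and the Japanese template string is just an example translation.

```javascript
// Hypothetical sketch of gettext-style lookup with a JSON-like dictionary.
var dictionaries = {
  ja: {
    "move": "移動",
    "translating the selection from {source} to {goal}":
      "{source}から{goal}に選択範囲を翻訳します"
  }
};
var activeLocale = "ja";

// Look up the base string in the active locale; fall back to the base string.
function _(key) {
  var dict = dictionaries[activeLocale] || {};
  return Object.prototype.hasOwnProperty.call(dict, key) ? dict[key] : key;
}

// Fill {name} placeholders in an already-translated template.
function format(template, values) {
  return template.replace(/\{(\w+)\}/g, function (whole, name) {
    return Object.prototype.hasOwnProperty.call(values, name) ? values[name] : whole;
  });
}

var message = format(_("translating the selection from {source} to {goal}"),
                     { source: "英語", goal: "日本語" });
```

Named placeholders let the translator reorder the arguments freely, which positional concatenation would not allow.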

## Unified object approach

In this approach, a verb might look like {name: {en: 'move', fr:
'porte',...},...}. If I write a command like {name: {en: 'move'},...},
someone else could make a French copy {name: {fr: 'porte'},...} and
the objects could be unified.[2]

PROS:
1. Enables localization of logic.
2. Doesn't require diligent wrapping of all strings with some function
(cf gettext _())

CONS:
1. Requires some knowledge of JS.
2. Logic and strings are mixed.

[2] How exactly this happens depends on the distribution question, but
manual unification (by the command author) and automated unification
(via a centralized repository/authority (the herd) or in Ubiquity on
the client) are both possible.
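
As a rough sketch of what "unifying" two such objects could mean, assuming localizable fields are plain tables keyed by language code. The unify() helper is hypothetical and says nothing about where (author, herd, or client) the unification would run.

```javascript
// Hypothetical sketch: unify two versions of a command whose localizable
// fields are tables keyed by language code, e.g. {en: 'move', fr: 'porte'}.
function unify(base, localized) {
  var result = {};
  [base, localized].forEach(function (source) {
    Object.keys(source).forEach(function (field) {
      var value = source[field];
      if (value && typeof value === "object" && !Array.isArray(value)) {
        // Merge per-language tables into a fresh object (inputs stay untouched).
        if (typeof result[field] !== "object" || result[field] === null) {
          result[field] = {};
        }
        Object.keys(value).forEach(function (lang) {
          result[field][lang] = value[lang];
        });
      } else {
        // Anything else (e.g. shared logic) is simply copied over.
        result[field] = value;
      }
    });
  });
  return result;
}

var original = { name: { en: "move" }, execute: function () { return "moved"; } };
var french = { name: { fr: "porte" } };
var unified = unify(original, french);
// unified.name now holds both the en and the fr entry
```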

## Thoughts

At the end of our conversation we tentatively concluded that
the gettext approach might be better suited to verbs while the
unified object approach is better suited to nountypes.

# THE DISTRIBUTION/SHARING QUESTION

In our meeting today we didn't get around to discussing the
distribution/sharing question much but I'll jot down some feelings:

1. Don't require the command author to collect/redistribute
localizations.
2. Don't require the user to subscribe to the command + localizations
separately.
3. There are benefits and downsides to both centralizing (for example,
doing it on the herd) and decentralizing (like current commands are).

--

I hope we can get this discussion rolling on the distribution
question... I'd also love to get some feedback on the two approaches
to the data structure question from some of our l10n folks.

Thanks!

mitcho

--
mitcho (Michael 芳貴 Erlewine)
mit...@mitcho.com
http://mitcho.com/
linguist, coder, teacher

Jono

May 21, 2009, 1:11:54 AM

to ubiqui...@googlegroups.com

I'd just like to add that I feel like we've been going around in
circles about these issues for a while now. We probably won't find a
solution that perfectly covers all edge cases, and I don't think we
need to. I favor coming up with a "good enough for now" solution,
something that we can get implemented for an upcoming Ubiquity 0.1.9
release. This might mean that we handle just localization of string
data for verbs in this release, and leave nountypes as a problem for
later. Then at least localizers can start doing something, and it
will be a lot better than what we have now.

--Jono

Francesco Lodolo

May 21, 2009, 1:19:35 AM

to ubiqui...@googlegroups.com

On 21/05/09 07:11, Jono wrote:

> I favor coming up with a "good enough for now" solution,
> something that we can get implemented for an upcoming Ubiquity 0.1.9
> release. This might mean that we handle just localization of string
> data for verbs in this release, and leave nountypes as a problem for
> later. Then at least localizers can start doing something, and it
> will be a lot better than what we have now.
>

About this (partially OT), I think that Ubiquity should have its own
language setting: if I use a localized browser, for example in Italian,
I don't necessarily want to use Ubiquity in the same language, and I
should have a simple way to switch back to English.

.flod

"mitcho (Michael 芳貴 Erlewine)"

May 21, 2009, 1:28:10 AM

to ubiqui...@googlegroups.com

The language setting (and parser version setting) was discussed in
today's meeting as well... it should be added to the Settings page of
about:ubiquity in 0.1.9.

mitcho

dynamis

May 21, 2009, 8:28:19 AM

to ubiqui...@googlegroups.com

Hi mitcho and all,

I agree that we should use gettext-style string replacement for command l10n and
the unified object approach for noun type l10n.

As for the l10n string definition file, we localizers use properties files for
Firefox l10n, and _() can read strings from them through the nsIStringBundle interface:
https://developer.mozilla.org/en/XUL_Tutorial/Property_Files
https://developer.mozilla.org/en/nsIStringBundle

So maybe the simplest gettext function would be something like:

function _(id) {
  // look up the string for this id in the active locale's bundle
  return _.stringbundle.getString(id);
}
_.stringbundle = document.getElementById("translate.ja");

# when we install commands, generate stringbundleset in our xul like:
# <stringbundleset id="strbundles">
# <stringbundle id="translate.en-US" src="translate.en-US.properties"/>
# <stringbundle id="translate.ja" src="translate.ja.properties"/>
# <stringbundle id="google.ja" src="google.ja.properties"/>
# <stringbundle id="foocommand.ja" src="foocommand.ja.properties"/>
# </stringbundleset>

PROS of the properties file format:
- Firefox supports it natively and loads it fast enough
- we can also use PluralForm.jsm to handle plural forms if needed
# https://developer.mozilla.org/ja/Localization_and_Plurals
- with the Translate Toolkit (moz2po, po2moz and Pootle), we can set up a command
l10n server easily (you can ask Axel about Pootle setup)
CONS:
- not the standard gettext format (but it can be converted with moz2po)

And I think Ubiquity must support a locale fallback system. That is, Ubiquity
should use the English text (or the text of the command's original locale) if
the command isn't localized, or is only partly localized, into the user's
locale. We can handle this like:

function _(id) {
  // try the user's locale first, then the default locale
  // (error handling for missing ids is omitted here)
  return _.stringbundle.getString(id) || _.stringbundle_.getString(id);
}
_.stringbundle_ = document.getElementById("translate.en-US"); // default locale
_.stringbundle = document.getElementById("translate.ja"); // user locale
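
One caveat with the || fallback: as far as I know, getString() throws for a missing id rather than returning an empty value, so the second lookup would never run. Here is a sketch of the same fallback with try-catch. makeBundle() is a made-up stand-in for a stringbundle element so the sample runs outside the browser, and lookup() is likewise hypothetical.

```javascript
// Hypothetical stand-in for a <stringbundle> element, for illustration only.
// It is written to throw on a missing id, mirroring the assumed behavior.
function makeBundle(strings) {
  return {
    getString: function (id) {
      if (!Object.prototype.hasOwnProperty.call(strings, id)) {
        throw new Error("missing string: " + id);
      }
      return strings[id];
    }
  };
}

// Try each bundle in order; return the first hit, or the id as a last resort.
function lookup(bundles, id) {
  for (var i = 0; i < bundles.length; i++) {
    try {
      return bundles[i].getString(id);
    } catch (e) {
      // Not in this bundle; try the next one.
    }
  }
  return id;
}

var userBundle = makeBundle({ "translate.name": "翻訳" });
var defaultBundle = makeBundle({
  "translate.name": "translate",
  "translate.help": "Translates the selection."
});

function _(id) {
  return lookup([userBundle, defaultBundle], id);
}
```

Returning the id itself when nothing matches keeps a partly localized command usable instead of failing.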

As for the command format, we should apply gettext implicitly to common labels
to avoid commands that cannot be localized. The values of some basic labels
should be passed to gettext automatically by Ubiquity.

That is, the command format should be like:
{name: 'move', description: 'move object to some where'...
... _('something in the code') ...
// use _() explicitly only in the middle of the code
}
not:
{name: _('move'), description: _('move object to some where')...
... _('something in the code') ...
}

As for label ids, translations may need to differ depending on the context, so
all labels should have unique ids. Of course, we must define string entities
per command (never reusing another command's translations). Beyond that, it
would be good if Ubiquity had a mechanism that lets localizers define different
translations when the original command uses the same id in more than one place.

For example, if a user installs this command:
{name: 'move', description: 'move object to some where'...
... _('foo') ... _('foo') ...
}
Ubiquity will preprocess the command and generate:
{name: 'move', description: 'move object to some where'...
... _('foo',0) ... _('foo',1) ...
}
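
A very rough sketch of that preprocessing step, working on the command's source text. numberGettextCalls() is a hypothetical name, and the regex only handles simple literal calls like _('foo'); a real version would probably need a proper JS parser.

```javascript
// Hypothetical sketch: append an occurrence index to each literal _('id') call.
// Ids are counted separately, so the first _('foo') gets 0, the second 1, etc.
function numberGettextCalls(source) {
  var counts = Object.create(null);
  // Matches _('...') or _("...") with no escaped quotes inside the literal.
  return source.replace(/_\((['"])((?:(?!\1).)*)\1\)/g, function (whole, quote, id) {
    var n = counts[id] || 0;
    counts[id] = n + 1;
    return "_(" + quote + id + quote + "," + n + ")";
  });
}
```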

and gettext will be like:

function _(id, i) {
  // context-dependent string, then common string, then default locale
  return _.stringbundle.getString(id + "." + i) ||
         _.stringbundle.getString(id) ||
         _.stringbundle_.getString(id);
}
... snip ...

With this, a localizer can define only "foo", or define the context-dependent
"foo.0" and "foo.1" if they want or need to. That is, if the localizer wants to
use a common translation, move.ja.properties will be:
foo = ふー
and if they want to translate each occurrence differently, move.ja.properties
will be:
foo.0 = ふー
foo.1 = フー
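
Here is a sketch of how the two styles of properties file could resolve through the same lookup chain. parseProperties() is a deliberately naive "key = value" reader and makeGettext() is a hypothetical helper; neither is the real Firefox properties loader.

```javascript
// Hypothetical sketch: read simple "key = value" lines into a lookup table.
// Lines starting with # or ! are treated as comments and skipped.
function parseProperties(text) {
  var table = Object.create(null);
  text.split(/\r?\n/).forEach(function (line) {
    var match = /^\s*([^#!=\s][^=]*?)\s*=\s*(.*)$/.exec(line);
    if (match) table[match[1]] = match[2];
  });
  return table;
}

// Build a _(id, i) that tries id.i, then id, then the default locale's id.
function makeGettext(userTable, defaultTable) {
  return function _(id, i) {
    var candidates = [
      [userTable, id + "." + i],
      [userTable, id],
      [defaultTable, id]
    ];
    for (var n = 0; n < candidates.length; n++) {
      var table = candidates[n][0], key = candidates[n][1];
      if (key in table) return table[key];
    }
    return id; // nothing defined anywhere: show the id itself
  };
}

var defaults = parseProperties("foo = foo");
var common = makeGettext(parseProperties("foo = ふー"), defaults);
var perContext = makeGettext(parseProperties("foo.0 = ふー\nfoo.1 = フー"), defaults);
```

With the first file both occurrences share one translation; with the second, each occurrence gets its own and anything else falls through to the default locale.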

And if we want to avoid the overhead of the gettext function, we can preprocess
the command at install time, like this (though the overhead may be small enough
that we don't need to):
{name: '移動', description: '何かをどこかに移動させます'...
... 'ふー' ... 'フー' ...
}

As for logic localization, we can define some standard way to override certain
methods of commands. For example, if the command is:
movecommand = {name: 'move', somemethod: function(){} ...
... somemethod() ... }
a localizer can define their locale's somemethod:
movecommand.somemethod = function () { /* ja specific */ }

If we support an OOP-like inherit/override system for parts of the command
logic, we can overcome CON 2 (doesn't allow localization of logic (js)).
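
A small sketch of one way such overriding could work, using prototype inheritance so the original command is left untouched. The localize() helper and the execute method are assumptions for illustration only.

```javascript
// Hypothetical sketch: layer locale-specific overrides on top of a command.
// The localized object inherits everything it does not override.
function localize(command, overrides) {
  var localized = Object.create(command);
  Object.keys(overrides).forEach(function (key) {
    localized[key] = overrides[key];
  });
  return localized;
}

var movecommand = {
  name: "move",
  somemethod: function () { return "default behavior"; },
  execute: function () { return this.somemethod(); }
};

var movecommandJa = localize(movecommand, {
  somemethod: function () { return "ja-specific behavior"; }
});
```

Compared with assigning to movecommand.somemethod directly, this keeps the original available for other locales and for fallback.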

Does this sound reasonable?
# Of course, my code samples above are just for ease of understanding; an actual
# implementation would need more try-catch etc.

--
- dynamis (Technical Marketing)

Mozilla Japan : http://www.mozilla-japan.org/
Firefox Support : http://support.mozilla.com/ja/kb/
L10N Forum : http://forums.firehacks.org/l10n/
Translation Forum : http://forums.firehacks.org/trans/

"mitcho (Michael 芳貴 Erlewine)"

May 21, 2009, 10:22:17 PM

to ubiqui...@googlegroups.com

Hi dynamis— thanks for the detailed comments and code snippets!

And thank you for pointing out the stringbundleset system built into
Fx... if we take the gettext approach perhaps it will make sense to
use that system if our community is used to it. (I also dug around and
found instructions on loading in a stringbundle without XUL: http://www.xuldev.org/blog/?p=45
[Japanese].)

What do you think about encoding arrays or sets of alternatives? For
example, a verb might have different synonymous names: "email", "mail"
in English, "メールする", "メールして", "送る", etc. in Japanese. Is using
a delimiter like | a good option (email=メールする|メールして|送る), or
are there better options?
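
For reference, decoding the delimiter version would be about this simple. splitNames() is just an illustrative name, and the trimming and empty-entry handling are my own assumptions about what we'd want.

```javascript
// Hypothetical sketch: decode a delimited list of synonyms from one string.
function splitNames(value, delimiter) {
  return value.split(delimiter || "|")
    .map(function (name) { return name.trim(); })
    .filter(function (name) { return name.length > 0; });
}

var names = splitNames("メールする|メールして|送る");
```

The obvious weakness is that a name can never contain the delimiter itself unless we add an escaping rule.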

Thanks again,

mitcho
