Since then, progress has been made on a number of topics related to
the l20n file format. You can review those discussions, and
Are you planning on posting the requirements and goals description as
was asked for last time? That way people like me understand what the
We will write documentation on how to use it, but I don't expect to
write documentation on "what the point is". That has proven to be hard
at least for the core team, and we're making better progress on having
interactive discussions to make sure that we're picking up the people
we're talking to at the right place, and answering the questions they
have when they have them.
I don't think that this is an effort that is suitable for that "one
document to get the point across", mostly because "people" isn't clearly
defined. There's the language piece of it, which is interesting to
localizers. There's the API point of it, which is interesting to gecko
app hackers. There's the infrastructure point in it, which is
interesting to those people doing build and release. There's the tooling
pieces that are interesting to people writing translation tools.
All of these are more or less interwoven, so even if we focused
documents on one target audience, it might very well end up sounding fishy.
Axel, who, honestly, tried.
I'm just a member in the discussion, nobody with a firm hand over the
project, but from where I stand, this is an overview of my view of the
problems, requirements, strategy, goals, and status of this project:
Problems / current situation:
- Two different L10n file formats with completely different syntaxes
are confusing and complicated, both for developers and localizers.
- DTDs have no good way of inserting computed variables into strings.
- The way we do variables in stringbundles (more or less sprintf-style)
makes it easy to confuse which variable is inserted where.
- The way we are doing plural forms in some places is hacky.
- Some languages need different gender or declension forms if
variables change (e.g. in "Please restart $app", different forms of
"restart" depending on $app being Thunderbird (say, female) or
Firefox (say, male)), and no current L10n system, not even
gettext/.PO, has a solution for those cases.
- Variants of a language (say, US English vs. British English vs.
Canadian) need all strings copied over, even if only a small amount
actually gets changed.
- Add-ons that only need a handful of strings need to keep all
translations in different files and probably directories for each
- Developers need to change string IDs even for relatively minor
changes in strings if localizers need to take note.
- Adding new strings in releases (see FF 3.6.4) is rather complicated
as missing strings in localizations generate errors.
- If any language requires different forms for strings in certain
cases, the developers need to implement support for that, and all
localizers need to localize all different forms (e.g. "1st, 2nd, 3rd,
4th, 5th, ..." in English and "1., 2., 3., 4., 5., ..." in German).
- Localization of Mozilla products and websites requires different
tools, file formats and L10n systems.
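To illustrate the point about sprintf-style variables versus named
ones, here is a minimal sketch in Python. It is not taken from any
Mozilla code; the strings and the variable names "app" and "count" are
made up for illustration.

```python
from string import Template

# Positional, sprintf-style: nothing in the string says which slot holds
# what, so the translator has to know (and keep) the argument order.
positional = "%s downloaded %s files"
positional_text = positional % ("Firefox", 3)

# Named placeholders: each slot is self-describing, and a translation
# can reorder the slots without the calling code changing.
english = Template("$app downloaded $count files")
reordered = Template("$count files were downloaded by $app")  # hypothetical translation

values = {"app": "Firefox", "count": 3}
english_text = english.substitute(values)
reordered_text = reordered.substitute(values)
```

With positional slots, swapping the two arguments by mistake still
produces a string, just a wrong one. With named slots the mapping is
explicit on both sides.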
Requirements for a new system:
- Have possible solutions for all the problem cases above, i.e.:
- A single L10n file format for everything.
- Named variables (computed or localized strings) that can be inserted
flexibly into strings and can trigger different forms of strings.
- Put intelligence of selection of different strings needed for
variable values or properties (gender, etc.) into the localizers'
hands, making it easy for developers to support languages they don't
- Support fallback to other locales when strings are not found.
- Provide means for better change management so developers don't need
to change string IDs when the basic meaning of strings doesn't
- Provide possible means for Add-ons to merge several localizations
into one file, make it a process decision if it can/should be done,
but not a L10n system limitation.
- Enable good ways for powerful localization tools to build on this
system and make the job easier for localizers.
- Keep application performance at a high level while providing this
additional simplicity for developers and possibility for localizers.
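The requirement about putting the selection logic into the localizers'
hands could look roughly like the following Python sketch. The data
model, the key names ("selector", "forms", "default") and the
placeholder texts are all assumptions made for this example; they are
not the L20n file format.

```python
# Hypothetical entities: a localizer attaches a "gender" property to
# each product name. The values follow the "say, male/female" example
# in the problem list and are not claims about any real language.
ENTITIES = {
    "firefox": {"value": "Firefox", "gender": "male"},
    "thunderbird": {"value": "Thunderbird", "gender": "female"},
}

# A localized message that picks its form from a property of the
# variable. The developer only ever asks for "restart" with an app.
RESTART = {
    "selector": "gender",
    "default": "male",
    "forms": {
        "male": "RESTART-M {app}",    # placeholder text, not a real translation
        "female": "RESTART-F {app}",  # placeholder text, not a real translation
    },
}

def format_message(message, app_id):
    """Pick the form matching the variable's property, then fill it in."""
    app = ENTITIES[app_id]
    key = app.get(message["selector"], message["default"])
    template = message["forms"].get(key, message["forms"][message["default"]])
    return template.format(app=app["value"])
```

A locale that needs no such distinction would simply have one form,
and the calling code would stay the same.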
Strategy:
Create a new system, including file format, after we have evaluated all
existing solutions for a long time and found that they don't match all
our requirements. There are a few formats and systems that are better in
some ways than what Mozilla currently has, but they fail at other
requirements, esp. the flexibility for gender, declension, etc.
Goals of L20n:
- Create a file format that is as simple as possible for most of the
typical strings that only need one representation but can scale to
the full power and complexity needed by our requirements in more
- Hide the complexity of languages from developers as well as
localizers in languages that don't need it.
- Give us the possibility of having multi-language files, make it a
process decision if/where to use them.
- Make the design generic so other projects can easily use it as well.
- As localized files can contain some high-level logic that can require
some processing, provide means to "compile" them into an intermediate
state that can be used in a performant way by the code.
- Implement a multi-programming-language API so PHP, Python, JS, XUL,
C/C++ and others can all use the same localization files behind the
- Implement a tool chain for integrating the new system into Mozilla
processes at least as well as the existing one.
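The goal of "compiling" localized files into an intermediate state can
be sketched as follows. This is only an illustration under assumptions:
the "key = value" source syntax with {name} placeholders is invented
for the example and is not the L20n format, and JSON is used as the
intermediate form simply because it is easy for code to read.

```python
import json
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")

def compile_string(text):
    """Split a string into literal parts and variable references, once."""
    parts = []
    pos = 0
    for match in PLACEHOLDER.finditer(text):
        if match.start() > pos:
            parts.append(text[pos:match.start()])
        parts.append({"var": match.group(1)})
        pos = match.end()
    if pos < len(text):
        parts.append(text[pos:])
    return parts

def compile_source(source):
    """Turn the (hypothetical) source syntax into a JSON document."""
    compiled = {}
    for line in source.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        compiled[key.strip()] = compile_string(value.strip())
    return json.dumps(compiled, sort_keys=True)

def render(compiled_json, key, **variables):
    """Runtime side: no parsing of placeholders, just joining parts."""
    parts = json.loads(compiled_json)[key]
    return "".join(
        part if isinstance(part, str) else str(variables[part["var"]])
        for part in parts
    )

SOURCE = """
# hypothetical source syntax
restart = Please restart {app}
"""
compiled = compile_source(SOURCE)
```

The idea is that the expensive or error-prone work happens once, at
build time, and the application only reads the pre-digested result.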
Status:
- Agreement on needed features has been reached in the interested parts
of the Mozilla L10n community after ideas were presented years ago by
Axel and the discussion had been smoldering since.
- Work on performance and file format possibilities was done in a
Mozilla internship last year; the outcome is that JSON is bad for the
actual file format, but probably good as a "compiled" representation
to be read by code.
- People from other projects outside Mozilla have been invited and IIRC
seemed interested in the outcome, but didn't really jump in to
participate in the creation a lot.
- Working group led by Mozilla L10n (seth, Pike, gandalf, stas).
- File format has been agreed on by the working group to a large part.
- Experimental implementations in code and APIs are being worked on to
see what remaining questions on the file format come up and how to
- Ideas and plans exist for the remaining steps, I think; it's best to
ask the Mozilla L10n team for details there.
As I said, that's the overview from my POV, and I'm not 100% sure of its
completeness, as this is mostly off the top of my head. Also, there may
be assumptions that are not completely in line with the rest of the
people following it or working on it, as this is purely my view.
Still, I hope it sheds some light on what this is, why we are working on
it and what we are doing there.
> Still, I hope it sheds some light on what this is, why we are working on it and what we are doing there.
Thanks, KaiRo, that was really helpful.
Thank you! This is exactly (and more than) what I was looking for.
Reading back over the comments again, I understand much more clearly why
certain design decisions were made, whereas before my reaction was just
confusion. I can now provide useful feedback.
I'm happy I could help you there and make it easier to understand.
And actually, I've been uncomfortable for a while with the disparity
between the L10n people who are enthusiastic about us working on that
and the dev community saying "wtf is up there?" - as after all, this
should help both sides. ;-)
Note that any statements of mine - no matter how passionate - are never
meant to be offensive but very often as food for thought or possible
arguments that we as a community need answers to. And most of the time,
I even appreciate irony and fun! :)
On 25.07.10 18:47, Robert Kaiser wrote:
> - Variants of a language (say, US English vs. British English vs.
> Canadian) need all strings copied over, even if only a small amount
> actually gets changed.
This is a decision on how to ship software, and to some extent mingled
with the multi-locale files I'll talk about below. Generally, though, we
should do as much as we can at build time instead of at runtime. Having
l20n not turn taras sad will be tough even then ;-).
> - Add-ons that only need a handful of strings need to keep all
> translations in different files and probably directories for each
To me, this is a non-goal for l20n, but *the* goal of our alternative
approach, nick-named "commonpool". This alternative approach makes all
compromises you need to make on the side of ease-of-coder, and accepts
falling down on l10n quality. But for small jetpacks with a handful of
strings, it will probably work just fine. gandalf is working on that
with the jetpack team, and with the folks from transifex for the webparts.
> - Adding new strings in releases (see FF 3.6.4) is rather complicated
> as missing strings in localizations generate errors.
I don't think it's important to fix this per se; we know how to deal
with this at build time. Also, it's more of a ship-shit problem, IMHO.
Coding the fallbacks should become easier, though.
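As a rough idea of what coded fallbacks could look like, here is a
small Python sketch. The locale tables, the FALLBACKS mapping and the
lookup() helper are hypothetical and exist only for this example.
Whether such a lookup runs at build time (to produce a complete file)
or at runtime is exactly the kind of shipping decision discussed above.

```python
# Hypothetical string tables: en-GB only carries the strings that
# actually differ from en-US.
LOCALES = {
    "en-US": {"restart": "Please restart", "color": "Color"},
    "en-GB": {"color": "Colour"},
}

# Hypothetical fallback chain: which locale to try next.
FALLBACKS = {"en-GB": "en-US"}

def lookup(locale, key):
    """Walk the fallback chain until some locale provides the string."""
    seen = set()
    while locale is not None and locale not in seen:
        seen.add(locale)  # guards against a cyclic fallback table
        strings = LOCALES.get(locale, {})
        if key in strings:
            return strings[key]
        locale = FALLBACKS.get(locale)
    raise KeyError(key)
```

A string missing from a localization then resolves to the fallback
locale's text instead of generating an error, and a language variant
only needs to carry its differences.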
A few more notes:
There's currently a lively discussion on what the multi-locale file
format will do. To me, it's a source coding style question. I don't
expect it to solve any software distribution problems. This is an
evolving topic, though.
More on topic for .planning:
We're removing artifacts from the caller language from l10n, that is, no
artifacts of js, c++, xml, xbl, whatnot. The files that localizers work
on are, in a sense, declarative programming for a very special and
simple state machine. Which means, we know what works in those files,
and how, without having to dig into the calling code. That will empower
build tools to be much more reliable on finding problems and fixing
them, in particular as we'll define error recovery on the source file
level, too. Think about it as static analysis for l10n files. No more
YSOD or crashes due to bad formatters in printf statements, no more "but
plurals use #1 instead of %S" and such ugh.
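A minimal sketch of what "static analysis for l10n files" might mean in
practice, written in Python. It assumes a made-up {name} placeholder
syntax and a hypothetical check_translation() helper; the point is only
that a tool can compare a translation against the source string without
looking at any calling code.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")

def check_translation(source, translation):
    """Report variables the translation uses but the source doesn't
    define, and variables from the source that the translation drops."""
    expected = set(PLACEHOLDER.findall(source))
    found = set(PLACEHOLDER.findall(translation))
    problems = []
    for name in sorted(found - expected):
        problems.append(f"unknown variable: {name}")
    for name in sorted(expected - found):
        problems.append(f"unused variable: {name}")
    return problems

ok = check_translation("Please restart {app}", "Bitte {app} neu starten")
bad = check_translation("Please restart {app}", "Bitte {ap} neu starten")
```

Here "ok" is an empty list, while "bad" reports the misspelled variable
and the one that went missing, all before the string ever reaches the
application.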
And yet another amendment, the last for today:
I intend to formulate rather strictly how editors work on l20n source
files. No more "you wanna join l10n? ask the others which tools they use
or bust". An editor does A, B, C, and must not do D and E, mostly guided
by "you must not break blame".
Thanks again to KaiRo for the initial response here
Those three cases were just existing problem cases we currently have,
and don't mean that L20n would immediately solve them or need to solve
them, but I listed all of them because we are thinking about them in the
L20n process and at least trying to make them easier to solve. I think I
haven't explicitly listed them in the "Goals" section, have I?
I have put some "make it a process decision if it can/should be done,
but not a L10n system limitation" or similar statements in such places
in the actual "Requirements" and "Goals" sections, I hope that meets
what the official thinking is.
> There's currently a lively discussion on what the multi-locale file
> format will do. To me, it's a source coding style question. I don't
> expect it to solve any software distribution problems.
Sure, it may not solve them, but it could open up additional ways of
coming up with solutions.
From my POV, L20n may not solve all problems, but it empowers us to
find solutions more easily for many of those it doesn't solve right away.
> No more
> YSOD or crashes due to bad formatters in printf statements, no more "but
> plurals use #1 instead of %S" and such ugh.
That might be worth mentioning explicitly here, yes, though it's
implicitly a part of my descriptions. :)
(And I'll so much cheer for that...)
> I intend to formulate rather strictly how editors work on l20n source
> files. No more "you wanna join l10n? ask the others which tools they use
> or bust". An editor does A, B, C, and must not do D and E, mostly guided
> by "you must not break blame".
That's a process decision on the Mozilla side though, right?
> Thanks again to KaiRo for the initial response here
No problem, I hope I have helped all interested parties with that!
No, not really. Tool authors should expect their users to file bugs, and
hopefully drop using their tools, on violation of those rules. I
wouldn't know any reason to use bad editors. It's neither a complex rule
set, nor an overly ambitious one. Just "if you fix a typo in one string,
the editor will fix just that byte, and not touch any other". Plus some
SHOULD clauses on how to add new entities.
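The "fix just that byte" rule could be sketched like this in Python.
The "key = value" file layout and the replace_value() helper are
assumptions for illustration only, not the l20n source format. The
sketch works on bytes so that line endings, spacing and every other
entity stay exactly as they were.

```python
def replace_value(data: bytes, key: bytes, new_value: bytes) -> bytes:
    """Replace the value of `key`, leaving all other bytes untouched."""
    offset = 0
    for line in data.splitlines(keepends=True):
        name, sep, rest = line.partition(b"=")
        if sep and name.strip() == key:
            body = rest.rstrip(b"\r\n")          # value plus surrounding spaces
            value = body.strip()                 # the value itself
            leading = len(body) - len(body.lstrip())
            start = offset + len(name) + len(sep) + leading
            end = start + len(value)
            return data[:start] + new_value + data[end:]
        offset += len(line)
    raise KeyError(key)

before = b"# comment\r\ntitle  =  Helo world \r\nother = x\n"
after = replace_value(before, b"title", b"Hello world")
```

Only the bytes of the edited value differ between "before" and "after",
so a blame or annotate view keeps pointing at the right change for
every other line.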
Ah, OK, sounds reasonable.