
L10n tools talking


Axel Hecht

Oct 30, 2007, 1:29:54 PM
Hi all,

there's some talk about tools for localization going on at different levels and in different groups, and I'd like to use this post to lay out how I am looking at those. To some extent this should help you anticipate which questions I'd ask, and maybe it shapes the discussion in a way that makes it easier for the different groups to find ways to collaborate. Writing this down should help me shape my thoughts, too. I'll offer a few key talking points together with some of the language I use when talking about them.

I'm being a tad evil in posting in HTML, but I need the additional markup, so bah on me.

Not all that you'll see below is sequential or orthogonal; I guess that's just the way it is. I'll pull at a few strings of the hairball in my head and we'll see how that goes. It's not complete either.

Workflow

Tools model bigger or smaller parts of one or more workflows. There are a variety of workflows happening right now in the different localization teams, but they merge into a common one when it comes down to work on the stable branch for Firefox and Thunderbird.
Even the naive workflow does differ for new and for existing localizations, of course. There are different options depending on the size of the team in question. In this way, the workflow section does intersect significantly with the target audience of a tool.
Tools should position themselves explicitly.

Target audience

For whom is that particular tool designed? I guess there are
  • Experienced localizers
    • within Mozilla
    • within other FOSS projects
    • within commercial applications
  • New localizers
  • Language-savvy non-localizers
  • End-user feedback

Architecture

I see some basic building blocks for at least the editing-centric part of tools. There is some inclusion of workflow parts in the following diagram (wow, what a huge word for a table), though.

UI
Translation algorithms
  • Glossaries
  • Translation Memories
  • Machine translation
  • Dictionaries
  • Spell checking
Local translation tests
  • Completeness
  • Variable verification
  • CSS specs
Integration tests
  • Accesskeys
  • Help links
  • Dialog sizes
Data abstraction
Parsing / Serialization


So I see a bunch of building blocks, which are probably more or less tightly hooked up. There's obviously some sort of IO and data abstraction layer. Building on top of that are algorithms that help in translation, among those would be TM, MT, glossaries, etc. Helping out on this layer I see local tests like completeness (compare-locales-like), variable verification (do we have a %2$s in both en-US and l10n?), and more heuristic tests, like, "en-US looks like a CSS spec, are you sure you want to translate 'height' here?".
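
To make the variable-verification idea a bit more concrete, here's a minimal sketch (class and method names are invented for illustration, this isn't any existing tool's code) that compares %S / %1$S style placeholders between en-US and a localization:

  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.List;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  // Sketch of a "variable verification" check: the en-US string and the
  // localized string should use the same placeholders (order may differ).
  public class PlaceholderCheck {

      private static final Pattern VAR = Pattern.compile("%(\\d+\\$)?[Ss]");

      static List<String> placeholders(String s) {
          List<String> found = new ArrayList<String>();
          Matcher m = VAR.matcher(s);
          while (m.find()) {
              found.add(m.group());
          }
          Collections.sort(found);
          return found;
      }

      // true when both strings contain the same set of placeholders
      static boolean matches(String enUS, String localized) {
          return placeholders(enUS).equals(placeholders(localized));
      }

      public static void main(String[] args) {
          System.out.println(matches("Found %1$S in %2$S", "%2$S contiene %1$S")); // true
          System.out.println(matches("Found %1$S in %2$S", "Encontrado %1$S"));    // false
      }
  }

The CSS heuristic would be a similar but fuzzier check, run on the en-US side only.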
On top of that comes some UI.
I added "integration tests" to the picture, though those might be already a bit in the workflow field. Anyway, those are tests that can't be run on a single translatable entity, and should be tested in the live application, or in another sufficiently complete context.

I'm not sure how "expose the context of the current localizable string" fits into the picture. It might actually be part of the integration part, if I make that less of a test and more of a tool box.

Online/offline/desktop

For the building blocks above, and for those in the workflow modeling, you can decide whether it should be run locally, or on a server, and if so, what happens with online and offline work. Locally could mean in the Browser, in Prism, or in a real desktop app.

Algorithms

Some of the blocks above talk about algorithms. I see a way to use the same algorithms for different target audiences and workflows, so being able to factor out algorithms sounds like a good thing. On top of being able to share the algorithm, being able to share an implementation is going to be a plus, too. That might be done through using compatible licenses and implementation languages, or by exposing online APIs as webservices.

Ecosystem

One big feature of any tool is the ecosystem that supports it. Code reuse is one thing, having more than one person being able to read and understand the code is another, and of course, if there's more than one person fixing stuff, it's going to be better.
Code reuse is not only restricted to the actual l10n algorithms, but also extends to things like web frameworks.

Resources

What does it take to run the tool? I'm not too worried about n MB of RAM, but if you're talking server, how many modules does it require to keep up-to-date? Or, for a desktop app, does it require a connection to a server permanently? These questions are hard to answer, but even fuzzy answers are better than none.

What do you guys think? What did I miss that I didn't miss intentionally? How do we talk about things that I did miss intentionally (i.e., workflow)?

Axel

Ricardo Palomares Martinez

Oct 31, 2007, 6:46:10 PM
Axel Hecht wrote:

> Hi all,
>
> there's some talk about tools for localization going on at different
> levels and in different groups, and I'd like to use this post to lay out
> how I am looking at those.


First of all, please accept my apologies for not answering your
requests at m.d.i18n. The main reason is that progress on my L10n tool
has been stalled for months now and, while it should change, I can't
promise it is going to change dramatically. For that reason, I'm going
to be brief in my comments in this thread.

Besides that, as my L10n tool is my final year project (yes, it has
been the final year project for two or three years now), I must work
alone until I release something usable that can pass the academic
analysis. The tool is Java-based.

> Workflow
>
> Tools model bigger or smaller parts of one or more workflows. There are
> a variety of workflows happening right now in the different localization
> teams, but they merge into a common one when it comes down to work on
> the stable branch for Firefox and Thunderbird.
> Even the naive workflow does differ for new and for existing
> localizations, of course. There are different options depending on the
> size of the team in question. In this way, the workflow section does
> intersect significantly with the target audience of a tool.
> Tools should position themselves explicitly.


In the initial stage, I don't intend to bundle VCS support (generic
version control system support, not just CVS), except for "pre-import"
and "post-export" scripts that can be run before importing and after
exporting. Thus, bash scripts for Linux and equivalent solutions for
Windows could be written _outside_ the tool to invoke CVS or
Mercurial, run diffs, run compare-locales, and so on.
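
To illustrate what I mean by those hooks (the script name and path
below are just placeholders), the tool would simply shell out to
whatever the localizer configures:

  import java.io.BufferedReader;
  import java.io.File;
  import java.io.InputStreamReader;

  // Illustration only: run a user-configured "pre-import" script (e.g. a
  // bash script that updates the working copy or runs compare-locales)
  // before the tool imports files.
  public class PreImportHook {
      public static void main(String[] args) throws Exception {
          ProcessBuilder pb = new ProcessBuilder("./pre-import.sh");
          pb.directory(new File("/path/to/l10n/checkout")); // working copy
          pb.redirectErrorStream(true);
          Process p = pb.start();
          BufferedReader out =
              new BufferedReader(new InputStreamReader(p.getInputStream()));
          String line;
          while ((line = out.readLine()) != null) {
              System.out.println(line); // show the script output to the localizer
          }
          if (p.waitFor() != 0) {
              System.err.println("pre-import script failed, aborting import");
          }
      }
  }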


> Target audience
>
> For whom is that particular tool designed? I guess there are
>

> * Experienced localizers
> o within Mozilla
> o within other FOSS projects
> o within commercial applications
> * New localizers
> * Language-savvy non-localizers
> * End-user feedback


Experienced localizers within Mozilla, although new localizers could
learn it (after all, I learnt to use MozillaTranslator with just an
incomplete manual; there were no VCS concepts to learn or trademark
policies involved at that time, though, and I think those are external
to any L10n tool anyway).

Language-savvy non-localizers are, to me, testers of localized
nightlies. :-? As for end-user feedback, the localization team's
resources (web pages, mailing lists, etc.) should cover that, IMHO.

> Architecture
>
> I see some basic building blocks for at least the editing-centric part
> of tools. There is some inclusion of workflow parts in the following
> diagram (wow, what a huge word for a table), though.
>

(I should have answered also in HTML) :-)


Overall architecture

I tend to agree with your envisioned architecture. I'm not 100% sure
that everything can be as independent of the data model as your
architecture suggests (esp. if we think about L20n and not just the
current DTD and properties files), but I'm not saying it can't be done.

Translation algorithms

I'd like to include glossaries and translation memories. Except for
auto-translating strings based on translation memories, where a new
entity/key has an already-existing string with just one possible
translation, I don't intend to provide machine translation. A while
ago I found a Java API to use myspell dictionaries, but I haven't
merged that into the code yet.
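
Roughly, the auto-translation I have in mind works like this (all
names invented for the example): only propose a translation when the
memory has exactly one candidate for the en-US string:

  import java.util.HashMap;
  import java.util.HashSet;
  import java.util.Map;
  import java.util.Set;

  // Sketch of TM-based auto-translation: fill in a new entity only when
  // its en-US text has exactly one known translation in the memory.
  public class TranslationMemory {

      // en-US string -> translations seen so far
      private final Map<String, Set<String>> memory =
          new HashMap<String, Set<String>>();

      public void record(String enUS, String translation) {
          Set<String> t = memory.get(enUS);
          if (t == null) {
              t = new HashSet<String>();
              memory.put(enUS, t);
          }
          t.add(translation);
      }

      // Returns a proposal only when it is unambiguous, otherwise null.
      public String propose(String enUS) {
          Set<String> t = memory.get(enUS);
          return (t != null && t.size() == 1) ? t.iterator().next() : null;
      }
  }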

Local translation tests

Completeness and variable verification exist in MozillaTranslator,
and my new L10n tool should at least keep the current status, if not
enhance it. Regarding CSS specs, I initially see a very marginal
benefit in it, although I may be wrong. Anyway, it won't be part of a
first usable release, for sure.

Data abstraction, parsing and serialization are definitely considered.
As I commented in the thread in m.d.i18n, I see some challenges in
L20n for data abstraction, esp. if we want a common model for current
L10n file formats and L20n. This can be further discussed.

Integration tests

Some basic checks are being done right now in MT for accesskeys. I
stand by my position that Mozilla could do better to allow better
automated checks for accesskeys. It may be too expensive in workload
terms to do it for the current L10n architecture, but it is definitely
worth a look for L20n.
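
For reference, the kind of basic check I mean is roughly this
(hypothetical code, not what MT actually does): given a foo.label /
foo.accesskey pair, warn when the translated accesskey does not occur
in the translated label:

  // Sketch of an accesskey check, assuming the usual foo.label /
  // foo.accesskey pairing in DTD files. Names invented for the example.
  public class AccesskeyCheck {

      // false means "warn": the accesskey is missing or does not appear in
      // the label, so the UI has to show it in parentheses after the label.
      public static boolean accesskeyInLabel(String label, String accesskey) {
          if (accesskey == null || accesskey.length() != 1) {
              return false;
          }
          return label.toLowerCase().indexOf(accesskey.toLowerCase()) != -1;
      }

      public static void main(String[] args) {
          System.out.println(accesskeyInLabel("Guardar como...", "S")); // false
          System.out.println(accesskeyInLabel("Guardar como...", "G")); // true
      }
  }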


> Online/offline/desktop
>
> For the building blocks above, and for those in the workflow modeling,
> you can decide whether it should be run locally, or on a server, and if
> so, what happens with online and offline work. Locally could mean in the
> Browser, in Prism, or in a real desktop app.


Initially, it will be a Java desktop application using either a
bundled DB or a JDBC connection to a SQL server. In the distant
future, I'd like to turn most features into webservices, so they can
serve as a backend for something presented in a web page or in
alternative desktop UIs.
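
To make the two setups concrete, here is a minimal sketch of what I
mean by bundled DB vs. JDBC connection to a server (the database name,
host and credentials are just examples):

  import java.sql.Connection;
  import java.sql.DriverManager;

  // Two ways to get a Connection: the embedded Java DB (Derby) bundled
  // with the tool, or a JDBC connection to an external SQL server.
  public class DbConnections {

      // Bundled setup: Derby runs inside the same JVM, nothing to install.
      static Connection embedded() throws Exception {
          Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
          return DriverManager.getConnection("jdbc:derby:l10ndb;create=true");
      }

      // Server setup: connect to an external database over the network.
      static Connection networked() throws Exception {
          Class.forName("org.apache.derby.jdbc.ClientDriver");
          return DriverManager.getConnection(
              "jdbc:derby://localhost:1527/l10ndb", "user", "password");
      }
  }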

It will be Java-based because that is what I've spent my scarce free
time learning, and it will be desktop-based because I have yet to see
a web-based application that feels as productive as a desktop-based
one. Incidentally, I have a prejudice against web-based applications,
in the sense that I think occasional volunteers may be more inclined
to jump into a web-based tool, and this could degrade translation
quality. My bias, as I've said.


> Algorithms
>
> Some of the blocks above talk about algorithms. I see a way to use the
> same algorithms for different target audiences and workflows, so being
> able to factor out algorithms sounds like a good thing. On top of being
> able to share the algorithm, being able to share an implementation is
> going to be a plus, too. That might be done through using compatible
> licenses and implementation languages or by exposing online APIs to
> webservices.


I'm not sure I can go public with the algorithms and implementation
before getting my final year project approved. Anyway, as far as it
can be done without additional work, I intend to port new features to
MT (for instance, I'm implementing glossaries in a way that can be
used in MT), which is MPL-licensed and has its source code available
at sf.net (don't worry, Axel, you don't need to create your account
there again). ;-)


> Ecosystem
>
> One big feature of any tool is the ecosystem that supports it. Code
> reuse is one thing, having more than one person being able to read and
> understand the code is another, and of course, if there's more than one
> person fixing stuff, it's going to be better.
> Code reuse is not only restricted to the actual l10n algorithms, but
> also extends to things like web frameworks.


Of course, once I get my academic duties done, the code will go
public under an open source license (not sure which one yet, but it
will surely be either GPL, MPL or CDDL, which I think is very similar
to MPL).


> Resources
>
> What does it take to run the tool? I'm not too worried about n MB of
> RAM,


I do. :-) I expect that, by using a DB (bundled or not), memory usage
will go down compared to MT. Not a lot, but something.


> but if you're talking server, how many modules does it require to
> keep up-to-date?


N/A at this moment.


> Or, for a desktop app, does it require a connection to
> a server permanently?


I don't think a permanent connection to a SQL server through the
internet will be a usual setup.

> These questions are hard to answer, but even fuzzy
> answers are better than none.


OK, this is my starting point. For newcomers to the es-ES L10n
process, the biggest complaint was the difficulty of setting up Java
and MT (for those who don't know it: on Windows it is just installing
Java like any other application, AFAIK, whereas on Linux you also need
to set a couple of environment variables and add a directory to your
PATH variable).

I can only imagine people screaming and running away if I tell them
that they have to set up Java *and* a full-fledged SQL database, with
security concerns, fiddling with TCP ports and so on, just to help
with localization. So the most usual scenario for my L10n tool, as I
see it, would be to use the bundled SQL server (Java DB, aka Apache
Derby) and just start using the program.

The program will create the database and tables, and I'm using Apache
DdlUtils so that a new version of the L10n tool can change the
database schema when it needs to, without the user having to back up
the database, run scripts or type SQL commands.
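
In case it helps, the DdlUtils usage I have in mind is roughly this (a
simplified sketch; the schema file name is a placeholder and I haven't
settled on the details):

  import javax.sql.DataSource;
  import org.apache.ddlutils.Platform;
  import org.apache.ddlutils.PlatformFactory;
  import org.apache.ddlutils.io.DatabaseIO;
  import org.apache.ddlutils.model.Database;

  // Rough idea of the schema upgrade: ship the desired schema as an XML
  // file and let DdlUtils alter the live database to match it.
  public class SchemaUpgrade {
      public static void upgrade(DataSource dataSource) {
          Platform platform =
              PlatformFactory.createNewPlatformInstance(dataSource);
          Database desired = new DatabaseIO().read("schema.xml"); // shipped file
          // Alters existing tables/columns instead of recreating them.
          platform.alterTables(desired, false);
      }
  }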

I expect to have something similar to MT's current partial glossary
import/export features, so people can collaborate without needing
direct access to CVS.


> What do you guys think? What did I miss that I didn't miss
> intentionally? How do we talk about things that I did miss intentionally
> (i.e., workflow)?


Just out of curiosity, what other L10n tools are on the way that have
been mentioned as being alive in the last year or so?

Ricardo

--
If it's true that we are here to help others,
then what exactly are the OTHERS here for?
