Planning a future L10N infrastructure (including Fedora)

Asgeir Frimannsson

Sep 15, 2008, 2:09:22 AM
to Fedora Infrastructure, transif...@googlegroups.com
Hi infrastructure wranglers,

(cc transifex-devel)

Over the last few months, a few of us involved in Red Hat L10N engineering
have discussed how to best ensure we have Localisation Infrastructure and
Tools that can serve the needs of Red Hat, JBoss, Fedora and 'upstream'
communities in years to come. Let me first describe some of the background and
requirements behind this project:

Up until now, we have managed translations through version control systems
such as CVS, Svn and Git. This has ensured that all contributions are pushed
upstream, as we always store translations within the upstream repositories and
projects. 'Damned Lies' further gave us a tool to view language-specific
translation statistics for modules, branches and releases, as well as
convenient information about people, teams and projects. This has been a great
help for translators in their work. The work of Dimitris (and others) on Transifex
has in addition given the translation community a way to submit translations
upstream without ever touching a developer-centric version control system,
which has also been of great help to translators.

Some of the immediate needs that could be addressed within the existing
framework (some of which are on the Transifex roadmap) are:
- Consolidation of Damned Lies and Transifex, allowing retrieving and
submitting translations through the same interface
- Allowing retrieving and submitting multiple-files at once (e.g. for
translating a publican document with many PO files)
- Simple workflow on top of Transifex (porting features from Vertimus)
- Better usability and easier user registration process (Fedora specific)

Transifex is gaining some traction upstream (e.g. within Gnome), and we hope
development will continue strong, serving Fedora and potentially other
upstream communities.

Looking at the bigger picture, some of the core requirements we have identified
for Red Hat and community L10N going forward are:
- Customizable Translation Workflows and integration with e.g. Content
Authoring Workflows
- Infrastructure easily adaptable to support new File formats and project
types (e.g. OpenOffice formats, CMS formats, DTP formats, Wiki, Dita, Java
formats), rather than relying on 'upstream' projects to fit a certain L10N
infrastructure.
- Managing the life-cycle of a translation project across releases and
iterations
- Translation Reuse and Terminology Management across projects and iterations
- Job management, scoping, tracking and resourcing
- Managing and/or Tracking upstream translation projects, pushing changes back
upstream.

These requirements require a system where the translation lifecycle would be
managed within 'Translation Repositories' (similar to e.g. Pootle or Launchpad
Translations), rather than directly through e.g. upstream version control
systems. With a repository-based approach, we would be able to track and
manage changes to a project on a translation unit level, and manage e.g.
translation reuse and terminology within and across projects. We could still
retain a link with upstream repositories (like with Transifex/Damned Lies).
However, this would not be the 'core datamodel', but on a different layer
through plug-ins. This link to external repositories could also go beyond
traditional version control systems, communicating with external sources like
wikis and CMSs.
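
As an illustration only, and not part of the proposal itself, the repository idea above can be sketched in Python. All class and method names here are hypothetical; the sketch only shows what "tracking changes on a translation unit level" and "reuse across projects" could mean in a data model, with the link to upstream version control left out (it would live in plug-ins).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TranslationUnit:
    """One source string and its translation for one locale (hypothetical model)."""
    unit_id: str
    source: str
    locale: str
    target: Optional[str] = None
    history: List[str] = field(default_factory=list)  # earlier targets, oldest first

    def update(self, new_target: str) -> None:
        # Keep the previous target so changes are tracked per unit, not per file.
        if self.target is not None:
            self.history.append(self.target)
        self.target = new_target


@dataclass
class TranslationRepository:
    """Holds units per project; upstream VCS links are deliberately not modelled here."""
    projects: Dict[str, Dict[str, TranslationUnit]] = field(default_factory=dict)

    def add_unit(self, project: str, unit: TranslationUnit) -> None:
        self.projects.setdefault(project, {})[unit.unit_id] = unit

    def find_reuse(self, source: str, locale: str) -> List[str]:
        # Translation reuse: existing targets for the same source text, in any project.
        return [
            unit.target
            for units in self.projects.values()
            for unit in units.values()
            if unit.source == source and unit.locale == locale and unit.target is not None
        ]
```

Because the unit, not the file, is the thing being stored, history and reuse become simple queries rather than diffs over PO files in a version control system.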

We have evaluated a number of existing open source L10N frameworks and
systems, but haven't found any (yet) that stands out or satisfies our needs or
requirements as a development platform. Technology-wise, we are aiming to
develop a Java-based(!) system, using technology such as JBoss Seam,
Hibernate, jBPM and RichFaces. A java based platform will enable us to make
best use of internal expertise in these technologies, as well as making use of
technology we are developing (as open source) through collaboration with
partners in the L10N industry.

We hope some of these requirements and ideas will excite some of you, and
ultimately lead to something that can be of use to open source communities.
While we have certain requirements and goals for this internally within the
company, there is no need for this to be an 'internal' Red Hat project, and
most of the requirements and needs overlap with those of community projects
like Fedora. In other words, by developing this in collaboration with the
community from a very early stage, we are more likely to develop something
that may be of use to the greater community.

Thoughts and comments, all sorts of comments, are very welcome.

cheers,
asgeir frimannsson
(Senior Software Engineer, I18N Engineering, Red Hat APAC)

"Sankarshan (সঙ্কর্ষণ)"

Sep 15, 2008, 8:48:34 AM
to transif...@googlegroups.com
Asgeir Frimannsson wrote:

> Some of the immediate needs that could be addressed within the existing
> framework (some of which are on the Transifex roadmap) are:
> - Consolidation of Damned Lies and Transifex, allowing retrieving and
> submitting translations through the same interface
> - Allowing retrieving and submitting multiple-files at once (e.g. for
> translating a publican document with many PO files)
> - Simple workflow on top of Transifex (porting features from Vertimus)
> - Better usability and easier user registration process (Fedora specific)

Which ones are not on Tx roadmap ? And, how are those elements proposed
to be met ?

> Looking at the bigger picture, some of the core requirements we have identified
> for Red Hat and community L10N going forward are:
> - Customizable Translation Workflows and integration with e.g. Content
> Authoring Workflows
> - Infrastructure easily adaptable to support new File formats and project
> types (e.g. OpenOffice formats, CMS formats, DTP formats, Wiki, Dita, Java
> formats), rather than relying on 'upstream' projects to fit a certain L10N
> infrastructure.
> - Managing the life-cycle of a translation project across releases and
> iterations
> - Translation Reuse and Terminology Management across projects and iterations
> - Job management, scoping, tracking and resourcing
> - Managing and/or Tracking upstream translation projects, pushing changes back
> upstream.

Since Tx is gaining traction with other communities as well, is it
prudent to open the net wider and ask about the requirements from such
communities ?

> These requirements require a system where the translation lifecycle would be
> managed within 'Translation Repositories' (similar to e.g. Pootle or Launchpad
> Translations), rather than directly through e.g. upstream version control
> systems. With a repository-based approach, we would be able to track and
> manage changes to a project on a translation unit level, and manage e.g.
> translation reuse and terminology within and across projects. We could still
> retain a link with upstream repositories (like with Transifex/Damned Lies).
> However, this would not be the 'core datamodel', but on a different layer
> through plug-ins. This link to external repositories could also go beyond
> traditional version control systems, communicating with external sources like
> wikis and CMSs.

Does Transifex allow such a set of 'plug-ins' ? If yes, how would one go
about integrating them within the plans of Transifex ? If not, how does
the integration happen ?

> We have evaluated a number of existing open source L10N frameworks and
> systems, but haven't found any (yet) that stands out or satisfies our needs or
> requirements as a development platform. Technology-wise, we are aiming to
> develop a Java-based(!) system, using technology such as JBoss Seam,
> Hibernate, jBPM and RichFaces. A java based platform will enable us to make
> best use of internal expertise in these technologies, as well as making use of
> technology we are developing (as open source) through collaboration with
> partners in the L10N industry.

Can the results of the evaluation be shared ?

Mike McGrath

Sep 15, 2008, 11:16:06 PM
to Fedora Infrastructure, transif...@googlegroups.com

I'd think much of what you're looking to do could be done in transifex
fairly easily. I think some of it is already done and being done.

> We have evaluated a number of existing open source L10N frameworks and
> systems, but haven't found any (yet) that stands out or satisfies our needs or
> requirements as a development platform. Technology-wise, we are aiming to
> develop a Java-based(!) system, using technology such as JBoss Seam,
> Hibernate, jBPM and RichFaces. A java based platform will enable us to make
> best use of internal expertise in these technologies, as well as making use of
> technology we are developing (as open source) through collaboration with
> partners in the L10N industry.
>
> We hope some of these requirements and ideas will excite some of you, and
> ultimately lead to something that can be of use to open source communities.
> While we have certain requirements and goals for this internally within the
> company, there is no need for this to be an 'internal' Red Hat project, and
> most of the requirements and needs overlap with those of community projects
> like Fedora. In other words, by developing this in collaboration with the
> community from a very early stage, we are more likely to develop something
> that may be of use to the greater community.
>
> Thoughts and comments, all sorts of comments, are very welcome.
>

Please correct me if I'm reading this wrong but I see "transifex is great
or close to it" and "here's how we're going to build our own solution
anyway" ?

-Mike

Asgeir Frimannsson

Sep 16, 2008, 2:16:42 AM
to Fedora Infrastructure, transif...@googlegroups.com, Fedora Translation Project List
On Mon, Sep 15, 2008 at 10:48 PM, "Sankarshan (সঙ্কর্ষণ)"
<sankarshan....@gmail.com> wrote:

>
> Asgeir Frimannsson wrote:
>
> > Some of the immediate needs that could be addressed within the existing
> > framework (some of which are on the Transifex roadmap) are:
> > - Consolidation of Damned Lies and Transifex, allowing retrieving and
> > submitting translations through the same interface
> > - Allowing retrieving and submitting multiple-files at once (e.g. for
> > translating a publican document with many PO files)
> > - Simple workflow on top of Transifex (porting features from Vertimus)
> > - Better usability and easier user registration process (Fedora specific)
>
> Which ones are not on Tx roadmap ? And, how are those elements proposed
> to be met ?

Background:
http://groups.google.com/group/transifex-devel/browse_thread/thread/a637310e8ff63555
http://transifex.org/wiki/Development/Roadmap
http://transifex.org/roadmap

Dimitris is a better person to answer this, but I believe we already have
basic statistics support finished in Transifex upstream. Also, I know Stéphane
Raimbault is involved in integrating the concepts found in Vertimus into
Transifex. I am not sure what the future holds for integrating Damned Lies
concepts such as Teams and Releases.

> > Looking at the bigger picture, some of the core requirements we have identified
> > for Red Hat and community L10N going forward are:
> > - Customizable Translation Workflows and integration with e.g. Content
> > Authoring Workflows
> > - Infrastructure easily adaptable to support new File formats and project
> > types (e.g. OpenOffice formats, CMS formats, DTP formats, Wiki, Dita, Java
> > formats), rather than relying on 'upstream' projects to fit a certain L10N
> > infrastructure.
> > - Managing the life-cycle of a translation project across releases and
> > iterations
> > - Translation Reuse and Terminology Management across projects and iterations
> > - Job management, scoping, tracking and resourcing
> > - Managing and/or Tracking upstream translation projects, pushing changes back
> > upstream.
>

> Since Tx is gaining traction with other communities as well, is it
> prudent to open the net wider and ask about the requirements from such
> communities ?

Yes. I would however add that this project is not directly linked with Tx at
this point. Dimitris has done a great job in networking with other
communities, and has a plan for Tx that goes way beyond Fedora.

> > These requirements require a system where the translation lifecycle would be
> > managed within 'Translation Repositories' (similar to e.g. Pootle or Launchpad
> > Translations), rather than directly through e.g. upstream version control
> > systems. With a repository-based approach, we would be able to track and
> > manage changes to a project on a translation unit level, and manage e.g.
> > translation reuse and terminology within and across projects. We could still
> > retain a link with upstream repositories (like with Transifex/Damned Lies).
> > However, this would not be the 'core datamodel', but on a different layer
> > through plug-ins. This link to external repositories could also go beyond
> > traditional version control systems, communicating with external sources like
> > wikis and CMSs.
>

> Does Transifex allow such a set of 'plug-ins' ? If yes, how would one go
> about integrating them within the plans of Transifex ? If not, how does
> the integration happen ?

The existing Transifex handles very different concepts from what is described
here, and writing this on top of Transifex would be hard.

Take for example the 'submission' page: it is centered around
submitting a file to a repository (or to Bugzilla or email, as a result of
Christos' SoC work). In a repository model, the 'submit' action would be more
about updating the internal state of a project within a repository. To achieve
the same 'workflow' as Transifex, an external plugin could then 'listen' to
these changes and transparently submit changes upstream (even by interacting
with Transifex?). In this sense, the project we are proposing could use the
submission logic of Tx, but handle that in the background. There will be
little reuse of actual code from Transifex (most of which is UI-Model
interaction which is linked with file submission).
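
To make the 'listen' idea above concrete, here is a minimal Python sketch, offered only as an illustration under assumptions. The class and method names are hypothetical, and the plug-in merely queues changes where a real one would push them upstream (to a version control system, or through Transifex).

```python
from typing import Callable, Dict, List, Tuple

# A listener receives (project, unit_id, new_target) whenever a unit changes.
Listener = Callable[[str, str, str], None]


class Repository:
    """Hypothetical repository whose 'submit' only updates internal state."""

    def __init__(self) -> None:
        self._state: Dict[Tuple[str, str], str] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def submit(self, project: str, unit_id: str, target: str) -> None:
        # The core action: record the new translation inside the repository.
        self._state[(project, unit_id)] = target
        # Plug-ins are told about the change afterwards and react on their own.
        for listener in self._listeners:
            listener(project, unit_id, target)


class UpstreamSubmitPlugin:
    """Stand-in for a plug-in that would submit changes upstream in the background."""

    def __init__(self) -> None:
        self.outbox: List[Tuple[str, str, str]] = []

    def __call__(self, project: str, unit_id: str, target: str) -> None:
        # A real plug-in might commit to a VCS or call a remote service here.
        self.outbox.append((project, unit_id, target))
```

The point of the design is that the repository knows nothing about where changes go; submitting upstream is one listener among possibly many.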

It is also important to note that the internal format of the repository will
not be PO, but a much richer format more similar to e.g. XLIFF, that
accommodates features such as change tracking and terminology-annotations
within translation units. PO would still be supported as an input format, as
well as an intermediate format that is sent to translators using existing PO
tools. However, in the long term, we aim to provide translators with richer
tools that can make use of the additional meta-data that is part of the
repository.

Dimitris mentioned on IRC the other day that the concept of a Translation
Repository similar to e.g. Pootle had been briefly discussed, and could be part
of Transifex in the future. This is exciting news; utilizing Translate Toolkit
more is something that could take Tx to the next level quickly. I think
Transifex with its existing Roadmap serves a very useful purpose, and we are
not trying to 'hijack' that project in any way. In fact, I am also pushing
towards putting more resources into Transifex development (read between the
lines whatever you want here).

What we are about to develop is a new way of doing localisation repositories
and workflow, more similar to what happens in many commercial tools than what
we see in open source communities. I feel a bit 'uneasy' about pushing that
onto the Tx roadmap at this stage, and also uneasy about developing such a
'workflow system' in Turbogears.

> > We have evaluated a number of existing open source L10N frameworks and
> > systems, but haven't found any (yet) that stands out or satisfies our needs or
> > requirements as a development platform. Technology-wise, we are aiming to
> > develop a Java-based(!) system, using technology such as JBoss Seam,
> > Hibernate, jBPM and RichFaces. A java based platform will enable us to make
> > best use of internal expertise in these technologies, as well as making use of
> > technology we are developing (as open source) through collaboration with
> > partners in the L10N industry.
>

> Can the results of the evaluation be shared ?

So the alternatives would be Pootle (Translate Toolkit), and Transifex
(Tx+DL+Vertimus), Pootle clearly being the more mature from a
resource-management perspective. Pootle works with PO, XLIFF and many other formats.
Still, it is very limited in areas such as workflow support and translation
memory management. One of the main architectural limitations of Translate
Toolkit is its inheritance hierarchy, where all resource formats (e.g. PO,
Properties, XLIFF, TMX) inherit from a base resource class. A 'pivot' format
(similar to e.g. XLIFF) with converters to and from the native format is what
we're looking for. Nevertheless, Translate Toolkit (and even Damned Lies) has
a lot of knowledge vested in it in how to handle specific project types
(intltool, gnome-doc-utils, firefox, openoffice). This is reusable across
solutions.

Feature-wise, it is much more interesting to compare with e.g. Idiom
WorldServer and Lionbridge Freeway, which are commercial solutions in the L10N
space.

cheers,
asgeir

"Sankarshan (সঙ্কর্ষণ)"

Sep 16, 2008, 2:32:09 AM
to transif...@googlegroups.com
Thanks for the reasonably detailed response.

Asgeir Frimannsson wrote:

> What we are about to develop is a new way of doing localisation repositories
> and workflow, more similar to what happens in many commercial tools than what
> we see in open source communities. I feel a bit 'uneasy' about pushing that
> onto the Tx roadmap at this stage, and also uneasy about developing such a
> 'workflow system' in Turbogears.

Since this mail was on Tx-devel, for a moment I assumed it alluded to
enhancing or, extending the functionality of Tx.

Now that you mention that this is a separate effort, I guess the context
is somewhat clear.

> Feature-wise, it is much more interesting to compare with e.g. Idiom
> WorldServer and Lionbridge Freeway, which are commercial solutions in the L10N
> space.

Yes, but what you write is a summary. The query I had was: is the
result of the evaluation of the current state-of-the-nation projected on
a future-proposed-state-of-the-nation available to read? A detailed
evaluation/assessment report, that is.

Asgeir Frimannsson

Sep 16, 2008, 3:15:17 AM
to fedora-infras...@redhat.com, Fedora Translation Project List, transif...@googlegroups.com
On Tue, Sep 16, 2008 at 4:32 PM, "Sankarshan (সঙ্কর্ষণ)"
<sankarshan....@gmail.com> wrote:
>
> Thanks for the reasonably detailed response.
>
> Asgeir Frimannsson wrote:
>
>> What we are about to develop is a new way of doing localisation repositories
>> and workflow, more similar to what happens in many commercial tools than what
>> we see in open source communities. I feel a bit 'uneasy' about pushing that
>> onto the Tx roadmap at this stage, and also uneasy about developing such a
>> 'workflow system' in Turbogears.
>
> Since this mail was on Tx-devel, for a moment I assumed it alluded to
> enhancing or, extending the functionality of Tx.
>
> Now that you mention that this is a separate effort, I guess the context
> is somewhat clear.
>
>> Feature-wise, it is much more interesting to compare with e.g. Idiom
>> WorldServer and Lionbridge Freeway, which are commercial solutions in the L10N
>> space.
>
> Yes, but what you write is a summary. The query I had was: is the
> result of the evaluation of the current state-of-the-nation projected on
> a future-proposed-state-of-the-nation available to read? A detailed
> evaluation/assessment report, that is.

This is knowledge that members of our team have acquired over a
long period of time, rather than a requirements-based systematic evaluation of
these projects. I for one have been monitoring how e.g. Translate Toolkit
as well as the Gnome, KDE, Mozilla and OO.org L10N communities have evolved over
the last 3-4 years. Seeing the increasing 'gap' between the state of the art in
this area and current practices, it was not a rocket-science decision (now
that we finally have the resources to do something about it). But I fully see
your point: a document or white paper outlining this would be very useful, and
I will investigate whether we can provide one as the plans go further.

cheers,
asgeir

Asgeir Frimannsson

Sep 16, 2008, 3:30:58 AM
to fedora-infras...@redhat.com, transif...@googlegroups.com

Yes, "Transifex is great and will continue to serve us".

BUT:

If you look at the state of the art in L10N outside the typical Linux projects
where PO and Gettext rule, you'll notice we are very short on areas like:
- Translation Reuse
- Terminology Management
- Translation Workflow and Project Management
- Integration with CMSs.
- Richer Translation Tools

This is an effort in narrowing that gap, and I can't see that effort work by
evolving an existing tool from this 'cultural background'. Yes, we can get
some of the way by developing custom solutions for e.g. linking wikis to
Transifex for CMS integration, or using e.g. Pootle for web-based translation.
But we would still be limited to the core architecture of the intent of the
original developers, which is something that would radically slow the project
down.

That said, I am not talking down Transifex, and the fact that someone in the
community has sacrificed a lot and done a great job in getting us this far
within Fedora.

cheers,
asgeir

Mike McGrath

Sep 16, 2008, 9:29:32 AM
to Fedora Infrastructure, transif...@googlegroups.com
On Tue, 16 Sep 2008, Asgeir Frimannsson wrote:
> > >
> > > Thoughts and comments, all sorts of comments, are very welcome.
> >
> > Please correct me if I'm reading this wrong but I see "transifex is great
> > or close to it" and "here's how we're going to build our own solution
> > anyway" ?
>
> Yes, "Transifex is great and will continue to serve us".
>
> BUT:
>
> If you look at the state of the art in L10N outside the typical Linux projects
> where PO and Gettext rule, you'll notice we are very short on areas like:
> - Translation Reuse
> - Terminology Management
> - Translation Workflow and Project Management
> - Integration with CMSs.
> - Richer Translation Tools
>
> This is an effort in narrowing that gap, and I can't see that effort work by
> evolving an existing tool from this 'cultural background'. Yes, we can get
> some of the way by developing custom solutions for e.g. linking wikis to
> Transifex for CMS integration, or using e.g. Pootle for web-based translation.
> But we would still be limited to the core architecture of the intent of the
> original developers, which is something that would radically slow the project
> down.
>
> That said, I am not talking down Transifex, and the fact that someone in the
> community has sacrificed a lot and done a great job in getting us this far
> within Fedora.
>

Correct me if I'm wrong though, instead of forking or adapting or working
with upstream, you are talking about doing your own thing right?

-Mike

Asgeir Frimannsson

Sep 16, 2008, 7:29:04 PM
to fedora-infras...@redhat.com, transif...@googlegroups.com
(forgot to cc tx-devel)

We have a goal of where we want to see L10N infrastructure go, to enable us in
the future to provide internal (translators paid by Red Hat) and community
translators with tools to increase their productivity as well as better tools
to manage the overall L10N process. If there is an 'upstream' that provides
this, or a platform on to which we could develop this, then yes, we would
consider 'working with upstream' or (in a worst-case-scenario) forking
upstream.

So to answer your question bluntly, YES - after 4 years involvement in
industry and community L10N processes - I believe we can do better. But
holding that thought, remember that this is in many ways 'middleware', and
making use of e.g. the vast amount of knowledge invested in Translate Toolkit
(file format conversions, build tools, QA) makes sense, and I'm not saying
'forget about all that we have invested in tools so far'. In addition, we are
pulling in resources from other communities on this (JBoss, people in the L10N
industry and academia), so we're not talking about or attempting to pull
attention away from Transifex. (If so, we'd at least consider doing it in
Python!). Still, we find it very important to develop this in the open and
accept ideas and contributions from the wider community, including people in
the Fedora infrastructure.

Regarding building on top of Transifex, I think it is much better that these
two projects 'complement' each other in the near future. Someone with
Dimitris' caliber, character, passion and vision is very rare (the only flaw I can
see in him is that I think he's a Gnome guy rather than a KDE guy, but I can see
beyond that ;) ), and I honestly think it would be both wrong and
counter-productive to attempt to 'hijack', fork or radically change the core direction and
architecture of a project that finally seems to be gaining traction way beyond the
Fedora circle.

cheers,
asgeir

Dimitris Glezos

Sep 20, 2008, 6:11:42 PM
to Fedora Infrastructure, Transifex devel list
2008/9/17 Asgeir Frimannsson <asg...@redhat.com>:

> On Tuesday 16 September 2008 23:29:32 Mike McGrath wrote:
>> > >
>> > > Please correct me if I'm reading this wrong but I see "transifex is
>> > > great or close to it" and "here's how we're going to build our own
>> > > solution anyway" ?
>> >
>> > Yes, "Transifex is great and will continue to serve us".
>> >
>> > BUT:
>> >
>> > If you look at the state of the art in L10N outside the typical Linux
>> > projects where PO and Gettext rule, you'll notice we are very short on
>> > areas like: - Translation Reuse
>> > - Terminology Management
>> > - Translation Workflow and Project Management
>> > - Integration with CMSs.
>> > - Richer Translation Tools
>> >
>> > This is an effort in narrowing that gap, and I can't see that effort work
>> > by evolving an existing tool from this 'cultural background'. Yes, we can
>> > get some of the way by developing custom solutions for e.g. linking wikis
>> > to Transifex for CMS integration, or using e.g. Pootle for web-based
>> > translation. But we would still be limited to the core architecture of
>> > the intent of the original developers, which is something that would
>> > radically slow the project down.

For the record, I believe these are some fine ideas, which I would
like to see added to Transifex as features (eg. through plugins). I
have been discussing most of them with people around conferences for
the past year. An example: Tx already downloads all the translation
files from upstream projects, so if someone requests a translation
file, why not be able to pre-populate it using existing translations
from all the other projects (translation reuse)?
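
A rough Python sketch of that pre-population idea, for illustration only: the function names are hypothetical, catalogs are modelled as plain source-to-translation dictionaries rather than real PO files, and matching is exact-text only. A real system would probably want to mark reused strings for review.

```python
from typing import Dict, Iterable

# A catalog maps source text to its translation ("" means untranslated).
Catalog = Dict[str, str]


def build_memory(catalogs: Iterable[Catalog]) -> Dict[str, str]:
    """Collect existing translations from other projects; first one seen wins."""
    memory: Dict[str, str] = {}
    for catalog in catalogs:
        for source, target in catalog.items():
            if target and source not in memory:
                memory[source] = target
    return memory


def prepopulate(requested: Catalog, memory: Dict[str, str]) -> Catalog:
    """Fill only the untranslated entries; existing translations are left alone."""
    return {
        source: (target if target else memory.get(source, ""))
        for source, target in requested.items()
    }
```

For example, if one project already translates "Cancel" and another requests a file in which "Cancel" is empty, the returned catalog carries the existing translation over and leaves unknown strings empty.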

Also, I should mention that Transifex isn't (and will never be)
specific to a particular translation file format (eg. PO) or any
translation repository. I'd like to support translation of both PO and
XLIFF files. And also support not only VCSs, but CMSs, wiki pages and
even arbitrary chunks of text. Transifex's goal is to be a platform to
help you manage your translations.

>> Correct me if I'm wrong though, instead of forking or adapting or working
>> with upstream, you are talking about doing your own thing right?
>

> We have a goal of where we want to see L10N infrastructure go, to enable us in
> the future to provide internal (translators paid by Red Hat) and community
> translators with tools to increase their productivity as well as better tools
> to manage the overall L10N process. If there is an 'upstream' that provides
> this, or a platform on to which we could develop this, then yes, we would
> consider 'working with upstream' or (in a worst-case-scenario) forking
> upstream.

The Translate Toolkit folks are a very friendly bunch, actively
maintaining and extending the rich library, and always open to
suggestions. Maybe some (if not all) of the features could be done in
TT, and the rest that might not fit there, as Python libraries to
maximize interoperability and community involvement.

I also think that Transifex could serve as the "UI" for a lot of
translation-specific tasks. If there's a library that does X, that
would help people manage their translations or leverage Transifex's
strong points of "I read a lot of repositories" and "I write to some
repositories", then we could provide a web wrapper around it (eg.
search for string "X" in all translation files of language "Y", or
"mark <this> file as a downstream of <that> and send me an msgmerged
file whenever <that> changes").
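
The first of those examples (search for a string across all translation files of one language) could look roughly like this in Python. It is a sketch under assumptions: the function name and the in-memory catalog layout are hypothetical, and a web wrapper would sit on top of something like it.

```python
from typing import Dict, List, Tuple

# (project, language) -> {msgid: msgstr}; stands in for files read from repositories.
Catalogs = Dict[Tuple[str, str], Dict[str, str]]


def search(catalogs: Catalogs, language: str, needle: str) -> List[Tuple[str, str, str]]:
    """Return (project, msgid, msgstr) for entries of one language containing the needle."""
    needle = needle.lower()
    hits: List[Tuple[str, str, str]] = []
    # Sort by (project, language) so results come back in a stable order.
    for (project, lang), entries in sorted(catalogs.items()):
        if lang != language:
            continue
        for msgid, msgstr in entries.items():
            # Case-insensitive match against either the source or the translation.
            if needle in msgid.lower() or needle in msgstr.lower():
                hits.append((project, msgid, msgstr))
    return hits
```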

> So to answer your question bluntly, YES - after 4 years involvement in
> industry and community L10N processes - I believe we can do better. But
> holding that thought, remember that this is in many ways 'middleware', and
> making use of e.g. the vast amount of knowledge invested in Translate Toolkit
> (file format conversions, build tools, QA) makes sense, and I'm not saying
> 'forget about all that we have invested in tools so far'.

It might be my poor English or the fact that I usually read long mails
at night, but despite the lengthy descriptions I still don't have a
clear picture of exactly what problem you'd like to solve, and the
reasoning behind the decisions being made.

Don't get me wrong -- I think there are some good ideas. But I feel
it would be too bad if you guys didn't invest on top of existing tools
(TT for file formats, Transifex for file operations and UI, OmegaT for
translation memory) or just isolate specific solutions that don't fit
into other projects in well-defined libraries (do one thing, do it
right). Sure, it takes a lot more effort to work *with* other people,
but it is usually worth it. :-)

-d

--
Dimitris Glezos
Jabber ID: gle...@jabber.org, GPG: 0xA5A04C3B
http://dimitris.glezos.com/

"He who gives up functionality for ease of use
loses both and deserves neither." (Anonymous)
--

Asgeir Frimannsson

Sep 21, 2008, 9:16:20 PM
to Fedora Infrastructure, Transifex devel list
Hi Dimitris,

Thanks for your comments.

For the record (since XLIFF is mentioned and since I'm part of the OASIS XLIFF Technical Committee), I am not aiming to design anything around XLIFF in this project, other than perhaps supporting XLIFF as an import/export format for resources in the same way as we support PO (we do have the odd XLIFF file coming through for translation). I don't think XLIFF (1.2) is mature enough yet as an L10N resource format.

I know there are some big ideas in Transifex. In fact, when Transifex is mentioned, people often refer to the *goal/idea* of Transifex rather than the actual current implementation. Take plugins, for example: Transifex doesn't currently have a plugin system, nor does it have workflow, project management, or any concept of translation resources internally. Transifex today is a simple 'file submission system' with a growing community aiming to build it into something more. With this in mind, 'building on top of Transifex' really means redefining what Transifex is. For example, 'file submission' should really be a plugin, not a core feature. That means all of Transifex today (excluding maybe the login UI) should really be plugins to a core model of projects, people, etc., that currently doesn't exist.

Defining this 'model' of a repository doesn't really depend much on the implementation, and in fact having several implementations might help push this forward faster and ensure a better solution (if it was on the Tx roadmap in the first place). And it's not as if it is impossible for e.g. a Java-based repository to communicate with Transifex for file submissions; isn't that exactly what the remote interface of Tx (on the roadmap) is supposed to provide? What I'm hearing is "Don't build something new, continue building on the Python/TG/Transifex architecture", which is fully understandable. However, considering the cost of developing this on top of Tx (re-architecture, convincing everyone that it is the right path, immaturity and stability of libraries for e.g. Ajax, limited workflow support), I honestly think it's better to have two projects that 'complement' each other. There are more than enough tasks for everyone in the existing Tx roadmap, and the idea is bigger than what a combined development team could accomplish. Diversifying and pulling in good people from e.g. the Java side of things might even help speed things up.

> >> Correct me if I'm wrong though, instead of forking or adapting or working
> >> with upstream, you are talking about doing your own thing right?
> >
> > We have a goal of where we want to see L10N infrastructure go, to enable us in
> > the future to provide internal (translators paid by Red Hat) and community
> > translators with tools to increase their productivity as well as better tools
> > to manage the overall L10N process. If there is an 'upstream' that provides
> > this, or a platform on to which we could develop this, then yes, we would
> > consider 'working with upstream' or (in a worst-case-scenario) forking
> > upstream.
>
> The Translate Toolkit folks are a very friendly bunch, actively
> maintaining and extending the rich library, and always open to
> suggestions. Maybe some (if not all) of the features could be done in
> TT, and the rest that might not fit there, as Python libraries to
> maximize interoperability and community involvement.

Yes, I know TT very well, and have discussed the library with Dwayne Bailey (the main visionary behind the project) in the past, even before tx was born. In fact, a django-migration of Pootle (built on top of the TT) has been on the agenda for a while, and combining forces with TT is one of the other options I have been strongly considering for a repository (TT e.g. has a file submission library, and there is a lot of duplication between tt and tx). Looking at the svn activity of TT (in my rss reader), it is definitely a project with a 'dangerous' future.

> I also think that Transifex could serve as the "UI" for a lot of
> translation-specific tasks. If there's a library that does X, that
> would help people manage their translations or leverage Transifex's
> strong points of "I read a lot of repositories" and "I write to some
> repositories", then we could provide a web wrapper around it. (eg.
> search for string "X" in all translation files of language "Y", or
> "mark <this> file as a downstream of <that> and send me an msgmerged
> file whenever <that> changes".
>
> > So to answer your question bluntly, YES - after 4 years involvement in
> > industry and community L10N processes - I believe we can do better. But
> > holding that thought, remember that this is in many ways 'middleware', and
> > making use of e.g. the vast amount of knowledge invested in Translate Toolkit
> > (file format conversions, build tools, QA) makes sense, and I'm not saying
> > 'forget about all that we have invested in tools so far'.
>
> It might be my poor English or the fact that I usually read long mails
> at night, but despite the lengthy descriptions I still don't have a
> clear picture of exactly what problem you'd like to solve, and the
> reasoning behind the decisions being made.

I do understand there is a 'semantic gap' here, and that we do need to provide a better description and demonstration of why a new project is necessary. I do believe it is theoretically possible to build everything on top of python/tg and through reuse of concepts in e.g. tx and TT, but I honestly believe that if we are going to manage and drive the development effort in this, it is more worthwhile to expand beyond the fedora/python community, and use tools that the core developers would be more comfortable and productive with. This is not a 'we think you guys should develop this' request; we are taking ownership of the project, as well as inviting anyone in the community who is interested to participate and take ownership.



> Don't take me wrong -- I think there are some good ideas. But I feel
> it would be too bad if you guys didn't invest on top of existing tools
> (TT for file formats, Transifex for file operations and UI, OmegaT for
> translation memory) or just isolate specific solutions that don't fit
> into other projects in well-defined libraries (do one thing, do it
> right). Sure, it takes a lot more effort to work *with* other people,
> but it is usually worth it. :-)

This is *not* about an effort to avoid working with people. It is an effort to get more people working on this. I know more people in the Java community who are or might be interested in an open source solution for these problems than in the Python/Fedora/TG community. Add to this a portion of my natural bias towards Java, and the fact that the people who would be working on this would initially be much more productive in Java than in Python (TG2 or django).

Please take the fact that we are throwing this idea out to the fedora/tx community early as a sign that we are trying to work with the community, rather than simply developing something on our own. And I for one will continue being involved with Tx to some degree, and help out where I can. L10N is an area with a lot of space for improvement, and an area that has sadly been to some extent 'neglected' except for Dimitris' recent work. We still have a long way to go before we have what I would call an L10N infrastructure that serves translators well.

cheers,
asgeir

"Sankarshan (সঙ্কর্ষণ)"

Oct 15, 2008, 5:10:59 AM10/15/08
to transif...@googlegroups.com
Asgeir Frimannsson wrote:

> This is knowledge we have acquired internally by members of our team over a
> long period of time, rather than a requirements-based systematic evaluation of
> these projects. I for one have e.g. been monitoring e.g. how Translate Toolkit
> as well as Gnome, KDE, Mozilla and OO.org L10N communities have evolved over
> the last 3-4 years.


Would you be interested in FOSS.IN (http://foss.in) and perhaps
lead a WorkOut related to this?

~sankarshan

ps: The submission window for the proposal would close on 16th Oct
(midnight) India time.

Asgeir Frimannsson

Oct 15, 2008, 7:40:16 AM10/15/08
to transif...@googlegroups.com
Hi Sankarshan (সঙ্কর্ষণ),

On Wed, Oct 15, 2008 at 7:10 PM, "Sankarshan (সঙ্কর্ষণ)"
<sankarshan....@gmail.com> wrote:
>
> Asgeir Frimannsson wrote:
>
>> This is knowledge we have acquired internally by members of our team over a
>> long period of time, rather than a requirements-based systematic evaluation of
>> these projects. I for one have e.g. been monitoring e.g. how Translate Toolkit
>> as well as Gnome, KDE, Mozilla and OO.org L10N communities have evolved over
>> the last 3-4 years.
>
>
> Would you be interested in FOSS.IN (http://foss.in) and perhaps
> lead a WorkOut related to this?

I would have loved to be there. However, I will be in New Zealand in
that period (mid Nov - mid Dec) spending time far far away from
anything technology-related, so it won't work for me this year
unfortunately. I will attempt to push more information out before that
time though, in case that may help in the organization of the workout.

cheers,
asgeir

ob.s...@gmail.com

Nov 7, 2008, 3:54:41 PM11/7/08
to transifex-devel


News about this thread