Gettext and ITS (was Re: Feature proposal: string extracting by RegExp for xgettext)

Asgeir Frimannsson

Mar 14, 2008, 8:13:29 AM

to Bruno Haible, bug-gnu...@gnu.org

Hi Bruno,

On Fri, Mar 14, 2008 at 9:41 PM, Bruno Haible <br...@clisp.org> wrote:

> Hello Asgeir,
>
> > For example, for Glade XML files, the following ITS descriptor [2] can be
> > applied to extract/merge translatable features:
> >
> > <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
> >   <its:translateRule selector="/glade-interface" translate="no"/>
> >   <its:translateRule selector="//*[@translatable='yes']" translate="yes"/>
> >   <its:translateRule selector="//atkaction/@description" translate="yes"/>
> >   <its:locNoteRule selector="//*[@translatable='yes']"
> >     locNoteType="description" locNotePointer="@comments"/>
> > </its:rules>
>
> Thank you for posting this example! I had looked at the ITS specification,
> but not understood what it was really about and how it was meant to be
> used.
>

Note that this Glade example is an actual example used in the 'best
practices' document, not something I came up with :)

> So if I understand it right, tools for extracting translatable strings and
> for merging back translated strings into XML documents could use this
> W3C ITS specification?

Yes, exactly. That is, for merging back you probably don't need it... But
imagine this combined with xgettext, e.g. for extracting strings from ODF,
XHTML, Glade, ts... The absolute path of a translation unit could be stored
in the #: reference element, for example "/html/body/p[34]/table[3]/p", and
be used as a locator when merging... Something like
"xgettext --its=myconfig.its mydoc.xml".
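A rough sketch of what ITS-driven extraction with path references could look like, assuming Python's standard xml.etree.ElementTree instead of libxml2. ElementTree supports only a small XPath subset, so the three Glade rules quoted above are hand-coded as equivalent checks rather than read from an .its file. The helper names (element_paths, extract, to_po) and the sample document are hypothetical, and this is not how xgettext itself behaves. Only an element's own leading text is extracted; nested markup is ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape of a Glade file.
SAMPLE = """<glade-interface>
  <widget class="GtkWindow" id="window1">
    <property name="title" translatable="yes" comments="Main window title">Hello</property>
    <child>
      <widget class="GtkButton" id="button1">
        <property name="label" translatable="yes">_Quit</property>
        <accessibility>
          <atkaction action_name="click" description="Quits the application"/>
        </accessibility>
      </widget>
    </child>
  </widget>
</glade-interface>"""


def element_paths(root):
    """Yield (absolute_path, element) in document order, with XPath-style 1-based indices."""
    def walk(elem, path):
        yield path, elem
        counts = {}
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            yield from walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")
    yield from walk(root, "/" + root.tag)


def extract(xml_text):
    """Apply hand-coded equivalents of the Glade ITS rules; return (path, note, text) triples."""
    # translateRule selector="/glade-interface" translate="no": nothing is
    # extracted unless one of the two rules below matches.
    entries = []
    for path, elem in element_paths(ET.fromstring(xml_text)):
        # translateRule selector="//*[@translatable='yes']" translate="yes"
        if elem.get("translatable") == "yes" and elem.text:
            # locNoteRule locNotePointer="@comments" becomes the note
            entries.append((path, elem.get("comments"), elem.text))
        # translateRule selector="//atkaction/@description" translate="yes"
        if elem.tag == "atkaction" and elem.get("description"):
            entries.append((path + "/@description", None, elem.get("description")))
    return entries


def po_quote(s):
    """Quote a string as a single-line PO string literal."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'


def to_po(entries):
    """Render entries as PO, with the element path as the #: reference."""
    out = []
    for path, note, text in entries:
        if note:
            out.append(f"#. {note}")
        out.append(f"#: {path}")
        out.append(f"msgid {po_quote(text)}")
        out.append('msgstr ""')
        out.append("")
    return "\n".join(out)


if __name__ == "__main__":
    print(to_po(extract(SAMPLE)))
```

The positional indices count only same-named siblings, which is what an XPath step like p[34] means, so the stored path can be evaluated again at merge time.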

>
> There is no free implementation of it right now?
>

There are a couple: http://www.w3.org/International/its/links.html

Rainbow (Mono/.NET) is LGPL, and so is Spritser.

>
> An implementation of it would have to rely on XPath. For example, use
> libxml2. Right?

Yeah, the spec relies heavily on XPath expressions, and libxml2 is excellent
for this. It should be possible to do a 'streaming' implementation that
relies on XPath only to evaluate whether a given node is
translatable/inline/a comment etc., without loading the whole document into
memory.

cheers,
asgeir

Asgeir Frimannsson

Mar 14, 2008, 8:42:19 AM

to Bruno Haible, bug-gnu...@gnu.org

One limitation with a PO-based implementation is of course the
handling of inline elements.

For example:

Specify non-translatable elements:
  <its:translateRule translate="no" selector="//d:email|//d:uri"/>
Specify inline elements:
  <its:withinTextRule withinText="yes" selector="//d:email|//d:uri"/>

Say you have the xml fragment:
<para>Please email us at <email>in...@example.com</email>, or visit our
website at <uri>http://www.example.com</uri>.</para>

Here, everything within the para element would become a single msgid;
however, we have no way of preventing translators from modifying the
non-translatable email or uri elements... This could, however, be noted in
automatic comments by the extraction tool, and even be checked by msgfmt if
the ITS configuration is available...

A possible PO representation:

#: //section/para[34]
#. do not translate content within the <email> element
#. do not translate content within the <uri> element
#, xml-format
msgid ""
"Please email us at <email>in...@example.com</email>, or visit "
"our website at <uri>http://www.example.com</uri>."
msgstr ""
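The check could look roughly like this standalone Python helper (not part of msgfmt; check_inline, protected_content and PROTECTED are names made up for illustration). It parses msgid and msgstr as XML fragments and requires every <email> and <uri> element, with its content, to survive translation, while still letting the translator reorder them. Namespaces are ignored, so the d: prefix from the rules above is dropped.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Elements whose content must not change; mirrors the selector
# "//d:email|//d:uri" with the namespace prefix dropped.
PROTECTED = {"email", "uri"}


def protected_content(fragment):
    """Return a multiset of (tag, text) pairs for protected inline elements."""
    root = ET.fromstring(f"<wrapper>{fragment}</wrapper>")
    return Counter((el.tag, el.text or "") for el in root.iter() if el.tag in PROTECTED)


def check_inline(msgid, msgstr):
    """True if msgstr keeps every protected element of msgid unchanged (order may differ)."""
    if not msgstr:
        return True  # untranslated entries have nothing to check
    try:
        return protected_content(msgid) == protected_content(msgstr)
    except ET.ParseError:
        return False  # a translation that breaks the markup fails the check


if __name__ == "__main__":
    msgid = ("Please email us at <email>in...@example.com</email>, or visit "
             "our website at <uri>http://www.example.com</uri>.")
    msgstr = ("Besuchen Sie unsere Website unter <uri>http://www.example.com</uri> "
              "oder schreiben Sie an <email>in...@example.com</email>.")
    print(check_inline(msgid, msgstr))
```

Comparing multisets rather than sequences is a deliberate choice: word order often changes between languages, so the protected elements may legitimately move.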

cheers,
asgeir

Chusslove Illich

Mar 14, 2008, 9:50:43 AM

to bug-gn...@gnu.org

> [: Asgeir Frimannsson :]
> [...] the absolute path for a translation unit could be stored in the #:
> reference elem, for example "/html/body/p[34]/table[3]/p" and be used as a
> locator when merging...

Notwithstanding the main line of the discussion, to which I have little to
add, this particular bit I do not like. The source reference should be a
source reference: a link to a particular file and line, should the
translator wish to venture there for more context.

Instead, I'd put the document-tree path in another automatic comment (#.),
with a certain prefix to mark it as such.
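A small sketch of that layout: the source file:line stays in #:, and the tree path goes into a prefixed #. comment that a merge tool can read back. The "xml-path: " prefix, both function names and the file name manual.xml are hypothetical choices for illustration, not a gettext convention.

```python
# Arbitrary marker for the tree path; not a gettext convention.
PATH_PREFIX = "xml-path: "


def po_entry(source_ref, tree_path, msgid, notes=()):
    """Build a PO entry: notes and the tree path as #. comments, file:line as #:."""
    lines = [f"#. {note}" for note in notes]
    lines.append(f"#. {PATH_PREFIX}{tree_path}")
    lines.append(f"#: {source_ref}")
    escaped = msgid.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    lines.append(f'msgid "{escaped}"')
    lines.append('msgstr ""')
    return "\n".join(lines)


def tree_path_of(entry):
    """Recover the tree path from a PO entry's comments, or None if absent."""
    marker = "#. " + PATH_PREFIX
    for line in entry.splitlines():
        if line.startswith(marker):
            return line[len(marker):]
    return None


if __name__ == "__main__":
    entry = po_entry("manual.xml:120", "/html/body/p[34]/table[3]/p", "Hello")
    print(entry)
    print(tree_path_of(entry))
```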

--
Chusslove Illich (Часлав Илић)

Asgeir Frimannsson

Mar 14, 2008, 4:56:42 PM

to Chusslove Illich, bug-gn...@gnu.org

Well, yes, the link to the source *file* should be there somewhere.
But with XML, the absolute path to an element is much more precise
than a line number, and it is transferable. Imagine, e.g., an XML file
with all its content on one long line.

Having both is of course ideal. I've done XML processing before where
we needed the line number and byte offset/length of each element, and
it's a very tricky business to combine that with the standard XML
processing tools. But I'd be very happy to be proven wrong here :)
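For what it's worth, Python's expat binding exposes position information during parsing, so a sketch that records the absolute path together with the line, column and byte offset of each start tag might look like this. element_positions is a hypothetical name; during a callback expat reports where the current event starts, with 1-based lines and 0-based columns. Element lengths are left out of this sketch.

```python
import xml.parsers.expat


def element_positions(xml_bytes):
    """Return (absolute_path, line, column, byte_offset) for every start tag."""
    parser = xml.parsers.expat.ParserCreate()
    records = []
    stack = []  # one (path, {child_name: count}) pair per open element

    def start(name, attrs):
        if stack:
            parent_path, counts = stack[-1]
            counts[name] = counts.get(name, 0) + 1
            path = f"{parent_path}/{name}[{counts[name]}]"
        else:
            path = "/" + name
        stack.append((path, {}))
        # Inside a handler, expat reports the position where this start tag begins.
        records.append((path, parser.CurrentLineNumber,
                        parser.CurrentColumnNumber, parser.CurrentByteIndex))

    def end(name):
        stack.pop()

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(xml_bytes, True)
    return records


if __name__ == "__main__":
    for record in element_positions(b"<doc><p>a</p><p>b</p></doc>"):
        print(record)
```

On a document written on a single line every line number is 1, but the paths and byte offsets still tell the elements apart.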

cheers,
asgeir