Advanced internationalization

Christopher Lenz

Jun 27, 2007, 8:33:38 AM

to gen...@googlegroups.com, python...@googlegroups.com

Hey all,

Genshi now has basic support for internationalization [1], which in
combination with Babel [2] works rather nicely AFAICT.

However there's one problem that isn't addressed yet, namely that of
messages that may contain tags. This is a complicated issue,
compounded by Genshi's striving to do correct escaping of strings in
templates. That means you can't just have messages like the following:

msgid "Here's a <a href='#foobar'>link</a>."

The <a> tag would be escaped, and I think that's the right thing to
do, because translations may very well contain things that *do* need
to be escaped, and the translators shouldn't have to worry about
escaping -- they may not even know what escaping is.
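
A minimal sketch of the problem, assuming the standard library's html.escape as a stand-in for Genshi's escaping and a hypothetical in-memory catalog (neither is Genshi's actual code path). It shows that markup placed in a translation comes out as literal text once the string is escaped like any other template text:

```python
from html import escape

# Hypothetical catalog entry whose translation contains markup.
catalog = {
    "Here's a <a href='#foobar'>link</a>.":
        "Hier ist ein <a href='#foobar'>Link</a>.",
}

def render_text(msgid):
    """Look up a translation and escape it like any other template text."""
    translated = catalog.get(msgid, msgid)
    # quote=False leaves quotes alone; only &, < and > are replaced.
    return escape(translated, quote=False)

print(render_text("Here's a <a href='#foobar'>link</a>."))
```

The link never reaches the browser as a link, which is the safe behaviour but makes this style of msgid unusable.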

So we need a proper solution for this issue. I've outlined a possible
approach in:

<http://genshi.edgewall.org/ticket/129#comment:2>

To summarize, I propose adding an i18n namespace, which would be
processed exclusively by the Translator filter. That namespace
provides tags to define exactly how a message is composed from mixed
content. Please see the ticket linked above for details.

I'd love to hear your thoughts on this, and maybe alternative proposals.

[1] http://genshi.edgewall.org/wiki/Documentation/i18n.html
[2] http://babel.edgewall.org/

Thanks,
Chris
--
Christopher Lenz
cmlenz at gmx.de
http://www.cmlenz.net/

Christian Boos

Jun 27, 2007, 11:45:14 AM

to gen...@googlegroups.com

Christopher Lenz wrote:
> Hey all,
>
> Genshi now has basic support for internationalization [2], which in
> combination with Babel [1] works rather nicely AFAICT.
>
> However there's one problem that isn't addressed yet, namely that of
> messages that may contain tags. This is a complicated issue,
> compounded by Genshi's striving to do correct escaping of strings in
> templates. That means you can't just have messages like the following:
>
> msgid "Here's a <a href='#foobar'>link</a>."
>
> The <a> tag would be escaped, and I think that's the right thing to
> do, because translations may very well contain things that *do* need
> to be escaped, and the translators shouldn't have to worry about
> escaping -- they may not even know what escaping is.
>

There's a closely related issue: how will we deal with similar
messages built from within Python code using genshi.builder?

Example from Trac:

tag.p("You can ",
      tag.a("search", href=req.href.log(path, rev=rev, mode='path_history')),
      " in the repository history to see if that path existed but"
      " was later removed")

There are actually two distinct problems here:
1. How to collect the msgid from the Python source?
2. How to compose the msgid in a non-fragmented way?

> So we need a proper solution for this issue. I've outlined a possible
> approach in:
>
> <http://genshi.edgewall.org/ticket/129#comment:2>
>
> To summarize, I propose adding an i18n namespace, which would be
> processed exclusively by the Translator filter. That namespace
> provides tags to define exactly how a message is composed from mixed
> content. Please see the ticket linked above for details.
>
> I'd love to hear your thoughts on this, and maybe alternative proposals.
>

This approach looks very promising and could perhaps be extended to the
genshi.builder situation.

In particular, for point 2, we could imagine using a few helper
functions that would inject the appropriate attribute from the i18n
namespace into the Element argument.

The above example becomes:

i18n_message(tag.p("You can ",
    i18n_tag('search', tag.a("search", href=req.href.log(path, rev=rev,
                                                         mode='path_history'))),
    " in the repository history to see if that path existed but"
    " was later removed"))

i18n_message would also build the msgid by including the plain text from
static strings (dynamic strings should be wrapped in i18n_param() calls)
and return the translation.

_But_ there's still the problematic point 1, and I'm not sure how the
current extract_python() could be extended to handle that... One idea
could be to track nested calls and have the possibility to register
callbacks for each keyword, so the callback for i18n_message could
rebuild the tag expression. Well, this looks tedious, so I hope there's
a simpler way.
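
A rough sketch of what tracking nested calls could look like, using the standard ast module rather than Babel's extract_python(). The helper names i18n_message and i18n_tag come from the example above; everything else (call_name, static_text, build_msgid, extract, and the "[1:...]" numbering of wrapped tags) is an assumption made for illustration only:

```python
import ast

SOURCE = '''
i18n_message(tag.p("You can ",
    i18n_tag('search', tag.a("search", href=req.href.log(path, rev=rev))),
    " in the repository history to see if that path existed but"
    " was later removed"))
'''

def is_text(node):
    """True for a string literal argument."""
    return isinstance(node, ast.Constant) and isinstance(node.value, str)

def call_name(node):
    """Return the plain function name of a call such as i18n_tag(...)."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return node.func.id
    return None

def static_text(element):
    """Join the literal string arguments of a tag.xxx(...) call."""
    return "".join(arg.value for arg in element.args if is_text(arg))

def build_msgid(element):
    """Compose one msgid from the children of the outer element call."""
    parts = []
    index = 0
    for arg in element.args:
        if is_text(arg):
            parts.append(arg.value)
        elif (call_name(arg) == "i18n_tag" and len(arg.args) > 1
              and isinstance(arg.args[1], ast.Call)):
            # Wrapped elements become numbered groups in the msgid.
            index += 1
            parts.append("[%d:%s]" % (index, static_text(arg.args[1])))
    return "".join(parts)

def extract(source):
    """Yield a msgid for every i18n_message(...) call in the source."""
    for node in ast.walk(ast.parse(source)):
        if (call_name(node) == "i18n_message" and node.args
                and isinstance(node.args[0], ast.Call)):
            yield build_msgid(node.args[0])

print(list(extract(SOURCE)))
```

The source is only parsed, never executed, so names like tag and req do not need to exist at extraction time.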

-- Christian

Christopher Lenz

Jun 27, 2007, 12:10:39 PM

to gen...@googlegroups.com

You're absolutely right, that's a problem the proposal doesn't
address, and I also don't have a good idea so far how to solve it :-/

Well, one approach would be to move more of that kind of stuff into
actual templates, but of course that's not always appropriate. On the
other hand, Trac *does* too often put markup into exception messages,
I think.

Cheers,

Shun-ichi GOTO

Jun 27, 2007, 12:13:06 PM

to gen...@googlegroups.com

Hi,

I'm trying to translate the Trac i18n branch into Japanese,
but I'm not yet familiar with Genshi and Babel.

I've read the proposal and have some questions.
(These are not Japanese-specific issues.)

2007/6/27, Christopher Lenz <cml...@gmx.de>:

> However there's one problem that isn't addressed yet, namely that of
> messages that may contain tags. This is a complicated issue,
> compounded by Genshi's striving to do correct escaping of strings in
> templates. That means you can't just have messages like the following:
>
> msgid "Here's a <a href='#foobar'>link</a>."
>
> The <a> tag would be escaped, and I think that's the right thing to
> do, because translations may very well contain things that *do* need
> to be escaped, and the translators shouldn't have to worry about
> escaping -- they may not even know what escaping is.

1. Translating attribute values
-------------------------------

Escaping may be good for content text, but we should also translate
button text. The proposal does not mention attribute text. I think the
proposal expects an explicit directive for text to be extracted. Or is
it extracted automatically, without a directive?

I think we may need one more i18n:xxx attribute to specify the attribute
names to be extracted.
For example (with Japanese):

<input type="submit" value="Reply" title="Reply to comment ${change.cnum}"
       i18n:attributes="title value" />

=>

msgid="Reply"
msgstr="返信"

msgid="Reply to comment ${change.cnum}"
msgstr="${change.cnum}へのコメント"

2. How to deal with parameters in attributes?
---------------------------------------------

In the example above, i18n:param cannot be used for the attribute value.
How about using the parameter name as-is in msgid/msgstr?

3. i18n:tag might be a required feature
---------------------------------------

I think i18n:tag should be REQUIRED (at least when there are multiple tags
in the msgstr), because translations change the order of tags all the time.
Nested tags may also end up separated in the translated text, and vice versa.

How about assigning index numbers automatically (no need to write i18n:tag)?
The index always appears in the msgid and can be used in the msgstr.
Example:
msgid="Please see [1:Help] for [2:details]."
msgstr="[2:Details] finden Sie unter [1:Hilfe]."
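
A small sketch of how such numbered groups could be turned back into markup at render time. The render function, the GROUP pattern and the tags mapping are hypothetical (this is not Genshi's Translator filter), and nested groups are deliberately not handled:

```python
import re
from html import escape

# Matches "[<index>:<text>]" groups; nesting is not handled in this sketch.
GROUP = re.compile(r"\[(\d+):([^\[\]]*)\]")

def render(msgstr, tags):
    """Rebuild markup from a translated string with numbered groups.

    `tags` maps an index to an (open, close) pair taken from the template.
    """
    out = []
    pos = 0
    for match in GROUP.finditer(msgstr):
        # Text between groups comes from the translator, so escape it.
        out.append(escape(msgstr[pos:match.start()], quote=False))
        open_tag, close_tag = tags[int(match.group(1))]
        out.append(open_tag + escape(match.group(2), quote=False) + close_tag)
        pos = match.end()
    out.append(escape(msgstr[pos:], quote=False))
    return "".join(out)

# Assumed template markup for groups 1 and 2.
tags = {
    1: ('<a href="help.html">', "</a>"),
    2: ("<em>", "</em>"),
}
print(render("[2:Details] finden Sie unter [1:Hilfe].", tags))
```

The translator only ever sees the bracketed indexes, so the markup stays under the template author's control while the order of the groups is free to change.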

--
Shun-ichi GOTO

Christopher Lenz

Jun 27, 2007, 12:29:28 PM

to gen...@googlegroups.com

On 27.06.2007 at 18:13, Shun-ichi GOTO wrote:
> Hi,
>
> I'm trying to translate trac i18n branch to Japanese.
> But not yet familar with genshi and babel.
>
> I've read the proposal and having some questions.
> (These are not Japanese specific issue)
>
> 2007/6/27, Christopher Lenz <cml...@gmx.de>:
>> However there's one problem that isn't addressed yet, namely that of
>> messages that may contain tags. This is a complicated issue,
>> compounded by Genshi's striving to do correct escaping of strings in
>> templates. That means you can't just have messages like the
>> following:
>>
>> msgid "Here's a <a href='#foobar'>link</a>."
>>
>> The <a> tag would be escaped, and I think that's the right thing to
>> do, because translations may very well contain things that *do* need
>> to be escaped, and the translators shouldn't have to worry about
>> escaping -- they may not even know what escaping is.
>
> 1. Translating attribute values
> -------------------------------
>
> Eacaping may good for text of content, but we should translate
> button text also. But the proposal does not mention about attribute
> text. I think the proposal is expecting explicit directive to be
> extracted. Is it extracted without directive automaticaly?

In general, there are a couple of attribute values that are extracted
by default, such as "title" and "alt". Actually, these should only be
extracted/translated automatically if they contain literal strings,
but I'll have to check (and probably fix) the code in that respect.

> I think we may need one more i18n:xxx attribute to specify attribute
> names to be extracted.
> For example (with Japanese):
>
> <input type="submit" value="Reply" title="Reply to comment ${change.cnum}"
>        i18n:attributes="title value" />
>
> =>

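In this case what you really should do is use gettext explicitly:

<input type="submit" value="${_('Reply')}"
       title="${_('Reply to comment %(num)s') % {'num': change.cnum}}" />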

I don't see the need to add anything in the proposed i18n namespace
to handle this situation.

> 2. How to deal parameter in attribute?
> --------------------------------------
>
> In example above, i18n:param cannot be used for attribute value.
> How about using parameter name as-is in msgid/msgstr?

I'm not sure I understand this one. Does the above answer it maybe?

> 3. i18n:tag might be required feature
> -------------------------------------
>
> I think i18n:tag should be REQUIRED (at least when having multiple
> tags
> in msgstr) because the changing order of tags is always happen.

You mean when the original string in the template is updated?

> And nested tags may be separated in translated text, and vice versa.

Hm, really? Do you have an example for that? Translations changing
the order I can understand, but the nesting?

> How about giving auto index number? (no need to give i18n:tag)
> It always appeared in msgid and it can be used in msgstr.
> ex:
> msgid="Please see [1:Help] for [2:details]."
> msgstr="[2:Details] finden Sie unter [1:Hilfe]."

Yeah, that's actually more convenient and consistent. If we do it
this way, we actually won't need i18n:tag at all, AFAICT.

Shun-ichi GOTO

Jun 27, 2007, 2:18:14 PM

to gen...@googlegroups.com

2007/6/28, Christopher Lenz <cml...@gmx.de>:

>
> > 1. Translating attribute values
> > -------------------------------
> >
> > Eacaping may good for text of content, but we should translate
> > button text also. But the proposal does not mention about attribute
> > text. I think the proposal is expecting explicit directive to be
> > extracted. Is it extracted without directive automaticaly?
>
> In general, there are a couple of attribute values that are extracted
> by default, such as "title" and "alt". Actually, these should only be
> extracted/translated automatically if they contain literal strings,
> but I'll have to check (and probably fix) the code in that respect.

OK. It's helpful.

> > I think we may need one more i18n:xxx attribute to specify attribute
> > names to be extracted.
> > For example (with Japanese):
> >
> > <input type="submit" value="Reply" title="Reply to comment ${change.cnum}"
> >        i18n:attributes="title value" />
> >
> > =>
>
> In this case what you really should do is use gettext explicitly:
>
> <input type="submit" value="${_('Reply')}"
>        title="${_('Reply to comment %(num)s') % {'num': change.cnum}}" />
>
> I don't see the need to add anything in the proposed i18n namespace
> to handle this situation.

OK, I see.

> > 2. How to deal parameter in attribute?
> > --------------------------------------
> >
> > In example above, i18n:param cannot be used for attribute value.
> > How about using parameter name as-is in msgid/msgstr?
>
> I'm not sure I understand this one. Does the above answer it maybe?

Yes, it's enough.

> > 3. i18n:tag might be required feature
> > -------------------------------------
> >
> > I think i18n:tag should be REQUIRED (at least when having multiple
> > tags
> > in msgstr) because the changing order of tags is always happen.
>
> You mean when the original string in the template is updated?
>
> > And nested tags may be separated in translated text, and vice versa.
>
> Hm, really? Do you have an example for that? Translations changing
> the order I can understand, but the nesting?

As the simplest example, an S+V+O sentence in English will generally
be translated as S+O+V in Japanese.

So, as an example:
S <a href="xxx">V</a> O
would be translated into
S O <a href="xxx">V</a>
or
S O <a href="xxx">V</a>

Of course the translator can make an effort to keep the original nesting
structure, but the result is not always a good sentence in the target
language. To produce a better translation, the translator might want to
change the structure, I guess.

--
Shun-ichi GOTO

Christian Boos

Jun 28, 2007, 2:51:01 AM

to gen...@googlegroups.com

Shun-ichi GOTO wrote:
> ...

> As a simplest example, the sentence S+V+O in English will be
> translated as S+O+V in Japanese in generally.
>
> So, as an example:
> S <a href="xxx">V</a> O
> would be translated into
> S O <a href="xxx">V</a>
> or
> S O <a href="xxx">V</a>
>
> Of course the translator can make effort to keep original structure of
> nesting, but it is not always a good sentence in his language. To be
> better translation, the translator might want to change the structure,
> I guess.
>

What about the following?

''S [xxx V]'' O

translated to:

''S'' O [xxx V]
or
''S'' O ''[xxx V]''

Oh I forgot, we're not talking /only/ about Trac ;-)

-- Christian

Andreas Reuleaux

Jul 3, 2007, 6:20:11 PM

to gen...@googlegroups.com

A lot has happened on the i18n front in the last days: babel and now
ticket #129 (i18n namespace) - I am glad to see this fast progress,
especially since I had proposed an i18n namespace (like in Zope 3) on
this list earlier, so here are some comments.

Some of these ideas are just borrowed from Zope 3, which uses an i18n
namespace already - I found the best description of i18n in Zope 3 in
Philipp's book, 2nd edition, chapter 9 by the way:
http://worldcookery.com/ (I am aware that it is not a particularly
cheap book).

* I am always for short descriptive names: why not
just i18n:msg="" instead of i18n:message=""
- you are using msg for message in your examples anyway:
in msgid, msgstr and I guess this is one of the things one
has to type rather often.

* i18n:message is roughly Zope's i18n:translate, however
in Zope the attribute can be used to denote a custom msgid. I think
this is a good idea, since sometimes one wants the same string to be
translated differently in different circumstances - this has happened
to me before and Philipp gives the example of the word "view"
meaning different things in various situations: the noun view, the
verb view, a view permission, a view button, a view tab etc. - this
could be handled:
view
-> msgid: view-permission
-> msgstr, en: view
-> msgstr, de: Betrachten-Recht
etc.
view
-> msgid: view-button
-> msgstr, en: view
-> msgstr, de: Ansehen
etc.
The rule is: if the i18n:message attribute is empty (="")
then the string itself is used as a msgid - this is what
you were proposing, example:
Please see...
-> msgid: Please see...
Otherwise the attribute value is used as the msgid, example
whatever...
-> msgid: someid
The only place where you used the message attribute
was in the singular/plural example (6. Compound pluralizable messages
including a tag),

...(i18n:param="num" used inside)
...(i18n:param="num" used inside)

Not sure why this is needed here, couldn't this just be
written as (empty i18n:message)?:

...(i18n:param="num" used inside)
...(i18n:param="num" used inside)

* params/tags are obviously needed, if only because
the order of words in a sentence is different in different languages,
Philipp's example:
"It takes x minutes to cook"
"Es werden x Minuten zum Kochen benötigt"
"Necesita x minutos..."
param/tag: x

Zope comes only with i18n:name while you are using i18n:tag and
i18n:name - Are both really necessary? - The difference seems to
be that i18n:param is a numerical value upon which a singular/plural
decision can be made, while i18n:tag is just an id, is that right?

Just to complete the comparison with Zope:

* Zope makes heavy use of translation domains:
<html i18n:domain="myapp">...
and all translation lookups are made in terms of this domain
- don't know if this is really needed (your examples seem to work
fine without): in practice I always find myself sticking to
just the single domain of my application

* There is also an internationalized version of py:attrs (genshi)
in Zope (actually it is called attributes there): i18n:attrs -
- haven't used this yet, but an example I can think of
(taking your first example: Compound messages including a tag)


Please see <a href="help.html">Help</a> for details.


Say you want to give different links in different languages
like a German help page: href="help-de.html", an English
one href="help-en.html" - then one could write

Please see <a i18n:attrs="helplink">Help</a> for details.

That's at least how I understood i18n:attrs - as mentioned
before, I haven't used it yet

Just some food for thought - I am aware that some of these
ideas are rather vague or even questions but I hope they are helpful anyway.

-Andreas

Christopher Lenz

Jul 4, 2007, 5:27:31 AM

to gen...@googlegroups.com

Hi Andreas,

On 04.07.2007 at 00:20, Andreas Reuleaux wrote:
> A lot has happened on the i18n front in the last days: babel and now
> ticket #129 (i18n namespace) - I am glad to see this fast progress,
> especially since I had proposed an i18n namespace (like in Zope 3) on
> this list earlier, so here are some comments.
>
> Some of these ideas are just borrowd from Zope 3, which uses an i18n
> namespace already - I found the best description of i18n Zope 3 in
> Philipps book, 2nd edition, chapter 9 by the way:
> http://worldcookery.com/ (I am aware that it is not a particularly
> cheap book).
>
> * I am always for short descriptive names: why not
> just i18n:msg="" instead of i18n:message=""
> - you are using msg for message in your examples anyway:
> in msgid, msgstr and I guess this is one of things one
> has to type rather often.

Yeah, I agree.

While I understand the problem that custom msgids try to address, I
don't think it's the right approach. Ideally, the msgid should be usable
as-is as a fallback string (or simply the default language version).

gettext actually provides a cleaner approach, "message contexts":

<http://www.gnu.org/software/gettext/manual/gettext.html#Contexts>

Unfortunately, the pgettext() family of functions is not supported by
the Python gettext module. We discussed this just yesterday on the
#python-babel IRC channel. I think Babel could provide an extended
gettext module that could be swapped in by apps, and we'd provide
patches for this support to go into a future Python version.

Anyway, I think using msgctxt is the way to go in the long term,
instead of trying to encode the context inside the msgid itself.
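
To illustrate the difference, here is a sketch of a context-aware lookup over a plain dictionary. The "\x04" separator is how GNU gettext joins msgctxt and msgid in compiled catalogs; the pgettext helper below, the catalog and the context names are hypothetical, with the German strings borrowed from Andreas's "view" example:

```python
# GNU gettext stores a contextual entry under "<msgctxt>\x04<msgid>".
CONTEXT_SEPARATOR = "\x04"

# Hypothetical German catalog keyed by context and msgid.
catalog = {
    "permission" + CONTEXT_SEPARATOR + "view": "Betrachten-Recht",
    "button" + CONTEXT_SEPARATOR + "view": "Ansehen",
}

def pgettext(context, msgid):
    """Look up msgid within a context, falling back to the msgid itself."""
    return catalog.get(context + CONTEXT_SEPARATOR + msgid, msgid)

print(pgettext("permission", "view"))
print(pgettext("button", "view"))
print(pgettext("tab", "view"))
```

The msgid stays a readable English string, so an untranslated entry still falls back to something usable, which is not the case with an opaque id like "view-permission".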

> The onliest place were you used the message attribute
> was in the singular/plural example (6. Compound pluralizable
> messages
> including a tag),
> 
> ...(i18n:param="num" used inside)
> ...(i18n:param="num" used inside)
> 
> Not sure why this is needed here, couldn't this just be
> written as (empty i18:message)?:
> 
> ...(i18n:param="num" used inside)
> ...(i18n:param="num" used inside)
>

Well, there needs to be a way to specify which number the singular/
plural selection should be based on. i18n:param is more generic (see
below), and you could easily have more than one parameter in a
pluralizable message.

Stuffing the variable reference in the i18n:msg attribute value is
clumsy and not intuitive, though.
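
A short sketch of why the number has to be identified separately from the other parameters. It uses the standard library's ngettext on NullTranslations (which applies the English singular/plural rule); the message text and the parameter names num and author are made up for the example:

```python
import gettext

# NullTranslations applies the English rule: singular only when n == 1.
translations = gettext.NullTranslations()

def format_changes(num, author):
    """Pick the plural form from `num`, then substitute all parameters."""
    template = translations.ngettext(
        "%(num)d change by %(author)s",
        "%(num)d changes by %(author)s",
        num,
    )
    return template % {"num": num, "author": author}

print(format_changes(1, "cmlenz"))
print(format_changes(5, "cmlenz"))
```

Both num and author are parameters of the message, but only num drives the choice of plural form, so something has to say which one it is.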

> * params/tags are obviously needed, if only because
> the order of words in a sentence is different in different
> languages,
> Philipps example:
> "It takes x minutes to cook"
> "Es werden x Minuten zum Kochen benötigt"
> "Necesita x minutos..."
> param/tag: x
>
> Zope comes only with i18:name while you are using i18n:tag and
> i18n:name - Are both really necessary? - The difference seems to
> be that i18n:param is a numerical value upon which a singular/plural
> decision can be made, while i18n:tag is just an id, is that right?

(hmm, I don't think I proposed i18n:name, I suspect that's a typo)

I've actually dropped i18n:tag from the updated proposal; nested tags
always get a numeric identifier, which requires less typing and works
just as well.

And i18n:param is not limited to pluralization, it's more general. It
basically tells the framework which part of a message is a parameter
that gets substituted into the translation. For example:

Today is ${format.date("EEEE")}.

This gets translated to the following in the catalog:

msgid "Today is [1:%(weekday)s]."

Does that clarify the proposal?

> Just to complete the comparison with Zope:
>
> * Zope make heavy use of translation domains:
> <html i18n:domain="myapp">...
> and all translation lookups are made in terms of this domain
> - don't know if this is really needed (your examples seem to work
> fine without): in practice I find myself always to stick to
> just the single domain of my application

Same here. I understand how using multiple domains may be nice, but
for now I'm not really thinking about supporting them explicitly.

Also, the i18n support in Genshi makes this a bit challenging, because you
can have implicit/automatic messages (normal text in tags and
attributes), explicit gettext() function calls in expressions and
code blocks, as well as the namespace directives this proposal would
add. In Zope, IIUC, you have only the i18n namespace stuff.

> * There is also an internationalized version of py:attrs (genshi)
> in Zope (actually it is called attributes there): i18n:attrs -
> - haven't used this yet, but an example I can think of
> (taking your first example: Compound messages including a tag)
>
> 
> Please see <a href="help.html">Help</a> for details.
> 
>
> Say you want to give differnt links in different languages
> like a german help page: href="help-de.html", an english
> one href="help-en.html" - then one could write
>
> 
> Please see <a i18n:attrs="helplink">Help</a> for details.
> 
>
> That's at least how I understood i18n:attrs - as mentioned
> before, I haven't used it yet

Actually, as far as I understand, i18n:attributes is simply a list of
attributes that specifies which attribute values need localization.
Genshi provides two ways to do that already:

* simply use gettext calls in expressions in the attribute value
* include the attribute in the set of attributes that should be
localized in general (alt, title, etc, are already in that set by
default)
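
As an illustration of the second option, a sketch that collects literal values of a configurable set of attributes using the standard library's HTMLParser. This is not Genshi's extraction code: AttrExtractor, INCLUDE_ATTRS and the sample markup are assumptions, and the set only contains the two attributes named in this thread:

```python
from html.parser import HTMLParser

# Assumed default set; the thread only names "alt" and "title" explicitly.
INCLUDE_ATTRS = {"alt", "title"}

class AttrExtractor(HTMLParser):
    """Collect literal values of selected attributes as candidate msgids."""

    def __init__(self, include_attrs=INCLUDE_ATTRS):
        super().__init__()
        self.include_attrs = set(include_attrs)
        self.messages = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in self.include_attrs or not value:
                continue
            # Skip values containing template expressions such as ${...}.
            if "${" in value:
                continue
            self.messages.append(value)

extractor = AttrExtractor()
extractor.feed('<img src="logo.png" alt="Project logo" />'
               '<a href="#" title="Reply to comment ${change.cnum}">Reply</a>')
print(extractor.messages)
```

Values with embedded expressions are skipped here, so those would still need an explicit gettext call as in the first option.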

> Just some food to think about - I am aware that some of these
> ideas are rather vague or even questions but I hope they are helful
> anyway.

Yeah, thanks for the feedback!

Cheers,

Andreas Reuleaux

Jul 5, 2007, 8:26:16 AM

to gen...@googlegroups.com

> > Zope comes only with i18:name while you are using i18n:tag and
> > i18n:name - Are both really necessary? - The difference seems to
> > be that i18n:param is a numerical value upon which a singular/plural
> > decision can be made, while i18n:tag is just an id, is that right?
>
> (hmm, I don't think I proposed i18n:name, I suspect that's a typo)

Yes, my typo, sorry.

> I've actually dropped i18n:tag from the updated proposal; nested tags
> always get a numeric identifier, which requires less typing and works
> just as well.
>
> And i18n:param is not limited to pluralization, it's more general. It
> basically tells the framework which part of a message is a parameter
> that gets substituted into the translation. For example:
>
> 
> Today is ${format.date("EEEE")}.
> 
>
> This gets translated to the following in the catalog:
>
> msgid "Today is [1:%(weekday)s]."
>
> Does that clarify the proposal?