Genshi now has basic support for internationalization [2], which in combination with Babel [1] works rather nicely AFAICT.
However there's one problem that isn't addressed yet, namely that of messages that may contain tags. This is a complicated issue, compounded by Genshi's striving to do correct escaping of strings in templates. That means you can't just have messages like the following:
msgid "Here's a <a href='#foobar'>link</a>."
The <a> tag would be escaped, and I think that's the right thing to do, because translations may very well contain things that *do* need to be escaped, and the translators shouldn't have to worry about escaping -- they may not even know what escaping is.
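To make the escaping problem concrete, here is a minimal sketch. It uses the standard library's `html.escape` as a stand-in for Genshi's escaping (an assumption for illustration, not Genshi's actual code), and `render_message` is a hypothetical helper.

```python
from html import escape


def render_message(translated: str) -> str:
    """Escape a translated string before placing it into markup,
    roughly as an auto-escaping template engine would."""
    return "<p>" + escape(translated, quote=False) + "</p>"


# Markup inside the message is escaped and shows up as literal text ...
print(render_message("Here's a <a href='#foobar'>link</a>."))
# ... which is exactly what we want for ordinary special characters.
print(render_message("Use x < y & z"))
```

The same rule that protects the second message is what prevents the first one from carrying a real link.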
So we need a proper solution for this issue. I've outlined a possible approach in:
To summarize, I propose adding an i18n namespace, which would be processed exclusively by the Translator filter. That namespace provides tags to define exactly how a message is composed from mixed content. Please see the ticket linked above for details.
I'd love to hear your thoughts on this, and maybe alternative proposals.
> Genshi now has basic support for internationalization [2], which in combination with Babel [1] works rather nicely AFAICT.
> However there's one problem that isn't addressed yet, namely that of messages that may contain tags. This is a complicated issue, compounded by Genshi's striving to do correct escaping of strings in templates. That means you can't just have messages like the following:
> msgid "Here's a <a href='#foobar'>link</a>."
> The <a> tag would be escaped, and I think that's the right thing to do, because translations may very well contain things that *do* need to be escaped, and the translators shouldn't have to worry about escaping -- they may not even know what escaping is.
There's a closely related issue: how will we deal with similar messages built from within Python code using genshi.builder?
Example from Trac:
    tag.p("You can ",
          tag.a("search", href=req.href.log(path, rev=rev,
                                            mode='path_history')),
          " in the repository history to see if that path existed but"
          " was later removed")
There are actually 2 distinct problems here:
 1. how to collect the msgid from the Python source?
 2. how to compose the msgid in a non-fragmented way?
> So we need a proper solution for this issue. I've outlined a possible approach in:
> To summarize, I propose adding an i18n namespace, which would be processed exclusively by the Translator filter. That namespace provides tags to define exactly how a message is composed from mixed content. Please see the ticket linked above for details.
> I'd love to hear your thoughts on this, and maybe alternative proposals.
This approach looks very promising and could perhaps be extended to the genshi.builder situation.
In particular, for point 2. we could imagine using a few helper functions that would inject the appropriate attribute from the i18n namespace into the Element argument.
The above example becomes:
    i18n_message(tag.p("You can ",
                       i18n_tag('search',
                                tag.a("search",
                                      href=req.href.log(path, rev=rev,
                                                        mode='path_history'))),
                       " in the repository history to see if that path existed but"
                       " was later removed"))
i18n_message would also build the msgid by including the plain text from static strings (dynamic strings should be wrapped in i18n_param() calls) and return the translation.
_But_ there's still the problematic point 1, and I'm not sure how the current extract_python() could be extended to handle that... One idea could be to track nested calls and have the possibility to register callbacks for each keyword, so the callback for i18n_message could rebuild the tag expression. Well, this looks tedious, so I hope there's a simpler way.
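As a rough illustration of the "track nested calls" idea, the sketch below walks Python source with the standard `ast` module and rebuilds a single msgid from a nested `tag.xxx(...)` expression. This is not how Babel's extract_python() works; `is_tag_call`, `build_msgid`, `TagMessageCollector` and the `[n:...]` placeholder notation are all assumptions made for the example, and dynamic (non-literal) arguments are simply skipped.

```python
import ast

SOURCE = '''
tag.p("You can ",
      tag.a("search", href=req.href.log(path, rev=rev, mode='path_history')),
      " in the repository history to see if that path existed but"
      " was later removed")
'''


def is_tag_call(node):
    """True for calls of the form tag.<name>(...)."""
    func = node.func
    return (isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "tag")


def build_msgid(call, counter=None):
    """Rebuild a msgid from the positional arguments of a tag call.

    String literals contribute text, nested tag calls become numbered
    [n:...] groups, and anything else (dynamic content) is skipped.
    """
    if counter is None:
        counter = [0]
    parts = []
    for arg in call.args:
        if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
            parts.append(arg.value)
        elif isinstance(arg, ast.Call) and is_tag_call(arg):
            counter[0] += 1
            index = counter[0]
            parts.append("[%d:%s]" % (index, build_msgid(arg, counter)))
    return "".join(parts)


class TagMessageCollector(ast.NodeVisitor):
    """Collect one msgid per outermost tag.<name>(...) call."""

    def __init__(self):
        self.msgids = []

    def visit_Call(self, node):
        if is_tag_call(node):
            # Nested tag calls belong to this message, so do not descend.
            self.msgids.append(build_msgid(node))
        else:
            self.generic_visit(node)


collector = TagMessageCollector()
collector.visit(ast.parse(SOURCE))
print(collector.msgids[0])
```

Because adjacent string literals are merged by the parser, the two trailing fragments end up as one piece of text in the msgid.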
> There's a closely related issue which is how will we deal with similar messages built from within the Python code using the genshi.builder.
> Example from Trac:
>   tag.p("You can ",
>         tag.a("search", href=req.href.log(path, rev=rev,
>                                           mode='path_history')),
>         " in the repository history to see if that path existed but"
>         " was later removed")
> There are actually 2 distinct problems here:
>  1. how to collect the msgid from the Python source?
>  2. how to compose the msgid in a non fragmented way?
You're absolutely right, that's a problem the proposal doesn't address, and I also don't have a good idea so far how to solve it :-/
Well, one approach would be to move more of that kind of stuff into actual templates, but of course that's not always appropriate. On the other hand, Trac *does* too often put markup into exception messages, I think.
> However there's one problem that isn't addressed yet, namely that of messages that may contain tags. This is a complicated issue, compounded by Genshi's striving to do correct escaping of strings in templates. That means you can't just have messages like the following:
> msgid "Here's a <a href='#foobar'>link</a>."
> The <a> tag would be escaped, and I think that's the right thing to do, because translations may very well contain things that *do* need to be escaped, and the translators shouldn't have to worry about escaping -- they may not even know what escaping is.
Escaping may be good for text content, but we should translate button text as well, and the proposal does not mention attribute text. I think the proposal expects an explicit directive for text to be extracted. Are attribute values extracted automatically, without a directive?
I think we may need one more i18n:xxx attribute to specify attribute names to be extracted. For example (with Japanese):
  <input type="submit" value="Reply" title="Reply to comment ${change.cnum}" i18n:attributes="title value" />
=>
  msgid="Reply"
  msgstr="返信"
  msgid="Reply to comment ${change.cnum}"
  msgstr="${change.cnum}へのコメント"
2. How to deal with parameters in attributes?
---------------------------------------------
In the example above, i18n:param cannot be used for the attribute value. How about using the parameter name as-is in msgid/msgstr?
3. i18n:tag might be a required feature
---------------------------------------
I think i18n:tag should be REQUIRED (at least when there are multiple tags in the msgstr), because the order of tags changes all the time in translation. Nested tags may also be separated in the translated text, and vice versa.
How about assigning index numbers automatically (so there is no need to write i18n:tag)? The index would always appear in the msgid and could be used in the msgstr.
ex:
  msgid="Please see [1:Help] for [2:details]."
  msgstr="[2:Details] finden Sie unter [1:Hilfe]."
> I'm trying to translate trac i18n branch to Japanese. But not yet familiar with genshi and babel.
> I've read the proposal and having some questions. (These are not Japanese specific issue)
> 2007/6/27, Christopher Lenz <cml...@gmx.de>:
>> However there's one problem that isn't addressed yet, namely that of messages that may contain tags. This is a complicated issue, compounded by Genshi's striving to do correct escaping of strings in templates. That means you can't just have messages like the following:
>> msgid "Here's a <a href='#foobar'>link</a>."
>> The <a> tag would be escaped, and I think that's the right thing to do, because translations may very well contain things that *do* need to be escaped, and the translators shouldn't have to worry about escaping -- they may not even know what escaping is.
> Escaping may good for text of content, but we should translate button text also. But the proposal does not mention about attribute text. I think the proposal is expecting explicit directive to be extracted. Is it extracted without directive automatically?
In general, there are a couple of attribute values that are extracted by default, such as "title" and "alt". Actually, these should only be extracted/translated automatically if they contain literal strings, but I'll have to check (and probably fix) the code in that respect.
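The rule described above (extract a fixed set of attributes, but only when the value is a literal string) might look roughly like the sketch below. It uses only the standard library's `xml.etree.ElementTree`, not Genshi's extractor; the attribute set, the `"${"` check for dynamic values, and the function name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed default set, taken from the two names mentioned above.
TRANSLATABLE_ATTRS = {"title", "alt"}


def extract_attr_messages(markup):
    """Return (tag, attribute, value) for literal translatable attributes."""
    messages = []
    for elem in ET.fromstring(markup).iter():
        for name, value in elem.attrib.items():
            if name not in TRANSLATABLE_ATTRS:
                continue
            if "${" in value or not value.strip():
                # Contains an expression (or is empty): not a literal string.
                continue
            messages.append((elem.tag, name, value))
    return messages


TEMPLATE = (
    '<form>'
    '<input type="submit" value="Reply" title="Reply to comment ${change.cnum}"/>'
    '<img src="logo.png" alt="Logo"/>'
    '</form>'
)
print(extract_attr_messages(TEMPLATE))
```

Here the `title` is skipped because it contains an expression, and `value` is skipped because it is not in the assumed default set.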
> I think we may need one more i18n:xxx attribute to specify attribute names to be extracted.
> For example (with Japanese):
I don't see the need to add anything in the proposed i18n namespace to handle this situation.
> 2. How to deal parameter in attribute?
> --------------------------------------
> In example above, i18n:param cannot be used for attribute value. How about using parameter name as-is in msgid/msgstr?
I'm not sure I understand this one. Does the above answer it maybe?
> 3. i18n:tag might be required feature
> -------------------------------------
> I think i18n:tag should be REQUIRED (at least when having multiple tags in msgstr) because the changing order of tags is always happen.
You mean when the original string in the template is updated?
> And nested tags may be separated in translated text, and vice versa.
Hm, really? Do you have an example for that? Translations changing the order I can understand, but the nesting?
> How about giving auto index number? (no need to give i18n:tag)
> It always appeared in msgid and it can be used in msgstr.
> ex:
>   msgid="Please see [1:Help] for [2:details]."
>   msgstr="[2:Details] finden Sie unter [1:Hilfe]."
Yeah, that's actually more convenient and consistent. If we do it this way, we actually won't need i18n:tag at all, AFAICT.
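A small sketch of how numbered groups could be put back together after translation. The `[n:text]` syntax comes from the example above; the `render` function, the `tags` mapping and the use of `html.escape` are assumptions for illustration, not part of the proposal.

```python
import re
from html import escape

# Matches either the start of a numbered group "[n:" or a closing "]".
TOKEN = re.compile(r"\[(\d+):|\]")


def render(msgstr, tags):
    """Expand [n:text] groups using tags[n] = (open_markup, close_markup).

    Text between the markers is escaped; the markup comes from the template.
    """
    out, stack, pos = [], [], 0
    for match in TOKEN.finditer(msgstr):
        out.append(escape(msgstr[pos:match.start()], quote=False))
        pos = match.end()
        if match.group(1) is not None:
            index = int(match.group(1))
            out.append(tags[index][0])
            stack.append(index)
        elif stack:
            out.append(tags[stack.pop()][1])
        else:
            out.append("]")  # unmatched bracket is kept as literal text
    if stack:
        raise ValueError("unclosed group in %r" % msgstr)
    out.append(escape(msgstr[pos:], quote=False))
    return "".join(out)


TAGS = {1: ('<a href="help.html">', "</a>"), 2: ("<em>", "</em>")}
print(render("Please see [1:Help] for [2:details].", TAGS))
print(render("[2:Details] finden Sie unter [1:Hilfe].", TAGS))
```

The translator can reorder the groups freely, while the actual markup never appears in the catalog.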
> > Escaping may good for text of content, but we should translate button text also. But the proposal does not mention about attribute text. I think the proposal is expecting explicit directive to be extracted. Is it extracted without directive automatically?
> In general, there are a couple of attribute values that are extracted by default, such as "title" and "alt". Actually, these should only be extracted/translated automatically if they contain literal strings, but I'll have to check (and probably fix) the code in that respect.
> > I think i18n:tag should be REQUIRED (at least when having multiple tags in msgstr) because the changing order of tags is always happen.
> You mean when the original string in the template is updated?
> > And nested tags may be separated in translated text, and vice versa.
> Hm, really? Do you have an example for that? Translations changing the order I can understand, but the nesting?
As the simplest example, an S+V+O sentence in English will generally be translated as S+O+V in Japanese.
So, as an example:
  <em>S <a href="xxx">V</a></em> O
would be translated into
  <em>S</em> O <a href="xxx">V</a>
or
  <em>S</em> O <em><a href="xxx">V</a></em>
Of course the translator can make an effort to keep the original nesting structure, but that does not always yield a good sentence in their language. To produce a better translation, the translator might want to change the structure, I guess.
Shun-ichi GOTO wrote:
> ...
> As a simplest example, the sentence S+V+O in English will be translated as S+O+V in Japanese in generally.
> So, as an example:
>   <em>S <a href="xxx">V</a></em> O
> would be translated into
>   <em>S</em> O <a href="xxx">V</a>
> or
>   <em>S</em> O <em><a href="xxx">V</a></em>
> Of course the translator can make effort to keep original structure of nesting, but it is not always a good sentence in his language. To be better translation, the translator might want to change the structure, I guess.
What about the following?
''S [xxx V]'' O
translated to:
''S'' O [xxx V]
or
''S'' O ''[xxx V]''
Oh I forgot, we're not talking /only/ about Trac ;-)
A lot has happened on the i18n front in the last few days: Babel and now ticket #129 (i18n namespace) - I am glad to see this fast progress, especially since I had proposed an i18n namespace (like in Zope 3) on this list earlier, so here are some comments.
Some of these ideas are just borrowed from Zope 3, which uses an i18n namespace already - by the way, I found the best description of i18n in Zope 3 in Philipp's book, 2nd edition, chapter 9: http://worldcookery.com/ (I am aware that it is not a particularly cheap book).
* I am always for short descriptive names: why not just i18n:msg="" instead of i18n:message="" - you are using msg for message in your examples anyway (in msgid, msgstr), and I guess this is one of the things one has to type rather often.
* i18n:message is roughly Zope's i18n:translate; however, in Zope the attribute can be used to denote a custom msgid. I think this is a good idea, since sometimes one wants the same string to be translated differently in different circumstances - this has happened to me before, and Philipp gives the example of the word "view" meaning different things in various situations: the noun view, the verb view, a view permission, a view button, a view tab etc. - this could be handled:
    <p i18n:message="view-permission">view</p>
      -> msgid: view-permission
      -> msgstr, en: view
      -> msgstr, de: Betrachten-Recht
      etc.
    <p i18n:message="view-button">view</p>
      -> msgid: view-button
      -> msgstr, en: view
      -> msgstr, de: Ansehen
      etc.
  The rule is: if the i18n:message attribute is empty (="") then the string itself is used as the msgid - this is what you were proposing, example:
    <p i18n:message="">Please see...</p>
      -> msgid: Please see...
  Otherwise the attribute value is used as the msgid, example:
    <p i18n:message="someid">whatever...</p>
      -> msgid: someid
  The only place where you used the message attribute was in the singular/plural example (6. Compound pluralizable messages including a tag),
    <p i18n:message="num">
      <p i18n:singular>...(i18n:param="num" used inside)</p>
      <p i18n:plural>...(i18n:param="num" used inside)</p>
    </p>
  Not sure why this is needed here, couldn't this just be written as (empty i18n:message)?:
    <p i18n:message="">
      <p i18n:singular>...(i18n:param="num" used inside)</p>
      <p i18n:plural>...(i18n:param="num" used inside)</p>
    </p>
* params/tags are obviously needed, if only because the order of words in a sentence is different in different languages. Philipp's example:
    "It takes x minutes to cook"
    "Es werden x Minuten zum Kochen benötigt"
    "Necesita x minutos..."
    param/tag: x
Zope comes only with i18n:name while you are using i18n:tag and i18n:name - Are both really necessary? - The difference seems to be that i18n:param is a numerical value upon which a singular/plural decision can be made, while i18n:tag is just an id, is that right?
Just to complete the comparison with Zope:
* Zope makes heavy use of translation domains: <html i18n:domain="myapp">... and all translation lookups are made in terms of this domain - don't know if this is really needed (your examples seem to work fine without): in practice I find myself always sticking to just the single domain of my application
* There is also an internationalized version of py:attrs (genshi) in Zope (actually it is called attributes there): i18n:attrs - haven't used this yet, but an example I can think of (taking your first example: Compound messages including a tag)
  <p i18n:message="">
    Please see <a href="help.html">Help</a> for details.
  </p>
Say you want to give different links in different languages, like a German help page (href="help-de.html") and an English one (href="help-en.html") - then one could write
  <p i18n:message="">
    Please see <a i18n:attrs="helplink">Help</a> for details.
  </p>
That's at least how I understood i18n:attrs - as mentioned before, I haven't used it yet
Just some food for thought - I am aware that some of these ideas are rather vague or even just questions, but I hope they are helpful anyway.
On Wed, Jun 27, 2007 at 02:33:38PM +0200, Christopher Lenz wrote:
> Hey all,
> Genshi now has basic support for internationalization [2], which in combination with Babel [1] works rather nicely AFAICT.
> However there's one problem that isn't addressed yet, namely that of messages that may contain tags. This is a complicated issue, compounded by Genshi's striving to do correct escaping of strings in templates. That means you can't just have messages like the following:
> msgid "Here's a <a href='#foobar'>link</a>."
> The <a> tag would be escaped, and I think that's the right thing to do, because translations may very well contain things that *do* need to be escaped, and the translators shouldn't have to worry about escaping -- they may not even know what escaping is.
> So we need a proper solution for this issue. I've outlined a possible approach in:
> To summarize, I propose adding an i18n namespace, which would be processed exclusively by the Translator filter. That namespace provides tags to define exactly how a message is composed from mixed content. Please see the ticket linked above for details.
> I'd love to hear your thoughts on this, and maybe alternative proposals.
> A lot has happened on the i18n front in the last days: babel and now ticket #129 (i18n namespace) - I am glad to see this fast progress, especially since I had proposed an i18n namespace (like in Zope 3) on this list earlier, so here are some comments.
> Some of these ideas are just borrowed from Zope 3, which uses an i18n namespace already - I found the best description of i18n Zope 3 in Philipps book, 2nd edition, chapter 9 by the way: http://worldcookery.com/ (I am aware that it is not a particularly cheap book).
> * I am always for short descriptive names: why not just i18n:msg="" instead of i18n:message="" - you are using msg for message in your examples anyway: in msgid, msgstr and I guess this is one of things one has to type rather often.
> * i18n:message is roughly Zopes i18n:translate, however in Zope the attribute can be used to denote a custom msgid. I think this is a good idea, since sometimes one wants the same string to be translated differently in different circumstances - this has happened to me before and Philipp gives the example of the word "view" meaning different things in various situations: the noun view, the verb view, a view permission, a view button, a view tab etc. - this could be handled:
>     <p i18n:message="view-permission">view</p>
>       -> msgid: view-permission
>       -> msgstr, en: view
>       -> msgstr, de: Betrachten-Recht
>       etc.
>     <p i18n:message="view-button">view</p>
>       -> msgid: view-button
>       -> msgstr, en: view
>       -> msgstr, de: Ansehen
>       etc.
>   The rule is: if the i18n:message attribute is empty (="") then the string itself is used as a msgid - this is what you were proposing, example:
>     <p i18n:message="">Please see...</p>
>       -> msgid: Please see...
>   Otherwise the attribute value is used as the msgid, example
>     <p i18n:message="someid">whatever...</p>
>       -> msgid: someid
While I understand the problem this tries to address, I don't think it's the right approach. Ideally, the msgid should be usable as-is as a fallback string (or simply the default language version).
gettext actually provides a cleaner approach, "message contexts":
Unfortunately, the pgettext() family of functions is not supported by the Python gettext module. We discussed this just yesterday on the #python-babel IRC channel. I think Babel could provide an extended gettext module that could be swapped in by apps, and we'd provide patches for this support to go into a future Python version.
Anyway, I think using msgctxt is the way to go in the long term, instead of trying to encode the context inside the msgid itself.
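For readers unfamiliar with message contexts, here is a minimal sketch of the lookup idea. GNU gettext stores a contextual entry in compiled catalogs under the key msgctxt + "\x04" + msgid; the plain-dict catalog, the context names and this `pgettext` helper are assumptions for illustration, not Babel or stdlib code.

```python
# GNU gettext joins msgctxt and msgid with EOT (\x04) in compiled catalogs.
CONTEXT_SEPARATOR = "\x04"


def pgettext(catalog, context, msgid):
    """Look up msgid within a context; fall back to the msgid itself."""
    return catalog.get(context + CONTEXT_SEPARATOR + msgid, msgid)


# Hypothetical German catalog, reusing the "view" example from this thread.
catalog_de = {
    "permission" + CONTEXT_SEPARATOR + "view": "Betrachten-Recht",
    "button" + CONTEXT_SEPARATOR + "view": "Ansehen",
}

print(pgettext(catalog_de, "permission", "view"))
print(pgettext(catalog_de, "button", "view"))
print(pgettext(catalog_de, "tab", "view"))  # no entry: the msgid is the fallback
```

The point of the design is visible in the last call: the msgid stays a usable default string, and the context only disambiguates.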
> The onliest place were you used the message attribute was in the singular/plural example (6. Compound pluralizable messages including a tag),
>     <p i18n:message="num">
>       <p i18n:singular>...(i18n:param="num" used inside)</p>
>       <p i18n:plural>...(i18n:param="num" used inside)</p>
>     </p>
> Not sure why this is needed here, couldn't this just be written as (empty i18:message)?:
>     <p i18n:message="">
>       <p i18n:singular>...(i18n:param="num" used inside)</p>
>       <p i18n:plural>...(i18n:param="num" used inside)</p>
>     </p>
Well, there needs to be a way to specify which number the singular/plural selection should be based on. i18n:param is more generic (see below), and you could easily have more than one parameter in a pluralizable message.
Stuffing the variable reference in the i18n:msg attribute value is clumsy and not intuitive, though.
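The need for an explicit "which number" can be seen with plain gettext, where the count is a separate argument to ngettext() even though it also appears among the substitution parameters. The sketch below uses the standard library's `gettext.NullTranslations` (which applies the English singular/plural rule); the message text and the `summary` function are made up for illustration.

```python
import gettext

# NullTranslations returns the untranslated strings, using n == 1 for singular.
translations = gettext.NullTranslations()


def summary(num, author):
    """A pluralizable message with two parameters; only `num` drives
    the singular/plural choice."""
    template = translations.ngettext(
        "%(num)d comment by %(author)s",
        "%(num)d comments by %(author)s",
        num,
    )
    return template % {"num": num, "author": author}


print(summary(1, "cmlenz"))
print(summary(3, "cmlenz"))
```

A directive-based syntax needs some equivalent of that third argument, which is what the attribute value was standing in for.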
> * params/tags are obviously needed, if only because the order of words in a sentence is different in different languages, Philipps example:
>     "It takes x minutes to cook"
>     "Es werden x Minuten zum Kochen benötigt"
>     "Necesita x minutos..."
>     param/tag: x
> Zope comes only with i18:name while you are using i18n:tag and i18n:name - Are both really necessary? - The difference seems to be that i18n:param is a numerical value upon which a singular/plural decision can be made, while i18n:tag is just an id, is that right?
(hmm, I don't think I proposed i18n:name, I suspect that's a typo)
I've actually dropped i18n:tag from the updated proposal; nested tags always get a numeric identifier, which requires less typing and works just as well.
And i18n:param is not limited to pluralization, it's more general. It basically tells the framework which part of a message is a parameter that gets substituted into the translation. For example:
  <p i18n:msg="">
    Today is <em i18n:param="weekday">${format.date("EEEE")}</em>.
  </p>
This gets translated to the following in the catalog:
msgid "Today is [1:%(weekday)s]."
Does that clarify the proposal?
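As a rough picture of what happens to such a catalog entry at render time, the sketch below looks up the msgid and substitutes the named parameter with Python's `%` formatting. The dict catalog, the German translation and the `translate` helper are assumptions for illustration; expanding the `[1:...]` group back into the `<em>` element would be a separate step.

```python
# Hypothetical catalog keyed by msgid.
catalog_de = {
    "Today is [1:%(weekday)s].": "Heute ist [1:%(weekday)s].",
}


def translate(msgid, params, catalog):
    """Look up the msgid (falling back to it) and fill in named parameters."""
    return catalog.get(msgid, msgid) % params


print(translate("Today is [1:%(weekday)s].", {"weekday": "Montag"}, catalog_de))
print(translate("Today is [1:%(weekday)s].", {"weekday": "Monday"}, {}))
```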
> Just to complete the comparison with Zope:
> * Zope make heavy use of translation domains: <html i18n:domain="myapp">... and all translation lookups are made in terms of this domain - don't know if this is really needed (your examples seem to work fine without): in practice I find myself always to stick to just the single domain of my application
Same here. I understand how using multiple domains may be nice, but for now I'm not really thinking about supporting them explicitly.
Also, the way i18n works in Genshi makes this a bit challenging, because you can have implicit/automatic messages (normal text in tags and attributes), explicit gettext() function calls in expressions and code blocks, as well as the namespace directives this proposal would add. In Zope, IIUC, you have only the i18n namespace stuff.
> * There is also an internationalized version of py:attrs (genshi) in Zope (actually it is called attributes there): i18n:attrs - haven't used this yet, but an example I can think of (taking your first example: Compound messages including a tag)
>   <p i18n:message="">
>     Please see <a href="help.html">Help</a> for details.
>   </p>
> Say you want to give different links in different languages like a german help page: href="help-de.html", an english one href="help-en.html" - then one could write
>   <p i18n:message="">
>     Please see <a i18n:attrs="helplink">Help</a> for details.
>   </p>
> That's at least how I understood i18n:attrs - as mentioned before, I haven't used it yet
Actually, as far as I understand, i18n:attributes is simply a list of attributes that specifies which attribute values need localization. Genshi provides two ways to do that already:
* simply use gettext calls in expressions in the attribute value
* include the attribute in the set of attributes that should be localized in general (alt, title, etc., are already in that set by default)
> Just some food to think about - I am aware that some of these ideas are rather vague or even questions but I hope they are helpful anyway.