Advanced internationalization
Christopher Lenz  
From: Christopher Lenz <cml...@gmx.de>
Date: Wed, 27 Jun 2007 14:33:38 +0200
Local: Wed, Jun 27 2007 8:33 am
Subject: Advanced internationalization
Hey all,

Genshi now has basic support for internationalization [1], which in
combination with Babel [2] works rather nicely AFAICT.

However there's one problem that isn't addressed yet, namely that of  
messages that may contain tags. This is a complicated issue,  
compounded by Genshi's striving to do correct escaping of strings in  
templates. That means you can't just have messages like the following:

   msgid  "Here's a <a href='#foobar'>link</a>."

The <a> tag would be escaped, and I think that's the right thing to  
do, because translations may very well contain things that *do* need  
to be escaped, and the translators shouldn't have to worry about  
escaping -- they may not even know what escaping is.
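
To illustrate the escaping behaviour described above, here is a minimal sketch that uses only the Python standard library rather than Genshi itself. The `render_text` helper and the sample strings are assumptions made for this example. It only shows why a translation containing markup comes out escaped, and why that same escaping is needed for ordinary translations.

```python
from html import escape

# Hypothetical catalog entry: a translation that contains literal markup.
translated = "Hier ist ein <a href='#foobar'>Link</a>."


def render_text(message):
    """Escape a translated string the way a template engine escapes any text."""
    return escape(message, quote=False)


# The <a> tag is escaped and would show up as visible text in the page.
print(render_text(translated))

# The same escaping is exactly what an ordinary translation needs.
print(render_text("Fish & Chips"))
```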

So we need a proper solution for this issue. I've outlined a possible  
approach in:

   <http://genshi.edgewall.org/ticket/129#comment:2>

To summarize, I propose adding an i18n namespace, which would be  
processed exclusively by the Translator filter. That namespace  
provides tags to define exactly how a message is composed from mixed  
content. Please see the ticket linked above for details.

I'd love to hear your thoughts on this, and maybe alternative proposals.

   [1] http://genshi.edgewall.org/wiki/Documentation/i18n.html
   [2] http://babel.edgewall.org/

Thanks,
Chris
--
Christopher Lenz
   cmlenz at gmx.de
   http://www.cmlenz.net/


Christian Boos  
From: Christian Boos <cb...@neuf.fr>
Date: Wed, 27 Jun 2007 17:45:14 +0200
Local: Wed, Jun 27 2007 11:45 am
Subject: Re: Advanced internationalization

There's a closely related issue: how will we deal with similar
messages built from within Python code using genshi.builder?

Example from Trac:

tag.p("You can ",
  tag.a("search", href=req.href.log(path, rev=rev, mode='path_history')),
  " in the repository history to see if that path existed but"
  " was later removed")

There are actually 2 distinct problems here:
 1. how to collect the msgid from the Python source?
 2. how to compose the msgid in a non fragmented way?

> So we need a proper solution for this issue. I've outlined a possible  
> approach in:

>    <http://genshi.edgewall.org/ticket/129#comment:2>

> To summarize, I propose adding an i18n namespace, which would be  
> processed exclusively by the Translator filter. That namespace  
> provides tags to define exactly how a message is composed from mixed  
> content. Please see the ticket linked above for details.

> I'd love to hear your thoughts on this, and maybe alternative proposals.

This approach looks very promising and could perhaps be extended to the
genshi.builder situation.

In particular, for point 2. we could imagine using a few helper
functions that would inject the appropriate attribute from the i18n
namespace into the Element argument.

The above example becomes:

i18n_message(tag.p("You can ",
    i18n_tag('search',
             tag.a("search",
                   href=req.href.log(path, rev=rev, mode='path_history'))),
    " in the repository history to see if that path existed but"
    " was later removed"))

i18n_message would also build the msgid by including the plain text from
static strings (dynamic strings should be wrapped in i18n_param() calls)
and return the translation.
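
A rough sketch of the msgid-building half of that idea, independent of genshi.builder. Everything here is hypothetical: `Tagged` stands in for whatever i18n_tag() would return, `build_msgid` for the part of i18n_message() that assembles the message, and the `[name:text]` notation is an assumed placeholder format, not one the proposal specifies.

```python
from dataclasses import dataclass


@dataclass
class Tagged:
    """Hypothetical stand-in for an element wrapped by i18n_tag()."""
    name: str
    text: str


def build_msgid(*parts):
    """Join static strings and tagged fragments into one unfragmented msgid."""
    chunks = []
    for part in parts:
        if isinstance(part, Tagged):
            # Assumed placeholder notation: [name:text]
            chunks.append("[%s:%s]" % (part.name, part.text))
        else:
            chunks.append(part)
    return "".join(chunks)


msgid = build_msgid(
    "You can ",
    Tagged("search", "search"),
    " in the repository history to see if that path existed but"
    " was later removed",
)
print(msgid)
```

The translator then sees one complete sentence instead of three fragments. Collecting that msgid from the source at extraction time is a separate problem.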

_But_ there's still the problematic point 1, and I'm not sure how the
current extract_python() could be extended to handle that... One idea
could be to track nested calls and have the possibility to register
callbacks for each keyword, so the callback for i18n_message could
rebuild the tag expression. Well, this looks tedious, so I hope there's
a simpler way.

-- Christian


Christopher Lenz  
From: Christopher Lenz <cml...@gmx.de>
Date: Wed, 27 Jun 2007 18:10:39 +0200
Local: Wed, Jun 27 2007 12:10 pm
Subject: Re: Advanced internationalization
On 27.06.2007 at 17:45, Christian Boos wrote:

You're absolutely right, that's a problem the proposal doesn't  
address, and I also don't have a good idea so far how to solve it :-/

Well, one approach would be to move more of that kind of stuff into  
actual templates, but of course that's not always appropriate. On the  
other hand, Trac *does* too often put markup into exception messages,  
I think.

Cheers,
Chris
--
Christopher Lenz
   cmlenz at gmx.de
   http://www.cmlenz.net/


Shun-ichi GOTO  
From: "Shun-ichi GOTO" <shunichi.g...@gmail.com>
Date: Thu, 28 Jun 2007 01:13:06 +0900
Local: Wed, Jun 27 2007 12:13 pm
Subject: Re: Advanced internationalization
Hi,

I'm trying to translate the Trac i18n branch into Japanese,
but I'm not yet familiar with Genshi and Babel.

I've read the proposal and have some questions.
(These are not Japanese-specific issues.)

2007/6/27, Christopher Lenz <cml...@gmx.de>:

> However there's one problem that isn't addressed yet, namely that of
> messages that may contain tags. This is a complicated issue,
> compounded by Genshi's striving to do correct escaping of strings in
> templates. That means you can't just have messages like the following:

>    msgid  "Here's a <a href='#foobar'>link</a>."

> The <a> tag would be escaped, and I think that's the right thing to
> do, because translations may very well contain things that *do* need
> to be escaped, and the translators shouldn't have to worry about
> escaping -- they may not even know what escaping is.

1. Translating attribute values
-------------------------------

Escaping may be good for text content, but we should translate
button text as well, and the proposal does not mention attribute
text. I think the proposal expects an explicit directive for text to
be extracted. Are attribute values extracted automatically, without a
directive?

I think we may need one more i18n:xxx attribute to specify the
attribute names to be extracted.
For example (with Japanese):

 <input type="submit" value="Reply" title="Reply to comment ${change.cnum}"
        i18n:attributes="title value" />

=>

 msgid="Reply"
 msgstr="返信"

 msgid="Reply to comment ${change.cnum}"
 msgstr="${change.cnum}へのコメント"

2. How to deal with parameters in attributes?
---------------------------------------------

In the example above, i18n:param cannot be used for the attribute value.
How about using the parameter name as-is in msgid/msgstr?

3. i18n:tag might be a required feature
---------------------------------------

I think i18n:tag should be REQUIRED (at least when there are multiple tags
in the msgstr) because the order of tags changes all the time.
And nested tags may be separated in the translated text, and vice versa.

How about assigning index numbers automatically? (No need to write i18n:tag.)
The index always appears in the msgid and can be used in the msgstr.
ex:
  msgid="Please see [1:Help] for [2:details]."
  msgstr="[2:Details] finden Sie unter [1:Hilfe]."
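
A minimal sketch of how such numbered markers could be turned back into markup once the translation has been looked up. The `render` function, the regular expression and the `tags` mapping are all assumptions made for illustration (the post does not show the template behind this msgid). The point is only that the translator may reorder `[1:...]` and `[2:...]` freely while the text is still escaped.

```python
import re
from html import escape

# Matches markers such as [1:Help]; nested markers are not handled here.
TAG_MARKER = re.compile(r"\[(\d+):([^\]]*)\]")


def render(msgstr, tags):
    """Replace each [n:text] marker with the n-th tag, escaping all text."""
    out = []
    pos = 0
    for match in TAG_MARKER.finditer(msgstr):
        # Plain text between markers is escaped like any other string.
        out.append(escape(msgstr[pos:match.start()], quote=False))
        open_tag, close_tag = tags[int(match.group(1))]
        out.append(open_tag + escape(match.group(2), quote=False) + close_tag)
        pos = match.end()
    out.append(escape(msgstr[pos:], quote=False))
    return "".join(out)


# Hypothetical tags, numbered in the order they appear in the template.
tags = {
    1: ('<a href="help.html">', "</a>"),
    2: ("<em>", "</em>"),
}

print(render("Please see [1:Help] for [2:details].", tags))
print(render("[2:Details] finden Sie unter [1:Hilfe].", tags))
```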

--
Shun-ichi GOTO


Christopher Lenz  
From: Christopher Lenz <cml...@gmx.de>
Date: Wed, 27 Jun 2007 18:29:28 +0200
Local: Wed, Jun 27 2007 12:29 pm
Subject: Re: Advanced internationalization
On 27.06.2007 at 18:13, Shun-ichi GOTO wrote:

In general, there are a couple of attribute values that are extracted  
by default, such as "title" and "alt". Actually, these should only be  
extracted/translated automatically if they contain literal strings,  
but I'll have to check (and probably fix) the code in that respect.

> I think we may need one more i18n:xxx attribute to specify attribute
> names to be extracted.
> For example (with Japanese):

>  <input type="submit" value="Reply" title="Reply to comment ${change.cnum}"
>         i18n:attributes="title value" />

> =>

In this case what you really should do is use gettext explicitly:

   <input type="submit" value="${_('Reply')}"
          title="${_('Reply to comment %(num)s') % {'num': change.cnum}}" />

I don't see the need to add anything in the proposed i18n namespace  
to handle this situation.
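
For reference, the explicit gettext approach can be tried outside a template with the standard gettext module. This is only a sketch: `DictTranslations` is a hypothetical in-memory catalog standing in for a compiled .mo file, and the Japanese strings are adapted from the example earlier in the thread, with the named `%(num)s` parameter in place of the template expression.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog, standing in for a compiled .mo file."""

    def __init__(self, messages):
        super().__init__()
        self.messages = messages

    def gettext(self, message):
        # Fall back to the msgid itself when no translation exists.
        return self.messages.get(message, message)


ja = DictTranslations({
    "Reply": "返信",
    "Reply to comment %(num)s": "%(num)sへのコメント",
})
_ = ja.gettext

# Translate first, then substitute the named parameter, so the translator
# is free to move %(num)s to wherever the target language needs it.
value = _("Reply")
title = _("Reply to comment %(num)s") % {"num": 3}
print(value, title)
```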

> 2. How to deal with parameters in attributes?
> ---------------------------------------------

> In the example above, i18n:param cannot be used for the attribute value.
> How about using the parameter name as-is in msgid/msgstr?

I'm not sure I understand this one. Does the above answer it maybe?

> 3. i18n:tag might be required feature
> -------------------------------------

> I think i18n:tag should be REQUIRED (at least when there are multiple tags
> in the msgstr) because the order of tags changes all the time.

You mean when the original string in the template is updated?

> And nested tags may be separated in translated text, and vice versa.

Hm, really? Do you have an example for that? Translations changing  
the order I can understand, but the nesting?

> How about assigning index numbers automatically? (No need to write i18n:tag.)
> The index always appears in the msgid and can be used in the msgstr.
> ex:
>   msgid="Please see [1:Help] for [2:details]."
>   msgstr="[2:Details] finden Sie unter [1:Hilfe]."

Yeah, that's actually more convenient and consistent. If we do it  
this way, we actually won't need i18n:tag at all, AFAICT.

Thanks,
Chris
--
Christopher Lenz
   cmlenz at gmx.de
   http://www.cmlenz.net/


Shun-ichi GOTO  
From: "Shun-ichi GOTO" <shunichi.g...@gmail.com>
Date: Thu, 28 Jun 2007 03:18:14 +0900
Local: Wed, Jun 27 2007 2:18 pm
Subject: Re: Advanced internationalization
2007/6/28, Christopher Lenz <cml...@gmx.de>:

> > 1. Translating attribute values
> > -------------------------------

> > Escaping may be good for text content, but we should translate
> > button text as well, and the proposal does not mention attribute
> > text. I think the proposal expects an explicit directive for text to
> > be extracted. Are attribute values extracted automatically, without a
> > directive?

> In general, there are a couple of attribute values that are extracted
> by default, such as "title" and "alt". Actually, these should only be
> extracted/translated automatically if they contain literal strings,
> but I'll have to check (and probably fix) the code in that respect.

OK. It's helpful.

OK, I see.

> > 2. How to deal with parameters in attributes?
> > ---------------------------------------------

> > In the example above, i18n:param cannot be used for the attribute value.
> > How about using the parameter name as-is in msgid/msgstr?

> I'm not sure I understand this one. Does the above answer it maybe?

Yes, it's enough.

> > 3. i18n:tag might be required feature
> > -------------------------------------

> > I think i18n:tag should be REQUIRED (at least when there are multiple tags
> > in the msgstr) because the order of tags changes all the time.

> You mean when the original string in the template is updated?

> > And nested tags may be separated in translated text, and vice versa.

> Hm, really? Do you have an example for that? Translations changing
> the order I can understand, but the nesting?

As the simplest example, an S+V+O sentence in English will
generally be translated as S+O+V in Japanese.

So, as an example:
  <em>S <a href="xxx">V</a></em> O
would be translated into
  <em>S</em> O <a href="xxx">V</a>
or
  <em>S</em> O <em><a href="xxx">V</a></em>

Of course the translator can make an effort to keep the original nesting
structure, but that does not always produce a good sentence in the target
language. To get a better translation, the translator might want to
change the structure, I guess.

--
Shun-ichi GOTO


Christian Boos  
From: Christian Boos <cb...@neuf.fr>
Date: Thu, 28 Jun 2007 08:51:01 +0200
Local: Thurs, Jun 28 2007 2:51 am
Subject: Re: Advanced internationalization

What about the following?

''S [xxx V]'' O

translated to:

''S'' O [xxx V]
or
''S'' O ''[xxx V]''

Oh I forgot, we're not talking /only/ about Trac ;-)

-- Christian


Andreas Reuleaux  
From: Andreas Reuleaux <reule...@web.de>
Date: Wed, 4 Jul 2007 00:20:11 +0200
Local: Tues, Jul 3 2007 6:20 pm
Subject: Re: Advanced internationalization
A lot has happened on the i18n front in the last few days: Babel and now
ticket #129 (i18n namespace). I am glad to see this fast progress,
especially since I had proposed an i18n namespace (like in Zope 3) on
this list earlier, so here are some comments.

Some of these ideas are just borrowed from Zope 3, which already uses an
i18n namespace. By the way, the best description of i18n in Zope 3 that I
found is in Philipp's book, 2nd edition, chapter 9:
http://worldcookery.com/ (I am aware that it is not a particularly
cheap book).

* I am always for short descriptive names: why not
  just i18n:msg="" instead of i18n:message=""?
  You are using msg for message in your examples anyway
  (in msgid, msgstr), and I guess this is one of the things one
  has to type rather often.

* i18n:message is roughly Zope's i18n:translate; however,
  in Zope the attribute can be used to denote a custom msgid. I think
  this is a good idea, since sometimes one wants the same string to be
  translated differently in different circumstances. This has happened
  to me before, and Philipp gives the example of the word "view"
  meaning different things in various situations: the noun view, the
  verb view, a view permission, a view button, a view tab etc. This
  could be handled:
    <p i18n:message="view-permission">view</p>
      -> msgid: view-permission
      -> msgstr, en: view
      -> msgstr, de: Betrachten-Recht
      etc.
    <p i18n:message="view-button">view</p>
      -> msgid: view-button
      -> msgstr, en: view
      -> msgstr, de: Ansehen
    etc.
  The rule is: if the i18n:message attribute is empty (="")
  then the string itself is used as the msgid (this is what
  you were proposing), example:
    <p i18n:message="">Please see...</p>
    -> msgid: Please see...
  Otherwise the attribute value is used as the msgid, example:
    <p i18n:message="someid">whatever...</p>
    -> msgid: someid
  The only place where you used the message attribute
  was in the singular/plural example (6. Compound pluralizable messages
  including a tag),
    <p i18n:message="num">
      <p i18n:singular>...(i18n:param="num" used inside)</p>
      <p i18n:plural>...(i18n:param="num" used inside)</p>
    </p>
  Not sure why this is needed here, couldn't this just be
  written as (empty i18n:message)?:
    <p i18n:message="">
      <p i18n:singular>...(i18n:param="num" used inside)</p>
      <p i18n:plural>...(i18n:param="num" used inside)</p>
    </p>

* params/tags are obviously needed, if only because
  the order of words in a sentence is different in different languages,
  Philipp's example:
    "It takes x minutes to cook"
    "Es werden x Minuten zum Kochen benötigt"
    "Necesita x minutos..."
  param/tag: x

  Zope comes with only i18n:name, while you are using i18n:tag and
  i18n:name. Are both really necessary? The difference seems to
  be that i18n:param is a numerical value upon which a singular/plural
  decision can be made, while i18n:tag is just an id, is that right?

Just to complete the comparison with Zope:

* Zope makes heavy use of translation domains:
  <html i18n:domain="myapp">...
  and all translation lookups are made in terms of this domain.
  I don't know if this is really needed (your examples seem to work
  fine without): in practice I find myself always sticking to
  just the single domain of my application.

* There is also an internationalized version of py:attrs (Genshi)
  in Zope (actually it is called attributes there): i18n:attrs.
  I haven't used this yet, but here is an example I can think of
  (taking your first example: Compound messages including a tag):

    <p i18n:message="">
      Please see <a href="help.html">Help</a> for details.
    </p>

  Say you want to give different links in different languages,
  like a German help page: href="help-de.html", an English
  one href="help-en.html". Then one could write

    <p i18n:message="">
      Please see <a i18n:attrs="helplink">Help</a> for details.
    </p>

  That's at least how I understood i18n:attrs; as mentioned
  before, I haven't used it yet.

Just some food for thought. I am aware that some of these
ideas are rather vague or even questions, but I hope they are helpful anyway.

-Andreas


Christopher Lenz  
From: Christopher Lenz <cml...@gmx.de>
Date: Wed, 4 Jul 2007 11:27:31 +0200
Local: Wed, Jul 4 2007 5:27 am
Subject: Re: Advanced internationalization
Hi Andreas,

On 04.07.2007 at 00:20, Andreas Reuleaux wrote:

Yeah, I agree.

While I understand the problem this tries to address, I don't think  
it's the right approach. Ideally, the msgid should be usable as-is as  
a fallback string (or simply the default language version).

gettext actually provides a cleaner approach, "message contexts":

   <http://www.gnu.org/software/gettext/manual/gettext.html#Contexts>

Unfortunately, the pgettext() family of functions is not supported by  
the Python gettext module. We discussed this just yesterday on the  
#python-babel IRC channel. I think Babel could provide an extended  
gettext module that could be swapped in by apps, and we'd provide  
patches for this support to go into a future Python version.

Anyway, I think using msgctxt is the way to go in the long term,  
instead of trying to encode the context inside the msgid itself.
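
A small sketch of what msgctxt-based lookup amounts to, using a plain dictionary instead of a real catalog. GNU gettext stores a contextual entry in the compiled catalog under the key "context, EOT byte (\x04), msgid", and the sketch mimics that. The `catalog` dict and this stand-alone `pgettext` are assumptions for illustration, and the German strings are borrowed from the "view" example quoted earlier.

```python
# GNU gettext joins msgctxt and msgid with an EOT byte in compiled catalogs.
CONTEXT_SEPARATOR = "\x04"

# Hypothetical German catalog: the same msgid "view" in two contexts.
catalog = {
    "permission" + CONTEXT_SEPARATOR + "view": "Betrachten-Recht",
    "button" + CONTEXT_SEPARATOR + "view": "Ansehen",
}


def pgettext(context, message):
    """Look up a message within a context, falling back to the msgid."""
    return catalog.get(context + CONTEXT_SEPARATOR + message, message)


print(pgettext("permission", "view"))
print(pgettext("button", "view"))
# No entry for this context, so the msgid itself is the fallback string.
print(pgettext("tab", "view"))
```

Unlike a custom msgid such as "view-button", the msgid here stays usable as-is when no translation is found.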

>   The only place where you used the message attribute
>   was in the singular/plural example (6. Compound pluralizable messages
>   including a tag),
>     <p i18n:message="num">
>       <p i18n:singular>...(i18n:param="num" used inside)</p>
>       <p i18n:plural>...(i18n:param="num" used inside)</p>
>     </p>
>   Not sure why this is needed here, couldn't this just be
>   written as (empty i18n:message)?:
>     <p i18n:message="">
>       <p i18n:singular>...(i18n:param="num" used inside)</p>
>       <p i18n:plural>...(i18n:param="num" used inside)</p>
>     </p>

Well, there needs to be a way to specify which number the singular/plural
selection should be based on. i18n:param is more generic (see
below), and you could easily have more than one parameter in a
pluralizable message.

Stuffing the variable reference in the i18n:msg attribute value is  
clumsy and not intuitive, though.
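
The dependency on a number is easy to see with plain gettext, outside any template. In this sketch `format_count` and the message strings are made up for illustration. `NullTranslations` is the standard library's no-op catalog, which simply applies the English rule (singular only when the number is 1).

```python
import gettext

# No-op catalog: returns the untranslated strings, using the English rule.
translations = gettext.NullTranslations()


def format_count(num):
    """Pick the singular or plural form based on num, then substitute it."""
    template = translations.ngettext(
        "%(num)d file changed", "%(num)d files changed", num)
    return template % {"num": num}


print(format_count(1))
print(format_count(3))
```

ngettext() needs the number as a separate argument, quite apart from the parameters substituted into the message, which is why the template syntax has to name it somewhere.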

> * params/tags are obviously needed, if only because
>   the order of words in a sentence is different in different languages,
>   Philipp's example:
>     "It takes x minutes to cook"
>     "Es werden x Minuten zum Kochen benötigt"
>     "Necesita x minutos..."
>   param/tag: x

>   Zope comes with only i18n:name, while you are using i18n:tag and
>   i18n:name. Are both really necessary? The difference seems to
>   be that i18n:param is a numerical value upon which a singular/plural
>   decision can be made, while i18n:tag is just an id, is that right?

(hmm, I don't think I proposed i18n:name, I suspect that's a typo)

I've actually dropped i18n:tag from the updated proposal; nested tags  
always get a numeric identifier, which requires less typing and works  
just as well.

And i18n:param is not limited to pluralization, it's more general. It  
basically tells the framework which part of a message is a parameter  
that gets substituted into the translation. For example:

   <p i18n:msg="">
     Today is <em i18n:param="weekday">${format.date("EEEE")}</em>.
   </p>

This gets translated to the following in the catalog:

   msgid "Today is [1:%(weekday)s]."
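
How mixed content might be flattened into such a msgid can be sketched as follows. The input format (a list of strings and small dicts) and the `extract_msgid` function are hypothetical simplifications, not Genshi's event stream or the Translator filter. The sketch only shows the numbering of child elements and the `%(name)s` form used for parameters.

```python
def extract_msgid(parts):
    """Build a msgid from text and child elements.

    parts is a list of plain strings and dicts; a dict describes a child
    element, with either a "param" name or literal "text" (assumed format).
    """
    chunks = []
    index = 0
    for part in parts:
        if isinstance(part, str):
            chunks.append(part)
            continue
        index += 1  # child elements are numbered in document order
        if part.get("param"):
            # A parameter is replaced by a named substitution marker.
            inner = "%%(%s)s" % part["param"]
        else:
            inner = part["text"]
        chunks.append("[%d:%s]" % (index, inner))
    return "".join(chunks)


print(extract_msgid(["Today is ", {"param": "weekday"}, "."]))
print(extract_msgid(
    ["Please see ", {"text": "Help"}, " for ", {"text": "details"}, "."]))
```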

Does that clarify the proposal?

> Just to complete the comparison with Zope:

> * Zope makes heavy use of translation domains:
>   <html i18n:domain="myapp">...
>   and all translation lookups are made in terms of this domain.
>   I don't know if this is really needed (your examples seem to work
>   fine without): in practice I find myself always sticking to
>   just the single domain of my application.

Same here. I understand how using multiple domains may be nice, but  
for now I'm not really thinking about supporting them explicitly.

Also, the I18n in Genshi makes this a bit challenging, because you  
can have implicit/automatic messages (normal text in tags and  
attributes), explicit gettext() function calls in expressions and  
code blocks, as well as the namespace directives this proposal would  
add. In Zope, IIUC, you have only the i18n namespace stuff.

Actually, as far as I understand, i18n:attributes is simply a list of  
attributes that specifies which attribute values need localization.  
Genshi provides two ways to do that already:

  * simply use gettext calls in expressions in the attribute value
  * include the attribute in the set of attributes that should be  
localized in general (alt, title, etc, are already in that set by  
default)
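
The second option can be pictured with a small extractor built on the standard library's HTMLParser. This is a sketch and not Genshi's extraction code: the `AttrExtractor` class and the `TRANSLATABLE_ATTRS` set (limited to the two attributes named in this thread) are assumptions, and values containing a `${...}` expression are skipped on the assumption that only literal strings should be picked up automatically.

```python
from html.parser import HTMLParser

# Assumption: just the two attributes named in the thread.
TRANSLATABLE_ATTRS = frozenset({"alt", "title"})


class AttrExtractor(HTMLParser):
    """Collect literal values of attributes that should be localized."""

    def __init__(self, include_attrs=TRANSLATABLE_ATTRS):
        super().__init__()
        self.include_attrs = set(include_attrs)
        self.messages = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Skip values with template expressions; only literals qualify.
            if name in self.include_attrs and value and "${" not in value:
                self.messages.append(value)


template = (
    '<img src="x.png" alt="Logo" title="Home page" />'
    '<input type="submit" value="Reply" '
    'title="Reply to comment ${change.cnum}" />'
)

default = AttrExtractor()
default.feed(template)
print(default.messages)

# Adding "value" to the set picks up the button text as well.
extended = AttrExtractor(TRANSLATABLE_ATTRS | {"value"})
extended.feed(template)
print(extended.messages)
```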

> Just some food for thought. I am aware that some of these
> ideas are rather vague or even questions, but I hope they are helpful anyway.

Yeah, thanks for the feedback!

Cheers,
Chris
--
Christopher Lenz
   cmlenz at gmx.de
   http://www.cmlenz.net/

