I18N support

I18N support kou 9/13/11 7:25 PM
Hi,

Could you tell me whether there is any schedule for YARD to support I18N?
If YARD is going to support I18N, I would like to contribute to it.

Background:
I want to provide both English and Japanese versions of the documentation
to my library's users. There are several ways to do this.

(1) GNU gettext style:
I write the original document in English, create an English-to-Japanese
message translation file, and generate the Japanese version from the
original English document plus that translation file.
(2) Full translation style:
I write the original documents in both English and Japanese. Both include
all text content and markup.
(3) ...

I want to use (1) style because I don't want to write the same markup twice.
(1) style only needs an English-to-Japanese translation file that doesn't
care about markup. It only handles text. (2) style requires Japanese text
together with its markup.


I'm currently generating documentation in English and Japanese with
YARD + xml2po(*). Here are examples of the generated documents:
http://groonga.rubyforge.org/rroonga/en/ (English)
http://groonga.rubyforge.org/rroonga/ja/ (Japanese)

(*) http://live.gnome.org/GnomeDocUtils
xml2po
generates .po files for GNU gettext from XML (XHTML) files and
translates XML files using the generated .po files.
It is used by many GNOME-related libraries.


I ran into some problems with this approach because xml2po has trouble
handling XHTML files:

(1) xml2po can't handle <span>...</span> inside <pre> nicely. Spaces in
<span> aren't preserved. YARD uses <span> inside <pre> for syntax
highlighting.
(2) xml2po requires that all alt and title attributes be translated. If
one of them is not translated, xml2po raises an exception and exits
abnormally. (I have submitted a patch to fix it.)


So I'm considering another solution.

FYI: Sphinx has started supporting I18N. http://sphinx.pocoo.org/latest/intl.html


Thanks,
--
kou

Re: [YARD] I18N support Loren Segal 9/13/11 9:14 PM
Hi Kouhei,

On 9/13/2011 10:25 PM, Kouhei Sutou wrote:
> Could you tell whether there is any schedule to which YARD supports I18N?
> If YARD will support I18N, I want to contribute to it.
Right now there is no I18N support in YARD minus some awkward hacks (you
list them below), but those hacks don't cover the templates themselves.
We've discussed it before, but there was never enough user interest to
devote the effort. We would *love* to have better I18N support! If you
want to help out, please do!

> Background:
> I want to provide English version document and Japanese version document
> to my library users. There are some ways to do this.
>
> (1) GNU gettext style:
> I write the original document in English, create English to Japanese message
> translate file and generate Japanese version document from the original
> document in English + English to Japanese message translate file.
This certainly seems like the most viable approach.

Can you provide an example of how you envision a gettext-style
implementation working in the context of YARD? Would you want special
syntax within the docstrings? How would YARD detect the boundaries of
translation strings, or would a translation string just be the entire
docstring itself? Note that each tag value would have its own
translation string separate from the docstring (I assume?). Or perhaps
you would rather the translation strings be generated after markup is
generated (like the xml2po you describe below, but "better").

Admittedly, I'm not well versed with gettext, so a lot of the
implementation details might get lost on me. Another reason why we had
never attempted I18N before-- I'm not exactly qualified to do this kind
of stuff. Any explanations would be helpful.

> I want to use (1) style because I don't want to write the same markup twice.
>
Understandable.

> I'm creating document in English and Japanese with YARD + xml2po(*).
> Here is created documents example:
> http://groonga.rubyforge.org/rroonga/en/ (English)
> http://groonga.rubyforge.org/rroonga/ja/ (Japanese)
> (*) http://live.gnome.org/GnomeDocUtils
> xml2po generates .po files for GNU gettext from XML(XHML) files and
*snip*

> I met some problems with the above work because xml2po has some problems
> to handle XHTML file:
*snip*

(These docs look awesome by the way)

Again, a lot of the details of gettext are lost on me, but it seems as
though it's just ripping out text nodes in the XML file and making those
the translation strings for the .po files? You would end up with lots of
problems with that, since things like span, as you pointed out, get
separated from larger strings.


> So I consider about another solution.

If you have any specific ideas I'd love to hear. I'm not sure what kind
of better solutions we could consider-- the only thing I can imagine is
that it would be easier to do the translation on the raw docstrings,
prior to the data getting turned into HTML (that way you could also
translate for other formats). Of course that would mean that formatting
would get swallowed up along with the translation strings, and you would
need to redo this stuff, something you mentioned you don't want to deal
with.

Unless there was an extra syntax in the docstrings to explicitly show
the "text" part of the docstring to use in a translation string, it
would have to just dump the entire raw docstring... but adding in extra
syntax would look bulky and awkward, since it would be used on almost
every paragraph/sentence. So, all this to say, maybe xml2po gets you
closer to your goal than anything we can do within YARD itself? Again,
I'm not sure what we can do differently, but I'm open to suggestions!

Regards,

Loren

Re: [YARD] I18N support kou 9/14/11 2:34 AM
Hi,

2011/9/14 Loren Segal <lse...@soen.ca>:

> devote the effort. We would *love* to have better I18N support! If you want
> to help out, please do!

That's good news. :-)

> Can you provide an example of how you envision a gettext-style
> implementation working in the context of YARD? Would you want special syntax
> within the docstrings? How would YARD detect the boundaries of translation
> strings, or would a translation string just be the entire docstring itself?
>
> Note that each tag value would have its own translation string separate from
> the docstring (I assume?). Or perhaps you would rather the translation
> strings be generated after markup is generated (like the xml2po you describe
> below, but "better").

I don't want to add any new syntax.
It seems better to use each paragraph in a tag value as
a translation string. tag.value.split(/\n{2,}/) would be an easy and
good enough implementation.
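
As a minimal sketch of that paragraph-splitting idea: the helper name
`paragraphs` and the sample text are assumptions for illustration, and the
input is a plain String rather than a real YARD tag object.

```ruby
# Split a raw tag value (or docstring) into paragraph-sized chunks.
# A paragraph boundary is two or more consecutive newlines.
# NOTE: `paragraphs` is a hypothetical helper, not part of YARD.
def paragraphs(text)
  text.split(/\n{2,}/).map(&:strip).reject(&:empty?)
end

value = <<~DOC
  Opens the database at +path+.

  Raises an error when the file
  does not exist.
DOC

# Each chunk would become one translation string (msgid).
paragraphs(value).each_with_index do |chunk, i|
  puts "#{i}: #{chunk.inspect}"
end
```

A single newline inside a paragraph is kept as part of the chunk, so
hard-wrapped text stays together as one translation string.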

I don't want to use the text generated by the markup parser because it
contains noise. For example, syntax-highlighted program code
has many <span>s. They are noise for translation.

>> I want to use (1) style because I don't want to write the same markup
>> twice.
>>
> Understandable.

Ah, sorry. It's my mistake. "(1)" should be "(2)".


> (These docs look awesome by the way)

Thanks. :-)
I'm developing a library that adds I18N document support to a library.
It's Packnga: http://groonga.rubyforge.org/packnga/en/


> Again, a lot of the details of gettext are lost on me, but it seems as
> though it's just ripping out text nodes in the XML file and making those the
> translation strings for the .po files?

Yes. You're right. The workflow is:
1. 'xml2po --output XXX.pot' generates an empty .po file, which is called
  a .pot file.
2. 'msginit --input XXX.pot --output XXX.po --locale ja' generates a .po file
  for Japanese.
3. A translator translates the messages in the *.po files.
4. 'xml2po --po-file XXX.po --language ja XXX.xml' generates an XML file
  that has the translated contents.
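
To make the .pot/.po distinction in these steps concrete, here is a rough
Ruby sketch that reads msgid/msgstr pairs from PO-format text. `parse_po`
is a hypothetical helper written for this illustration (it is not part of
gettext, xml2po, or YARD), and it only understands single-line entries.

```ruby
# Parse single-line msgid/msgstr pairs from PO-format text into a Hash.
# Multi-line strings, comments, plural forms, and escape sequences are
# deliberately not handled in this sketch.
def parse_po(text)
  catalog = {}
  msgid = nil
  text.each_line do |line|
    case line
    when /\Amsgid "(.*)"\s*\z/
      msgid = $1
    when /\Amsgstr "(.*)"\s*\z/
      # Skip the header entry, whose msgid is the empty string.
      catalog[msgid] = $1 unless msgid.nil? || msgid.empty?
      msgid = nil
    end
  end
  catalog
end

# A .pot entry: the msgstr is still empty (step 1).
pot = <<~POT
  msgid "Opens the database."
  msgstr ""
POT

# The same entry in a .po file after translation (step 3).
po = <<~PO
  msgid "Opens the database."
  msgstr "データベースを開きます。"
PO

p parse_po(pot)
p parse_po(po)
```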

> You would end up with lots of
> problems with that, since things like span, as you pointed out, get
> separated from larger strings.

Yes. I want to extract meaningful text chunks from a document.
Chunks that are too small and chunks that are too large are both bad.
A paragraph seems to be a good chunk size.


> If you have any specific ideas I'd love to hear. I'm not sure what kind of
> better solutions we could consider-- the only thing I can imagine is that it
> would be easier to do the translation on the raw docstrings, prior to the
> data getting turned into HTML (that way you could also translate for other
> formats).

I have been thinking about that idea too.
I want to split the raw docstrings into smaller chunks.
(Paragraphs would work well.)
The extracted chunks would then be translated with the gettext framework.
(GNU gettext provides many utilities for translators.)
This is the same approach that Sphinx uses.
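
A minimal sketch of how paragraph-level translation of a raw docstring
could look. The method name `translate_docstring` is hypothetical, and a
plain Hash stands in for a real gettext message catalog, so this only
illustrates the idea and is not how YARD or Sphinx implement it.

```ruby
# Translate a raw docstring paragraph by paragraph.
# `catalog` is a plain Hash (msgid => msgstr) standing in for a gettext
# catalog; missing or empty translations fall back to the original text.
def translate_docstring(docstring, catalog)
  docstring.split(/\n{2,}/).map do |paragraph|
    key = paragraph.strip
    translated = catalog[key]
    translated.nil? || translated.empty? ? key : translated
  end.join("\n\n")
end

catalog = {
  "Opens the database." => "データベースを開きます。",
}

docstring = "Opens the database.\n\nSee +Database#close+ as well."
puts translate_docstring(docstring, catalog)
```

Because the lookup works on raw paragraphs, any markup inside a paragraph
is part of the translation string, and untranslated paragraphs stay in the
original language.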

> Of course that would mean that formatting would get swallowed up
> along with the translation strings, and you would need to redo this stuff,
> something you mentioned you don't want to deal with.

Yes. But that is still better than writing the full translated text as
a separate file, for example writing both README.md.en and
README.md.ja.

> Unless there was an extra syntax in the docstrings to explicitly show the
> "text" part of the docstring to use in a translation string, it would have
> to just dump the entire raw docstring... but adding in extra syntax would
> look bulky and awkward, since it would be used on almost every
> paragraph/sentence.

I also don't want to introduce a new extra syntax.

> So, all this to say, maybe xml2po gets you closer to
> your goal than anything we can do within YARD itself? Again, I'm not sure
> what we can do differently, but I'm open to suggestions!

As a first step, how about adding a .po generation feature to YARD?
Once we have a .po for the docstrings, we will be able to translate them
with the fast_gettext gem.


Thanks,
--
kou

Re: [YARD] I18N support Loren Segal 9/14/11 11:40 PM

On 9/14/2011 5:34 AM, Kouhei Sutou wrote:
> I don't want to add any new syntax.
> It seems better that we use a paragraph in tag value for
> a translation string. tag.value.split(/\n{2,}/) will be a easy and good
> enough implementation.

I think that makes a lot of sense.

> I don't want to use generated text by markup parser because it
> causes some noises. For example, program code syntax highlight
> has many <span>s. They are noises for translation.
>
Agreed.

>
>>> I want to use (1) style because I don't want to write the same markup
>>> twice.
>>>
>> Understandable.
> Ah, sorry. It's my mistake. "(1)" should be "(2)".
>

Oh, well that makes things way easier!


> I also think about the idea.
> I want to split the raw docstrings into more small chunks.
> (paragraphs will be better.)
> And extracted chunks will be translated by gettext framework.
> (GNU gettext provides many utilities for translators.)
> This is the same way that is used by Sphinx.
>

Yes, again, this seems to be the easiest thing to do.


>
> As the first step, what about we add .po generate feature to YARD?
> If we get .po for docstring, we will be able to translate it by
> fast_gettext gem.

I would be all for that. If you have time to implement this, that would
be great!

The next step would be to gettextify YARD itself, since there are many
strings in templates etc. that are not I18N friendly.

Loren

Re: [YARD] I18N support kou 9/15/11 9:51 PM
Hi,

2011/9/15 Loren Segal <lse...@soen.ca>:

>> As the first step, what about we add .po generate feature to YARD?
>> If we get .po for docstring, we will be able to translate it by
>> fast_gettext gem.
>
> I would be all for that. If you have time to implement this, that would be
> great!

OK. I'll try it.

> The next step would be to gettextify YARD itself, since there are many
> strings in templates etc. that are not I18N friendly.

That sounds good to me. :-)


Thanks,
--
kou

Re: I18N support etagwerker 9/27/11 7:48 AM
Hi,

I am interested in this thread, as I'd like some of the text to be in
Spanish.
This sounds great. Has anyone started working on this? I'd be happy to
contribute.

Please let me know.

Thanks,
Ernesto

Re: [YARD] Re: I18N support kou 9/27/11 8:08 AM
Hi,

2011/9/27 etagwerker <etagw...@gmail.com>:

>> >> As the first step, what about we add .po generate feature to YARD?
>> >> If we get .po for docstring, we will be able to translate it by
>> >> fast_gettext gem.
>>
>> > I would be all for that. If you have time to implement this, that would be
>> > great!
>>
>> OK. I'll try it.
>>
>> > The next step would be to gettextify YARD itself, since there are many
>> > strings in templates etc. that are not I18N friendly.
>>
>> It sounds good for me. :-)
>
> This sounds great. Has anyone started working on this? I'd be happy to
> contribute.
>
> Please let me know.

I'm not working on this yet. (Sorry...)
If you can start working on it, please go ahead. :-)


Thanks,
--
kou

Re: [YARD] I18N support kou 10/12/11 9:28 PM
Hi,

I'm sorry for the delay.

2011/9/16 Kouhei Sutou <k...@cozmixng.org>:


> 2011/9/15 Loren Segal <lse...@soen.ca>:
>
>>> As the first step, what about we add .po generate feature to YARD?
>>> If we get .po for docstring, we will be able to translate it by
>>> fast_gettext gem.
>>
>> I would be all for that. If you have time to implement this, that would be
>> great!
>
> OK. I'll try it.

I wrote a POT formatter:
  https://github.com/lsegal/yard/pull/395

POT means PO template. A POT file is the master file from which PO files
are generated.
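
For readers unfamiliar with the format, here is a rough sketch of what
generating POT text from a list of messages involves. This is not the code
from the pull request above; `escape_po` and `to_pot` are hypothetical
helpers, and a real POT file also carries a header entry and source
location comments, which are omitted here.

```ruby
# Escape backslashes, double quotes, and newlines for a PO string literal.
def escape_po(string)
  string.gsub(/[\\"\n]/) { |c| c == "\n" ? "\\n" : "\\#{c}" }
end

# Build minimal POT text: one entry per unique message, each with an
# empty msgstr that translators fill in later in a PO file.
def to_pot(messages)
  messages.uniq.map do |message|
    %(msgid "#{escape_po(message)}"\nmsgstr ""\n)
  end.join("\n")
end

messages = [
  "Opens the database.",
  "Returns the \"default\" context.",
  "Opens the database.", # duplicates collapse into a single entry
]

puts to_pot(messages)
```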

>> The next step would be to gettextify YARD itself, since there are many
>> strings in templates etc. that are not I18N friendly.
>
> It sounds good for me. :-)

Can I move on to the next step?


Thanks,
--
kou

Re: [YARD] I18N support Loren Segal 10/14/11 2:40 PM

On 10/13/2011 12:28 AM, Kouhei Sutou wrote:
> Hi,
>
> I'm sorry for my late work.

No problem!

>
> 2011/9/16 Kouhei Sutou<k...@cozmixng.org>:
>> 2011/9/15 Loren Segal<lse...@soen.ca>:
>>
>>>> As the first step, what about we add .po generate feature to YARD?
>>>> If we get .po for docstring, we will be able to translate it by
>>>> fast_gettext gem.
>>> I would be all for that. If you have time to implement this, that would be
>>> great!
>> OK. I'll try it.
> I wrote a POT formatter:
>    https://github.com/lsegal/yard/pull/395
>
> POT means PO template. POT is used as a master file to generates PO files.

At first glance, it's very simple and clean. It looks like a go.

>
>>> The next step would be to gettextify YARD itself, since there are many
>>> strings in templates etc. that are not I18N friendly.
>> It sounds good for me. :-)
> Can I go to the next step?

Yes, once I merge (will happen soon) we can start stringifying the YARD
stuff.


Loren

Re: [YARD] I18N support kou 10/19/11 5:35 AM
Hi,

2011/10/15 Loren Segal <lse...@soen.ca>:

>> I wrote a POT formatter:
>>   https://github.com/lsegal/yard/pull/395
>>
>> POT means PO template. POT is used as a master file to generates PO files.
>
> At first glance, it's very simple and clean. It looks like a go.

Thanks. :-)

>>>> The next step would be to gettextify YARD itself, since there are many
>>>> strings in templates etc. that are not I18N friendly.
>>>
>>> It sounds good for me. :-)
>>
>> Can I go to the next step?
>
> Yes, once I merge (will happen soon) we can start stringifying the YARD
> stuff.

OK. I'll move on to the next step.

--
kou