Re: [YARD] YARD 0.8.0 Development Plans


Loren Segal Feb 5, 2012 12:57 PM
Posted in group: YARD

On 2/5/2012 9:21 AM, Kouhei Sutou wrote:
> Hi,
>
> In <4F2E650...@soen.ca>
>    "Re: [YARD] YARD 0.8.0 Development Plans" on Sun, 05 Feb 2012 06:16:29 -0500,
>    Loren Segal <lse...@soen.ca> wrote:
>
>>> For document (including YARD's document):
>>>     1. extract text to be translated from docstring.
>>>     2. generate .pot file for extracted text.
>>>     3. apply translation to docstring.
>>>     4. provide a Rake task that extracts text and merges
>>>        extracted text for easy maintenance.
>>>     5. document "how to create i18n supported document".
>> By document you mean documentation, yes?
> Yes. I meant documentation. Sorry.
>
>>                                           So from what I
>> understand, the goal is two-fold:
>>
>> 1. Create tools that help documentation authors create .pot
>> files via a Rake task
>> 2. Apply these tools on YARD's own docs so that we can start
>> internationalizing YARD docs
> Yes. It's accurate.
>
>> If this is accurate,
>>
>> 1. Can you explain a little bit about the Rake task? Why not
>> a YARD command, for example? Like, `yard i18n
>> [--extract|--merge] <files>`. We could perhaps associate a
>> Rake task that runs this CLI command like the YardocTask,
>> for convenience.
> For extraction, it's OK to implement it as a YARD command.
> In my branch, it's `yard doc --format pot`.

Okay this makes things clearer. Perhaps we could still benefit from a
separate i18n command, if only to make it easier to document clearly to
users. Users will probably not think to use `yard doc --format pot`,
but we can wrap that into an i18n command that does this, and then it
shows up clearly in the `yard help` command listing, something like:

   i18n     Extracts strings from documentation for internationalization

>
>> 2. It's one thing to extract the strings into .pot files for
>> YARD docs, but it's a completely different thing to actually
>> apply translations. We could use the tools to extract the
>> docs, but we probably won't see any translations any time
>> soon. So I'm not sure if this step will be of much use just
>> yet.
> Yuuta Yamada is working on it. :-)
>    https://github.com/yuutayamada/yard/tree/ja
>
> And his work is almost done. So we can apply YARD's
> i18n feature soon.

Wow that is insane! Thank you so much!

>> If so, are there any tools to do automate extraction?
> There is `rgettext` that is bundled in gettext gem. It can
> extract translation target messages from *.rb and *.erb. But
> gettext gem isn't maintained anymore. So we need some work on
> it. I already created a Ripper based text extraction feature
> as a YARD handler. But it may be better for the text extraction
> tool to be a separate tool rather than a YARD feature.

Right, it makes sense as a separate tool. The tool could just be a yard
plugin, of course. yard-rgettext. But you probably don't really need
YARD handlers for this stuff.

> We only use _() and N_() for label strings, not keys. For
> example, define_tag's first argument is a label string:
>
> lib/yard/tags/library.rb:
>        define_tag N_("Abstract"),           :abstract

Right, I figured as much, that is why I mentioned it would be easy to
automate. We don't have many cases where Strings are used as keys-- for
example, in that tag line above, :abstract is a symbol because it is a
key... we tend to follow that convention fairly consistently, so we
might actually benefit from a better rgettext tool.
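
To make the idea concrete, here is a minimal sketch of such an extractor, written against the `Ripper` lexer from the Ruby standard library rather than as a YARD handler. The method name `extract_messages` and the `MARKERS` constant are made up for illustration, and the sketch only picks up calls whose first argument is a plain string literal with no interpolation:

```ruby
require "ripper"

# Marker methods to look for (hypothetical constant, for illustration only).
MARKERS = %w[_ N_].freeze

# Returns [message, line] pairs for calls such as _("...") or N_("...")
# whose first argument is a plain, non-interpolated string literal.
def extract_messages(source)
  Ripper.lex(source).each_cons(5).map do |name, paren, open, content, close|
    # The marker is lexed as an identifier (_) or as a constant (N_).
    next unless %i[on_ident on_const].include?(name[1])
    next unless MARKERS.include?(name[2])
    next unless paren[1] == :on_lparen && open[1] == :on_tstring_beg
    next unless content[1] == :on_tstring_content && close[1] == :on_tstring_end
    [content[2], content[0][0]] # token text and the line it starts on
  end.compact
end

source = <<~RUBY
  define_tag N_("Abstract"),           :abstract
  puts _("Namespace Listing A-Z")
RUBY

extract_messages(source).each do |message, line|
  puts "line #{line}: #{message}"
end
```

Because symbols such as `:abstract` never appear inside `_()` or `N_()`, keys are skipped without any special handling.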

> Most label strings are in templates. For example,
> templates/default/layout/html/objects.erb has a label
> string. It will be converted like the following:
>
>    before:
>      <h2>Namespace Listing A-Z</h2>
>
>    after:
>      <h2><%= _("Namespace Listing A-Z") %></h2>
>
> Anyway, I'll manually do the work of surrounding label strings
> with _() or N_(). :-)

Perhaps we could use something like nokogiri to automate this? Text
nodes are pretty easy to find in an X(HT)ML document. Though the ERB
might confuse the parser.
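
As a rough illustration of what such automation could look like, here is a sketch that sidesteps the parser problem by using plain regular expressions instead of nokogiri (a deliberate substitution, not a claim about how it should finally be done). It splits the template on ERB tags, leaves those untouched, and wraps only text that sits directly between a `>` and the next `<`. The method name `wrap_labels` is hypothetical, and cases like `<script>` bodies or HTML comments are not handled:

```ruby
# Wraps bare text between HTML tags in <%= _("...") %>.
# ERB tags are split out first so their contents are never rewritten.
def wrap_labels(template)
  template.split(/(<%.*?%>)/m).map do |chunk|
    next chunk if chunk.start_with?("<%") # existing ERB: leave as-is

    chunk.gsub(/>([^<>]*[^<>\s][^<>]*)</) do
      text  = Regexp.last_match(1)
      lead  = text[/\A\s*/] # keep surrounding whitespace outside the call
      trail = text[/\s*\z/]
      ">#{lead}<%= _(#{text.strip.inspect}) %>#{trail}<"
    end
  end.join
end

puts wrap_labels("<h2>Namespace Listing A-Z</h2>")
```

Running it twice on the same template should not double-wrap anything, since the text is inside an ERB tag after the first pass.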

Automation is important because (a) we want to make it easy for template
customizers to make i18n-friendly templates, and (b) if we expand YARD's
own templates, we don't want to be hampered by the extra work involved
in i18nizing everything-- the more tools can help the better.

If you spent that time on tools that do this more easily, rather than on
painstakingly replacing all those instances manually, we would end up with a
system that's easier to use in the future. I don't know how much work
such a tool would be though, so this is just food for thought.

>
>
>> I would basically want to know, from a documentation
>> writer's perspective, how exactly will YARD be storing and
>> using this data? For example, we will have tools to extract
>> the .pot files (`yard i18n` or a rake task), and then the
>> user will fill in those pot files with translations. Right?
> It's not right.
>
> Here is an ASCII art about workflow on gettext system:
>    http://www.gnu.org/software/gettext/manual/gettext.html#Overview
>
> .pot is a PO template file. The documentation writer generates
> a .po file for each language from the .pot file. In the ASCII art,
> it is shown as "PACKAGE.pot ->  msgmerge ->  LANG.po".
>
> We use `msginit` for creating the initial LANG.po from
> PACKAGE.pot but it's not shown in the ASCII art. (We can
> also use `cp` for it. But we need some work after `cp`.)
>
> The documentation writer fills in the LANG.po file with translations
> using their favorite editor. In the ASCII art, it is shown as
> "LANG.po ->  PO editor ->  New LANG.po".

Thanks for the explanation and link. I will read more about this in the
coming week so I'm up to speed on what is being done!

Perhaps you can answer this before I find it in the docs though, but why
is there a .pot and .po, if .mo is the final stage? Could we not just
automate this stage from .pot into .po as well? Forgive my ignorance if
that is a stupid question-- I'm just trying to make this as easy on our
users as possible, so the more things YARD can automate, the better.
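
For reference, the .pot to .po step can be pictured as a merge rather than a copy: the template is regenerated from the source every time, while LANG.po carries a translator's existing work. A minimal sketch of that merge, modelling catalogs as plain Ruby hashes (an assumption made for illustration; the real `msgmerge` also deals with fuzzy matches, comments and obsolete entries):

```ruby
# pot_msgids:  messages currently extracted from the source (the template).
# existing_po: { msgid => msgstr } from the translator's current LANG.po.
# Returns the new LANG.po contents: current messages only, with any
# existing translation carried over and new messages left untranslated.
def merge_catalog(pot_msgids, existing_po)
  pot_msgids.each_with_object({}) do |msgid, merged|
    merged[msgid] = existing_po.fetch(msgid, "")
  end
end

template = ["Abstract", "Namespace Listing A-Z"]
french   = { "Abstract" => "Abstrait", "Removed Label" => "Ancien" }

p merge_catalog(template, french)
```

In this model an automated step can create and refresh LANG.po, but it cannot fill in the empty strings, which is the part a translator has to do by hand.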

>> So a few questions:
>>
>> 0. (Just to be sure) the .pot files are the "final" files
>> right? Will they need to run some tool against these .pot
>> files to turn them into something else? Or will YARD just do
>> that step internally? I'm not *too* familiar with gettext,
>> but I vaguely know of .po and .mo-- that's about all I know.
> No. The .po file is the final file. The .po file is generated from
> the .pot file.
>
> The ASCII art says the .mo file is the final file, as in
> "New LANG.po ->  msgfmt ->  LANG.gmo ->  install ->  /.../LANG/PACKAGE.mo",
> but we can do that internally, so documentation writers don't need to
> care about the .mo file.

How exactly does .po get translated into .mo inside of YARD? Do we
depend on any external system tools that the Ruby stdlib does not
provide? I've always wanted to keep the amount of dependencies to a
minimum. Ruby does not ship with any gettext libraries, correct? That's
something we will have to look into.
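
To get a feel for how much a dependency-free approach would involve, here is a minimal sketch of reading .po text into a Hash using only core Ruby. The names `parse_po` and `po_unescape` are hypothetical, and the sketch is intentionally incomplete: it ignores `msgctxt`, plural forms and fuzzy flags, so it is not a replacement for a real gettext library.

```ruby
# Unescapes the C-style sequences used inside .po string literals.
def po_unescape(str)
  str.gsub(/\\(.)/) { { "n" => "\n", "t" => "\t" }.fetch($1, $1) }
end

# Parses .po text into a { msgid => msgstr } Hash.
# The header entry (empty msgid) and untranslated entries are skipped.
def parse_po(text)
  catalog = {}
  entry = { msgid: nil, msgstr: nil }
  field = nil # which field a continuation line ("...") belongs to
  store = lambda do
    id, str = entry[:msgid], entry[:msgstr]
    catalog[id] = str unless id.nil? || id.empty? || str.nil? || str.empty?
  end

  text.each_line do |line|
    case line.strip
    when /\Amsgid\s+"(.*)"\z/
      store.call # a new msgid closes the previous entry
      entry = { msgid: po_unescape($1), msgstr: nil }
      field = :msgid
    when /\Amsgstr\s+"(.*)"\z/
      entry[:msgstr] = po_unescape($1)
      field = :msgstr
    when /\A"(.*)"\z/
      entry[field] += po_unescape($1) if field
    else
      field = nil # blank lines, comments and unsupported keywords
    end
  end
  store.call
  catalog
end

po = <<~'PO'
  msgid "Namespace Listing A-Z"
  msgstr "Liste des espaces de noms A-Z"
PO

p parse_po(po)
```

If reading .po directly turns out to be fast enough, the .mo step might not be needed for documentation at all, though that would have to be measured.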

>> 1. Where would these .pot files be stored (or whatever final
>> files that are created)? Would there be some conventional
>> location to store them? This is important for the last
>> question...
> po/#{LANG}.po is a conventional location. In my branch,
> locale/#{LANG}/#{PACKAGE}.po is used.

So, the .pot files are never stored, they are just an intermediary phase
to .po, which is stored in their source repository, yes?

>
>> 2. How will this work from the runtime API perspective?
>> Remember, YARD's plugin/extensibility support is an
>> important part of the project. For instance, if someone
>> loads up the Registry (YARD::Registry.load!), grabs an
>> object (o = Registry.at('YARD::CodeObjects::Base')) and asks
>> for the docstring (o.docstring), they'll get the
>> docstring. Should we be autotranslating the docstring into
>> their language? Or should we make them use _(o.docstring) to
>> translate on their own? Perhaps we should have a
>> o.docstring.translated to make this more obvious to users
>> trying to write plugins, if we don't auto-translate.
> We don't provide auto-translatable docstring. In my branch,
> o.docstring and o.docstring.to_s return untranslated text
> but o.docstring.document (new method) and
> o.docstring.summary return translated text.

Hmm, this might be a little confusing. I will wait until the pull
request is made to look at it fully, though. Indeed, Docstring will be
problematic because it extends String, so that will be an interesting
problem.
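
A small sketch of the "explicit method" option, to show where the String inheritance gets awkward. `SketchDocstring` is a stand-in class invented for this example, not YARD's Docstring, and the catalog is just a Hash of original text to translated text:

```ruby
# Stand-in for a String-based docstring class (hypothetical, not YARD's).
class SketchDocstring < String
  def initialize(text, catalog = {})
    super(text)
    @catalog = catalog # { original text => translated text }
  end

  # Explicit translation; the string value itself stays untranslated.
  def translated
    @catalog.fetch(to_s, to_s)
  end
end

doc = SketchDocstring.new("Returns the name.",
                          "Returns the name." => "Retourne le nom.")

puts doc            # the original text
puts doc.translated # the translated text
```

The object still compares and prints as the original text, which keeps existing plugins working. The catch is that anything derived through ordinary String methods (for example `doc.upcase` or `doc[0, 7]`) does not carry the catalog along, so translation has to happen before any such slicing.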

>> 3. Regarding the runtime API again-- a user already
>> typically loads the registry via Registry.load. But, if .pot
>> files are external to the registry, they will ALSO have to
>> load up the translations in another command for something
>> like o.docstring.translated to work. Is this handled by
>> gettext as well?
> It means that we need to handle translations for one or more
> projects at once. Right? For example, we may process YARD's
> documentation and RSpec's documentation at once.

Yes, this is an example of a concern. For example, rubydoc.info runs
YARD live on the server, and generates HTML for the projects at runtime
inside of a Rack handler using the Server architecture, so it could be
serving many projects in the same process. We run it in separate threads
(which gettext supposedly supports for loading different languages), and
Registry is also local to a thread, so this would work for the most
part. If we had .mo files local to the Registry, then our translation
data would also be thread local, and would pose no problems for a setup
like rubydoc.info. But there might be other issues that I haven't
thought about.
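
A minimal sketch of what "translation data local to a thread" could look like, using Ruby's thread variables. The module name `SketchTranslations` and its methods are invented for the example and say nothing about how the Registry actually stores its thread-local state:

```ruby
# Hypothetical per-thread translation store, for illustration only.
module SketchTranslations
  KEY = :sketch_translation_catalog

  # Selects the catalog ({ msgid => msgstr }) for the current thread.
  def self.use(catalog)
    Thread.current.thread_variable_set(KEY, catalog)
  end

  # Looks a message up in the current thread's catalog, falling back
  # to the untranslated message when there is no entry.
  def self.translate(msgid)
    catalog = Thread.current.thread_variable_get(KEY) || {}
    catalog.fetch(msgid, msgid)
  end
end

catalogs = {
  "fr" => { "Abstract" => "Abstrait" },
  "de" => { "Abstract" => "Abstrakt" }
}

# Each thread serves a different project or language in the same process.
results = catalogs.map do |lang, catalog|
  Thread.new do
    SketchTranslations.use(catalog)
    [lang, SketchTranslations.translate("Abstract")]
  end
end.map(&:value)

p results
```

Each request thread only sees the catalog it selected, so two projects served from one process do not leak translations into each other.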

>
> I hadn't considered it. We can load many .po files and use
> them separately, but that needs some work.

Indeed, don't worry about this just yet. We will review the possible
compatibility issues when we have the initial implementation done. It's
something to keep in mind, though.

>
>>                   Could we theoretically compile the
>> translations and drop them into the `.yardoc` db directory?
> Yes. We can compile the .po file to a .mo file and load the .mo file.
> But we need a per-project translation mechanism. For example,
> we need to support using YARD's .mo file and RSpec's .mo file
> separately.
>
>> That way they would be associated with the registry and
>> could be loaded when the Registry is loaded in a single
>> command. If the translations are outside the .yardoc
>> directory, it would be more difficult to load them together
>> (you would have to specify both paths). Note that if my
>> assumption about question 0 is wrong, and .pot files are not
>> the final files, it would actually make sense to "compile"
>> these translations into the .yardoc db alongside the
>> registry, but again, I don't know exactly how this works.
> I used a separate load path mechanism. I also think that the
> compiled file (.mo file) should be put into the .yardoc directory
> when the .yardoc directory is created.

Okay, that will make the API much easier to use if we can do this. I
think we should look at attaching translation data to the Registry,
then-- by creating some new attributes/methods and serializing them with
the YardocSerializer and RegistryStore classes.
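
As a rough picture of the idea, here is a sketch that stores a compiled catalog inside a .yardoc-style directory and reads it back from the same path. Everything about the layout is an assumption made up for the example (the `translations/` subdirectory, the `LANG.dat` file name, the use of `Marshal`), and it does not go through YardocSerializer or RegistryStore at all:

```ruby
require "fileutils"
require "tmpdir"

# Writes a { msgid => msgstr } catalog under <yardoc_dir>/translations/.
# The directory layout and file name are hypothetical.
def save_translations(yardoc_dir, lang, catalog)
  dir = File.join(yardoc_dir, "translations")
  FileUtils.mkdir_p(dir)
  File.binwrite(File.join(dir, "#{lang}.dat"), Marshal.dump(catalog))
end

# Reads the catalog back; a missing language yields an empty catalog.
def load_translations(yardoc_dir, lang)
  path = File.join(yardoc_dir, "translations", "#{lang}.dat")
  File.exist?(path) ? Marshal.load(File.binread(path)) : {}
end

Dir.mktmpdir do |tmp|
  yardoc = File.join(tmp, ".yardoc")
  save_translations(yardoc, "fr", "Abstract" => "Abstrait")
  p load_translations(yardoc, "fr")
  p load_translations(yardoc, "de")
end
```

With the catalog living next to the registry data, a caller only needs the one .yardoc path to get both the objects and their translations.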

>
>> I'm just wondering if you've given these any thought. If
>> not, don't worry, I don't need answers to these just yet, as
>> long as we're thinking about how to answer these questions
>> before the release.
> Thanks for sending questions. They are very helpful because
> I didn't know what I should explain.

Your explanations are super helpful, thank you!

Loren