Code comment/string translation


cla...@2xlibre.net

Jun 10, 2013, 8:20:18 AM
to sphin...@googlegroups.com
In the Django documentation, we have a lot of code snippets containing strings and comments.

One example: https://docs.djangoproject.com/en/dev/topics/db/queries/#one-to-many-relationships

I wonder if it would be technically possible either to auto-detect such content and mark it for translation, or to add some special markup that identifies translatable chunks in code snippets.

Claude

Takayuki Shimizukawa

Jun 19, 2013, 8:52:49 PM
to sphin...@googlegroups.com
Hi Claude,

Currently, code samples are excluded from the candidates for translation:
https://bitbucket.org/birkenfeld/sphinx/src/b4abca7/sphinx/util/nodes.py#cl-40
Changing this to include code samples would not be very difficult.

However, since newlines and whitespace must be handled strictly,
handling code samples in PO files becomes difficult.
I have no idea how to resolve this issue... Proposals and/or pull requests
are welcome :)
--
Takayuki SHIMIZUKAWA
http://about.me/shimizukawa


Claude Paroz

Jun 20, 2013, 10:28:24 AM
to sphin...@googlegroups.com
On Thursday, June 20, 2013 at 09:52 +0900, Takayuki Shimizukawa wrote:
> Hi Claude,
>
> Currently, code samples are excluded from the candidates for translation:
> https://bitbucket.org/birkenfeld/sphinx/src/b4abca7/sphinx/util/nodes.py#cl-40
> Changing this to include code samples would not be very difficult.

Thanks for the pointer.

> However, since newlines and whitespace must be handled strictly,
> handling code samples in PO files becomes difficult.

Not necessarily. PO files can handle newlines/tabs/spaces. Of course,
the translator has to be careful not to mess with them.
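
A minimal sketch of why whitespace can survive a trip through a PO file: PO strings use C-style escapes, so newlines and tabs are stored as `\n` and `\t`, and runs of spaces are kept literally inside the quotes. The helper names `po_escape` and `po_unescape` are hypothetical and only illustrate the escaping; real tooling (for example GNU gettext) does this for you.

```python
def po_escape(text: str) -> str:
    """Escape a string for use inside a PO msgid/msgstr (C-style escapes)."""
    return (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes stay intact
        .replace('"', '\\"')
        .replace("\t", "\\t")
        .replace("\n", "\\n")
    )


def po_unescape(text: str) -> str:
    """Reverse po_escape by scanning left to right, one escape at a time."""
    mapping = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            # Unknown escapes are passed through unchanged.
            out.append(mapping.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# A code snippet with significant indentation, a tab, and a trailing newline.
snippet = 'def f():\n    return "done"\t# note\n'
escaped = po_escape(snippet)
restored = po_unescape(escaped)
```

The round trip is lossless as long as the translator leaves the escapes and the leading spaces alone, which is the caveat mentioned above.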

> I have no idea how to resolve this issue... Proposals and/or pull requests
> are welcome :)

Would an option on the code-block directive be possible?

Example:
Original:

.. code-block:: python
   :translatable-lines: 2,4

   def my_function():
       """ A translatable docstring """
       call_to_some_method()
       return "Needed work done!"

Extracted message:
msgid "    \"\"\" A translatable docstring \"\"\"\n"
"    return \"Needed work done!\""

At build time, just replace every translatable line with its translated
equivalent line.
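
A rough sketch of how that extraction and build-time substitution could work, assuming 1-based line numbers as in the proposed `:translatable-lines:` option. Nothing here is existing Sphinx API: `extract_msgid` and `apply_translation` are hypothetical helpers, and the French msgstr is only sample data.

```python
from typing import Sequence


def extract_msgid(code: str, line_numbers: Sequence[int]) -> str:
    """Join the selected (1-based) lines of a code block into one msgid."""
    lines = code.split("\n")
    return "\n".join(lines[n - 1] for n in line_numbers)


def apply_translation(code: str, line_numbers: Sequence[int], msgstr: str) -> str:
    """Replace each selected line with the matching line of the translation."""
    lines = code.split("\n")
    translated = msgstr.split("\n")
    if len(translated) != len(line_numbers):
        # A translation that adds or drops lines cannot be mapped back.
        raise ValueError("translation must keep one line per translatable line")
    for n, new_line in zip(line_numbers, translated):
        lines[n - 1] = new_line
    return "\n".join(lines)


code = (
    "def my_function():\n"
    '    """ A translatable docstring """\n'
    "    call_to_some_method()\n"
    '    return "Needed work done!"'
)
msgid = extract_msgid(code, [2, 4])

# Sample translation; indentation and quoting must be preserved by the translator.
msgstr = (
    '    """ Une docstring traduisible """\n'
    '    return "Travail accompli !"'
)
result = apply_translation(code, [2, 4], msgstr)
```

The line-count check is the only safeguard in this sketch; it would not catch a translator who breaks the indentation or the string quoting.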

Feasible, or am I dreaming?

Claude

Robert Lehmann

Jun 20, 2013, 11:20:26 AM
to sphin...@googlegroups.com
Hi Claude,

I'd propose pulling full-text comments out of code, replacing them with footnotes or markers instead:

    questionable_code_line()  # <--
    another_one()  # (1)

I think having any kind of markup in the msgids is a source of errors. The same goes for code. Also, combining multiple code lines into one msgid will only cause unhappiness.

I like the idea of automatically extracting comments only (I'd be happy to review a patch!) but consider it non-trivial. First off, we'd probably want to support multiple languages (i.e., commenting styles). Also, code comments cannot be wrapped easily, so lines might become unreasonably long or short in translation.

We could theoretically reuse Pygments' tokenization, but that's already imperfect in cases like Python's multi-line comments. A quick test revealed it also does not separate comment markers from the comment string, so we *would* have to special-case every language (or do some black magic to extract the comment characters, think /* comment */, from Pygments' lexer description). On the upside, it distinguishes between docstrings and normal strings. On the downside, docstrings are extra hard anyway because they *can* contain structured markup.
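
To illustrate the marker problem for a single language, here is a minimal sketch that uses Python's standard-library `tokenize` module rather than Pygments (a deliberate swap to keep the example dependency-free). The comment token still carries its leading `#`, so it has to be stripped by hand, which is exactly the kind of per-language special-casing meant here. `extract_comments` is a hypothetical helper name.

```python
import io
import tokenize


def extract_comments(source: str):
    """Return (line_number, comment_text) pairs for '#' comments in Python source."""
    comments = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            # tok.string includes the leading '#'; strip the marker manually.
            comments.append((tok.start[0], tok.string.lstrip("#").strip()))
    return comments


source = (
    "questionable_code_line()  # needs review\n"
    'another_one("# not a comment")  # explained below\n'
)
comments = extract_comments(source)
```

A real tokenizer is what keeps the `#` inside the string literal from being mistaken for a comment; a regular expression over raw lines would get that case wrong. This sketch does nothing for docstrings or for other languages.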

I'm not sure there are good answers to all of these questions. If someone is up to the task, go ahead!

Cheers,
Robert




Claude Paroz

Jun 20, 2013, 3:08:21 PM
to sphin...@googlegroups.com
On Thursday, June 20, 2013 at 17:20 +0200, Robert Lehmann wrote:
> Hi Claude,

Hi Robert,

> I'd propose pulling full-text comments out of code, replacing them
> with footnotes or markers instead:
>
>     questionable_code_line()  # <--
>     another_one()  # (1)

This might be a solution in certain cases, but not in our particular use
case, where pages can be rather long. It'd lower the general readability
of the document and would never be accepted by English writers.
Moreover, it doesn't address translatability of strings inside
parameters.

> I think having any kind of markup in the msgids is a source of errors.
> The same goes for code.

Sure, I understand, but we already have reST markup in the current i18n
infrastructure. We should aim for less markup when possible, but
in any case, I'd rather have a markup-encumbered string than no
translatable content.

> Also, combining multiple code lines into one msgid will only cause
> unhappiness.

Maybe, maybe not. As a translator, I'm happy when I have content to
translate!

> I like the idea of automatically extracting comments only (I'd be
> happy to review a patch!) but consider it non-trivial. First off,
> we'd probably want to support multiple languages (i.e., commenting
> styles). Also, code comments cannot be wrapped easily, so lines
> might become unreasonably long or short in translation.
>
> We could theoretically reuse Pygments' tokenization, but that's
> already imperfect in cases like Python's multi-line comments. A quick
> test revealed it also does not separate comment markers from the
> comment string, so we *would* have to special-case every language
> (or do some black magic to extract the comment characters, think /*
> comment */, from Pygments' lexer description). On the upside, it
> distinguishes between docstrings and normal strings. On the downside,
> docstrings are extra hard anyway because they *can* contain
> structured markup.

I also briefly explored automatic extraction of comments, and indeed
there are many technical challenges to solve.
That's why my proposal was less ambitious, at the expense of making the
translatable strings a little less translator-friendly.

Thanks for your input!

Claude