Mechanism for extracting translated text from docutils/nodes

Hoang Tran

Nov 23, 2019, 8:06:13 PM

to sphinx-users

Hi,

This is my first time posting on this forum, so please be gentle with me. I'm currently working as a translator for the Blender project, translating the reference manual into Vietnamese. I have completed the UI and some chapters of the manual, and that set of translations can be used as a database for text that is still untranslated. I'm only interested in producing PO files as output. My current solution is to:

1. Write an extension that makes use of 'app.connect('doctree-resolved', doctree_resolved)'.

2. Within 'doctree_resolved', use a for loop over 'extract_messages(doctree)' to traverse and extract all messages to be translated.

3. Use 'node.walk' with a 'visitor' (which extends nodes.TreeCopyVisitor) to traverse all child nodes of the current node.

4. In the visitor, use 'default_visit' to traverse the children (recursively) down to the leaf instances of nodes.Text, where I use 'astext()' to extract the English text, then use a lookup routine to find an appropriate translation for the text.

5. Some texts are REQUIRED to have the original English appended to them (at the end), as a way to leave breadcrumbs for Vietnamese readers who would like to refer back to the original English HTML text. These items usually come from 'inline', 'title', 'rubric', etc., so I use a flag to identify when the translated text (if it exists) should include the original (I call this combination the translation text).

6. Use Message and Catalog to store the English (msgid) and the translation (msgstr), and write them out to a separate directory before merging them with the existing translations using diff.
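
Steps 3 to 5 can be sketched roughly as follows. This is a minimal sketch and not the attached TranslatePO.py: it assumes docutils is installed, it uses nodes.GenericNodeVisitor in place of nodes.TreeCopyVisitor, and the names TranslatingVisitor, APPEND_ORIGINAL, the lookup dict, the sample translation and the ' -- ' separator are all hypothetical. In a real extension the node being walked would come from extract_messages(doctree) inside the 'doctree-resolved' handler.

```python
from docutils import nodes
from docutils.utils import new_document

# Hypothetical choice: parents whose text keeps the English original appended.
APPEND_ORIGINAL = (nodes.inline, nodes.title, nodes.rubric)


class TranslatingVisitor(nodes.GenericNodeVisitor):
    """Collect translated text for every nodes.Text leaf under a node."""

    def __init__(self, document, lookup):
        super().__init__(document)
        self.lookup = lookup  # plain dict: English text -> translation
        self.parts = []       # collected pieces, in document order

    def default_visit(self, node):
        # Only the Text leaves carry text; containers are just passed through.
        if not isinstance(node, nodes.Text):
            return
        english = node.astext()
        translated = self.lookup.get(english)
        if translated is None:
            # No translation known: keep the English text unchanged.
            self.parts.append(english)
        elif isinstance(node.parent, APPEND_ORIGINAL):
            # Translation followed by the original, as a breadcrumb.
            self.parts.append(f"{translated} -- {english}")
        else:
            self.parts.append(translated)

    def default_departure(self, node):
        pass  # unused by node.walk(), required by walkabout()

    def text(self):
        return "".join(self.parts)


# Usage: a hand-built paragraph stands in for a node from a parsed doctree.
document = new_document("<sketch>")
paragraph = nodes.paragraph(
    "", "", nodes.Text("See "), nodes.inline("", "Edit Mode"), nodes.Text(".")
)
visitor = TranslatingVisitor(document, {"Edit Mode": "Chế Độ Biên Soạn"})
paragraph.walk(visitor)
print(visitor.text())
```

The collected text lives on the visitor rather than on the nodes, so the doctree itself is left untouched.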

My questions:

1. How should I store the translation text? Currently I'm forced to add a variable to 'docutils.Node' to store the translation (for testing).

2. How can I use the same 'extract_messages(doctree)' mechanism to extract the English and the translation at the same time? Currently everything is tied to the overridden 'astext()' method, and that seems to go through __repr__(self). How should I approach this problem?
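
One possible answer to question 1 is to leave the nodes alone and keep the translations in a mapping keyed by msgid; for question 2, the (msgid, msgstr) pairs can then be built while iterating over the extracted messages. Below is a minimal stdlib-only sketch under those assumptions. The list of msgids stands in for the strings yielded by extract_messages(doctree), and po_escape, pair_messages and format_po are hypothetical helpers, not the Message and Catalog classes mentioned above.

```python
def po_escape(text):
    """Escape backslashes, double quotes and newlines for a PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def pair_messages(msgids, lookup):
    """Yield (msgid, msgstr) once per distinct msgid.

    msgstr is the empty string when the lookup has no translation,
    which is how PO files mark an untranslated entry.
    """
    seen = set()
    for msgid in msgids:
        if msgid in seen:
            continue
        seen.add(msgid)
        yield msgid, lookup.get(msgid, "")


def format_po(pairs):
    """Render (msgid, msgstr) pairs as single-line PO entries."""
    lines = []
    for msgid, msgstr in pairs:
        lines.append(f'msgid "{po_escape(msgid)}"')
        lines.append(f'msgstr "{po_escape(msgstr)}"')
        lines.append("")  # blank line between entries
    return "\n".join(lines)


# Usage: the list below stands in for messages extracted from a doctree.
extracted = ["Edit Mode", "Edit Mode", 'Say "hi"']
lookup = {"Edit Mode": "Chế Độ Biên Soạn"}
print(format_po(pair_messages(extracted, lookup)))
```

Because the English and the translation travel together as a pair, nothing has to be stored on the node and 'astext()' does not need to be overridden.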

Best regards,

Hoang Tran

All the code is attached.

TranslatePO.py

Hoang Tran

Nov 25, 2019, 5:17:56 AM

to sphinx-users

I found a rather 'clunky' way of solving this: make use of 'rawsource', which contains the grave accents and quotes plus other symbols, and translate each part of the message (paragraph, inline/literal, text, etc.) separately, storing the parts in a list within the visitor. After the call to 'node.walk(visitor)', combine them all to make the translation string. Workable, but not very good. Any better ideas?
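
The piecewise idea can be illustrated without the visitor. The sketch below is a simplified stand-in, not the code from the post: it splits a rawsource-like string with a regular expression instead of walking nodes, it recognises only a few inline markup forms (roles, literals, strong, emphasis), and MARKUP, WRAPPED, translate_rawsource and the sample lookup are hypothetical names.

```python
import re

# Inline markup recognised by this sketch: :role:`x`, ``x``, **x**, *x*.
MARKUP = re.compile(r"(:[\w-]+:`[^`]+`|``[^`]+``|\*\*[^*]+\*\*|\*[^*]+\*)")
# Splits one markup token into opener, inner text and closer.
WRAPPED = re.compile(r"^(:[\w-]+:`|``|\*\*|\*)(.+?)(`{1,2}|\*{1,2})$")


def translate_rawsource(rawsource, lookup):
    """Translate each part separately, then join the parts again."""
    parts = []
    # With one capturing group, re.split alternates text, markup, text, ...
    for index, piece in enumerate(MARKUP.split(rawsource)):
        if index % 2:
            # Markup token: translate the inner text, keep the delimiters.
            opener, inner, closer = WRAPPED.match(piece).groups()
            parts.append(opener + lookup.get(inner, inner) + closer)
        else:
            # Plain text: translate the stripped core, keep the whitespace.
            core = piece.strip()
            if core:
                piece = piece.replace(core, lookup.get(core, core), 1)
            parts.append(piece)
    return "".join(parts)


# Usage with a hypothetical one-entry lookup table.
lookup = {"Edit Mode": "Chế Độ Biên Soạn"}
print(translate_rawsource("Press :kbd:`Tab` to enter *Edit Mode*.", lookup))
```

Translating fragments in isolation loses sentence-level word order, which is probably why this approach feels clunky; looking up the whole msgid first and falling back to the pieces only when that fails may give better results.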

oaeorao12

Nov 25, 2019, 5:26:46 AM

to sphinx-users, เบอร์กุ

Sent from my Samsung Galaxy smartphone

-------- Original message --------

From: 'Hoang Tran' via sphinx-users <sphinx...@googlegroups.com>

Date: 25/11/19 17:17 (GMT+07:00)

To: sphinx-users <sphinx...@googlegroups.com>

Subject: [sphinx-users] Re: Mechanism for extracting translated text from docutils/nodes

