Hello,
I'm one of the authors of the po4a project (https://po4a.org), which
helps with translating documentation.
The idea is to extract the translatable content of the documents into
PO files that are commonly used by the translators of open-source
programs, let the translators do their job, and then reinject the
translated content in the structure of the original document.
We have parsers for many formats, such as POD, manpages, asciidoc,
markdown, xml, and some others. The project has existed for almost two
decades and is now used in production for the translation of many
manpages in all major distributions, for the translation of the
manpages documenting the git project, for the f-droid web pages, for
the whole fedora documentation, etc.
My problem is that our parsers are currently written as an ugly bunch
of regexps that are hard to work with, and I am considering converting
to something more robust.
Our parsers don't really need access to an AST; they are more like
filters that call the translate() function on the parts that need
translating.
Every parser takes a document to analyse + a translation catalog
(called PO file) associating a set of strings to their translation in a
given language.
This produces an output document where the content of the input doc was
replaced by the translations found in the catalog + a list of strings
that the input doc contains. This list is used to update the
translation catalogs when the input document changes.
Input document --\                             /---> Output document
                  \      TransTractor::       /       (translated)
                   +-->--   parse()  --------+
                  /                           \
Input PO --------/                             \---> Output PO
                                                     (extracted)
Let's take a little Markdown example:
| A nice title
| ============
|
| The first paragraph.
|
| * Item 1
| * Item 2
I need the following calls to be issued during the parsing:
| pushline( translate ("A nice title", "input:1") );
| pushline("============");
| pushline("");
| pushline( translate("The first paragraph.", "input:4") );
| pushline("");
| pushline(" * " . translate("Item 1", "input:6") );
| pushline(" * " . translate("Item 2", "input:7") );
| pushline("");
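For this tiny example, a naive line-based filter is enough to issue those calls. The sketch below is only an illustration under loud assumptions: translate() and pushline() are stubs (there is no catalog, so every string comes back unchanged), @calls and @out are made-up names, and only the three constructs present in the example are handled. It also does not emit the trailing empty line.

```perl
use strict;
use warnings;

my @calls;    # trace of translate() calls, for illustration only
my @out;      # lines of the output document

# Hypothetical stubs: the real po4a functions do much more than this.
sub translate { my ( $s, $loc ) = @_; push @calls, "translate($s, $loc)"; return $s }
sub pushline  { push @out, $_[0] }

my @lines = (
    "A nice title", "============", "",
    "The first paragraph.", "",
    " * Item 1", " * Item 2",
);

for my $i ( 0 .. $#lines ) {
    my $line = $lines[$i];
    my $loc  = "input:" . ( $i + 1 );    # 1-based line reference
    if ( $line eq "" || $line =~ /^(?:=+|-+)$/ ) {
        pushline($line);                 # structure: copied verbatim
    }
    elsif ( $line =~ /^(\s*\*\s+)(.*)$/ ) {
        pushline( $1 . translate( $2, $loc ) );    # keep the bullet, translate the text
    }
    else {
        pushline( translate( $line, $loc ) );      # title or one-line paragraph
    }
}
```

This is exactly the kind of regexp-driven code that stops scaling once nesting, multi-line paragraphs, and inline markup come into play.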
All the po4a magic lies in the translate() function, which adds its
parameters to the output PO file while returning the translation found
in the input PO file for that string (or the string itself if no
translation was found). The second parameter of translate is the
location in the input file.
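To make that contract concrete, here is a minimal sketch of what translate() and pushline() do, with the PO files reduced to plain Perl data structures. The names %input_po, @output_po and @output_doc are invented for this illustration, and the French translation is just sample data. The real po4a code is more involved.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for po4a's real data structures.
my %input_po = ( "A nice title" => "Un joli titre" );    # msgid => msgstr
my @output_po;     # [ msgid, location ] pairs extracted from the input doc
my @output_doc;    # lines of the translated document

# Record the string for the output PO, then return its translation
# if the input PO has one, or the string itself otherwise.
sub translate {
    my ( $string, $location ) = @_;
    push @output_po, [ $string, $location ];
    return exists $input_po{$string} ? $input_po{$string} : $string;
}

# Append one line to the output document.
sub pushline {
    my ($line) = @_;
    push @output_doc, $line;
}

pushline( translate( "A nice title", "input:1" ) );
pushline("============");
pushline( translate( "The first paragraph.", "input:4" ) );
```

The title comes out translated, the untranslated paragraph is passed through as-is, and both strings end up in the extracted list with their locations.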
So, after this long context, I guess that my question would simply be:
how would you address this problem with Marpa?
I found [1], which provides a Marpa parser for the Markdown format.
First subquestion: does this parser follow the latest Marpa
recommendations (right input language and such), as I think it does?
[1] https://github.com/rns/MarpaX-Languages-CommonMark-AST/
In some sense I feel that this example is too complex for what I need
because it seems difficult to dump a Markdown file from the AST. Am I
wrong here? If I'm correct so far, what would be the easiest way to dump
the parsed file with no modification, e.g. using actions? Or maybe I'm
mistaken and Marpa is not exactly the tool I'm looking for?
I have the feeling that what I need is very simple, but I fail to nail
it down, so I'd really appreciate any idea or insight that you could
provide.
Thanks in advance,
Mt.
--
Fear is no philosophy of life. -- Kurt Von Hammerstein.