Using Marpa in a project helping to translate the documentation

Martin Quinson

Apr 22, 2020, 8:38:27 PM

to marpa-...@googlegroups.com

Hello,

I'm one of the authors of the po4a project (https://po4a.org), which
helps with translating documentation.

The idea is to extract the translatable content of the documents into
PO files, which are commonly used by the translators of open-source
programs, let the translators do their job, and then reinject the
translated content into the structure of the original document.

We have parsers for many formats, such as POD, manpages, asciidoc,
markdown, xml, and some others. The project has existed for almost two
decades and is now used in production for the translation of many
manpages in all major distributions, for the translation of the
manpages documenting the git project, for the f-droid web pages, for
the whole fedora documentation, etc.

My problem is that our parsers are currently written as an ugly bunch
of regexps that are hard to work with, and I am considering converting
them to something more robust.

Our parsers don't really need access to an AST; they act more like
filters that call the translate() function on the parts that need
translating.

Every parser takes a document to analyse + a translation catalog
(called a PO file) associating a set of strings with their translation
in a given language.

This produces an output document where the content of the input doc is
replaced by the translations found in the catalog + a list of the
strings that the input doc contains. This list is used to update the
translation catalogs when the input document changes.

Input document --\                             /---> Output document
                  \      TransTractor::       /       (translated)
                   +-->--   parse()  --------+
                  /                           \
Input PO --------/                             \---> Output PO
                                                      (extracted)

Let's take a little Markdown example:
| A nice title
| ============
|
| The first paragraph.
|
| * Item 1
| * Item 2

I need the following calls to be issued during the parsing:
| pushline( translate ("A nice title", "input:1") );
| pushline("============");
| pushline("");
| pushline( translate("The first paragraph.", "input:4") );
| pushline("");
| pushline(" * " . translate("Item 1", "input:6") );
| pushline(" * " . translate("Item 2", "input:7") );
| pushline("");

All the po4a magic lies in the translate() function, which adds its
parameters to the output PO file and returns the translation found
in the input PO file for that string (or the string itself if no
translation was found). The second parameter of translate() is the
location in the input file.
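
To make that contract concrete, here is a minimal sketch in plain Perl.
It is not the actual po4a code: the %input_po hash, the @output_po and
@output_doc arrays, and the sample translations are assumptions made up
for illustration, and translate() and pushline() are simplified
stand-ins written as plain functions.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the input PO, the output PO
# and the output document (assumptions, not po4a data structures).
my %input_po = (
    'A nice title' => 'Un joli titre',
    'Item 1'       => 'Point 1',
);
my @output_po;     # extracted strings with their location
my @output_doc;    # lines of the translated document

# Record the string for the output PO, then return its translation,
# or the string itself when the input PO has no entry for it.
sub translate {
    my ($string, $location) = @_;
    push @output_po, { msgid => $string, location => $location };
    return exists $input_po{$string} ? $input_po{$string} : $string;
}

# Append one line to the output document.
sub pushline {
    my ($line) = @_;
    push @output_doc, $line;
}

# The calls wanted for the Markdown example.
pushline( translate("A nice title", "input:1") );
pushline("============");
pushline("");
pushline( translate("The first paragraph.", "input:4") );
pushline("");
pushline(" * " . translate("Item 1", "input:6") );
pushline(" * " . translate("Item 2", "input:7") );
pushline("");

print "$_\n" for @output_doc;
```

With this toy catalog, the title and the first item come out translated,
the other strings pass through unchanged, and all four strings end up
in @output_po together with their locations.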

So, after this long context, I guess that my question would simply be:
how would you address this problem with Marpa?

I found [1], which provides a Marpa parser for the Markdown format.
First subquestion: does this parser follow the latest Marpa
recommendations (the right input language and such), as I think it does?

[1] https://github.com/rns/MarpaX-Languages-CommonMark-AST/

In some sense I feel that this example is too complex for what I need
because it seems difficult to dump a Markdown file from the AST. Am I
wrong here? If I'm correct so far, what would be the easiest way to dump
the parsed file with no modification, e.g. using actions? Or maybe I'm
mistaken and Marpa is not exactly the tool I'm looking for?

I have the feeling that what I need is very simple, but I fail to nail
it down, so I'd really appreciate any idea or insight that you could
provide.

Thanks in advance,
Mt.

--
Fear is no philosophy of life. -- Kurt Von Hammerstein.

Jeffrey Kegler

Apr 23, 2020, 8:49:48 AM

to Marpa Parser Mailing List

I would certainly address the problem using Marpa, but I'm kinda biased. :-)

From what I read, you do *not* need to capture any nesting of blocks. RNS's parser doesn't do that so it might make a fine start. (I've never used it.)

You don't need to produce a full AST, it seems, so the following is not directly relevant to your question, but let me talk about how I might go about parsing Markdown for display purposes, where the block nesting is essential. I would probably try a two-layer approach: one level of parser to capture the line-by-line directives and individual pieces, and an upper level which captures the block structure. The upper level would certainly be in Marpa, and probably the lower level as well.
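
As a rough illustration of that two-layer split, here is a sketch in plain Perl. It deliberately uses regexps and a loop instead of Marpa grammars, so it only shows the division of labour, not how either layer would be written in Marpa. The token names (BLANK, UNDERLINE, BULLET, TEXT), the block kinds, and both function names are hypothetical, and it handles only single-line paragraphs and flat lists.

```perl
use strict;
use warnings;

# Lower layer: classify each line on its own (hypothetical token names).
sub classify_line {
    my ($line) = @_;
    return [ BLANK     => '' ]    if $line =~ /^\s*$/;
    return [ UNDERLINE => $line ] if $line =~ /^=+\s*$/;
    return [ BULLET    => $1 ]    if $line =~ /^\s*\*\s+(.*)$/;
    return [ TEXT      => $line ];
}

# Upper layer: group the line tokens into blocks.
sub group_blocks {
    my @tokens = @_;
    my @blocks;
    while (@tokens) {
        my ($type, $text) = @{ shift @tokens };
        if ($type eq 'BLANK') {
            next;    # blank lines only separate blocks
        }
        elsif ($type eq 'TEXT' && @tokens && $tokens[0][0] eq 'UNDERLINE') {
            shift @tokens;    # consume the underline of the title
            push @blocks, { kind => 'title', text => $text };
        }
        elsif ($type eq 'BULLET') {
            # Collect the consecutive bullet lines into one list block.
            my @items = ($text);
            push @items, (shift @tokens)->[1]
                while @tokens && $tokens[0][0] eq 'BULLET';
            push @blocks, { kind => 'list', items => \@items };
        }
        else {
            push @blocks, { kind => 'paragraph', text => $text };
        }
    }
    return @blocks;
}

my @lines = (
    'A nice title', '============', '',
    'The first paragraph.', '',
    ' * Item 1', ' * Item 2',
);
my @blocks = group_blocks( map { classify_line($_) } @lines );
print join(', ', map { $_->{kind} } @blocks), "\n";
```

In a Marpa version, the lower layer's tokens would presumably become the lexemes fed to the upper-level grammar, which would also be where nested blocks are handled.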

I hope this is helpful.
