With great interest I have read the paper on the Marpa parser. This summer I have created my own Earley parser, a probabilistic one, and I am looking to implement some of Marpa's innovations in my own.Right now I am working on error handling. One person suggested I implement synchronizing rules like in shift-reduce parsers. I figured panic mode should behave a bit differently for probabilistic ambiguous grammars, so I settled on the following (text taken from issue on Github):Consider the following grammar:
S -> AB C; (0.5) S -> A B ; (0.5) A -> a a ; (0.5) B -> b b ; (0.5) C -> c c ; (0.5) AB -> a a ; b b ; (0.5) A -> a <error> ; (0.5) B -> b <error> ; (0.5) AB -> a <error> ; (0.5)
On the input
a a ; b z ; c c ;
with goalS
. When encounteringz
, the parser could apply bothB -> b <error> ;
andAB -> a <error> ;
. We would like to considerAB
because that's the only way to makeS
. So we need to consider both error rules.Now consider the same grammar, except the first rule is
S -> AB;
on the same input. We have established that we want to consider both error rules, and in this case both make a valid parse. However, we want the probability ofAB -> a <error> ;
being applied to be lower than that ofB -> b <error> ;
, because we want to keep the number of ignored tokens minimal.That's why I have decided to make the error rule probability multiply every time a token is ignored. The inner score of applying
B -> <error> ;
is then be 0.5^2, and the inner score ofAB -> <error> ;
is 0.5^4.
I suspect that you have more sophisticated knowledge than I do about Earley parsing, so I seek out your opinion: do you think this is a good idea? It also makes me a bit uncomfortable that above grammar fails terminally on a a ; b z ; c c ; cc ; , because the parser gets out of panic mode after encountering a semicolon. But maybe that should be a problem for another error handling strategy.Which bring me to the Ruby slippers error handling. I don't know if I have a good idea of how this works. The way I interpret is as follows: the parser might stall on input which prohibits it from succeeding (no states follow from predict, no states follow from scan or there are no states to complete). Because the chart will always contain some active states, you know what the parser expects to succeed and you could let the user decide whether to manipulate the input stream so that the parser may proceed given this information (eg, drop token, insert a token, etc). Is that correct?Thanks for your help!Maarten Trompper
--
You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.