Leo reduction for a cyclic grammar

34 views
Skip to first unread message

Timothy Allen

unread,
Jan 1, 2019, 3:12:51 PM1/1/19
to marpa-...@googlegroups.com
Thanks, Jeffrey, for answering my question about empty rules. I haven't decided if I'll eliminate them the same way Marpa does, but now I feel much more confident that I can do it if necessary.

Right now, I'm studying Leo reduction, via Loup Vaillant's tutorial and also a PEG-and-Earley tutorial somebody pointed me to. I haven't gotten to actual parsing yet, I'm still trying to nail down reduction, but I'm sure I'll get there eventually.

Consider the following grammar, whose starting symbol is <A>:

<A> ::= "x" | <B>
<B> ::= <A>

When fed the input "x", an Earley recogniser without right-reduction would wind up with these items in the final set:

<A> ::= "x" · (0)
<A> ::= <B> · (0)
<B> ::= <A> · (0)

...and sure enough, if I don't attempt Leo reduction, that's what I get.

If I do attempt Leo reduction, I have a problem. I'm looking for the "topmost" reduction, but because the grammar has a cycle, there's no actual top. This cycle isn't fatal; since I'm memoising this reduction, I can drop a sentinel in the cache before I recurse, and if I find a sentinel I know I'm done. However, this means that the "topmost" item will be determined by the order in which I iterate over Earley items to perform completion steps; whichever one I grab first will become the topmost.

If I happen to pick <A> ::= <B> · (0) as my topmost, it will survive into the final Earley set, and I will have two items for the start symbol spanning the entire input, and therefore I can build two parse trees: one for 0 iterations of the cycle, and one for 1 iteration. That seems tidy.

On the other hand, if I happen to pick <B> ::= <A> · (0) as my topmost, the final Earley set will only have one item for the start symbol spanning the entire input, and therefore I can build only one parse tree: the one for 0 iterations of the cycle.

I'd rather not have a recogniser that behaves inconsistently; Earley parsing is supposed to be the "Just Works™" algorithm, after all. I can think of a few alternatives:

  • I could declare that cyclic grammars are Bad, and automatically reject them. Seems a bit heavy-handed.
  • I could declare that cyclic grammars are Unsupported, and allow them but make no guarantees about finding all the parse trees. I notice that libmarpa detects cyclic grammars and warns about them, but I can't find any documentation saying cyclic grammars are any less reliable than proper ones, just that they're less efficient.
  • If there were some principled way to choose a "topmost" item, I could ensure I always picked the "right" one, and then I would be able to make guarantees like "only the simplest parse tree is found" or "only the simplest parse tree plus one iteration of the cycle".

What should I do?

Jeffrey Kegler

unread,
Jan 1, 2019, 3:26:45 PM1/1/19
to Marpa Parser Mailing LIst
If I had libmarpa to do over again (and actually I probably will revise it at some point), I would not support cyclic grammars.  It involved a lot of very very tricky code for a feature nobody uses.

libmarpa, in effect, truncates cycles at runtime -- if the grammar would cycle, Marpa throws that parse away, so that it is only producing non-cyclic parses from cyclic grammars.  This could be duplicated via a complex rewrite, rather than done at runtime.  But runtime or grammar rewrite time, there is no demand for this feature.  Everybody seems very happy to correct their grammar to eliminate the cycle.

Hope this helps, jeffrey

--
You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages