How does Marpa handle empty rules?

Skip to first unread message

Tim Allen

Dec 27, 2018, 2:55:55 AM12/27/18
to marpa parser
This isn't strictly speaking about *using* Marpa, but this seems to be the epicentre of Earley parsing on the Internet, so I hope people don't mind my asking questions here.

I've recently been persuaded by Loup Vaillant's glowing description to learn more about Earley parsing and work through his tutorial. As part of that, I came across an Open Letter to Loup Vaillant, which mentions:

The Marpa algorithm uses a rewrite based on the ideas of Aycock & Horspool, and Marpa allows the user to specify semantics for empty rules and nullable and nulling symbols. The fact that Marpa's parse engine is using a rewritten grammar, free of empty rules, is not visible to the user.

I've had a look at the Aycock & Horspool paper, and although I'm not terribly used to academic papers, I observed the "automatically advance past nullable symbols" solution that Vaillant describes, and a bit about condensing common groups of rules which seems like it isn't needed for correctness, just for efficiency.

I can imagine rewriting a grammar to avoid nullable symbols by duplicating rules. For example if a grammar has a rule:

A := B E C

...where E is nullable, you could mechanically rewrite it to the pair of rules:

A := B C
A := B E C

...which is much like the "automatically advance past nullable symbols" trick, but in the grammar instead of the recogniser. However, that doesn't solve the problem entirely; what happens with a trivial grammar like:

A :=

(that is, the only rule only matches the empty string) I can't see how it's possible to rewrite that to a grammar without nullable rules or symbols.

Jeffrey Kegler

Dec 27, 2018, 10:16:21 AM12/27/18
to Marpa Parser Mailing LIst
For the

A ::= # empty

case, you do the rewrite of the grammar again, eliminating all A's.  (If A is not nulling, but nullable, you create two symbols, and rewrite for all the alternatives.)  This is done recursively.  Null parses (parses of the empty string), and trivial grammars should be special-cased.  (They can be computed in advance, so this is straightforward, if tedious.)

You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
For more options, visit
Reply all
Reply to author
0 new messages