Earley complexity over grammar size

40 views
Skip to first unread message

emh

unread,
Nov 1, 2018, 8:14:01 PM11/1/18
to marpa parser
In general, it seems that Earley, especially with Marpa enhancements, matches or improves on most any parsing algorithm. But its complexity seems to be a judged with respect to the length of the string, not the size of the grammar.

I’m not great at complexity proofs, and I was wondering if Earley retains its performance advantages for very large numbers of rules, over CYK. If I understand correctly, it has to add Earley items for every rule, top down, and examine them. So for an NLP project I am doing, with many thousands of necessarily distinct rules, would its space/time complexity suffer greatly?

For example, if every rule started with a distinct terminal, would Earley have to add and scan every one in S(0), while CYK would in constant time (and less space) select just the relevant rule(s)? As in:

start: r1 | r2 | r3 | … | r_i

r1: "Print" …
r2: "Set" …
r3: "Walk" …
….
r_i: T_i …

Then in S(0) we have to add and scan rules r1 … r_i, while in CYK it would immediately be able to select a very small subset of r1 … r_i (in CNF), without examining all other rules.

Thanks for your help.

Jeffrey Kegler

unread,
Nov 1, 2018, 8:35:55 PM11/1/18
to Marpa Parser Mailing LIst
Forgive me if I'm overexplaining -- you obviously know a lot of these matters, but any over-explanations will help other readers, so I hope you'll be patient with them.

Quibble 1: "Complexity" since the 60s has usually referred to results in terms of Landau (big-O) notation.  These are only relevant for infinite sets, which do not of course occur in practice.  Nonetheless Big-O has proved so helpful in practice that its limits are sometimes forgotten.  The grammar you cite is constant-sized, so in that sense there is no room for a complexity advantage either way.

Quibble 2: CYK is always cubic (O(n**3)), so that it can have a complexity advantage only for inputs which Earley/Leo (=Marpa) is cubic, which are rare in practical use.  But these do occur in NLP.  So to keep this a horse race, let's assume we're talking about one of the grammars where Earley/Leo goes cubic.

Foreshadowing: Elsewhere I've listed the 5 algorithms that IMHO might stand the test of time and survive in some form: regular expressions, Earley/Leo, recursive descent, PEG and CYK.  So I believe there are some grammars for which CYK is better than Marpa -- how many and how interesting they are is another question.

OK, now for handicapping the horse race:

Marpa would have to put a prediction for every one of the rules you list into one of its Earley sets, but for predictions this can be done by flipping a bit in a vector -- in a given Earley set, each prediciton depends only on the rule, which is an integer less than a constant.  Last I recall, Marpa does not use such a bit vector, but that's only because it's necessity is not proven -- Marpa seems to be fast enough.  After that 1st Earley set only the rule whose initial terminal was found in the input will generate entries in the Earley sets.

CYK would behave similarly but the selection of the terminal would be out of pairs with every other possible token in the grammar -- if the grammar is truly large, we are going quadratic on a very large number.  And this propagates up the tree.

These are, I believe, valid and important point, but admittedly I have acted as Marpa's lawyer, puffing up my client's case and poking holes in the case for CYK.  For some grammars, I think a case could be made the other way, and the verdict of a candid finder of fact might in fact go to CYK.

I hope this helps, jeffrey
  

--
You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

emh

unread,
Nov 1, 2018, 8:47:14 PM11/1/18
to marpa parser
Thanks! That helps.
Reply all
Reply to author
Forward
0 new messages