Re: Parsing as taking the derivative of a grammar

Anton Dyudin

Jan 12, 2016, 5:26:46 PM
to marpa-...@googlegroups.com
I was referring to the constant factor; O(n|G|) is claimed by the paper, via "our grammars blew up exponentially so we had to compact them continuously, now they seem not to though a pathological case can probably be constructed". Which isn't the same as "complexity class has been proven", sure.

But my main goal here is to understand marpa, and in this case specifically where the intuition that
  start: (alt (cat 'a' 'b') (cat 'a' <start> 'b'))
  = D_a => (alt 'b' (cat <start> 'b'))
  = D_a =>   (alt null (cat (alt 'b' (cat <start> 'b')) 'b'))
            => (cat (alt 'b' (cat <start> 'b')) 'b')
  = D_b => (alt 'b' (cat null 'b')) => 'b'
is equivalent to
  0.   {[start -> •a b]:0 [start -> •a start b]:0}
  1.a {[start -> a•b]:0 [start -> a•start b]:0 [start -> •a b]:1 [start -> •a start b]:1}
  2.a {[start -> a•b]:1 [start -> a•start b]:1 [start -> •a b]:2 [start -> •a start b]:2}
  3.b {[start -> a start•b]:0}
modulo tree/by-reference structure, is off base.
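To make the Earley side of the comparison concrete, here is a plain textbook recognizer (scan/predict/complete) sketched in Python; the grammar encoding and names are mine, and this is the naive algorithm, not Marpa's optimized variant. Printing the sets for "aabb" reproduces the items listed above, give or take completed items the excerpt elides.

```python
# Textbook Earley recognizer sketch; an item is (lhs, rhs, dot, origin).
GRAMMAR = {
    "start": [("a", "b"), ("a", "start", "b")],
}

def earley(start_sym, tokens):
    sets = [set() for _ in range(len(tokens) + 1)]
    for rhs in GRAMMAR[start_sym]:
        sets[0].add((start_sym, rhs, 0, 0))
    for i in range(len(tokens) + 1):
        worklist = list(sets[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in GRAMMAR:                          # predict
                    for prod in GRAMMAR[nxt]:
                        new = (nxt, prod, 0, i)
                        if new not in sets[i]:
                            sets[i].add(new); worklist.append(new)
                elif i < len(tokens) and tokens[i] == nxt:  # scan
                    sets[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                           # complete
                for p_lhs, p_rhs, p_dot, p_org in list(sets[origin]):
                    if p_dot < len(p_rhs) and p_rhs[p_dot] == lhs:
                        new = (p_lhs, p_rhs, p_dot + 1, p_org)
                        if new not in sets[i]:
                            sets[i].add(new); worklist.append(new)
    return any(lhs == start_sym and dot == len(rhs) and org == 0
               for lhs, rhs, dot, org in sets[len(tokens)])

print(earley("start", list("aabb")))  # True
print(earley("start", list("aab")))   # False
```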

On Tuesday, January 12, 2016, Jeffrey Kegler <jeffre...@jeffreykegler.com> wrote:
I didn't study Might-Darais closely -- Russ Cox did, and I'm relying on his observations.

You can (and I do) think of packratting and backtracking as the same approach; the choice is just whether to take the efficiency hit in space or in time.

My idea is that, if you want a strong parser, you should use a parse engine designed (like Earley's) to be strong from the ground up.  Almost everybody else starts with a more or less weak parser and hacks on the power, knowing that's worst-case inefficient, but hoping it will work out in enough cases of interest.  Since Earley's is O(n) for every deterministic CFG, I don't see why folks insist on going for the weaker algorithms.  But my approach is definitely the minority one so far.

Re "2 seconds to parse a 31-line Python file" -- I really don't pay much attention to specific speed claims in theoretical papers -- as opposed to big-O worst-case results.  With Marpa, you don't have to hope your grammar is O(n) -- you can know for sure.  Resorting to claims for specific grammars on specific processors, to my mind, shows the approach has run out of steam.

Note that I *do* do benchmarks to compare implementations in my blog, but I don't report those in theoretical contexts.  Readers of theory papers should care how fast your algorithm is, not how fast your current processor is.

On Tue, Jan 12, 2016 at 9:53 AM, Anton Dyudin <antec...@gmail.com> wrote:
Would I be correct to infer that memoization is a form of backtracking, and packratting is what the "fixpoint" computation is doing? And is the issue with those being expensive on real-world size grammars/input? 2 seconds to parse a 31-line Python file seems a bit steep, though not exponential.

--
You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jeffrey Kegler

Jan 12, 2016, 7:28:23 PM
to Marpa Parser Mailing List
This takes me deeper into derivatives than I've ever gone, and probably beyond where I can help you.  I skimmed Might-Darais at the time, and then read Russ Cox's appreciation carefully.  Russ seemed to say everything that needed to be said [ except he didn't mention Marpa :-) ], and I moved on.  I've never studied the derivatives method in detail.

Many parsing data structures mix the rule, the rule's origin, and the location within the rule, and they can all seem to look the same.  The second set you list seems to be Earley tables.  I'm not too clear on what the first is (I'm not good at reading Lisp), but if I had to guess, it is a state transition table rather than a parse table.  If so, they're quite different things.

Anton Dyudin

Jan 12, 2016, 9:08:58 PM
to marpa-...@googlegroups.com
My intention was to compare stages of a "grammar" as defined in the derivatives paper with a sequence of Earley sets similar in syntax to the ones in yours, yes. A more mathy way to present the former sequence would perhaps be:
  &start = (A + B) U (A + &start + B)
Which then gets transformed through
  -A : B U (&start + B)
  -A : 0 U ((B U (&start + B)) + B
        => (B U (&start + B)) + B
  -B : (ε U (0 + B)) + B    [since -B : &start = (0 + B) U (0 + &start + B) => 0]
        => B

(I gloss over some invariants not present in the original paper, having to do with the derivation step mapping from {sym, &ref, U, +} to a node set that additionally contains 0 and ε, and the compaction step removing those again; my question is about the high-level correspondence with marpa, which itself does not admit nullable or void rules)
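The derivation steps above can be sketched as a minimal parsing-with-derivatives recognizer in Python, for the same a^n b^n grammar. The node classes and names here are my own, not Might-Darais's code: a lazy Ref node lets the grammar mention itself, derivatives are memoized per character so cyclic grammars terminate, and nullability uses a seed-False pass, an under-approximation of the full fixpoint that suffices for this non-left-recursive grammar. There is no compaction step, so the grammar term grows with each derivative, which is exactly the constant-factor concern raised earlier in the thread.

```python
# Sketch of a derivatives-based recognizer; node names are illustrative.
class Empty:                      # 0: the empty language
    pass

class Eps:                        # ε: only the empty string
    pass

class Tok:
    def __init__(self, c): self.c = c

class Alt:
    def __init__(self, l, r): self.l, self.r = l, r

class Cat:
    def __init__(self, l, r): self.l, self.r = l, r

class Ref:                        # lazy reference, allows recursion
    def __init__(self): self.target = None

EMPTY, EPS = Empty(), Eps()

def nullable(n, memo=None):
    # Seed-False pass: assumes False on a cycle, which is sound here.
    if memo is None: memo = {}
    if id(n) in memo: return memo[id(n)]
    memo[id(n)] = False
    if isinstance(n, Eps): r = True
    elif isinstance(n, (Empty, Tok)): r = False
    elif isinstance(n, Alt): r = nullable(n.l, memo) or nullable(n.r, memo)
    elif isinstance(n, Cat): r = nullable(n.l, memo) and nullable(n.r, memo)
    else: r = nullable(n.target, memo)        # Ref
    memo[id(n)] = r
    return r

def deriv(n, c, memo=None):
    # D_c, memoized per (node, char).
    if memo is None: memo = {}
    key = (id(n), c)
    if key in memo: return memo[key]
    if isinstance(n, (Empty, Eps)):
        r = EMPTY
    elif isinstance(n, Tok):
        r = EPS if n.c == c else EMPTY
    elif isinstance(n, Ref):
        r = Ref()
        memo[key] = r                         # placeholder before recursing
        r.target = deriv(n.target, c, memo)
        return r
    elif isinstance(n, Alt):
        r = Alt(deriv(n.l, c, memo), deriv(n.r, c, memo))
    else:                                     # Cat
        first = Cat(deriv(n.l, c, memo), n.r)
        r = Alt(first, deriv(n.r, c, memo)) if nullable(n.l) else first
    memo[key] = r
    return r

def recognizes(g, s):
    for c in s:
        g = deriv(g, c)
    return nullable(g)

# start: (alt (cat 'a' 'b') (cat 'a' <start> 'b'))
start = Ref()
start.target = Alt(Cat(Tok('a'), Tok('b')),
                   Cat(Tok('a'), Cat(start, Tok('b'))))

print(recognizes(start, "aabb"))   # True
print(recognizes(start, "aab"))    # False
```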

The fundamental correspondence, as I see it, is that a derivative grammar formally keeps only the language of the remaining input, but keeps all of it in the same data structure; compare
  0. start: A + B ; A + &start + B
  1. -A : B ; &start + B # start_1(g) := (g + B)
  2. -A : start_1(B ; &start + B) # "predict"
  3. -B : start_1(ε ; 0) => ε + B => B # "reduce"
to the marpa
  0. start: •A + B; •A + &start + B
  1. -A :  (A+)•B ; (A+)•&start + B
           ; start_1:•A + B; •A + &start + B
  2. -A : start_1: (A+)•B; (A+)•&start + B
      ; start_2: (etc. unused)
  3. -B : start_1: (A+B)• => start(_0) : (A+&start+)•B

The main difference seems to be that the explicit tree sharing leaves the rule names around to inspect, while Might-Darais has to stick them in separate tagged εs.

NB. Regardless of the existence or fictitiousness of such an equivalence, writing these emails has helped my understanding of Earley tremendously, to the point where I would almost be comfortable trying to write an implementation, or possibly taking a deeper look at libmarpa, whose interface docs I had skimmed and bounced off of.  Thank you for being around to answer questions :3