Parsing as taking the derivative of a grammar.


antec...@gmail.com

Jan 12, 2016, 12:59:22 AM1/12/16
to marpa parser
Are the ideas behind marpa at all related to the "differentiation" described in http://matt.might.net/articles/parsing-with-derivatives/? My surface understanding of Earley sets is that they are something similar, but with match-beginning metadata (for performance?) and more affordance for ambiguity.

Jeffrey Kegler

Jan 12, 2016, 1:02:55 AM1/12/16
to Marpa Parser Mailing List
They are not really similar.  Derivatives are a method for implementing regular expressions.  The Might-Darais parser is, IIRC, essentially regular expressions with backtracking.
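
For concreteness, here is a minimal sketch of the regular-expression case (my own illustration in Python, not code from the article): matching with Brzozowski derivatives means deriving the pattern by each input character, then checking whether the final pattern is nullable, i.e. accepts the empty string.

# Brzozowski derivatives: D_c(L) = { w : c.w is in L }.
class Empty: pass                        # the empty language -- matches nothing
class Eps: pass                          # epsilon -- matches only ""
class Chr:
    def __init__(self, c): self.c = c
class Alt:
    def __init__(self, l, r): self.l, self.r = l, r
class Seq:
    def __init__(self, l, r): self.l, self.r = l, r
class Star:
    def __init__(self, r): self.r = r

def nullable(n):                         # does n accept the empty string?
    if isinstance(n, (Eps, Star)): return True
    if isinstance(n, (Empty, Chr)): return False
    if isinstance(n, Alt): return nullable(n.l) or nullable(n.r)
    return nullable(n.l) and nullable(n.r)          # Seq

def derive(n, c):
    if isinstance(n, (Empty, Eps)): return Empty()
    if isinstance(n, Chr): return Eps() if n.c == c else Empty()
    if isinstance(n, Alt): return Alt(derive(n.l, c), derive(n.r, c))
    if isinstance(n, Star): return Seq(derive(n.r, c), Star(n.r))
    # Seq: D_c(l.r) = D_c(l).r, plus D_c(r) when l is nullable
    d = Seq(derive(n.l, c), n.r)
    return Alt(d, derive(n.r, c)) if nullable(n.l) else d

def matches(n, s):
    for c in s:
        n = derive(n, c)
    return nullable(n)

pattern = Star(Seq(Chr('a'), Chr('b')))  # (ab)*
print(matches(pattern, "abab"))          # True
print(matches(pattern, "aba"))           # False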


Anton Dyudin

Jan 12, 2016, 1:34:04 AM1/12/16
to marpa parser
Regular expressions as in no recursion? The grammar
S = S ( S ) ∪ ε
is mentioned specifically as a context-free example, and how it necessitates laziness + memoization.
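
For concreteness, a sketch of what that extension requires (my own reconstruction in Python of the article's idea, not its actual code): recursion is routed through lazy Ref nodes, the derivative of each Ref is memoized per input character so the cyclic grammar graph doesn't send derive() into a loop, and nullability becomes a least-fixpoint computation instead of plain structural recursion.

class Empty: pass                        # as in the regex sketch above
class Eps: pass
class Chr:
    def __init__(self, c): self.c = c
class Alt:
    def __init__(self, l, r): self.l, self.r = l, r
class Seq:
    def __init__(self, l, r): self.l, self.r = l, r
class Ref:                               # lazy knot for recursive rules
    def __init__(self): self.target = None

def resolve(n):
    while isinstance(n, Ref):
        n = n.target
    return n

def nullable(root):
    # Kleene iteration to a least fixpoint: start every node at False
    # and re-propagate until nothing changes (sound on cyclic grammars).
    val = {}
    changed = True
    def get(n):
        return val.get(id(resolve(n)), False)
    def step(n, seen):
        nonlocal changed
        n = resolve(n)
        if id(n) in seen:
            return
        seen.add(id(n))
        if isinstance(n, Eps):
            new = True
        elif isinstance(n, (Empty, Chr)):
            new = False
        else:
            step(n.l, seen); step(n.r, seen)
            new = (get(n.l) or get(n.r)) if isinstance(n, Alt) \
                  else (get(n.l) and get(n.r))
        if new != val.get(id(n), False):
            val[id(n)] = new
            changed = True
    while changed:
        changed = False
        step(root, set())
    return get(root)

def derive(n, c, memo):
    if isinstance(n, Ref):
        if id(n) in memo:                # cycle: reuse the placeholder
            return memo[id(n)]
        r = Ref(); memo[id(n)] = r
        r.target = derive(n.target, c, memo)
        return r
    if isinstance(n, (Empty, Eps)): return Empty()
    if isinstance(n, Chr): return Eps() if n.c == c else Empty()
    if isinstance(n, Alt):
        return Alt(derive(n.l, c, memo), derive(n.r, c, memo))
    d = Seq(derive(n.l, c, memo), n.r)   # Seq, same rule as the regex case
    return Alt(d, derive(n.r, c, memo)) if nullable(n.l) else d

def recognize(lang, s):
    for c in s:
        lang = derive(lang, c, {})       # fresh memo table per character
    return nullable(lang)

# S = S ( S ) ∪ ε  -- the self-referential grammar from the article
S = Ref()
S.target = Alt(Seq(S, Seq(Chr('('), Seq(S, Chr(')')))), Eps())
print(recognize(S, "()(())"))            # True
print(recognize(S, "(()"))               # False
# (No node compaction here, so derived structures grow; real
# implementations simplify Empty/Eps nodes as they go.)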

Jeffrey Kegler

Jan 12, 2016, 6:16:25 AM1/12/16
to Marpa Parser Mailing List
With backtracking and/or packratting, any parse engine for any subset of context-free can be extended to context-free.

One problem I've had in promoting Marpa is that veteran watchers of the field have seen "revolutionary new parsers" announced every few years.  These have often been "X with backtracking/packratting", where X could be any parse engine.  The experts learned they could get away with not bothering to investigate claims of new developments.

Anton Dyudin

Jan 12, 2016, 12:53:39 PM1/12/16
to marpa parser
Would I be correct to infer that memoization is a form of backtracking, and that packratting is what the "fixpoint" computation is doing? And is the issue that those get expensive on real-world-size grammars/inputs? 2 seconds to parse a 31-line Python file seems a bit steep, though not exponential.

Jeffrey Kegler

Jan 12, 2016, 3:34:55 PM1/12/16
to Marpa Parser Mailing List
I didn't study Might-Darais closely -- Russ Cox did, and I'm relying on his observations.

You can (and I do) think of packratting and backtracking as the same approach; it's just a matter of deciding whether to take the efficiency hit in space or in time.
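
A toy way to see the trade (a hypothetical three-alternative PEG of my own, not from any of the systems under discussion): the same recursive-descent matcher with a memo table that can be switched on. With it off, every failed alternative re-parses the same span and time blows up exponentially; with it on, each position is computed once and the cost moves into the memo table's space.

def make_parser(text, memoize):
    # Toy PEG:  E <- '(' E ')'  /  '(' E ']'  /  'x'
    # Ordered choice: when an alternative fails, the parser backs up
    # and re-parses E at the same position -- unless the memo is on.
    cache, calls = {}, [0]
    def E(pos):
        calls[0] += 1
        if memoize and pos in cache:
            return cache[pos]
        result = None
        if pos < len(text) and text[pos] == '(':
            mid = E(pos + 1)
            if mid is not None and mid < len(text) and text[mid] == ')':
                result = mid + 1
            if result is None:           # backtrack into alternative 2,
                mid = E(pos + 1)         # re-deriving the very same span
                if mid is not None and mid < len(text) and text[mid] == ']':
                    result = mid + 1
        if result is None and pos < len(text) and text[pos] == 'x':
            result = pos + 1
        if memoize:
            cache[pos] = result          # packrat: one entry per position
        return result
    return E, calls

for memoize in (False, True):
    E, calls = make_parser("(" * 18 + "x", memoize)
    E(0)
    print("memoize =", memoize, "calls =", calls[0])
# memoize = False -> 524287 calls (2^19 - 1: exponential time)
# memoize = True  -> a few dozen calls, one cache entry per position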

My idea is that, if you want a strong parser, you should use a parse engine designed (like Earley's) to be strong from the ground up.  Almost everybody else starts with a more or less weak parser and hacks on the power, knowing that's worst-case inefficient, but hoping it will work out in enough cases of interest.  Since Earley's is O(n) for every deterministic CFG, I don't see why folks insist on going for the weaker algorithms.  But my approach is definitely the minority one so far.
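
For reference, here is what "strong from the ground up" looks like in miniature -- a textbook Earley recognizer in Python (my sketch, not Marpa's implementation, and without the Leo optimization Marpa adds for right recursion):

from collections import deque

def earley_recognize(grammar, start, tokens):
    # Items are (lhs, rhs, dot, origin). Symbols that appear as grammar
    # keys are nonterminals; everything else is a terminal. Includes the
    # Aycock-Horspool adjustment so nullable rules (like the S = S ( S ) | ε
    # example earlier in the thread) complete correctly.
    n = len(tokens)
    sets = [set() for _ in range(n + 1)]
    work = [deque() for _ in range(n + 1)]

    def add(i, item):
        if item not in sets[i]:
            sets[i].add(item)
            work[i].append(item)

    for rhs in grammar[start]:
        add(0, (start, rhs, 0, 0))

    for i in range(n + 1):
        while work[i]:
            lhs, rhs, dot, origin = work[i].popleft()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                       # PREDICT
                    for prod in grammar[sym]:
                        add(i, (sym, prod, 0, i))
                    # nullable fix: if sym has already derived the empty
                    # string here, advance past it immediately
                    if any(l == sym and d == len(r) and o == i
                           for (l, r, d, o) in sets[i]):
                        add(i, (lhs, rhs, dot + 1, origin))
                elif i < n and tokens[i] == sym:         # SCAN
                    add(i + 1, (lhs, rhs, dot + 1, origin))
            else:                                        # COMPLETE
                for (l2, r2, d2, o2) in list(sets[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, o2))

    return any(l == start and d == len(r) and o == 0
               for (l, r, d, o) in sets[n])

grammar = {"S": [("S", "(", "S", ")"), ()]}   # S -> S ( S ) | ε
print(earley_recognize(grammar, "S", "()(())"))   # True
print(earley_recognize(grammar, "S", "(()"))      # False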

Re "2 seconds to parse a 31-line Python file" -- I really don't pay much attention to specific speed claims in theoretical papers -- as opposed to big-O worse-case results.  With Marpa, you don't have to hope your grammar is O(n) -- you can know for sure.  Resorting to claims for specific grammars on specific processors to my mind shows the approach has run out of steam.

Note that I *do* do benchmarks to compare implementations in my blog, but I don't report those in theoretical contexts.  Readers of theory papers should care how fast your algorithm is, not how fast your current processor is.
