2010/7/23 CG <christop...@gmail.com>:
> The problem, I am encountering at this stage is that the grammar
> processor is detecting recursions in the grammar (the first on is on
> RelativeLocationPath).
>
> Is there a way to handle this ? I think the grammar is correctly
> transcripted, but I am totally new to PEG in general, and might miss
> something trivial.
the parser generated by PEG.js has the semantics of a recursive
descent parser. This means that if you used left recursive rules such
as "A = 'a' / A 'a'" in the grammar, the parser would end up in an
infinite cycle -- it would try to match rule A, but for doing that, it
would need to match rule A again, etc. The recursion can span multiple
rules, in which case it is called indirect left recursion. You can
read more about these things on Wikipedia [1] [2] [3].
Because a parser built from a left recursive grammar can end up in an
infinite cycle, PEG.js checks the grammar for both direct and indirect
recursion and returns an error if it finds such cases.
It is possible to mechanically convert any left-recursive grammar to a
right-recursive one. The theoretical algorithm is little bit tricky,
but often it is not needed in its full power -- in most cases, the
left recursion is used to encode arithmetic expressions or lists in
the grammar, and both can be usually rewritten easily. Moreover, with
PEG repetition operators, you can often avoid recursion at all.
For example, consider the RelativeLocationPath rule you mentioned:
RelativeLocationPath
= Step
/ RelativeLocationPath '/' Step
/ AbbreviatedRelativeLocationPath
Together with
AbbreviatedRelativeLocationPath
= RelativeLocationPath '//' Step
it basically says that RelativeLocationPath is a list of Steps
separated by "/" or "//". Once you have deciphered this meaning, the
rule can be easily rewritten into the following one:
RelativeLocationPath
= Step (("/" / "//") Step)*
Simple, isn't it?
Most grammars in various specifications are written for LR parsers and
use left recursion extensively, so certain amount of rewriting is
usually needed before using such a grammar in PEG based tools. To see
more examples of these rewrites, look into the example JavaScript
grammar in PEG.js [4] and compare it with the original grammar in the
ECMAScript specification [5].
David
[1] http://en.wikipedia.org/wiki/Recursive_descent_parser
[2] http://en.wikipedia.org/wiki/Left_recursion
[3] http://en.wikipedia.org/wiki/Parsing_expression_grammar
[4] http://github.com/dmajda/pegjs/blob/master/examples/javascript.pegjs
[5] http://www.ecma-international.org/publications/standards/Ecma-262.htm
--
Everyone gets everything he wants. --Captain Willard in Apocalypse Now
Personal :: da...@majda.cz :: www.majda.cz
Work :: david...@impaladesign.cz :: www.impaladesign.cz
sorry for a late answer.
2010/7/25 CG <christop...@gmail.com>:
> Just one question re performance of peg.js parser. Would would think/
> guess that parsing about one hundred of expressions like with the
> below grammar:
>
> instance('data')/main/indicator/plans/parameters/useCategory
>
> could be realistically processed at run-time (e.g. below one second) ?
you are talking about parsing speed around 5-10 kB/s. With most
grammars, PEG.js should be able to achieve that speed without
problems.
You can run the benchmark included in PEG.js [1] on your machine to
get an idea how fast PEG.js is. On my machine, it runs at ~ 30 kB/s
(but at ~ 100 kB/s for parsing CSS only). This is all on V8.
I want to focus more on performance in PEG.js 0.6, so stay tuned.
David Majda
[1] http://github.com/dmajda/pegjs/tree/master/benchmark/
--
Everyone gets everything he wants. --Captain Willard in Apocalypse Now
Personal :: da...@majda.cz :: www.majda.cz
Work :: david...@impaladesign.cz :: www.impaladesign.cz