I have a question about the difference between these two approaches:
I can use recursive predictive parsing, which is very straightforward.
So what's the advantage of non-recursive predictive parsing? To
perform non-recursive parsing, I need to construct FIRST and FOLLOW sets
and use an explicit stack. On the other hand, recursive predictive
parsing is very easy to understand. I understand non-recursive calls
have better performance than recursive ones. Is that the only reason?
Thanks,
Michael
more parsing power.
> perform non-recursive parsing, I need to construct FIRST, FOLLOW sets
> and use explicit stack. On the other hand, recursive predictive
> parsing is very easy to understand. I understand non-recursive calls
Parsing algorithms are not constructed with ease of use in mind.
> have a better performance than recursive one. Is this the only reason?
>
Recursive (descent) parsing is leftmost. You have the FOLLOW set in
non-recursive parsing to figure out what *can* follow, and so opt to
reduce by a different rule than the leftmost reduction. It helps in
overcoming ambiguities that RDP has.
regards
-kamal
> I can use recursive predictive parsing, which is very straightforward.
> So what's the advantage of non-recursive predictive parsing. To
> perform non-recursive parsing, I need to construct FIRST, FOLLOW sets
> and use explicit stack. On the other hand, recursive predictive
> parsing is very easy to understand. I understand non-recursive calls
> have a better performance than recursive one. Is this the only reason?
In principle, you also need to construct FIRST and FOLLOW sets to use
deterministic recursive predictive parsing (aka recursive descent).
If there are no empty productions, you don't need FOLLOW and if all
productions for the same nonterminal start with different terminal
symbols, FIRST is trivial, so in these cases you can write recursive
descent parsers very easily. But in the very same cases, construction
of LL(1) parse tables is also very easy. For more complicated cases,
you do need to calculate FIRST and FOLLOW even for recursive descent
-- at least if you want to avoid backtracking.
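To illustrate the easy case described above, here is a minimal sketch (a toy grammar of my own, not from the post) of a predictive recursive descent parser for S -> a S | b. Every production for S starts with a distinct terminal, so FIRST is trivial and no FOLLOW sets or backtracking are needed; one token of lookahead decides:

```python
def parse_s(tokens, pos=0):
    """Parse nonterminal S; return the input position after its yield."""
    if pos < len(tokens) and tokens[pos] == 'a':   # FIRST(a S) = {a}
        return parse_s(tokens, pos + 1)            # S -> a S
    if pos < len(tokens) and tokens[pos] == 'b':   # FIRST(b) = {b}
        return pos + 1                             # S -> b
    raise SyntaxError(f"unexpected input at position {pos}")

print(parse_s(list("aab")))   # consumes all three tokens -> 3
```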
Table-driven LL(1) is no faster than recursive descent, and it can be
slower in some cases. But tables can be made fairly compact, so
table-driven parsers can be smaller. Also, tables are slightly easier
to generate than programs.
Torben
The only reasons that I can think of to distinguish between
recursive-descent predictive parsers and explicit-stack predictive
parsers are esthetics and efficiency. Efficiency includes efficiency
of parser construction, whether manual or using a parser generator, as
well as of parsing.
Your comments about FIRST and FOLLOW sets are misleading. You need to
find the FIRST sets in order to make either kind of predictive parser,
even recursive descent. And you don't absolutely need to find the
FOLLOW sets to make either kind of predictive parser, even the
explicit-stack kind. However, if you don't know the FOLLOW sets, you
won't be able to check at parser construction time that your grammar
is really amenable to predictive parsing, and you won't be able to
produce as high quality of syntax error messages at parse time. Those
motivations for finding the FOLLOW sets apply equally well to both
kinds of predictive parsers.
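The FIRST computation needed for either kind of predictive parser is the usual fixed-point iteration; a hedged sketch (the helper names and toy grammar are my own, not from the thread) where a grammar maps nonterminals to lists of production bodies, [] is an empty (epsilon) body, and '' marks epsilon inside a FIRST set:

```python
GRAMMAR = {
    'E':  [['T', 'Ep']],           # E  -> T Ep
    'Ep': [['+', 'T', 'Ep'], []],  # Ep -> + T Ep | epsilon
    'T':  [['id']],                # T  -> id
}

def seq_first(body, first, grammar):
    """FIRST of a sequence of symbols, given the current FIRST sets."""
    result = set()
    for sym in body:
        if sym not in grammar:          # terminal: it starts the sequence
            result.add(sym)
            return result
        result |= first[sym] - {''}
        if '' not in first[sym]:        # sym cannot vanish, so stop here
            return result
    result.add('')                      # every symbol can derive epsilon
    return result

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                add = seq_first(body, first, grammar)
                if not add <= first[nt]:
                    first[nt] |= add
                    changed = True
    return first

FIRST = first_sets(GRAMMAR)
print(FIRST)   # FIRST['E'] = {'id'}, FIRST['Ep'] = {'+', ''}, FIRST['T'] = {'id'}
```

A FOLLOW computation has the same fixed-point shape, seeded from what appears after each nonterminal in each production body.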
Consider what happens when your input does not conform to the expected
structure. If you are going to recover from the error in the source
and continue processing, you may need a way to unwind the stack under
your control. Recursive is fine if you have a way to recover from
source errors.
(1) Yes, recursive descent parsing is leftmost, but that word
"leftmost" means that the leftmost nonterminal is expanded out at each
step, rather than anything about a "leftmost reduction" or, as your
phrase "reduce by a different rule" suggests you might have meant,
leftmost production. There is no reduction done at all in a recursive
descent parser, and the choice of productions is not influenced by
their order.
(2) The non-recursive parsing that was being discussed in the original
post was non-recursive *predictive* parsing, i.e., LL(1) top-down
parsing. As such, it has no reductions either and follows the same
sequence of parsing actions. It too is constructing a leftmost
derivation.
(3) Recursive descent parsers do not have ambiguities, because
ambiguity is a property of a grammar or a language, not a parser.
-Max Hailperin
Professor of Computer Science
Gustavus Adolphus College
The common predictive algorithms are LL and LR. You can draw a table
with LL and LR on one axis, Recursive and Explicit Stack on the other.
The two squares in the LL column are discussed in most textbooks, and
also the LR-Explicit Stack box. The last box, LR-Recursive, is not
usually covered. But there is a less-known, elegant recursive
formulation for LR parsers as well. Google "recursive LR parser" for
a few papers.
Recursive formulations tend to be more readable to humans. Tables and
explicit stacks tend to be more amenable to automatic generation from
a grammar. An explicit stack might make debugging, animation, and
error-recovery code a little simpler and more efficient. That's
about it. Performance differences are marginal: unlikely to matter.
Thanks, guys. I appreciate it.
I think Max was right. According to the red dragon book p.181,
predictive parsing is the special case of recursive descent parsing.
For predictive parsing, it doesn't need backtracking. As I
understand it, predictive parsing can parse LL(0), and recursive
descent can parse LL(1). Also, ANTLR can parse LL(*), which uses
predicates. Please correct me if I am wrong.
Michael
Is this what is referred to as Recursive Ascent?
Aaron
>> there is a less-known, elegant recursive formulation for LR parsers
>> as well. Google "recursive LR parser" for a few papers.
>
> Is this what is referred to as Recursive Ascent?
It is indeed.
Torben
Yes - I meant reduce by a different production.
> descent parser, and the choice of productions is not influenced by
> their order.
>
>
> (2) The non-recursive parsing that was being discussed in the original
> post was non-recursive *predictive* parsing, i.e., LL(1) top-down
> parsing. As such, it has no reductions either and follows the same
> sequence of parsing actions. It too is constructing a leftmost
> derivation.
>
Yes - I misinterpreted it to mean RDP vs. L[A]LR(1). I think the
answer, as given by others, is that you can do error recovery with
table-generated parsers.
Will RDP shift in case of shift/reduce conflict the way yacc does?
> (3) Recursive descent parsers do not have ambiguities, because
> ambiguity is a property of a grammar or a language, not a parser.
So, what's the advantage of opting for a rightmost derivation? I think
it helps when you have left-recursive rules like
E -> E T
where E and T are non-terminals.
thanks
-kamal
> Will RDP shift in case of shift/reduce conflict the way yacc does?
A recursive descent parser doesn't ever shift. A recursive descent
parser is a top-down parser, which I have previously termed a
"confirm/expand" parser by analogy with the conventional description
of bottom-up parsers as "shift/reduce" parsers. See
http://compilers.iecc.com/comparch/article/06-04-136
> So, whats the advantage of opting for a rightmost derivation? I think
> it helps when you have left-recrusive rules like
> E -> E T
>
> where E and T are non-terminals.
Yes, one advantage is that LR parsers can handle left recursion.
However, that isn't the only advantage. The fundamental advantage is
that the reduce action in a shift/reduce parser takes place once the
parser has already read in the production's entire yield plus the
lookahead beyond it (one more terminal, most commonly), whereas the
expand action in a confirm/expand parser has to take place much
earlier, when only the first little bit of the yield has been seen as the
lookahead (again, typically just one terminal). Consider for example
the following situation:
A -> B c
A -> D e
In an LR parser, there will never be a reduce-reduce conflict here,
because by the time the reduction to A is happening, the input has
been read in all the way through the c or e, and hence depending on
which of them was read, it is clear whether to reduce by the first
production or the second.
In fact, given that there is some lookahead, the LR parser can do even
more. Suppose we complete the grammar with
B -> f f f f f
D -> f f f f f
The LR parser will know whether to reduce the 5 repetitions of the
terminal f to B or to D, because it will have seen not only all 5 of
them, but also a lookahead symbol. If that lookahead symbol is c,
then the 5 fs are reduced to B. (And then after shifting the c, B c
is reduced to A.) If the lookahead symbol is e, then the 5 fs are
reduced instead to D. (And then after shifting the e, D e is reduced
to A.)
Contrast this to the situation of an LL(1) parser (including a
predictive recursive descent parser). Right away when it sees the
first f, it needs to choose between the two productions for A, which
it can't possibly do.
This particular grammar is LR(1) but not LL(1), LL(2), LL(3), LL(4),
or LL(5). It is LL(6), because 6 symbols of lookahead would be enough
to get past the 5 fs and to the c or e. But consider replacing the
definitions of B and D with right-recursive ones that can yield
arbitrarily many fs. Then you've got a grammar that isn't LL(k) for
any k, yet is LR(1). And it still doesn't involve left-recursion.
Conversely, any grammar that is LL(k) is LR(k). So LR parsing is
strictly more powerful, and this is true even if one rules out
left-recursion.
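The LL(1) conflict in the A -> B c | D e grammar can be made concrete by building the prediction entries for A (a small illustration of my own; no epsilon productions, so FIRST of a symbol is easy to compute recursively). Both productions begin with strings whose FIRST set is {'f'}, so the table cell for (A, f) would need two entries:

```python
GRAMMAR = {
    'A': [['B', 'c'], ['D', 'e']],
    'B': [['f', 'f', 'f', 'f', 'f']],
    'D': [['f', 'f', 'f', 'f', 'f']],
}

def first_of(sym):
    """FIRST set of one symbol (this grammar has no epsilon productions)."""
    if sym not in GRAMMAR:                 # terminal
        return {sym}
    out = set()
    for body in GRAMMAR[sym]:
        out |= first_of(body[0])
    return out

# Fill the LL(1) prediction cell for A on each possible lookahead.
cell = {}
for body in GRAMMAR['A']:
    for t in first_of(body[0]):
        cell.setdefault(t, []).append(body)

print(cell)   # {'f': [['B', 'c'], ['D', 'e']]} -- two entries, so not LL(1)
```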
> Consider for example the following situation:
>
> A -> B c
> A -> D e
[...]
> In fact, given that there is some lookahead, the LR parser can do even
> more. Suppose we complete the grammar with
>
> B -> f f f f f
> D -> f f f f f
IMO this is an example of bad language design, which I have already
encountered in many languages. The fact that a parser generator can
handle such a grammar is no excuse for thoughtless language design. No
offense to you; your example is a very practical one <sigh>.
When we look at any application using the above grammar for,
e.g., serializing data, a possibly huge amount of input must be kept on
the stack before any action can be taken. And more memory is wasted on
the automaton tables. When the language, and consequently also the
grammar, is converted into LL(1) by swapping the terms in A, no special
amount of time or memory has to be wasted in the analysis of the input.
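A minimal sketch (my own code, not DoDi's) of the swapped, LL(1)-friendly version, A -> c B | e D with B and D each five f's: one lookahead token picks the production immediately, and nothing accumulates before a decision is made.

```python
def parse_a(ts, i=0):
    """Parse A; the first token decides the production at once."""
    if i < len(ts) and ts[i] == 'c':
        return parse_fs(ts, i + 1)        # A -> c B
    if i < len(ts) and ts[i] == 'e':
        return parse_fs(ts, i + 1)        # A -> e D
    raise SyntaxError(f"expected 'c' or 'e' at {i}")

def parse_fs(ts, i):
    """Parse the common body of B and D: exactly five 'f' tokens."""
    for _ in range(5):
        if i >= len(ts) or ts[i] != 'f':
            raise SyntaxError(f"expected 'f' at {i}")
        i += 1
    return i

print(parse_a(list("cfffff")))   # -> 6
```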
DoDi
Hi Torben,
Two things :-
1) How does it work? I cannot remember; it was used in Philips' Elegant
compiler-compiler infrastructure, which I looked at some eight years ago.
Would you please give a description of how it works for those who don't
know?
2) Is it limited to LR(1), lookahead-wise?
Aaron
>> A -> B c
>> A -> D e
>> B -> f f f f f
>> D -> f f f f f
> ... When the language, and consequently also the
> grammar is converted into LL(1), by swapping the terms in A ...
Actually the language already was LL(1), though the grammar wasn't.
> "Torben "Ægidius" Mogensen" <tor...@pc-003.diku.dk> wrote
>> "Aaron Gray" <ang.u...@gmail.com> writes:
>>>> there is a less-known, elegant recursive formulation for LR parsers
>>>> as well. Google "recursive LR parser" for a few papers.
>>>
>>> Is this what is refered to as Recursive Ascent ?
>>
>> It is indeed.
> 1) How does it work as I cannot remember, it was used in Philip's Elegant
> Compiler compiler infrastructure, which I looked at some eight years ago.
> Would you please give a description of how it works for those who don't
> know.
Basically, you use the call stack as a replacement for the explicit
stack. A shift is a call and a reduce is a return (through several
stack levels).
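A hedged sketch of the idea for the left-recursive grammar E -> E '+' 'n' | 'n' (the state numbering and return conventions are my own, not from any particular paper): each LR(0) state is a function, a shift is a call, and a reduce returns through one stack frame per body symbol, counted down by a "pops" value.

```python
def state0(ts, i):                    # items: S'->.E, E->.E+n, E->.n
    if i < len(ts) and ts[i] == 'n':
        pops, lhs, i = state1(ts, i + 1)      # shift 'n' = a call
    else:
        raise SyntaxError(f"expected 'n' at {i}")
    while pops == 0 and lhs == 'E':           # goto on E from this state
        pops, lhs, i = state2(ts, i)
    return i                                  # accept: input consumed

def state1(ts, i):                    # item: E -> n .
    return 0, 'E', i                  # reduce E -> n: pop |body|-1 = 0 more

def state2(ts, i):                    # items: S'->E., E -> E.+n
    if i < len(ts) and ts[i] == '+':
        pops, lhs, i = state3(ts, i + 1)      # shift '+'
        return pops - 1, lhs, i               # unwind one frame of the reduce
    return -1, 'accept', i            # end of input: accept

def state3(ts, i):                    # item: E -> E + . n
    if i < len(ts) and ts[i] == 'n':
        pops, lhs, i = state4(ts, i + 1)      # shift 'n'
        return pops - 1, lhs, i
    raise SyntaxError(f"expected 'n' at {i}")

def state4(ts, i):                    # item: E -> E + n .
    return 2, 'E', i                  # reduce E -> E + n: pop 2 more frames

print(state0(list("n+n"), 0))   # parses all three tokens -> 3
```

The left recursion shows up as the loop in state0: each completed reduce of E -> E + n unwinds back to the frame holding the goto on E, which then re-enters state2.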
A reasonably good account can be found in "The Essence of LR Parsing"
by Sperber and Thiemann (PEPM'95, ACM Press). It includes several
different implementations in Scheme and a description of how you can
use partial evaluation to generate specialised parsers from the
general parser.
> 2) Is it limited to LR(1), lookahead-wise?
No.
Torben
Thanks.
>> 2) Is it limited to LR(1), lookahead-wise?
>
> No.
This is very interesting. So you could implement lookahead before
doing the shift as a "guard", and get a very fast LR(k) parser using
this technique, although there is the problem of error recovery, which
a non-recursive table-driven parser handles much more easily.
Interesting diversion :)
Aaron