I can see that LR(1) parsers are probably more powerful because to
create the parsing table, a stack is used. For LL(1) you just look at
the next input and do a trial-and-error approach (of course you can
left-factor, and most certainly you have to remove left recursion,
otherwise you'll go into an infinite loop; all this may result in
less backtracking, although it's not guaranteed). Aside from the more
complex data structures and algorithms used by LR(1), I don't
really see any grammars that an LR(1) parser can parse that an LL(1)
parser cannot. Can anyone give me a grammar that is LR(1) but not
LL(1)?
Another curiosity is the efficiency of LL(1) and LR(1) algorithms. Due
to backtracking, LL(1) may end up being exhaustive in that it may
backtrack to the point that it reaches the last production it can try
to apply. This exhaustive search means LL(1) is bounded by an
exponential (am I correct?). However, the LR(1) algorithm for
producing the parsing table also seems to be exponential; when you
find the "closure" and "goto" of LR(1) items, you look at each item,
and in turn each of these items may require you to do "closure" and
"goto" again. Because of this fan-out, it seems the LR(1) algorithm for
creating the parsing table (btw: I'm talking about the algorithm
presented in the new dragon book) is also bounded exponentially (am I
correct?). So it seems LR(1) requires the same computational power as
LL(1)?
P.S. - Where can I find a complete FAQ on basic compiler theory? The
only FAQ I found is this: http://www.faqs.org/faqs/compilers/, and it
does not have any discussion on theory.
Thank you,
Will
> Another curiosity is the efficiency of LL(1) and LR(1) algorithms. Due
> to backtracking, LL(1) may end up being exhaustive in that it may
> backtrack to the point that it reaches the last production it can try
> to apply. This exhaustive search means LL(1) is bounded by an
> exponential (am I correct?). However, the LR(1) algorithm for
> producing the parsing table also seems to be exponential; when you
> find the "closure" and "goto" of LR(1) items, you look at each item,
> and in turn each of these items may require you to do "closure" and
> "goto" again. Because of this fan-out, it seems the LR(1) algorithm for
> creating the parsing table (btw: I'm talking about the algorithm
> presented in the new dragon book) is also bounded exponentially (am I
> correct?). So it seems LR(1) requires the same computational power as
> LL(1)?
>
It may in fact be that the LR(1) and LL(1) algorithms are exponential
(I haven't done a full analysis and my hunch says no), but in practice
(LA)LR(1) is much, much faster on any practical programming language
grammar.
If you think about it, the LR(1) "closure" and "goto" operations would
need a very silly grammar in order to be very slow. For them to be
bad, each rule would have to look a bit like this:

A -> A A' ...
   | B B' ...
   | C C' ...
for many different nonterminals in the grammar. I have serious doubts
that grammars of this nature show up that much in practical usage.
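To make the fan-out concrete, here is a minimal sketch of the closure
computation in Python (the grammar encoding is hypothetical toy code,
not taken from any particular tool). The worklist only ever grows by
items it has not seen before, so the work is bounded by the number of
distinct items, which is fixed by the grammar, not by the input:

# A minimal sketch of the LR "closure" computation.  Hypothetical
# encoding: a dict from nonterminal to a list of right-hand sides;
# an item is (lhs, rhs, dot_position).
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("id",)],
}

def closure(items):
    # Worklist algorithm: an item is added (and processed) at most
    # once, so the fan-out is bounded by the number of distinct
    # items in the grammar.
    result = set(items)
    worklist = list(items)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:  # dot before a nonterminal
            for alt in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], alt, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return result

print(sorted(closure({("E", ("E", "+", "T"), 0)})))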
Just for kicks, implement an LL(1) parser for C in Elisp and see how
slow it is (I have - it's bad). You could easily imagine an LR parser
being much quicker.
--
Geoff(rey) Wozniak, Limited Duties Instructor
University of Western Ontario
Computer Science (Graduate) Department
London, Ontario, Canada
http://woz.killdash9.org/
I always thought that LL parsing does not backtrack. You just select
the production based on the first input symbol that you see (and the
restrictions that make a grammar LL(1) make sure that only the right
production will match).
Are you considering a backtracking extension to LL(1)? If that is the
case, could you define the extension? (Note that any extension was
probably considered years ago and already has a standard name.)
Here's a (silly) grammar that's LR(1) but not LL(1):

S ::= A a | B b
A ::= x
B ::= x

On seeing the x, an LL(1) parser cannot tell whether to predict
S ::= A a or S ::= B b, since both alternatives begin with x. An LR
parser simply shifts the x and defers the decision: it reduces x to A
or to B once it sees the next token (a or b).
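As a quick sanity check, here is a small Python sketch (toy grammar
encoding assumed; no epsilon productions, so FIRST needs no fixpoint
iteration) that builds the LL(1) table entries from the FIRST sets and
shows the conflict in the (S, x) cell:

GRAMMAR = {
    "S": [("A", "a"), ("B", "b")],
    "A": [("x",)],
    "B": [("x",)],
}

def first(sym):
    if sym not in GRAMMAR:          # terminal
        return {sym}
    out = set()
    for alt in GRAMMAR[sym]:        # union of FIRST of each alternative
        out |= first(alt[0])
    return out

for lhs, alts in GRAMMAR.items():
    for alt in alts:
        for tok in first(alt[0]):
            print("table[%s, %s] = %s -> %s" % (lhs, tok, lhs, " ".join(alt)))
# table[S, x] gets both S -> A a and S -> B b: an LL(1) conflict.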
LL(1) plus backtracking would handle this just fine, of course. A
grammar that's LR(1) but not LL(1)-with-backtracking would need some
kind of infinity, but since the input string is finite the LL
backtracker will (sooner or later) find those productions that were
selected by the LR parser, so my guess is that LL(1)-with-backtracking
is at least as powerful as LR(1). The complexity could easily be
exponential though. The order in which a production's alternatives
are tried is important, and it may well be that a depth-first search
terminates in fewer cases than a breadth-first search (though it's
dubious whether LL is still an appropriate name if the latter
technique is used).
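A toy "LL(1) plus backtracking" recogniser for the grammar above might
look like this (Python sketch, hypothetical code): alternatives are
tried depth-first, rewinding the input position on failure.

GRAMMAR = {
    "S": [("A", "a"), ("B", "b")],
    "A": [("x",)],
    "B": [("x",)],
}

def parse(sym, tokens, pos):
    # Returns the position after a successful match, or None.
    if sym not in GRAMMAR:          # terminal: must match the input
        ok = pos < len(tokens) and tokens[pos] == sym
        return pos + 1 if ok else None
    for alt in GRAMMAR[sym]:        # try each alternative in order...
        p = pos                     # ...restarting from the same position
        for s in alt:
            p = parse(s, tokens, p)
            if p is None:
                break
        else:
            return p
    return None                     # all alternatives failed: backtrack

print(parse("S", ["x", "b"], 0))    # 2: A a fails, backtracks to B b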
> However, the LR(1) algorithm for
> producing the parsing table also seems to be exponential.
Yes it is.
Essentially, the LR table generation algorithm builds a DFA whose
states are sets of items, and since determinizing an automaton (the
subset construction) has exponential worst-case complexity, LR table
generation is exponential in the worst case.
There is an important difference though: LL(1)-with-backtracking is
slow (possibly exponential) during parsing, while LR is slow only
during table construction. LR parsing itself is linear in the input
length (as is LL(1) parsing without backtracking).
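The blowup is the same one seen in plain NFA-to-DFA determinization,
and the classic witness is easy to reproduce. The Python sketch below
(toy code, not from any parser generator) builds the subset-construction
DFA for the NFA recognizing "the n-th symbol from the end is an a":
the NFA has n+1 states, while the DFA has 2**n.

# NFA states are 0..n: state 0 loops on a and b, 'a' also moves
# 0 -> 1, state i < n moves to i+1 on either symbol, n accepts.
def dfa_state_count(n):
    def step(states, sym):
        out = set()
        for s in states:
            if s == 0:
                out.add(0)
                if sym == "a":
                    out.add(1)
            elif s < n:
                out.add(s + 1)
        return frozenset(out)

    start = frozenset({0})
    seen, worklist = {start}, [start]
    while worklist:
        state = worklist.pop()
        for sym in "ab":
            nxt = step(state, sym)
            if nxt not in seen:
                seen.add(nxt)
                worklist.append(nxt)
    return len(seen)

for n in range(1, 6):
    print(n, dfa_state_count(n))    # 2, 4, 8, 16, 32: exponential in n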
> P.S. - Where can I find a complete FAQ on basic compiler theory? The
> only FAQ I found is this: http://www.faqs.org/faqs/compilers/, and it
> does not have any discussion on theory.
The usual source is the "Dragon Book" (Aho, Sethi & Ullman,
"Compilers: Principles, Techniques, and Tools"). It covers the basic
techniques (operator precedence, LL(1), LR(1), SLR(1), LALR(1)) and
their comparative advantages and disadvantages quite extensively.
However, understanding the algorithms is more work than simply
implementing them, and it's a bit outdated in its assumptions (memory
efficiency is stressed more strongly than today's computers would
require).
Regards,
Joachim
> I was told that LR(1) parsers are much more powerful than the LL(1)
> parsers because they recognize more grammars.
This is true. Every LL(1) grammar is an LR(1) grammar, although there
are LL(1) grammars that are not LALR(1). However, any LR(1) grammar
with left recursion is not an LL(1) grammar. Any LR(1) grammar that
is not left-factored is not an LL(1) grammar.
This is strictly at the grammar level. If you start allowing yourself
to modify the grammar, you are no longer at the grammar level, you are
at the language level. A language is simply a set of strings
(sentences). If you have a grammar, it describes some set of strings.
Thus, a grammar defines a language. The reverse is not true.
There may or may not be a grammar acceptable to your favorite parsing
method that describes that set of strings. Thus, if you have an LR(1)
grammar that contains left recursion and you can successfully remove
that left recursion, you may have an LL(1) grammar for that language.
If so, the language is LL(1). If you weren't successful, there may
still be another grammar for the language that is LL(1) and you just
don't know what it is. This makes statements about languages (rather
than grammars) more complex. However, any LL(1) language (i.e. one
for which an LL(1) grammar exists) is also an LR(1) language.
Counter-intuitively, any LR(k) language is also an LR(1) language,
even though there are LR(k) grammars that are not LR(1).
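As a concrete instance of the left-recursion removal mentioned above
(the standard textbook transformation, shown here on a toy expression
grammar):

E ::= E + T | T          (left-recursive: not LL(1))

becomes

E  ::= T E'
E' ::= + T E' | (empty)

Both grammars describe the same language, and the second one is LL(1),
so the *language* is LL(1) even though the first *grammar* is not.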
> I can see that LR(1) parsers are probably more powerful because to
> create the parsing table, a stack is used.
Not true. LL(1) parsers use a stack also. The reason LR(1) parsers
are more powerful is because they perform the closure operation you
mention later and they defer the recognition of which production is
being chosen until the production is complete (and, to be precise,
until 1 token of lookahead past the end of the production has been
inspected). LL(1) parsers must choose which production to apply upon
seeing the first token of the production.
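To see that stack in action, here is a table-driven LL(1) recognizer
in Python (toy grammar and hand-built table, purely illustrative): the
nonterminal on top of the stack plus one token of lookahead selects
the unique production, whose right-hand side is pushed; no decision is
ever revisited.

TABLE = {                          # LL(1) table for:  S -> a S | b
    ("S", "a"): ("a", "S"),        # predict S -> a S on lookahead a
    ("S", "b"): ("b",),            # predict S -> b   on lookahead b
}

def ll1_parse(tokens):
    stack = ["S"]                  # the explicit LL(1) parse stack
    pos = 0
    while stack:
        top = stack.pop()
        if top != "S":             # terminal: match against the input
            if pos >= len(tokens) or tokens[pos] != top:
                return False
            pos += 1
        else:                      # nonterminal: predict via one lookup
            look = tokens[pos] if pos < len(tokens) else None
            rhs = TABLE.get((top, look))
            if rhs is None:        # empty table cell means syntax error
                return False
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
    return pos == len(tokens)

print(ll1_parse(list("aab")))      # True
print(ll1_parse(list("aba")))      # False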
> For LL(1) you just look at the next input and do a trial-and-error
> approach (of course you can left-factor, and most certainly you have
> to remove left recursion, otherwise you'll go into an infinite loop;
> all this may result in less backtracking, although it's not
> guaranteed).
Not true. The process is NOT trial and error and there is NO
backtracking. The parser simply looks at the first token and chooses
the one (and only) production that, given the state of the parse, has
that token as its first token. (A grammar is LL(1) if and only if
that choice is unique, which is why left recursion and
non-left-factored grammars are not allowed.) If you allow
backtracking (or trial and error), you are in a completely different
class of parsers (a class that is much more powerful; all LR(1)
grammars and many non-LR(k) grammars can be handled by backtracking
parsers).
> Another curiosity is the efficiency of LL(1) and LR(1) algorithms.
Both parsing algorithms run in linear time (i.e. for n tokens of
input, k*n units of time are required to perform the parse).
> Due to backtracking, LL(1) may end up being exhaustive in that it
> may backtrack to the point that it reaches the last production it
> can try to apply. This exhaustive search means LL(1) is bounded by
> an exponential (am I correct?).
Although backtracking parsers are not LL(1) parsers, backtracking
parsers do sometimes exhibit exponential run-time performance.
> Because of this fan-out, it seems the LR(1) algorithm for creating
> the parsing table (btw: I'm talking about the algorithm presented in
> the new dragon book) is also bounded exponentially (am I correct?).
I have not seen a worst-case analysis of the space requirements for an
LR parsing table. However, the following argument should hold. Each
state in an LR(1) parsing table consists of a set of items, which
correspond to the positions the parser can be at in each of the
productions (the so-called dotted rules) (*note below). Each rule has
a fixed, finite number of positions. Since each parser state is a
unique set (note set, not bag: no position is ever repeated in a
state) of those positions, the set of all parser states is simply a
subset of the power set of the positions. Now, while this set is
potentially exponential in size in terms of the number of positions
(i.e. the size of the grammar), it is a fixed constant for any given
grammar and does not vary with the size of the input string being
recognized.
In practice, most human-written grammars have a number of states that
is less than the square of the number of productions (and nearly
linear in the number of positions).
*note: For LR(1) (but not LR(0) or LALR(1)), the items also include
the lookahead set, which is a finite set of tokens that might follow
the production. This is again a subset of a power set, but again
fixed given the grammar.
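As a rough empirical check of that claim, here is a Python sketch
(LR(0) items only, no lookaheads; the toy grammar is hypothetical)
that builds the canonical collection of item sets and compares the
state count against the power-set bound on positions:

GRAMMAR = {
    "S'": [("S",)],                # augmented start production
    "S":  [("A", "a"), ("B", "b")],
    "A":  [("x",)],
    "B":  [("x",)],
}

def closure(items):
    items, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for alt in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], alt, 0)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)

def goto(state, sym):              # advance the dot over sym, then close
    return closure({(l, r, d + 1) for (l, r, d) in state
                    if d < len(r) and r[d] == sym})

start = closure({("S'", ("S",), 0)})
states, work = {start}, [start]
while work:
    state = work.pop()
    for sym in {r[d] for (_, r, d) in state if d < len(r)}:
        nxt = goto(state, sym)
        if nxt and nxt not in states:
            states.add(nxt)
            work.append(nxt)

positions = sum(len(r) + 1 for alts in GRAMMAR.values() for r in alts)
print(len(states), "states vs power-set bound of", 2 ** positions)
# Prints 7 states vs a bound of 4096: far closer to linear in practice.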
Hope this helps,
-Chris
*****************************************************************************
Chris Clark Internet : com...@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)