LALR(1) Lookahead calculation

Chris Hargrove

unread,

Sep 18, 1998, 3:00:00 AM9/18/98

to

I've been mucking around with making a C++ parser generation library
for the hell of it (so I could calculate parser state machines on the
fly based on string-passed rules rather than stick with pre-generated
output from an external tool). I've decided to limit it to LALR(1)
since it seems to allow enough grammar complexity.

Right now the generator is LR(0), and to that end it works like a
champ. But I've run into problems adapting it to calculate the
lookahead tokens required for LALR(1). I've referred to both [Aho]
and [Holub] and while both have several routines for lookahead
determination, they all seem to depend on either organizing the
generator's kernel item closure routine for LR(1) (including the
calculation of first sets during that operation), or doing many passes
over the kernel items to determine spontaneous and propogated
lookaheads. All these algorithms seem to come from the point of view
of a generator that must output precomputed action/goto tables which
are entirely deterministic from that point on.

Since I'm coming from a different standpoint (calculating the
generators on demand as node-driven state machines which will remain
in memory while the generator is used), is there a more simplistic
alternative to these algorithms? Worst case scenario I'll go ahead
and mimic the pregenerated case and use the algorithm in [Aho], but I
wanted to know if recent literature addressed any newer methods
tailored to this scenario (i.e. all information accessible to the
parser generator is accessible to the parser itself). It's not a
common case, so my search so far has been to little avail.

Any ideas would be appreciated,
--
Chris Hargrove
Programmer, 3DRealms Entertainment
chr...@3drealms.com
http://www.3drealms.com
--
Send compilers articles to comp...@iecc.com, meta-mail to
compiler...@iecc.com. Archives at http://www.iecc.com/compilers

Chris F Clark

unread,

Sep 22, 1998, 3:00:00 AM9/22/98

to

> Right now the generator is LR(0) . . . adapting it to calculate the
> lookahead tokens required for LALR(1) . . . is there a more simplistic
> alternative to these algorithms?

Yes, find David Spector's paper on "state splitting". It's in SIGPLAN
Notices from ~1986. However, here's the kernel of the idea for your
use. Simply generate the LR(0) machine. Then, when you come to a
state where you might need to reduce, look back along the path that
got you there and if the lookahead from the parent production forces a
reduce, then reduce (or deal with the conflict if it results from
desiring a reduce there). Do this calculation for each rule that has
a reduction in the state.

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : c...@world.std.com
Compiler Resources, Inc. CompuServe : 74252,1375
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)
------------------------------------------------------------------------------
Web Site in Progress: Web Site : http://world.std.com/~compres

Michael Sperber [Mr. Preprocessor]

unread,

Sep 22, 1998, 3:00:00 AM9/22/98

to

>>>>> "Chris" = Chris Hargrove <chr...@3drealms.com> writes:

> Since I'm coming from a different standpoint (calculating the
> generators on demand as node-driven state machines which will remain
> in memory while the generator is used), is there a more simplistic
> alternative to these algorithms? Worst case scenario I'll go ahead
> and mimic the pregenerated case and use the algorithm in [Aho], but I
> wanted to know if recent literature addressed any newer methods
> tailored to this scenario (i.e. all information accessible to the
> parser generator is accessible to the parser itself). It's not a
> common case, so my search so far has been to little avail.

Here are references to some very fast algorithms:

@article{DeRemerPennello1982,
title="Efficient Computation of {LALR}(1) Look-Ahead Sets",
author="Frank DeRemer and Thomas Pennello",
pages="615--649",
journal=toplas,
year=1982,
month=oct,
volume=4,
number=4
}

@Article{ParkChoe1987,
author = "Joseph C.H. Park and Kwang-Moo Choe",
title = "Remarks on Recent Algorithms for {LALR} Lookahead Sets",
journal = sigplan,
year = 1987,
volume = 22,
number = 4,
pages = "30--32",
month = "April"
}

@article{ParkChoeChang1985,
title="A New Analysis of {LALR} Formalisms",
author="Joseph C. H. Park and K. M. Choe and C. H. Chang",
pages="159--175",
journal=toplas,
year=1985,
month=jan,
volume=7,
number=1
}

@Article{Ives1986,
author = "Fred Ives",
title = "Unifying View of recent {LALR(1)} Lookahead Set Algorithms",
journal = sigplan,
year = 1986,
volume = 21,
number = 7,
pages = "131--135",
month = "July",
note = "Proceedings of the SIGPLAN'86 Symposium on Compiler Construction"
}

@Article{Ives1987,
author = "Fred Ives",
title = "Response to Remarks on Recent Algorithms for {LALR} Lookahead Sets",
journal = sigplan,
year = 1987,
volume = 22,
number = 8,
pages = "99--104",
month = "August"
}

@UNPUBLISHED{Ives1987-lalr,
author = "Fred Ives",
title = "An {LALR(1)} Lookahead Set Algorithm",
note = "Unpublished manuscript",
year = 1987
}

-- =

Cheers =8-} Mike
Friede, V=F6lkerverst=E4ndigung und =FCberhaupt blabla

Robert Corbett

unread,

Sep 24, 1998, 3:00:00 AM9/24/98

to

Chris F Clark <c...@world.std.com> wrote:
>> Right now the generator is LR(0) . . . adapting it to calculate the

>> lookahead tokens required for LALR(1) . . . is there a more simplistic
>> alternative to these algorithms?
>

>Yes, find David Spector's paper on "state splitting".

State splitting is not a way of computing LALR(1) lookahead sets.
State splitting is a way of producing an LR(1) parser from a parser
based on the LR(0) state set.

State splitting is very hard. I have read several papers purporting
to describe state splitting algorithms. The only one I have seen that
I believe to be correct is the paper "A Practical State Splitting
Algorithm for Constructing LR-Parsers" by Kristensen and Madsen
published by Aarhus University.

Sincerely,
Bob Corbett

Dick Grune

unread,

Sep 26, 1998, 3:00:00 AM9/26/98

to

chr...@3drealms.com (Chris Hargrove) writes:

>I've been mucking around with making a C++ parser generation library
>for the hell of it (so I could calculate parser state machines on the
>fly based on string-passed rules rather than stick with pre-generated
>output from an external tool). I've decided to limit it to LALR(1)
>since it seems to allow enough grammar complexity.

>Right now the generator is LR(0), and to that end it works like a
>champ. But I've run into problems adapting it to calculate the
>lookahead tokens required for LALR(1). I've referred to both [Aho]
>and [Holub] and while both have several routines for lookahead
>determination, they all seem to depend on either organizing the
>generator's kernel item closure routine for LR(1) (including the
>calculation of first sets during that operation), or doing many passes
>over the kernel items to determine spontaneous and propogated
>lookaheads. All these algorithms seem to come from the point of view
>of a generator that must output precomputed action/goto tables which
>are entirely deterministic from that point on.

All the deterministic algorithms in parsing are the result of
optimized precomputation of item sets, which makes for nice theory and
is useful when you have a fixed grammar and a lot of text to parse
with it.

But there is no need to do the precomputation and the optimization.
You can just as well construct the item sets on the fly, while
parsing. This is quite simple for LR(0) (you have most of the code
already) and not difficult for LR(1). This way you never construct
any table, just all the sets you need to parse the text you happen to
have. And there are as many of these as there are tokens in your
text, so the whole algorithm is linear. For an example see our book,
downloadable through http://www.cs.vu.nl/~dick/PTAPG.html.

You cannot do this easily for LALR(1) since that results from a
transformation on the finished LR(1) automaton. And then, perhaps you
can, now that I come to think of it, but it seems to me that just
doing LR(1) would be easier and more effective. Also, once you have
the LR(1) algorithm in place, making it LR(2) should be a breeze.
Given the range of your applications that might be useful too.

Dick Grune | email: di...@cs.vu.nl
Vrije Universiteit | ftp://ftp.cs.vu.nl/pub/dick
de Boelelaan 1081 | http://www.cs.vu.nl/~dick
1081 HV Amsterdam, the Netherlands | tel: +31 20 444 7744

Chris F Clark

unread,

Sep 26, 1998, 3:00:00 AM9/26/98

to

> State splitting is not a way of computing LALR(1) lookahead sets.

However, David's algorithm can also be used to determine lookahead
sets from an LR(0) machine (either LALR ones or LR ones). It does
that in the process of determining where to split the states. You
don't have to split the states to use his look-back method for
computing lookaheads.

Importantly, this is the key to the previous question, the LR(0)
machine contains all the information needed to compute the lookahead
sets. In fact, the incrementally compiled LR(0) machine has almost
enough information--and when it doesn't all you need to do is fill out
a few more states--the ones which have the lookaheads for the parent
productions, which is where the machine will go if one reduces.

> State splitting is very hard.

I also disagree with this. There is one simple algorithm which always
works (although it can make a slightly larger than required machine).
Once, you have determine a state to split off, generate a new set of
LR(0) tables based upon that state (without merging with any previous
tables). The LR(0) tables capture that state as if it were a
sub-grammar to itself. Thus, you effectively generate separate sets
of LR(0) tables for each place where the LR engine would not have
merged them in the first place, which is what the extra information in
the canonical LR items is designed to do (that is keep one from
merging items that have conflicting lookahead).

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : c...@world.std.com
Compiler Resources, Inc. CompuServe : 74252,1375
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)