LR(1) Parser Generator

Tim Josling

unread,

Jul 17, 2001, 11:10:20 PM7/17/01

to

I am looking for the source for an LR(1) compiler, preferably
written in C, and with open source licencing. Google cannot help.
Is anyone aware of anything. I do have a pascal implementation
but I do not like pascal (for no good reason I'm sure) and P2C
core dumps on the program.

LALR implementations are available but it has to be LR(1).

Everyone says LR(1) implementations are too slow or too big but
those books were written when mainframes has 1mb of 'core'.

Tim Josling

Marco van de Voort

unread,

Jul 18, 2001, 7:56:06 PM7/18/01

to

> I am looking for the source for an LR(1) compiler, ...

>
> LALR implementations are available but it has to be LR(1).
>
> Everyone says LR(1) implementations are too slow or too big but
> those books were written when mainframes has 1mb of 'core'.

Maxima on working set of current processors are still that size :-)

Mike Dimmick

unread,

Jul 18, 2001, 8:04:35 PM7/18/01

to

"Tim Josling" <t...@melbpc.org.au> wrote in message

> I am looking for the source for an LR(1) compiler, preferably
> written in C, and with open source licencing. Google cannot help.
> Is anyone aware of anything. I do have a pascal implementation
> but I do not like pascal (for no good reason I'm sure) and P2C
> core dumps on the program.
>
> LALR implementations are available but it has to be LR(1).

Why do you say you need full LR(1)? From memory, the difference is
that states with the same core are merged in the LALR(1)
implementation; that is, if two states are identical but for the
lookahead required to resolve the next-state transition, they are
merged.

If I remember correctly, this means that shift/reduce conflicts cannot
be introduced by merging states - there's a theorem which states that
if a shift/reduce conflict occurs in the LALR(1) machine, it would
also have done so in the LR(1) machine. I believe however that
artificial reduce/reduce conflicts are possible in LALR(1); is this
happening to you?

Bison's user manual (at
http://www.gnu.org/manual/bison/html_chapter/bison_8.html#SEC79) lists
a possible method of resolving these reduce/reduce conflicts.

I would expect that, with sufficient understanding of the Pascal
implementation, and of the Bison source, you might be able to adapt
Bison (or some other open source LALR(1) parser generator) to produce
full LR(1).

> Everyone says LR(1) implementations are too slow or too big but
> those books were written when mainframes has 1mb of 'core'.

True, but full LR(1)s do still frequently have many thousands of states.

I'd like to see more development of adaptive parser generators, which build
only as complex a system as is necessary to parse the language, e.g.

LL(1) [or Parr-style LL^1(k)] for keyword-led productions;
SLR(1), LALR(1), LR(1) for subsets which can be attacked in that manner;
Tomita for really sticky ambiguous problems.

This may be naivety on my part! It's possible that the code required for
the various different state machines would outweigh the possible saving in
state space. Some experimentation would probably be necessary. I don't
really have the skills to write this!

--
Mike Dimmick

Tim Josling

unread,

Jul 23, 2001, 2:22:55 AM7/23/01

to

Mike,

Yes I am getting R/R conflicts, or as the bison manual calls them
'mysterious reduce reduce conflicts'. I then have to try and work
out what is causing them and add some extra bits to the grammar.
Normally this involves flattening out the grammar and duplicating
large slabs of it. Alternatively I have to accept some generic
common grammar and hard code checking to make sure they did not
use the wrong constructs in the wrong places.

I am pretty confident the conflicts are spurious, and that LR(1)
will fix the problem.

If all else fails I will write an LR(1) patch for bison and
report the results here. Unfortunately bison is very tersely
documented.

According to my benchmarks the parse is only 5% of gcc
compilation time, so even a 40% increase would not be a big deal
especially given that I would be spending less time scratching my
head about mysterious conflicts and more time developing my
compiler. The dragon book also mentions the 5% figure.

The stats about LALR being smaller than LR are based on the
assumption that the grammar is LALR to start with. If you have to
add hacks to shoehorn it into LALR then the difference would be
smaller though still significant.

Also if you just use a hash table for the parse table the logic
will actually be a lot simpler - no need for 'check tables' etc
etc. That is the LALR parser has various space/time tradeoffs
that slow it down and this can also be balanced against the time
taken for hashing. The hashing can actually be pretty simple. The
key is just state+symbol which is easily hashed by divide by
constant prime. And good compilers convert constant divides into
less expensive operations.

Tim Josling

Mike Dimmick wrote:
>
> "Tim Josling" <t...@melbpc.org.au> wrote in message
> > I am looking for the source for an LR(1) compiler, preferably
> > written in C, and with open source licencing. Google cannot help.

> ...

Hans Aberg

unread,

Jul 23, 2001, 2:30:02 AM7/23/01

to

Josling <t...@melbpc.org.au> wrote:
>Everyone says LR(1) implementations are too slow or too big but
>those books were written when mainframes has 1mb of 'core'.

Is really the generated LR(1) parser slower than the generated LALR(1)
parser? -- I mean, the latter only differ in that some states of the
former have been merged, as to compress the look-up table, but the
generated state-machine is the same.

Hans Aberg * Email: Hans Aberg <hab...@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>

David R Tribble

unread,

Jul 23, 2001, 2:32:52 AM7/23/01

to

Tim Josling wrote:
> I am looking for the source for an LR(1) compiler, preferably
> written in C, and with open source licencing. Google cannot help.
> Is anyone aware of anything. I do have a pascal implementation
> but I do not like pascal (for no good reason I'm sure) and P2C
> core dumps on the program.
>
> LALR implementations are available but it has to be LR(1).

I am currently working on an LR(1) parser generator for Java (which is
also written in Java), which generates parsers from YACC input grammars.
I have the generator "engine" already working, but as of yet I have not
completed the parser or code generator portions.

I plan to release it as open source. Would this be of interest to you?

> Everyone says LR(1) implementations are too slow or too big but
> those books were written when mainframes has 1mb of 'core'.

I agree. I actually use a modified LR(1) parser generator, which
attempts to merge as many item sets as it can, a la LALR(1) parsers,
so the resulting state tables are not that much bigger than LALR(1)
tables (and will, in fact, be exactly the same size if the input
grammar is LALR(1) or SLR(1)). I call this method "Merged LR(k)", or
"MLR(k)" for short.

-- David R. Tribble, mailto:da...@tribble.com --

Mark

unread,

Jul 23, 2001, 11:25:35 PM7/23/01

to

Tim Josling <t...@melbpc.org.au> wrote in message

> I am looking for the source for an LR(1) compiler, preferably
> written in C, and with open source licencing. Google cannot help.

Hi,

See the COCOM tool set (Russian Armoury) Vladimir Makarov,
vmak...@users.sourceforge.net

http://sourceforge.net/projects/cocom/

The latest version (0.993) has an Earley parser.

Extracted form: http://cocom.sourceforge.net/index.html

4. MSTA (syntax description translator)

The MSTA can emulate YACC (Posix standard or System V Yacc). The MSTA
have the following additional features:

Fast LR(k) and LALR(k) grammars (with possibility resolution of
conflicts). Look ahead of only necessary depth (not necessary given
k). Originally LALR(k) parsers are generated by modified fast
DeRemer's algorithm. Parsers generated by MSTA are up to 50% faster
than ones generated by BISON and BYACC but usually have bigger size.
Extended Backus-Naur Form (EBNF), and constructions for more
convenient description of the scanners. More convenient naming
attributes. Optimizations (extracting LALR- and regular parts of
grammars and implementing parsing them by adequate methods) which
permit to use MSTA for generation of effective lexical analyzers. As
consequence MSTA permits to describe easily (by CFG) scanners which
can not be described by regular expressions (i.e. nested comments).
More safe error recovery and reporting (besides error recovery method
of YACC). Fast generation of fast parsers.

---
Marc

t...@cs.ucr.edu

unread,

Jul 30, 2001, 1:20:40 AM7/30/01

to

Tim Josling <t...@melbpc.org.au> wrote:
: Mike,

: Yes I am getting R/R conflicts, or as the bison manual calls them
: 'mysterious reduce reduce conflicts'. I then have to try and work
: out what is causing them and add some extra bits to the grammar.
: Normally this involves flattening out the grammar and duplicating
: large slabs of it. Alternatively I have to accept some generic
: common grammar and hard code checking to make sure they did not
: use the wrong constructs in the wrong places.

Have you tried to resolve these conflicts by hand by giving priorities
to the rules?

: I am pretty confident the conflicts are spurious, and that LR(1)
: will fix the problem.

Are you sure that your grammar isn't ambiguous?

: If all else fails I will write an LR(1) patch for bison and

: report the results here. Unfortunately bison is very tersely
: documented.

On the order of 20 years ago, I saw a paper that gave an algorithm
that producted LR(1) tables that were not significantly larger than
LALR tables. As I recall, the algorithm split LALR states only when
necessary to avoid R/R conflicts.

: According to my benchmarks the parse is only 5% of gcc

: compilation time, so even a 40% increase would not be a big deal
: especially given that I would be spending less time scratching my
: head about mysterious conflicts and more time developing my
: compiler. The dragon book also mentions the 5% figure.

The issue isn't parse time, but rather table size and table-generation
time. But you're right: the definition of "big" doubles at Moore's
rate and there have been many doubling periods since the Dragon book
was written.

Tom Payne

Sönke Kannapinn

unread,

Aug 2, 2001, 2:31:00 AM8/2/01

to

> On the order of 20 years ago, I saw a paper that gave an algorithm
> that producted LR(1) tables that were not significantly larger than
> LALR tables. As I recall, the algorithm split LALR states only when
> necessary to avoid R/R conflicts.

Pager, D. (1977). A practical general method for constructing LR(k) parsers.
Acta Informatica 7, S. 249-268.

Tim Josling

unread,

Aug 6, 2001, 4:02:52 AM8/6/01

to

I have installed MSTA parser generator. It works fine with no bugs
found to date, with the exception that it does not have an implicit $$
= $1 rule, so I had to go through the grammar and hand code them
all. I can now try turning LR on and simplifying my grammar.

MSTA also has some good constructs for opional elements, repeated
elements. In my current grammar I have a lot of things lie this

X+opt:
/* Nothing. */
|X {
$$ = $1;
};

and this:

X_rep:
X { $$ = $1;}
|X_rep X { ... };

He even has a construct for comma-separated lists.

So I should be able to reduce the size of the source code of the grammar
by 35% or so.

I am seeing about a 3% slow down in both build time and time to run my
test suite vs using bison.

I will report back how I go.

Tim Josling

t...@cs.ucr.edu wrote:

> Tim Josling <t...@melbpc.org.au> wrote:
> : Mike,
>
>

> Have you tried to resolve these conflicts by hand by giving priorities
> to the rules?

No. Given the huge size of the COBOL grammar, working out how the problem
arises, and what the two rules are that conflict is the main problem. It
is very difficult to be sure that hand coding priorities will have the
desired effect,

As an example of the size of the COBOL grammar, Tiny COBOL just blew the
32k limit for table size.

>
>
> : I am pretty confident the conflicts are spurious, and that LR(1)
> : will fix the problem.
>
> Are you sure that your grammar isn't ambiguous?

Yes I am.