
LL(1) vs LALR(1) parsers


Saileshwar Krishnamurthy

Nov 9, 1995
Odd Arild Olsen <oao...@login.eunet.no> writes:

Odd> grammars, but many languages can be described by LL(1)
Odd> grammars. If LL(1) can yield faster, smaller and more
Odd> understandable parsers with better error handling for
Odd> many languages, why isn't this method more elaborated?

As I understand it:

A standard (easily understandable) grammar for a typical programming
language often isn't LL(1). Normally, after eliminating left recursion,
left-factoring, and applying other transformations, the grammar can be
converted into one that is LL(1). However, these transformations make
the new grammar pretty difficult to understand. So while the parser may
be reasonably easy to understand, it becomes difficult to use that
grammar and embed semantic routines in it, because the original version
will probably be the one that is most intuitive. Most standard grammars,
by contrast, will not need conversion to make them LR.
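
[Editorial illustration, not part of the original post: a minimal C sketch
of the transformation being described, using an invented toy subtraction
grammar. It shows why the semantic action (left-associativity) ends up in
the helper rule of the transformed grammar rather than where the original
left-recursive rule would suggest.]

/* Original (intuitive) rule -- not LL(1), because it is left-recursive:
 *     expr : expr '-' term | term ;
 * After eliminating left recursion:
 *     expr      : term expr_rest ;
 *     expr_rest : '-' term expr_rest | (empty) ;
 * The code mirrors the transformed grammar, so the action that makes
 * '-' left-associative lives in expr_rest(), not in expr().
 */
#include <stdio.h>

static int lookahead;                         /* current input character */
static void next_token(void) { lookahead = getchar(); }

static int term(void)                         /* term : single digit (toy) */
{
    int v = lookahead - '0';
    next_token();
    return v;
}

static int expr_rest(int left)                /* expr_rest : '-' term expr_rest | empty */
{
    while (lookahead == '-') {
        next_token();
        left = left - term();                 /* left-associative action */
    }
    return left;
}

static int expr(void)                         /* expr : term expr_rest */
{
    return expr_rest(term());
}

int main(void)
{
    next_token();
    printf("%d\n", expr());                   /* input "9-3-2" prints 4, i.e. (9-3)-2 */
    return 0;
}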

Also, parser-controlled semantic stacks mesh excellently with LR
parsers and are pretty much a pain for LL parsers (in an LL parser,
the semantic stack cannot grow in parallel with the parse stack).

Odd> the various Small-C compilers were recursive
Odd> descending. Which common languages can not be parsed by
Odd> LL(1) grammars?

They can perhaps be parsed, but the final LL(1) grammar that
corresponds to the language will probably not be so intuitively
obvious. Even for a toy grammar, I find it difficult to convince
myself the grammar is still right after I eliminate left recursion!
--
Send compilers articles to comp...@iecc.com,
meta-mail to compiler...@iecc.com.

Sebastian Schmidt

Nov 10, 1995
Odd Arild Olsen <oao...@login.eunet.no> writes:

[LALR allows for more complex grammars but LL(1) can yield faster,
smaller and more understandable parsers with better error handling.]

[There is an LL(1) grammar for C (cgram-ll1)]

In most cases grammars are not only used for parsing but also to build
(at least conceptually) an abstract syntax tree as input for the
semantic analysis step. After performing the transformations required
to make a grammar LL(1), it is very hard to use it to represent
semantic properties of the language.

I didn't take a look at it, but I guess this is the case for
'cgram-ll1' too.

regards
--
e-mail: s...@iaxp01.inf.uni-jena.de
phone: (+49) 3641 631398
fax: (+49) 3641 631400

Scott Stanchfield

Nov 10, 1995
LL(1) becomes even more desirable when you add syntactic and semantic
predicates, as the Purdue Compiler Construction Tool Set (PCCTS) does.
The grammar becomes more readable (you don't need to have several obscure
extra rules or evil communication between the lexer and parser.)
I agree with the LL(1) arguments of "Odd Arild," especially when it comes
to debugging. PCCTS' parser generator, antlr, generates a recursive-descent
parser that is very readable. Each rule is a function, which makes it
easy to stop anywhere you want, and when walking through your parser, you
don't have to step over obscure yacc-generated code.
The grammar is actually fairly easy to construct, as antlr supports
EBNF. (You still need to do all the fun left-factoring necessary in LL --
but it's not as bad, as you can have subrules.)

For more information, follow comp.compilers.tools.pccts. The tool is
available from ftp://ftp.parr-research.com/pub/pccts. Check it out!

-- Scott

Terence John Parr

Nov 14, 1995
Saileshwar Krishnamurthy (kr...@cs.purdue.edu) wrote:
: They can be parsed perhaps, but the final LL(1) grammar that
: corresponds to the language will probably not be so intuitively
: obvious.

Just thought I'd throw in my 2 cents on LL(1). If I'm building
a recognizer only, LALR(1) clearly wins. The decision is not
so clear when you talk about the debugging, adding actions, etc...
LL(k>1) is actually a nice solution [LALR(k>1) would be nice too].
The syntactic predicates of PCCTS allow arbitrary lookahead
through selective backtracking. Kinda nice. The famous C++
expressions vs declaration ambiguity can be solved via:

stat : (decl)?    // if it looks like a declaration, it is
     | expr       // else it's an expression.
     ;

Note that we are smart about this: "a=3;" immediately jumps to
the expr alternative as it's obviously not a declaration.

Please see my upcoming paper with Russell Quong in SIGPLAN Notices
(Jan or Feb 96) called: ``LL and LR Translators Need k>1 Lookahead''.
PS can be had of it now at ftp://ftp.parr-research.com/pub/papers/needlook.ps

Best regards,
Terence Parr
http://www.parr-research.com/~parrt/

steve (s.s.) simmons

Nov 15, 1995
> Why do compiler design text book authors only describe recursive parsers
> as a step stone on their way to bottom up parsers? I know that LALR allows
> for more complex grammars, but many languages can be described by LL(1)
> grammars. If LL(1) can yield faster, smaller and more understandable
> parsers with better error handling for many languages, why isn't this
> method more elaborated? (Holub, as an example, presents full grammar and
> compiler listings for a bottom-up C-compiler, not for top-down).

Intellectual bigotry!!!!!

That is, the automated parsers use a great deal of automata theory,
which helps build on the computer science foundation. Recursive descent
parsers are a small matter of programming (SMOP). I do notice among
peers in industry that people are no longer snubbing the idea of writing
a recursive parser when it is appropriate.

Remember you don't take a compiler course to learn how to build a
compiler, surprise... Most people never write a compiler. You take
it for the following reasons:

- Improving your comp. sci. background to understand automata theory
  with a very good application.
- Understanding what a compiler may do (or not do) for your code.
- Learning about big system software, data structures, etc.

Thank you.

Steve Simmons
[I agree that few CS students will write a conventional compiler, but we all
end up having to decode some sort of input language in an application. -John]

Terence John Parr

Nov 20, 1995
[This is a corrected version of <95-1...@comp.compilers> -John]

Saileshwar Krishnamurthy (kr...@cs.purdue.edu) wrote:
: They can be parsed perhaps, but the final LL(1) grammar that
: corresponds to the language will probably not be so intuitively
: obvious.

Just thought I'd throw in my 2 cents on LL(1). If I'm building
a recognizer only, LALR(1) clearly wins. The decision is not
so clear when you talk about the debugging, adding actions, etc...
LL(k>1) is actually a nice solution [LALR(k>1) would be nice too].
The syntactic predicates of PCCTS allow arbitrary lookahead
through selective backtracking. Kinda nice. The famous C++
expressions vs declaration ambiguity can be solved via:

stat : (decl)?    // if it looks like a declaration, it is
     | expr       // else it's an expression.
     ;

Note that we are smart about this: "a=3;" immediately jumps to
the expr alternative as it's obviously not a declaration.

Please see my upcoming paper with Russell Quong in SIGPLAN Notices
(Jan or Feb 96) called: ``LL and LR Translators Need k>1 Lookahead''.
PS can be had of it now at

ftp://ftp.parr-research.com/pub/pccts/papers/needlook.ps

Best regards,
Terence Parr
http://www.parr-research.com/~parrt/

Bill Leonard

Nov 22, 1995
"steve (s.s.) simmons" <sim...@bnr.ca> writes:
> Intellectual bigotry!!!!!
>
> That is, the automated parsers use a great deal of automata theory
> which helps build on the computer science foundation. Recursive descent
> parsers are a small matter of programming (SMOP). I do notice among
> peers in industry that people are no longer snubbing the idea of writing
> a recursive parser when it is appropriate.

I've noticed that, too, and I really wonder why. At the same time that
we're trying desperately to develop tools that do the programming for you
in other areas of software development, and we already have tools (e.g.,
yacc) that can program a parser for you, why are we regressing?

I don't think it is intellectual bigotry (to use automated parsers vs.
hand-crafted ones), I think it's just common sense. We have automated
means of generating parsers, why not use them?

If I were in charge of a compiler project (and I have been), I'd much
rather devote engineering resources to making the generated code better or
the compiler faster or any of a number of other improvements than in coding
a parser.

For certain compilers, there may be good reasons not to use an automated
parser, but in general I'd recommend using a parser generator. Why
reinvent the wheel?

Mr. Simmons says recursive descent parsers are a "small matter of
programming". How small? If it takes more than a couple of hours to
write, it's too much, because it will also take up time later on for others
to learn and fix. All projects have a fixed amount of programmer
resources, so any time spent on a parser is that much less spent elsewhere.

In my opinion, parser generators have (at least) the following advantages
over hand-crafted parsers:

1. It is easier to find the semantic action code -- it isn't buried
amongst the parsing. This makes for easier maintenance.

2. It is easier to change the language. Even standardized languages
change occasionally, but more importantly you may find that your
interpretation of the language is incorrect. If you're dealing with a
language like Fortran, you'll also find that you're constantly asked
to add extensions to the language. (I suspect C++ will also
experience this phenomenon for years after the language is
standardized.)

I've heard many people say that hand-crafted parsers do better at error
recovery. That may be true if you're comparing to yacc, but there are
other parser generators that do better at error recovery, many of which
have features for tuning the error recovery. Furthermore, hand-crafted
parsers only do better if you spend a considerable amount of time tweaking
them. If you spend a comparable amount of time tuning your generated parser,
I bet you'd get pretty close to the same quality.

Besides that, I've seen cases where a project chose a hand-crafted parser
because the error recovery would allegedly be better, only to end up
spending so much time just getting the parser right that there was no time
to implement good error recovery! Conversely, you can spend so much time
on the error recovery that the generated code stinks!

> Remember you don't take a compiler course to learn how to build a
> compiler, surprise... Most people never write a compiler. You take
> it for the following reasons:
>
> - Improving your comp. sci. background to understand
> automata theory with a very good application.

Many curricula separate automata theory from compilers, and in those cases
the compiler course assumes you already know automata theory.

> - Understanding what a compiler may do (or not do) for
> your code.
> - Learning about big system software, data structures, etc.

Those are all good reasons, but they're not the most important to us when
considering hiring a compiler engineer. As a general rule, we don't hire
college grads in the compiler group if they haven't had at least one course
in compilers.

--
Bill Leonard
Harris Computer Systems Corporation
2101 W. Cypress Creek Road
Fort Lauderdale, FL 33309
Bill.L...@mail.hcsc.com

elli...@logica.com

Nov 24, 1995
Saileshwar Krishnamurthy <kr...@cs.purdue.edu> wrote:

>However most standard grammars will not need conversion to make them
>LR.

LR parsers have many other advantages over LL(1) parsers, apart from
the fact that they don't usually require conversion.

For programming language parsing, the most important factor is
probably that an LR parser can recognise a syntax error in the input
stream as soon as it is possible to do so. This makes error message
output *much* easier.

Charles
---
elli...@logica.com

Jeff Jackson

Nov 28, 1995
>> That is, the automated parsers use a great deal of automata theory
>> which helps build on the computer science foundation. Recursive descent
>> parsers are a small matter of programming (SMOP). I do notice among
>> peers in industry that people are no longer snubbing the idea of writing
>> a recursive parser when it is appropriate.
>
> I've noticed that, too, and I really wonder why.

I think one reason is that people parse with recursive descent with
backtracking. It just seems more natural (though that doesn't necessarily
mean it's better, for all values of good :-) to parse in recursive
descent. When reading convoluted yacc-like grammars, I always feel
like I'm twisting my brain inside out, because it's doing everything
backwards from the way I read and write code.

On the other hand, I find the idea of hand-coding a recursive descent
parser, or for that matter any automaton less trivial than an integer
parser, quite repulsive. Recursive descent with backtracking can be
generated from a grammar, and I find such grammars far easier to read
and understand.
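
[Editorial illustration, not from the post: a minimal C sketch of recursive
descent with backtracking, with invented names and toy token handling. The
idea is simply to remember the input position, try one alternative, and
restore the position if it fails.]

#include <stdio.h>
#include <stddef.h>
#include <string.h>

static const char *input;                /* text being parsed      */
static size_t pos;                       /* current input position */

static int match_word(const char *w)     /* consume w if it is next */
{
    size_t n = strlen(w);
    if (strncmp(input + pos, w, n) == 0) { pos += n; return 1; }
    return 0;
}

/* decl : "int" ident ";"   (toy declaration alternative) */
static int decl(void)
{
    size_t save = pos;                   /* remember the starting position   */
    if (match_word("int ") && match_word("x") && match_word(";"))
        return 1;
    pos = save;                          /* backtrack: undo partial progress */
    return 0;
}

/* expr : ident "=" digit ";"   (toy expression alternative) */
static int expr(void)
{
    size_t save = pos;
    if (match_word("x") && match_word("=") &&
        input[pos] >= '0' && input[pos] <= '9' && (pos++, match_word(";")))
        return 1;
    pos = save;
    return 0;
}

/* stat : decl | expr   (try a declaration first, else an expression) */
static int stat(void)
{
    return decl() || expr();
}

int main(void)
{
    input = "x=3;"; pos = 0;
    printf("%s\n", stat() ? "parsed" : "syntax error");   /* prints "parsed" */
    return 0;
}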

On the other hand (all good programmers have three hands, right?), I'm
under the impression, though possibly mistaken as I haven't done a
literature search on the topic, that most theoretical work on error
recovery has been done on LR type parsers.

>> Remember you don't take a compiler course to learn how to build a
>> compiler, surprise... Most people never write a compiler. You take
>> it for the following reasons:
>>
>> - Improving your comp. sci. background to understand
>> automata theory with a very good application.
>
> Many curricula separate automata theory from compilers, and in those cases
> the compiler course assumes you already know automata theory.
>
>> - Understanding what a compiler may do (or not do) for
>> your code.
>> - Learning about big system software, data structures, etc.

Another purpose for compiler classes is that not all programs can get
their input data by having the user press an idiot button with a mouse.
Actually parsing, or at least lexically scanning, keyboard input is
done by many programs.
--
Jeffrey Glen Jackson
j...@ssd.csd.harris.com

William D Clinger

Nov 28, 1995
elli...@logica.com writes:
>LR parsers have many other advantages over LL(1) parsers....
>.... an LR parser can recognise a syntax error in the input
>stream as soon as it is possible to do so....

LL(k) parsers also have this ``viable prefix'' property.

By the way, I use a parser generator to generate recursive descent
LL(1) parsers. The parsers generated by my parser generator have
an even stronger property that Fischer and LeBlanc call the
``immediate error-detection'' property. That is, the parser
not only detects the error after reading the shortest possible
prefix of the input, it also detects the error before making any
internal state transitions. This allows for the best possible
error reporting and recovery.

To take advantage of this requires a lot of hand tuning, as
previously noted. But the parser generator guarantees that
you're starting with an LL(1) parser that is known to be correct
and is known to have the immediate error-detection property.
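
[Editorial illustration, not Clinger's generator: a minimal hand-written C
sketch of the immediate error-detection property. The lookahead is checked
against the rule's FIRST set before anything is consumed or any other
routine is entered, so the report points at the exact offending token. All
names are invented.]

#include <stdio.h>
#include <stdlib.h>

enum tok { TOK_IF, TOK_WHILE, TOK_IDENT, TOK_SEMI, TOK_EOF };

static enum tok lookahead;                 /* current token    */
static enum tok *stream;                   /* toy token stream */
static void next_token(void) { lookahead = *stream++; }

static void if_statement(void)    { next_token(); /* body elided */ }
static void while_statement(void) { next_token(); /* body elided */ }
static void assignment(void)      { next_token(); /* body elided */ }

/* statement : if_statement | while_statement | assignment */
static void statement(void)
{
    switch (lookahead) {                   /* check FIRST(statement) first */
    case TOK_IF:    if_statement();    break;
    case TOK_WHILE: while_statement(); break;
    case TOK_IDENT: assignment();      break;
    default:
        /* Immediate error detection: nothing has been consumed and no other
         * routine entered, so the report points at the offending token.    */
        fprintf(stderr, "expected a statement here\n");
        exit(1);
    }
}

int main(void)
{
    enum tok toks[] = { TOK_IDENT, TOK_EOF };
    stream = toks;
    next_token();
    statement();                           /* parses the IDENT "statement" */
    puts("ok");
    return 0;
}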

William D Clinger

Drew Dean

Nov 28, 1995
Bill Leonard <Bill.L...@mail.hcsc.com> wrote:
>If I were in charge of a compiler project (and I have been), I'd much
>rather devote engineering resources to making the generated code better or
>the compiler faster or any of a number of other improvements than in coding
^^^^^^^^^^^^^^^^^^^
>a parser.

Bingo! You've just answered why recursive descent parsing is making a
comeback. I don't have the papers handy, but if you look at Ken
Thompson's paper on the Plan 9 C compiler (New C), he says (now that
the language has stablized) he'd write a recursive descent parser,
because the compiler was spending 35% - 40% of its time in the yacc
generated parser. Niklaus Wirth's Oberon compilers (and single pass
Modula-2, for that matter) are quite fast (on slow hardware, by
contemporary standards) partly because they parse quickly. (We're
talking speeds of 5 KLOC/second if I remember properly. On 1-2 MIPS
CPUs...) You may object that Wirth's compilers are not
highly-optimizing, but non-optimized compile/link times are on just
about every programmer's critical path.

--
Drew Dean
<dd...@cs.princeton.edu> http://www.cs.princeton.edu/~ddean PSTN: 609-258-1797

Mohd Hanafiah Abdullah

Nov 28, 1995
"steve (s.s.) simmons" <sim...@bnr.ca> writes:
> parsers are a small matter of programming (SMOP). I do notice among
> peers in industry that people are no longer snubbing the idea of writing
> a recursive parser when it is appropriate.

Bill Leonard <Bill.L...@mail.hcsc.com> wrote:
>I've noticed that, too, and I really wonder why. At the same time that
>we're trying desperately to develop tools that do the programming for you
>in other areas of software development, and we already have tools (e.g.,
>yacc) that can program a parser for you, why are we regressing?

Let's compare LALR parsers with LL ones:

o Classes of grammars (little edge to LALR)
o Development cycle (edge to LALR)
o Error detection and recovery (even)
o Programmer's satisfaction (little edge to LL)
o Speed of execution (little edge to LL)

So, if speed is top priority I would go with LL parsers.

Napi

p.s. I stand corrected :-)
--
http://www.cs.indiana.edu/hyplan/napi.html

Richard A. O'Keefe

Nov 29, 1995
bi...@amber.ssd.hcsc.com (Bill Leonard) writes:
>I don't think it is intellectual bigotry (to use automated parsers vs.
>hand-crafted ones), I think it's just common sense. We have automated
>means of generating parsers, why not use them?

Because what "we" typically have is yacc.
Yacc was wonderful for a PDP-11. I have encouraged several bright students
to use it in projects, and every time it has taken them a long time to
learn how to use it. Yacc sees subtle connections between parts of your
grammar that the programmer finds hard to detect and which are not relevant
to a recursive descent parser. In three one-semester student projects, a
master's thesis, and some work of my own, it has taken FAR longer to hack a
grammar around to the point where Yacc would accept it than to write a
recursive descent parser. In a couple of cases where I've tried it, the
recursive descent parser has been smaller in terms of number of lines of
source code, there have been no tables, and the parser has been faster.

>If I were in charge of a compiler project (and I have been), I'd much
>rather devote engineering resources to making the generated code better or
>the compiler faster or any of a number of other improvements than in coding
>a parser.

So would I. That's why I've reverted to writing recursive descent parsers.

(Having several parsers in one program is also much easier when you don't
use Yacc.)

>For certain compilers, there may be good reasons not to use an automated
>parser, but in general I'd recommend using a parser generator. Why
>reinvent the wheel?

Because the most widely available wheel is pretty dreadful.

>Mr. Simmons says recursive descent parsers are a "small matter of
>programming". How small? If it takes more than a couple of hours to
>write, it's too much, because it will also take up time later on for others
>to learn and fix.

In practice, this counts as an argument _against_ using a parser generator.
I find that writing a recursive descent parser takes less time than hacking
a once clear grammar to the point where Yacc will accept it, EVEN NOW THAT
I KNOW YACC. It seems to take _bright_ students >10 hours to learn how to
use Yacc in a basic sort of way.

>All projects have a fixed amount of programmer
>resources, so any time spent on a parser is that much less spent elsewhere.

>In my opinion, parser generators have (at least) the following advantages
>over hand-crafted parsers:

> 1. It is easier to find the semantic action code -- it isn't buried
> amongst the parsing. This makes for easier maintenance.

There are three things required:
  - syntax
  - static semantics
  - intermediate structure building
Yacc provides no real support for static semantics, so it is still buried
in executable form amongst the code.

> 2. It is easier to change the language.

But the structure of a recursive descent parser is so very close to being
a grammar that it is not hard to change the language.
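
[Editorial illustration, not from the post: a small C sketch of that
one-to-one correspondence, with an invented rule. Each grammar rule becomes
one function, so a language change such as an optional else-part is a
change in the one obvious place.]

#include <stdio.h>
#include <stdlib.h>

enum tok { TOK_IF, TOK_THEN, TOK_ELSE, TOK_ID, TOK_EOF };

static enum tok lookahead;
static enum tok *stream;                       /* toy token stream */
static void next_token(void) { lookahead = *stream++; }

static void expect(enum tok t, const char *what)
{
    if (lookahead != t) { fprintf(stderr, "expected %s\n", what); exit(1); }
    next_token();
}

static void expr(void) { expect(TOK_ID, "an expression"); }   /* toy stand-ins */
static void stmt(void) { expect(TOK_ID, "a statement"); }

/* if_stmt : "if" expr "then" stmt [ "else" stmt ] */
static void if_stmt(void)
{
    expect(TOK_IF,   "'if'");
    expr();
    expect(TOK_THEN, "'then'");
    stmt();
    if (lookahead == TOK_ELSE) {               /* the optional else-part:    */
        next_token();                          /* adding it to the language  */
        stmt();                                /* was this three-line change */
    }
}

int main(void)
{
    enum tok toks[] = { TOK_IF, TOK_ID, TOK_THEN, TOK_ID,
                        TOK_ELSE, TOK_ID, TOK_EOF };
    stream = toks;
    next_token();
    if_stmt();
    puts("parsed an if-statement");
    return 0;
}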

>I've heard many people say that hand-crafted parsers do better at error
>recovery. That may be true if you're comparing to yacc, but there are
>other parser generators that do better at error recovery, many of which
>have features for tuning the error recovery.

Could I suggest that one of the most valuable things that someone could
do for this newsgroup would be to take 10 parser generators (including
a couple of members of the Yacc family, PCCTS, Cocktail, and maybe a
couple of others) and write a really insightful review. The things to
cover are
- given a bright MSc student who already understands the basic
concepts of context free grammars and static semantic checking,
how long does it take the student to learn the tool well enough
to use it to write a parser for ISO Pascal Extended (to name a
syntax-full language with no _intentional_ gotchas)

- how long does it take to write the first full draft of the grammar
(time the student and also someone reasonably experienced)

- how long does it take to _test_ and debug the grammar (and what
support does the tool provide for testing?)

- how large is the grammar with respect to the original reference?

- how much space does the parser take?

- how fast does the parser run (on a selection of machines, say a
Pentium, a current SPARC, an ALPHA)

- anything else, e.g. does the parser generator type check attributes?
(Yacc doesn't.)

For what it's worth, I've started writing the C grammar as a Prolog program.
The program is a Definite Clause Grammar. At the same time it is a
recursive descent parser or a left corner parser, whichever I want.
This covers
  - syntax
  - static semantics (symbol table, type checking)
  - building intermediate structure
It is not restricted to LALR(1) grammars. It can handle the
"object identifier / type identifier" ambiguities in C naturally.
It can even be automatically turned into a program that generates
test data (there are Yacc-like tools for that, but this is the _same_
grammar being used to generate test cases as for parsing). With a few
extra annotations, it should be a legal Mercury program (I won't be sure
until I've finished), and Mercury generates very good code. It will be
interesting to see how well the program performs.

>Furthermore, hand-crafted parsers only do better if you spend a
>considerable amount of time tweaking it.

This is a blanket assertion requiring more than just the author's
say-so. The authors of GNAT would I think disagree.


There are so many parser generators out there that I haven't the resources
to evaluate them all. I have seen some very good ones in the literature.
But what most people _get_ is some sort of Yacc, which is not wonderful.

I did look at <package X>, but when I ran the code through Lint and
strict ANSI checking I got so many complaints that I decided not to
bother any more. A pity, because it had some nice theory behind it.


--
Richard A. O'Keefe; http://www.cs.rmit.edu.au/~ok; RMIT Comp.Sci.

Michael Parkes

Nov 29, 1995
>For programming language parsing, the most important factor is
>probably that an LR parser can recognise a syntax error in the input
>stream as soon as it is possible to do so. This makes error message
>output *much* easier.

I am afraid I must strongly disagree for the following reasons:

1. As you say, this only applies to syntax errors. For semantic
errors the opposite applies. An LL parser with attributes
can find the error as soon as possible and parse a larger
grammar, because the parse can be controlled by the attributes.

2. Error messages produced by LR parsers usually do not know the
full error context and hence are often inexact.

3. LL grammars are generally easier to develop, test and
maintain. A number of tools provide automatic grammar verification,
automatic error detection and correction, etc. This is made a
lot easier due to the nature of LL grammars.

What I am trying to say, PRACTICALLY (based on use of both types), is that
LL gives a better all-round result.

Regards

Mike

PS. Don't bother to flame me unless you have worked on at least
3 totally different compilers and have first hand experience of this
area.

Mike McCarty

Nov 29, 1995
Bill Leonard <Bill.L...@mail.hcsc.com> wrote:

First, I really don't want to disagree with your position so much as
perhaps temper your willingness to stand quite so far over.

)"steve (s.s.) simmons" <sim...@bnr.ca> writes:
)> Intellectual bigotry!!!!!
)>
)> That is, the automated parsers use a great deal of automata theory
)> which helps build on the computer science foundation. Recursive descent
)> parsers are a small matter of programming (SMOP). I do notice among
)> peers in industry that people are no longer snubbing the idea of writing
)> a recursive parser when it is appropriate.
)
)I've noticed that, too, and I really wonder why. At the same time that
)we're trying desperately to develop tools that do the programming for you
)in other areas of software development, and we already have tools (e.g.,
)yacc) that can program a parser for you, why are we regressing?

I'm not so sure that you should call this "regressing".

First, programs have to be maintained. I work in industry
(telecommunications), and find that most of the "programmers" are EEs.
They don't understand LEX and YACC. They usually do understand
recursive descent. I ripped out a shift reduce parser from our debugger
when I took it over. It had several defects in it because it had been
maintained by several programmers who really didn't completely
understand how it worked. It also had no precedence built in. I
replaced it with a recursive descent parser which has worked perfectly
since, although being maintained by several programmers since. It was
also smaller in actual code size.

Second, there =are= some automated tools around which generate recursive
descent parsers. I like (for myself) using tools which are appropriate
for the job. Bottom-up is not inherently better than top-down. (If we
are talking about technique.)

Third, hand-crafted parsers have been faster (in my experience).
LEX especially (I know, not a parser) generates slower, bigger output
than a hand-crafted lexical scanner, and I can usually turn out a
lexical scanner in about the same amount of time doing it either way.
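
[Editorial illustration, not from the post: a minimal sketch of the kind of
hand-rolled lexical scanner being described - a switch on the current
character plus a couple of tight loops, no generated tables. Token names
are invented.]

#include <ctype.h>
#include <stdio.h>

enum tok { T_NUM, T_IDENT, T_PLUS, T_EOF, T_ERROR };

/* Scan one token from *p, advancing it past what was consumed. */
static enum tok scan(const char **p)
{
    while (isspace((unsigned char)**p)) (*p)++;      /* skip blanks */

    if (**p == '\0') return T_EOF;
    if (**p == '+') { (*p)++; return T_PLUS; }

    if (isdigit((unsigned char)**p)) {               /* number: [0-9]+ */
        while (isdigit((unsigned char)**p)) (*p)++;
        return T_NUM;
    }
    if (isalpha((unsigned char)**p)) {               /* identifier: [A-Za-z]+ */
        while (isalpha((unsigned char)**p)) (*p)++;
        return T_IDENT;
    }
    (*p)++;
    return T_ERROR;
}

int main(void)
{
    const char *src = "rate + 42";
    enum tok t;
    while ((t = scan(&src)) != T_EOF)
        printf("token %d\n", t);                     /* T_IDENT, T_PLUS, T_NUM */
    return 0;
}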

)I don't think it is intellectual bigotry (to use automated parsers vs.
)hand-crafted ones), I think it's just common sense. We have automated
)means of generating parsers, why not use them?

Sure, when appropriate. But I work at a company where it is not our
"job" to write parsers. We have experts in telecomm, not parsing. But we
do need special debuggers etc. with command interfaces. These things
need to be maintained by non-parsing-experts.

)If I were in charge of a compiler project (and I have been), I'd much
)rather devote engineering resources to making the generated code better or
)the compiler faster or any of a number of other improvements than in coding
)a parser.

I've been there, too. And I agree with you on this point. But I have
seen hand-generated recursive descent parsers outdo YACC for speed. I
have also seen them outdo precedence driven shift/reduce type expression
analyzers.

)For certain compilers, there may be good reasons not to use an automated
)parser, but in general I'd recommend using a parser generator. Why
)reinvent the wheel?
)
)Mr. Simmons says recursive descent parsers are a "small matter of
)programming". How small? If it takes more than a couple of hours to
)write, it's too much, because it will also take up time later on for others
)to learn and fix. All projects have a fixed amount of programmer
)resources, so any time spent on a parser is that much less spent elsewhere.

This particular argument cuts both ways. It takes time for others to
learn, understand, and fix YACC input. And if it's not "what you do" to
parse, then you will not have parsing experts around to understand what
the parser is and how it does its work.

)In my opinion, parser generators have (at least) the following advantages
)over hand-crafted parsers:
)
) 1. It is easier to find the semantic action code -- it isn't buried
) amongst the parsing. This makes for easier maintenance.

I'm afraid you lost me there. I find YACC style input definitely and
distinctly difficult to read, as it intermixes parsing and semantic
actions. Usually, if the semantic action is more than a line or two, I
just put it in a routine with a name like "do_xxx". I think that we
don't have -any- tools right now which sufficiently separate the
syntactic parts of language processing from the semantic part. I wish
we did.

) 2. It is easier to change the language. Even standardized languages
) change occasionally, but more importantly you may find that your
) interpretation of the language is incorrect. If you're dealing with a
) language like Fortran, you'll also find that you're constantly asked
) to add extensions to the language. (I suspect C++ will also
) experience this phenomenon for years after the language is
) standardized.)

By whom? (is it easier to change)

)I've heard many people say that hand-crafted parsers do better at error
)recovery. That may be true if you're comparing to yacc, but there are
)other parser generators that do better at error recovery, many of which
)have features for tuning the error recovery. Furthermore, hand-crafted
)parsers only do better if you spend a considerable amount of time tweaking
)it. If you spend a comparable amount of time tuning your generated parser,
)I bet you'd get pretty close to the same quality.

Error recovery is always difficult. I like the exception handling of
ADA for this kind of thing. But I agree that it is difficult, and I
don't think that the tools we have today are really adequate, not any
of them.

)Besides that, I've seen cases where a project chose a hand-crafted parser
)because the error recovery would allegedly be better, only to end up
)spending so much time just getting the parser right that there was no time
)to implement good error recovery! Conversely, you can spend so much time
)on the error recovery that the generated code stinks!
)
)> Remember you don't take a compiler course to learn how to build a
)> compiler, surprise... Most people never write a compiler. You take
)> it for the following reasons:
)>
)> - Improving your comp. sci. background to understand
)> automata theory with a very good application.
)
)Many curricula separate automata theory from compilers, and in those cases
)the compiler course assumes you already know automata theory.

I have believed for many years that the bachelor's-level degree only
confers two things:

  - vocabulary
  - a certain level of mechanical expertise in handling that vocabulary

Graduate-level degrees only confer two more things:

  - facility in researching
  - ability to explore

Real understanding comes from:

  - doing real work
  - teaching others how to do real work

I learned more calculus from my students when I was a graduate student
than I did from my professor when I took it.

Mike

David Keppel

Nov 29, 1995
>>[Why recursive descent parsing?]

dd...@cs.princeton.edu (Drew Dean) writes:
>[Because it's faster.]

%A Thomas J. Pennello
%T Very Fast LR Parsing
%J Proceedings of the SIGPLAN 1986 Symposium on Compiler Construction;
SIGPLAN Notices
%V 21
%N 7
%D July 1986
%P 145-151
%X * Partial evaluation of the table interpreter with respect to each
element of the table (though not described as such).
* On a VAX-like machine, 40,000 to 500,000 lines per minute. On an
80286, 37,000 to 240,000 lines per minute.
* FSM converted to assembly language, 2-4X increase in table size.

;-D on ( Parsetial Evaluation ) Pardo

Pat Terry

Nov 30, 1995
Richard O'Keefe writes (ok.cs.rmit.edu.au)

> Yacc was wonderful for a PDP-11. I have encouraged several bright students
> to use it in projects, and every time it has taken them a long time to
> learn how to use it.

[snip]

> That's why I've reverted to writing recursive descent parsers.

[snip]

> I KNOW YACC. It seems to take _bright_ students >10 hours to learn how to
> use Yacc in a basic sort of way.

[snip - call for someone to evaluate various generators]

We have found that students can get up and running with Coco/R in a very short
time. It's definitely worth a look if you're teaching beginners. And you
can get C, Oberon, Pascal or Modula versions.


Pat Terry, Computer Science, Rhodes University, GRAHAMSTOWN 6140, RSA
cs...@cs.ru.ac.za or cs...@giraffe.ru.ac.za or pdt...@psg.com
Voice +27-461-318291 or +27-461-318292 FAX +27-461-25049

Patrick Bridges

Dec 1, 1995
David Keppel writes:

>%A Thomas J. Pennello
>%T Very Fast LR Parsing
>%J Proceedings of the SIGPLAN 1986 Symposium on Compiler Construction;
>SIGPLAN Notices
>%V 21
>%N 7
>%D July 1986
>%P 145-151
>%X * Partial evaluation of the table interpreter with respect to each
>element of the table (though not described as such).
> * On a VAX-like machine, 40,000 to 500,000 lines per minute. On an
>80286, 37,000 to 240,000 lines per minute.
> * FSM converted to assembly language, 2-4X increase in table size.

> ;-D on ( Parsetial Evaluation ) Pardo

Note that Pennello's LR parser did not actually include *actions* or
error recovery... Someone I know actually implemented Pennello's scheme
inside of Bison, and this sped up the generated parser by 300% to 600%
on a cut-down C grammar (used in a local compiler class), on
pre-tokenized input. On the other hand, the generated parsers were
enormous, since Pennello's scheme encodes most of the parsing tables in
instruction state.
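
[Editorial illustration, not Pennello's code: a tiny hand-written sketch of
what "encoding the tables in instruction state" means. Each LR state becomes
a block of straight-line code that branches on the lookahead instead of
indexing action/goto tables. The grammar is invented and chosen so that
every reduction exposes state 0; a real direct-coded LR parser also keeps
an explicit state stack for reductions.]

/* Direct-coded SLR recognizer for the toy grammar
 *     E -> E '+' n | n
 * Both reductions leave state 0 underneath, whose goto on E is state 1,
 * so no explicit state stack is needed in this tiny example.
 */
#include <stdio.h>

static int parse(const char *s)
{
state0:                          /* expecting the start of an E       */
    if (*s == 'n') { s++; goto state2; }
    return 0;                    /* error                             */
state2:                          /* E -> n .      reduce, goto(0,E)=1 */
    goto state1;
state1:                          /* have an E; expecting '+' or end   */
    if (*s == '+') { s++; goto state3; }
    if (*s == '\0') return 1;    /* accept                            */
    return 0;
state3:                          /* E -> E '+' . n                    */
    if (*s == 'n') { s++; goto state4; }
    return 0;
state4:                          /* E -> E '+' n .   reduce, back to 1 */
    goto state1;
}

int main(void)
{
    printf("%d\n", parse("n+n+n"));   /* 1 = accepted */
    printf("%d\n", parse("n++n"));    /* 0 = rejected */
    return 0;
}
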
--
*** Patrick G. Bridges bri...@cs.arizona.edu ***

Gord Cormack

Dec 1, 1995
> From: pa...@cs.washington.edu (David Keppel)
>
> >>[Why recursive descent parsing?]
>
> dd...@cs.princeton.edu (Drew Dean) writes:
> >[Because it's faster.]
>
> %A Thomas J. Pennello
> %T Very Fast LR Parsing
> %J Proceedings of the SIGPLAN 1986 Symposium on Compiler Construction;
> SIGPLAN Notices


@Article{horspool:90a,
author = "R. N. Horspool and M. Whitney",
title = "Even faster {LR} parsing",
journal = SPE,
volume = "20",
number = "6",
pages = "515--535",
month = June,
year = "1990",
}

--
Gordon V. Cormack CS Dept, University of Waterloo, Canada N2L 3G1
gvco...@uwaterloo.ca http://cormack.uwaterloo.ca/~gvcormac

David Keppel

Dec 9, 1995
>pa...@cs.washington.edu writes:
>>[Pennello, "Very Fast LR Parsing", CCC'86, pp. 145-151]

bri...@cs.arizona.edu (Patrick Bridges) writes:
>[Someone implemented Pennello's scheme inside of Bison. Parsing was
> 3x-6x faster but the parsers were enormous because the tables were
> encoded as instructions.]

Fraser and Henry observed much the same in their parsing code generator.
They used a variety of table compaction techniques in order to reduce
the table size. For example, identifying similar state sets and then
merging them, with the non-matching entries replaced with a test. From
the technical report:

"Table compaction is vital: for the VAX, the uncompacted data
for _each_ binary operator would take over 2MB. Even with
compaction, a typical BURS code generator for the VAX takes
over 43KB, and further reductions are generally regarded as
deisrable. Compaction complicates table interpretation: what
is logically a simple three-deimensional array access ends up
taking about 40 VAX instructions.

This paper shows that it is better to represent BURS tables
with a combination of hard code and data. ... Predictably, the
hard-coded code generator is faster. A typical compilation
with interpreted tables took 49 VAX instructions/node for
labelling and 188 more to identify the code to output. With
hard code, these dropped to 13 and 35.

Less predictably, hard-coding saves space. The smallest
hard-coded BURS code generator for the VAX takes 21.4KB, less
than half of its interpreted predecessor. Hard-coding exposed
important opportunities for compression that were previously
hidden in the tables, and it allowed tailoring of encodings
for the values at hand."

The technical report ends with

"These tricks apply to more than just BURS tables, and the hard
code generated to support them resembles that generated for
very different compiler tables [Pennello 86, ...]. It is thus
perhaps time for a general-purpose table encoder, which could
accept tables and automatically perform some of the
transformations above."

The references I have are:

%A Christopher W. Fraser
%A Robert R. Henry
%T Hard-Coding Bottom-Up Code Generation Tables to Save Time and Space
%J Software \- Practice and Experience
%V 21
%N 1
%D January 1991
%P 1-12

%A Christopher W. Fraser
%A Robert R. Henry
%T Hard-Coding Bottom-Up Code Generation Tables to Save Time and Space
%R 90-01-03
%I University of Washington, Department of Computer Science and
Engineering
%C Seattle, Washington
%W tr-re...@cs.washington.edu
%D January 1990

All very interesting stuff.

;-D on ( Table-driven research ) Pardo

M.M._van_der_Laan

Dec 9, 1995
I want to add two ideas to this discussion:

About quality of error recovery:
Most errors are NOT detected in the parser, but afterwards. Errors
detected by the parser are of the kind "missing something", and in
those cases there is no need for clever recovery - the program is
simply wrong, so just skip the statement. Therefore, in my
opinion error recovery is no argument for selection of the
right type of grammar.

For those in favor of handwritten parsers: I also like them
because they are so easy to understand and fast. But first
checking a grammar with yacc and then retyping it as code is
double work. Wouldn't everyone be happy with a simple generator
that checks the unattributed grammar and then generates a
recursive-descent parser that you can fill in by hand? Does
this exist?

Mauk van der Laan

Michael Parkes

Dec 9, 1995
na...@ms.mimos.my says...

>Let's compare LALR parsers with LL ones:
>
>o Error detection and recovery (even)

What is this? I have used both LL- and LALR-based tools, and this
experience has convinced me never to use an LALR tool again. The
trouble with LALR tools is simple: an LALR parser has no idea about
the context of the syntactic structure. This leads to two massive
problems. (1) Writing an LALR parser needs to be done with a
bottom-up thought process, which is unnatural (well, it is for me
anyway). (2) The error detection has no idea of the context, so it
can't select a natural recovery point, and the error messages produced
are consequently usually poor.

In summary, LL error recovery is vastly better than LALR. Moreover, a
lot of LL tools support automatic error detection and recovery. Most
of these tools ensure that the grammar is a 1-track LL grammar and
provide almost all the benefits of LALR with a lot of extra bonuses.

As an example of such tools consider the following simple rule:

ThisIsSomeRule( OUT Symbols ) =
  (
    /* Trap Error in First or Third part */
    [[ ERROR,0,0,"The whole rule is wrong" ]
      CallFirstPart( OUT Symbols ),
      [ /* optional but could be choice, iteration or recursion */
        /* Trap error for Second part */
        [[ ERROR,0,0,"The optional part is wrong" ]
          CallSecondPart( OUT Symbols )
        ]
      ],
      CallThirdPart( OUT Symbols )
    ]
  );

This is a simple example of how LL grammars can naturally support
error detection and recovery even when the situation is complex and
contains constructs such as choice, iteration or recursion. The above
is a real example from an actual system. No other source code is
required for complete error detection and correction by the system in
question. Obviously, lots more error traps could be added to trap
all sorts of other problems without much effort.
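
[Editorial illustration, not the tool's generated code: a minimal C sketch
of how the same recovery idea is usually hand-coded in a recursive-descent
parser - report the error, then skip ahead to a synchronization token (here
';' or end of input) and continue. Names and tokens are invented.]

#include <stdio.h>

enum tok { T_LABEL, T_INT, T_COMMA, T_SEMI, T_EOF };

static enum tok lookahead;
static enum tok *stream;
static void next_token(void) { lookahead = *stream++; }

/* Panic-mode recovery: report, then skip until a token in the sync set. */
static void recover(const char *msg, const enum tok *sync, int n)
{
    int i;
    fprintf(stderr, "error: %s\n", msg);
    for (;;) {
        for (i = 0; i < n; i++)
            if (lookahead == sync[i]) return;
        if (lookahead == T_EOF) return;
        next_token();
    }
}

/* labels : "label" INT { "," INT } ";"   (toy version of the Pascal rule) */
static void labels(void)
{
    static const enum tok sync[] = { T_SEMI, T_EOF };   /* recovery points */

    next_token();                       /* consume "label"                 */
    if (lookahead != T_INT) {
        recover("badly formed label", sync, 2);
    } else {
        next_token();
        while (lookahead == T_COMMA) {
            next_token();
            if (lookahead != T_INT) { recover("badly formed label", sync, 2); break; }
            next_token();
        }
    }
    if (lookahead == T_SEMI) next_token();
}

int main(void)
{
    enum tok toks[] = { T_LABEL, T_INT, T_COMMA, T_COMMA, T_INT, T_SEMI, T_EOF };
    stream = toks;
    next_token();
    labels();                           /* reports one error, then resyncs */
    puts(lookahead == T_EOF ? "recovered and finished" : "stuck");
    return 0;
}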

Regards,

Mike

Michael Sperber [Mr. Preprocessor]

Dec 9, 1995
Patrick Bridges <bri...@cs.arizona.edu> writes:

> Note that Penello's LR parser did not actually include *actions* or
> error recovery... Someone I know actually implemented Penello's scheme
> inside of Bison, and this sped up the generated parser on a cut-down C
> (used in a local compiler class) grammar from 300% to 600% (on
> pre-tokenized input). On the other hand, the generated parsers were
> enormous since Penello's scheme encodes most of the parsing tables in
> instruction state.

Look in:

@INPROCEEDINGS{SperberThiemann1995,
CROSSREF = {PEPM1995},
AUTHOR = {Michael Sperber and Peter Thiemann},
TITLE = {The Essence of {LR} Parsing},
PAGES = {146-155},
YEAR = 1995
}

The paper shows how to get Pennello's scheme by actually using a
partial evaluator. As it turns out, implementing actions and pretty
much every error recovery scheme under the sun is trivial; we've
recently implemented this, too. The parsers, even though written in
Scheme, and without any fine-tuning, perform about as fast as
Bison/byacc.

Also, since good generated code for the parser is in effect a sparse
encoding of the LR state table, I'm surprised that you should get
enormous parsers. It's certainly not inherent in the approach.

--
Cheers =8-} Chipsy

Michael Parkes

Dec 12, 1995
maat...@euronet.nl says...

>Most errors are NOT detected in the parser, but afterwards. Errors
>detected by the parser are of the kind 'missing something" and in
>those cases there is no need for clever recovery.

This is the case if you just want to skip to the next statement. However, a
better compiler will try to skip only the faulty part of the clause and continue.
This is very tricky. Why bother? Because if the error is in a declaration,
recovering well will prevent lots of follow-on errors.

> Therefore, in my
>opinion error recovery is no argument for selection of the
>right type of grammar.

If you had tried some of the latest tools you would understand why they are
related and why a well thought-out tool often requires one type of grammar or
another.

>For those in favor of handwritten parsers: I also like them
>because they are so easy to understand and fast. But first
>checking a grammer with yacc and then retyping it as code is
>double work. Wouldn't everyone be happy with a simple generator
>that checks the unattributed grammar and then generates a
>recursive-descent parser that you can fill in by hand? Does
>this exist?

Handwritten parsers are nice if you have lots of manpower and you enjoy messing
around. However, if you don't, then some of the better tools give a very nice
result. This is because the tools can be bothered to calculate what most people
will not, such as the path to the nearest recovery point, or the minimum change
to the semantics needed to recover from an error. If you add new clauses, they
will automatically rework the error recovery and keep it correct.

Additionally, lots of people here refer to 'lex' and 'yacc'. These tools were
developed around 1976! and are very old. Most modern tools support
integrated lexical, syntax and semantic analysis. Some even support code
generation. Hence, even mentioning 'lex' and 'yacc' implies that better tools
have not been tried.

Example: Some time ago I wrote a small (but complete) Pascal front end in
about 3 man-months using a tool. The code did not include a single line of
'C' or 'C++' but rather was a specification of the language. If I hadn't
messed around so much I could probably have done it in 2/3 of the time. I was
very happy with the result and, best of all, the maintenance was considerably
easier than an equivalent hand-coded model. In fact, while I am here I will
post a small fragment.

/************************************************************/
/************************************************************/
/* */
/* The declaration of Pascal labels. */
/* */
/* The declaration of labels in Pascal must be done at */
/* the head of a block. */
/* */
/************************************************************/
/************************************************************/

Labels( INOUT env ) =
  [
    [[ ERROR,None,500,MESSAGE(2000) ]        /* Deals with errors.         */
      IN Label,                              /* Accepts the token 'LABEL'. */
      (
        LabelList( INOUT env )
      ),
      IN SemiColon                           /* Accepts the token ';'.     */
    ]
  ];

LabelList( INOUT env ) =
  [[ ERROR,None,50,MESSAGE(2010) ]           /* If error skip current label. */
    LabelValidate( INOUT env ),
    {
      IN Comma,                              /* Accepts the token ','.     */
      (
        LabelValidate( INOUT env )
      )
    }
  ];

LabelValidate( INOUT env ) =
  [[ ERROR,1,None,MESSAGE(2020) ]            /* Put new label in symbol    */
    LabelDuplicate                           /* table if can't complain.   */
    (
      OUT [],
      IN env.locallabels[UNION],
      IN env.nonlocallabels[OVERRIDE]
    )
  ];

LabelDuplicate( OUT gotolabels,IN gotolabels,IN gotolabels ) =
  [[ ERROR,None,20,MESSAGE(2030) ]           /* Moan about syntax of       */
    LabelName                                /* the label if needed.       */
    (
      IN gotolabels
    )
  ];

/* Must allow jumping out of functions.    */
/* Hence must add level numbers to labels  */
/* to allow the stack to be reset.         */

LabelName( IN [ label -> { NoFlags,UNIQUENUMBER,label } ] ) =
  ( IN Integer( IN label ) );                /* Accept label token.        */

The above code deals with Pascal labels in all respects except lexically.
However, the lexical stuff is simple, such as:

Label = NOCASE "LABEL";
SemiColon = ";";

The tool that was used would report any error that was detected and
automatically recover. Some examples:

'file.pas', line 4, character 27: Error - Badly formed label.
Expected 'Integer' but found 'Identifier' ('v1').

or

'file.pas', line 4, character 27: Error - Duplicate label '6' (ignored).

All this is automatic; no extra coding is required. The new label symbols are
added to the symbol table for later processing. So the example is truly
complete.

This particular tool supports all the usual symbol table features, such as
scoping and so on. Hence, what I am really trying to say is that it is not
quite 'lex' and 'yacc', is it? This is just one of the really great tools out
there. Go try them!

Regards,

Mike

Ken Walter

Dec 16, 1995

:>na...@ms.mimos.my says...
:>
:>What is this ?. I have used both LL & LALR based tools and this

:>experience has convinced me never to use an LALR tool again. The
:>trouble with LALR tools is simple. An LALR parser has no idea about
:>the context of the syntatic structure. Hence, this leads to two
:>massive problems. (1). Writing an LALR parser needs to be done with a
:>bottom up thought process which is unnatural (well it is for me
:>anyway). (2). The error detection has no idea of the context so it
:>can't select a natural recovery point and the error meesages produced
:>are consequently usually poor.
:>
:>In summary, LL error recovery is vastly better than LALR. ...


You must have had a very poor implementation. I've worked with an
LALR(k) parser and found it much better than the LL(k) junk overall.
Any parser must address the error recovery problem. My LALR parser gave a
list of expected tokens and a list of productions in progress. The
parse tables could actually be used to produce an insertion or deletion of the
current token to continue the parse. Even recursive descent parsers
have problems recovering if the problem is not addressed.


Ken Walter

Steinar Bang

Dec 17, 1995
maat...@euronet.nl says...

>> ... Wouldn't everyone be happy with a simple generator that checks
>> the unattributed grammar and then generates a recursive-descent
>> parser that you can fill in by hand? Does this exist?

Does PCCTS (Purdue Compiler Construction Tool Set) fill your needs?

Integrated parser and lexer specification; you can have productions take
arguments and return values; it constructs a recursive descent parser
(that can be in C, or a self-contained C++ class); it can build an AST to
be traversed later; etc., etc.

See <URL:news:comp.compilers.tools.pccts> and the free compilers list
(posted to this newsgroup), for more info.

- Steinar

Dinesh Kulkarni

Dec 17, 1995
mpa...@oz.net (Michael Parkes) writes:
>Additionally, lots of people here refer to 'lex' and 'yacc'. These tools were
>developed around 1976! and are very old. Most modern tools support
>integrated lexical, syntax and semantic analysis. Some even support code
>generation. Hence, even mentioning 'lex' and 'yacc' implies that better tools
>have not been tried.
>[...]
>scoping and so on. Hence, what I am really trying to say is that it is not
>quite 'lex' and 'yacc', is it? This is just one of the really great tools out
>there. Go try them!

It is not clear which tool you are referring to. More generally,
would you care to recommend any such 'state-of-the-art' tools? Lex &
yacc (or flex & bison) are so popular (IMHO) because they are widely
available on a number of platforms, stable and well-understood (help
in the form of books, examples, experienced users etc. is
available). Are there any 'better' tools that

1. are available on multiple platforms;
2. are at least reasonably stable and complete
(i.e. not incomplete prototypes or orphans abandoned by students who have
graduated);
3. do not cost a fortune?

If there are such tools, I am sure many on this forum would like to
know about them. Until then, lex & yacc will continue to rule - even
if they are far behind the state of the art as you claim.

Thanks.

Dinesh
[People seem to like PCCTS. -John]

Scott Stanchfield

Dec 18, 1995
Tables can be used for quite a bit, but when it comes to debugging a
parser you have written, they can be incredibly obtuse.

An LL(k) recursive-descent parser (hand-coded or generated by a tool
like ANTLR in PCCTS) is easy to understand and debug.

(I wonder if Interplay has thought about recursive descent?)

-- Scott
Scott Stanchfield McCabe & Associates -- Columbia, Maryland

Gert A. Tijssen

Dec 19, 1995
maat...@euronet.nl (M.M._van_der_Laan) writes:
>I want to add two ideas to this discussion:
>
>About quality of error recovery:
>Most errors are NOT detected in the parser, but afterwards. Errors
>detected by the parser are of the kind "missing something", and in
>those cases there is no need for clever recovery - the program is
>simply wrong, so just skip the statement. Therefore, in my
>opinion error recovery is no argument for selection of the
>right type of grammar.
>
>For those in favor of handwritten parsers: I also like them
>because they are so easy to understand and fast........

About syntactic error recovery in the various parsers (hand-written
or automatically generated): I remember that, some 10 years ago, I
saw an automatically generated LL(1) parser for Pascal (generated into
Pascal; easy to read, fast, etc.) that had some capabilities that many
(very clever) hand-written parsers don't have.

Consider the following experiment:
Take any correct Pascal program and replace every semicolon (';') by a
space. Then see what your favourite compiler does with this program.
A good parser will generate exactly as many `` ';' inserted'' error
messages as the program contained non-redundant semicolons.

Watch and shiver!

I have seen compilers that skip all declarations because the
semicolons are missing and produce a lot of exotic semantic error
messages.
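
[Editorial illustration, not the Pascal parser the poster saw: a minimal C
sketch of the recovery strategy that produces one "';' inserted" message per
missing semicolon - when a semicolon is expected but absent, report it,
pretend it was present, and carry on.]

#include <stdio.h>

static int line = 1;
static char lookahead;
static const char *src;
static void next_token(void)
{
    while (*src == ' ')  src++;
    while (*src == '\n') { line++; src++; while (*src == ' ') src++; }
    lookahead = *src ? *src++ : 0;
}

/* If the expected token is missing, report it and act as if it were there. */
static void expect(char c, const char *name)
{
    if (lookahead == c) { next_token(); return; }
    printf("line %d: '%s' inserted\n", line, name);   /* insertion recovery */
}

/* program : decl { decl }     decl : 'd' ';'   (toy stand-in for Pascal) */
static void program(void)
{
    while (lookahead == 'd') {
        next_token();
        expect(';', ";");       /* one message per missing semicolon */
    }
}

int main(void)
{
    src = "d;\nd \nd;\n";       /* second declaration lost its semicolon */
    next_token();
    program();                  /* prints exactly one "';' inserted"     */
    return 0;
}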
