Why Stroustrup do not like YACC grammar for C++ ?

Dongxiao Yue

unread,

Apr 12, 1994, 12:37:51 PM4/12/94

to

I just read the first few chapters of Bjarne's book the
"The Design and Evolution of C++", it is highly interesting.
In his book, Bjarne mentioned that used yacc for the first
Cfront was the correct choice at that time,
but in retrospect it was a bad choice, and
it should had been down in recursive descent.
But He did not give much explanation, except that at that time there was not
even a yacc grammar for C.

I am wondering if there are other reasons behind his argument. I am
studying Compiler, and I learned generally LL(k) grammar is much more
difficult to design than a LALR(1) grammar, also semantic checking is
somewhat more difficult in LL(k). So last quarter when I implemented a C++
parser for a project, I used yacc++ and C++,
the only extra Lexical analysis is to return
a class_name token by consulting class_name_tables at a couple of locations.

Since I was convinced by Aho's book that LL(1) is less powerful and
harder to implement, I would appreciate if someone (including the creator )
could shed some light on this issue.

Dongxiao Yue

Paul L Fagerburg

unread,

Apr 12, 1994, 1:16:47 PM4/12/94

to

y...@tuna.micro.umn.edu (Dongxiao Yue) writes:

... [stuff about Stroustrup saying he should not have used YACC]

>I am wondering if there are other reasons behind his argument. I am
>studying Compiler, and I learned generally LL(k) grammar is much more
>difficult to design than a LALR(1) grammar, also semantic checking is
>somewhat more difficult in LL(k).

LL(k) can be harder to design, but try wading through

stmt_list: stmt_list stmt
| stmt

in LR style, or

stmt_list: (stmt)+

in LL style.

It's much more intuitive what's going on in the LL grammar. Besides,
there are tools (like the Purdue Compiler Construction ToolSet) that
will parse LL(k) with infinite lookahead, if you want it.

>Since I was convinced by Aho's book that LL(1) is less powerful and
>harder to implement, I would appreciate if someone (including the creator )
>could shed some light on this issue.

LL(k) with infinite lookahead is theoretically *more* powerful than
LALR. Somebody got a PhD off that theory (can't quite remember who).

Add to that the fact that in LALR, you can't have inherited attributes,
only synthesized, and you've got some good arguments for using LL(k).

I've written a Pascal compiler (for a class) in LL(1) by hand, and also
with tools, and I also tried YACC. I found the LL approach much more
feasible, and hell of a lot easier.
--
-------------------------------- Paul Fagerburg (p...@bert.cs.byu.edu)

"We are Microsoft. OS/2 is irrelevant. Unix is irrelevant.
Open systems are futile. Prepare to be assimilated."

Dongxiao Yue

unread,

Apr 13, 1994, 1:06:03 AM4/13/94

to

In <Co5nE...@news.cis.umn.edu> y...@tuna.micro.umn.edu (Dongxiao Yue) writes:

>In his book, Bjarne mentioned that used yacc for the first
>Cfront was the correct choice at that time,

I read the book again, the above statement is inaccurate, very sorry.
quote: "For most people, it(yacc) would have been the right choice.
In retrospect, for me(Stroustrup) and C++ it was a bad mistake. "

Dongxiao Yue

unread,

Apr 13, 1994, 5:07:38 PM4/13/94

to

In <plf.76...@bert.cs.byu.edu> p...@bert.cs.byu.edu (Paul L Fagerburg) writes:

>y...@tuna.micro.umn.edu (Dongxiao Yue) writes:

>LL(k) can be harder to design, but try wading through

>stmt_list: stmt_list stmt
> | stmt

>in LR style, or

>stmt_list: (stmt)+

>in LL style.

The thing I am afraid of is sometimes left factoring and elimination of
left recursion will leave the grammar intractable.

Given the expressions as example, I really don't like this

E ---> T E'
E' --->
|+T E'

in LR
we have E ---> E+T

And I don't think the LR grammar for stmt_list is too bad.

>It's much more intuitive what's going on in the LL grammar. Besides,

I just think the opposite.

>there are tools (like the Purdue Compiler Construction ToolSet) that
>will parse LL(k) with infinite lookahead, if you want it.

Infinte look ahead, does that mean it continues to look ahead until
there is a resolution?

>Add to that the fact that in LALR, you can't have inherited attributes,
>only synthesized, and you've got some good arguments for using LL(k).

That's not a problem, and it is easy to get any attributes if we rely
on ourselves a little more.

for example

var_decl: type decl_list{$2->add_type_to_all($1);}
decl_list: decl_list decl{$$=$1->add_next($2);}
|decl {$$=new symbol($1);}
;

assign decl_list a linked list of symbols.

I am wondering if there is public domain LL(k) parser for C++.

Paul L Fagerburg

unread,

Apr 13, 1994, 8:27:54 PM4/13/94

to

y...@tuna.micro.umn.edu (Dongxiao Yue) writes:

> The thing I am afraid of is sometimes left factoring and elimination of
> left recursion will leave the grammar intractable.
>
> Given the expressions as example, I really don't like this

> E ---> T E'
> E' --->
> |+T E'

> in LR
> we have E ---> E+T

In LL(k), we usually do
E -> T + E
T -> F * T

etc. I can't remember the whole thing, but I have a parser that works
fine on all expressions. Plus, no messy %left %right %unary stuff and
shift/reduce conflicts.

> Infinte look ahead, does that mean it continues to look ahead until
> there is a resolution?

It means that the parser will read to the end of the file if it has to
to disambiguate productions. If you're not afraid of LL(k>1) then it's
simple. If you want LL(1), then you get the E -> T E' and all that crap.

> I am wondering if there is public domain LL(k) parser for C++.

The Purdue Compiler Construction ToolSet (mentioned in last post,
availabe via anon ftp from marvin.ecn.purdue.edu) will produce C or C++
output; you decide.

--
-------------------------------- Paul Fagerburg (p...@bert.cs.byu.edu)

"If I was being executed by injection, I'd clean up my cell real neat. Then,
when they came to get me, I'd say, `Injection? I thought you said
inspection.' They'd probably feel bad, and maybe I could get out of it."

Dongxiao Yue

unread,

Apr 14, 1994, 9:59:54 PM4/14/94

to

In <plf.76...@bert.cs.byu.edu> p...@bert.cs.byu.edu (Paul L Fagerburg) writes:

>y...@tuna.micro.umn.edu (Dongxiao Yue) writes:

>> The thing I am afraid of is sometimes left factoring and elimination of
>> left recursion will leave the grammar intractable.
>>
>> Given the expressions as example, I really don't like this

>> E ---> T E'
>> E' --->
>> |+T E'

>> in LR
>> we have E ---> E+T

>In LL(k), we usually do
>E -> T + E
>T -> F * T

Sorry to repeat something that is in text book.
But
E-> T + E alone will not work since E can also be just T,
to make it LL(1), we must left factor. so my we need the grmmar in
my original post. Maybe we can live with
E --> T | T+ E if LL(k) k>1,
but this will be not efficient compared to LALR(1).
efficient performance is vital for a compiler, I am sick of
sitting there waiting for my program to get compiled.

>etc. I can't remember the whole thing, but I have a parser that works
>fine on all expressions. Plus, no messy %left %right %unary stuff and
>shift/reduce conflicts.

No, the E-->E+T grammar does not need to consider associativity,
only the E--> E+E, E->E*E ... stuff do.

>> Infinte look ahead, does that mean it continues to look ahead until
>> there is a resolution?

>It means that the parser will read to the end of the file if it has to
>to disambiguate productions. If you're not afraid of LL(k>1) then it's
>simple. If you want LL(1), then you get the E -> T E' and all that crap.

>> I am wondering if there is public domain LL(k) parser for C++.

>The Purdue Compiler Construction ToolSet (mentioned in last post,
>availabe via anon ftp from marvin.ecn.purdue.edu) will produce C or C++
>output; you decide.

Thanks alot.

Kym Horsell

unread,

Apr 21, 1994, 10:12:23 PM4/21/94

to

In article <plf.76...@bert.cs.byu.edu> p...@bert.cs.byu.edu (Paul L Fagerburg) writes:
>LL(k) can be harder to design, but try wading through
>stmt_list: stmt_list stmt
> | stmt
>in LR style, or
>stmt_list: (stmt)+
>in LL style.

You are only talking about a matter of *notation*. Notation like
{stmt} has (not much) to do with whether the grammar is
or is not LR(1) -- that is a matter of whether it generates
conflicts in the resulting state machine. Some LR(1) tools in fact
handle an extended notation, including {X} or [X] or even {X}{m,n}.

>LL(k) with infinite lookahead is theoretically *more* powerful than
>LALR. Somebody got a PhD off that theory (can't quite remember who).

If true -- let's see you build one. ;-) But what about *finite* k?
I understand that an LALR grammar exists for any language that has an
LL(k) grammar...

>Add to that the fact that in LALR, you can't have inherited attributes,
>only synthesized, and you've got some good arguments for using LL(k).

This is "misleading". With "negative offsets" even YACC (not really
a good tool in any case) can handle inherited attributes.

>I've written a Pascal compiler (for a class) in LL(1) by hand, and also
>with tools, and I also tried YACC. I found the LL approach much more
>feasible, and hell of a lot easier.

Finally, I would agree that generating an LL(1) parser by hand is
typically a somewhat easier job than an LR(1) parser.

-kym