Recursive Descent Parsers and YACC

Michael D Mellinger

unread,

Nov 15, 1990, 5:43:37 PM11/15/90

to

Can someone give me an estimate on how much faster parsing can be made by
writing a recursive-descent parser instead of using Yacc and Lex? Is there
enough of a difference to consider using a RDP in a commercial C compiler?
PC compilers seem to be very fast when compared to workstation compilers such
as GCC. Is optimization(or lack of) the crucial difference in PC and
workstation compilation speeds?

-Mike
[It is my distinct impression that yacc parsers work fairly quickly.
Particularly on machines with slow procedure calls, they can be faster than
recursive descent. Lex is slow, but flex uses newer techniques so that it
is usually as fast as a hand-coded lexer. Unix compilers are slow because
they use a separate assembly pass and do more optimizations. If you use,
e.g., MS C with all of its optimization options it's no speed demon, either.
-John]
--
Send compilers articles to comp...@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.

David Keppel

unread,

Nov 16, 1990, 4:03:27 PM11/16/90

to

mel...@psuvax1.cs.psu.edu (Michael D Mellinger) writes:
>[How much faster is parsing with Recursive Descent instead of YACC?]

Mike Percy

unread,

Nov 16, 1990, 4:03:54 PM11/16/90

to

mel...@psuvax1.cs.psu.edu (Michael D Mellinger) writes:

>Can someone give me an estimate on how much faster parsing can be made by
>writing a recursive-descent parser instead of using Yacc and Lex? Is there
>enough of a difference to consider using a RDP in a commercial C compiler?

I would tend to believe that table driven parsers (YACC) are faster in
execution than recursive descent parsers. The stack and function call
overhead alone can get to be pretty hairy, especially in deeply recursive
parses.

What is faster, in my experience, is the speed in which you can get a parser
running. Trying to coax a grammer that YACC likes (conflict-free) is
downright a pain, but the recursive descent parser generators I've worked
on/with have much laxer restrictions on the grammar than LR(1). In theory,
one could have a non-deterministic, backtracking RD parser (seen one or two
done in Prolog), but most generators at least require the grammar to be
deterministic (is this the same as LR(1)?

If you want to get a parser up quickly, use an Early's algorithm-based
parser. It will run slow, but it runs the first time, giving you time to
massage the grammar for a faster parser type.

Hank Dietz

unread,

Nov 17, 1990, 12:59:15 PM11/17/90

to

In article <F7p?bk...@cs.psu.edu> mel...@psuvax1.cs.psu.edu (Michael D Mellinger) writes:
>Can someone give me an estimate on how much faster parsing can be made by

>writing a recursive-descent parser instead of using Yacc and Lex? ...

PCCTS (the Purdue Compiler-Construction Tool Set) builds LL(k) parsers
which average about 2-3x as fast as YACC's parsers. The primary
difference between them is that LL stack management can be implemented
directly with machine instructions on most machines, whereas LR stack
handling requires an interpreter... hence the 2-3x difference (for
example, it was about 2x for Pascal).

-hankd

Mark Zenier

unread,

Nov 17, 1990, 2:04:51 PM11/17/90

to

> [Lex is slow, but flex uses newer techniques so that it
> is usually as fast as a hand-coded lexer. ... -John]

What kind of speedup can be gained by using flex instead of lex?

How much does the include file / new buffer stuff in flex 2.3 cost
(in terms of speed)?

Mark Zenier
ma...@ssc.uucp

Henry Spencer

unread,

Nov 17, 1990, 5:05:36 PM11/17/90

to

In article <F7p?bk...@cs.psu.edu> mel...@psuvax1.cs.psu.edu (Michael D Mellinger) writes:

>Can someone give me an estimate on how much faster parsing can be made by
>writing a recursive-descent parser instead of using Yacc and Lex? Is there
>enough of a difference to consider using a RDP in a commercial C compiler?

>...

>[It is my distinct impression that yacc parsers work fairly quickly.
>Particularly on machines with slow procedure calls, they can be faster than

>recursive descent...

Of note here is the recent EUUG paper from Ken Thompson on the Plan 9 C
compiler. (This is a small, fast compiler generating quite good code.)
He uses a yacc parser, but says that the compiler spends a lot of time
parsing and he thinks a recursive-descent parser would be substantially
faster, now that the compiler and the language (which has some changes
from orthodox C) have settled down.

He also notes that using the manufacturer's fancy "call" instruction is
usually a bad idea, since a stripped-down one (fortunately both the VAX
and the NS32k series have such) is generally much faster. His assessment
of relative speeds probably assumes an efficient call sequence.
--
Henry Spencer at U of Toronto Zoology
he...@zoo.toronto.edu utzoo!henry

Bruce Hoult

unread,

Nov 18, 1990, 12:11:29 AM11/18/90

to

In article <F7p?bk...@cs.psu.edu> mel...@psuvax1.cs.psu.edu (Michael D Mellinger) writes:

> Can someone give me an estimate on how much faster parsing can be made by
> writing a recursive-descent parser instead of using Yacc and Lex? Is there
> enough of a difference to consider using a RDP in a commercial C compiler?

Bjarne Stroustrup is on record as saying that he regrets having used yacc to
generate the parser in CFront (AT&T's C++ compiler), not because of any
difference in parsing speed, but because of the extreme difficulty of
providing meaningful syntax errors with yacc, compared to using RD.

Walter Bright used RD in Zortech C++, and it's no slouch in compile speed.

-- Bruce Hoult
Bruce...@actrix.gen.nz

David Taylor

unread,

Nov 19, 1990, 8:01:30 PM11/19/90

to

gri...@hubcap.clemson.edu (Mike Percy) writes:

>What is faster, in my experience, is the speed in which you can get a parser
>running. Trying to coax a grammer that YACC likes (conflict-free) is
>downright a pain, but the recursive descent parser generators I've worked
>on/with have much laxer restrictions on the grammar than LR(1). In theory,
>one could have a non-deterministic, backtracking RD parser (seen one or two
>done in Prolog), but most generators at least require the grammar to be
>deterministic (is this the same as LR(1)?

Recursive descent parser == LL(0)

LR(1) is LESS restrictive.

You can still use a grammar with yacc that has some conflicts as it
has some simple rules for conflict resolution. Some of these
recursive descent parser generators that you mention probably use
similar rules for conflict resolution but don't bother to tell you
about their use.

Look up some of the compiler theory ... it helps when you're trying to
design the grammar.
--
David Taylor Labtam Information Systems Pty. Ltd.
Work: (03) 587-1444 UUCP: !uunet!munnari!labtam.oz!dave
FAX: (03) 580-5581 Internet: da...@labtam.oz.au
Home: (03) 857-5660 (occasionally)

id AA00722

unread,

Nov 20, 1990, 3:41:36 PM11/20/90

to hela!uunet!comp-compilers

In article <11...@hubcap.clemson.edu> gri...@hubcap.clemson.edu (Mike Percy) writes:
>the recursive descent parser generators I've worked on/with [...]

>If you want to get a parser up quickly, use an Early's algorithm-based
>parser.

Does anyone know of any of the above kinds of parser freely available in
source?

-David West d...@iti.org
[Quite a while ago I used an Earley parser in IMP-10, but I haven't seen
one lately. Personally, I think the effort spent in pushing a grammar into
LALR or LL(k) is usually worth it, one often finds surprising ambiguities.
-John]

Vern Paxson

unread,

Nov 21, 1990, 5:47:40 PM11/21/90

to

In article <5...@ssc.UUCP> ma...@ssc.UUCP (Mark Zenier) writes:
> What kind of speedup can be gained by using flex instead of lex?

Here are some old timings (made in April, 1988). They compare various
flex compression options with AT&T lex. These aren't quite guaranteed-
not-to-exceed numbers (flex now takes advantage of rule sets that don't
require backtracking to squeeze out a few more cycles) but fairly close.
The scanner being timed tokenizes C, including recognizing keywords (which
maximizes flex's advantages).

> How much does the include file / new buffer stuff in flex 2.3 cost
> (in terms of speed)?

Virtually nothing. The buffer structures are only consulted when the
end of the buffer is reached (and sometimes when doing an input() or an
unput()). Since the default buffer size is 16K bytes, the overhead is
negligble.

Vern

flex vs. lex timings for a C tokenizer which includes keywords:

Generation times:

lex 83.0 secs
flex 3.9 # fully compressed scanner
flex -cfe 7.1 # uncompressed table, equivalence classes
flex -cf 15.0 # uncompressed table, no equivalence classes

Scanner object file sizes:

lex 41.0K bytes
flex 9.4K
flex -cfe 49.6K
flex -cf 126.5K

Running times on a 28,088 line input (685K characters):

lex 29.8 secs
flex 19.3
flex -cfe 9.0
flex -cf 7.8

The timings were made on a Sun 3/60. All times are user + system CPU time,
and no hashing of identifiers was done.

Summary:

For about the same sized scanner, you get a factor of 3 in performance.
For a 30% faster scanner, you get a scanner 1/4th the size, and it's
generated in 1/20th the time.
For a scanner that's 3 times larger, you get a factor of 3.8 in
performance.

Josef Grosch

unread,

Nov 22, 1990, 9:47:29 AM11/22/90

to

Michael D Mellinger writes:
[Can someone give me an estimate on how much faster parsing can be made by

writing a recursive-descent parser instead of using Yacc and Lex?]

I did a comparison of generated parsers on a SUN 3 with MC 68020 processor
(excluding scanning):

tool speed (tokens per second)

Yacc 16,000
Lalr - our LALR(1) parser generator 35,000
Ell - our LL(1) recursive descent parser generator 55,000

Pure parsing can be made faster by a factor of 2, 3, or more compared to Yacc.
However, as pure parsing takes only 5 to 10 % of the whole compilation time,
it is not the bottle neck and does not matter too much.
Scanning is much more time critical, as it takes 30 to 50 %.
Therefore Lex can not be used for high speed compilers.
Scanner generators like Rex and flex are up to 5 times faster.

Josef Grosch
---
gro...@gmdka.uucp

Mike Percy

unread,

Nov 23, 1990, 11:31:03 AM11/23/90

to

da...@labtam.labtam.oz.au (David Taylor) writes:

>gri...@hubcap.clemson.edu (Mike Percy) writes:
[some of my comments WRT RD parsers]

>Recursive descent parser == LL(0)

My references mention only that RD parsers are used with "extended-tree
grammars," which are then defined. Perhaps I'm using the worng
references, or have missed something, but I never noticed that RD ==
LL(0).

>LR(1) is LESS restrictive.
No argument there.

>You can still use a grammar with yacc that has some conflicts as it
>has some simple rules for conflict resolution. Some of these
>recursive descent parser generators that you mention probably use
>similar rules for conflict resolution but don't bother to tell you
>about their use.

Actually, doesn't yacc and its ilk use LALR? LALR can introduce
conflicts that aren't in LR. As for conflict resolution methods, I'm
sure that they all do similar things.

>Look up some of the compiler theory ... it helps when you're trying to
>design the grammar.

It would also help if the language designers would make life a little
easier... As for the theory, I do look it up. RD parsers weren't even
mentioned in compiler class, except by me. I was told we weren't going
to look at RD parsers. LL was defined, but not explored like LR and
LALR were.

Mike Percy gri...@hubcap.clemson.edu
ISD, Clemson University msp...@clemson.BITNET
(803)656-3780 msp...@clemson.clemson.edu

Jeff Prothero

unread,

Nov 23, 1990, 3:19:14 PM11/23/90

to

gro...@gmdka.uucp (Josef Grosch) writes:
>I did a comparison of generated parsers on a SUN 3 with MC 68020 processor
>(excluding scanning):
>
>tool speed (tokens per second)
>
>Yacc 16,000
>Lalr - our LALR(1) parser generator 35,000
>Ell - our LL(1) recursive descent parser generator 55,000
>
>Pure parsing can be made faster by a factor of 2, 3, or more compared to Yacc.

Note that "LALR(1) parser generator" doesn't have to mean "table driven".
Don't have the reference handy, but someone (I believe MetaWare's Tom
Penello?) has obtained big speedups by using procedural implementations of
the LALR(1) tables. Robert Henry has done much the same thing with code
generation tables in CODEGEN (UW technical report), and observed that table
compression and lookup techniques are essentially application-independent,
and that a general tool could be developed to factor, compress, and
implement tables, including generating fast procedural implementations.

Seems to me procedural table implementations bring us full circle: We
started with fast random code, then switched to slower table-driven code as
an aid to portability and code maintainablility. Now that we can generate
the unportable, unmaintainable code automatically from declarative
specifications, the speed consideration is coming to the fore again. No?

Jeff Prothero
j...@u.washington.edu

Eliot Moss

unread,

Nov 26, 1990, 8:27:14 AM11/26/90

to

Folks, what is this LL(0) stuff? You need at least one token lookahead to
choose the correct production in LL or recursive descent parsing, so LL(k)
makes sense only for k > 0.
--
J. Eliot B. Moss, Assistant Professor
Department of Computer and Information Science
Lederle Graduate Research Center
University of Massachusetts
Amherst, MA 01003
(413) 545-4206; Mo...@cs.umass.edu

Michael D Mellinger

unread,

Nov 25, 1990, 7:08:42 PM11/25/90

to

In article <901122124...@gmdka.uucp> gro...@gmdka.uucp (Josef Grosch) writes:
> Pure parsing can be made faster by a factor of 2, 3, or more compared to
> Yacc. However, as pure parsing takes only 5 to 10 % of the whole
> compilation time, it is not the bottle neck and does not matter too much.
> Scanning is much more time critical, as it takes 30 to 50 %. Therefore Lex
> can not be used for high speed compilers. Scanner generators like Rex and
> flex are up to 5 times faster.

I have been playing around with gcc (the NeXT Objective C version) and have
found that by giving a -Q command-line option gcc will tell you how much
time it spends in each section of compilation. Without optimization turned
on, it seems to spend most its time parsing code. gcc doesn't use Lex, and
it is still much slower than Turbo C or Think C. Here is the result of
compiling a 149 line program. I would like to speed GCC up. Bison is used
when compiling GCC and not YACC. Would switching to another parsing tool
give me the speed-up I desire? A FAST FREE compiler would sure complement a
FREE, slow optimizing compiler. Besides gcc is the fastest compiler
available for the NeXT.

-Mike

handel> cc -Q -c CardDrawView.m
_1_CardDrawView _2_CardDrawView _3_CardDrawView _4_CardDrawView
time in parse: 5.939384
time in integration: 0.000000
time in jump: 0.000000
time in cse: 0.000000
time in loop: 0.000000
time in flow: 0.188038
time in combine: 0.000000
time in local-alloc: 0.000000
time in global-alloc: 0.246630
time in final: 0.168993
time in varconst: 0.281626
time in symout: 0.000000
time in dump: 0.000000

handel> wc CardDrawView.m
149 249 2690 CardDrawView.m
[I suspect that there is considerably more time spent passing code through
the assembler and linker than is spent by yacc. Keep in mind that the yacc
code switches out to action routines that build parse trees and symbol tables,
and such routines will be present no matter how you parse. -John]

John Aycock

unread,

Nov 26, 1990, 11:05:22 PM11/26/90

to

| Reply-To: mo...@cs.umass.edu (Eliot Moss)

|
| Folks, what is this LL(0) stuff? You need at least one token lookahead to
| choose the correct production in LL or recursive descent parsing, so LL(k)
| makes sense only for k > 0.

Not so. Consider the following grammar:

S -> A B C
A -> a
B -> b
C -> c

No lookahead is required, yet a parser can predict that the first token will
be an 'a', the second a 'b', and so on. As an example of where a grammar like
this could be applied, I recently wrote an EBNF->BNF converter for a compiler
class that took input files of the form:

<file> -> <terminal-section> <nonterminal-section> <start> <rules> .
<terminal-section> -> TERMINALS <terminal-list> END .
<nonterminal-section> -> NONTERMINALS <nonterminal-list> END .
&c...

Admittedly, a rather rigid and restrictive format, but it suited the purpose
nicely.
:ja
--
John D. Aycock ayc...@cpsc.ucalgary.ca
(403) 285-8727
[Justin Graver <gra...@ufl.edu> send a similar message. -John]