-Mike
[It is my distinct impression that yacc parsers work fairly quickly.
Particularly on machines with slow procedure calls, they can be faster than
recursive descent. Lex is slow, but flex uses newer techniques so that it
is usually as fast as a hand-coded lexer. Unix compilers are slow because
they use a separate assembly pass and do more optimizations. If you use,
e.g., MS C with all of its optimization options, it's no speed demon, either.
-John]
See also:
%A Frank G. Pagan
%T Comparative Efficiency of General and Residual Parsers
%J SIGPLAN Notices
%V 25
%N 4
%D April 1990
%P 59-68
%X * Concise and good introduction to partial evaluation
* Focus on partial evaluation of parsers generating pure code.
* Hand translation of Pascal and C.
* Time (2-10X faster) and size (0.5-1.25X larger).
%A Thomas J. Pennello
%T Very Fast LR Parsing
%J Proceedings of the SIGPLAN 1986 Symposium on Compiler Construction;
SIGPLAN Notices
%V 21
%N 7
%D July 1986
%P 145-151
%X * Partial evaluation of the table interpreter with respect to each
element of the table.
* Speedup: on a VAX-like machine, 40,000 to 500,000 lines per minute.
On an 80286, 37,000 to 240,000 lines per minute.
* FSM converted to assembly language.
* 2-4X increase in table size.
On a machine with fast procedure calls and so forth, I GUESS that these
schemes win on the basis of better register allocation. I don't know
of any fundamental big-oh differences between YACC-style and recursive
descent.
;-D oN ( YACCity-YACC -- Don't Parse Back ) Pardo
--
pa...@cs.washington.edu
{rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo
>Can someone give me an estimate on how much faster parsing can be made by
>writing a recursive-descent parser instead of using Yacc and Lex? Is there
>enough of a difference to consider using a RDP in a commercial C compiler?
I would tend to believe that table-driven parsers (YACC) are faster in
execution than recursive-descent parsers. The stack and function-call
overhead alone can get to be pretty hairy, especially in deeply recursive
parses.
What is faster, in my experience, is the speed with which you can get a parser
running. Trying to coax a grammar into a form that YACC likes (conflict-free) is
a downright pain, but the recursive-descent parser generators I've worked
on/with have much laxer restrictions on the grammar than LR(1). In theory,
one could have a non-deterministic, backtracking RD parser (I've seen one or two
done in Prolog), but most generators at least require the grammar to be
deterministic (is this the same as LR(1)?).
If you want to get a parser up quickly, use a parser based on Earley's
algorithm. It will run slowly, but it will run the first time, giving you
time to massage the grammar for a faster parser type.
PCCTS (the Purdue Compiler-Construction Tool Set) builds LL(k) parsers
which run, on average, about 2-3x as fast as YACC's parsers. The primary
difference between them is that LL stack management can be implemented
directly with machine instructions on most machines, whereas LR stack
handling requires an interpreter... hence the 2-3x difference (for
example, it was about 2x for Pascal).
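To make the mechanism concrete, here is a generic recursive-descent fragment
(a sketch, not PCCTS output) for the rule E -> 'n' | '(' E ')'. The parse
stack is simply the hardware call stack and rule selection is an ordinary
switch, whereas an LR parser runs a generic loop that does a table lookup and
an explicit stack operation for every step:

    #include <stdio.h>

    static const char *p;                       /* input cursor */

    static int expect(char c) { return *p == c ? (p++, 1) : 0; }

    /* E -> 'n' | '(' E ')' -- each nonterminal is just a C function */
    static int E(void)
    {
        switch (*p) {
        case 'n': return expect('n');
        case '(': return expect('(') && E() && expect(')');
        default:  return 0;
        }
    }

    static int parse(const char *s) { p = s; return E() && *p == '\0'; }

    int main(void)
    {
        printf("%d %d\n", parse("((n))"), parse("(n"));   /* prints: 1 0 */
        return 0;
    }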
-hankd
What kind of speedup can be gained by using flex instead of lex?
How much does the include file / new buffer stuff in flex 2.3 cost
(in terms of speed)?
Mark Zenier
ma...@ssc.uucp
Of note here is the recent EUUG paper from Ken Thompson on the Plan 9 C
compiler. (This is a small, fast compiler generating quite good code.)
He uses a yacc parser, but says that the compiler spends a lot of time
parsing and he thinks a recursive-descent parser would be substantially
faster, now that the compiler and the language (which has some changes
from orthodox C) have settled down.
He also notes that using the manufacturer's fancy "call" instruction is
usually a bad idea, since a stripped-down one (fortunately both the VAX
and the NS32k series have such) is generally much faster. His assessment
of relative speeds probably assumes an efficient call sequence.
--
Henry Spencer at U of Toronto Zoology
he...@zoo.toronto.edu utzoo!henry
Bjarne Stroustrup is on record as saying that he regrets having used yacc to
generate the parser in CFront (AT&T's C++ compiler), not because of any
difference in parsing speed, but because of the extreme difficulty of
providing meaningful syntax errors with yacc, compared to using RD.
Walter Bright used RD in Zortech C++, and it's no slouch in compile speed.
-- Bruce Hoult
Bruce...@actrix.gen.nz
>What is faster, in my experience, is the speed with which you can get a parser
>running. Trying to coax a grammar into a form that YACC likes (conflict-free) is
>a downright pain, but the recursive-descent parser generators I've worked
>on/with have much laxer restrictions on the grammar than LR(1). In theory,
>one could have a non-deterministic, backtracking RD parser (I've seen one or two
>done in Prolog), but most generators at least require the grammar to be
>deterministic (is this the same as LR(1)?).
Recursive descent parser == LL(0)
LR(1) is LESS restrictive.
You can still use a grammar with yacc that has some conflicts as it
has some simple rules for conflict resolution. Some of these
recursive descent parser generators that you mention probably use
similar rules for conflict resolution but don't bother to tell you
about their use.
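(For reference, yacc's default rules are simple: a shift/reduce conflict is
resolved in favor of the shift, and a reduce/reduce conflict in favor of the
rule that appears first in the grammar. The classic illustration is the
dangling else:

stmt -> IF expr THEN stmt
stmt -> IF expr THEN stmt ELSE stmt
stmt -> other

yacc reports one shift/reduce conflict here but, by preferring the shift,
binds each ELSE to the nearest unmatched IF, which is usually the
interpretation you wanted anyway.)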
Look up some of the compiler theory ... it helps when you're trying to
design the grammar.
--
David Taylor Labtam Information Systems Pty. Ltd.
Work: (03) 587-1444 UUCP: !uunet!munnari!labtam.oz!dave
FAX: (03) 580-5581 Internet: da...@labtam.oz.au
Home: (03) 857-5660 (occasionally)
>If you want to get a parser up quickly, use a parser based on Earley's
>algorithm.
Does anyone know of any of the above kinds of parser freely available in
source?
-David West d...@iti.org
[Quite a while ago I used an Earley parser in IMP-10, but I haven't seen
one lately. Personally, I think the effort spent pushing a grammar into
LALR or LL(k) form is usually worth it; one often finds surprising ambiguities.
-John]
Here are some old timings (made in April, 1988). They compare various
flex compression options with AT&T lex. These aren't quite guaranteed-
not-to-exceed numbers (flex now takes advantage of rule sets that don't
require backtracking to squeeze out a few more cycles) but fairly close.
The scanner being timed tokenizes C, including recognizing keywords (which
maximizes flex's advantages).
> How much does the include file / new buffer stuff in flex 2.3 cost
> (in terms of speed)?
Virtually nothing. The buffer structures are only consulted when the
end of the buffer is reached (and sometimes when doing an input() or an
unput()). Since the default buffer size is 16K bytes, the overhead is
negligible.
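In case it isn't obvious why, here is a sketch of the general technique (the
idea only, not flex's actual source): the scanner reads out of one large
in-core buffer, and the buffer bookkeeping only runs when the read pointer
reaches the end of the buffer, i.e. roughly once every 16K characters:

    #include <stdio.h>

    #define BUF_SIZE 16384                  /* flex's default: 16K */

    static unsigned char buf[BUF_SIZE];
    static unsigned char *cur = buf, *end = buf;

    /* Common path is one compare and a load; the buffer structures are
       only touched in the refill branch. */
    static int next_char(FILE *fp)
    {
        if (cur == end) {                   /* rare: refill the buffer */
            size_t n = fread(buf, 1, BUF_SIZE, fp);
            if (n == 0)
                return EOF;
            cur = buf;
            end = buf + n;
        }
        return *cur++;
    }

    int main(void)                          /* count characters on stdin */
    {
        long count = 0;
        while (next_char(stdin) != EOF)
            count++;
        printf("%ld characters\n", count);
        return 0;
    }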
Vern
flex vs. lex timings for a C tokenizer which includes keywords:
Generation times:
lex 83.0 secs
flex 3.9 # fully compressed scanner
flex -cfe 7.1 # uncompressed table, equivalence classes
flex -cf 15.0 # uncompressed table, no equivalence classes
Scanner object file sizes:
lex 41.0K bytes
flex 9.4K
flex -cfe 49.6K
flex -cf 126.5K
Running times on a 28,088 line input (685K characters):
lex 29.8 secs
flex 19.3
flex -cfe 9.0
flex -cf 7.8
The timings were made on a Sun 3/60. All times are user + system CPU time,
and no hashing of identifiers was done.
Summary:
For about the same sized scanner, you get a factor of 3 in performance.
For a 30% faster scanner, you get a scanner 1/4th the size, and it's
generated in 1/20th the time.
For a scanner that's 3 times larger, you get a factor of 3.8 in
performance.
I did a comparison of generated parsers on a Sun 3 with an MC68020 processor
(excluding scanning):
tool speed (tokens per second)
Yacc 16,000
Lalr - our LALR(1) parser generator 35,000
Ell - our LL(1) recursive descent parser generator 55,000
Pure parsing can be made faster by a factor of 2, 3, or more compared to Yacc.
However, as pure parsing takes only 5 to 10% of the whole compilation time,
it is not the bottleneck and does not matter too much.
Scanning is much more time critical, as it takes 30 to 50%.
Therefore Lex cannot be used for high-speed compilers.
Scanner generators like Rex and flex are up to 5 times faster.
Josef Grosch
---
gro...@gmdka.uucp
>gri...@hubcap.clemson.edu (Mike Percy) writes:
[some of my comments WRT RD parsers]
>Recursive descent parser == LL(0)
My references mention only that RD parsers are used with "extended-tree
grammars," which are then defined. Perhaps I'm using the worng
references, or have missed something, but I never noticed that RD ==
LL(0).
>LR(1) is LESS restrictive.
No argument there.
>You can still use a grammar with yacc that has some conflicts as it
>has some simple rules for conflict resolution. Some of these
>recursive descent parser generators that you mention probably use
>similar rules for conflict resolution but don't bother to tell you
>about their use.
Actually, don't yacc and its ilk use LALR? LALR can introduce
conflicts that aren't present in full LR(1). As for conflict resolution methods, I'm
sure that they all do similar things.
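The standard textbook example of a grammar that is LR(1) but not LALR(1):

S -> a A d
S -> b B d
S -> a B e
S -> b A e
A -> c
B -> c

A canonical LR(1) parser keeps the states reached after "a c" and "b c"
separate, with lookaheads {d} and {e} distinguishing the A -> c and B -> c
reductions. LALR(1) merges the two states because they have the same core,
the lookahead sets become {d, e} for both reductions, and a reduce/reduce
conflict appears that full LR(1) never had.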
>Look up some of the compiler theory ... it helps when you're trying to
>design the grammar.
It would also help if the language designers would make life a little
easier... As for the theory, I do look it up. RD parsers weren't even
mentioned in compiler class, except by me. I was told we weren't going
to look at RD parsers. LL was defined, but not explored like LR and
LALR were.
Mike Percy gri...@hubcap.clemson.edu
ISD, Clemson University msp...@clemson.BITNET
(803)656-3780 msp...@clemson.clemson.edu
Note that "LALR(1) parser generator" doesn't have to mean "table driven".
Don't have the reference handy, but someone (I believe MetaWare's Tom
Pennello?) has obtained big speedups by using procedural implementations of
the LALR(1) tables. Robert Henry has done much the same thing with code
generation tables in CODEGEN (UW technical report), and observed that table
compression and lookup techniques are essentially application-independent,
and that a general tool could be developed to factor, compress, and
implement tables, including generating fast procedural implementations.
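To show what "procedural" means here, a toy rendition in C (the published
schemes emit assembly and handle error recovery, defaults, and table
factoring; the two-rule grammar and state numbers below are made up, not from
any of those papers). Each automaton state becomes a block of code that
dispatches on the lookahead directly, so there is no interpreted action table
at all:

    #include <stdio.h>

    /* Grammar:  E -> E '+' 'n'  |  'n'        e.g. input "n+n+n"    */

    static const char *in;                  /* input cursor           */
    static int stk[64], sp;                 /* stack of state numbers */

    static int next(void) { return *in ? *in++ : '$'; }

    static int parse(const char *s)
    {
        int tok;
        in = s; sp = 0; stk[0] = 0;
        tok = next();

    s0:                                     /* items: E -> .E+n, E -> .n */
        switch (tok) {
        case 'n': stk[++sp] = 1; tok = next(); goto s1;
        default:  return 0;
        }
    s1:                                     /* E -> n.      reduce       */
        sp -= 1;                            /* pop one rhs state         */
        goto goto_E;
    s2:                                     /* E -> E.+n    accept on $  */
        switch (tok) {
        case '+': stk[++sp] = 3; tok = next(); goto s3;
        case '$': return 1;
        default:  return 0;
        }
    s3:                                     /* E -> E+.n                 */
        switch (tok) {
        case 'n': stk[++sp] = 4; tok = next(); goto s4;
        default:  return 0;
        }
    s4:                                     /* E -> E+n.    reduce       */
        sp -= 3;                            /* pop three rhs states      */
    goto_E:                                 /* the goto table, as code   */
        switch (stk[sp]) {
        case 0: stk[++sp] = 2; goto s2;
        }
        return 0;
    }

    int main(void)
    {
        printf("%d %d\n", parse("n+n+n"), parse("n+"));   /* prints: 1 0 */
        return 0;
    }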
Seems to me procedural table implementations bring us full circle: We
started with fast random code, then switched to slower table-driven code as
an aid to portability and code maintainability. Now that we can generate
the unportable, unmaintainable code automatically from declarative
specifications, the speed consideration is coming to the fore again. No?
Jeff Prothero
j...@u.washington.edu
I have been playing around with gcc (the NeXT Objective C version) and have
found that by giving it the -Q command-line option, gcc will tell you how much
time it spends in each section of compilation. Without optimization turned
on, it seems to spend most of its time parsing code. gcc doesn't use Lex, and
it is still much slower than Turbo C or Think C. Here is the result of
compiling a 149-line program. I would like to speed GCC up. Bison, not YACC,
is used when compiling GCC. Would switching to another parsing tool
give me the speed-up I desire? A FAST FREE compiler would sure complement a
FREE, slow optimizing compiler. Besides, gcc is the fastest compiler
available for the NeXT.
-Mike
handel> cc -Q -c CardDrawView.m
_1_CardDrawView _2_CardDrawView _3_CardDrawView _4_CardDrawView
time in parse: 5.939384
time in integration: 0.000000
time in jump: 0.000000
time in cse: 0.000000
time in loop: 0.000000
time in flow: 0.188038
time in combine: 0.000000
time in local-alloc: 0.000000
time in global-alloc: 0.246630
time in final: 0.168993
time in varconst: 0.281626
time in symout: 0.000000
time in dump: 0.000000
handel> wc CardDrawView.m
149 249 2690 CardDrawView.m
[I suspect that there is considerably more time spent passing code through
the assembler and linker than is spent by yacc. Keep in mind that the yacc
code switches out to action routines that build parse trees and symbol tables,
and such routines will be present no matter how you parse. -John]
Not so. Consider the following grammar:
S -> A B C
A -> a
B -> b
C -> c
No lookahead is required, yet a parser can predict that the first token will
be an 'a', the second a 'b', and so on. As an example of where a grammar like
this could be applied, I recently wrote an EBNF->BNF converter for a compiler
class that took input files of the form:
<file> -> <terminal-section> <nonterminal-section> <start> <rules> .
<terminal-section> -> TERMINALS <terminal-list> END .
<nonterminal-section> -> NONTERMINALS <nonterminal-list> END .
&c...
Admittedly, a rather rigid and restrictive format, but it suited the purpose
nicely.
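The no-lookahead point is easy to see in code; a minimal sketch in C, using
single characters t, n, s, r to stand in for the four sections of the input
format above (the token codes are made up for the illustration):

    #include <stdio.h>

    static const char *p;

    static int expect(char c) { return *p == c ? (p++, 1) : 0; }

    /* The parser always knows which section must come next, so no
       lookahead is consulted anywhere. */
    static int parse_file(const char *s)
    {
        p = s;
        return expect('t')      /* <terminal-section>    */
            && expect('n')      /* <nonterminal-section> */
            && expect('s')      /* <start>               */
            && expect('r')      /* <rules>               */
            && *p == '\0';
    }

    int main(void)
    {
        printf("%d %d\n", parse_file("tnsr"), parse_file("tsr"));  /* 1 0 */
        return 0;
    }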
:ja
--
John D. Aycock ayc...@cpsc.ucalgary.ca
(403) 285-8727
[Justin Graver <gra...@ufl.edu> sent a similar message. -John]