Good explanation of Recursive Ascent Parsing wanted

Aaron Gray

unread,

Sep 29, 2022, 1:26:55 PM9/29/22

to

I am after a good explanation of Recursive Ascent Parsing as I wish to implement a parser generator to generate recursive ascent C/C++ code from an LR1 grammar.

Regards,

Aaron
[The Wikipedia article isn't bad and has several promising references,
but I wonder if it's worth the effort. RA parsing does what yacc or
bison does, but by turning each state into a function. It's supposed
to be faster than a table driven parser, but LALR parsers are already
so fast, how much faster will it be? Maybe it'll even be slower since
there is a lot more code so it's less likely to fit in a CPU cache.
-John]

Kaz Kylheku

unread,

Sep 29, 2022, 2:04:47 PM9/29/22

to

John, I suspect the idea is that this is suited for languages that
are compiled well and have good tail call support, to that that
the recursion is stackless.

A table-driven parser is an interpreter for a table. Tables can
be compiled to a goto graph, and a goto graph can be represented
by tail recursion which compiles to goto. This can be faster than the
original table-interpreting loop.

Although, those improvements will succumb to Amdahl's law; you would
best be able to demonstrate the improvement in a program which does
nothing but parse a canned token stream: no lexing is going on and the
reductions do nothing (basically the syntax is being validated and
that's all).

It be useful in applications like, say, validating some syntax that is
arriving as a real-time input.

I'd ping the GNU Bison mailing list to see if they are interested
in the technique. It's doable in C; C compilers have good tail
call support. And in any case, the technique doesn't strictly
require functions: a giant yyparse() can be generated which just
contains a self-contained goto graph.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
[You're right but my eyeballs hurt when I think about looking at or
trying to debug the giant goto graph. -John]

Aaron Gray

unread,

Sep 29, 2022, 3:09:31 PM9/29/22

to

On Thursday, 29 September 2022 at 19:04:47 UTC+1, Kaz Kylheku wrote:

> On 2022-09-29, Aaron Gray <> wrote:
> > I am after a good explanation of Recursive Ascent Parsing as I wish to
> > implement a parser generator to generate recursive ascent C/C++ code
> > from an LR1 grammar.
> >
> > Regards,
> >
> > Aaron
> > [The Wikipedia article isn't bad and has several promising references,
> > but I wonder if it's worth the effort. RA parsing does what yacc or
> > bison does, but by turning each state into a function. It's supposed
> > to be faster than a table driven parser, but LALR parsers are already
> > so fast, how much faster will it be? Maybe it'll even be slower since
> > there is a lot more code so it's less likely to fit in a CPU cache.
> > -John]

Hi John, I am working on a modern (initially C++) Source to Source
Compiler Compiler that supports all parsing algorithms. I already have
a LEX and YACC like tools LG and PG, that are tools which are library
based.

I did not find the WikiPedia code very enightening. What I did find
was an obvious explanation, follow the item closure spaces !

What I am actually doing is working on marrying Recursive Descent with
backtracking with Recursive Ascent for the LR expression bits. Rather
than using an operator grammar for them, well that as well, but yes.

> John, I suspect the idea is that this is suited for languages that
> are compiled well and have good tail call support, to that that
> the recursion is stackless.

Kaz, Oh this sounds potentially very efficient, thank you, I will have
a serious play and see if I can get this to work !

> A table-driven parser is an interpreter for a table. Tables can
> be compiled to a goto graph, and a goto graph can be represented
> by tail recursion which compiles to goto. This can be faster than the
> original table-interpreting loop.
>
> Although, those improvements will succumb to Amdahl's law; you would
> best be able to demonstrate the improvement in a program which does
> nothing but parse a canned token stream: no lexing is going on and the
> reductions do nothing (basically the syntax is being validated and
> that's all).
>
> It be useful in applications like, say, validating some syntax that is
> arriving as a real-time input.
>
> I'd ping the GNU Bison mailing list to see if they are interested
> in the technique. It's doable in C; C compilers have good tail
> call support. And in any case, the technique doesn't strictly
> require functions: a giant yyparse() can be generated which just
> contains a self-contained goto graph.

Many thanks !

Aaron

Kaz Kylheku

unread,

Sep 29, 2022, 8:25:36 PM9/29/22

to

On 2022-09-29, Aaron Gray <aaron...@gmail.com> wrote:

> Kaz, Oh this sounds potentially very efficient, thank you, I will have
> a serious play and see if I can get this to work !

If you look at how Yacc programs typically work: they use goto anyway.

The reason I say this is that the parser actions all get compiled into
one big yyparse() function, right into its body, and end up as targets
of the labels of a switch() statement, or possibly gotos.

It's just that the gotos are computed, with the help of the parser
tables.

So basically what you're doing is eliminating the computed goto; instead
of changing some state variable and returning to the top of a loop to
switch on it, you just jump to the destination directly.

I think there are still situations where a computed goto is inevitable.
The LALR parser is a push-down automaton: where the LR(0) items form the
basic state transitions for recognizing the regular language of the
senential patterns. This is augmented by the stack to handle
the recursion in the grammar.

When reductions occur and the stack is popped, it is not statically
known which state the machine will end up in; so there cannot tbe
a hard coded goto or tail call. This is because the same phrase
structure, e,g, "expression" can occur in multiple contexts.

So here, the goto-based or tail-recursion-based implementation still
requires a computed goto. Thus in C a switch statement would be
used, or the GNU C computed labels; or else if tail calls are used,
then perhaps function pointers could be pushed into the stack.

I'm not looking at this at the required level of detail, but my
intuition for the problem space affords me a modicum of assurance. :)

Anton Ertl

unread,

Sep 30, 2022, 9:33:27 PM9/30/22

to

>[Maybe it'll even be slower since

>there is a lot more code so it's less likely to fit in a CPU cache.
>-John]

My experience is: Don't worry about I-cache misses until you have
them. I have seen cases where a lot of code was produced, but it had
enough temporal and spatial locality that I-cache misses were no
problem, as well as a case where we applied a technique that reduced
the code size (with the intent of reducing I-cache misses), but as a
result we saw a slowdown from increased I-cache misses, because the
resulting code had worse spatial locality.

Looking at the numbers in 'Thomas J Penello (1986). "Very fast LR
parsing"', one need not worry much about the I-cache: He describes an
experiment with a grammar with 126 productions, resulting in 192
states, 305 terminal transitions and 226 nonterminal transitions in
the LALR(1) parser; the resulting parse tables for the table-driven
parser has 2022 bytes, while the recursive-ascent parser has 5602
bytes (including 397 bytes in common with the table-driven parser).
I-cache sizes these days are typically 32KB, so this parser would be
completely within the I-cache (even if we assume a factor of two from
going from 80286 code to AMD64 code), so one does not even need to
think about locality. Penello also reports that a 158-production
grammar for standard Pascal with a few minor extensions yielded a
5326-line assembly program (which MASM 3.0 failed to assemble); this
should also easily fit into 32KB.

Concerning recursive ascent, my dim memory of the article I read about
it a long time ago says that on reduction a routine does often not
return to its parent caller, but to some ancestor. Since the
introduction of the return stack predictor in microarchitectures in
the mid-1990s, implementing that directly causes at least one
misprediction (~20 cycles), and possibly there are followup
mispredictions from having the predictor stack go out-of-sync with the
real returns. Looking at the Wikipedia page, it says that on
reduction a counter is returned, so you take the returns one at a time
and count down until you arrive at the desired level. This costs some
cycles, too.

Overall the performance of recursive ascent relative to table-driven
is less advantageous than it may have been in the 1980s (before branch
prediction and I-caches). Penello reports a speedup by a factor 5.5
for the recursive-ascent parser (which uses assembly language) over
"an interpretive LR parser written in Pascal and compiled by a good
optimizing Pascal compiler" on an 80286. It would be interesting to
see a performance comparison between todays Bison skeletons compiled
with a present-day optimizing C compiler and a recursive ascent parser
on present-day hardware.

- anton
--
M. Anton Ertl
an...@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

Johann 'Myrkraverk' Oskarsson

unread,

Sep 30, 2022, 9:37:35 PM9/30/22

to

On 9/29/2022 3:26 AM, Aaron Gray wrote:
> I am after a good explanation of Recursive Ascent Parsing as I wish to implement a parser generator to generate recursive ascent C/C++ code from an LR1 grammar.

The best explanation I've read for all of this is Holub's Compiler at

https://holub.com/compiler/

though I don't recall if he covers RA specifically. In any case, since
other posters talked about Lex, Yacc, etc., I want to point out this
book since 1) it's the one I personally learned best from, and 2) isn't
recommended much [anymore.] It's where I learned more about implement-
ing parser generators that any other more modern resource.

If he does talk about RA at all, it's probably as an exercise or in an
off-hand manner, for which the book may not be worth it.

Good luck,
--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk
[I have the book, and he didn't. Keep in mind that the book had a
stupendous number of errors, so be sure to read the 52 pages of
errata at the back. -John]

anti...@math.uni.wroc.pl

unread,

Oct 7, 2022, 1:04:52 PM10/7/22

to

Karlsruhe team (Grosch and collaborators) had somewhat different
conclusions. When they added extra features needed for production
compiler they noted slowdowns on bigger grammars when translating
to code. OTOH they reported quite good results from table-driven
parsers. Reports were available online, but I do not have links
handy (quite possible that old links are broken now).

Of couse, modern machines tend to have larger caches than the
old ones. But also modern machines are troughput oriented
and my suffer more from latency.

--
Waldek Hebisch

Kaz Kylheku

unread,

Oct 8, 2022, 8:11:46 PM10/8/22

to

On 2022-10-06, anti...@math.uni.wroc.pl <anti...@math.uni.wroc.pl> wrote:
> Of couse, modern machines tend to have larger caches than the

> old ones. But also modern machines are throughput oriented

> and my suffer more from latency.

But that's the latency of a single instruction you're talking about,
right?. E.g. a deeper pipeline can worsen the time from instruction
fetch to completion, if the clock isn't jacked up and propagation
delays reduced to compensate.

When do you care about single-instruction-level latency, other than if
concerned about pipeline stalls in some scenarios?

Throughput translates to low latency at the level of blocks of
instructions or procedures. The procedure call returns a result faster
due to throughput, which is less latency.

The latency you perceive on modern machines as an interactive user is
due to cra^H^H^Hrichly functional software stacks.