Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

how to deal with left recursive?

5 views
Skip to first unread message

vincent Feng

unread,
Nov 5, 2009, 10:10:46 PM11/5/09
to
I have found ansi c yacc grammar.
A production of statement list like this

statement_list
: statement
| statement_list statement
;

is left recursive.
How to elimilate left recursive in such case?

Is there exist a solution for parsing such grammar using LL(k) ?

Kaz Kylheku

unread,
Nov 6, 2009, 4:28:00 PM11/6/09
to
On 2009-11-06, vincent Feng <vincen...@yahoo.com> wrote:
> I have found ansi c yacc grammar.
> A production of statement list like this
>
> statement_list
> : statement
> | statement_list statement
> ;
>
> is left recursive.
> How to elimilate left recursive in such case?

Since there are no semantic actions attached to the above grammar which depend
on order, you can just make it right recursive.

statement_list
: statement
| statement statement_list
;

This generates exactly the same symbol strings (i.e. language) as the previous
grammar.

Think about it: left recursion is a silly way to represent a list-like
language construct, anyway!

Why? Because the left recursion corresponds to a tree like this:


/\
/\
/\
/\

Rather than one like this:


/\
/\
/\
/\


So just change it to:


statement_list : statement
| statement statement_list
;

A (non-empty) statement list is either just one statement, or a statement
followed by more statement material.

In the txr language, I parse constructs like this and turn them into
Lisp-like lists.

Take a look here: http://git.savannah.gnu.org/cgit/txr.git/tree/parser.y

For example, there is this production:

clauses : clause { $$ = cons($1, nil); }
| clause clauses { $$ = cons($1, $2); }
;

Why did I make it right recursive? The clue is in the simple semantic actions
on the right. If there is one clause, we build a one-element list. using
cons(<clause>, nil). If we have a clause followed by more clauses, we can cons
the clause up to the front of the list. The cons function just has to allocate
one binary cell and initialize its ``car'' and ``cdr'' fields with the argument
values.

A list of three clauses c1, c2, c3 looks like this cons structure:

/\
c1/\
c2/\
c3 nil

The tree mirrors the grammar.

But of course the list can't be empty, which is something we sometimes
want to allow. That's where this production
comes in: the optional clauses:

clauses_opt : clauses { $$ = $1; }
| /* empty */ { $$ = nil; }
;

``If there are clauses, just copy them. Otherwise, produce the empty
list nil.''

It would silly to have the production left recursive, because we still want to
build a right-recursive list of conses; so we end up doing an append operation:

clauses : clause { $$ = cons($1, nil); }
| clauses clause { $$ = append2($1, $2); }
;

append has to make a copy of the list structure. We could use
the destructive version of append, but that still has to walk down the
list to find the end. Either way we now have O(N*N) complexity to parse a list
of N clauses.

However, there is a cost. The right recursive approach has a deeper parsing
stack. In order to build the tree

/\
c1/\
c2/\
c3 nil

the very first semantic action that will happen will be the reduction which
conses up (c3 . nil) or (c3) if you will. This is followed by a reduction to
(c2 . (c3 . nil)). I.e. there is all this context built up on the yacc stack,
whose depth corresponds to the depth of the list being parsed. Only when you
hit the end of the list, can you start popping off all of the stack context
to produce the list!

We could do this instead. Use left recursion, and build up the list
of clauses in reverse. Then destructively reverse the list in O(N) time.

rev_clauses : clause { $$ = cons($1, nil); }
| rev_clauses clause { $$ = cons($2, $1); }
;

Now when this production is reduced three times over three clauses c1, c2, c3,
we get a structure which looks like this:

/\
c3/\
c2/\
c1 nil

I.e. the backwards list (c3 c2 c1) corresopnding to the dot notation
(c3 . (c2 . (c1 . nil))).

we can fix it by inserting this shim production into the grammar:

clauses : rev_clauses { $$ = nreverse($1); }

This way we get constant yacc stack usage, and construct the list
in linear time, without having to switch to some clumsy list encapsulation that
doesn't support functional programming very well.

Hans-Peter Diettrich

unread,
Nov 6, 2009, 6:25:44 PM11/6/09
to
vincent Feng schrieb:

> I have found ansi c yacc grammar.
> A production of statement list like this
>
> statement_list
> : statement
> | statement_list statement
> ;
>
> is left recursive.
> How to elimilate left recursive in such case?

statement_list
: statement
| statement statement_list
;

or shorter EBNF:

statement_list
: statement [ statement_list ]
;
or

statement_list
: statement { statement }
;

> Is there exist a solution for parsing such grammar using LL(k) ?

The C language is essentially LL(1), with one or two LL(2) cases.

DoDi

Bhateja, Jatin

unread,
Nov 9, 2009, 3:52:33 AM11/9/09
to
LL(K) parsers generators like ANTLR and TXL are the once which strictly
needs grammar to be non-left recursive and left factored (this is less
stringent as if look ahead is more)

However ANTLR3 provide some support for automatic left recursion removal


http://www.antlr.org/wiki/display/ANTLR3/Left-Recursion+Removal


Thanks and Regards
Jatin Bhateja

0 new messages