Q: Recursive Descent w/Backtracking

SCHM...@iccgcc.cs.hh.ab.com

unread,

Oct 21, 1994, 2:46:50 PM10/21/94

to

Hello,

I'm attempting to write a recursive descent parser with backtracking (RDB).
So for example given the following grammar:

S -> cAd
A -> a | ab

It would be able to correctly parse the sentences of this language.

One pitfall with RDB is that if not implemented properly, the order of
productions can affect whether or not a solution is found. For example,
consider the input string w = cabd with the above grammar. A naive
implementation might fail as follows:

S -> cAd -> cad (REJECT)

S
/|\
/ | \
c A d
|
a

The problem is that the production for 'A' succeeded with a match on 'a', we
then try to match a 'b' with a 'd' and fail. What needs to happen is that
we go back to 'A' and try the second alternative 'ab'. This problem
would not have occurred if the order alternatives for 'A' were reversed:

S -> cAd
A -> ab | a

S -> cAd -> cabd (ACCEPT)

S
/|\
/ | \
c A d
|
ab

(the older "Dragon Book" discusses this problem)

This type of backtracking over a tree is difficult to implement using
recursive precedures to represent each production, since if we had a
procedure called 'A' which attempts to match either 'a' or 'ab', we
would have to remember that we called 'A' and then call 'A' again
telling it this time to ignore the first alternative.

I have a rough outline of an algorithm that is designed to handle this
case. It uses a stack to keep track of the set of production choices
which have been applied and the current input pointer (to string w)
in case backtracking has to move the pointer back. The key to the algorithm
is that it also maintains a set of unexpanded nodes (in leftmost order)
yet to be expanded (matched against). Unfortunately, the algorithm is
very unclear as to how the unexplored the nodes are managed so that the
proper node is always explored at the right point in time.

So does anyone have a lucid description of an algorithm which will handle
backtracking over all possibilites of a grammar tree?

Thanks in advance,

-- Greg Schmid schm...@iccgcc.decnet.ab.com
--
Send compilers articles to comp...@iecc.com or
{ima | spdcc | world}!iecc!compilers. Meta-mail to compiler...@iecc.com.

David Moore

unread,

Oct 25, 1994, 12:00:32 AM10/25/94

to

SCHM...@iccgcc.cs.hh.ab.com writes:

>Hello,

>I'm attempting to write a recursive descent parser with backtracking (RDB).
>So for example given the following grammar:

>S -> cAd
>A -> a | ab

>It would be able to correctly parse the sentences of this language.

Look at the programming language Icon:

"The Icon Programming Language" Griswold and Griswold,
Prentice Hall 1971 (ISBN 0-13-449777-5)

This language introduces the idea of a "generator" which is (essentially)
a function that can be backed up into to produce a second result if
you do not like the first one.

You can probably get Icon from an FTP site. Once you understand the
principles, actually implementing what you want in some other language
is not hard. It comes down to keeping a tree of "activation segments"
rather than just a stack, so that you can revisit an activation (pruning
to the right as you go)

Raul Deluth Miller

unread,

Oct 28, 1994, 12:37:28 PM10/28/94

to

Greg Schmid:
: One pitfall with RDB is that if not implemented properly, the order

: of productions can affect whether or not a solution is found.

I have not had that experience.

: So does anyone have a lucid description of an algorithm which will

: handle backtracking over all possibilites of a grammar tree?

For each possible *terminal* in the grammar, (recursively) attempt to
parse the remaining text stream. Return failure only when each of
these possibilities have failed.

--
Raul D. Miller
<rock...@nova.umd.edu>

Igor Chudov

unread,

Oct 28, 1994, 12:45:47 PM10/28/94

to

SCHM...@iccgcc.cs.hh.ab.com wrote:
: Hello,

: I'm attempting to write a recursive descent parser with backtracking (RDB).
: So for example given the following grammar:

: S -> cAd
: A -> a | ab

: It would be able to correctly parse the sentences of this language.

: One pitfall with RDB is that if not implemented properly, the order of
: productions can affect whether or not a solution is found. For example,
: consider the input string w = cabd with the above grammar. A naive
: implementation might fail as follows:

You can avoid the problems by two ways:

1. Admit that it is LL(2) grammar and just construct the parser for LL(2)
grammar;

2. Rewrite your grammar:
S -> cAd
A -> aB
B ->
B -> b

I would choose the latter, because this is most probably what the language
semantics really means.
----------------------------------------------------------------------------
Igor Chudov, Resource Solutions Intl office (918)588-2309
Systems Engineer, for WilTel. home (918)585-5862
E-mail: igor_...@wiltel.com
Internet: ich...@shoe.wiltel.com
1819 South Jackson #32-P Tulsa OK 74107

Stephen J Bevan

unread,

Oct 31, 1994, 5:45:23 PM10/31/94

to

SCHM...@iccgcc.cs.hh.ab.com writes:
I have a rough outline of an algorithm that is designed to handle this
case. It uses a stack to keep track of the set of production choices
which have been applied and the current input pointer (to string w)
in case backtracking has to move the pointer back.

... Unfortunately, the algorithm is very unclear as to how the

unexplored the nodes are managed so that the proper node is always
explored at the right point in time.

So does anyone have a lucid description of an algorithm which will handle
backtracking over all possibilites of a grammar tree?

A simple combinator parser will do this by default i.e. you have to
work hard to get it not to backtrack. Descriptions of combinator
parsers can be found in many books on functional language programming,
and there are also a number of good papers on the subject. See the
FAQ in comp.lang.functional for exact references.

Terence Parr

unread,

Oct 31, 1994, 5:32:43 PM10/31/94

to

Greg Schmid writes about backtracking:

> S -> cAd
> A -> a | ab
>
> It would be able to correctly parse the sentences of this language.
>
> One pitfall with RDB is that if not implemented properly, the order of
> productions can affect whether or not a solution is found. For example,
> consider the input string w = cabd with the above grammar. A naive
> implementation might fail as follows:
>
> S -> cAd -> cad (REJECT)

> [...]

> This type of backtracking over a tree is difficult to implement using
> recursive precedures to represent each production, since if we had a
> procedure called 'A' which attempts to match either 'a' or 'ab', we
> would have to remember that we called 'A' and then call 'A' again
> telling it this time to ignore the first alternative.

Greg, don't worry about this contrived case. Furthermore, backtracking is
needed in only a few cases and mainly for nasty languages like C++. Unless
you have a totally wacko language to parse, don't use a purely backtracking
parser as it has potentially exponential time complexity. If you do want a
purely backtracking parser, there are boatloads out there and you could
write a simple one in a day or two. You could also try a Tomita parser that
makes forests of bottom-up parse trees and then picks the "best" one.

Alternatively, let me make a shameless plug for ANTLR (the parser generator
in PCCTS). ANTLR would handle this one without backtracking because it's
simply LL(2). That is, in rule A, the parser would see input "ab" or "ad"
rather than just input "a"--which would allow it to correctly
(deterministically) predict A->a or A->ab. ANTLR will do backtracking also
if you want, however, it still won't solve your specific example without a
grammar rewrite. Even if we restrict ANTLR to LL(1), it could still solve
your example with a "syntactic predicate" and a rule instantiation:

s : C
( (A D)?
| A B D
)
;

The (...)? is a predicate that indicates you are not sure if the enclosed
construct will match. If it does not, simply try the next viable
alternative. The (A D|A B D) is a simple EBNF subrule (rule without a
label). This selective backtracking is extremely useful in practice; just
ask the NeXT folks doing the combined C/Objective-C/C++ parser with ANTLR.
The PCCTS ftp site is everest.ee.umn.edu in pub/pccts. Our newsgroup
is comp.compilers.tools.pccts.

Best regards,
Terence Parr