My question is that, there is available a C grammar in lex/yacc
that does not contain any shift/reduce conflict?
Can such grammar exsist?
Thanks: Akos Hajnal
[I doubt you'll find any if-then-else constructs where yacc will let you
embed actions in any grammar. -John]
--
Send compilers articles to comp...@iecc.com, meta-mail to
compiler...@iecc.com. Archives at http://www.iecc.com/compilers
You know (if I recall properly)
%nonassoc ELSE
%nonassoc LOWER_THEN_ELSE
...
ifStatement
: IF expr THEN statement %prec LOWER_THAN_ELSE
| IF expr THEN statement ELSE statement
;
Or does it still give a problem?
(Or have I been away from yacc that long???)
-- Scott
Scott Stanchfield - http://www.scruz.net/~thetick
MageLang Institute - http://www.MageLang.com
HAJNAL Akos wrote...
>I downloaded the ANSI C grammar from
>The above grammar contains a shift/reduce conflict at
>the definitions if-then, if-then-else.
>For this reason I can not put actions within these rules,
>i.e. for example after parsing the condition part.
[I believe he wants to put in embedded actions, which you can't do because
it forces yacc to look more than one token ahead. -John]
Yes, you can. You just have to be devious about it, which is the
point of this response. See below.
> My question is that, there is available a C grammar in lex/yacc
> that does not contain any shift/reduce conflict?
> Can such grammar exsist?
To which our moderator replied:
-- [I doubt you'll find any if-then-else constructs where yacc will let you
-- embed actions in any grammar. -John]
Or to specifically answer the question, there cannot be a grammar
without a shift-reduce conflict (it is present in the language,
i.e. you must know whether you have a lookahead of ELSE to decide
whether to shift to the if-then-else alternative or reduce the if-then
alternative), but as I will show below that does not prevent adding
action code to the grammar.
As a response Scott Stanchfield suggested a precedence trick which
directs yacc how to resolve the shift-reduce conflict (by using, as
our moderator pointed out, one token of lookahead) that provides a
nice grammar for me to cut and paste a solution in.
%nonassoc ELSE
%nonassoc LOWER_THEN_ELSE
...
ifStatement
: IF expr THEN statement %prec LOWER_THAN_ELSE
| IF expr THEN statement ELSE statement
;
Given Scott's grammar, I presume what you would like to write is this:
ifStatement
: IF expr { actions for the if clause } THEN statement
{ actions for the then clause }
| IF expr { actions for the if clause } THEN statement
{ actions for the then clause }
ELSE statement
;
The solution is to embedding actions in rules with shift-reduce (or
even reduce-reduce conflicts) conflicts is to introduce empty
non-terminals to hang the actions on. When you embed actions in
rules, yacc introduces empty non-terminals for you, but most versions
aren't smart enough to reuse the same non-terminal if the action code
is repeated. However, you can do it by hand. (Note Yacc++ handles
shift action code without introducing the dummy empty non-terminal
because the model calls out action code explicitly and it also
recognizes when you repeat the same action code in more than one place
and reuses it. For those reasons, Yacc++ would except the definition
above.)
Thus, the solution for the grammar Scott presented is:
ifStatement
: IF expr afterIfExpression THEN statement afterThenStatement
| IF expr afterIfExpression THEN statement afterThenStatement
ELSE statement
;
afterIfExpression: /* empty */ ; { actions for the if clause }
afterThenStatement: /* empty */ ; { actions for the then clause }
The only point is that the same non-terminal must be added to each
rule in the conflict. That means the action must be the same for each
conflicting rule. That means, you are out of luck if you want to do
something "different" immediately after the if-expression of an
if-then and an if-then-else. You must wait until your lexer has given
the parser the ELSE token to distinguish the two cases. That is the
following will not work (it will give you reduce-reduce conflicts):
ifStatement
: IF expr afterIfExpression THEN statement afterThenStatement
;
ifElseStatement
: IF expr afterIfElseExpression THEN statement afterThenELseStatement
ELSE statement
;
afterIfExpression: /* empty */ ;
{ actions for the if clause of an if-then }
afterIfELseExpression: /* empty */ ;
{ actions for the if clause of an if-then-else }
afterThenStatement: /* empty */ ;
{ actions for the then clause of an if-then }
afterThenElseStatement: /* empty */ ;
{ actions for the then clause of an if-then-else }
By the way, Yacc++ handles to one token of lookahead case implicitly.
Thus, if you need only one token of lookahead to decide which action
to apply, Yacc++ will read the lookahead token before applying the
action code. That means you could have different actions for the then
clause since the ELSE token will be the next read, but not for the if
clause.
This example is okay:
ifStatement
: IF expr { actions for the if clause } THEN statement
{ actions for the then clause of an if-then }
| IF expr { actions for the if clause } THEN statement
{ actions for the then clause of an if-then-else }
ELSE statement
;
This example will cause a "shift action code conflict":
ifStatement
: IF expr { actions for the if clause of an if-then } THEN statement
{ actions for the then clause of an if-then }
| IF expr { actions for the if clause of an if-then-else } THEN statement
{ actions for the then clause of an if-then-else }
ELSE statement
;
The equivalent restriction applies for the dummy empty non-terminals
in yacc. You cannot have a distinct afterIfElseExpression, but you
can have a distinct afterThenElseStatement. I'll leave that example
as an exercise for the reader.
There is one last caveat. You cannot use the distinct
afterThenStatement and afterThenElseStatements for controlling the
lexer state if that change of lexer state can effect the parsing of
the ELSE token. The reason is that parser must wait for the presence
or absence of the ELSE token to decide which action to apply. Thus,
the parser will not have told the lexer to change state until after
that token is seen (and its pretty hard for the lexer to change the
state before recognizing the token if the command to do so does not
come until after the token is recognized).
By the way, you can use Scott's precedence "trick" with any of the
above grammars to resolve the shift-reduce conflict without warning.
It's not really a trick, because that is exactly how the precedence
declaration was defined in yacc, and part of why it is defined the way
that it is. Just note that, as our moderator points out, that
precedence declarations are very powerful and can hide other
legitimate conflicts in the grammar, misleading you to believe you
have a correct grammar when it is actually ambiguous. One good way of
doing this is to write the grammar without any precedence declarations
and get it debugged (excluding the precedence of the expression
operators) and then add the precedence operators making certain you do
not change the semantics of the resulting tables.
-Chris Clark
************************************************************************
Compiler Resources, Inc. email: com...@world.std.com
3 Proctor St. http://world.std.com/~compres
Hopkinton, MA 01748 phone: (508) 435-5016
USA 24hr fax: (508) 435-4847
[I need to write up something on grammar flattening, it's a fundamental
technique in writing yacc grammars, and it's not obvious until you see
it a few times. We touch on it in L&Y but not at length. -John]
Language C does not have if-then-else problem, quoted BNF definition
does. Language spec, both usually and in case of C, disambiguates
this so that 'else' is attached to nearest 'if'.
Problem is easy to resolve by splitting balanced (all 'if's have
'else's in 'then' parts) and unbalanced if statements. I've not seen
anybody using this in practice until recently - you can see one at
(mind yourself it's rather long html)
http://java.sun.com/docs/books/jls/html/19.doc.html#52994
It does not solve the action problem, but this one has little to do
with shift/reduce conflict: parser does not know yet if it's parsing
balanced or unbalanced if at this moment. If the same action is to be
performed in the middle of balanced/unbalanced productions - factor
out common part and code actions in the middle of new production.
[That splitting is a standard technique, but it's in the folklore, not
written up in a lot of places. -John]
Try to find a copy of the 1st edition of C: A Reference Manual by
Harbison and Steele. They give an LALR(1) grammar for C that handles
if-then-else without conflicts. The LALR(1) grammar does not appear
in the latest edition.
It is possible to write an LALR(1) grammar for C, assuming a lexical
analyzer that distinguishes typedef-names from identifiers. It is a
pain to do so, and the resulting parser will be slower than a parser
for a grammar with some well-chosen and properly handled conflicts.
Sincerely,
Bob Corbett
The earliest implementation of such splitting I've seen was in Intel's
PL/M-86 compiler, circa 1981.
--
Kirk Hays
Dmitri&Nina Bronnikov <Bron...@inreach.com>'s posting is correct.
The problem is not in the language. It is in the grammar. Any
grammar which does not distinguish statements with if-then's allowed
from statements where all if-then's must have else's will have the
problem. To correct the grammar, you simply re-introduce the
distinction (or apply precedence declarations), which I will not
attempt given my tendency to make mistakes.
A similar problem exists in the following example (excuse me if I have
the recursion or precedence reversed, I don't normally write grammars
this way). The idea of this grammar is that only variables can be
tested to see if they are initialized, but any kind of expression can
be zero. This produces a shift/reduce conflict. (I'm not sure what
the equivalent problem is in LL, although I know this grammar is not
LL(1), since upon seing a VAR you don't know whether to descend into a
expr or not.)
test: init_test | zero_test;
init_test: VAR not INIT;
zero_test: expr not ZERO;
expr: term PLUS expr | term;
term: factor TIMES term | factor;
factor: VAR | CONST;
not: NOT | ;
To correct the problem (getting rid of the shift/reduce conflict and
making the grammar LL(1) at the same time), you need to split the
productions. The corrected version looks like:
test: init_test | zero_test;
init_test: VAR not INIT;
zero_test: VAR not ZERO | non_var_expr not ZERO;
non_var_expr: term PLUS expr | non_var_term;
expr: term PLUS expr | term;
non_var_term: factor TIMES term | non_var_factor;
term: factor TIMES term | factor;
non_var_factor: CONST;
factor: VAR | CONST;
not: NOT | ;
Notice how the problem propagates through a whole series of rules,
creating numerous non_var_ copies of each rule. Also note that the
change to the zero_test rule is required for the grammar to parse
correctly, while the splitting of the expr rules into nov_var_ copies
is simply to remove the shift/reduce conflict. (By the way, if you
split grammars like this be careful, I made three mistakes while
slitting this simple grammar--of course, I do seem error prone this
week.)
This is one case when precedence tricks can help. Correct the
zero_test rule (so that the conflict is between alternatives of the
zero_test rule and not between the zero_test rule and the init_test
rule), but use precedence to fix shift/reduce conflicts within the
zero_test rule. A rule-of-thumb for distinguishing the cases, is that
if the shift reduce conflict involves two alternatives of the same
rule (as in the next grammar where the two alternatives of zero_test
cause the shift/reduce conflict) , it is probably okay to solve by
selecting the shift. However, if the shift/reduce conflict selects
between different rules (as in between init_test and zero_test) then
the grammar should be rewritten to resolve the conflict.
Similarly to being careful when splitting rules, use caution when
applying precedence declarations. One technique is to get the grammar
working without precedence declarations leaving only shift/reduce
conflicts where the correct solution is always to shift and then to
add the appropriate precedence declarations checking that exactly the
same tables are generated.
right NOT; // prefer shift over reduce
test: init_test | zero_test;
init_test: VAR not INIT;
zero_test: VAR not ZERO | expr not ZERO;
expr: term PLUS expr | term;
term: factor TIMES term | factor;
factor: VAR %prec NOT // hack to avoid conflict message
| CONST;
not: NOT | ;
Hope this helps,
-Chris
*****************************************************************************
Chris Clark Internet : com...@world.std.com
Compiler Resources, Inc. CompuServe : 74252,1375
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)
------------------------------------------------------------------------------
Web Site in Progress: Web Site : http://world.std.com/~compres
-- Scott
Scott Stanchfield, Santa Cruz, CA
http://www.scruz.net/~thetick