Re: Yacc/Bison - what semantic actions to take on a parse error


Rod Pemberton

May 25, 2012, 4:03:08 AM
"James Harris" <james.h...@gmail.com> wrote in message
news:12-0...@comp.compilers...
> On May 23, 12:19 pm, James Harris <james.harri...@gmail.com> wrote:
>

[... from comp.compilers, without John Levine's comments]

> > Yacc etc allow the special "error" keyword to be used in rules to aid
> > error recovery. Where those rules are there to generate a node of a
> > tree and there has been a parse error what should one tell Yacc to do?
> > Sometimes there's nothing valid one can build a node from and I can't
> > find a good way to communicate the situation to Yacc.
> >
> > I've looked at various options. Some are OK in certain cases but none
> > seem right in the general case. I'll post more details if interested
> > but there may be a simple answer.
> >
> > Anyone know of an easy or a standard answer or can provide some
> > recommendations?
>
>
> I wasn't thinking about using the parse tree after the parse phase so
> much as just completing the parse.
>
> An example may help illustrate. Say we're defining a node type X where
> there is nothing special about that node type. We might have a grammar
> construct something like the following. I'll use quotes "..." to
> indicate descriptive text.
>
> %type <X_type> X
> X
> : "a normal X" ';' { $$ = Xnode("specific data"); }
> | error ';' { ACTION; }
> ;
>

Yeah, in one instance some years ago, I avoided the use of %type and %union
altogether...

;-)

I chose to just use a string for YYSTYPE:

#define YYSTYPE char*

That must be defined in both the YACC and LEX grammars.

Then, I define a string in the YACC grammar. Both are in the %{ section for
the YACC grammar:

unsigned char str[4095];

Then, yylval is set equal to str in main() prior to the call to yyparse().

But, if you're doing type checking, I guess you've got no choice but to use
%type for types and create a %union node of all the types.
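
For what it's worth, here is a minimal sketch of that string-only setup in plain C, outside of any generated parser. It assumes the lexer copies each token's text into the shared buffer; init_yylval() and set_token_text() are hypothetical names, yylval is declared locally only because there is no generated parser here, and the buffer is plain char (not unsigned char) so the assignment needs no cast.

```c
#include <assert.h>
#include <string.h>

#define YYSTYPE char *          /* the same definition goes in the LEX file */

static char str[4095];          /* shared buffer, as in the %{ section */
YYSTYPE yylval;                 /* normally defined by the generated parser */

/* Done once in main(), before the call to yyparse(). */
void init_yylval(void)
{
    yylval = str;
}

/* Hypothetical lexer helper: copy the matched text into the shared buffer. */
void set_token_text(const char *text)
{
    assert(yylval != NULL);
    strncpy(yylval, text, sizeof str - 1);
    yylval[sizeof str - 1] = '\0';   /* strncpy may not terminate on overflow */
}
```

With this arrangement every semantic value is the same pointer, so an action that needs an earlier token's text has to copy it before the next token is scanned.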


> The Xnode call constructs a node. The X production expects $$ to be
> set to a node of the given type.
>
> The issue is that the error production cannot create a meaningful node
> so what actions to replace ACTION are appropriate? Here are some
> options.
>
> * Create an X node with dummy values. That would satisfy the type
> checking.
> * Set $<err_msg>$ = "invalid X node"
> * Braces but no action, i.e. {}
> * No action clause so default to $$ = $1;
> * Some combination of YYERROR; and yyerror();
> [...]
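
The first option in that list (a node with dummy values) could be sketched roughly as below. Only Xnode() comes from the quoted grammar; the struct layout, the NODE_ERROR tag, Xnode_error() and node_is_error() are assumptions made up for illustration. The error rule's action would then be something like $$ = Xnode_error();, which keeps the %type check happy.

```c
#include <assert.h>
#include <stdlib.h>

enum node_kind { NODE_X, NODE_ERROR };

struct node {
    enum node_kind kind;
    const char *data;           /* NULL for error nodes */
};

/* Build a normal X node; returns NULL if allocation fails. */
struct node *Xnode(const char *data)
{
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->kind = NODE_X;
        n->data = data;
    }
    return n;
}

/* Build a placeholder node for the "error ';'" production. */
struct node *Xnode_error(void)
{
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->kind = NODE_ERROR;
        n->data = NULL;
    }
    return n;
}

/* Later passes can test the tag and skip placeholder nodes. */
int node_is_error(const struct node *n)
{
    assert(n != NULL);
    return n->kind == NODE_ERROR;
}
```

The point of the tag is that the parse can finish with a well-typed tree while anything walking the tree can still tell real nodes from placeholders.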

If you can catch the error in the LEX grammar for some token rule, you can
fix it prior to it becoming a YACC parsing error. That's your best option,
IMO. Once a LEX rule has found the error, you can "fix" it like so:

save_buffer(); /* your routine to save the buffer */
/* ... is a pointer to the new or correct string to lex */
yy_switch_to_buffer(yy_scan_string(...));

save_buffer() will use a stack to save YY_CURRENT_BUFFER. Of course, if
there is a save_buffer, then there is a restore_buffer, which I call within
yywrap(). This is useful anytime you need to change the stream, like for
include files. Again, that is for LEX *not* YACC.
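
A rough sketch of the stack behind save_buffer() and restore_buffer(), written as stand-alone C so it compiles without a generated scanner. It makes some assumptions: buffer_handle stands in for flex's YY_BUFFER_STATE, MAX_DEPTH is an arbitrary limit, and save_buffer() takes the handle as an argument here because YY_CURRENT_BUFFER is only available inside the scanner.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 16            /* assumed nesting limit */

typedef void *buffer_handle;    /* stand-in for flex's YY_BUFFER_STATE */

static buffer_handle saved[MAX_DEPTH];
static int depth = 0;

/* Push the current buffer; returns 0 on success, -1 if the stack is full. */
int save_buffer(buffer_handle current)
{
    assert(current != NULL);
    if (depth == MAX_DEPTH)
        return -1;
    saved[depth++] = current;
    return 0;
}

/* Pop the most recently saved buffer, or NULL when nothing is saved. */
buffer_handle restore_buffer(void)
{
    if (depth == 0)
        return NULL;
    return saved[--depth];
}
```

Inside a real scanner, yywrap() would call restore_buffer(); if it gets a buffer back it would pass it to yy_switch_to_buffer() and return 0 to keep scanning, otherwise return 1 to signal end of input.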

As for YACC, one YACC error rule I have in a parser looks a bit like this
(text is wrapped by newsreader):

| SOMETOKEN error { printf("error message"); yyclearin; yyerrok;
do_something($1); }

Another looks like this:

| PARSE_ERROR { printf("error message %s %s %d\n", $$, yylval, yychar);
YYABORT; }

You can display yylval and yychar anywhere, like that. Don't use the
YYABORT, though, unless you want yyparse() to give up right there.

PARSE_ERROR is a defined token:

%token PARSE_ERROR

The PARSE_ERROR rule is a high-level rule. It's in the rule set defined for
%start. The PARSE_ERROR token gets returned by the LEX grammar for parsing
errors, not by the YACC grammar, i.e., {return(PARSE_ERROR);} for the
body of some LEX rule. Those won't be fixed in LEX, but will error out in
YACC.

The grammar also has yyerror(), which gets passed a 'char *' (the parser's
error message, not my YYSTYPE) and gets called by yyparse() rather than by
main. It does an fprintf() with the same info as in the PARSE_ERROR rule.
You can also check yychar in yyerror() for a specific token such as
PARSE_ERROR.
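
Something along these lines, as a stand-alone sketch. It assumes a traditional, non-reentrant parser where yychar is a global; the PARSE_ERROR value and the yychar definition below are stand-ins for what the generated parser normally provides, and lookahead_is_parse_error() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

enum { PARSE_ERROR = 258 };     /* stand-in; yacc assigns the real number via %token */
int yychar = -1;                /* stand-in for the parser's lookahead token */

/* Hypothetical helper: nonzero when the lookahead is the lexer's error token. */
int lookahead_is_parse_error(void)
{
    return yychar == PARSE_ERROR;
}

/* Report the error, distinguishing lexer-reported errors from the rest. */
void yyerror(const char *msg)
{
    assert(msg != NULL);
    if (lookahead_is_parse_error())
        fprintf(stderr, "lexer error: %s\n", msg);
    else
        fprintf(stderr, "%s (lookahead token %d)\n", msg, yychar);
}
```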


Rod Pemberton


