LEX/YACC - grammar problems and printing error messages

Jerry Nettleton

May 27, 1992, 11:51:14 PM

I have been developing a data conversion program using lex and yacc in
order to generate a text file to import into a new database. (I know AWK
could be used, but I want to adapt this conversion process for an
interactive mode.) As grammar errors are found, I was wondering how to
associate the source line/column with an appropriate error message. Since
lex deals with tokens and yacc parses the grammar, how can I get the
current line of input to print error messages? Actually, the data is
organized as a multi-line record and I will write the entire record to an
error file and then use a comment line to display the error.

--
Jerry Nettleton
email: ne...@technix.mn.org
...!uunet!cs.umn.edu!kksys!edgar!technix!nett
[Bison has some extra hackery to associate a line and column number with
every token, though it still won't give you the whole line. See the next
message for some suggestions. -John]
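
For a lexer that has to do this bookkeeping itself, here is a minimal C
sketch of per-token line and column tracking. The names (struct loc,
advance_loc) are made up for illustration and are not part of lex, yacc,
or Bison; the four fields only loosely mirror what Bison keeps for each
token. It assumes every lexer action calls advance_loc(yytext), including
the actions for white space and newlines.

```c
#include <stdio.h>

/* Hypothetical location record: where a token starts and ends. */
struct loc {
    int first_line, first_column;
    int last_line, last_column;
};

/* Running position of the scanner; lines and columns are 1-based. */
static int cur_line = 1, cur_column = 1;

/* Advance the running position over `text` and return the span it
   covered. An empty string yields a span that starts and ends at the
   current position. */
struct loc advance_loc(const char *text)
{
    struct loc l;

    l.first_line = l.last_line = cur_line;
    l.first_column = l.last_column = cur_column;
    for (; *text != '\0'; text++) {
        l.last_line = cur_line;         /* position of this character */
        l.last_column = cur_column;
        if (*text == '\n') {
            cur_line++;
            cur_column = 1;
        } else {
            cur_column++;
        }
    }
    return l;
}
```

An error routine could then print first_line and first_column of the most
recent token next to its message.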
--
Send compilers articles to comp...@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.

John R. Levine

May 28, 1992, 2:38:40 PM

In article <92-0...@comp.compilers> you write:
>[how can I get the entire line where an error occurs into an error message?]

Yacc promises to complain as soon as it sees a token it can't parse, so it's
mostly a matter of having the current line available when the error occurs.

Basically, you have to buffer the line yourself. In AT&T lex one
possibility is to rewrite the input() macro to do line buffering, but
here's a sketch of how to do it in a portable way that should work with
flex, the version of lex that all sensible people use. This should extend
in a straightforward way to multi-line records so long as the lexer can
tell where the record boundaries are.

---begin untested lex code---

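%{
#include <string.h>
#include "y.tab.h"              /* token codes (EOL, FOO) from yacc -d */

#define LINEBUF_SIZE 500

char linebuf[LINEBUF_SIZE];     /* line buffer for tokens */
int curoffs;                    /* start of current token */
int clear_on_next = 0;          /* set once a newline has been buffered */

void clr_linebuf(void);
void add_linebuf(void);
%}

%%

        /* comments in the rules section must be indented, */
        /* otherwise lex tries to read them as patterns */

        /* clear buffer after end of line */
\n      { add_linebuf(); clear_on_next = 1; return(EOL); }

        /* real token */
foo     { add_linebuf(); return(FOO); }

        /* ignored white space still needs to go in the buffer */
[ \t]+  { add_linebuf(); }

%%

/* initialize the line buffer */
void clr_linebuf(void)
{
    linebuf[0] = '\0';
    curoffs = 0;
    clear_on_next = 0;
}

/* add the current token to the current line */
void add_linebuf(void)
{
    if(clear_on_next)
        clr_linebuf();

    curoffs = strlen(linebuf);              /* start of current */
    if(curoffs + yyleng < LINEBUF_SIZE)     /* skip text that won't fit */
        strcpy(linebuf+curoffs, yytext);    /* append current */
                                            /* strcpy is faster than strcat */
}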

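/* report an error */
void yyerror(char *errmsg)
{
    char *curend = linebuf+strlen(linebuf); /* current buf end */
    char *p = curend;

    /* get the rest of the line if not at end */
    if(!clear_on_next) {
        while(p < linebuf + sizeof(linebuf) - 1) {
            int c = input();

            if(c == EOF || c == 0)  /* end of file (lex returns 0) */
                break;
            *p++ = c;
            if(c == '\n')
                break;
        }
        *p = 0;
        /* now give it back so lex can scan it later */
        while(p > curend)
            unput(*--p);
    }

    /* linebuf[] now has the whole line, with the current token */
    /* at curoffs */

    /* print error message and current line */
    printf("%s\n%s", errmsg, linebuf);
    if(linebuf[0] == '\0' || linebuf[strlen(linebuf)-1] != '\n')
        putchar('\n');          /* line was cut short by EOF or buffer size */

    /* print an X under the most recent token */
    printf("%*sX\n", curoffs, "");  /* curoffs spaces, then X */

    /* drop the lookahead again; add_linebuf() will re-append it */
    /* as the tokens are rescanned */
    *curend = '\0';
}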
---end untested lex code---
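
One caveat about the X marker in yyerror(): padding with curoffs spaces
assumes every character before the token is one column wide, so a line
containing tabs puts the X in the wrong place. A possible fix, sketched in
plain C under the assumption that the marker is printed directly below the
line on the same device: copy tabs through as tabs and turn everything
else into a space. The name make_marker is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Build in `out` a marker line for `line`: white space up to offset
   `offs`, then an X. Tabs in the source line are copied as tabs so the
   X lines up however the terminal expands them. `outsize` must be at
   least 1; the marker is cut short if `out` is too small. */
void make_marker(const char *line, int offs, char *out, size_t outsize)
{
    size_t i = 0;

    if (offs < 0)
        offs = 0;
    while (i < (size_t)offs && line[i] != '\0' && i + 2 < outsize) {
        out[i] = (line[i] == '\t') ? '\t' : ' ';
        i++;
    }
    if (i + 1 < outsize)
        out[i++] = 'X';
    out[i] = '\0';
}
```

In yyerror() this would replace the second printf: call
make_marker(linebuf, curoffs, marker, sizeof marker) and print marker
followed by a newline.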

Here's a slightly easier approach which pre-reads the next line after
every newline, but doesn't keep track of the location of the current
token.

---begin more untested lex code---
%%

        /* the first line of input has no \n in front of it, */
        /* so it is not saved by this rule */
\n.*    { strcpy(linebuf, yytext+1);    /* save the next line */
          yyless(1);                    /* give back all but the \n to rescan */
        }
---end more untested lex code---
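
For comparison, the other option mentioned above, doing the line buffering
in the input routine itself, might look roughly like this in plain C. This
is only a sketch: line_getc, cur_line and the 500-byte limit are made-up
names and values, and hooking the function into the scanner (the input()
macro in AT&T lex, or the YY_INPUT macro in flex) is left out.

```c
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 500

static char cur_line[LINE_MAX_LEN];  /* the line being scanned */
static size_t cur_pos;               /* next character to hand out */
static size_t cur_len;               /* number of characters in cur_line */

/* Return the next input character, or EOF. A whole line is read into
   cur_line before its first character is returned, so an error routine
   can print cur_line at any time. Lines longer than the buffer are
   read in pieces. */
int line_getc(FILE *fp)
{
    if (cur_pos >= cur_len) {
        if (fgets(cur_line, sizeof cur_line, fp) == NULL) {
            cur_line[0] = '\0';
            cur_pos = cur_len = 0;
            return EOF;
        }
        cur_len = strlen(cur_line);
        cur_pos = 0;
    }
    return (unsigned char)cur_line[cur_pos++];
}
```

With this arrangement cur_pos also gives a rough column for the error,
although the scanner may have read a character or two of lookahead past
the offending token.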

Regards,
John Levine, jo...@iecc.cambridge.ma.us, {spdcc|ima|world}!iecc!johnl

Craig Wylie

May 29, 1992, 7:08:17 AM

Here is the easiest way I have found of associating errors with lines; it
also allows for multiple errors on a line.

As soon as YACC detects an error it calls yyerror(). Change yyerror() so
that, instead of printing, it adds the error code/text and character
position to a linked list or array.

When an end of line is detected, check whether the error list is empty.
If it isn't, print the line and then print each of the errors. The
character position allows you to position a marker on the line.
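
A minimal C sketch of this scheme, assuming a linked list and a line
buffer without its trailing newline. All names (pending_err, queue_error,
flush_errors) are invented for the example; yyerror() would call
queue_error() with the current token offset, and the lexer's end-of-line
action would call flush_errors().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One pending error: message text and the column it applies to. */
struct pending_err {
    int column;                     /* 0-based offset into the line */
    char *text;
    struct pending_err *next;
};

static struct pending_err *err_head, *err_tail;

/* Called from yyerror(): remember the error instead of printing it. */
void queue_error(const char *text, int column)
{
    struct pending_err *e = malloc(sizeof *e);

    if (e == NULL)
        return;                     /* out of memory: drop the error */
    e->text = malloc(strlen(text) + 1);
    if (e->text == NULL) {
        free(e);
        return;
    }
    strcpy(e->text, text);
    e->column = column;
    e->next = NULL;
    if (err_tail != NULL)
        err_tail->next = e;
    else
        err_head = e;
    err_tail = e;
}

/* Called at end of line: if any errors are queued, print the line and
   then each error with a ^ marker under its column. Empties the list
   and returns the number of errors printed. */
int flush_errors(FILE *out, const char *line)
{
    struct pending_err *e, *next;
    int n = 0;

    if (err_head != NULL)
        fprintf(out, "%s\n", line);
    for (e = err_head; e != NULL; e = next) {
        next = e->next;
        fprintf(out, "%*s^ %s\n", e->column, "", e->text);
        free(e->text);
        free(e);
        n++;
    }
    err_head = err_tail = NULL;
    return n;
}
```

For a multi-line record the same list works if the flush is moved from
end of line to end of record and each entry also stores its line number.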

The downside of this is error cascading, where one error causes a large
number of spurious errors to be reported as the parser resynchronizes with
the input.

Craig.
[This certainly works, though the original question was how to get a copy
of the entire input line from lex. -John]