C-interpreter, newlines as separators?

Noel S. Gorelick

Apr 28, 1995, 3:00:00 AM

I am writing what is turning out to be a C-like interpreter. As an
interpreter, the trailing semicolons are a nuisance, and seem kind of
silly most of the time. I would like to be able to optionally use newlines
as statement separators. My problem is shown below:

for (i=0 ; i<10 ; i=i+1) bar(i) // works

for (i=0 ; i<10 ; i=i+1) { // works
bar(i)
}

for (i=0 ; i<10 ; i=i+1) // doesn't work
{
bar(i)
}

These three statements should all be equivalent, but the newline trailing
the for() in the last one matches an empty expression.

What I think I need to do is to conditionally ignore newlines. Only when
trying to match a 'separator' rule, should a newline not be ignored. However,
I'm not savvy enough with yacc/lex to know what I'm doing. Here are the
rules I've got:

expression_statement
: separator { $$ = NULL; }
| expression separator { $$ = $1; }
;

separator
: '\n'
| ';'
;

I tried some code after matching 'expression', to disable/enable eating
of newlines, but this fails, causing the lexer to eat all newlines.

expression_statement
: separator { $$ = NULL; }
| expression { eatNL = 0; } separator { $$ = $1; eatNL = 1; }
;
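
One possible reason the attempt above fails: yacc usually has to read the lookahead token before it can run the mid-rule action, so the lexer has already been called with eatNL still set to 1 by the time the action clears it. A rough alternative is to let the lexer decide by itself, from the previous token and the next character, whether a newline counts as a separator. The sketch below is a hand-written C scanner, not lex output; the token kinds and the names next_token, can_end_statement and count_separators are assumptions made up for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not taken from the grammar above. */
enum { T_EOF, T_ID, T_NUM, T_SEP, T_PUNCT };

static const char *src;      /* input cursor */
static int prev_kind;        /* kind of the token returned last */
static int prev_char;        /* its character, when it was T_PUNCT */

/* A newline may end a statement only after an operand or a ')'. */
static int can_end_statement(void) {
    return prev_kind == T_ID || prev_kind == T_NUM ||
           (prev_kind == T_PUNCT && prev_char == ')');
}

static int next_token(void) {
    int saw_newline = 0;
    while (*src == ' ' || *src == '\t' || *src == '\n') {
        if (*src == '\n') saw_newline = 1;
        src++;
    }
    /* Keep the newline only if a statement could end here and the
       next token is not a '{' that opens the body of the statement. */
    if (saw_newline && can_end_statement() && *src != '{')
        return prev_kind = T_SEP;
    if (*src == '\0') return prev_kind = T_EOF;
    if (*src == ';') { src++; return prev_kind = T_SEP; }
    if (isalpha((unsigned char)*src)) {
        while (isalnum((unsigned char)*src)) src++;
        return prev_kind = T_ID;
    }
    if (isdigit((unsigned char)*src)) {
        while (isdigit((unsigned char)*src)) src++;
        return prev_kind = T_NUM;
    }
    prev_char = *src++;
    return prev_kind = T_PUNCT;
}

/* Scan a whole string and count the separator tokens it produces. */
int count_separators(const char *text) {
    int n = 0, tok;
    assert(text != NULL);
    src = text;
    prev_kind = T_SEP;
    prev_char = 0;
    while ((tok = next_token()) != T_EOF)
        if (tok == T_SEP) n++;
    return n;
}
```

With this filter the second and third for() forms above (leaving out the // comments) produce the same token stream, because the newline before '{' is dropped. The price is that an expression statement directly followed by a block on the next line, such as "x = 1" and then "{", loses its separator, so this is a heuristic rather than a complete answer.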

Any help would be greatly appreciated.
Thanks,

-- Noel (ngor...@speclab.cr.usgs.gov)
--
Send compilers articles to comp...@iecc.com,
meta-mail to compiler...@iecc.com.

Theo Norvell

May 9, 1995, 3:00:00 AM

>I am writing what is turning out to be a C-like interpreter. As an
>interpreter, the trailing semicolons are a nuisance, and seem kind of
>silly most of the time. I would like to be able to optionally use newlines
>as statement separators.

Good idea.

>What I think I need to do is to conditionally ignore newlines.

This may not be necessary. Often a better idea is to follow the
C/Algol/etc idea of treating blanks and newlines the same. You can
use an unambiguous grammar that requires no statement terminators or
separators. (As a few people have recently pointed out.)

Such a grammar can be made unambiguous by giving precedence and
associativity to the various unary and binary operators. In
fact it is very similar to the grammar of the Turing language,
which is LL(1).

Note that in assignment statements I allow only very simple
left-hand sides. This could be extended to allow subscripting, but
as soon as you allow a statement to begin with a parenthesis,
you will lose all hope of an LL grammar. Consider the block
a := foo
(*p) := bar
It begins too much like
a := foo(*p)
But perhaps LALR is salvageable.

When you allow whole expressions to be statements by themselves,
even this hope fades, as any useful grammar will be ambiguous. Consider
x := f(x)
which could be parsed as x := f followed by (x). Also what about
x - y
which could be parsed as x followed by -y. Obviously allowing
C's "empty statement" is also disastrous as the empty string can be parsed
as any number of empty statements. But it is a useless statement,
as you can always use {}, which need not cause a problem.

By changing the function call syntax (I use square bracket and let the type
checker figure out the difference between subscripts and arguments)
and segregating unary from binary operators you can get to
LALR(1) as evidenced by the yacc grammar for a C-like language below.

In an interactive language there is still the problem that the
end of a statement may not be recognized as such until the first
token of the next statement is read. I suggest using some special
token (I'll call it "done") that the user uses to request that the value
of the preceding statement be printed. The nonterminal Program should be
modified to read
Program : Block done {printVal();} Program | eof ;

Here is the yacc grammar:

%start Program
%token id if then else while do typename return eof
%right colonEqual
%left '+' '-'
%left '*' '/'
%left '~' /* Unary minus */
%%
Program : Block eof
;
Block : Stmt OptSemi Block
| Empty
;
OptSemi : ';'
| Empty
;
Empty :
;
Stmt : '{' Block '}'
| Exp
| if Exp OptThen Stmt else Stmt
| while Exp OptDo Stmt
| typename id /* var declaration */
| typename id '[' PList ']' Block return Exp /* func decl */
;
OptThen : then | Empty
;
OptDo : do | Empty
;
PList : typename id OptSemi PList
| Empty
;
Exp : id OptArgList
| Exp colonEqual Exp
| Exp '+' Exp
| Exp '-' Exp
| Exp '*' Exp
| Exp '/' Exp
| '~' Exp
| '(' Exp ')'
;
OptArgList : ArgList | Empty
;
ArgList : '[' ExpList ']'
;
ExpList : Exp OptSemi ExpList
| Empty
;
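
To see roughly how a separator-free grammar of this shape finds statement boundaries, here is a small hand-written recursive-descent sketch in C. It is not the yacc parser itself and covers only a subset: expression statements with optional semicolons, ":=", the four binary operators, unary '~', parentheses, and square-bracket argument lists. The function names (count_statements, parse_exp, parse_primary, binop) are made up for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *p;   /* cursor into the input */
static int failed;      /* set on the first syntax error */

static void skip_ws(void) { while (isspace((unsigned char)*p)) p++; }
static void parse_exp(int min_prec);

/* Precedence of the binary operator at the cursor (0 if none). */
static int binop(int *len) {
    skip_ws();
    *len = 1;
    if (p[0] == ':' && p[1] == '=') { *len = 2; return 1; }
    if (*p == '+' || *p == '-') return 2;
    if (*p == '*' || *p == '/') return 3;
    return 0;
}

/* Primary : id [ '[' ExpList ']' ] | '~' Exp | '(' Exp ')' */
static void parse_primary(void) {
    skip_ws();
    if (*p == '~') { p++; parse_exp(4); return; }  /* binds tightest */
    if (*p == '(') {
        p++; parse_exp(1); skip_ws();
        if (*p == ')') p++; else failed = 1;
        return;
    }
    if (!isalpha((unsigned char)*p)) { failed = 1; return; }
    while (isalnum((unsigned char)*p)) p++;
    skip_ws();
    if (*p == '[') {                 /* argument list, ';' optional */
        p++;
        for (;;) {
            skip_ws();
            if (*p == ']') { p++; break; }
            parse_exp(1);
            if (failed) return;
            skip_ws();
            if (*p == ';') p++;
        }
    }
}

/* Precedence climbing: ":=" is right-associative, the rest left. */
static void parse_exp(int min_prec) {
    int prec, len;
    parse_primary();
    while (!failed && (prec = binop(&len)) >= min_prec) {
        p += len;
        parse_exp(prec == 1 ? prec : prec + 1);
    }
}

/* Block : { Exp [';'] }.  Returns the statement count, or -1 on error. */
int count_statements(const char *text) {
    int n = 0;
    assert(text != NULL);
    p = text;
    failed = 0;
    for (;;) {
        skip_ws();
        if (*p == '\0') return n;
        parse_exp(1);
        if (failed) return -1;
        n++;
        skip_ws();
        if (*p == ';') p++;
    }
}
```

In this sketch "x - y" is one statement while "x ~y" is two, and "x := f[x]" is one statement while "x := f (x)" is two. A statement ends as soon as the next token cannot continue the current expression, which is why no separator is needed.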

Theo Norvell
