for (i=0 ; i<10 ; i=i+1) bar(i) // works
for (i=0 ; i<10 ; i=i+1) { // works
bar(i)
}
for (i=0 ; i<10 ; i=i+1) // doesn't work
{
bar(i)
}
These three statements should all be equal, but the the newline trailing
the for() in the last one matches to an empty expression.
What I think I need to do is to conditionally ignore newlines. Only when
trying to match a 'separator' rule, should a newline not be ignored. However,
I'm not savvy enough with yacc/lex to know what I'm doing. Here are the
rules I've got:
expression_statement
: separator { $$ = NULL; }
| expression separator { $$ = $1; }
;
separator
: '\n'
| ';'
;
I tried some code after matching 'expression', to disable/enable eating
of newlines, but this fails, causing the lexer to eat all newlines.
expression_statement
: separator { $$ = NULL; }
| expression { eatNL = 0; } separator { $$ = $1; eatNL = 1}
;
Any help would be greatly appreciated.
Thanks,
-- Noel (ngor...@speclab.cr.usgs.gov)
--
Send compilers articles to comp...@iecc.com,
meta-mail to compiler...@iecc.com.
Good idea.
>What I think I need to do is to conditionally ignore newlines.
This may not be necessary. Often a better idea is to follow the
C/Algol/etc idea of treating blanks and newlines the same. You can
use an unambiguous grammar that requires no statement terminators or
separators. (As a few people have recently pointed out.)
For Pascal-like languages, the grammar can even be LL(1).
Consider this language as an example:
Program --> Block eof
Block --> Statement OptSemi Block | Empty
OptSemi --> ; | Empty
Statement -->
id := Exp
| id(ArgList)
| if Exp then Block else Block fi
| while Exp do Block od
| var id : Type
| proc id(ParamDecls) Block end
| fun id(ParamDecls) Block return Exp
etc
Exp --> monop Exp | Exp binop Exp | (Exp) | id(ArgList) | id
Empty -->
This grammar can be made unambiguous, by giving precedence and
associativity to the various unary and binary operators. In
fact it is very similar to the grammar of the Turing language,
which is LL(1).
Note that in assignment statements, I only allow very simple
left-hand-sides, this could be extended to allow subscripting, but
as soon as you allow a statement to begin with a parenthesis,
you will lose all hope of an LL grammar. Consider the Block
a := foo
(*p) := bar
It begins too much like
a := foo(*p)
But perhaps LALR is salvagible.
When you allow whole expressions to be statement by themselves,
even this hope fades, as any useful grammar will be ambiguous. Consider
x := f(x)
which could be parsed as x := f followed by (x). Also what about
x - y
which could be parsed as x followed by -y. Obviously allowing
C's "empty statement" is also disastrous as the empty string can be parsed
as any number of empty statements. But it is a useless statement,
as you can always use {}, which need not cause a problem.
By changing the function call syntax (I use square bracket and let the type
checker figure out the difference between subscripts and arguments)
and segregating unary from binary operators you can get to
LALR(1) as evidenced by the yacc grammar for a C-like language below.
In an interactive language there is still the problem that the
end of a statement may not be recognized as such until the first
token of the next statement is read. I suggest using some special
token (I'll call it "done") that the user uses to request that the value
of the preceding statment be printed. The nonterminal Program should be
modified to read
Program : Block done {printVal();} Program | eof ;
Here is the yacc grammar:
%start Program
%token id if then else while do typename return eof
%right colonEqual
%left '+' '-'
%left '*' '/'
%left '~' /* Unary minus */
%%
Program : Block eof
;
Block : Stmt OptSemi Block
| Empty
;
OptSemi : ';'
| Empty
;
Empty :
;
Stmt : '{' Block '}'
| Exp
| if Exp OptThen Stmt else Stmt
| while Exp OptDo Stmt
| typename id /* var declaration */
| typename id '[' PList ']' Block return Exp /* func decl */
;
OptThen : then | Empty
;
OptDo : do | Empty
;
PList : typename id OptSemi PList
| Empty
;
Exp : id OptArgList
| Exp colonEqual Exp
| Exp '+' Exp
| Exp '-' Exp
| Exp '*' Exp
| Exp '/' Exp
| '~' Exp
| '(' Exp ')'
;
OptArgList : ArgList | Empty
;
ArgList : '[' ExpList ']'
;
ExpList : Exp OptSemi ExpList
| Empty
;
Theo Norvell