Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Grammar with low-precedence postfix operator?

94 views
Skip to first unread message

Robert Jacobson

unread,
Feb 5, 2015, 1:27:35 PM2/5/15
to
I'm trying to write an ANTLR4 grammar for a language with a low precedence
postfix operator (Wolfram Language with '&', but I use a simple toy grammar
below). I'm struggling with finding the right pattern to express this
language.

Consider my (incorrect) grammar below that parses math expressions:

<snip>
grammar MathExpression;

//Parser Rules
term
: INT
| '(' expr ')' //parentheses
| <assoc=right> term '^' term //exponentiation
| term term //implicit multiplication
;

expr
: term
| ('+' | '-') expr //unary plus/minus
| expr ('/' | '*') expr //division and explicit multiplication
| expr ('+' | '-') expr //addition/subtraction
| expr '++' //increment
;

//Lexer Rules
INT : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
<snip>

The problem with this grammar is that the generated parser chokes on, for
example, the input '2++^3', as '^' cannot follow '++' in this grammar. If I
get rid of implicit multiplication I can collapse the grammar into a single
rule that parses the language I am wanting, but I don't want to give up
implicit multiplication.

What's the pattern for this situation?

Incidentally, this is my first post to this group, and I am charmed to see
several authors of my textbooks among the members here.

Best,

Robert

Kaz Kylheku

unread,
Feb 6, 2015, 10:12:10 PM2/6/15
to
On 2015-02-05, Robert Jacobson <rljac...@gmail.com> wrote:
> I'm trying to write an ANTLR4 grammar for a language with a low precedence
> postfix operator (Wolfram Language with '&', but I use a simple toy grammar
> below). I'm struggling with finding the right pattern to express this
> language.

I suspect this is not nicely solvable by LALR type tools. Only with hacks.

The problem is that the low precedence postfix operator has no starting
marker to look back to.

When this operator occurs in an expression, it means "everything which comes
before me in the expression is my argument".

The problem with this is that it leaves no room for the onstruct to be
*embedded* in another one. Since the operator says "everything is mine!" it
means, by the same token (pun intended) that "I am not a subexpression; I am
the top-level constituent!". And since that is the case, because
it is a postfix operator, it means that the expression ends right there.

If you want 2+2++^3 to mean (2+2)++^3, that means that you need
a special low-precedence versions of ^ (and other operators) for that
case, so the ++ expression can be a constituent of something.

Try something along these lines. This Yacc grammar only gives me one
shift-reduce conflicts, on account of the empty production rule for
pp_factor_continue. I didn't test it; I only skimmed through the state
transitions in y.output:

%token INT PLUSPLUS

%%

expr : mul_expr
| expr '+' mul_expr
| expr PLUSPLUS pp_add_continue
;

mul_expr : factor
| primary '*' factor
;

factor : primary
| factor '^' primary
;

primary : '(' expr ')'
| INT
;

pp_add_continue : '+' mul_expr
| pp_mul_continue
;

pp_mul_continue : '*' factor
| pp_factor_continue
;

pp_factor_continue : '^' primary
| /* NOTHING */
;

Anton Ertl

unread,
Feb 8, 2015, 3:50:09 AM2/8/15
to
Robert Jacobson <rljac...@gmail.com> writes:
>I'm trying to write an ANTLR4 grammar for a language with a low precedence
>postfix operator (Wolfram Language with '&', but I use a simple toy grammar
>below). I'm struggling with finding the right pattern to express this
>language.

Precedence is error-prone (both programmers writing expressions that
the computer silently interprets differently, and programmers reading
the expressions differently from what the computuper understands), and
you should be very reluctant to use it IMO.

For your example involving ^ and ++, I think that one should use
parentheses explicitly in most cases, e.g. (a^b)++^c or a^(b++)^c.
You can leave away the parentheses if there is no ambiguity, as in
your 2++^3, but that's not straightforward to express in BNF.

The right solution for accepting 2++^3 and rejecting a^b++^c might be
to use a parsing algorithm that allows ambiguity (Early or GLR), maybe
prune away some solutions through, e.g., type rules, and accept the
result only if it is unambiguous.

Anyway, on to your problem. The grammar you presented is ambiguous
(when interpreted as a BNF, maybe ANTLR resolves the ambiguity for
you) even for simple stuff like 1+1+1, so I have played around with an
even simpler example. It took me some time, but eventually I found a
solution (inspired by the unambiguous solution to the dangling else
problem); in bison syntax:

%start exprp
%token INT PLUSPLUS

%%
postfix: expr PLUSPLUS
;

term: INT
| '(' expr ')'
;

termp: postfix
| term
;

expr: termp '^' right
| term
;

right: term
| term '^' right
;

exprp: postfix
| expr
;

Bison produces no warnings for this, and with a little luck it also
parses the language you have in mind for the operators ^ and ++.
Maybe it is useful for your larger problem, but I suspect that this
approach becomes too complicated when involving more operators with
more precedence levels.

- anton
--
M. Anton Ertl
an...@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

Robert Jacobson

unread,
Feb 8, 2015, 3:53:30 AM2/8/15
to
Thanks for the response. I'll have to digest this for a bit.

On Friday, 6 February 2015 22:12:10 UTC-5, Kaz Kylheku wrote:
> On 2015-02-05, Robert Jacobson <rljac...@gmail.com> wrote:
> > I'm trying to write an ANTLR4 grammar for a language with a low precedence
> > postfix operator (Wolfram Language with '&', but I use a simple toy grammar
> > below). I'm struggling with finding the right pattern to express this > > language.
>
> I suspect this is not nicely solvable by LALR type tools. Only with hacks.

But if I abandon implicit multiplication, collapse the grammar to a single
rule, and let ANTLR handle operator precedence, ANTLR produces a parser that
works. (I haven't made an attempt at a yacc grammar.) Perhaps this is because
ANTLR produces ALL(*) parsers.


> If you want 2+2++^3 to mean (2+2)++^3, that means that you need
> a special low-precedence versions of ^ (and other operators) for that
> case, so the ++ expression can be a constituent of something.
>
> Try something along these lines. This Yacc grammar only gives me one
> shift-reduce conflicts, on account of the empty production rule for
> pp_factor_continue. I didn't test it; I only skimmed through the state
> transitions in y.output:


I'll experiment with this, but this strategy strikes me as impractical for
large grammars because you'd need to duplicate a significant portion of the
grammar for the low-precedence versions of operators that can follow ++, yes?

Stefan Monnier

unread,
Feb 8, 2015, 7:06:24 PM2/8/15
to
> term
> : INT
> | '(' expr ')' //parentheses
> | <assoc=right> term '^' term //exponentiation
> | term term //implicit multiplication
> ;

> expr
> : term
> | ('+' | '-') expr //unary plus/minus
> | expr ('/' | '*') expr //division and explicit multiplication
> | expr ('+' | '-') expr //addition/subtraction
> | expr '++' //increment
> ;

This last production rule is wrong: "expr ++" should produce a "term", not
an "expr".


Stefan

Kaz Kylheku

unread,
Feb 8, 2015, 10:08:52 PM2/8/15
to
If that is so, then ++ shall not have the desired low precedence.

For instance a ^ b ++ be parsed as the exponentiation of two terms,
a and b++, not as ++ applied to the expression a ^ b, as desired.

[Oops, looks like you're right. -John]

Hans-Peter Diettrich

unread,
Feb 10, 2015, 1:50:10 AM2/10/15
to
Kaz Kylheku schrieb:

> For instance a ^ b ++ be parsed as the exponentiation of two terms,
> a and b++, not as ++ applied to the expression a ^ b, as desired.
>
> [Oops, looks like you're right. -John]

I wonder how to apply the C operators ++ or -- to something that is not
a variable (mutable item). What's the intended semantics here?

DoDi

Joe keane

unread,
Feb 10, 2015, 8:47:31 PM2/10/15
to
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>For your example involving ^ and ++, I think that one should use
>parentheses explicitly in most cases, e.g. (a^b)++^c or a^(b++)^c.
>You can leave away the parentheses if there is no ambiguity, as in
>your 2++^3, but that's not straightforward to express in BNF.

gcc does something like this

You can say:

if (x & y == 0)

probable error, warn or reject

if ((x & y) == 0)

OK

if (x & (y == 0))

OK, if that's what you want

I don't know how it is implemented.

if (x == 1 && y == 2)

fine IMHO

if ((x == 1) && (y == 2))

too many parens

if (((x >= 1) && (x <= 2)) || ((y >= 1) && (y <= 2)))

way too many parens IMHO, but shouldn't reject

The reason you don't want to reject these is that,
besides that it's style, they could come from macros.

If you color parens in macros, you could also look out for macros:

#define SUM(X, Y) (X)+(Y)

probable error

#define DOUBLE(X) (2*X)

probable error

Kaz Kylheku

unread,
Feb 11, 2015, 9:40:58 AM2/11/15
to
On 2015-02-10, Joe keane <j...@panix.com> wrote:
> Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>For your example involving ^ and ++, I think that one should use
>>parentheses explicitly in most cases, e.g. (a^b)++^c or a^(b++)^c.
>>You can leave away the parentheses if there is no ambiguity, as in
>>your 2++^3, but that's not straightforward to express in BNF.
>
> gcc does something like this
>
> You can say:
>
> if (x & y == 0)
>
> probable error, warn or reject
>
> if ((x & y) == 0)
>
> OK
>
> if (x & (y == 0))
>
> OK, if that's what you want
>
> I don't know how it is implemented.

You can implement it by adding a boolean attribute to a parse tree
node which indicates whether it had been parenthesized.

Fictitious Yacc production with semantic rule:

expr : '(' expr ')' { $2.parenthesized = 1;
$$ = $2; }


So now inside a rule like this:

expr : expr '&' expr

you can do:

{ prec_paren_check($1, PREC_MUL);
prec_paren_check($3, PREC_MUL);
... }


Where and_prec_paren_check looks something like this:

void prec_parent_check(ast_node *node, int prec)
{
if (node->prec <= prec && !node->parenthesized)
issue_warning("suggest parentheses around %N", node);
}

The idea here is this:

1. Obviously, if an expression is an operand of & and it is
not parenthesized, it must have a higher precedence than &.
The question is how high.

2. If that expression has a higher precedence than multiplication,
then there is no problem for instance ++x & y is clear!

3. If that expression has multiplicative precedence, or lower, then there is a
potential problem: the programmer may be expecting a*b&c to be a*(b&c),
or a<<b&c to be (a<<b)&c. The & operator "wants to" have really high
precedence, but doesn't.

I.e. the prec argument of prec_paren_check is the precedence level which the
operator in question (in this case bitwise end) *should* intuitively have. If
an argument has a precedence at or below this, and is unparenthesized, then
warn.

Robert Jacobson

unread,
Feb 21, 2015, 10:37:29 PM2/21/15
to
Thank you everyone for your help. The solution I found for solving my language
ambiguity uses ANTLR's semantic predicates feature which can turn off
production alternatives at parse time. I use a predicate (below) to suppress
implicit multiplication in the presence of a binary plus operator.

expr
: LEAF
| LPAREN expr RPAREN //parentheses
| <assoc=right> expr POWER expr
| (PLUS | MINUS) expr //unary plus/minus
| expr { _input.LT(1).getType() != PLUS && _input.LT(1).getType() != MINUS}?
expr //implicit multiplication
| expr (DIV | MUL) expr //division and explicit multiplication
| expr (PLUS | MINUS) expr //addition/subtraction
| expr '&' //postfix parenthesization.
;

Best regards,

Robert

0 new messages