PLY can but Bison and ANTLR4 can't? Low-precedence postfix operator with implicit multiplication.

261 views
Skip to first unread message

Robert Jacobson

unread,
Feb 15, 2015, 7:11:06 PM2/15/15
to antlr-di...@googlegroups.com
I am trying to write an ANTLR4 expression-style grammar for a language (namely Mathematica) that has the following features:
  1. the usual simple mathematical expressions
  2. unary plus and minus
  3. implicit multiplication
  4. a low-precedence postfix operator (Mathematica's "&" pure function operator) that "parenthesizes" the maximal complete expression that precedes it
It's an easy exercise to construct simple grammars that satisfy {1, 2, 4} and {1, 2, 3} (and other subsets), but I can't find a satisfactory solution for all four.

A naive (partial) ANTLR grammar (code here) might be the following:

expr
: LEAF
| '(' expr ')' //parentheses
| ('+' | '-') expr //unary plus/minus
| <assoc=right> expr '^' expr
| expr expr //implicit multiplication
| expr ('/' | '*') expr //division and explicit multiplication
| expr ('+' | '-') expr //addition/subtraction
// | expr '&' //postfix parenthesization.
;

However, this grammar parses "1+2+3" as (PLUS 1 (TIMES 2 (UNARYPLUS 3))), which obviously isn't what we want. This issue is easily solvable by rewriting the grammar using the standard grammar pattern for operator precedence, factoring out implicit multiplication into it's own production. However, having multiple productions representing the operator precedence hierarchy instead of having ANTLR sort it out means that the simple alternative " | expr '&' " will no longer work. I've consulted with the folks at comp.compilers, and their feedback indicates that the grammar rewriting necessary to include the low-precedence postfix operator is too complex for a sane person to want to do it for anything but the simplest language.

Can Bison do it with the right combination of precedence directives? Here's a partial Bison grammar (code here):
%left '-' '+'
%left '*' '/' ITIMES INT ID '('
%right UMINUS UPLUS
%right '^'
%%

expr
: ID
| INT
| '(' expr ')'
| expr '^' expr
| '-' expr %prec UMINUS
| '+' expr %prec UPLUS
| expr expr %prec ITIMES
| expr '/' expr
| expr '*' expr
| expr '+' expr
| expr '-' expr
;

Turns out this doesn't work. But here's the interesting thing: if I take this exact Bison grammar and port it to David Beazley's PLY (Python Lex-Yacc) it works! It parses the language I intend. (Here's the code.) I don't get it.

So obviously my question is, is there an ANTLR4 feature that will let me include the implicit multiplication, unary plus/minus, and postfix alternatives in the same production? In other words, is there a way to write a simple ANTLR4 grammar that satisfies requirements 1-4?

Best,

Robert

Mike Lischke

unread,
Feb 16, 2015, 3:21:31 AM2/16/15
to antlr-di...@googlegroups.com
Hi Robert,


However, this grammar parses "1+2+3" as (PLUS 1 (TIMES 2 (UNARYPLUS 3))), which obviously isn't what we want. This issue is easily solvable by rewriting the grammar using the standard grammar pattern for operator precedence, factoring out implicit multiplication into it's own production.

I'd much prefer to make the precedence explicit instead of relying on internal behaviors of the parser generator. As you saw, this can differ.

However, having multiple productions representing the operator precedence hierarchy instead of having ANTLR sort it out means that the simple alternative " | expr '&' " will no longer work.

Since trailing & would be the lowest level I'd expect a rule like:

expression: expr '&'?;

in addition to your blogged grammar to work as top level rule for your expressions. Have you tried that already?


Robert Jacobson

unread,
Feb 16, 2015, 4:06:43 PM2/16/15
to antlr-di...@googlegroups.com
Thanks for the response, Mike.


On Monday, 16 February 2015 03:21:31 UTC-5, Mike Lischke wrote:
I'd much prefer to make the precedence explicit instead of relying on internal behaviors of the parser generator. As you saw, this can differ.


I suppose it's a matter of style, except for certain kinds of conflicts that occur due to limited lookahead, an example of which is my & operator. John Levine describes this situation and a solution involving flattening the grammar to allow the parser to look farther ahead in the token stream on p. 191-192 of his Flex & Bison book. Of course, John is explicit about the fact that the problem arises due to the limitations of LALR(1) parsers. I'm not sure how his discussion applies to ALL(*) parsers. 
 
Since trailing & would be the lowest level I'd expect a rule like:

expression: expr '&'?;

in addition to your blogged grammar to work as top level rule for your expressions. Have you tried that already?


If I am understanding you correctly, your suggestion would only allow '&' to appear at the end of an expression and not as part of a subexpression. My intended language allows for expressions like "1+3&^2" which should be parsed as (POWER (AMP (PLUS 1 3)) 2), whereas your proposal requires that '&' be the last token in the token stream. One could just put the alternative " | expr '&' " at the end of the top level rule (named expr), but doing so would only allow '&' to be followed by operators appearing in that top level rule; hence my desire to keep all alternatives, including implicit multiplication, in the same production rule.

Best,

Robert

Terence Parr

unread,
Feb 16, 2015, 6:24:51 PM2/16/15
to antlr-di...@googlegroups.com
Hi. As you pointed out 1+2 is ambiguous because it could be one plus two or it could be modification of 1 and +2 and ANTLR resolved it using precedence as specified by the grammar. There’s just no way around the language itself being ambiguous; the grammar for it will necessarily be ambiguous as well. Can you tell me why the postfix operator doesn’t work?
Ter
> --
> You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Robert Jacobson

unread,
Feb 16, 2015, 6:57:54 PM2/16/15
to antlr-di...@googlegroups.com
Hi, Ter.


On Monday, 16 February 2015 18:24:51 UTC-5, the_antlr_guy wrote:
 Can you tell me why the postfix operator doesn’t work?
Ter


Yes of course.  Consider the following "almost correct" ANTLR4 grammar (github repo here):
term
: LEAF
| '(' expr ')' //parentheses
;
//Implicit multiplication
factor
: term
| <assoc=right> term '^' factor //power
| term factor //implicit multiplication.
;
//Unary minus/plus
expr
: factor
| ('+' | '-') expr //unary plus/minus
| expr ('/' | '*') expr //division and explicit multiplication
| expr ('+' | '-') expr //addition/subtraction
| expr '&' //low-precedence postfix operator
;

This grammar does almost everything I want it to. The only thing it can't do is parse expressions in which '&' is followed by an operator in a different precedence hierarchy (by which I mean a different production) from itself. For example, with this grammar ANTLR complains about the input "1+3&^2", because '^' is not in the same precedency hierarchy as '&'. Stare at the grammar for a bit and you will see why: this grammar just does not admit expressions of the form "1+3&^2", because "expire '&'" can only ever be reduced to an expr, not to a term or factor, and thus cannot appear where a term or factor appears in the grammar.

The solution (I think?) is to collapse everything into a single production, but I describe the problems with this above.

Best,

Robert

Robert Jacobson

unread,
Feb 18, 2015, 6:24:15 PM2/18/15
to antlr-di...@googlegroups.com
Terrance, I'd like to know what you think of the following potential solution to my problem.

The problem with including every alternative in a single production rule is that ANTLR does not always resolve the ambiguity with the '+' token (similarly '-' token) in the way that I want. So we'd like to have a way to distinguish unary plus from binary plus (similarly unary/binary minus). How can we do so? Well, binary plus always follows a complete expression, while unary plus never follows a complete express. In other words, the parser only sees a binary plus after a reduction, and the parser only sees a unary plus after shifting the previous token. If the lexer can determine if the parser just shifted or reduced, the lexer can emit a unary or binary plus token accordingly. 

My (perhaps naive) question for you is, does ANTLR4 expose enough to the programmer to implement this solution?

If not, one workaround is for me to determine myself a catalog of tokens that are always shifted and determine whether an encountered '+' is following by one of these tokens. Presumably this is doable using lexer modes or something. 

Best,

Robert

Loring Craymer

unread,
Feb 18, 2015, 10:31:10 PM2/18/15
to antlr-di...@googlegroups.com
Robert--

It would probably help if you thought through the problem after reading up on formal parsing theory.  Precedence operators in LALR language processors are not "natural":  they are a kluge to impose the type of (recursive descent) structure that is natural to LL processing.  They are very useful if you plan to do straight transformation from one HLL to another without pausing to do any arithmetic, but not if you plan to do arithmetic for constant folding and the like.  Ter had a student build a precedence processor for ANTLR, inserting semantic predicates in the generated code to impose precedence at runtime; that was 7 or 8 years ago and it has never made it into a formal ANTLR release.  For the most part, though, it is better to "steal" a recursive descent expression decomposition from a C or Java ANTLR grammar and tweak it for your particular problem.  Maintaining an expression grammar as part of a larger grammar is usually a no-op, but become handy when you decide to incorporate a little optimization for code generation.

ANTLR, being LL and not LR, does neither shift nor reduce--those are specific to standard LR parsing.

--Loring

Robert Jacobson

unread,
Feb 19, 2015, 4:59:44 PM2/19/15
to antlr-di...@googlegroups.com
Thank you for taking the time to respond, Loring, and thank you for your advice and comments. 

On Wednesday, 18 February 2015 22:31:10 UTC-5, Loring Craymer wrote:
 They are very useful if you plan to do straight transformation from one HLL to another without pausing to do any arithmetic...

Actually, this is my intended application! 
 
For the most part, though, it is better to "steal" a recursive descent expression decomposition from a C or Java ANTLR grammar and tweak it for your particular problem.  Maintaining an expression grammar as part of a larger grammar is usually a no-op, but become handy when you decide to incorporate a little optimization for code generation.

Yes, stealing grammar bits from another similar language seems like a great strategy. I've been studying parsers written by others for the language I wish to parse, namely Mathematica. The challenge for me has been that expressions in traditional languages like C and Java do not have the features that appear to be the source of all of my parsing problems, and parsers for my language of interest that other people have written seem to lean heavily on the idiosyncrasies of the tools they are written with. 
 

ANTLR, being LL and not LR, does neither shift nor reduce--those are specific to standard LR parsing.


Oh, point taken! However, I don't think the strategy I describe relies on this. As long as I can detect when and expr has been successfully parsed (which is easy with ANTLR's @after action) and also when an expr has not been fully parsed (not clear to me how to do this), then I should be able to distinguish binary plus from unary plus. 


Best,

Robert 

Terence Parr

unread,
Feb 20, 2015, 7:04:53 PM2/20/15
to antlr-di...@googlegroups.com
The problem is that there should never be any real communication between the parser in the lexer. The parser is doing all sorts of look ahead which would prevent any such collaboration. I think you have a context-sensitive problem here and you would need semantic predicates to resolve it, possibly tracking the prior operators as you suggest.
Ter

--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Dictation in use. Please excuse homophones, malapropisms, and nonsense. 

Robert Jacobson

unread,
Feb 21, 2015, 3:47:17 PM2/21/15
to antlr-di...@googlegroups.com
Thanks everyone for your help. If you are interested (or if you found this via Google), the solution I found is quite simple:

expr
: LEAF
| LPAREN expr RPAREN //parentheses
| <assoc=right> expr POWER expr
| (PLUS | MINUS) expr //unary plus/minus
| expr { _input.LT(1).getType() != PLUS }? expr //implicit multiplication
| expr (DIV | MUL) expr //division and explicit multiplication
| expr (PLUS | MINUS) expr //addition/subtraction
| expr '&' //postfix parenthesization.
;

Best regards,

Robert

Terence Parr

unread,
Feb 21, 2015, 3:52:49 PM2/21/15
to antlr-di...@googlegroups.com
seems like you should also check for MINUS in that predicate, right?
Ter

Robert Jacobson

unread,
Feb 21, 2015, 3:56:33 PM2/21/15
to antlr-di...@googlegroups.com
On Saturday, 21 February 2015 15:52:49 UTC-5, the_antlr_guy wrote:
seems like you should also check for MINUS in that predicate, right?
Ter


 Oops! Yes, of course you are right. :)
Reply all
Reply to author
Forward
0 new messages