I am trying to write an ANTLR4 expression-style grammar for a language (namely Mathematica) that has the following features:
- the usual simple mathematical expressions
- unary plus and minus
- implicit multiplication
- a low-precedence postfix operator (Mathematica's "&" pure function operator) that "parenthesizes" the maximal complete expression that precedes it
It's an easy exercise to construct simple grammars that satisfy {1, 2, 4} and {1, 2, 3} (and other subsets), but I can't find a satisfactory solution for all four.
A naive (partial) ANTLR grammar (
code here) might be the following:
| expr |
| : LEAF |
| | '(' expr ')' //parentheses |
| | ('+' | '-') expr //unary plus/minus |
| | <assoc=right> expr '^' expr |
| | expr expr //implicit multiplication |
| | expr ('/' | '*') expr //division and explicit multiplication |
| | expr ('+' | '-') expr //addition/subtraction |
| // | expr '&' //postfix parenthesization. |
| ; |
However, this grammar parses "1+2+3" as (PLUS 1 (TIMES 2 (UNARYPLUS 3))), which obviously isn't what we want. This issue is
easily solvable by rewriting the grammar using the standard grammar pattern for operator precedence, factoring out implicit multiplication into it's own production. However, having multiple productions representing the operator precedence hierarchy instead of having ANTLR sort it out means that the simple alternative " | expr '&' " will no longer work. I've consulted with the folks at comp.compilers, and their feedback indicates that the grammar rewriting necessary to include the low-precedence postfix operator is too complex for a sane person to want to do it for anything but the simplest language.
Can Bison do it with the right combination of precedence directives? Here's a partial Bison grammar (
code here):
| %left '-' '+' |
| %left '*' '/' ITIMES INT ID '(' |
| %right UMINUS UPLUS |
| %right '^' |
|
|
| %% |
|
|
| expr |
| : ID |
| | INT |
| | '(' expr ')' |
| | expr '^' expr |
| | '-' expr %prec UMINUS |
| | '+' expr %prec UPLUS |
| | expr expr %prec ITIMES |
| | expr '/' expr |
| | expr '*' expr |
| | expr '+' expr |
| | expr '-' expr |
| ; |
Turns out this doesn't work. But here's the interesting thing: if I take this exact Bison grammar and port it to David Beazley's PLY (Python Lex-Yacc) it works! It parses the language I intend. (
Here's the code.) I don't get it.
So obviously my question is, is there an ANTLR4 feature that will let me include the implicit multiplication, unary plus/minus, and postfix alternatives in the same production? In other words, is there a way to write a simple ANTLR4 grammar that satisfies requirements 1-4?
Best,
Robert