Using rule alternatives in Javascript with visitors.

Frank Adrian

Jun 6, 2019, 1:10:25 PM

to antlr-discussion

When rule alternatives are used in JavaScript, how can I check which alternative was chosen at a context in the parse tree? I've read chapter 7 in the ANTLR book and searched online, but still couldn't find a good example of a JavaScript visitor that uses rule alternatives. According to one relatively clear online article, in Java one can check the class of the context node to decide which of the visitor methods to call. However, in JavaScript, the Context classes are not exported from the blahParser.js module and so are undefined in the blahVisitor.js module, making this method infeasible. In my specific case, the productions that are giving me trouble are like the following:

expression
    : term                  #termProduction
    | term '+' expression   #plusExpression
    ;

term
    : constant              #constantProduction
    | '(' expression ')'    #parenthesizedProduction
    ;

constant
    : '1'
    ;

The unusual thing to me here is that the visitor method is not built for expression, but it seems to be built for term (although it's called blahVisitor.visitTermProduction), even though I remember reading somewhere that a production using rule alternatives will not have a generated visitor.

Can someone point me to a clear example in javascript of a visitor that traverses rule alternatives across multiple productions?

Frank Adrian

Jun 6, 2019, 8:00:24 PM

to antlr-di...@googlegroups.com

OK, I think I figured it out... Reading through the file RuleContext.js, I found two methods - getAltNumber() and setAltNumber(altNum) - and a comment that led me to the option contextSuperClass and told me what it could do for me. To make the alternative number appear in your context node, you create a subclass of ParserRuleContext that overrides the setAltNumber method, squirreling it away in each of your compiler's context nodes, which are now, helpfully, all subclasses of your subclass and can call the method getAltNumber which is also overridden by your subclass to grab the alternative number from where you've stored it. This should allow one to dispatch to the proper generated visitor method for each rule alternative. I say should, because I've just started to implement this and I don't know if all of this works yet. I'll see. All I know is that it doesn't seem as if the documentation is particularly forthcoming or clear on this point.

Frank Adrian

Jun 6, 2019, 11:38:25 PM

to antlr-discussion

A bit more progress... I was able to insert the subclass into the prototype chain. I am now receiving parse alternative numbers. Here is my subclass file:

It seems to me that the alternative numbers for the expression and expressionTerm context nodes (which should each be 1 for the example above) are incorrect. Any idea as to why this is happening? Or is this a bug in ANTLR? Any idea how to fix it? Or how to debug it?

On Thursday, June 6, 2019 at 10:10:25 AM UTC-7, Frank Adrian wrote:

Frank Adrian

Jun 6, 2019, 11:49:45 PM

to antlr-discussion

Resending my message - hopefully the line breaks will stick this time.

A bit more progress... I was able to insert the subclass into the prototype chain. I am now receiving parse alternative numbers. Here is my subclass file:

var ParserRuleContext = require('antlr4/index').ParserRuleContext

function ParserRuleContextWithAlternative(parser, parent, invokingState) {
    ParserRuleContext.call(this, parser, parent, invokingState)
    this.altNum = -1
    return this
}

ParserRuleContextWithAlternative.prototype = Object.create(ParserRuleContext.prototype)
ParserRuleContextWithAlternative.prototype.constructor = ParserRuleContextWithAlternative
ParserRuleContextWithAlternative.prototype.setAltNumber = function(altNum) { this.altNum = altNum }
ParserRuleContextWithAlternative.prototype.getAltNumber = function() { return this.altNum }

exports.ParserRuleContextWithAlternative = ParserRuleContextWithAlternative

I added the require for this module after the require of the visitor in the parser. This seems to work, as I am now receiving the alternative numbers - at least that is what I can surmise from this parse tree I printed from one of my productions - here's the input:

library C define A: 1

and here's the corresponding parse tree:

(library:1 (libraryDefinition:1 library (identifier:1 X)) (statement:1 (expressionDefinition:1 define (identifier:1 A) : (expression:-1 (expressionTerm:-1 (term:2 (literal:4 1)))))))

Note that the numbers associated with the nodes appear to be correct. The expressionDefinition is the first production of statement, the literalTerm is indeed the second production in term, and a numeric token is the fourth production in literal. All of this would seem to be correct... except for the -1 alternative numbers in the expression and expressionTerm nodes.
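For reference, a printout in that rule:alt form can be produced by a small recursive function. This is only a sketch: the node shape used here (ruleName, altNumber, children, with token leaves as plain strings) is an assumption made for illustration and is not the ANTLR runtime's tree API.

```javascript
// Hypothetical node shape: rule nodes are { ruleName, altNumber, children },
// token leaves are plain strings.
function toStringTreeWithAlts(node) {
  if (typeof node === "string") return node;            // token leaf: print its text
  const label = node.ruleName + ":" + node.altNumber;   // e.g. "term:2"
  if (!node.children || node.children.length === 0) return label;
  const parts = node.children.map((child) => toStringTreeWithAlts(child));
  return "(" + label + " " + parts.join(" ") + ")";
}

// A fragment shaped like the innermost part of the tree shown above.
const sample = {
  ruleName: "term",
  altNumber: 2,
  children: [{ ruleName: "literal", altNumber: 4, children: ["1"] }]
};

const printed = toStringTreeWithAlts(sample);
console.log(printed);
```

With real contexts the label would come from the parser's rule names and getAltNumber() rather than from plain object fields.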

Does anyone know why these nodes have -1 for alternative numbers? Here are the salient definitions (none of the productions upstream of here have rule alternative tags defined):

expression

: expression #termExpression

Any idea as to why this is happening? Is this a bug in ANTLR? Or is this another of my misunderstandings of the API?

Frank Adrian

Jun 7, 2019, 12:32:13 AM

to antlr-discussion

I've done a bit more 'sperimentin'. I've verified that what's happening is that setAltNumber is not being called for the expression and expressionTerm productions. How? I initialized the alt number to a ridiculous value (3000). I ran the parser on my input and it produced a parse tree showing that the altNumbers for those productions were unchanged from their initial ridiculous value. So that's what's going on. Why is the question. And, of course, how to fix it or work around it.

Mike Lischke

Jun 7, 2019, 3:03:06 AM

to antlr-discussion

Hi Frank,

> Does anyone know why these nodes have -1 for alternative numbers? Here are the salient definitions (none of the productions upstream of here have rule alternative tags defined):

You might have hit what was recently asked on SO: https://stackoverflow.com/questions/56466439/alt-number-not-set-properly-for-rules-with-left-recursion.

There's also an ANTLR issue linked where the wrong alt number is mentioned.

Mike
--
www.soft-gems.net

Frank Adrian

Jun 7, 2019, 4:35:50 AM

to antlr-discussion

All of these issues do appear to be related - the two productions that are messed up in my case are the only left-recursive ones in the grammar. I guess I can always go back to recognizing the rule alternative for those two productions by analyzing the structure of the context. Pity, because they're the two largest productions in the system and there are a lot of alternatives to check. Having the alternative number makes this trivial. The good news is it appears as if I can still use the alternative numbers in the other productions in my system - it should make things marginally easier.

What's the track record in the ANTLR world of getting something like this fixed? I'm sort of resigned to having to work around this for now, but it would be nice to have it taken care of. It would be a real time-saver and make the code one has to write to analyze productions like these much simpler.

Mike Lischke

Jun 7, 2019, 8:45:48 AM

to antlr-discussion

> All of these issues do appear to be related - the two productions that are messed up in my case are the only left-recursive ones in the grammar. I guess I can always go back to recognizing the rule alternative for those two productions by analyzing the structure of the context.

Why don't you just rewrite these rules to be non-left-recursive? It's easy to do. Under the hood ANTLR4 does the same: the generated ATN is as if the rule were not left-recursive. Try changing one and check the reported alts.

> What's the track record in the ANTLR world of getting something like this fixed?

Like for any other open source project. If the maintainer(s) have time and interest to fix things they will do it. Otherwise things move only when users fix that and open pull requests (this is how it is for ANTLR).

Mike
--
www.soft-gems.net

michal.o

Jun 7, 2019, 9:06:33 AM

to antlr-di...@googlegroups.com

Hello everyone,

I have been using ANTLR4 (Java and C++ runtimes) for several years. I use it to
parse Verilog, VHDL, ...

Many times I had to manually fix the grammar. I wonder if there is some
program which can, for example, automatically remove left recursion from
the grammar. I know that it is not so hard to do manually, but as the grammar
gets bigger and deeper...

Is there at least a simple tool which can detect or even fix ambiguity?
(I know it is a hard problem) For example the standard grammar for VHDL
2008 is full of ambiguities in complex rules so it is very hard to see
and very hard to fix manually.

I am looking also for some tool which can "profile" the grammar.

(I am thinking about writing such a tool, if there is none, because I
need performance and reliability of the grammar, which I am not able to
achieve otherwise.)

Michal

Mike Lischke

Jun 7, 2019, 9:36:33 AM

to antlr-discussion

> Many times I had to manually fix the grammar. I wonder if there is some program which can for example automatically remove left recursion from the grammar. I know that it is not so hard to do manually but as grammar getting bigger and deeper...

Removing direct left recursion is indeed a pretty simple thing to do. I explained the principle here: https://stackoverflow.com/questions/41788100/antlr4-mutual-left-recursion-grammar/41789097#41789097. I guess people only keep their left recursive rules because they are often more natural, compared to non-recursive solutions.

> Is there at least a simple tool which can detect or even fix ambiguity? (I know it is a hard problem) For example the standard grammar for VHDL 2008 is full of ambiguities in complex rules so it is very hard to see and very hard to fix manually.

I know of none, but a while ago someone posted messages here about a commercial tool that can do some kind of complex analysis. Check the mailing list.

> I am looking also for some tool which can "profile" the grammar.

Profiling is possible by either using the IntelliJ plugin from Terence Parr, which reports a couple interesting internals on a parse run with sample input or you can do that yourself by enabling profiling in your parser by calling Parser::setProfile(true). This will replace the normal ParserATNSimulator with a ProfilingATNSimulator instance that produces a ParseInfo structure with results for a parse run (see also Parser::getParseInfo()).

> (I am thinking about writing such a tool, if there is none, because I need performance and reliability of the grammar, which I am not able to achieve otherwise.)

I'm the author of the ANTLR4 extension for vscode and I have been thinking for a while about adding profiling support. If you'd like to cooperate, we can probably arrange something.

Mike
--
www.soft-gems.net

Frank Adrian

Jun 7, 2019, 12:29:35 PM

to antlr-discussion

Thanks Mike.

Here is my original production (slightly rearranged to put all of the left recursion at the bottom):

expression
    : expressionTerm                                                #termExpression
    | retrieve                                                      #retrieveExpression
    | query                                                         #queryExpression
    | 'cast' expression 'as' typeSpecifier                          #castExpression
    | 'not' expression                                              #notExpression
    | 'exists' expression                                           #existenceExpression
    | ('duration' 'in')? pluralDateTimePrecision 'between' expressionTerm 'and' expressionTerm   #durationBetweenExpression
    | 'difference' 'in' pluralDateTimePrecision 'between' expressionTerm 'and' expressionTerm    #differenceBetweenExpression
    | expression 'properly'? 'between' expressionTerm 'and' expressionTerm                       #betweenExpression
    | expression 'is' 'not'? ('null' | 'true' | 'false')            #booleanExpression
    | expression ('is' | 'as') typeSpecifier                        #typeExpression
    | expression ('<=' | '<' | '>' | '>=') expression               #inequalityExpression
    | expression intervalOperatorPhrase expression                  #timingExpression
    | expression ('=' | '!=' | '~' | '!~') expression               #equalityExpression
    | expression ('in' | 'contains') dateTimePrecisionSpecifier? expression   #membershipExpression
    | expression 'and' expression                                   #andExpression
    | expression ('or' | 'xor') expression                          #orExpression
    | expression 'implies' expression                               #impliesExpression

and this is what it becomes via the transform your SO reply outlines:

expression: (expressionTerm
    | retrieve
    | query
    | 'cast' expression 'as' typeSpecifier
    | 'not' expression
    | 'exists' expression
    | ('duration' 'in')? pluralDateTimePrecision 'between' expressionTerm 'and' expressionTerm
    | 'difference' 'in' pluralDateTimePrecision 'between' expressionTerm 'and' expressionTerm )
    ( 'properly'? 'between' expressionTerm 'and' expressionTerm
    | 'is' 'not'? ('null' | 'true' | 'false')
    | ('is' | 'as') typeSpecifier
    | ('<=' | '<' | '>' | '>=') expression
    | intervalOperatorPhrase expression
    | ('=' | '!=' | '~' | '!~') expression
    | ('in' | 'contains') dateTimePrecisionSpecifier? expression
    | 'and' expression
    | ('or' | 'xor') expression
    | 'implies' expression

Is this correct? I think I followed the algorithm faithfully. All I will say is that the transformation, though simple, builds a production which seems to take much more code to traverse than the one which could have been built using alternative numbers (if they worked).

The other problem I have is that the head sets of expressionTerm and query are non-disjoint: when the first token is an identifier, the parse is ambiguous without further lookahead. Is there any way to disambiguate the choice of parse other than manually walking the grammar and computing which lookahead tokens to use? Is there any way to check the class of the context node that's generated by either expressionTerm or query when this production has (say) only one child node? For example, could I use something like ctx.getChild(0).prototype === ExpressionTermContext.prototype vs. QueryExpressionContext.prototype to disambiguate the parse? Alternatively, is there any way to get ANTLR to spit out things like head and lookahead sets to save me from grubbing through the grammar?
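On the class check itself, one JavaScript detail is worth noting: instances do not carry a `prototype` property, so comparing `node.prototype` with `SomeContext.prototype` never matches. `instanceof` or `Object.getPrototypeOf` is the usual test. The sketch below uses hypothetical stand-in constructors rather than generated ANTLR classes, and it assumes you can get a reference to the real context constructors at all, which is worth checking in your generated parser file.

```javascript
// Hypothetical stand-ins for two generated context classes.
function ExpressionTermContext() {}
function QueryContext() {}

// Classify a node by its constructor using instanceof.
function kindOf(node) {
  if (node instanceof ExpressionTermContext) return "expressionTerm";
  if (node instanceof QueryContext) return "query";
  return "unknown";
}

const child = new QueryContext();

// Instances have no `prototype` property, so this comparison is false.
const viaPrototypeProperty = child.prototype === QueryContext.prototype;

// Object.getPrototypeOf reads the instance's actual prototype.
const viaGetPrototypeOf = Object.getPrototypeOf(child) === QueryContext.prototype;

console.log(kindOf(child), viaPrototypeProperty, viaGetPrototypeOf);
```

`instanceof` also matches subclasses, which is usually what is wanted when contexts share a common superclass.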

Frank Adrian

Jun 7, 2019, 4:59:49 PM

to antlr-discussion

I think I can finally put this to bed. The results... While reading the parser code, I noticed that each generated context class has a method called accept(visitor), which calls the visitor method corresponding to that context's class. Luckily, this allows me to call accept() on a node to visit it without knowing its type. As such, I can use the original grammar and simply call accept() on the nodes, examining the output of the visit to see what kind of node they happened to be. It also helps that everything I found ambiguous when selecting alternatives was a context with just one child, where I can simply call accept() to process the node. In any case, I know how to continue writing my parser's visitor.
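The accept() dispatch described here can be sketched without the ANTLR runtime. Everything below is a hypothetical stand-in (the context classes, the visitor object and its method names), not generated code. It only illustrates the pattern: each context class knows which visitor method belongs to it, so the caller never needs the alternative number or the node's type.

```javascript
// Stand-in context classes; a generated parser would define one per labeled alternative.
class TermExpressionContext {
  constructor(children) { this.children = children; }
  // Same idea as the generated accept(): call the visitor method for this class.
  accept(visitor) { return visitor.visitTermExpression(this); }
}

class AndExpressionContext {
  constructor(children) { this.children = children; }
  accept(visitor) { return visitor.visitAndExpression(this); }
}

// Hypothetical leaf standing in for a literal term.
class LiteralContext {
  constructor(text) { this.text = text; }
  accept(visitor) { return visitor.visitLiteral(this); }
}

// One method per alternative; children are visited via accept(), whatever their type.
const describeVisitor = {
  visitTermExpression(ctx) {
    return "term(" + ctx.children[0].accept(this) + ")";
  },
  visitAndExpression(ctx) {
    return "and(" + ctx.children.map((c) => c.accept(this)).join(", ") + ")";
  },
  visitLiteral(ctx) {
    return ctx.text;
  }
};

const tree = new AndExpressionContext([
  new TermExpressionContext([new LiteralContext("1")]),
  new TermExpressionContext([new LiteralContext("2")])
]);

const described = tree.accept(describeVisitor);
console.log(described);
```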

However, I still can't emphasize enough how much simpler the traversal code would be with a working altNumber for left recursive productions.

But my problem is solved enough. Thanks to all here, especially to Mike, for their suggestions and for listening to my woes.

Loring Craymer

Jun 7, 2019, 10:28:10 PM

to antlr-discussion

Syntactic ambiguity occurs when a grammar fragment such as "a b (c | d) e" can be matched by a sequence of characters/tokens, and the subsequence that matched "(c|d)" matches both c and d. Recognizers (the component of a parser that determines how to select alternatives) that accept all syntactic ambiguities are termed "generalized" recognizers (GLL or GLR). Parser generators like ANTLR (3 or 4) and ANTLR Yggdrasil, which are based on generalized recognition with an ambiguity resolver (PEG-style for these: when parsing, choose the first of the ambiguous alternatives; for "a | b", a is first), resolve ambiguities but report when they occur. For the most part, syntactic ambiguities in grammars are the result of design decisions, whether intentional or not, and the point of ambiguity warnings is to ask the developer "Are you sure that this ambiguity should be resolved this way, or should you reorder some alternatives?". For formal language grammars, PEG-style resolution is a good approach because it is fast; other approaches are possible: the Stanford Parser uses weighted alternatives and "probabilistic parsing".

The linear GLL engine behind ANTLR Yggdrasil has the potential to do comprehensive ambiguity reporting (including rule-oriented backtrace), although the current version just messily reports them to the console. Analysis reporting is not yet a priority, but I will get around to it before the commercial release. In the meantime, I expect to have the second early access release out next week and will post an announcement when that happens.

--Loring

Frank Adrian

Jun 8, 2019, 3:10:29 PM

to antlr-di...@googlegroups.com

I finally figured out a workaround to the altNumber issue for left-recursive productions. The correct post-actions are called for each production, so you can manually call setAltNumber there on each clause from a left recursive rule. Do this and altNumber is set to the correct value - even for left-recursive productions:

expression
    : expressionTerm { $ctx.setAltNumber(1) } #termExpression
    | retrieve { $ctx.setAltNumber(2) } #retrieveExpression
    | query { $ctx.setAltNumber(3) } #queryExpression
    | expression 'is' 'not'? ('null' | 'true' | 'false') { $ctx.setAltNumber(4) } #booleanExpression
    | expression ('is' | 'as') typeSpecifier { $ctx.setAltNumber(5) } #typeExpression
    | 'cast' expression 'as' typeSpecifier { $ctx.setAltNumber(6) } #castExpression
    | 'not' expression { $ctx.setAltNumber(7) } #notExpression
    | 'exists' expression { $ctx.setAltNumber(8) } #existenceExpression
    | expression 'properly'? 'between' expressionTerm 'and' expressionTerm { $ctx.setAltNumber(9) } #betweenExpression
    | ('duration' 'in')? pluralDateTimePrecision 'between' expressionTerm 'and' expressionTerm { $ctx.setAltNumber(10) } #durationBetweenExpression
    | 'difference' 'in' pluralDateTimePrecision 'between' expressionTerm 'and' expressionTerm { $ctx.setAltNumber(11) } #differenceBetweenExpression
    | expression ('<=' | '<' | '>' | '>=') expression { $ctx.setAltNumber(12) } #inequalityExpression
    | expression intervalOperatorPhrase expression { $ctx.setAltNumber(13) } #timingExpression
    | expression ('=' | '!=' | '~' | '!~') expression { $ctx.setAltNumber(14) } #equalityExpression
    | expression ('in' | 'contains') dateTimePrecisionSpecifier? expression { $ctx.setAltNumber(15) } #membershipExpression
    | expression 'and' expression { $ctx.setAltNumber(16) } #andExpression
    | expression ('or' | 'xor') expression { $ctx.setAltNumber(17) } #orExpression
    | expression 'implies' expression { $ctx.setAltNumber(18) } #impliesExpression
    | expression ('|' | 'union' | 'intersect' | 'except') expression { $ctx.setAltNumber(19) } #inFixSetExpression
    ;

This allows you to write a simple visitor for the production like this:

cqlVisitor.prototype.visitExpression = function(ctx) {
    // Capture the visitor: inside the table entries below, `this` is not the visitor.
    var self = this
    var altNum = ctx.getAltNumber()
    // Entry i handles alternative i+1, in the order set by setAltNumber in the grammar.
    var visitors = [
        function() { return self.visitTermExpression(ctx) },
        function() { return self.visitRetrieveExpression(ctx) },
        function() { return self.visitQueryExpression(ctx) },
        function() { return self.visitBooleanExpression(ctx) },
        function() { return self.visitTypeExpression(ctx) },
        function() { return self.visitCastExpression(ctx) },
        function() { return self.visitNotExpression(ctx) },
        function() { return self.visitExistenceExpression(ctx) },
        function() { return self.visitBetweenExpression(ctx) },
        function() { return self.visitDurationBetweenExpression(ctx) },
        function() { return self.visitDifferenceBetweenExpression(ctx) },
        function() { return self.visitInequalityExpression(ctx) },
        function() { return self.visitTimingExpression(ctx) },
        function() { return self.visitEqualityExpression(ctx) },
        function() { return self.visitMembershipExpression(ctx) },
        function() { return self.visitAndExpression(ctx) },
        function() { return self.visitOrExpression(ctx) },
        function() { return self.visitImpliesExpression(ctx) },
        function() { return self.visitInFixSetExpression(ctx) }
    ]
    if (altNum && altNum > 0 && altNum <= visitors.length) return visitors[altNum-1]()
    return null
}

where each visitor handles only one clause in the grammar. This is much better than the handwritten recognizer you'd have to write for this (I did - it's about three times as long, with a lot more logic). I shudder to think what the logic would be if you worked with the production with left recursion eliminated.

Given that the workaround is so simple, I'd think a fix to the left-recursion altNumber problem would be simple as well. I hope for a speedy resolution.

Mike Lischke

Jun 9, 2019, 4:09:02 AM

to antlr-discussion

Tbh, I fail to understand what you are after with this complicated handling. You can directly override/implement the visitor methods you're interested in, in your (single main) visitor. No need to base this on the alt numbers. These numbers are mostly intended for inspection of the parsing process, e.g. for optimization. Can you explain a bit what you are actually trying to do?

Mike
--
www.soft-gems.net
