Antlr4 performance - org.antlr.v4.runtime.atn.ParserATNSimulator::adaptivePredict

96 views
Skip to first unread message

Steve Ebersole

unread,
Nov 14, 2017, 2:20:09 PM11/14/17
to antlr-discussion
I am trying to hunt down a pretty big performance issue in our attempts to use Antlr 4.  I'm pretty sure this is related to a very complicated rule - at least the way I wrote it :P

I was hoping to get some help in tracking this down or even just understanding exactly what `org.antlr.v4.runtime.atn.ParserATNSimulator::adaptivePredict` is doing so I can hopefully better understand how to fix it.

I've seen discussions on this group that this is common with "expression" rules which is exactly the main culprit in our perf testing.


expression
   : expression DOUBLE_PIPE expression          # ConcatenationExpression
   | expression PLUS expression            # AdditionExpression
   | expression MINUS expression           # SubtractionExpression
   | expression ASTERISK expression         # MultiplicationExpression
   | expression SLASH expression           # DivisionExpression
   | expression PERCENT expression             # ModuloExpression
   | MINUS expression                   # UnaryMinusExpression
   | PLUS expression                    # UnaryPlusExpression
   | caseStatement                         # CaseExpression
   | coalesce                         # CoalesceExpression
   | nullIf                           # NullIfExpression
   | literal                          # LiteralExpression
   | parameter                            # ParameterExpression
   | entityTypeReference                 # EntityTypeExpression
   | path                            # PathExpression
   | function                         # FunctionExpression
   | LEFT_PAREN querySpec RIGHT_PAREN        # SubQueryExpression
   ;


The sub-rules that I think cause problems are the first 6 and then `path`.  `path` for sure is the one that specifically is showing up in our initial profiles.  It is very a very complicated rule:

path
// a SimplePath may be any number of things like:
// * Class FQN
// * Java constant (enum/static)
// * a simple dotIdentifierSequence-style path
// :(
: dotIdentifierSequence # SimplePath
// a Map.Entry cannot be further dereferenced
| ENTRY LEFT_PAREN pathAsMap RIGHT_PAREN # MapEntryPath
// only one index-access is allowed per path
| path LEFT_BRACKET expression RIGHT_BRACKET (pathTerminal)? # IndexedPath
| pathRoot (pathTerminal)? # CompoundPath
;

pathRoot
: identifier # SimplePathRoot
| TREAT LEFT_PAREN dotIdentifierSequence AS dotIdentifierSequence RIGHT_PAREN # TreatedPathRoot
| KEY LEFT_PAREN pathAsMap RIGHT_PAREN # MapKeyPathRoot
| (VALUE | ELEMENTS) LEFT_PAREN collectionReference RIGHT_PAREN # CollectionValuePathRoot
;

pathTerminal
: (DOT identifier)+
;

I'm really just lost as to how to best optimize this to get it to perform adequately - as it stands I wont be able to justify using Antlr4 :(  Please, any help would be very appreciated.

Mike Lischke

unread,
Nov 15, 2017, 3:07:12 AM11/15/17
to antlr-di...@googlegroups.com
Hi Steve,

I was hoping to get some help in tracking this down or even just understanding exactly what `org.antlr.v4.runtime.atn.ParserATNSimulator::adaptivePredict` is doing so I can hopefully better understand how to fix it.

The ALL(*) algorithm is described in a PDF which is available in public. I sent a link once to the mailing list, but can't find it anymore. See if you can locate it via global search, though it's not easy to understand all the details in the algorithm.


I've seen discussions on this group that this is common with "expression" rules which is exactly the main culprit in our perf testing.


expression
   : expression DOUBLE_PIPE expression          # ConcatenationExpression
   | expression PLUS expression            # AdditionExpression
   | expression MINUS expression           # SubtractionExpression
   | expression ASTERISK expression         # MultiplicationExpression
   | expression SLASH expression           # DivisionExpression
   | expression PERCENT expression             # ModuloExpression
   | MINUS expression                   # UnaryMinusExpression
   | PLUS expression                    # UnaryPlusExpression
   | caseStatement                         # CaseExpression
   | coalesce                         # CoalesceExpression
   | nullIf                           # NullIfExpression
   | literal                          # LiteralExpression
   | parameter                            # ParameterExpression
   | entityTypeReference                 # EntityTypeExpression
   | path                            # PathExpression
   | function                         # FunctionExpression
   | LEFT_PAREN querySpec RIGHT_PAREN        # SubQueryExpression
   ;


I have no definite answer for you, but I would try to remove the left recursion in `expression` first. That should lower the amount of prediction work need for this part. The general way for left recursion removal is:

a: a PLUS a | a MINUS a | B | C;

becomes:

a: (B | C) (PLUS a | MINUS a)*;

That means the rule is rewritten such that all non recursive alts appear first, followed by zero or more occurrences of all formerly recursive alts (without the recursive rule reference).


The sub-rules that I think cause problems are the first 6 and then `path`.  `path` for sure is the one that specifically is showing up in our initial profiles.  It is very a very complicated rule:

path
// a SimplePath may be any number of things like:
// * Class FQN
// * Java constant (enum/static)
// * a simple dotIdentifierSequence-style path
// :(
: dotIdentifierSequence # SimplePath
// a Map.Entry cannot be further dereferenced
| ENTRY LEFT_PAREN pathAsMap RIGHT_PAREN # MapEntryPath
// only one index-access is allowed per path
| path LEFT_BRACKET expression RIGHT_BRACKET (pathTerminal)? # IndexedPath
| pathRoot (pathTerminal)? # CompoundPath
;

pathRoot
: identifier # SimplePathRoot
| TREAT LEFT_PAREN dotIdentifierSequence AS dotIdentifierSequence RIGHT_PAREN # TreatedPathRoot
| KEY LEFT_PAREN pathAsMap RIGHT_PAREN # MapKeyPathRoot
| (VALUE | ELEMENTS) LEFT_PAREN collectionReference RIGHT_PAREN # CollectionValuePathRoot
;

pathTerminal
: (DOT identifier)+
;

I'm really just lost as to how to best optimize this to get it to perform adequately - as it stands I wont be able to justify using Antlr4 :(  Please, any help would be very appreciated.

You could disable part by part to see which is especially heavy in terms of prediction time. I see no predicates in the rules, so this is no problem here.


Steve Ebersole

unread,
Nov 15, 2017, 5:45:57 PM11/15/17
to antlr-discussion
Thanks for the reply Mike!

I'll look for the PDF and look it over.

the_antlr_guy

unread,
Dec 9, 2017, 3:33:43 PM12/9/17
to antlr-discussion
hi! also try the intellij plugin. i have a profiler for grammars :)

Steve Ebersole

unread,
Dec 10, 2017, 3:11:51 AM12/10/17
to antlr-di...@googlegroups.com

Is that available from the JetBrains plugin repo?


--
You received this message because you are subscribed to a topic in the Google Groups "antlr-discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/antlr-discussion/9qqTpz8sHUE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to antlr-discussi...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

the_antlr_guy

unread,
Dec 10, 2017, 11:55:34 AM12/10/17
to antlr-discussion
yep
Reply all
Reply to author
Forward
0 new messages