We're using the official Kotlin ANTLR4 grammar (
https://kotlinlang.org/docs/reference/grammar.html) for a few code analysis tasks in a long running server application. However, we've noticed some problems with the way that the grammar is structured that lead to poor performance—specifically very high memory usage.
(Note that my knowledge of this area is limited, so apologies in advance if I use any incorrect terminology.)
Specifically, the Kotlin grammar includes many pass-through rules that end up being reached even when specific language features are not used. For example, here's the definition for the rule that matches the Kotlin elvis operator:
elvisExpression
: infixFunctionCall (NL* elvis NL* infixFunctionCall)*
;
However, this is also the only way to reach the infixFunctionCall rule, which then passes through to the rangeExpression rule, and so and so forth.
The result is that even parsing a short cascade if ends up generating a massive parse tree because getting from one if to the next involves traversal through dozens of rules that aren't matching but are required to reach the rule that does. Even a moderate cascade if can easily blow the stack, and we're noticing that repeatedly utilizing the Kotlin grammar is causing the ANTLR4 caches to grow quickly and eventually produce memory pressure and eventually a crash. I'm assuming that there is some reason that the grammar is designed in this way. But in practice the design is causing performance and stability problems. (This toolchain is used to do analysis of student submissions in CS1, so the use of moderately-sized cascade if's isn't all that uncommon.)
(This also makes certain analysis tasks more irritating, since the presence of a elvisExpression rule match in the parse tree does not indicate that there was actually an elvis expression used in the code. But performance is the higher-order concern at this point.)
So a few questions. First, are there any potential performance optimizations that we can turn on that would help us continue to use the existing Kotlin grammar? I noticed references to a performance optimized ANTLR fork, but it's not clear if it addresses this specific issue.
Second, assuming that some degree of grammar optimization is required, is there an extremely mechanical way of modifying the existing grammar to address these performance problems, perhaps by reducing the number of rules or chain length? I obviously don't want to do anything that would affect correctness, and we probably will need to occasionally incorporate upstream changes. If this could be automated, that would be amazing :-).
FWIW we're also using the Java grammar, which doesn't seem to have any of the same issues, but is also structurally quite different.