Need pointers for optimizing ANTLR4 lexer/parser

Marc Baumbach

Jul 24, 2018, 8:12:33 PM

to antlr-discussion

Hi!

I'm not the owner of this library (https://github.com/bkiers/Liqp), but as a consumer and current minor contributor, I'm trying to help contribute where I can. I'm not super familiar with how to optimize ANTLR4, but we are having an issue where we have seen a large increase in the time to parse since moving from ANTLR3 to ANTLR4. Here's an issue where we discuss some of the timing differences we've seen: https://github.com/bkiers/Liqp/issues/92. I already implemented a two-stage parser, which helped quite a bit, but we're still pretty far off from the performance we saw in ANTLR3.

I'm interested in pointers on the best way to pinpoint the performance problem (any tools or techniques). I suspect a token is being processed too frequently, but I'm not sure which ones to look at first, and debugging in Eclipse hasn't been fruitful. The lexer and parser g4 files are here, in case anyone can spot an immediate issue: https://github.com/bkiers/Liqp/tree/master/src/main/antlr4/liquid/parser/v4

Any tips would be greatly appreciated and please let me know if I can provide any additional information.

Thanks,

Marc

Loring Craymer

Jul 25, 2018, 4:45:25 AM

to antlr-discussion

ANTLR4 does all recognition analysis at runtime, while ANTLR3 does most analysis prior to parser generation. If you have a complex recognition problem, parsing will be slow, and multi-level recursion makes for complex recognition problems. I suspect that that is why ST4 has not been ported to ANTLR4. ANTLR4 has a switch to do SLL recognition analysis (faster, less accurate), and you might be able to do something with that.
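For readers unfamiliar with the two-stage approach mentioned earlier in the thread, here is a minimal control-flow sketch, not Liqp's actual code. It assumes the usual pattern: attempt the parse in SLL mode with a bail-out error strategy, and re-parse in full LL mode only if that attempt fails. Nothing below calls ANTLR. The class, the exception, and the two suppliers are hypothetical stand-ins, and the ANTLR runtime calls they represent are named in the comments.

```java
import java.util.function.Supplier;

// Control-flow sketch of two-stage parsing. The two suppliers are hypothetical
// stand-ins for "run the start rule in SLL mode" and "run it again in LL mode".
class TwoStageSketch {

    // Stand-in for ParseCancellationException, which ANTLR's BailErrorStrategy
    // throws on the first syntax error.
    static class FastPassFailed extends RuntimeException {
        FastPassFailed(String message) { super(message); }
    }

    // Try the fast pass first; fall back to the slow pass only if it bails out.
    static <T> T parse(Supplier<T> sllPass, Supplier<T> llPass) {
        try {
            // With a real parser, configure before this call:
            //   parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
            //   parser.setErrorHandler(new BailErrorStrategy());
            return sllPass.get();
        } catch (FastPassFailed e) {
            // With a real parser, rewind and reconfigure before re-parsing:
            //   tokens.seek(0); parser.reset();
            //   parser.setErrorHandler(new DefaultErrorStrategy());
            //   parser.getInterpreter().setPredictionMode(PredictionMode.LL);
            return llPass.get();
        }
    }

    public static void main(String[] args) {
        // Input the fast pass accepts: the slow pass never runs.
        String fast = parse(() -> "tree from SLL", () -> "tree from LL");
        System.out.println(fast);

        // Input the fast pass rejects: the slow pass produces the result.
        Supplier<String> bailing = () -> { throw new FastPassFailed("SLL could not decide"); };
        String slow = parse(bailing, () -> "tree from LL");
        System.out.println(slow);
    }
}
```

The fallback is there because a syntax error reported in SLL mode may be spurious, so the input gets a second chance under full LL prediction. Inputs the SLL pass accepts never pay for the slower mode.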

ANTLR Yggdrasil should solve your problem, but I am still about 3 weeks away from an early access release (in part because I ran into a bug while porting my StringTemplate compiler, and I expect it to take me about a week to fix that and retest).

--Loring

Ivan Kochurkin

Jul 30, 2018, 6:39:53 AM

to antlr-discussion

I recommend moving all semantic predicates from the beginning of an alternative to the middle or the end, at least in the lexer. That is, try rewriting the following rule:

OutStart
 : ( {stripSpacesAroundTags && stripSingleLine}? SpaceOrTab* '{{'
   | {stripSpacesAroundTags && !stripSingleLine}? WhitespaceChar* '{{'
   | WhitespaceChar* '{{-'
   | '{{'
   ) -> pushMode(IN_TAG)
 ;

to the following one:

OutStart
 : ( SpaceOrTab* {stripSpacesAroundTags && stripSingleLine}? '{{'
   | WhitespaceChar* {stripSpacesAroundTags && !stripSingleLine}? '{{'
   | WhitespaceChar* '{{-'
   | '{{'
   ) -> pushMode(IN_TAG)
 ;

Also, see the JavaScript lexer improvements in my pull request.

Marc Baumbach

Jul 30, 2018, 9:32:11 AM

to antlr-di...@googlegroups.com

Thanks for the info, Ivan. That does pinpoint the problem, though the specific fix doesn't change the performance. If I remove those two lines (the alternatives with predicates) entirely, the performance is great. Getting closer. :)
