Parser very slow on Node JS with Java8 Target


Julian Brendl

May 26, 2017, 1:10:02 PM
to antlr-di...@googlegroups.com
I am trying to parse Java 8 files and build an AST from them using ANTLR 4, all in JavaScript / Node.js. Everything seems to be going well, with the exception of the actual AST construction. When I call ```parser.compilationUnit()```, the program never terminates and eventually runs out of heap memory.

Here is what I have done so far:

I managed to generate the Java8 antlr files using: 

antlr4 -Dlanguage=JavaScript Java8.g4 -visitor




I created the chars, lexer, tokens and parser following [this](https://github.com/antlr/antlr4/blob/master/doc/javascript-target.md) guide:

const input = JavaExampleFile.code
const chars = new antlr4.InputStream(input)
const lexer = new Java8Lexer.Java8Lexer(chars)
const tokens = new antlr4.CommonTokenStream(lexer)
const parser = new Java8Parser.Java8Parser(tokens)
parser.buildParseTrees = true
const tree = parser.compilationUnit()


The problem arises when I call ```const tree = parser.compilationUnit()```. In the grammar, compilationUnit is defined as follows:

compilationUnit
    : packageDeclaration? importDeclaration* typeDeclaration* EOF
    ;

It seems that typeDeclaration is causing the issue. The interesting thing is that everything works fine with the Java grammar but not with the Java8 grammar.

Has anyone else dealt with this issue? Any help would be appreciated.

Eric Vergnaud

May 29, 2017, 11:46:47 AM
to antlr-discussion
Not sure about typeDeclaration*. My impression is that this should be typeDeclaration? (you can only have one class, and ? is to cater for empty files).

Burt_...@hotmail.com

May 30, 2017, 4:06:44 PM
to antlr-di...@googlegroups.com
Hello Julian,

Sam Harwell and I have been doing some work on an alternate Node.js-based ANTLR 4 target, which we call antlr4ts. It incorporates parser optimizations that Sam previously put a lot of work into for C# and Java. You might find the benchmark directory in the repository interesting with regard to the Java grammar and performance; I think it's conceptually quite close to what you are talking about.

antlr4ts isn't getting full-time attention from either of us, and is still in an alpha state. The performance isn't going to match that of a fully compiled language, but Sam walked me through the setup for running it against `java.lang.*`, the Java runtime source, which for legal reasons is not checked in.

Input=java.lang.*, 159 files (1903 KiB, 88916 tokens, checksum 0xfe715ea9).   Total parse time in the range of 2.5-3.5 seconds, depending on options.    Roughly 634 KiB/second.   That seems respectable.
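
A quick sanity check of that throughput figure, assuming the 634 KiB/second corresponds to a 3.0 second parse, the midpoint of the reported range (the post does not say which time was used):

```javascript
// Input size reported above, in KiB.
const inputKiB = 1903;

// Throughput is simply input size divided by total parse time.
function throughputKiBPerSec(sizeKiB, seconds) {
  return sizeKiB / seconds;
}

// 2.5 s and 3.5 s are the reported bounds; 3.0 s is an assumed midpoint.
for (const seconds of [2.5, 3.0, 3.5]) {
  const rate = throughputKiBPerSec(inputKiB, seconds);
  console.log(`${seconds.toFixed(1)} s -> ${rate.toFixed(0)} KiB/s`);
}
```

The midpoint gives about 634 KiB/s, and the two bounds give roughly 761 and 544 KiB/s.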

Frankly, I don't know very much about Java or the benchmark, but Sam might answer questions. We've got a chat room about the project at https://gitter.im/tunnelvisionlabs/antlr4ts, or you can post questions to our project's Issues section.




Julian Brendl

May 30, 2017, 4:23:19 PM
to antlr-discussion
Hey Burt,

that actually sounds pretty cool yeah, not too far away from what I am looking for I think. The times look great! The one thing I am wondering however, is whether you support Java8? Because the two grammars I found in the benchmark folder are for Java 1.5, and the normal antlr4 works great for any Java version but Java 8. I'll probably hit up the Gitter. 

Eric Vergnaud

May 31, 2017, 12:25:42 AM
to antlr-discussion
I believe the issue does not come at all from the runtime, but rather from the grammar, which contains semantic predicates written in Java, for JavaLetter and JavaLetterOrDigit notably.
These need to be replaced by corresponding is code

Eric Vergnaud

May 31, 2017, 1:02:05 AM
to antlr-discussion
I mean replaced by JavaScript code
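
As a rough sketch of what such a replacement could look like: the helpers below are JavaScript stand-ins, assuming the Java predicates in question call `java.lang.Character.isJavaIdentifierStart` and `isJavaIdentifierPart` (an assumption about the grammar, not something shown in this thread). The function names are hypothetical, and the regular expressions only approximate the Java methods through Unicode general categories. They need a Node.js version that supports Unicode property escapes.

```javascript
// Combine a UTF-16 surrogate pair into one code point,
// mirroring what Java's Character.toCodePoint does.
function toCodePoint(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

// Approximation: letters, letter numbers, currency symbols, connectors.
const IDENTIFIER_START = /^[\p{L}\p{Nl}\p{Sc}\p{Pc}]$/u;
// Approximation: the above plus digits, combining marks, format characters.
const IDENTIFIER_PART = /^[\p{L}\p{Nl}\p{Sc}\p{Pc}\p{Nd}\p{Mn}\p{Mc}\p{Cf}]$/u;

// Reject EOF (-1) and anything outside the Unicode range before testing.
function isValidCodePoint(codePoint) {
  return Number.isInteger(codePoint) && codePoint >= 0 && codePoint <= 0x10FFFF;
}

// Hypothetical stand-in for Character.isJavaIdentifierStart.
function isJavaIdentifierStart(codePoint) {
  return isValidCodePoint(codePoint) &&
    IDENTIFIER_START.test(String.fromCodePoint(codePoint));
}

// Hypothetical stand-in for Character.isJavaIdentifierPart.
function isJavaIdentifierPart(codePoint) {
  return isValidCodePoint(codePoint) &&
    IDENTIFIER_PART.test(String.fromCodePoint(codePoint));
}

console.log(isJavaIdentifierStart('é'.codePointAt(0))); // true
console.log(isJavaIdentifierStart('1'.codePointAt(0))); // false
console.log(isJavaIdentifierPart('1'.codePointAt(0)));  // true
```

Inside the grammar, each Java predicate body would then become a call to one of these helpers on the previously consumed character. How the helpers are made visible to the generated lexer (for example through a `@lexer::members` block) is left open here.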

Burt_...@hotmail.com

May 31, 2017, 11:22:41 AM
to antlr-di...@googlegroups.com
On Tuesday, May 30, 2017 at 1:23:19 PM UTC-7, Julian Brendl wrote:
Hey Burt,

that actually sounds pretty cool yeah, not too far away from what I am looking for I think. The times look great! The one thing I am wondering however, is whether you support Java8? 
 
I doubt either Sam or I will put any effort into porting the benchmark to Java8. The purpose of a benchmark is to remain stable, so that the outputs from running it are comparable over time. Antlr4ts could support a Java8 grammar, provided any actions and semantic predicates embedded in the grammar are translated to JavaScript or TypeScript.

Burt_...@hotmail.com

May 31, 2017, 11:51:57 AM
to antlr-di...@googlegroups.com
Eric, it depends on whether "the issue" means the performance problem mentioned in the subject line of this thread, or support for the "Java8" grammar.

On the latter, I think the higher-level problem is that grammars containing semantic predicates written in a specific target language have found their way into the antlr / grammars-v4 repository. The headline and README of that repository say the samples posted there should not contain any actions, but perhaps inline semantic predicates should be excluded from the grammars as well.

Ideally (and perhaps in the antlr4 language) there should be a way to declare abstract semantic predicates (and actions) that must be implemented in some target language before a concrete recognizer can be compiled.   
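
To illustrate the idea only: this is not an existing ANTLR feature, and every class and method name below is hypothetical. A grammar could refer to named predicates that a base class declares but does not implement, with each target supplying a concrete version. The JavaScript implementation here is deliberately ASCII-only to keep the sketch short.

```javascript
// Hypothetical "abstract" predicates a grammar would depend on.
// Each method fails loudly until a target supplies an implementation.
class AbstractJavaPredicates {
  isJavaLetter(codePoint) {
    throw new Error('isJavaLetter must be implemented for this target');
  }

  isJavaLetterOrDigit(codePoint) {
    throw new Error('isJavaLetterOrDigit must be implemented for this target');
  }
}

// Hypothetical JavaScript implementation (ASCII only, as a placeholder).
class JavaScriptJavaPredicates extends AbstractJavaPredicates {
  isJavaLetter(codePoint) {
    return /^[A-Za-z$_]$/.test(String.fromCodePoint(codePoint));
  }

  isJavaLetterOrDigit(codePoint) {
    return this.isJavaLetter(codePoint) ||
      /^[0-9]$/.test(String.fromCodePoint(codePoint));
  }
}

const predicates = new JavaScriptJavaPredicates();
console.log(predicates.isJavaLetter('x'.codePointAt(0)));        // true
console.log(predicates.isJavaLetterOrDigit('7'.codePointAt(0))); // true
```

With this shape, a grammar that only names the predicates stays target-neutral, and a missing implementation shows up as an explicit error rather than as Java code embedded in the grammar.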


On Tuesday, May 30, 2017 at 9:25:42 PM UTC-7, Eric Vergnaud wrote:

Julian Brendl

May 31, 2017, 11:56:26 AM
to antlr-discussion
To be honest, I think you're right about the grammar being the issue. To reply to your earlier point, the typeDeclaration* does make sense, I think (it is the same in Java7, where it works). I tried it with a question mark instead of the asterisk, and it made no difference.