Reusing parser/lexer


goo...@hunsicker.de

Apr 18, 2013, 9:54:16 AM
to antlr-di...@googlegroups.com
Howdy,

I might be missing something, but how can one reuse parsers/lexers to repeatedly parse some input without the need to create new instances?

I'm setting up parsing this way which works fine:

import org.antlr.v4.runtime.*;

// MyLexer/MyParser are generated from my grammar; parse() is its start rule.
MyLexer lexer = new MyLexer(new ANTLRFileStream("file1"));
MyParser parser = new MyParser(new CommonTokenStream(lexer));
parser.parse();


But when I try to parse another file, things are not working as expected:

lexer.setInputStream(new ANTLRFileStream("file2"));
parser.parse();

The run still succeeds and no errors appear, but no tree is built. Hm.


Ok, maybe I need to manually reset the TokenStream. BufferedTokenStream has a reset() method, but it did not yield the desired effect (the input is not changed, yielding the same tree as with the first run).

Then I tried resetting the token source as well with BufferedTokenStream#setTokenSource(). This produced the following exception:

java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at org.antlr.v4.runtime.BufferedTokenStream.nextTokenOnChannel(BufferedTokenStream.java:304)
at org.antlr.v4.runtime.CommonTokenStream.adjustSeekIndex(CommonTokenStream.java:65)
at org.antlr.v4.runtime.BufferedTokenStream.setup(BufferedTokenStream.java:247)
at org.antlr.v4.runtime.BufferedTokenStream.lazyInit(BufferedTokenStream.java:241)
at org.antlr.v4.runtime.CommonTokenStream.LT(CommonTokenStream.java:87)
at org.antlr.v4.runtime.Parser.enterRule(Parser.java:430)


Looking at the code, I think the "fetchedEOF" flag should be reset as well. At least, that's what I changed, and things started to work. So either there's a bug, or there's a better way to reconfigure the parser that I didn't find. Any help?
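The failure described above can be reproduced with a deliberately simplified model of a lazily filled token buffer. This is a sketch under stated assumptions, not ANTLR's actual BufferedTokenStream: the classes ToyTokenStream and StaleFlagDemo, their methods, and the resetFlagOnNewSource switch are all hypothetical. It shows how clearing the buffer while leaving a "fetched EOF" flag set makes the refill step a no-op, so the first lookahead indexes an empty list, which matches the IndexOutOfBoundsException in the stack trace.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified, hypothetical model of a lazily filled token buffer (NOT ANTLR's code),
// showing how a stale "fetched EOF" flag breaks reuse after swapping the source.
public class StaleFlagDemo {
    static class ToyTokenStream {
        private Iterator<String> source;
        private final List<String> tokens = new ArrayList<>();
        private boolean fetchedEOF = false;
        private final boolean resetFlagOnNewSource;

        ToyTokenStream(Iterator<String> source, boolean resetFlagOnNewSource) {
            this.source = source;
            this.resetFlagOnNewSource = resetFlagOnNewSource;
        }

        // Swap in a new token source and clear the buffer; optionally reset the flag.
        void setTokenSource(Iterator<String> newSource) {
            source = newSource;
            tokens.clear();
            if (resetFlagOnNewSource) {
                fetchedEOF = false;
            }
        }

        // Buffer all remaining tokens once; skipped entirely if EOF was already seen.
        private void fill() {
            if (fetchedEOF) {
                return;
            }
            while (source.hasNext()) {
                tokens.add(source.next());
            }
            tokens.add("<EOF>");
            fetchedEOF = true;
        }

        // Look at the first token, filling the buffer lazily.
        String first() {
            fill();
            return tokens.get(0); // throws IndexOutOfBoundsException if the buffer is empty
        }
    }

    // Reads a first "file", swaps the source, then looks at the second "file".
    static String firstTokenOfSecondInput(boolean resetFlag) {
        ToyTokenStream ts = new ToyTokenStream(List.of("a", "b").iterator(), resetFlag);
        ts.first(); // buffers a, b, <EOF> and sets the flag
        ts.setTokenSource(List.of("c", "d").iterator());
        try {
            return ts.first();
        } catch (IndexOutOfBoundsException e) {
            return "IndexOutOfBoundsException";
        }
    }

    public static void main(String[] args) {
        System.out.println(firstTokenOfSecondInput(false)); // IndexOutOfBoundsException
        System.out.println(firstTokenOfSecondInput(true));  // c
    }
}
```

With the real runtime, creating a fresh CommonTokenStream for each input sidesteps the stale flag altogether, because a newly constructed stream starts with an empty buffer and the flag cleared.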

Thanks.


Sam Harwell

Apr 18, 2013, 11:04:46 AM
to antlr-di...@googlegroups.com
For many reasons it's generally a better idea to just create new instances.

Why are you trying to avoid that?

--
Sam Harwell
Owner, Lead Developer
http://tunnelvisionlabs.com



goo...@hunsicker.de

Apr 18, 2013, 11:36:39 AM
to antlr-di...@googlegroups.com

> For many reasons it's generally a better idea to just create new instances.
>
> Why are you trying to avoid that?

For now, out of habit. I haven't done any performance testing with ANTLR 4 yet.

I have an existing infrastructure, originally based on ANTLR 2, where creating new instances used to be costly. I'm simply trying to introduce an ANTLR 4 parser into the codebase and naturally started with the same tried-and-true approach of keeping one instance per thread.
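The "one instance per thread" approach is commonly built on java.lang.ThreadLocal. Below is a minimal JDK-only sketch, assuming a hypothetical ReusableParser class (not an ANTLR type) whose state must be reset before each new input; the PerThreadParsers class and its helpers are likewise illustrative names.

```java
import java.util.concurrent.atomic.AtomicReference;

public class PerThreadParsers {
    // Hypothetical stand-in for a stateful parser; it is NOT an ANTLR class.
    static class ReusableParser {
        private String input = "";

        // Discard all state left over from the previous input.
        ReusableParser reset(String newInput) {
            input = newInput;
            return this;
        }

        // Placeholder "parse" that just reports the input length.
        int parse() {
            return input.length();
        }
    }

    // Each thread lazily gets its own instance, so no instance is shared across threads.
    static final ThreadLocal<ReusableParser> PARSER =
            ThreadLocal.withInitial(ReusableParser::new);

    // Reuses the calling thread's parser, resetting it for the new input first.
    static int parse(String input) {
        return PARSER.get().reset(input).parse();
    }

    // Returns true if a freshly started thread sees a different instance than the caller.
    static boolean otherThreadHasOwnInstance() {
        ReusableParser mine = PARSER.get();
        AtomicReference<ReusableParser> theirs = new AtomicReference<>();
        Thread t = new Thread(() -> theirs.set(PARSER.get()));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return theirs.get() != null && theirs.get() != mine;
    }

    public static void main(String[] args) {
        System.out.println(parse("file1 contents"));     // 14
        System.out.println(parse("file2"));              // 5
        System.out.println(otherThreadHasOwnInstance()); // true
    }
}
```

The pattern only pays off if reset() really clears every piece of per-input state; any field it forgets leaks from one input into the next, which is the kind of problem the stale flag discussed in this thread causes.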

Do you mind sharing the reasons why creating new instances is the better choice? Thanks!

Sam Harwell

Apr 18, 2013, 11:47:13 AM
to antlr-di...@googlegroups.com
In the early days of ANTLR 4 it was very costly to create new parser instances. I put a great deal of work into making sure this wasn't the case prior to releasing ANTLR 4.0. I've done extensive performance testing with ANTLR 4 (even this is a vast understatement), and can confidently say the overhead of creating a new parser instance will not pose a problem for you. :) Even the GoWorks IDE (where peak performance is critical to showing my company's novel approach to IDEs) uses a new parser instance for each code completion operation.

If you do want to use a shared instance, look at how the code flow changes when REUSE_LEXER and REUSE_PARSER are set to true or false in the following unit test. These flags control whether the test uses shared or new instances of the lexer and parser for each file.
https://github.com/antlr/antlr4/blob/master/tool/test/org/antlr/v4/test/TestPerformance.java
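The reuse-or-recreate decision behind flags like these boils down to one branch per component: reset the existing instance, or build a new one. The sketch below reproduces only that branching idea, not TestPerformance's actual code; ToyLexer, ToyParser, ReuseFlagDemo and parseAll are hypothetical. With the real runtime, the reset calls would be along the lines of Lexer.setInputStream and Parser.setTokenStream.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReuseFlagDemo {
    // Hypothetical toy lexer; it only remembers its current input.
    static class ToyLexer {
        private String input;
        ToyLexer(String input) { this.input = input; }
        void setInput(String input) { this.input = input; } // reset for a new file
        String input() { return input; }
    }

    // Hypothetical toy parser; "parsing" just wraps the lexer's input.
    static class ToyParser {
        private ToyLexer lexer;
        ToyParser(ToyLexer lexer) { this.lexer = lexer; }
        void setLexer(ToyLexer lexer) { this.lexer = lexer; } // reset for a new file
        String parse() { return "tree(" + lexer.input() + ")"; }
    }

    // Parses each file, reusing or recreating instances according to the two flags.
    // Every parser instance used is added to parsersUsed (identity-based: no equals override).
    static List<String> parseAll(List<String> files, boolean reuseLexer, boolean reuseParser,
                                 Set<ToyParser> parsersUsed) {
        ToyLexer lexer = null;
        ToyParser parser = null;
        List<String> trees = new ArrayList<>();
        for (String file : files) {
            if (reuseLexer && lexer != null) {
                lexer.setInput(file);          // shared lexer: swap in the new input
            } else {
                lexer = new ToyLexer(file);    // new lexer for this file
            }
            if (reuseParser && parser != null) {
                parser.setLexer(lexer);        // shared parser: rebind to the current lexer
            } else {
                parser = new ToyParser(lexer); // new parser for this file
            }
            parsersUsed.add(parser);
            trees.add(parser.parse());
        }
        return trees;
    }

    public static void main(String[] args) {
        List<String> files = List.of("file1", "file2", "file3");
        Set<ToyParser> shared = new HashSet<>();
        Set<ToyParser> fresh = new HashSet<>();
        System.out.println(parseAll(files, true, true, shared));   // [tree(file1), tree(file2), tree(file3)]
        System.out.println(parseAll(files, false, false, fresh));  // [tree(file1), tree(file2), tree(file3)]
        System.out.println(shared.size() + " vs " + fresh.size()); // 1 vs 3
    }
}
```

Both settings produce the same trees; only the number of instances differs, which is why a correct reset path matters so much when reuse is switched on.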

Thank you,


goo...@hunsicker.de

Apr 18, 2013, 1:41:51 PM
to antlr-di...@googlegroups.com
> I put a great deal of work into making sure this wasn't the case prior to releasing ANTLR 4.0. I've done extensive performance testing with ANTLR 4, and can confidently say the overhead of creating a new parser instance will not pose a problem for you. :)

This sounds very promising. ANTLR 4 is really a big step forward! Thanks!
