you can do this wiping "every once in a while" and speed will pop back
up within 20 files as it adapts to new input.
Ter
> Sam Harwell <mailto:
s...@tunnelvisionlabs.com>
> June 27, 2013 12:21 PM
>
> I forgot to mention /how/ to manually clear the DFA.
>
> In ANTLR 4.0:
>
> Arrays.fill(parser.getInterpreter().decisionToDFA, null);
>
> In ANTLR 4.1 (coming very soon):
>
> ATN atn = parser.getATN();
>
> for (int i = 0; i < parser.getInterpreter().decisionToDFA.length; i++) {
>
> parser.getInterpreter().decisionToDFA[i] = new
> DFA(atn.getDecisionState(i), i);
>
> }
>
> --
>
> Sam Harwell
>
> Owner, Lead Developer
>
>
http://tunnelvisionlabs.com
>
> *From:*
antlr-di...@googlegroups.com
> [mailto:
antlr-di...@googlegroups.com] *On Behalf Of *Sam Harwell
> *Sent:* Thursday, June 27, 2013 2:08 PM
> *To:*
antlr-di...@googlegroups.com
> *Subject:* RE: [antlr-discussion] Accumulating ATNConfig objects
> during a large batch parsing run
>
> The DFA required to parse your actual input is exceeding the memory
> available to the JVM. Here are some potential solutions to this situation.
>
> 1.Refactoring your grammar to reduce lookahead for decisions with long
> lookahead will reduce the number of DFA states created, which will in
> turn reduce the memory requirements to hold the constructed DFA. Both
> of these characteristics will improve your parser�s performance.
>
> 2.You can manually clear the DFA between files to prevent accumulation
> of states in the DFA. This will /obliterate/ parser performance � it
> could easily cause your parser to take 100x as long to parse the same
> input. At a minimum, I recommend that you clear the DFA only after
> many files have been parsed, e.g. clear the DFA after every 1000 files.
>
> We are working on some additional solutions which are not yet
> available in the public distribution.
>
> 1.In an experimental branch, I changed the way ATN configurations are
> stored in the ATNConfigSet. This change caused a 22-fold reduction in
> the number of ATNConfig instances stored in the DFA. In the reference
> code, we were averaging 43.81 ATNConfig instances per ATNConfigSet. In
> the experimental branch, the number of ATNConfigSet instances remained
> the same, but each had an average of only 1.98 ATNConfig instances.
>
> 2.In another experimental branch, additional optimizations are
> performed during the construction of the DFA, which is shown to reduce
> the number of DFA states created by 4x or more, with a matching
> reduction in both the number of ATNConfigSet instances and ATNConfig
> instances.
>
> 3.In another experimental branch, an automated grammar transformation
> pass during parser generation provided the ability to reduce lookahead
> for certain costly decisions without changing the grammar or resulting
> parse tree. Careful application of this feature to a particular
> grammar resulted in a 1000:1 reduction in the number of DFA states
> stored in the DFA for this grammar. The performance of this feature
> exceeded the performance of item #1 in the previous list, without
> causing the grammar or parse tree to change.
>
> Thank you,
>
> --
>
> Sam Harwell
>
> Owner, Lead Developer
>
>
http://tunnelvisionlabs.com
>
> *From:*
antlr-di...@googlegroups.com
> [mailto:
antlr-di...@googlegroups.com] *On Behalf Of *timhilco
> *Sent:* Thursday, June 27, 2013 1:53 PM
> *To:*
antlr-di...@googlegroups.com
> *Subject:* [antlr-discussion] Accumulating ATNConfig objects during a
> large batch parsing run
>
> I'm trying to lex and parse a large number of source files (`30,000
> COBOL source files). I've build a java program that will look at a
> directory and parse each of the source files. After a processing a
> large number of files, I'm getting a HeapDump OutOfMemory error.
>
> I've created some heap dumps and run a profile against the code and it
> seems like it is accumulating a large number of 'ATNConfig' objects
> that don't seem to get garabage collected after I parse a single file.
> I've tried lots of things to remove the references to these objects
> (nulling the Lexer and Parser between file, etc), but nothing seems to
> work.
>
> I'm not an expert, so I'm not sure I know what I'm talking about, but
> things seem to be pointing to the class ParserATNSimulator and method
> addDFAEdge. The 'configs' field in DFAState seems to be holding a
> ATNConfigSet which contains the ATNConfig objects.
>
> I would like to get rid of these objects between each file parsing, so
> I don't run out of memory.
>
> Looking for solutions to avoid the Out of Memory condition.
>
> Tim Hilgenberg
>
> --
> You received this message because you are subscribed to the Google
> Groups "antlr-discussion" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to
antlr-discussi...@googlegroups.com
> <mailto:
antlr-discussi...@googlegroups.com>.
> Sam Harwell <mailto:
s...@tunnelvisionlabs.com>
> June 27, 2013 12:07 PM
>
> The DFA required to parse your actual input is exceeding the memory
> available to the JVM. Here are some potential solutions to this situation.
>
> 1.Refactoring your grammar to reduce lookahead for decisions with long
> lookahead will reduce the number of DFA states created, which will in
> turn reduce the memory requirements to hold the constructed DFA. Both
> of these characteristics will improve your parser�s performance.
>
> 2.You can manually clear the DFA between files to prevent accumulation
> of states in the DFA. This will /obliterate/ parser performance � it
> could easily cause your parser to take 100x as long to parse the same
> input. At a minimum, I recommend that you clear the DFA only after
> many files have been parsed, e.g. clear the DFA after every 1000 files.
>
> We are working on some additional solutions which are not yet
> available in the public distribution.
>
> 1.In an experimental branch, I changed the way ATN configurations are
> stored in the ATNConfigSet. This change caused a 22-fold reduction in
> the number of ATNConfig instances stored in the DFA. In the reference
> code, we were averaging 43.81 ATNConfig instances per ATNConfigSet. In
> the experimental branch, the number of ATNConfigSet instances remained
> the same, but each had an average of only 1.98 ATNConfig instances.
>
> 2.In another experimental branch, additional optimizations are
> performed during the construction of the DFA, which is shown to reduce
> the number of DFA states created by 4x or more, with a matching
> reduction in both the number of ATNConfigSet instances and ATNConfig
> instances.
>
> 3.In another experimental branch, an automated grammar transformation
> pass during parser generation provided the ability to reduce lookahead
> for certain costly decisions without changing the grammar or resulting
> parse tree. Careful application of this feature to a particular
> grammar resulted in a 1000:1 reduction in the number of DFA states
> stored in the DFA for this grammar. The performance of this feature
> exceeded the performance of item #1 in the previous list, without
> causing the grammar or parse tree to change.
>
> Thank you,
>
> --
>
> Sam Harwell
>
> Owner, Lead Developer
>
>
http://tunnelvisionlabs.com
>
> *From:*
antlr-di...@googlegroups.com
> [mailto:
antlr-di...@googlegroups.com] *On Behalf Of *timhilco
> *Sent:* Thursday, June 27, 2013 1:53 PM
> *To:*
antlr-di...@googlegroups.com
> *Subject:* [antlr-discussion] Accumulating ATNConfig objects during a
> large batch parsing run
>
> I'm trying to lex and parse a large number of source files (`30,000
> COBOL source files). I've build a java program that will look at a
> directory and parse each of the source files. After a processing a
> large number of files, I'm getting a HeapDump OutOfMemory error.
>
> I've created some heap dumps and run a profile against the code and it
> seems like it is accumulating a large number of 'ATNConfig' objects
> that don't seem to get garabage collected after I parse a single file.
> I've tried lots of things to remove the references to these objects
> (nulling the Lexer and Parser between file, etc), but nothing seems to
> work.
>
> I'm not an expert, so I'm not sure I know what I'm talking about, but
> things seem to be pointing to the class ParserATNSimulator and method
> addDFAEdge. The 'configs' field in DFAState seems to be holding a
> ATNConfigSet which contains the ATNConfig objects.
>
> I would like to get rid of these objects between each file parsing, so
> I don't run out of memory.
>
> Looking for solutions to avoid the Out of Memory condition.
>
> Tim Hilgenberg
>
> --
> You received this message because you are subscribed to the Google
> Groups "antlr-discussion" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to
antlr-discussi...@googlegroups.com
> <mailto:
antlr-discussi...@googlegroups.com>.
> timhilco <mailto:
timh...@gmail.com>
> June 27, 2013 11:53 AM