Parse Tree memory problem


suneel kumar

Aug 20, 2015, 6:30:08 AM8/20/15
to antlr-discussion
Hi,

In my case I have to build parse trees for all the input files first, and then I need to walk all of those trees with a walker. So I am building the parse tree for each input file and storing the trees in a map.

Some of my input files are around 40,000 lines long. When I send these files, the application hangs and utilization in Task Manager reaches 100%. But if I parse only one file at a time, it parses without any problem. I think the problem is the memory occupied by the trees for files with many input lines. Can anyone help me resolve this issue?

Thanks,
Suneel

Jim Idle

Aug 20, 2015, 7:23:48 AM8/20/15
to antlr-discussion
Don't try to keep it all in memory.

If you do actually need the entire tree then see if you can serialize them to an embedded database. 

But can you not just extract the information you want in a more efficient form and then use that for your operations? If not, then use a 64-bit JVM and give it lots of extra memory with -Xms/-Xmx etc.
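For illustration, a 64-bit JVM launch with a generous heap might look like this (the jar and class names are placeholders, not from the thread):

```shell
# Illustrative only: 2 GB initial heap, 12 GB maximum heap.
java -Xms2g -Xmx12g -cp myanalyzer.jar com.example.Analyzer input-dir/
```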






suneel kumar

Aug 20, 2015, 8:36:41 AM8/20/15
to antlr-discussion
Hi Jim,

Firstly, thanks for your reply. I am not using any database in my application; it should be database independent. Before calling the walker on a particular file I need to analyze all the input files, so I thought analyzing through the parser was the best approach. So I build the parse tree in the first step and then reuse the same tree for the walker later, without building the parse tree again, to reduce the tree-building time. Is there a better approach for my situation?

Jim Idle

Aug 20, 2015, 10:03:56 PM8/20/15
to antlr-di...@googlegroups.com
On Thu, Aug 20, 2015 at 8:36 PM, suneel kumar <suneelra...@gmail.com> wrote:
Hi Jim,

Firstly, thanks for your reply. I am not using any database in my application; it should be database independent.

Why? If you use, say, mapdb, then you can use it as an off-heap, disk-backed cache (assuming that the parse tree actually serializes; I have not really looked into that).

 
Before calling walker on a particular file i need to analyze all the input files.

Well, what I was trying to say is that if you perform this analysis as you go, then you can listen to the parser, extract all the information you need about that translation unit, and store it in an efficient form; you may then not run out of memory.
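A minimal sketch of that idea: instead of keeping each parse tree, a listener populates a compact per-file store as the parser runs. The `SymbolInfo` record and `AnalysisStore` class are illustrative names, not from the thread, and the ANTLR listener wiring is omitted — only the compact storage is shown.

```java
import java.util.*;

// Hypothetical compact summary extracted in listener enter/exit methods,
// kept instead of the whole parse tree for each translation unit.
record SymbolInfo(String name, int line) {}

class AnalysisStore {
    private final Map<String, List<SymbolInfo>> byFile = new HashMap<>();

    // Called from a parse-tree listener as each declaration is seen.
    void add(String file, SymbolInfo info) {
        byFile.computeIfAbsent(file, f -> new ArrayList<>()).add(info);
    }

    List<SymbolInfo> symbolsFor(String file) {
        return byFile.getOrDefault(file, List.of());
    }
}
```

After the single pass, the second phase queries `AnalysisStore` rather than re-walking trees, so no tree outlives its file's parse.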
 
So I thought analyzing through the parser was the best approach. So I build the parse tree in the first step and then reuse the same tree for the walker later, without building the parse tree again, to reduce the tree-building time. Is there a better approach for my situation?

Probably, but I don't really know what you are trying to do. If you truly cannot process the files in one visit, but must have ALL the trees available for your second phase, then your only option is throwing memory at a 64-bit JVM. It sounds like your design is incorrect, but you haven't said what you are trying to achieve. Perhaps your question is really, "How should I go about doing XYZ?" and not "How can I make the idea I came up with work?"

Are you sure you cannot fill in information as you walk, keeping a list of the things you lacked the information for when you walked that source? Think of it like producing object code and then linking it once you have all the information from visiting every translation unit. I doubt that you have to keep all the parse trees around; if you DO have to, then it's probably better to just parse them all again. The source code will probably still be in the disk cache, and with an efficient parser, parsing is usually a trivial amount of time compared to the rest of the processing.
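The compile-then-link pattern Jim describes can be sketched as follows. In pass 1 each translation unit registers its definitions and notes references it cannot yet resolve; pass 2 ("linking") retries the deferred references once every unit has been visited, so no parse trees are retained. The `Linker` class and its method names are illustrative, not from the thread.

```java
import java.util.*;

// Pass 1: record definitions and defer unresolved references.
// Pass 2: "link" by retrying the deferred references.
class Linker {
    private final Map<String, String> definitions = new HashMap<>(); // name -> defining file
    private final List<String> unresolved = new ArrayList<>();

    void define(String name, String file) { definitions.put(name, file); }

    void reference(String name) {
        if (!definitions.containsKey(name)) unresolved.add(name);
    }

    // After all units are visited, retry the deferred references
    // and report any that are still missing.
    List<String> link() {
        List<String> stillMissing = new ArrayList<>();
        for (String name : unresolved) {
            if (!definitions.containsKey(name)) stillMissing.add(name);
        }
        return stillMissing;
    }
}
```

A reference seen before its definition resolves cleanly at link time; only genuinely undefined names survive into the `link()` result.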

Jim

Sam Harwell

Aug 21, 2015, 2:21:30 PM8/21/15
to antlr-di...@googlegroups.com

It seems like you have two options:

 

1. Provide more memory. Don't be shy; I like to run my tests in the JVM with -Xmx12g.

2. Reduce the memory needed. There is more than one way to do this:

   a. If the memory is used by the parse trees: instead of storing the parse trees directly, store them behind a weak reference (or soft reference) and reparse a file if the GC has collected its tree. Note that you might not be able to use ParseTreeProperty&lt;T&gt; with this approach, since it links properties directly to a specific object.

   b. If the memory is used by the DFA cache: update your grammar rules that need long lookahead to require less lookahead. Another option is to use my fork of the runtime, which is highly tuned for minimizing the memory required to store the DFA. (For parsing with the reference Java grammar, the memory used in the DFA cache by my fork is less than 10% of the memory needed by the reference release of ANTLR 4. Results vary by grammar.)
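Option 2a might look like this minimal sketch: each parse result sits behind a `SoftReference`, so the GC may reclaim it under memory pressure, and the cache rebuilds (i.e. reparses) transparently on the next access. The `SoftCache` class is an illustrative name, and the rebuild function here is just a placeholder for a real reparse.

```java
import java.lang.ref.SoftReference;
import java.util.*;
import java.util.function.Function;

// Sketch of option 2a: parse results held behind SoftReferences,
// rebuilt on demand if the GC has reclaimed them.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> cache = new HashMap<>();
    private final Function<K, V> rebuild; // e.g. reparse the source file

    SoftCache(Function<K, V> rebuild) { this.rebuild = rebuild; }

    V get(K key) {
        SoftReference<V> ref = cache.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {               // never cached, or collected
            value = rebuild.apply(key);
            cache.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

As Sam notes, ParseTreeProperty&lt;T&gt; keys on object identity, so any data attached to a tree must be re-derived (or stored elsewhere) whenever the tree is rebuilt.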

Sam
