CPP: Cutting down on memory footprint - help request


Ben Griffin

Sep 18, 2018, 3:17:37 AM
to antlr-di...@googlegroups.com
It took longer than I thought to get my head around ANTLR - but once the grammar was correct, and my understanding (at a simple level) of the invocation was in place, it's pretty awesome.

We want to replace our current (hand-rolled and buggy) parser with ANTLR - and it's all working. To a point.

We are implementing a macro language which processes upwards of 15,000 nodes (think files) at a time, each of which is composed of a core set of macros and source-code fields (which are themselves visitable and are, in effect, unparameterised macros). There are about 2,000 core macros and another 50,000 instances of visitable source-code in any given instance.

Some macro definitions are visited over 100k times - and some visitable source is likewise visited 15k or more times.

My first foray on this group was because we weren't aware of the impact of the parse versus the visit.  So we now ensure that all source and macros are parsed just once, and then the parsed code is stored in a map as follows (each macro has a unique name):

std::unordered_map<std::string, std::shared_ptr<macro>> macroLibrary;


Using this method, we have brought process execution time to 200% of the original hand-rolled parse, which we consider acceptable.

The problem is that the map is massive, and the memory cost is prohibitive. Our servers are killing these processes on our larger instances due to excessive memory allocation. FYI, our own code uses shared_ptr everywhere we can, and we aren't seeing significant memory leaks.

Each macro instance consists of the following ANTLR4 objects (namespaces removed):

ANTLRInputStream *_input;
CommonTokenStream *_cts;
mLexer *_mLexer;  // We seem to be able to remove this without affecting visits.
mParser *_mParser;
mParserVisitor *_mParserVisitor;
mParser::CodeContext *_mParseTree;

QUESTION 1: Can we delete or remove any of these objects once we have parsed the source-code?  
Obviously we need to keep enough to visit the code, but that is all.

QUESTION 2: Are there any other ways that we can reduce the memory impact of the ANTLR code? StackExchange suggests tweaks like clearing the DFA with ParserATNSimulator.clearDFA(), but my concern is that, since the DFA cache is static, clearing it may have a prohibitively negative performance impact. Someone also mentioned resetting the PredictionContextCache - but again, I don't know whether that is used in visits, or what sort of impact clearing it would have.
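
For reference, the call under discussion would look something like this in the C++ runtime - an untested sketch, and (as comes up below) the cache it clears is shared by every parser of the grammar:

#include "antlr4-runtime.h"

// Untested sketch: drops the shared DFA cache for this grammar.
// Every mParser instance is affected, and the cache is rebuilt
// (at a parse-time cost) on subsequent runs.
void clearParserDFA(mParser &parser) {
    parser.getInterpreter<antlr4::atn::ParserATNSimulator>()->clearDFA();
}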

Mike Lischke

Sep 19, 2018, 3:24:07 AM
to antlr-discussion
Hi Ben,

We are implementing a macro language which processes upwards of 15,000 nodes (think files) at a time, each of which is composed of a core set of macros and source-code fields (which are themselves visitable and are, in effect, unparameterised macros). There are about 2,000 core macros and another 50,000 instances of visitable source-code in any given instance.

Some macro definitions are visited over 100k times - and some visitable source is likewise visited 15k or more times.

My first foray on this group was because we weren't aware of the impact of the parse versus the visit.  So we now ensure that all source and macros are parsed just once, and then the parsed code is stored in a map as follows (each macro has a unique name):

std::unordered_map<std::string, std::shared_ptr<macro>> macroLibrary;


Using this method, we have brought process execution time to 200% of the original hand-rolled parse, which we consider acceptable.

The problem is that the map is massive, and the memory cost is prohibitive.

How much is "massive" here? 2,000 macros of 6 pointers each comes to 96K of RAM. Nothing I'd call massive.

I don't know what you store in the macros, but don't duplicate the input. It's usually enough to store a macro name and start and end positions of the replacement text.

Our servers are killing these processes on our larger instances due to excessive memory allocation. FYI, our own code uses shared_ptr everywhere we can, and we aren't seeing significant memory leaks.

What controls the lifetime of the macros? Who is the owner? If the macroLibrary manages them, you don't need shared pointers. Just make the map `std::unordered_map<std::string, macro> macroLibrary;` instead and pass around references.
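
In sketch form (illustrative and untested; `macro` is your class):

#include <string>
#include <unordered_map>

// The map owns the macros by value; callers only ever see references.
std::unordered_map<std::string, macro> macroLibrary;

macro &findMacro(const std::string &name) {
    return macroLibrary.at(name);  // throws std::out_of_range if unknown
}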


Each macro instance consists of the following ANTLR4 objects (namespaces removed):

ANTLRInputStream *_input;
CommonTokenStream *_cts;
mLexer *_mLexer;
mParser *_mParser;
mParserVisitor *_mParserVisitor;
mParser::CodeContext *_mParseTree;


QUESTION 1: Can we delete or remove any of these objects once we have parsed the source-code?  
Obviously we need to keep enough to visit the code, but that is all.

First we'd need to know whether all these instances are created per macro or are just references to instances stored somewhere else. There's certainly no need to create a separate input stream, lexer + parser for each macro. After all, you have a file containing these macros, which is parsed by a single parser instance. You could remove the references and provide access via the outer class that holds the macroLibrary, but as I have shown above, that wouldn't make a really big difference.


QUESTION 2: Are there any other ways that we can reduce the memory impact of the ANTLR code? StackExchange suggests tweaks like clearing the DFA with ParserATNSimulator.clearDFA(), but my concern is that, since the DFA cache is static, clearing it may have a prohibitively negative performance impact. Someone also mentioned resetting the PredictionContextCache - but again, I don't know whether that is used in visits, or what sort of impact clearing it would have.

These suggestions aren't really helpful. The caches are created to speed up parsing. If you clear them, you will get much higher parse times and the memory footprint doesn't really go down, because on the next parse run these caches are built again.

Have you instrumented your code to see where the memory is spent? What is the biggest structure (and how big is it, actually)?


Ben Griffin

Sep 19, 2018, 5:35:29 AM
to antlr-di...@googlegroups.com

On 19 Sep 2018, at 08:24, 'Mike Lischke' via antlr-discussion <antlr-di...@googlegroups.com> wrote:

Hi Ben,

We are implementing a macro language which processes upwards of 15,000 nodes (think files) at a time, each of which is composed of a core set of macros and source-code fields (which are themselves visitable and are, in effect, unparameterised macros). There are about 2,000 core macros and another 50,000 instances of visitable source-code in any given instance.

Some macro definitions are visited over 100k times - and some visitable source is likewise visited 15k or more times.

My first foray on this group was because we weren't aware of the impact of the parse versus the visit.  So we now ensure that all source and macros are parsed just once, and then the parsed code is stored in a map as follows (each macro has a unique name):

std::unordered_map<std::string, std::shared_ptr<macro>> macroLibrary;


Using this method, we have brought process execution time to 200% of the original hand-rolled parse, which we consider acceptable.

The problem is that the map is massive, and the memory cost is prohibitive.

How much is "massive" here? 2,000 macros of 6 pointers each comes to 96K of RAM. Nothing I'd call massive.
Yeah, it's not the pointers that are taking up the space - but what the pointers are pointing to...
We are seeing 10-20 GB of memory being used.

I don't know what you store in the macros, but don't duplicate the input. It's usually enough to store a macro name and start and end positions of the replacement text.
I'm not sure I understand. We keep the name as a key, and the class includes the 6 pointers plus a few other things (a few bools and size_ts - nothing significant).
We do keep the expansion text, which we could probably lose - but it's maybe 200 MB of text altogether, which we can easily absorb.


Our servers are killing these processes on our larger instances due to excessive memory allocation. FYI, our own code uses shared_ptr everywhere we can, and we aren't seeing significant memory leaks.

What controls the lifetime of the macros? Who is the owner? If the macroLibrary manages them, you don't need shared pointers. Just make the map `std::unordered_map<std::string, macro> macroLibrary;` instead and pass around references.
The lifetime of the macros is a single compile of a node-tree. We deal with one node-tree per process, and a 'compile' is a single standalone process.
Predefined macros are loaded at the start and are parsed on their first expansion. We then store the parse-tree in a (static) cache and use a visitor on subsequent expansions.
This cache is the unordered_map mentioned above.
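
In sketch form, the parse-once/visit-many scheme looks roughly like this (illustrative names, not our actual code):

#include <memory>
#include <string>
#include <unordered_map>

// Parse a macro's expansion text on its first expansion; afterwards,
// reuse the cached parse tree with a fresh visitor.
std::shared_ptr<macro> findOrParse(const std::string &name, const std::string &expansion) {
    static std::unordered_map<std::string, std::shared_ptr<macro>> cache;
    auto found = cache.find(name);
    if (found != cache.end()) {
        return found->second;  // already parsed: visit only
    }
    auto m = std::make_shared<macro>(expansion);  // parses on construction
    cache.emplace(name, m);
    return m;
}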

Each macro instance consists of the following ANTLR4 objects (namespaces removed):

ANTLRInputStream *_input;
CommonTokenStream *_cts;
mLexer *_mLexer;
mParser *_mParser;
mParserVisitor *_mParserVisitor;
mParser::CodeContext *_mParseTree;


QUESTION 1: Can we delete or remove any of these objects once we have parsed the source-code?  
Obviously we need to keep enough to visit the code, but that is all.

First we'd need to know whether all these instances are created per macro or are just references to instances stored somewhere else. There's certainly no need to create a separate input stream, lexer + parser for each macro. After all, you have a file containing these macros, which is parsed by a single parser instance. You could remove the references and provide access via the outer class that holds the macroLibrary, but as I have shown above, that wouldn't make a really big difference.

Okay, so since yesterday (writing out questions is always a good way of thinking about things) we've taken both the Visitor and the Lexer out of the class - we instantiate the visitor when we revisit the macro, and the impact seems to be minimal. The same goes for the lexer, which we use just for the parse. So the list of pointers is now down to four (per macro definition).
This is how each macro instance is parsed (ready for subsequent visits).

_input = new ANTLRInputStream(expansion);
mLexer lexer(_input);                  // stack-local: destroyed when this scope ends
_cts = new CommonTokenStream(&lexer);
_mParser = new mParser(_cts);
_mParser->setBuildParseTree(true);
_mParseTree = _mParser->code();

There's certainly no need to create a separate input stream, lexer + parser for each macro. After all, you have a file containing these macros, which is parsed by a single parser instance.

I don't know how you can say that. There is no file. There are hundreds of thousands - even millions - of strings, many of which are context dependent, all of which have to be treated as re-visitable macro definitions.
I guess what you are suggesting is that we could serialise everything out into a massive file and store pointers for each macro boundary, using some form of distinguished file separator between each macro - but...
(1) There are many cases where we would be parsing dead code - different compiles generate different code paths depending upon context. Imagine we have something like switch(day of the week, monday(), tuesday(), wednesday()...), where '@foo()' is a macro call: we would be parsing all seven macros every day, even though we will only ever use one in any given run. That may feel contrived, but we have many instances like that. (Yes, the code could be refactored - but we are dealing with a massive pre-existing code-base and we cannot refactor everything without the risk of errors creeping in. Refactoring isn't an option.)
(2) Reporting parse errors would be hard too - the entire 200 MB "file" would have to be lexically valid and parse correctly before we could do anything. This, again, isn't really suitable for us.

Also, when we visit the code, we need to get at fragments of the macro (the text parts that define it) corresponding to the parse tree.
Ideally, we would like to store just the parse tree for each macro and use the visitor to access it - but it seems that the parse tree depends upon its assigned parser, and likewise upon the input stream.
It's the internal interconnectivity of ANTLR4 which is not so clear to me.

Even more useful would be the ability to serialise the parse tree, so that we could store (and reload) parse trees without having to re-parse them. Is that something that ANTLR can do?

One of the elements of building the parse tree I don't understand is that the InputStream must be kept, and the CommonTokenStream, but the Lexer - not so much. My guess is that the TokenStream, after lexing, is a set of position pointers into the InputStream. Is that correct?


QUESTION 2: Are there any other ways that we can reduce the memory impact of the ANTLR code? StackExchange suggests tweaks like clearing the DFA with ParserATNSimulator.clearDFA(), but my concern is that, since the DFA cache is static, clearing it may have a prohibitively negative performance impact. Someone also mentioned resetting the PredictionContextCache - but again, I don't know whether that is used in visits, or what sort of impact clearing it would have.

These suggestions aren't really helpful. The caches are created to speed up parsing. If you clear them, you will get much higher parse times and the memory footprint doesn't really go down, because on the next parse run these caches are built again.
This is exactly as I feared. Thanks for the confirmation.


Have you instrumented your code to see where the memory is spent? What is the biggest structure (and how big is it, actually)?
Yes - but my instruments get overloaded; we need to do more work on that. But the actual expenditure is 10-20+ GB of memory.


Ben Griffin

Sep 19, 2018, 5:56:19 AM
to antlr-discussion
I must be missing something. If the antlr4::Parser were used just once across a file, there would be no purpose in maintaining statics for, e.g., _decisionToDFA, _sharedContextCache, etc.
Therefore, I got confused by what you were suggesting!

Mike Lischke

Sep 21, 2018, 10:14:50 AM
to antlr-di...@googlegroups.com
The problem is that the map is massive, and the memory cost is prohibitive.

How much is "massive" here? 2,000 macros of 6 pointers each comes to 96K of RAM. Nothing I'd call massive.
Yeah, it's not the pointers that are taking up the space - but what the pointers are pointing to...
We are seeing 10-20 GB of memory being used.

OK, that confirms you are using a shared lexer, parser, visitor and code context. I assume you checked that your code context instance isn't the culprit.


I don't know what you store in the macros, but don't duplicate the input. It's usually enough to store a macro name and start and end positions of the replacement text.
I'm not sure I understand. We keep the name as a key, and the class includes the 6 pointers plus a few other things (a few bools and size_ts - nothing significant).

Yes, that's what I wanted to clarify - that you don't have separate instances in each macro. Still, it is not necessary to store references to all these shared instances in every macro. The owner of the macro map can hold them once.

We do keep the expansion text, which we could probably lose - but it's maybe 200 MB of text altogether, which we can easily absorb.

Still, it's duplicating something you already have. You can always easily get the original text from your input stream.
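
Something like this untested sketch, using only runtime types:

#include "antlr4-runtime.h"

// Fetch the original source text covered by a parse-tree node straight
// from the char stream, so the expansion text need not be stored twice.
std::string textOf(antlr4::ParserRuleContext *ctx, antlr4::CharStream *input) {
    antlr4::misc::Interval range(ctx->getStart()->getStartIndex(),
                                 ctx->getStop()->getStopIndex());
    return input->getText(range);
}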


First we'd need to know whether all these instances are created per macro or are just references to instances stored somewhere else. There's certainly no need to create a separate input stream, lexer + parser for each macro. After all, you have a file containing these macros, which is parsed by a single parser instance. You could remove the references and provide access via the outer class that holds the macroLibrary, but as I have shown above, that wouldn't make a really big difference.

Okay, so since yesterday (writing out questions is always a good way of thinking about things) we've taken both the Visitor and the Lexer out of the class - we instantiate the visitor when we revisit the macro, and the impact seems to be minimal. The same goes for the lexer, which we use just for the parse.

Hint: don't destroy the lexer if you need to access tokens. The parse tree otherwise contains pointers that point nowhere.

There's certainly no need to create a separate input stream, lexer + parser for each macro. After all, you have a file containing these macros, which is parsed by a single parser instance.

I don't know how you can say that. There is no file. There are hundreds of thousands - even millions - of strings, many of which are context dependent, all of which have to be treated as re-visitable macro definitions.
I guess what you are suggesting is that we could serialise everything out into a massive file and store pointers for each macro boundary, using some form of distinguished file separator between each macro - but...

No, that was not my intention - just bad wording. What I meant by "file" is the input you parse (which often comes from a file). I just wanted to make sure you don't create the parser pipeline for each single macro individually - and even keep them all around (which would certainly require a lot of RAM). I wasn't sure whether all macros come in one big input stream or are each parsed separately (now I know). Still, you can reuse all the parse pipeline objects, like I do here: https://github.com/mysql/mysql-workbench/blob/8.0/modules/db.mysql.parser/src/mysql_parser_module.cpp#L274.
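
Condensed to a sketch, the reuse pattern is (untested here; see the linked file for the real thing):

#include "antlr4-runtime.h"

// One long-lived pipeline, reloaded for every parse run.
antlr4::ANTLRInputStream input;
mLexer lexer(&input);
antlr4::CommonTokenStream tokens(&lexer);
mParser parser(&tokens);

mParser::CodeContext *parseExpansion(const std::string &text) {
    input.load(text);               // replace the character stream contents
    lexer.setInputStream(&input);   // resets the lexer
    tokens.setTokenSource(&lexer);  // resets the token buffer
    parser.setTokenStream(&tokens);
    return parser.code();
}

Be aware, though, that reloading frees the buffered tokens, so a parse tree cached from an earlier run would then point at dead tokens - which is why your per-macro streams must stay alive as long as their trees do.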


Also, when we visit the code, we need to get at fragments of the macro (the text parts that define it) corresponding to the parse tree.
Ideally, we would like to store just the parse tree for each macro and use the visitor to access it - but it seems that the parse tree depends upon its assigned parser, and likewise upon the input stream.
It's the internal interconnectivity of ANTLR4 which is not so clear to me.

I assume the true culprit of your high memory consumption is the DFA + ATN, which are created during parsing. They are used to speed up further lookup of the same input (or parts of it). If your input is constantly changing, this can grow really large. There's no guaranteed maximum size for these structures and, as mentioned before, you cannot just clear them or your parse times will be much higher.


Even more useful would be the ability to serialise the parse tree, so that we could store (and reload) parse trees without having to re-parse them. Is that something that ANTLR can do?

The parse tree is nothing in terms of memory usage. It represents the current input only, and it makes no sense to save and restore it. What you probably mean is to store the DFA + ATN. However, that won't help with memory consumption, as they are built from all input seen so far.


One of the elements of building the parse tree I don't understand is that the InputStream must be kept, and the CommonTokenStream, but the Lexer - not so much.

As mentioned above, the parse tree contains nodes that reference tokens, which are managed by the token stream. And when you retrieve text, the token stream will reach through to the char input stream. But none of these really eats that much memory, unless you have a single 1 GB input string/file.


Mike Lischke

Sep 21, 2018, 10:23:29 AM
to antlr-di...@googlegroups.com
>
> I must be missing something. If the antlr4::Parser were used just once across a file, there would be no purpose in maintaining statics for, e.g., _decisionToDFA, _sharedContextCache, etc.
> Therefore, I got confused by what you were suggesting!


The generated DFA + ATN (configs) are static and shared among all lexers and parsers of the same type (i.e. for the same grammar). So, even if you delete your current parser and create a new one, it will not start from scratch to build the internal cache, but will use what has been created before.
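
To illustrate (a sketch; the member names are the statics from your generated parser, as you quoted them):

// Two instances of the same generated parser class share one cache,
// declared in the generated code roughly as:
//   static std::vector<antlr4::dfa::DFA> _decisionToDFA;
//   static antlr4::atn::PredictionContextCache _sharedContextCache;
mParser first(&tokensA);
mParser second(&tokensB);
// 'first' warms the DFA cache while parsing; deleting it releases the
// parser object but not the cache, which 'second' keeps using.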

Mike
--
www.soft-gems.net
